XML Reserved Markup Characters
Introduction
The open angle bracket (<) and the ampersand (&) are reserved for markup. Element tags must begin with the < character, and entity references and character references in an XML document must begin with the & character. If you use either of these characters for any other purpose, the parser will report an error. When an XML parser encounters the < character, it assumes that an element or other markup statement is about to start. If it does not find the characters it expects next, such as an XML name that begins an element tag, a comment, or a processing instruction, it generates an error. Similarly, when an XML parser encounters an & character, it assumes it has encountered an entity reference or a character reference. XML has five predefined entities:
&lt; | Generates the < character in character data |
&gt; | Generates the > character in character data |
&amp; | Generates the & character in character data |
&apos; | Generates the ' character in character data |
&quot; | Generates the " character in character data |
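The parser behavior described above can be checked with a few lines of Python. This is only a minimal sketch: it assumes the standard library's xml.etree.ElementTree as the parser, and the helper name is_well_formed and the sample <note> documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The five predefined entity references resolve to literal characters.
resolved = ET.fromstring("<note>&lt; &gt; &amp; &apos; &quot;</note>").text
print(resolved)  # < > & ' "

# A raw < or & in character data is rejected by the parser.
print(is_well_formed("<note>1 < 2</note>"))            # False
print(is_well_formed("<note>Tom & Jerry</note>"))      # False

# The escaped forms are accepted.
print(is_well_formed("<note>1 &lt; 2</note>"))         # True
print(is_well_formed("<note>Tom &amp; Jerry</note>"))  # True
```

Other parsers word their error messages differently, but any conforming XML parser should reject the two raw examples.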
If the characters following the & character do not form one of the predefined entities listed above, the XML parser assumes that the entity was defined in the DTD or that it is a character reference. If the parser finds neither that definition nor a valid character reference, it generates an error.
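As a rough illustration of that rule, the sketch below parses the same reference twice, once without a declaration and once with one. It assumes Python's xml.etree.ElementTree; the entity name author and its replacement text are hypothetical.

```python
import xml.etree.ElementTree as ET

# &author; is neither predefined nor declared, so parsing should fail.
undeclared = "<note>&author;</note>"
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError as err:
    undeclared_ok = False
    print("rejected:", err)

# Declaring the (hypothetical) entity in an internal DTD subset
# makes the same reference legal.
declared = '<!DOCTYPE note [<!ENTITY author "Jane Doe">]><note>&author;</note>'
declared_text = ET.fromstring(declared).text
print(declared_text)  # Jane Doe
```

Only entities declared in the internal subset are shown here. How a parser handles entities declared in an external DTD depends on the parser and its configuration.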
Character references look similar to entity references, but they never need to be declared. They refer to specific characters (such as accented letters) by their number in the Unicode character set, written in decimal (for example &#233;) or in hexadecimal (for example &#xE9;), whatever encoding the document uses. Under the heading of Entities, we have included a chart of character references.
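The short sketch below shows a decimal and a hexadecimal character reference resolving to the same character. It assumes Python's xml.etree.ElementTree; the helper char_ref and the <word> element are illustrative names only.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to the same code point, U+00E9.
decimal_text = ET.fromstring("<word>caf&#233;</word>").text
hex_text = ET.fromstring("<word>caf&#xE9;</word>").text
print(decimal_text == hex_text == "café")  # True


def char_ref(ch):
    """Build a decimal character reference for a single character."""
    return f"&#{ord(ch)};"


print(char_ref("é"))  # &#233;
```

Neither document declares anything, which is the practical difference from a named entity such as a custom &author; entity.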
Using predefined entities in place of the <, >, &, ' and " characters is called escaping a character. Escaping guarantees that the parser treats these characters as data rather than markup, so you end up with the characters you intended. Note the > character in that list: always escape it, even if you are fairly sure there is no less-than character (<) nearby.
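In practice, escaping is usually delegated to a library rather than done by hand. The following sketch assumes Python's xml.sax.saxutils.escape, which replaces &, < and > by default and accepts an extra mapping for the two quote characters; the sample string and the <code> element are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if (a < b && c > d) say "hi"'

# escape() handles &, < and > on its own; quotes are added via the mapping.
quote_entities = {'"': "&quot;", "'": "&apos;"}
escaped = escape(raw, quote_entities)
print(escaped)
# if (a &lt; b &amp;&amp; c &gt; d) say &quot;hi&quot;

# Round trip: the parser turns the entity references back into the original text.
round_trip = ET.fromstring(f"<code>{escaped}</code>").text
print(round_trip == raw)  # True
```

The ampersand has to be replaced first; otherwise the & introduced by &lt; would itself be escaped again. The library function takes care of that ordering.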
Examine the following lines of code to see if you can identify the legal and illegal uses of the < and & characters. We will identify the correct answers by referencing the code's line numbers in the paragraphs that follow.