XML - Character Entities



This chapter describes XML character entities. Before looking at character entities, let us first understand what an XML entity is.

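The W3C XML specification describes the document entity, the top-level entity of every XML document, as follows −

"The document entity serves as the root of the entity tree and a starting-point for an XML processor".

More generally, entities are placeholders in XML: a reference to a named entity is replaced by its declared content when the document is parsed. Entities are declared in a DTD, either in the internal subset inside the document prolog or in an external DTD. There are different types of entities; this chapter discusses character entities.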

Both HTML and XML reserve some symbols for markup, so these cannot be used directly as content. For example, the < and > signs delimit XML tags, and a literal < or & in content would be read as markup. To include such special characters in content, character entities are used.

There are also some special characters or symbols that cannot be typed directly from the keyboard. Character entities can be used to represent those symbols as well.
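The reserved-symbol problem described above can be sketched with Python's standard `xml.etree.ElementTree` parser. This is only an illustration; the helper name `parses` and the sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML, False otherwise.

    Hypothetical helper written for this illustration.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw = "<expr>1 < 2</expr>"         # the bare '<' is read as the start of a tag
escaped = "<expr>1 &lt; 2</expr>"  # the same content with '<' written as an entity

print(parses(raw))                  # False
print(parses(escaped))              # True
print(ET.fromstring(escaped).text)  # 1 < 2
```

The parser turns the entity back into the literal character, so the application sees the text the author intended.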

Types of Character Entities

There are three types of character entities −

  • Predefined Character Entities
  • Numeric Character Entities
  • Named Character Entities

Predefined Character Entities

Predefined character entities avoid ambiguity when certain symbols appear in content. For example, a less-than ( < ) or greater-than ( > ) symbol in text could be confused with the angle brackets (<>) that delimit tags. Because these characters delimit markup in XML, they are written as entities when they occur as data. The following is the list of predefined character entities from the XML specification. These can be used to express the characters without ambiguity.

  • Ampersand − &amp;

  • Single quote − &apos;

  • Greater than − &gt;

  • Less than − &lt;

  • Double quote − &quot;
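As a minimal sketch of how the five predefined entities are typically produced in code, the example below uses `escape` and `unescape` from Python's standard `xml.sax.saxutils` module. The sample string and the `quotes` mapping are illustrative assumptions.

```python
from xml.sax.saxutils import escape, unescape

text = 'Tom & Jerry <"best" friends>'

# escape() replaces &, < and > by default.
print(escape(text))
# Tom &amp; Jerry &lt;"best" friends&gt;

# Quote characters are replaced only when an extra mapping is supplied.
quotes = {'"': "&quot;", "'": "&apos;"}
escaped_all = escape(text, quotes)
print(escaped_all)
# Tom &amp; Jerry &lt;&quot;best&quot; friends&gt;

# unescape() reverses the replacements, given the inverse mapping for quotes.
restored = unescape(escaped_all, {"&quot;": '"', "&apos;": "'"})
print(restored == text)  # True
```

Escaping quotes matters mainly inside attribute values, which is why the function leaves them alone unless asked.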

Numeric Character Entities

A numeric reference refers to a character by its number (code point) in the Unicode character set. It can be written in either decimal or hexadecimal format. Because there are thousands of code points, numeric references are hard to remember.

General syntax for a decimal numeric reference is as follows, where nnn stands for the decimal number and no spaces are allowed inside the reference −

&#nnn;

General syntax for a hexadecimal numeric reference is as follows, where hhh stands for the hexadecimal number and no spaces are allowed inside the reference −

&#xhhh;
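The two syntaxes can be sketched in Python by building references from a character's Unicode code point and parsing them back. The helper names `decimal_ref` and `hex_ref` are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET


def decimal_ref(ch: str) -> str:
    """Build a decimal numeric reference for one character (illustrative helper)."""
    return f"&#{ord(ch)};"


def hex_ref(ch: str) -> str:
    """Build a hexadecimal numeric reference for one character (illustrative helper)."""
    return f"&#x{ord(ch):X};"


copyright_sign = "\u00A9"  # the copyright symbol, code point 169 (hex A9)

print(decimal_ref(copyright_sign))  # &#169;
print(hex_ref(copyright_sign))      # &#xA9;

# Both forms decode to the same character when the document is parsed.
doc = f"<c>{decimal_ref(copyright_sign)}{hex_ref(copyright_sign)}</c>"
print(ET.fromstring(doc).text == copyright_sign * 2)  # True
```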

The following table lists some predefined character entities with their numeric values −

Entity name   Character   Decimal reference   Hexadecimal reference
quot          "           &#34;               &#x22;
amp           &           &#38;               &#x26;
apos          '           &#39;               &#x27;
lt            <           &#60;               &#x3C;
gt            >           &#62;               &#x3E;
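The equivalences in the table can be checked with a short Python sketch that decodes each form through the standard `xml.etree.ElementTree` parser. The `decode` helper and the `rows` list are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def decode(reference: str) -> str:
    """Parse a tiny document containing one reference and return its text."""
    return ET.fromstring(f"<c>{reference}</c>").text


# (entity name, character, Unicode code point) for the five predefined entities.
rows = [
    ("quot", '"', 34),
    ("amp", "&", 38),
    ("apos", "'", 39),
    ("lt", "<", 60),
    ("gt", ">", 62),
]

for name, char, code in rows:
    forms = [f"&{name};", f"&#{code};", f"&#x{code:X};"]
    # All three spellings should decode to the same character.
    results = [decode(form) == char for form in forms]
    print(name, forms, all(results))
```

Each row should report that the named, decimal, and hexadecimal forms all yield the same character.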

Named Character Entities

As numeric references are hard to remember, named character entities are often preferred. Here, each entity is identified by a name. Unlike HTML, XML predefines only the five entities listed above, so any other named entity must be declared in a DTD before it is used.

For example −

  • 'Aacute' represents the capital letter A with an acute accent (Á).

  • 'ugrave' represents the small letter u with a grave accent (ù).
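The following Python sketch illustrates that a named entity such as Aacute is not built into XML and only works once it is declared. The declaration shown in the internal DTD subset is one possible way to define it, and the element name `name` and the sample text are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Without a declaration, a plain XML parser does not know the Aacute entity.
undeclared = "<name>&Aacute;lvaro</name>"

# Declaring the entity in an internal DTD subset maps the name to code point 193.
declared = (
    '<!DOCTYPE name [<!ENTITY Aacute "&#193;">]>'
    "<name>&Aacute;lvaro</name>"
)

try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(undeclared_ok)                 # False
print(ET.fromstring(declared).text)  # Álvaro
```

In practice, documents that need many such names usually reference a DTD that declares them as a set rather than declaring each one by hand.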
