XHTML vs HTML
Because XHTML is an XML application, certain practices that were perfectly legal in SGML-based HTML 4 must be changed. You have already seen XHTML syntax in the previous chapter, so the differences between XHTML and HTML listed here should look familiar:
XHTML documents must be well-formed:
Well-formedness is a new concept introduced by XML. Essentially, this means that all elements must either have closing tags or be written as empty elements (for example <br />), and that all elements must nest properly.
CORRECT: nested elements.
<p>Here is an emphasized <em>paragraph</em>.</p>
INCORRECT: overlapping elements.
<p>Here is an emphasized <em>paragraph.</p></em>
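The two examples above can be checked mechanically. The following is a minimal sketch that uses the XML parser from Python's standard library; the helper name is_well_formed is an assumption made for illustration. It tests well-formedness only, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


nested = "<p>Here is an emphasized <em>paragraph</em>.</p>"
overlapping = "<p>Here is an emphasized <em>paragraph.</p></em>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
```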
Element and attribute names must be in lower case:
XHTML documents must use lower case for all HTML element and attribute names. This difference is necessary because an XHTML document is assumed to be an XML document, and XML is case-sensitive: for example, <li> and <LI> are different tags.
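A short sketch of what case sensitivity means in practice, assuming a generic XML parser (here Python's standard library) rather than any particular browser: the parser keeps li and LI as two distinct element names, and an end tag whose case differs from its start tag does not close the element.

```python
import xml.etree.ElementTree as ET

# Two children whose names differ only in case are distinct elements.
items = ET.fromstring("<ul><li>first</li><LI>second</LI></ul>")
tag_names = [child.tag for child in items]
print(tag_names)  # ['li', 'LI']

# An end tag in a different case does not match the start tag.
try:
    ET.fromstring("<li>item</LI>")
    mixed_case_accepted = True
except ET.ParseError:
    mixed_case_accepted = False
print(mixed_case_accepted)  # False
```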
For all elements, end tags are required:
In HTML 4, certain elements were permitted to omit the end tag. XML does not allow end tags to be omitted.
CORRECT: terminated elements.
<p>Here is a paragraph.</p><p>here is another paragraph.</p>
INCORRECT: unterminated elements.
<p>Here is a paragraph.<p>here is another paragraph.
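To contrast the two behaviours, the sketch below feeds the same unterminated markup to Python's lenient html.parser tokenizer and to its XML parser. The wrapping div element and the TagCounter class are assumptions added for illustration; html.parser only stands in for tolerant HTML processing and is not an HTML 4 validator.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCounter(HTMLParser):
    """Counts start and end tags without requiring that they match."""

    def __init__(self):
        super().__init__()
        self.start_tags = 0
        self.end_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1

    def handle_endtag(self, tag):
        self.end_tags += 1


unterminated = "<div><p>Here is a paragraph.<p>here is another paragraph.</div>"

# The HTML tokenizer reads the input without complaint.
counter = TagCounter()
counter.feed(unterminated)
counter.close()
print(counter.start_tags, counter.end_tags)  # 3 1

# The XML parser rejects it because the p elements are never closed.
try:
    ET.fromstring(unterminated)
    xml_accepts = True
except ET.ParseError:
    xml_accepts = False
print(xml_accepts)  # False
```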
Attribute values must always be quoted:
All attribute values must be quoted, even those which appear to be numeric.
CORRECT: quoted attribute values
<td rowspan="3">
INCORRECT: unquoted attribute values
<td rowspan=3>
Attribute minimization:
XML does not support attribute minimization. Attribute-value pairs must be written in full. Attribute names such as compact and checked cannot occur in elements without their value being specified.
CORRECT: unminimized attributes
<dl compact="compact">
INCORRECT: minimized attributes
<dl compact>
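Both attribute rules above (quoting and minimization) are well-formedness constraints, so an XML parser enforces them. A minimal sketch, assuming td and dl as illustrative elements and is_well_formed as a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Illustrative fragments; each is closed so that only the attribute syntax differs.
quoted = '<td rowspan="3"></td>'
unquoted = "<td rowspan=3></td>"
unminimized = '<dl compact="compact"></dl>'
minimized = "<dl compact></dl>"

print(is_well_formed(quoted))       # True
print(is_well_formed(unquoted))     # False
print(is_well_formed(unminimized))  # True
print(is_well_formed(minimized))    # False
```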
White Space handling in attribute values:
When user agents (browsers) process attribute values, they do so as follows:
- Strip leading and trailing white space.
- Map sequences of one or more white space characters (including line breaks) to a single inter-word space.
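The two steps above can be sketched as a small function. The name normalize_attribute_value is hypothetical, and white space is assumed to mean the four XML white space characters (space, tab, carriage return, line feed). This illustrates the rule itself and does not reproduce what any specific parser does internally.

```python
import re

XML_WHITESPACE = " \t\r\n"


def normalize_attribute_value(value: str) -> str:
    """Apply the two normalization steps described above."""
    # Step 1: strip leading and trailing white space.
    stripped = value.strip(XML_WHITESPACE)
    # Step 2: map each run of white space (including line breaks) to one space.
    return re.sub(r"[ \t\r\n]+", " ", stripped)


print(repr(normalize_attribute_value("  nav \n  main\tbar ")))  # 'nav main bar'
```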
Script and Style elements:
In XHTML, the script and style elements should not contain < and & characters directly; if they do, those characters are treated as the start of markup. Entities such as &lt; and &amp; are recognized as entity references by the XML processor and expanded to < and & respectively.
Wrapping the content of the script or style element within a CDATA marked section avoids the expansion of these entities.
<script type="text/javascript">
<![CDATA[
... unescaped VB or Java Script here... ...
]]>
</script>
An alternative is to use external script and style documents.
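The effect of a CDATA section can be seen with an XML parser. In this sketch the script body raw_script is an invented example: written bare it is rejected, while the CDATA-wrapped and the entity-escaped forms both parse and yield the original text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical script body containing both < and & characters.
raw_script = "if (a < b && b < c) { run(); }"

bare = "<script>" + raw_script + "</script>"
wrapped = "<script><![CDATA[" + raw_script + "]]></script>"
escaped = "<script>" + escape(raw_script) + "</script>"

# The bare form is not well-formed: < and & start markup.
try:
    ET.fromstring(bare)
    bare_accepted = True
except ET.ParseError:
    bare_accepted = False
print(bare_accepted)  # False

# Both alternatives give the script text back unchanged.
wrapped_text = ET.fromstring(wrapped).text
escaped_text = ET.fromstring(escaped).text
print(wrapped_text == raw_script)  # True
print(escaped_text == raw_script)  # True
```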
The elements with id and name attributes:
HTML 4 defined the name attribute for the elements a, applet, form, frame, iframe, img, and map. XHTML recommends using the id attribute in its place. Note that in XHTML 1.0, the name attribute of these elements is formally deprecated and will be removed in a subsequent version of XHTML.
Attributes with pre-defined value sets:
HTML and XHTML both have some attributes with pre-defined and limited sets of values, for example the type attribute of the input element. In HTML and XML, these are called enumerated attributes. Under HTML 4, the interpretation of these values was case-insensitive, so a value of TEXT was equivalent to a value of text.
Under XHTML, the interpretation of these values is case-sensitive so all of these values are defined in lower-case.
Entity references as hex values:
HTML and XML both permit references to characters by hexadecimal value. In HTML, these references could be written as either &#Xnn; or &#xnn;, and both were valid. In XHTML documents, you must use the lower-case form only: &#xnn;.
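As a quick illustration, assuming Python's standard XML parser as the XML processor, the lower-case form resolves to the referenced character while the upper-case form is rejected as not well-formed. The code point 41 (the letter A) is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Lower-case x: a valid hexadecimal character reference.
lower_text = ET.fromstring("<p>&#x41;</p>").text
print(lower_text)  # A

# Upper-case X: not a valid character reference in XML.
try:
    ET.fromstring("<p>&#X41;</p>")
    upper_accepted = True
except ET.ParseError:
    upper_accepted = False
print(upper_accepted)  # False
```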
The <html> element is required:
All XHTML elements must be nested within the <html> root element. All other elements can have sub-elements, which must be in pairs and correctly nested within their parent element. The basic document structure is:
<html>
<head> ... </head>
<body> ... </body>
</html>
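The basic structure can also be checked programmatically. This is a rough sketch only: has_basic_structure is a hypothetical helper, the sample documents are invented, and the DOCTYPE and the XHTML namespace declaration are left out for brevity (with a namespace, the parser would report qualified tag names instead of plain html, head and body).

```python
import xml.etree.ElementTree as ET


def has_basic_structure(document: str) -> bool:
    """Check for an html root whose children are head followed by body."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return False
    child_tags = [child.tag for child in root]
    return root.tag == "html" and child_tags == ["head", "body"]


complete = (
    "<html><head><title>Example</title></head>"
    "<body><p>Hello</p></body></html>"
)
missing_root = "<body><p>Hello</p></body>"

print(has_basic_structure(complete))      # True
print(has_basic_structure(missing_root))  # False
```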