
- HTML5 Tutorial
- HTML5 - Home
- HTML5 - Overview
- HTML5 - Syntax
- HTML5 - Attributes
- HTML5 - Events
- HTML5 - Web Forms 2.0
- HTML5 - SVG
- HTML5 - MathML
- HTML5 - Web Storage
- HTML5 - Web SQL Database
- HTML5 - Server-Sent Events
- HTML5 - WebSocket
- HTML5 - Canvas
- HTML5 - Audio & Video
- HTML5 - Geolocation
- HTML5 - Microdata
- HTML5 - Drag & drop
- HTML5 - Web Workers
- HTML5 - IndexedDB
- HTML5 - Web Messaging
- HTML5 - Web CORS
- HTML5 - Web RTC
- HTML5 Demo
- HTML5 - Web Storage
- HTML5 - Server Sent Events
- HTML5 - Canvas
- HTML5 - Audio Players
- HTML5 - Video Players
- HTML5 - Geo-Location
- HTML5 - Drag and Drop
- HTML5 - Web Worker
- HTML5 - Web slide Desk
- HTML5 Tools
- HTML5 - SVG Generator
- HTML5 - MathML
- HTML5 - Velocity Draw
- HTML5 - QR Code
- HTML5 - Validator.nu Validation
- HTML5 - Modernizr
- HTML5 - Validation
- HTML5 - Online Editor
- HTML5 - Color Code Builder
- HTML5 Useful References
- HTML5 - Quick Guide
- HTML5 - Color Names
- HTML5 - Fonts Reference
- HTML5 - URL Encoding
- HTML5 - Entities
- HTML5 - Char Encodings
- HTML5 Tag Reference
- HTML5 - Question and Answers
- HTML5 - Tags Reference
- HTML5 - Deprecated Tags
- HTML5 - New Tags
- HTML5 Resources
- HTML5 - Useful Resources
- HTML5 - Discussion
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
HTML5 - Character Encodings
A character encoding is a method of converting bytes into characters. To validate or display an HTML document, a program must choose a character encoding. HTML 5 authors have three means of setting the character encoding −
HTTP Content-Type Header
If you are writing cgi or similar program then you would use HTTP Content-Type header to set any character encoding.
Following is the simple example −
print "Content-Type: text/html; charset=utf-8\r\n";
The <meta> element
You can use a <meta> element with a charset attribute that specifies the encoding within the first 512 bytes of the HTML5 document.
Following is the simplified example −
<meta charset="UTF-8">
Above syntax replaces the need for <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> although that syntax is still allowed.
Unicode Byte Order Mark (BOM)
A byte order mark (BOM) consists of the character code U+FEFF at the beginning of a data stream, where it can be used as a signature defining the byte order and encoding form, primarily of unmarked plaintext files.
Many Windows programs (including Windows Notepad) add the bytes 0xEF, 0xBB, 0xBF at the start of any document saved as UTF-8. This is the UTF-8 encoding of the Unicode byte order mark (BOM), and is commonly referred to as a UTF-8 BOM even though it is not relevant to byte order.
For HTML5 document, you can use a Unicode Byte Order Mark (BOM) character at the start of the file. This character provides a signature for the encoding used.