Unicode Byte Order Mark (BOM) character in HTML5 document
A byte order mark (BOM) consists of the character code U+FEFF at the beginning of a data stream, where it can be used as a signature defining the byte order and encoding form, primarily of unmarked plaintext files. In HTML5 documents, the BOM can help browsers automatically detect the file's encoding.
What is a BOM?
Many Windows programs (including older versions of Windows Notepad) add the bytes 0xEF, 0xBB, 0xBF at the start of any document saved as UTF-8. This is the UTF-8 encoding of the Unicode byte order mark (BOM). It is commonly called a UTF-8 BOM, even though UTF-8 has no byte order to mark.
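To see where those three bytes come from, you can encode the single character U+FEFF as UTF-8 with the standard `TextEncoder` API, which is available in browsers and in Node.js. The `toHex` helper is a hypothetical name used only for this illustration.

```javascript
// Hypothetical helper: format a byte array as space-separated uppercase hex.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, '0')).join(' ');
}

// TextEncoder always encodes to UTF-8, so this is the UTF-8 form of U+FEFF.
const bomBytes = new TextEncoder().encode('\uFEFF');

console.log(toHex(bomBytes)); // prints "EF BB BF"
```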
BOM in HTML5 Documents
For HTML5 documents, you can use a Unicode Byte Order Mark (BOM) character at the start of the file. This character provides a signature for the encoding used and helps browsers identify the correct character encoding automatically.
Example: HTML5 Document with BOM
The file below begins with the three bytes EF BB BF (the UTF-8 BOM). Most editors do not display them, so the BOM is invisible in the listing:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Document with BOM</title>
</head>
<body>
<h1>Hello World!</h1>
<p>This document includes special characters: café, naïve</p>
</body>
</html>
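If you generate pages from a script, the BOM has to be written deliberately. One way is to prepend U+FEFF to the string before encoding it as UTF-8. This is a minimal sketch assuming an environment with `TextEncoder` (browsers, Node.js 11 or later); `encodeWithUtf8Bom` is a hypothetical helper name.

```javascript
// Hypothetical helper: encode a string as UTF-8 with exactly one leading BOM.
function encodeWithUtf8Bom(text) {
  // Avoid a doubled BOM if the string already starts with U+FEFF.
  const body = text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
  // U+FEFF encodes to EF BB BF, so the result begins with the UTF-8 BOM.
  return new TextEncoder().encode('\uFEFF' + body);
}

const html = '<!DOCTYPE html>\n<html lang="en">\n<head>\n<meta charset="UTF-8">\n' +
  '<title>Document with BOM</title>\n</head>\n<body>\n' +
  '<p>This document includes special characters: café, naïve</p>\n</body>\n</html>\n';

const bytes = encodeWithUtf8Bom(html);
console.log(bytes[0].toString(16), bytes[1].toString(16), bytes[2].toString(16)); // prints "ef bb bf"
```

In Node.js the resulting `Uint8Array` can be passed directly to `fs.writeFileSync('index.html', bytes)`. Writing the plain string with the `'utf8'` encoding does not add a BOM on its own.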
BOM Detection in JavaScript
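You can check whether a JavaScript string starts with a BOM. Once text has been decoded, the BOM is the single character U+FEFF, whatever encoding the file used:
function hasBOM(text) {
  // After decoding, a BOM appears as the character U+FEFF at index 0
  // (in a UTF-8 file it was stored as the bytes EF BB BF).
  return text.charCodeAt(0) === 0xFEFF;
}
// Example strings with and without a leading BOM
const textWithBOM = '\uFEFF<!DOCTYPE html><html>...';
const textWithoutBOM = '<!DOCTYPE html><html>...';
console.log('With BOM:', hasBOM(textWithBOM));
console.log('Without BOM:', hasBOM(textWithoutBOM));
With BOM: true
Without BOM: false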
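A string-based check like `hasBOM` only works if the decoded text still contains U+FEFF. Text decoded with the standard `TextDecoder` API usually does not: by default it removes a leading BOM, and it keeps the BOM only when constructed with `{ ignoreBOM: true }`. To detect a BOM reliably, the sketch below inspects the raw bytes instead; `startsWithUtf8Bom` is a hypothetical helper name.

```javascript
// Hypothetical helper: check raw bytes (not a decoded string) for the UTF-8 BOM.
function startsWithUtf8Bom(bytes) {
  return bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF;
}

// Raw bytes of a tiny file: the UTF-8 BOM followed by "<p>".
const raw = new Uint8Array([0xEF, 0xBB, 0xBF, 0x3C, 0x70, 0x3E]);

const stripped = new TextDecoder('utf-8').decode(raw);                  // BOM removed
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(raw); // BOM kept as U+FEFF

console.log('Raw bytes have BOM:', startsWithUtf8Bom(raw));                      // true
console.log('Default decode keeps U+FEFF:', stripped.charCodeAt(0) === 0xFEFF); // false
console.log('ignoreBOM decode keeps U+FEFF:', kept.charCodeAt(0) === 0xFEFF);   // true
```

In Node.js the bytes would typically come from `fs.readFileSync(path)`, which returns a `Buffer` (a `Uint8Array` subclass) when no encoding argument is given.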
Common BOM Types
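| Encoding | BOM bytes | Typical context |
|---|---|---|
| UTF-8 | EF BB BF | The most common encoding for web documents; the BOM is optional |
| UTF-16 LE | FF FE | Windows applications, which use little-endian UTF-16 internally |
| UTF-16 BE | FE FF | Big-endian UTF-16 ("network byte order") |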
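Browsers look for these byte sequences before any other encoding information. The WHATWG Encoding Standard's "BOM sniff" step tests for EF BB BF first, then FE FF, then FF FE. The following is a minimal sketch of that check, with results corresponding to the rows of the table above; `sniffBom` is a hypothetical name, not a browser API.

```javascript
// Hypothetical helper modeled on the WHATWG "BOM sniff" step:
// returns the encoding indicated by a leading BOM, or null if there is none.
function sniffBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return { encoding: 'UTF-8', bomLength: 3 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return { encoding: 'UTF-16BE', bomLength: 2 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return { encoding: 'UTF-16LE', bomLength: 2 };
  }
  return null; // no BOM: fall back to <meta charset>, HTTP headers, etc.
}

const samples = [
  [0xEF, 0xBB, 0xBF, 0x3C], // UTF-8 BOM + "<"
  [0xFF, 0xFE, 0x3C, 0x00], // UTF-16 LE BOM + "<"
  [0x3C, 0x21],             // "<!" with no BOM
];
for (const sample of samples) {
  const result = sniffBom(new Uint8Array(sample));
  console.log(result ? result.encoding : 'no BOM'); // prints "UTF-8", "UTF-16LE", "no BOM"
}
```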
Best Practices
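While a BOM can help with encoding detection, modern HTML5 documents should declare their encoding explicitly with the <meta charset="UTF-8"> tag, placed within the first 1024 bytes of the file. The declaration works whether or not a BOM is present, so it still applies after tools that strip the BOM have processed the file. If a document does have a BOM, browsers give the BOM priority over the <meta> tag (and over the HTTP Content-Type header), so the two must not contradict each other.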
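A rough way to audit a file is to look for `<meta charset>` within the first 1024 bytes and, when a UTF-8 BOM is present, confirm that the declaration agrees with it, because browsers follow the BOM. The sketch below is a heuristic only, not the HTML standard's full "prescan" algorithm: it uses a regular expression, recognizes only the `<meta charset="...">` form, and `checkEncodingDeclaration` is a hypothetical name.

```javascript
// Hypothetical audit helper (rough heuristic, not the full HTML prescan algorithm).
// Looks for <meta charset=...> in the first 1024 bytes and compares it with any UTF-8 BOM.
function checkEncodingDeclaration(bytes) {
  const hasUtf8Bom = bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF;
  // Decode only the first 1024 bytes; invalid or cut-off sequences become U+FFFD.
  const head = new TextDecoder('utf-8').decode(bytes.subarray(0, 1024));
  // Matches only <meta charset="...">, not <meta http-equiv="Content-Type" ...>.
  const match = /<meta\s+charset\s*=\s*["']?([\w-]+)/i.exec(head);
  const declared = match ? match[1].toLowerCase() : null;

  const problems = [];
  if (declared === null) {
    problems.push('no <meta charset> found in the first 1024 bytes');
  } else if (hasUtf8Bom && declared !== 'utf-8' && declared !== 'utf8') {
    problems.push(`BOM says UTF-8 but <meta charset> says ${declared}; browsers will use UTF-8`);
  }
  return { hasUtf8Bom, declared, problems };
}

// A conflicting document: UTF-8 BOM, but the meta tag claims windows-1252.
const doc = new TextEncoder().encode(
  '\uFEFF<!DOCTYPE html><html><head><meta charset="windows-1252"></head></html>');
const report = checkEncodingDeclaration(doc);
console.log(report.declared);        // prints "windows-1252"
console.log(report.problems.length); // prints 1
```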
Conclusion
A BOM lets browsers detect an HTML5 document's encoding automatically, but an explicit charset declaration is the recommended approach. Use a BOM when working with legacy systems that expect one, or when the encoding must be detectable without any other declaration.
