Unicode Byte Order Mark (BOM) character in HTML5 document.

A byte order mark (BOM) consists of the character code U+FEFF at the beginning of a data stream, where it can be used as a signature defining the byte order and encoding form, primarily of unmarked plaintext files. In HTML5 documents, the BOM can help browsers automatically detect the file's encoding.

What is a BOM?

Many Windows programs (including Windows Notepad) add the bytes 0xEF, 0xBB, 0xBF at the start of any document saved as UTF-8. This is the UTF-8 encoding of the Unicode byte order mark (BOM), and is commonly referred to as a UTF-8 BOM even though it is not relevant to byte order.
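
As a quick check of the byte sequence described above, the following sketch encodes the single character U+FEFF as UTF-8 using the standard TextEncoder API (available in browsers and Node.js). The helper name toHex is illustrative only and not part of any library.

```javascript
// Encode the BOM character U+FEFF as UTF-8 and inspect the resulting bytes.
const bomBytes = new TextEncoder().encode('\uFEFF');

// Hypothetical helper: format a byte array as uppercase hex pairs.
function toHex(bytes) {
    return Array.from(bytes, b => b.toString(16).toUpperCase().padStart(2, '0')).join(' ');
}

console.log(toHex(bomBytes)); // EF BB BF
```

The three bytes are simply what UTF-8 produces for the code point U+FEFF, which is why the sequence carries no byte-order information of its own.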

BOM in HTML5 Documents

For HTML5 documents, you can place a Unicode byte order mark (BOM) at the very start of the file, before the <!DOCTYPE html> declaration. It acts as a signature for the encoding, so browsers can identify the correct character encoding before they parse any markup.

Example: HTML5 Document with BOM

The file below begins with the UTF-8 BOM bytes EF BB BF, which text editors normally do not display:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document with BOM</title>
</head>
<body>
    <h1>Hello World!</h1>
    <p>This document includes special characters: café, naïve</p>
</body>
</html>

BOM Detection in JavaScript

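Once a file's contents have been decoded into a string, you can check whether the text starts with a BOM using JavaScript. Note that some decoders, such as TextDecoder with its default options, remove the BOM during decoding, in which case this check returns false.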

function hasBOM(text) {
    // In a decoded string the BOM is the single character U+FEFF,
    // whatever the original encoding (EF BB BF are its UTF-8 bytes)
    return text.charCodeAt(0) === 0xFEFF;
}

// Example with BOM
let textWithBOM = '\uFEFF<!DOCTYPE html><html>...';
let textWithoutBOM = '<!DOCTYPE html><html>...';

console.log('Has BOM:', hasBOM(textWithBOM));
console.log('No BOM:', hasBOM(textWithoutBOM));

Has BOM: true
No BOM: false
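
After detection, the BOM usually needs to be removed before further processing, because U+FEFF is invisible yet still counts as the first character of the string. The following is a minimal sketch of that step; stripBOM is a hypothetical helper name, not a built-in function.

```javascript
// Return the text without a leading U+FEFF, if one is present.
// An empty string is returned unchanged (charCodeAt(0) is NaN there).
function stripBOM(text) {
    return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

const cleaned = stripBOM('\uFEFF<!DOCTYPE html>');
console.log(cleaned.startsWith('<!DOCTYPE')); // true
console.log(stripBOM('plain text'));          // plain text
```

Only the first character is examined, so a U+FEFF that appears later in the text is left untouched.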

Common BOM Types

Encoding	BOM Bytes	Usage
UTF-8	EF BB BF	Most common for web documents
UTF-16 LE	FF FE	Windows applications
UTF-16 BE	FE FF	Network protocols
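
The byte sequences in the table can also be checked directly on raw bytes, before any decoding takes place. The sketch below assumes the input is a Uint8Array (for example, the start of a file that has already been read) and covers only the three BOMs listed above; detectBOM is a hypothetical helper name.

```javascript
// Inspect the first bytes of a buffer and report which BOM from the
// table above, if any, it starts with. Returns null when none matches.
// Assumption: only UTF-8 and UTF-16 are considered here.
function detectBOM(bytes) {
    if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
        return 'UTF-8';
    }
    if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
        return 'UTF-16 LE';
    }
    if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
        return 'UTF-16 BE';
    }
    return null;
}

console.log(detectBOM(new Uint8Array([0xEF, 0xBB, 0xBF, 0x3C]))); // UTF-8
console.log(detectBOM(new Uint8Array([0xFF, 0xFE, 0x3C, 0x00]))); // UTF-16 LE
console.log(detectBOM(new Uint8Array([0x3C, 0x21])));             // null
```

Unlike the string-based hasBOM check, this works on undecoded data, so it can be used to decide which decoder to apply in the first place.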

Best Practices

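While BOMs can help with encoding detection, modern HTML5 documents should explicitly declare their encoding using the <meta charset="UTF-8"> tag, placed early in the <head> so that it falls within the first 1024 bytes of the document. This declaration is visible in the source, survives editors and tools that strip the BOM, and doesn't depend on BOM presence.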

Conclusion

The BOM provides automatic encoding detection for HTML5 documents, but explicit charset declaration is the recommended approach. Use BOMs when working with legacy systems or when automatic encoding detection is required.

Samual Sam

Updated on: 2026-03-15T23:18:59+05:30
