Uniquely identify files before uploading with the HTML5 file API
While building a file uploader with the HTML5 File API, we often want to ensure that no duplicate files are uploaded, based on their actual content rather than just their names. This prevents wasting storage space and bandwidth on identical files.
Computing a full hash (with MD5 or similar) over every file is not efficient, because all of that work happens on the client side and reading large files is time-consuming. There is no perfect shortcut for this task; each approach below trades accuracy against speed.
Method 1: Basic File Properties Check
The simplest approach is to compare basic file properties like name, size, and last modified date:
<input type="file" id="fileInput" multiple>
<div id="output"></div>
<script>
document.getElementById('fileInput').addEventListener('change', function(event) {
    const files = Array.from(event.target.files);
    const fileSignatures = new Set();
    const duplicates = [];
    files.forEach(file => {
        const signature = `${file.name}-${file.size}-${file.lastModified}`;
        if (fileSignatures.has(signature)) {
            duplicates.push(file.name);
        } else {
            fileSignatures.add(signature);
        }
    });
    const output = document.getElementById('output');
    if (duplicates.length > 0) {
        output.innerHTML = `<p>Duplicate files detected: ${duplicates.join(', ')}</p>`;
    } else {
        output.innerHTML = `<p>No duplicates found. ${files.length} unique files selected.</p>`;
    }
});
</script>
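The weakness of this method is that it never looks at file content. A minimal sketch of the same signature logic, run on plain objects standing in for File entries (the comparison touches only name, size, and lastModified, so nothing else is needed; findPropertyDuplicates is an illustrative name), shows that an exact copy is flagged but a renamed copy of the same bytes slips through:

```javascript
// Property-based signature, as in Method 1
function propertySignature(file) {
    return `${file.name}-${file.size}-${file.lastModified}`;
}

// Collect names of files whose signature was already seen
function findPropertyDuplicates(files) {
    const seen = new Set();
    const duplicates = [];
    for (const file of files) {
        const sig = propertySignature(file);
        if (seen.has(sig)) {
            duplicates.push(file.name);
        } else {
            seen.add(sig);
        }
    }
    return duplicates;
}
```

A renamed copy produces a different signature, so identical content under two names is reported as two unique files; this is exactly the gap the content-based methods below close.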
Method 2: Content-Based Hash (Partial)
For more accurate duplicate detection, we can hash the first few kilobytes of each file and combine that with the file size:
<input type="file" id="fileInput2" multiple>
<div id="output2"></div>
<script>
async function createFileHash(file) {
    const chunkSize = 8192; // Read the first 8 KB only
    const chunk = file.slice(0, Math.min(chunkSize, file.size));
    const arrayBuffer = await chunk.arrayBuffer();
    // Simple 32-bit rolling hash (for demonstration)
    let hash = 0;
    const bytes = new Uint8Array(arrayBuffer);
    for (let i = 0; i < bytes.length; i++) {
        hash = ((hash << 5) - hash + bytes[i]) & 0xffffffff;
    }
    // Convert to unsigned and add a separator so that different
    // hash/size combinations cannot produce the same key
    return (hash >>> 0).toString(36) + '-' + file.size;
}
document.getElementById('fileInput2').addEventListener('change', async function(event) {
    const files = Array.from(event.target.files);
    const fileHashes = new Map();
    const duplicates = [];
    for (const file of files) {
        const hash = await createFileHash(file);
        if (fileHashes.has(hash)) {
            duplicates.push(`${file.name} (duplicate of ${fileHashes.get(hash)})`);
        } else {
            fileHashes.set(hash, file.name);
        }
    }
    const output = document.getElementById('output2');
    if (duplicates.length > 0) {
        output.innerHTML = `<p>Content-based duplicates: ${duplicates.join(', ')}</p>`;
    } else {
        output.innerHTML = `<p>No content duplicates found in ${files.length} files.</p>`;
    }
});
</script>
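The hashing step can be isolated and checked on raw bytes, independent of any File object. A sketch of the same rolling hash over the first chunkSize bytes of an in-memory Uint8Array (partialHashKey is a hypothetical helper name; the unsigned conversion and the `-` separator are choices made here for stable keys):

```javascript
// Rolling 32-bit hash over at most the first `chunkSize` bytes,
// with the total size appended, mirroring the partial-hash method
function partialHashKey(bytes, totalSize, chunkSize = 8192) {
    let hash = 0;
    const n = Math.min(chunkSize, bytes.length);
    for (let i = 0; i < n; i++) {
        hash = ((hash << 5) - hash + bytes[i]) & 0xffffffff;
    }
    return (hash >>> 0).toString(36) + '-' + totalSize;
}
```

Identical bytes always produce identical keys, and appending the size separates files that merely share a common prefix. Two different files with the same size and the same first 8 KB would still collide, which is the accuracy cost of not reading everything.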
Method 3: Complete File Reading
If we need to identify duplicate files with complete certainty, we have to read and hash the entire content of each file:
async function getFullFileHash(file) {
    // crypto.subtle is only available in secure contexts (HTTPS or localhost)
    const arrayBuffer = await file.arrayBuffer();
    const hashBuffer = await crypto.subtle.digest('SHA-256', arrayBuffer);
    const hashArray = Array.from(new Uint8Array(hashBuffer));
    return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}
// Note: this approach is computationally expensive for large files
// and should be used carefully in production environments
Comparison
| Method | Accuracy | Performance | Best For |
|---|---|---|---|
| Basic Properties | Low | Fast | Quick screening |
| Partial Hash | Medium | Good | Balanced approach |
| Full Content Hash | High | Slow | Critical accuracy needs |
Conclusion
For most web applications, using partial file hashing provides a good balance between accuracy and performance. Choose the method based on your specific requirements for duplicate detection accuracy versus processing speed.
