Fastest Method to Check If Two Files Have the Same Contents

Computers and other electronic devices are an essential part of our daily routine, and we often need to compare two files to check whether they contain the same content. This can be a slow task, especially when the files are large, and traditional comparison methods can be quite time-consuming. In this article, we will explore the fastest methods to check whether two files have the same contents.

What is a File Comparison?

A file comparison is the process of comparing two or more files to determine whether their contents are identical or different. It is often used in software development to check the differences between code versions, but it is also useful in everyday life, for instance when comparing backup files or two versions of the same document. Various file comparison tools are available, and some methods are faster than others.

Method 1: File Size Comparison

One of the simplest and fastest ways to start checking whether two files have the same contents is to compare their file sizes. The check is cheap because the size comes from file metadata, so the file data is never read. If the sizes differ, the files are certainly different. If the sizes match, the files may still differ, since any two files of equal length can hold different bytes, so a matching size is only a hint and not a guarantee.

[Figure: File size comparison process. File A (1024 bytes) and File B (1024 bytes) are compared, and the sizes match.]

Example

Suppose we have two files, A and B. We can check their sizes using the ls -l command in Linux or the dir command in Windows.

ls -l A B
-rw-r--r-- 1 user user 1024 Jun 10 12:22 A
-rw-r--r-- 1 user user 1024 Jun 10 12:22 B

In this example, files A and B both have a size of 1024 bytes, so they might have the same content. A matching size does not prove it, however, and a further check on the contents is needed.
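The same size check can be scripted. The following is a minimal Python sketch, assuming both paths exist and point to regular files; the helper name same_size is hypothetical and introduced only for illustration.

```python
import os

def same_size(path_a, path_b):
    """Return True if both files have the same size in bytes.

    Only file metadata is consulted; the file contents are never read.
    """
    return os.path.getsize(path_a) == os.path.getsize(path_b)
```

Calling same_size("A", "B") for the files above would be expected to return True. A False result proves the files differ, while a True result only means that a content check is still required.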

Method 2: Hash Comparison

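Hash comparison is a popular and fast method to check whether two files have the same content. A hash function reads a file and generates a fixed-size string, known as a hash value, that represents the content of the file. If two files have different hash values, they are certainly different. If they have the same hash value, it is almost certain that they have the same content, because accidental collisions are extremely unlikely. Various hash functions are available, such as MD5, SHA-1, and SHA-256. MD5 and SHA-1 have known collision attacks, so SHA-256 is the safer choice when the files could have been crafted deliberately.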

[Figure: Hash comparison process. The MD5 hashes of File A and File B are both 4e7a8b6413e949896bbbfb3eaa3d3c8f.]

Example

We can check hash values using the md5sum command in Linux or the certutil -hashfile command in Windows (for example, certutil -hashfile A MD5).

md5sum A B
4e7a8b6413e949896bbbfb3eaa3d3c8f  A
4e7a8b6413e949896bbbfb3eaa3d3c8f  B

In this example, files A and B have the same hash value, indicating that they almost certainly have the same content.
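A hash comparison can also be done from Python with the standard hashlib module. The sketch below is one possible approach, not a definitive implementation: the helper names file_digest and same_hash, the SHA-256 default, and the 64 KiB chunk size are all assumptions chosen for illustration. Reading in chunks keeps memory usage flat regardless of file size.

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def same_hash(path_a, path_b, algorithm="sha256"):
    """Return True if both files produce the same digest."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

Note that each file is read in full to compute its digest, so hashing pays off most when a digest is stored and reused, for example when one file is compared against many others.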

Method 3: Binary Comparison

Binary comparison is a straightforward and reliable method to check whether two files have the same content. It compares the binary representation of the files byte by byte, and if any byte differs, the files are considered different. The comparison can stop at the first mismatch, so only identical files have to be read to the end. This can be time-consuming for large files, but it is the only method here that gives an exact answer with no possibility of a false match.

Example

We can use the cmp command in Linux or the fc command in Windows (with the /b switch for a binary comparison) to perform a binary comparison.

cmp A B
(no output - files are identical)

If the files are different, cmp reports the position of the first differing byte.
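In Python, the standard filecmp module offers a comparable check. This is a minimal sketch; the wrapper name same_bytes is hypothetical. Passing shallow=False matters here: with the default shallow=True, filecmp.cmp treats files whose os.stat() signatures (type, size, modification time) match as equal without reading them.

```python
import filecmp

def same_bytes(path_a, path_b):
    """Return True only if the two files have identical contents."""
    # shallow=False makes filecmp compare the file contents rather than
    # accepting a matching os.stat() signature as proof of equality.
    return filecmp.cmp(path_a, path_b, shallow=False)
```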

Advanced Methods

Memory-mapped File Comparison

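Memory-mapped file comparison maps the file contents into the process's address space and compares them byte by byte. It can be faster than explicit read calls because the operating system pages the data in on demand, but it uses address space and page-cache memory in proportion to the file size. Note that an empty file cannot be mapped, so file sizes should be checked first.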

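import mmap
import os

STEP = 1024 * 1024  # compare 1 MiB slices at a time
# mmap cannot map an empty file, so handle size mismatches and empty files first.
size_a, size_b = os.path.getsize("A"), os.path.getsize("B")
if size_a != size_b:
    print("The files are different.")
elif size_a == 0:
    print("The files are identical.")
else:
    with open("A", "rb") as file_a, open("B", "rb") as file_b:
        with mmap.mmap(file_a.fileno(), 0, access=mmap.ACCESS_READ) as mmap_a, \
             mmap.mmap(file_b.fileno(), 0, access=mmap.ACCESS_READ) as mmap_b:
            # mmap objects compare by identity, so compare their contents slice by slice.
            identical = all(mmap_a[i:i + STEP] == mmap_b[i:i + STEP]
                            for i in range(0, size_a, STEP))
    print("The files are identical." if identical else "The files are different.")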

Chunked Reading Comparison

For very large files, reading in chunks can be more memory-efficient than loading entire files.

def compare_files_chunked(file1, file2, chunk_size=8192):
    with open(file1, "rb") as f1, open(file2, "rb") as f2:
        while True:
            chunk1 = f1.read(chunk_size)
            chunk2 = f2.read(chunk_size)
            if chunk1 != chunk2:
                return False
            if not chunk1:  # End of both files
                return True

# Usage
if compare_files_chunked("A", "B"):
    print("Files are identical")
else:
    print("Files are different")

Performance Comparison

| Method | Speed | Accuracy | Memory Usage | Best For |
|---|---|---|---|---|
| File Size | Very Fast | Low | Very Low | Quick initial check |
| Hash (MD5/SHA) | Fast | Very High | Low | Most cases |
| Binary Comparison | Medium | Perfect | Low | Small to medium files |
| Memory-mapped | Fast | Perfect | High | Large files with enough RAM |
| Chunked Reading | Medium | Perfect | Very Low | Very large files |

Conclusion

Hash comparison using MD5 or SHA algorithms provides the best balance of speed, accuracy, and resource usage for most file comparison scenarios. For an optimal approach, start with file size comparison as a quick filter, then use hash comparison for reliable results. Binary comparison should be reserved for cases requiring absolute certainty.
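The two-stage approach described above might be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name files_match, the use of SHA-256, and the 64 KiB chunk size are choices made here and do not come from any particular tool.

```python
import hashlib
import os

def files_match(path_a, path_b, chunk_size=65536):
    """Size check as a quick filter, then a SHA-256 comparison."""
    # Stage 1: files of different sizes cannot have the same contents.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False

    # Stage 2: hash each file in chunks and compare the digests.
    digests = []
    for path in (path_a, path_b):
        hasher = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                hasher.update(chunk)
        digests.append(hasher.digest())
    return digests[0] == digests[1]
```

The size filter rejects most differing files without reading any data, so the more expensive hashing step runs only on files that could plausibly be identical.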

Updated on: 2026-03-17T09:01:38+05:30
