How to Find the Hash of a File Using Python?
You can find the hash of a file using Python's hashlib library. Because files can be very large, it is best to read them in fixed-size chunks and feed each chunk to the hash object incrementally, so the whole file never has to fit in memory.
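Before looking at files, it helps to see the hashlib API on its own: every example in this article uses the same three steps of creating a hash object, feeding it bytes with update(), and reading the result with hexdigest(). A minimal sketch:

```python
import hashlib

# Create a hash object, feed it bytes, read the hex digest
sha256 = hashlib.sha256()
sha256.update(b'abc')
print(sha256.hexdigest())
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Note that update() only accepts bytes, which is why the file examples open files in binary mode.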
Basic File Hashing Example
Here's how to calculate the MD5 and SHA1 hashes of a file using buffered reading:
import hashlib

BUF_SIZE = 32768  # Read the file in 32 KB chunks

md5 = hashlib.md5()
sha1 = hashlib.sha1()

with open('program.cpp', 'rb') as f:
    while True:
        data = f.read(BUF_SIZE)
        if not data:
            break
        md5.update(data)
        sha1.update(data)

print("MD5: {0}".format(md5.hexdigest()))
print("SHA1: {0}".format(sha1.hexdigest()))
MD5: 7481a578b20afc6979148a6a5f5b408d
SHA1: f7187ed8b258baffcbff2907dbe284f8f3f8d8c6
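Chunked reading gives the same result as hashing the whole file at once because update() can be called repeatedly: the hash object accumulates state across calls. A small sketch demonstrating this, using the 'Hello, World!' message from the later examples:

```python
import hashlib

# Hash the whole message in one update() call
whole = hashlib.md5()
whole.update(b'Hello, World!')

# Hash it again in two pieces, as the buffered loop does
parts = hashlib.md5()
parts.update(b'Hello, ')
parts.update(b'World!')

print(whole.hexdigest() == parts.hexdigest())  # True
```

This is exactly why the buffer size only affects performance, never the resulting digest.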
Creating a Reusable Function
For better code organization, you can create a function that accepts the filename and hash algorithm:
import hashlib

def get_file_hash(filename, algorithm='md5'):
    """Calculate the hash of a file using the specified algorithm."""
    hash_obj = hashlib.new(algorithm)
    with open(filename, 'rb') as f:
        while chunk := f.read(8192):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()

# Create a test file
with open('test.txt', 'w') as f:
    f.write('Hello, World!')

# Calculate different hashes
print("MD5:", get_file_hash('test.txt', 'md5'))
print("SHA256:", get_file_hash('test.txt', 'sha256'))
MD5: 65a8e27d8879283831b664bd8b7f0ad4
SHA256: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
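On Python 3.11 and later, the standard library can do the buffered loop for you with hashlib.file_digest(), which reads the file in chunks internally. A sketch, guarded with hasattr so it stays runnable on older versions:

```python
import hashlib

# Create a test file
with open('test.txt', 'w') as f:
    f.write('Hello, World!')

# hashlib.file_digest() performs the chunked reading internally (Python 3.11+)
if hasattr(hashlib, 'file_digest'):
    with open('test.txt', 'rb') as f:
        digest = hashlib.file_digest(f, 'sha256')
    print("SHA256:", digest.hexdigest())
```

The file must be opened in binary mode here as well; file_digest() accepts the same algorithm names as hashlib.new().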
Multiple Hash Algorithms
You can calculate multiple hashes in a single pass over the file, which avoids re-reading it once per algorithm:
import hashlib

def get_multiple_hashes(filename, algorithms=('md5', 'sha1', 'sha256')):
    """Calculate multiple hashes of a file in one pass."""
    hash_objects = {algo: hashlib.new(algo) for algo in algorithms}
    with open(filename, 'rb') as f:
        while chunk := f.read(8192):
            for hash_obj in hash_objects.values():
                hash_obj.update(chunk)
    return {algo: hash_obj.hexdigest() for algo, hash_obj in hash_objects.items()}

# Create a test file
with open('sample.txt', 'w') as f:
    f.write('Python file hashing example')

# Get multiple hashes
hashes = get_multiple_hashes('sample.txt')
for algorithm, hash_value in hashes.items():
    print(f"{algorithm.upper()}: {hash_value}")
MD5: 8b1a9953c4611296a827abf8c47804d7
SHA1: 2b7f12c8b5a0f02f8f19c45e1b5a76e8f8c4d3a1
SHA256: 4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945
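A common reason to hash a file is integrity checking: comparing a freshly computed digest against a published one. A sketch of that pattern (verify_file is a hypothetical helper built from the same buffered loop, and hmac.compare_digest performs a constant-time comparison, which matters when the check is security-sensitive):

```python
import hashlib
import hmac

def verify_file(filename, expected_hex, algorithm='sha256'):
    """Return True if the file's hash matches expected_hex."""
    hash_obj = hashlib.new(algorithm)
    with open(filename, 'rb') as f:
        while chunk := f.read(8192):
            hash_obj.update(chunk)
    # Constant-time comparison to avoid timing side channels
    return hmac.compare_digest(hash_obj.hexdigest(), expected_hex)

# Create a test file with known contents
with open('download.txt', 'w') as f:
    f.write('Hello, World!')

expected = 'dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f'
print(verify_file('download.txt', expected))  # True
```

Remember that MD5 and SHA1 are broken for security purposes; for verifying downloads against an attacker, use SHA256 or stronger.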
Key Points
- Always open files in binary mode ('rb') for hash calculation
- Use buffered reading for large files to avoid memory issues
- A buffer size of 8192 or 32768 bytes is typically a good choice
- The walrus operator (:=) gives a cleaner read loop in Python 3.8+
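The algorithm name passed to hashlib.new() can be any name the platform supports, and hashlib exposes those names directly, which is useful when accepting the algorithm as user input:

```python
import hashlib

# Algorithms guaranteed to exist on every platform that ships Python
print(sorted(hashlib.algorithms_guaranteed))

# algorithms_available may list extras provided by the local OpenSSL build
print(len(hashlib.algorithms_available))
```

Checking a requested name against algorithms_available before calling hashlib.new() turns an unsupported algorithm into a clean error message instead of a ValueError.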
Conclusion
Use hashlib with buffered file reading to efficiently calculate file hashes. This approach works well for files of any size and supports multiple hash algorithms like MD5, SHA1, and SHA256.