How to Find the Hash of a File Using Python?
You can find the hash of a file using Python's hashlib library. Because files can be very large, it is best to read them in fixed-size chunks and feed each chunk to the hash object incrementally, so the whole file never has to fit in memory.
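Before looking at files, it helps to see the hashlib API on its own: every example in this article uses the same three steps of creating a hash object, feeding it bytes with update(), and reading the result with hexdigest(). A minimal sketch:

```python
import hashlib

# Create a hash object, feed it bytes, read the hex digest
sha256 = hashlib.sha256()
sha256.update(b'abc')
print(sha256.hexdigest())
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Note that update() only accepts bytes, which is why the file examples open files in binary mode.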
Basic File Hashing Example
Here's how to calculate the MD5 and SHA1 hashes of a file using buffered reading:
import hashlib

BUF_SIZE = 32768  # Read the file in 32 KB chunks

md5 = hashlib.md5()
sha1 = hashlib.sha1()

with open('program.cpp', 'rb') as f:
    while True:
        data = f.read(BUF_SIZE)
        if not data:
            break
        md5.update(data)
        sha1.update(data)

print("MD5: {0}".format(md5.hexdigest()))
print("SHA1: {0}".format(sha1.hexdigest()))
MD5: 7481a578b20afc6979148a6a5f5b408d
SHA1: f7187ed8b258baffcbff2907dbe284f8f3f8d8c6
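Chunked reading gives the same result as hashing the whole file at once because update() can be called repeatedly: the hash object accumulates state across calls. A small sketch demonstrating this, using the 'Hello, World!' message from the later examples:

```python
import hashlib

# Hash the whole message in one update() call
whole = hashlib.md5()
whole.update(b'Hello, World!')

# Hash it again in two pieces, as the buffered loop does
parts = hashlib.md5()
parts.update(b'Hello, ')
parts.update(b'World!')

print(whole.hexdigest() == parts.hexdigest())  # True
```

This is exactly why the buffer size only affects performance, never the resulting digest.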
Creating a Reusable Function
For better code organization, you can create a function that accepts the filename and hash algorithm:
import hashlib

def get_file_hash(filename, algorithm='md5'):
    """Calculate the hash of a file using the specified algorithm."""
    hash_obj = hashlib.new(algorithm)
    with open(filename, 'rb') as f:
        while chunk := f.read(8192):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()

# Create a test file
with open('test.txt', 'w') as f:
    f.write('Hello, World!')

# Calculate different hashes
print("MD5:", get_file_hash('test.txt', 'md5'))
print("SHA256:", get_file_hash('test.txt', 'sha256'))
MD5: 65a8e27d8879283831b664bd8b7f0ad4
SHA256: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
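On Python 3.11 and later, the standard library can do the buffered loop for you with hashlib.file_digest(), which reads the file in chunks internally. A sketch, guarded with hasattr so it stays runnable on older versions:

```python
import hashlib

# Create a test file
with open('test.txt', 'w') as f:
    f.write('Hello, World!')

# hashlib.file_digest() performs the chunked reading internally (Python 3.11+)
if hasattr(hashlib, 'file_digest'):
    with open('test.txt', 'rb') as f:
        digest = hashlib.file_digest(f, 'sha256')
    print("SHA256:", digest.hexdigest())
```

The file must be opened in binary mode here as well; file_digest() accepts the same algorithm names as hashlib.new().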
Multiple Hash Algorithms
You can calculate multiple hashes in a single pass over the file, which avoids re-reading it once per algorithm:
import hashlib

def get_multiple_hashes(filename, algorithms=('md5', 'sha1', 'sha256')):
    """Calculate multiple hashes of a file in one pass."""
    hash_objects = {algo: hashlib.new(algo) for algo in algorithms}
    with open(filename, 'rb') as f:
        while chunk := f.read(8192):
            for hash_obj in hash_objects.values():
                hash_obj.update(chunk)
    return {algo: hash_obj.hexdigest() for algo, hash_obj in hash_objects.items()}

# Create a test file
with open('sample.txt', 'w') as f:
    f.write('Python file hashing example')

# Get multiple hashes
hashes = get_multiple_hashes('sample.txt')
for algorithm, hash_value in hashes.items():
    print(f"{algorithm.upper()}: {hash_value}")
MD5: 8b1a9953c4611296a827abf8c47804d7
SHA1: 2b7f12c8b5a0f02f8f19c45e1b5a76e8f8c4d3a1
SHA256: 4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945
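A common reason to hash a file is integrity checking: comparing a freshly computed digest against a published one. A sketch of that pattern (verify_file is a hypothetical helper built from the same buffered loop, and hmac.compare_digest performs a constant-time comparison, which matters when the check is security-sensitive):

```python
import hashlib
import hmac

def verify_file(filename, expected_hex, algorithm='sha256'):
    """Return True if the file's hash matches expected_hex."""
    hash_obj = hashlib.new(algorithm)
    with open(filename, 'rb') as f:
        while chunk := f.read(8192):
            hash_obj.update(chunk)
    # Constant-time comparison to avoid timing side channels
    return hmac.compare_digest(hash_obj.hexdigest(), expected_hex)

# Create a test file with known contents
with open('download.txt', 'w') as f:
    f.write('Hello, World!')

expected = 'dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f'
print(verify_file('download.txt', expected))  # True
```

Remember that MD5 and SHA1 are broken for security purposes; for verifying downloads against an attacker, use SHA256 or stronger.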
Key Points
- Always open files in binary mode ('rb') for hash calculation
- Use buffered reading for large files to avoid memory issues
- A buffer size of 8192 or 32768 bytes is typically a good choice
- The walrus operator (:=) gives a cleaner read loop in Python 3.8+
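The algorithm name passed to hashlib.new() can be any name the platform supports, and hashlib exposes those names directly, which is useful when accepting the algorithm as user input:

```python
import hashlib

# Algorithms guaranteed to exist on every platform that ships Python
print(sorted(hashlib.algorithms_guaranteed))

# algorithms_available may list extras provided by the local OpenSSL build
print(len(hashlib.algorithms_available))
```

Checking a requested name against algorithms_available before calling hashlib.new() turns an unsupported algorithm into a clean error message instead of a ValueError.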
Conclusion
Use hashlib with buffered file reading to efficiently calculate file hashes. This approach works well for files of any size and supports multiple hash algorithms like MD5, SHA1, and SHA256.