How to compare two different files line by line in Python?
Comparing two files line by line is a common task in Python programming. This tutorial explores different methods to compare files, from basic line-by-line comparison to using specialized modules like filecmp and difflib.
Basic Line-by-Line Comparison
The simplest approach uses the open() function to read both files and compare them manually. This method gives you full control over the comparison logic.
Example
Here's how to compare two files and identify differences:
```python
# Create sample files for demonstration
with open('file1.txt', 'w') as f1:
    f1.write("Line 1\nLine 2\nLine 3\nLine 4")
with open('file2.txt', 'w') as f2:
    f2.write("Line 1\nDifferent Line 2\nLine 3\nLine 5")

# Compare files line by line
with open('file1.txt', 'r') as file1, open('file2.txt', 'r') as file2:
    lines1 = file1.readlines()
    lines2 = file2.readlines()

max_lines = max(len(lines1), len(lines2))
for i in range(max_lines):
    line1 = lines1[i].strip() if i < len(lines1) else ""
    line2 = lines2[i].strip() if i < len(lines2) else ""
    if line1 != line2:
        print(f"Line {i+1} doesn't match:")
        print(f"File1: {line1}")
        print(f"File2: {line2}")
        print("-" * 30)
```

Output

```
Line 2 doesn't match:
File1: Line 2
File2: Different Line 2
------------------------------
Line 4 doesn't match:
File1: Line 4
File2: Line 5
------------------------------
```
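Calling readlines() loads both files fully into memory. For large files, a sketch like the one below streams them instead, pairing lines with itertools.zip_longest and padding the shorter file with an empty string. The helper name diff_lines and the file names a.txt and b.txt are illustrative assumptions, not part of the original example.

```python
from itertools import zip_longest


def diff_lines(path1, path2):
    """Yield (line_number, line1, line2) for every line that differs.

    Hypothetical helper: both files are read lazily, one line at a time,
    so memory use does not grow with file size.
    """
    with open(path1) as f1, open(path2) as f2:
        # zip_longest pads the shorter file with "" so extra lines are reported
        pairs = zip_longest(f1, f2, fillvalue="")
        for number, (line1, line2) in enumerate(pairs, start=1):
            # Drop only the newline so other whitespace changes still count
            line1, line2 = line1.rstrip("\n"), line2.rstrip("\n")
            if line1 != line2:
                yield number, line1, line2


# Sample files (hypothetical names)
with open("a.txt", "w") as f:
    f.write("Line 1\nLine 2\nLine 3\nLine 4")
with open("b.txt", "w") as f:
    f.write("Line 1\nDifferent Line 2\nLine 3\nLine 5")

for number, left, right in diff_lines("a.txt", "b.txt"):
    print(f"Line {number}: {left!r} != {right!r}")
```

Unlike strip(), rstrip("\n") keeps leading and trailing spaces, so a line that differs only in indentation is reported as a mismatch. Switch back to strip() if whitespace should be ignored.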
Using the filecmp Module
The filecmp module provides a quick way to check whether two files are identical. The filecmp.cmp() function returns True if the files compare equal and False otherwise. It only answers whether the files differ, not which lines differ.
Example
Here's how to use filecmp for file comparison:
```python
import filecmp

# Create test files
with open('identical1.txt', 'w') as f:
    f.write("Same content\nSecond line")
with open('identical2.txt', 'w') as f:
    f.write("Same content\nSecond line")
with open('different.txt', 'w') as f:
    f.write("Different content\nSecond line")

def compare_files(file1_path, file2_path):
    result = filecmp.cmp(file1_path, file2_path)
    if result:
        print(f"{file1_path} and {file2_path} are identical.")
    else:
        print(f"{file1_path} and {file2_path} are different.")

# Test comparisons
compare_files('identical1.txt', 'identical2.txt')
compare_files('identical1.txt', 'different.txt')
```

Output

```
identical1.txt and identical2.txt are identical.
identical1.txt and different.txt are different.
```
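By default filecmp.cmp() runs with shallow=True: if the os.stat() signatures of the two files (file type, size and modification time) match, the files are reported as equal without their contents being read. Two files of the same size written within the same timestamp tick could therefore be treated as equal on a filesystem with coarse timestamps. Passing shallow=False forces a byte-by-byte comparison. A minimal sketch, with hypothetical file names:

```python
import filecmp

# Hypothetical sample files
with open("copy1.txt", "w") as f:
    f.write("Same content\nSecond line")
with open("copy2.txt", "w") as f:
    f.write("Same content\nSecond line")
with open("other.txt", "w") as f:
    f.write("Same content\nSecond LINE")  # same size, different bytes

# shallow=False makes filecmp read and compare the actual contents
print(filecmp.cmp("copy1.txt", "copy2.txt", shallow=False))  # True
print(filecmp.cmp("copy1.txt", "other.txt", shallow=False))  # False
```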
Using the difflib Module
The difflib module offers richer text comparison, reporting which lines were added, removed, or changed between two files.
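The unified_diff() function and the Differ class covered in this section both build on difflib.SequenceMatcher, which can also be used directly. The sketch below assumes the file contents are already available as lists of lines: ratio() returns a similarity score between 0 and 1, and get_opcodes() lists the edit operations that turn the first list into the second.

```python
from difflib import SequenceMatcher

# Stand-ins for the lines of two files
lines1 = ["import os", "import sys", "print('Hello World')"]
lines2 = ["import os", "import datetime", "print('Hello Python')"]

matcher = SequenceMatcher(None, lines1, lines2)

# ratio() = 2 * matching elements / total elements; 1.0 means identical
print(f"Similarity: {matcher.ratio():.2f}")

# Each opcode describes how lines1[i1:i2] becomes lines2[j1:j2]
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(tag, lines1[i1:i2], "->", lines2[j1:j2])
```

Here only "import os" is shared, so 1 matching line out of 6 in total gives a ratio of 2 × 1 / 6 ≈ 0.33.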
Using unified_diff()
The unified_diff() function produces output in the unified format used by the Unix diff -u command:
```python
import difflib

# Create sample files
with open('original.txt', 'w') as f:
    f.write("import os\nimport sys\nprint('Hello World')")
with open('modified.txt', 'w') as f:
    f.write("import os\nimport datetime\nprint('Hello Python')")

with open('original.txt', 'r') as file1, open('modified.txt', 'r') as file2:
    # splitlines() drops the newlines, which matches lineterm='' below
    # and avoids printing a blank line after every diff line
    file1_lines = file1.read().splitlines()
    file2_lines = file2.read().splitlines()

diff = difflib.unified_diff(
    file1_lines, file2_lines,
    fromfile='original.txt',
    tofile='modified.txt',
    lineterm=''
)
for line in diff:
    print(line)
```

Output

```
--- original.txt
+++ modified.txt
@@ -1,3 +1,3 @@
 import os
-import sys
-print('Hello World')
+import datetime
+print('Hello Python')
```
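By default unified_diff() shows three lines of unchanged context around each change. Its n parameter controls this; with n=0 only the changed lines appear, which can be handy for compact logs. A sketch using in-memory lists, where the labels a.py and b.py are placeholders:

```python
import difflib

old = ["import os", "import sys", "print('Hello World')"]
new = ["import os", "import datetime", "print('Hello Python')"]

# n=0 drops unchanged context lines; lineterm='' because the inputs
# carry no trailing newlines
diff = list(difflib.unified_diff(old, new, fromfile="a.py", tofile="b.py",
                                 n=0, lineterm=""))
print("\n".join(diff))
```

The hunk header then reads @@ -2,2 +2,2 @@, since the change starts at line 2 and spans two lines in each file.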
Using Differ Class
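The Differ class produces a more detailed, human-readable line-by-line comparison. It prefixes each line with '- ' (only in the first file), '+ ' (only in the second file), or two spaces (present in both):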
```python
from difflib import Differ

# Create test files
with open('text1.txt', 'w') as f:
    f.write("Python programming\nData analysis\nMachine learning")
with open('text2.txt', 'w') as f:
    f.write("Python programming\nWeb development\nMachine learning")

with open('text1.txt', 'r') as file1, open('text2.txt', 'r') as file2:
    differ = Differ()
    lines1 = file1.readlines()
    lines2 = file2.readlines()
    for line in differ.compare(lines1, lines2):
        if line.startswith('- ') or line.startswith('+ '):
            print(line.strip())
```

Output

```
- Data analysis
+ Web development
```
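The filter in the Differ example keeps only '- ' and '+ ' lines, so it hides the '? ' lines that Differ emits when two lines are similar but not identical. Those hint lines point at intraline changes: '^' marks a changed character, '-' a removed one and '+' an added one. A small sketch with hypothetical near-identical lines:

```python
from difflib import Differ

old = ["Data analysis\n"]
new = ["Data analytics\n"]

# Print every line, including the '? ' intraline hints
result = list(Differ().compare(old, new))
for line in result:
    print(line, end="")
```

Whether a '? ' line appears depends on how similar the two lines are; for completely different lines, as in the Differ example, Differ emits only the '- ' and '+ ' lines.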
Comparison of Methods
| Method | Best For | Output Detail | Performance |
|---|---|---|---|
| Manual comparison | Custom logic | Custom format | Good |
| filecmp.cmp() | Quick identity check | Boolean result | Excellent |
| difflib.unified_diff() | Unix-style diffs | Unified format | Good |
| difflib.Differ | Detailed analysis | Line-by-line details | Moderate |
Conclusion
Python offers multiple approaches for file comparison. Use filecmp for quick identity checks, difflib for detailed difference analysis, and manual comparison for custom requirements. Choose the method based on your specific needs and performance requirements.
