Python Program to Find Unique Lines From Two Text Files
Finding unique lines between two text files is a common task in data processing and file comparison. Python provides several approaches to accomplish this, from basic iteration to using specialized libraries. In this article, we'll explore three different methods to identify lines that exist in one file but not the other.
Sample Data
For our examples, we'll work with two text files containing course names. The table below shows which file each line appears in:
| Course Name | In a.txt | In b.txt |
|---|---|---|
| Introduction to Computers | Yes | Yes |
| Introduction to Programming Concepts | Yes | Yes |
| Introduction to Windows, its Features, Application | Yes | Yes |
| C++ Programming | No | Yes |
| Computer Organization Principles | Yes | Yes |
| Database Management Systems | Yes | Yes |
| Introduction to Embedded Systems | Yes | Yes |
| Fundamentals of PHP | Yes | Yes |
| Mathematical Foundation For Computer Science | Yes | No |
| Java Programming | Yes | Yes |
| Functions | Yes | Yes |
| Arrays | Yes | Yes |
| Disk Operating System | Yes | Yes |
| Introduction to Number system and codes | No | Yes |
| Data Mining | Yes | Yes |
| Software Engineering | Yes | No |
| Computer Networks | Yes | Yes |
| Control Structures | Yes | Yes |
Using Basic File Iteration
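This approach reads both files into lists and checks each line of one file against the other. The example first writes abbreviated versions of a.txt and b.txt so that it can run on its own: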
# Create sample files for demonstration
with open('a.txt', 'w') as f:
    f.write("Introduction to Computers\n")
    f.write("Mathematical Foundation For Computer Science\n")
    f.write("Software Engineering\n")
    f.write("Java Programming\n")

with open('b.txt', 'w') as f:
    f.write("Introduction to Computers\n")
    f.write("C++ Programming\n")
    f.write("Introduction to Number system and codes\n")
    f.write("Java Programming\n")

# Read both files into lists of lines (each line keeps its trailing newline)
with open('a.txt', 'r') as af:
    afile = af.readlines()
with open('b.txt', 'r') as bf:
    bfile = bf.readlines()

unique_lines = []

# Lines in b.txt but not in a.txt
for line in bfile:
    if line not in afile:
        unique_lines.append(line)

# Lines in a.txt but not in b.txt
for line in afile:
    if line not in bfile:
        unique_lines.append(line)

# Write results
with open('result1.txt', 'w') as result_file:
    for line in unique_lines:
        result_file.write(line)

# Display results
print("Unique lines found:")
for line in unique_lines:
    print(line.strip())
Unique lines found:
C++ Programming
Introduction to Number system and codes
Mathematical Foundation For Computer Science
Software Engineering
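One detail worth illustrating: because readlines() keeps the trailing newline, a final line that lacks one will not match the same text followed by a newline in the other file. The sketch below is one possible way to guard against that by stripping newlines before comparing. The helper names read_lines and unique_lines_by_iteration are hypothetical, and the demo files are written to a temporary directory purely for illustration.

```python
import os
import tempfile


def read_lines(path):
    """Return the file's lines with trailing newline characters removed."""
    with open(path, 'r') as f:
        return [line.rstrip('\n') for line in f]


def unique_lines_by_iteration(path_a, path_b):
    """Return lines found in only one file: b-only lines first, then a-only."""
    a_lines = read_lines(path_a)
    b_lines = read_lines(path_b)
    only_b = [line for line in b_lines if line not in a_lines]
    only_a = [line for line in a_lines if line not in b_lines]
    return only_b + only_a


# Demo: the last line of the second file deliberately has no trailing newline
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, 'a.txt')
    path_b = os.path.join(tmp, 'b.txt')
    with open(path_a, 'w') as f:
        f.write("Introduction to Computers\nSoftware Engineering\nJava Programming\n")
    with open(path_b, 'w') as f:
        f.write("Introduction to Computers\nC++ Programming\nJava Programming")
    demo_result = unique_lines_by_iteration(path_a, path_b)

print(demo_result)
```

Without the rstrip step, "Java Programming" would be reported as unique in both files, since one copy ends with a newline and the other does not.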
Using difflib Library
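The standard library's difflib module compares sequences and marks each line as shared, removed, or added. Its Differ class prefixes lines found only in the first sequence with "- " and lines found only in the second with "+ ":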
from difflib import Differ

# Read files
with open('a.txt', 'r') as af:
    afile = af.readlines()
with open('b.txt', 'r') as bf:
    bfile = bf.readlines()

# Compare files using Differ
differ = Differ()
result = list(differ.compare(afile, bfile))

# Write comparison result
with open('result2.txt', 'w') as result_file:
    for line in result:
        result_file.write(line)

# Display formatted output
print("File comparison (- means only in a.txt, + means only in b.txt):")
for line in result:
    if line.startswith('- ') or line.startswith('+ '):
        print(line.strip())
File comparison (- means only in a.txt, + means only in b.txt):
- Mathematical Foundation For Computer Science
- Software Engineering
+ C++ Programming
+ Introduction to Number system and codes
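If the goal is two separate lists rather than a printed diff, the prefixes that Differ adds can be used to sort the lines into buckets. The following is a minimal sketch under that assumption; split_diff is a hypothetical helper name, and the input lists are written inline instead of being read from files.

```python
from difflib import Differ


def split_diff(a_lines, b_lines):
    """Split Differ output into lines only in a and lines only in b."""
    only_a, only_b = [], []
    for entry in Differ().compare(a_lines, b_lines):
        prefix, text = entry[:2], entry[2:]
        if prefix == '- ':
            only_a.append(text)
        elif prefix == '+ ':
            only_b.append(text)
        # Entries starting with '  ' (shared) or '? ' (hints) are ignored
    return only_a, only_b


a_lines = [
    "Introduction to Computers",
    "Mathematical Foundation For Computer Science",
    "Software Engineering",
    "Java Programming",
]
b_lines = [
    "Introduction to Computers",
    "C++ Programming",
    "Introduction to Number system and codes",
    "Java Programming",
]

only_a, only_b = split_diff(a_lines, b_lines)
print("Only in a:", only_a)
print("Only in b:", only_b)
```

Keep in mind that Differ is sensitive to position: a line that appears in both files but at different places may be reported as both removed and added, so this approach suits files whose lines are in roughly the same order.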
Using Set Operations
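This method loads each file into a Python set and uses set difference to keep only the lines that appear in one file. Set lookups take constant time on average, so this scales well, but sets discard duplicate lines and the original line order: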
# Read each file into a set of lines (splitlines drops the newlines)
with open('a.txt', 'r') as af:
    afile = set(af.read().splitlines())
with open('b.txt', 'r') as bf:
    bfile = set(bf.read().splitlines())

# Find lines unique to each file
unique_a = afile - bfile  # Lines only in a.txt
unique_b = bfile - afile  # Lines only in b.txt

# Combine all unique lines
all_unique = unique_a.union(unique_b)

print("Lines only in a.txt:")
for line in sorted(unique_a):
    print(f" {line}")

print("\nLines only in b.txt:")
for line in sorted(unique_b):
    print(f" {line}")

# Write results to file
with open('result3.txt', 'w') as result_file:
    for line in sorted(all_unique):
        result_file.write(line + '\n')
Lines only in a.txt:
 Mathematical Foundation For Computer Science
 Software Engineering

Lines only in b.txt:
 C++ Programming
 Introduction to Number system and codes
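Two variations on the set approach may be useful. The sketch below first shows that the combined result is the symmetric difference of the two sets, which Python exposes through the ^ operator, and then uses sets only for membership tests so the original line order is preserved. The function name unique_in_order is hypothetical, and the lines are given inline for brevity.

```python
def unique_in_order(a_lines, b_lines):
    """Return (only_a, only_b), keeping each file's original line order."""
    a_set, b_set = set(a_lines), set(b_lines)
    only_a = [line for line in a_lines if line not in b_set]
    only_b = [line for line in b_lines if line not in a_set]
    return only_a, only_b


a_lines = [
    "Introduction to Computers",
    "Software Engineering",
    "Mathematical Foundation For Computer Science",
    "Java Programming",
]
b_lines = [
    "Introduction to Computers",
    "Introduction to Number system and codes",
    "C++ Programming",
    "Java Programming",
]

# Symmetric difference: lines that appear in exactly one of the two sets
all_unique = set(a_lines) ^ set(b_lines)

only_a, only_b = unique_in_order(a_lines, b_lines)
print(only_a)
print(only_b)
```

Unlike sorted output, the lists returned here follow the order in which lines occur in each file, and a line repeated within one file is kept as many times as it appears.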
Comparison
| Method | Time Complexity (n lines per file) | Output Format | Best For |
|---|---|---|---|
| Basic Iteration | O(n²) | Simple list | Small files |
| difflib | At least O(n²) in the worst case | Detailed diff | Visual comparison |
| Set Operations | O(n) on average | Clean unique lines | Large files |
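To get a rough feel for the gap between list and set membership tests, the sketch below times both on synthetic data. The data size, the function names with_lists and with_sets, and the repeat count are arbitrary choices made for illustration, and the measured times will vary from machine to machine.

```python
import timeit

n = 1000
a_lines = [f"line {i}" for i in range(n)]
b_lines = [f"line {i}" for i in range(n // 2, n + n // 2)]


def with_lists():
    """Membership tests against lists: each lookup scans the whole list."""
    only_b = [x for x in b_lines if x not in a_lines]
    only_a = [x for x in a_lines if x not in b_lines]
    return only_b + only_a


def with_sets():
    """Membership tests against sets: each lookup is a hash probe."""
    a_set, b_set = set(a_lines), set(b_lines)
    only_b = [x for x in b_lines if x not in a_set]
    only_a = [x for x in a_lines if x not in b_set]
    return only_b + only_a


# Both strategies should agree on the result
same_result = with_lists() == with_sets()

list_time = timeit.timeit(with_lists, number=3)
set_time = timeit.timeit(with_sets, number=3)
print(f"lists: {list_time:.4f}s, sets: {set_time:.4f}s, same result: {same_result}")
```

On inputs of this size the set version is typically much faster, and the difference widens as the files grow.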
Conclusion
Use set operations for efficient processing of large files, keeping in mind that sets discard duplicate lines and line order. Choose difflib when you need detailed, position-aware comparison output. Basic iteration works well for simple cases with small files.
