Python Program to Find Unique Lines From Two Text Files

Finding unique lines between two text files is a common task in data processing and file comparison. Python provides several approaches to accomplish this, from basic iteration to using specialized libraries. In this article, we'll explore three different methods to identify lines that exist in one file but not the other.

Sample Data

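For our examples, we'll work with two text files containing course names. The table below compares their contents; the code examples create a shortened, four-line version of each file so the output stays brief: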

Course Name	In a.txt	In b.txt
Introduction to Computers	Yes	Yes
Introduction to Programming Concepts	Yes	Yes
Introduction to Windows, its Features, Application	Yes	Yes
C++ Programming	No	Yes
Computer Organization Principles	Yes	Yes
Database Management Systems	Yes	Yes
Introduction to Embedded Systems	Yes	Yes
Fundamentals of PHP	Yes	Yes
Mathematical Foundation For Computer Science	Yes	No
Java Programming	Yes	Yes
Functions	Yes	Yes
Arrays	Yes	Yes
Disk Operating System	Yes	Yes
Introduction to Number system and codes	No	Yes
Data Mining	Yes	Yes
Software Engineering	Yes	No
Computer Networks	Yes	Yes
Control Structures	Yes	Yes

Using Basic File Iteration

This approach reads both files into lists and checks each line of one file against every line of the other:

# Create sample files for demonstration
with open('a.txt', 'w') as f:
    f.write("Introduction to Computers\n")
    f.write("Mathematical Foundation For Computer Science\n") 
    f.write("Software Engineering\n")
    f.write("Java Programming\n")

with open('b.txt', 'w') as f:
    f.write("Introduction to Computers\n")
    f.write("C++ Programming\n")
    f.write("Introduction to Number system and codes\n")
    f.write("Java Programming\n")

# Find unique lines
with open('a.txt', 'r') as af:
    afile = af.readlines()

with open('b.txt', 'r') as bf:
    bfile = bf.readlines()

unique_lines = []

# Lines in b.txt but not in a.txt
for line in bfile:
    if line not in afile:
        unique_lines.append(line)

# Lines in a.txt but not in b.txt
for line in afile:
    if line not in bfile:
        unique_lines.append(line)

# Write results
with open('result1.txt', 'w') as result_file:
    result_file.writelines(unique_lines)

# Display results
print("Unique lines found:")
for line in unique_lines:
    print(line.strip())

Unique lines found:
C++ Programming
Introduction to Number system and codes
Mathematical Foundation For Computer Science
Software Engineering
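
The loop above compares raw lines, so a final line without a trailing newline would not match the same text in the other file. The following is a minimal sketch of one way to handle that while keeping the output order. The helper name unique_lines_in_order is hypothetical, and io.StringIO objects stand in for the opened files so the sketch runs without touching disk:

```python
import io


def unique_lines_in_order(first, second):
    """Return lines found in only one input, in the order they are read.

    `first` and `second` may be open file objects or lists of strings.
    (Hypothetical helper for illustration, not part of the original example.)
    """
    # Strip line endings so a missing final newline cannot cause a false mismatch
    first_lines = [line.rstrip('\n') for line in first]
    second_lines = [line.rstrip('\n') for line in second]

    # Sets are used only for membership tests; the lists keep the original order
    first_set = set(first_lines)
    second_set = set(second_lines)

    # Same order as the loop version: lines only in the second input come first
    result = [line for line in second_lines if line not in first_set]
    result += [line for line in first_lines if line not in second_set]
    return result


# Stand-ins for open('a.txt') and open('b.txt'); b has no final newline here
a_file = io.StringIO(
    "Introduction to Computers\n"
    "Mathematical Foundation For Computer Science\n"
    "Software Engineering\n"
    "Java Programming\n"
)
b_file = io.StringIO(
    "Introduction to Computers\n"
    "C++ Programming\n"
    "Introduction to Number system and codes\n"
    "Java Programming"
)

ordered_unique = unique_lines_in_order(a_file, b_file)
for line in ordered_unique:
    print(line)
```

Even though the last line of the second input lacks a newline, "Java Programming" is still treated as common to both inputs.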

Using difflib Library

The difflib module provides tools for comparing sequences. Its Differ class prefixes every line with a marker that shows which file the line came from:

from difflib import Differ

# Read files
with open('a.txt', 'r') as af:
    afile = af.readlines()

with open('b.txt', 'r') as bf:
    bfile = bf.readlines()

# Compare files using Differ
differ = Differ()
result = list(differ.compare(afile, bfile))

# Write comparison result
with open('result2.txt', 'w') as result_file:
    for line in result:
        result_file.write(line)

# Display formatted output
print("File comparison (- means only in a.txt, + means only in b.txt):")
for line in result:
    if line.startswith('- ') or line.startswith('+ '):
        print(line.strip())

File comparison (- means only in a.txt, + means only in b.txt):
- Mathematical Foundation For Computer Science
- Software Engineering
+ C++ Programming
+ Introduction to Number system and codes
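
If you need the two groups as separate lists rather than as prefixed text, the Differ output can be split on its two-character markers. This is a rough sketch under that assumption; the function name split_diff is hypothetical, and the line lists are typed in directly so the block runs on its own:

```python
from difflib import Differ


def split_diff(a_lines, b_lines):
    """Split Differ output into (only_in_a, only_in_b) lists.

    (Hypothetical helper for illustration.)
    """
    only_a, only_b = [], []
    for entry in Differ().compare(a_lines, b_lines):
        # Each entry starts with a two-character code followed by the line text
        if entry.startswith('- '):
            only_a.append(entry[2:].rstrip('\n'))
        elif entry.startswith('+ '):
            only_b.append(entry[2:].rstrip('\n'))
        # Entries starting with '  ' (common) or '? ' (intraline hints) are skipped
    return only_a, only_b


# Same content as the shortened a.txt and b.txt, as readlines() would return it
a_lines = [
    "Introduction to Computers\n",
    "Mathematical Foundation For Computer Science\n",
    "Software Engineering\n",
    "Java Programming\n",
]
b_lines = [
    "Introduction to Computers\n",
    "C++ Programming\n",
    "Introduction to Number system and codes\n",
    "Java Programming\n",
]

only_a, only_b = split_diff(a_lines, b_lines)
print("Only in a.txt:", only_a)
print("Only in b.txt:", only_b)
```

Keep in mind that Differ compares the files as sequences, so a line that appears in both files at different positions may be reported as both removed and added.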

Using Set Operations

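This method loads each file into a Python set and uses set difference to keep only the lines unique to each file. Sets discard duplicate lines and do not preserve file order, so the results are sorted before they are printed: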

# Read first file into a set
with open('a.txt', 'r') as af:
    afile = set(af.read().splitlines())

# Read second file into a set
with open('b.txt', 'r') as bf:
    bfile = set(bf.read().splitlines())

# Find lines unique to each file
unique_a = afile - bfile  # Lines only in a.txt
unique_b = bfile - afile  # Lines only in b.txt

# Combine all unique lines
all_unique = unique_a.union(unique_b)

print("Lines only in a.txt:")
for line in sorted(unique_a):
    print(f"  {line}")

print("\nLines only in b.txt:")  
for line in sorted(unique_b):
    print(f"  {line}")

# Write results to file
with open('result3.txt', 'w') as result_file:
    for line in sorted(all_unique):
        result_file.write(line + '\n')

Lines only in a.txt:
  Mathematical Foundation For Computer Science
  Software Engineering

Lines only in b.txt:
  C++ Programming
  Introduction to Number system and codes
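
When you only need the combined list and not the per-file breakdown, the two differences and the union can be collapsed into a single symmetric difference with the ^ operator. A small sketch, with the file contents written as strings (an assumption made so the block runs without a.txt and b.txt on disk):

```python
# Stand-ins for the text returned by af.read() and bf.read()
a_text = (
    "Introduction to Computers\n"
    "Mathematical Foundation For Computer Science\n"
    "Software Engineering\n"
    "Java Programming\n"
)
b_text = (
    "Introduction to Computers\n"
    "C++ Programming\n"
    "Introduction to Number system and codes\n"
    "Java Programming\n"
)

a_set = set(a_text.splitlines())
b_set = set(b_text.splitlines())

# Symmetric difference: lines that appear in exactly one of the two sets
in_exactly_one = a_set ^ b_set

# Equivalent to combining the two one-sided differences
two_step = (a_set - b_set) | (b_set - a_set)
print(in_exactly_one == two_step)

for line in sorted(in_exactly_one):
    print(line)
```

The comparison prints True, followed by the four unique lines in alphabetical order.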

Comparison

Method	Time Complexity	Output Format	Best For
Basic Iteration	O(n²)	List in file order	Small files
difflib	O(n²) worst case	Detailed diff	Visual comparison
Set Operations	O(n) average	Unordered unique lines	Large files
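
To get a feel for the difference between list and set membership tests, here is a rough timing sketch. The data is synthetic (two lists of 2,000 generated lines that overlap by half, an assumption made purely for illustration), and the measured times will vary from machine to machine:

```python
import timeit

# Synthetic data: lines 0-1999 in one list, lines 1000-2999 in the other
a_lines = [f"line {i}\n" for i in range(2000)]
b_lines = [f"line {i}\n" for i in range(1000, 3000)]


def with_lists():
    # Each `in` test scans the other list from the start
    only_b = [x for x in b_lines if x not in a_lines]
    only_a = [x for x in a_lines if x not in b_lines]
    return only_b + only_a


def with_sets():
    # Each set is built in one pass; membership tests then use hashing
    a_set, b_set = set(a_lines), set(b_lines)
    only_b = [x for x in b_lines if x not in a_set]
    only_a = [x for x in a_lines if x not in b_set]
    return only_b + only_a


list_seconds = timeit.timeit(with_lists, number=1)
set_seconds = timeit.timeit(with_sets, number=1)

print(f"list membership: {list_seconds:.4f} s")
print(f"set membership:  {set_seconds:.4f} s")
```

Both functions return the same lines; only the time spent on membership tests differs.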

Conclusion

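Use set operations for efficient processing of large files when line order and duplicate lines do not matter. Choose difflib when you need detailed, position-aware comparison output that also shows the common lines. Basic iteration works well for simple cases with small files and keeps lines in their original order.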

Saba Hilal

Updated on: 2026-03-27T07:19:48+05:30
