Python Program to Find Unique Lines From Two Text Files

Finding unique lines between two text files is a common task in data processing and file comparison. Python provides several approaches to accomplish this, from basic iteration to the standard library's difflib module and set operations. In this article, we'll explore these three methods to identify lines that exist in one file but not the other.

Sample Data

For our examples, we'll work with two text files, a.txt and b.txt, containing course names. The table below shows which course appears in which file:

Course Name | In a.txt | In b.txt
Introduction to Computers | Yes | Yes
Introduction to Programming Concepts | Yes | Yes
Introduction to Windows, its Features, Application | Yes | Yes
C++ Programming | No | Yes
Computer Organization Principles | Yes | Yes
Database Management Systems | Yes | Yes
Introduction to Embedded Systems | Yes | Yes
Fundamentals of PHP | Yes | Yes
Mathematical Foundation For Computer Science | Yes | No
Java Programming | Yes | Yes
Functions | Yes | Yes
Arrays | Yes | Yes
Disk Operating System | Yes | Yes
Introduction to Number system and codes | No | Yes
Data Mining | Yes | Yes
Software Engineering | Yes | No
Computer Networks | Yes | Yes
Control Structures | Yes | Yes

Using Basic File Iteration

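This approach reads both files into lists and checks each line of one file against the other with a simple loop. To keep the example short, the code first writes a four-line subset of the sample data to a.txt and b.txt: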

# Create sample files for demonstration
with open('a.txt', 'w') as f:
    f.write("Introduction to Computers\n")
    f.write("Mathematical Foundation For Computer Science\n") 
    f.write("Software Engineering\n")
    f.write("Java Programming\n")

with open('b.txt', 'w') as f:
    f.write("Introduction to Computers\n")
    f.write("C++ Programming\n")
    f.write("Introduction to Number system and codes\n")
    f.write("Java Programming\n")

# Read both files into lists of lines (newline characters removed,
# so a missing newline at the end of a file cannot cause a false mismatch)
with open('a.txt', 'r') as af:
    afile = af.read().splitlines()

with open('b.txt', 'r') as bf:
    bfile = bf.read().splitlines()

unique_lines = []

# Lines in b.txt but not in a.txt
for line in bfile:
    if line not in afile:
        unique_lines.append(line)

# Lines in a.txt but not in b.txt
for line in afile:
    if line not in bfile:
        unique_lines.append(line)

# Write results, one line per entry
with open('result1.txt', 'w') as result_file:
    for line in unique_lines:
        result_file.write(line + '\n')

# Display results
print("Unique lines found:")
for line in unique_lines:
    print(line)

Output:

Unique lines found:
C++ Programming
Introduction to Number system and codes
Mathematical Foundation For Computer Science
Software Engineering
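
The loop above rescans a whole list for every line it checks. The following is a minimal sketch, assuming both files fit in memory, that keeps the same output order while using sets only for the membership tests. The helper name `unique_lines_ordered` is hypothetical, and the in-memory lists stand in for the contents of a.txt and b.txt.

```python
def unique_lines_ordered(a_lines, b_lines):
    """Return lines found in exactly one input: b-only lines first, then a-only.

    Hypothetical helper that mirrors the loop order of the example above.
    """
    a_set = set(a_lines)  # sets give fast membership checks
    b_set = set(b_lines)
    only_in_b = [line for line in b_lines if line not in a_set]
    only_in_a = [line for line in a_lines if line not in b_set]
    return only_in_b + only_in_a


# Stand-ins for the lines of a.txt and b.txt (newlines already removed)
a_lines = [
    "Introduction to Computers",
    "Mathematical Foundation For Computer Science",
    "Software Engineering",
    "Java Programming",
]
b_lines = [
    "Introduction to Computers",
    "C++ Programming",
    "Introduction to Number system and codes",
    "Java Programming",
]

result = unique_lines_ordered(a_lines, b_lines)
for line in result:
    print(line)
```

Because the lists are still iterated in their original order, the result lists lines in file order, which a plain set difference would not guarantee.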

Using difflib Library

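The difflib module from the standard library provides tools for comparing sequences. Its Differ class produces a line-by-line diff in which each line is prefixed with '- ' (only in the first sequence), '+ ' (only in the second), or two spaces (present in both):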

from difflib import Differ

# Read files
with open('a.txt', 'r') as af:
    afile = af.readlines()

with open('b.txt', 'r') as bf:
    bfile = bf.readlines()

# Compare files using Differ
differ = Differ()
result = list(differ.compare(afile, bfile))

# Write comparison result
with open('result2.txt', 'w') as result_file:
    for line in result:
        result_file.write(line)

# Display only the lines that differ
print("File comparison (- means only in a.txt, + means only in b.txt):")
for line in result:
    if line.startswith(('- ', '+ ')):
        print(line.strip())

Output:

File comparison (- means only in a.txt, + means only in b.txt):
- Mathematical Foundation For Computer Science
- Software Engineering
+ C++ Programming
+ Introduction to Number system and codes
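
Differ output also contains the unchanged lines. When only the changes matter, `difflib.unified_diff` is one alternative. The following is a minimal sketch, assuming the file contents are already in memory as lists of lines; the lists stand in for a.txt and b.txt, and `n=0` (no context lines) is a choice made here for brevity.

```python
from difflib import unified_diff

# Stand-ins for the result of readlines() on a.txt and b.txt
a_lines = [
    "Introduction to Computers\n",
    "Mathematical Foundation For Computer Science\n",
    "Software Engineering\n",
    "Java Programming\n",
]
b_lines = [
    "Introduction to Computers\n",
    "C++ Programming\n",
    "Introduction to Number system and codes\n",
    "Java Programming\n",
]

# n=0 requests no surrounding context lines
diff = list(unified_diff(a_lines, b_lines, fromfile="a.txt", tofile="b.txt", n=0))

# Skip the two file-header lines, then keep only added/removed lines
# (hunk headers start with "@@" and are dropped by the filter).
changed = [
    line.rstrip("\n")
    for line in diff[2:]
    if line.startswith(("+", "-"))
]

for line in changed:
    print(line)
```

In this format a leading `-` marks a line present only in the first input and a leading `+` marks a line present only in the second.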

Using Set Operations

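This method loads each file into a Python set and uses set difference to drop the common lines and keep the unique ones. Sets ignore duplicate lines and do not preserve file order, so the results are sorted before printing: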

# Read each file into a set of lines (newline characters removed)
with open('a.txt', 'r') as af:
    afile = set(af.read().splitlines())

with open('b.txt', 'r') as bf:
    bfile = set(bf.read().splitlines())

# Find lines unique to each file
unique_a = afile - bfile  # Lines only in a.txt
unique_b = bfile - afile  # Lines only in b.txt

# Combine all unique lines
all_unique = unique_a.union(unique_b)

print("Lines only in a.txt:")
for line in sorted(unique_a):
    print(f"  {line}")

print("\nLines only in b.txt:")
for line in sorted(unique_b):
    print(f"  {line}")

# Write results to file
with open('result3.txt', 'w') as result_file:
    for line in sorted(all_unique):
        result_file.write(line + '\n')

Output:

Lines only in a.txt:
  Mathematical Foundation For Computer Science
  Software Engineering

Lines only in b.txt:
  C++ Programming
  Introduction to Number system and codes
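
Because sets discard duplicates, a line that appears twice in one file and once in the other counts as common. If repeat counts matter, one option is `collections.Counter`, whose subtraction keeps only positive counts. This is a minimal sketch with made-up sample lines (not the contents of a.txt and b.txt), shown only to illustrate the difference.

```python
from collections import Counter

# Hypothetical inputs: "Java Programming" appears twice in the first list
a_lines = ["Java Programming", "Java Programming", "Software Engineering"]
b_lines = ["Java Programming", "C++ Programming"]

# Counter subtraction drops entries whose count falls to zero or below
extra_in_a = Counter(a_lines) - Counter(b_lines)
extra_in_b = Counter(b_lines) - Counter(a_lines)

print("Surplus lines in the first list:")
for line in sorted(extra_in_a.elements()):
    print(f"  {line}")

print("Surplus lines in the second list:")
for line in sorted(extra_in_b.elements()):
    print(f"  {line}")

# For comparison, the plain set difference ignores the repeated line
set_only_in_a = set(a_lines) - set(b_lines)
```

Here the Counter version reports the extra copy of "Java Programming", while the set difference contains only "Software Engineering".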

Comparison

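Method | Time Complexity | Output Format | Best For
Basic Iteration | O(n × m) | Simple list in file order | Small files
difflib | Roughly O(n²) in the worst case | Detailed diff | Visual comparison
Set Operations | O(n + m) on average | Unique lines, unordered until sorted | Large files

Here n and m are the numbers of lines in a.txt and b.txt.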
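
The gap between list and set membership tests can be checked with a rough timing sketch. It assumes synthetic data (2,000 generated lines per input, half of them shared), and the function names are hypothetical. Actual timings depend on the machine, so none are shown.

```python
import timeit

# Synthetic inputs: 2,000 lines each, with 1,000 lines in common
n = 2000
a_lines = [f"line {i}" for i in range(n)]
b_lines = [f"line {i}" for i in range(n // 2, n + n // 2)]


def with_lists():
    # Each "not in" scans a list, so the work grows with n * m
    only_b = [x for x in b_lines if x not in a_lines]
    only_a = [x for x in a_lines if x not in b_lines]
    return only_b + only_a


def with_sets():
    # Each "not in" is a hash lookup, so the work grows with n + m on average
    a_set, b_set = set(a_lines), set(b_lines)
    only_b = [x for x in b_lines if x not in a_set]
    only_a = [x for x in a_lines if x not in b_set]
    return only_b + only_a


list_time = timeit.timeit(with_lists, number=3)
set_time = timeit.timeit(with_sets, number=3)

print(f"list membership: {list_time:.4f} s")
print(f"set membership:  {set_time:.4f} s")
```

Both functions return the same lines in the same order; only the cost of each membership test differs.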

Conclusion

Use set operations for efficient processing of large files. Choose difflib when you need detailed comparison output with context. Basic iteration works well for simple cases with small files.

Updated on: 2026-03-27T07:19:48+05:30
