Python Program to Find Unique Lines From Two Text Files


Two files often look similar but contain certain differences. If the files are large or contain a lot of content, finding those differences, or the lines unique to each file, by hand is not easy. However, finding the unique lines in two text files is straightforward with a Python program. This article gives three examples, each showing a different way to find the unique lines in two text files. The input files are a.txt and b.txt, and the result is stored in a separate text file.

For these examples, the lines in the text files, and whether each line appears in a.txt and b.txt, are given here −

Line                                                In a.txt  In b.txt
----------------------------------------------------------------------
Introduction to Computers                           Yes       Yes
Introduction to Programming Concepts                Yes       Yes
Introduction to Windows, its Features, Application  Yes       Yes
C++ Programming                                     No        Yes
Computer Organization Principles                    Yes       Yes
Database Management Systems                         Yes       Yes
Introduction to Embedded Systems                    Yes       Yes
Fundamentals of PHP                                 Yes       Yes
Mathematical Foundation For Computer Science        Yes       No
Java Programming                                    Yes       Yes
Functions                                           Yes       Yes
Arrays                                              Yes       Yes
Disk Operating System                               Yes       Yes
Introduction to Number system and codes             Yes       Yes
Data Mining                                         Yes       Yes
Software Engineering                                Yes       No
Computer Networks                                   Yes       Yes
Control Structures                                  Yes       Yes
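To reproduce the examples, the two input files can be generated from the line list above. The sketch below is a hypothetical helper, not part of the original article. It assumes that each file lists its lines in the order shown, one per line, each ending with a newline, and that "Introduction to Number system and codes" appears in both files, which is what all three results below show. The names SAMPLE_LINES and write_sample_files are illustrative.

```python
# Hypothetical helper (not from the original article) that recreates the
# sample input files. Assumptions: each file lists its lines in the order of
# the line list, one per line, and every line ends with a newline.
SAMPLE_LINES = [
    # (line text, present in a.txt, present in b.txt)
    ("Introduction to Computers", True, True),
    ("Introduction to Programming Concepts", True, True),
    ("Introduction to Windows, its Features, Application", True, True),
    ("C++ Programming", False, True),
    ("Computer Organization Principles", True, True),
    ("Database Management Systems", True, True),
    ("Introduction to Embedded Systems", True, True),
    ("Fundamentals of PHP", True, True),
    ("Mathematical Foundation For Computer Science", True, False),
    ("Java Programming", True, True),
    ("Functions", True, True),
    ("Arrays", True, True),
    ("Disk Operating System", True, True),
    ("Introduction to Number system and codes", True, True),
    ("Data Mining", True, True),
    ("Software Engineering", True, False),
    ("Computer Networks", True, True),
    ("Control Structures", True, True),
]


def write_sample_files(a_path="a.txt", b_path="b.txt"):
    """Write each line to the file(s) in which it is marked as present."""
    with open(a_path, "w") as af, open(b_path, "w") as bf:
        for text, in_a, in_b in SAMPLE_LINES:
            if in_a:
                af.write(text + "\n")
            if in_b:
                bf.write(text + "\n")


if __name__ == "__main__":
    write_sample_files()
```

Running this once in the working directory should produce an a.txt with 17 lines and a b.txt with 16 lines.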

Example 1: Find Unique Lines From Two Text Files by iterating over and comparing the individual lines in both files.

Algorithm

Step 1 − Open both text files in read mode.

Step 2 − Read the lines of a.txt into afile and the lines of b.txt into bfile.

Step 3 − Create an empty list called cfile. Go through bfile line by line; if a line is not present in afile, append it to cfile.

Step 4 − Go through afile line by line; if a line is not present in bfile, append it to cfile. Write cfile to finalRes.txt.

Step 5 − Run the program and then check the result.

The Python file contains the following code:

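# Read all lines of both files; the with-blocks close the files automatically
with open('a.txt', 'r') as af:
    afile = af.readlines()
with open('b.txt', 'r') as bf:
    bfile = bf.readlines()

cfile = []
# Lines in b.txt that do not appear in a.txt
for ln in bfile:
    if ln not in afile:
        cfile.append(ln)

# Lines in a.txt that do not appear in b.txt
for ln in afile:
    if ln not in bfile:
        cfile.append(ln)

# Write the unique lines to the result file (closed and flushed by with)
with open('finalRes.txt', 'w') as resultFile:
    for lin in cfile:
        resultFile.write(lin)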
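Each "ln not in afile" check above scans the whole list, and each line is compared together with its trailing newline, so a last line saved without a newline will not match the same text in the other file. The following sketch keeps the same logic and output order but strips line endings with str.splitlines() and uses sets for the membership tests. The function names unique_lines and write_unique_lines are hypothetical, not from the original article.

```python
# Hypothetical helper names; not from the original article.
def unique_lines(path_a, path_b):
    """Return (lines only in path_b, lines only in path_a), in file order.

    Lines are read with str.splitlines(), so line endings are dropped and a
    last line without a trailing newline still matches its counterpart.
    """
    with open(path_a, "r") as af:
        a_lines = af.read().splitlines()
    with open(path_b, "r") as bf:
        b_lines = bf.read().splitlines()

    # Sets give average O(1) membership tests instead of scanning a list.
    a_set, b_set = set(a_lines), set(b_lines)
    only_b = [ln for ln in b_lines if ln not in a_set]
    only_a = [ln for ln in a_lines if ln not in b_set]
    return only_b, only_a


def write_unique_lines(path_a, path_b, out_path):
    """Write b's unique lines, then a's, one per line (same order as Example 1)."""
    only_b, only_a = unique_lines(path_a, path_b)
    with open(out_path, "w") as out:
        for ln in only_b + only_a:
            out.write(ln + "\n")
```

Calling write_unique_lines('a.txt', 'b.txt', 'finalRes.txt') plays the same role as the program above; every output line ends with a newline, even if the input files did not.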

Viewing The Result - Example 1

To see the unique lines from both text files, run the Python file in the cmd window and then open finalRes.txt.

C++ Programming
Mathematical Foundation For Computer Science
Software Engineering

Fig 1: Content of the result file called finalRes.txt.

Example 2: Find Unique Lines From Two Text Files by using the difflib library module.

Algorithm

Step 1 − First, import the Differ class from the difflib module.

Step 2 − Open both text files in read mode.

Step 3 − Read the lines of a.txt into afile and the lines of b.txt into bfile.

Step 4 − Compare the two lists of lines with Differ().compare() and write the result to finalRes1.txt.

Step 5 − Run the program and then check the result.

The Python file contains the following code:

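from difflib import Differ

# Read all lines of both files; the with-blocks close the files automatically
with open('a.txt', 'r') as af:
    afile = af.readlines()
with open('b.txt', 'r') as bf:
    bfile = bf.readlines()

# Differ().compare() yields every line prefixed with '  ', '- ', '+ ' or '? '
result = list(Differ().compare(afile, bfile))

# Write the full comparison to the result file
with open('finalRes1.txt', 'w') as resultFile:
    for lin in result:
        resultFile.write(lin)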

Viewing The Result - Example 2

Open the cmd window and run the Python file to see the result. In the result file, lines that are unique to one file are prefixed with - or +. A + sign means that the line is not present in the first text file (a.txt), while a - sign means that the line is not present in the second text file (b.txt). Lines present in both files are prefixed with two spaces.

  Introduction to Computers
  Introduction to Programming Concepts
  Introduction to Windows, its Features, Application
+ C++ Programming
  Computer Organization Principles
  Database Management Systems
  Introduction to Embedded Systems
  Fundamentals of PHP
- Mathematical Foundation For Computer Science
  Java Programming
  Functions
  Arrays
  Disk Operating System
  Introduction to Number system and codes
  Data Mining
- Software Engineering
  Computer Networks
  Control Structures

Fig 2: Content of the result file called finalRes1.txt
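finalRes1.txt contains every line of both files. If only the unique lines are wanted, the Differ output can be filtered on its two-character prefix. The sketch below keeps entries that start with "- " or "+ " and skips common lines ("  ") as well as the "? " hint lines that Differ adds when two lines are similar but not identical. The names read_lines and unique_from_diff are hypothetical, not from the original article.

```python
from difflib import Differ


# Hypothetical helper names; not from the original article.
def read_lines(path):
    """Read a file as a list of lines that all end with a newline.

    Differ expects newline-terminated lines; normalizing here also lets a
    last line saved without a newline match its counterpart.
    """
    with open(path, "r") as f:
        return [ln + "\n" for ln in f.read().splitlines()]


def unique_from_diff(path_a, path_b):
    """Return (lines only in path_a, lines only in path_b) from Differ output."""
    only_a, only_b = [], []
    for entry in Differ().compare(read_lines(path_a), read_lines(path_b)):
        # Each entry is a two-character code followed by the line text.
        code, text = entry[:2], entry[2:].rstrip("\n")
        if code == "- ":
            only_a.append(text)
        elif code == "+ ":
            only_b.append(text)
        # "  " (common line) and "? " (intraline hint) entries are skipped.
    return only_a, only_b
```

For example, only_a, only_b = unique_from_diff('a.txt', 'b.txt') separates the lines marked - in Fig 2 from those marked +.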

Example 3: Find Unique Lines From Two Text Files by removing common lines and retaining unique lines.

Algorithm

Step 1 − Open both text files in read mode.

Step 2 − Read the lines of a.txt into a set called afile, then open b.txt as bf.

Step 3 − For every line in bf, if that line occurs in a.txt, remove it from afile; otherwise, append it to another list called uniqueB.

Step 4 − Append the lines left in afile and those in uniqueB to cfile. Write cfile to finalRes2.txt.

Step 5 − Run the program and then check the result.

The Python file contains the following code:

# Collect the distinct lines of a.txt
with open('a.txt', 'r') as af:
    aLines = set(af)
afile = set(aLines)   # lines of a.txt not (yet) found in b.txt
uniqueB = []
cfile = []
with open('b.txt', 'r') as bf:
    for ln in bf:
        if ln in aLines:
            # Common line: drop it from a.txt's unique lines
            # (discard() also tolerates a common line repeated in b.txt)
            afile.discard(ln)
        else:
            uniqueB.append(ln)
print("\nPrinting all unique lines in both a.txt and b.txt : ")
print('\nAll the lines in a.txt file that are not in b.txt: \n')

for ln in sorted(afile):
    print(ln.rstrip())
    cfile.append(ln)
print()

print('\nAll the lines in b.txt file that are not in a.txt: \n')

for lin in uniqueB:
    print(lin.rstrip())
    cfile.append(lin)
print()

# Write the unique lines of both files to the result file
with open('finalRes2.txt', 'w') as resultFile:
    for lin in cfile:
        resultFile.write(lin)

Viewing The Result - Example 3

To see the unique lines from both text files, run the Python file in the cmd window. The program prints the unique lines and also writes them to finalRes2.txt.

Mathematical Foundation For Computer Science
Software Engineering
C++ Programming

Fig 3: Content of the result file called finalRes2.txt.
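Because Example 3 already relies on a set, the same result can be sketched more compactly with Python's set symmetric difference operator ^, which keeps the elements that are in exactly one of the two sets. This variant gives up the original line order and does not report which file a line came from, so it sorts the result for a stable output. The name unique_lines_symmetric is hypothetical, not from the original article.

```python
# Hypothetical helper name; not from the original article.
def unique_lines_symmetric(path_a, path_b):
    """Return the lines that occur in exactly one of the two files, sorted."""
    with open(path_a, "r") as af:
        a_set = set(af.read().splitlines())
    with open(path_b, "r") as bf:
        b_set = set(bf.read().splitlines())
    # a_set ^ b_set is the symmetric difference: (a_set - b_set) | (b_set - a_set)
    return sorted(a_set ^ b_set)
```

For the sample files, unique_lines_symmetric('a.txt', 'b.txt') should return the same three lines as Fig 3, in alphabetical order.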

Conclusion

In this Python article, three different examples show how to find the unique lines in two text files. In Example 1, simple iteration and comparison are used to go through both text files line by line. In Example 2, the Differ class from the difflib library module is used. In Example 3, the common lines are removed while the unique lines are retained, using a Python set and a list.

Updated on: 10-Jul-2023
