How to eliminate repeated lines in a Python function?


In this article, we discuss how to remove repeated lines from a text file using Python. If the file is small and has only a few lines, the duplicates can be removed by hand. For large files, a short Python script does the job faster and more reliably.

Using File Handling Method

Python has built-in functions for creating, opening, and closing files, which makes file handling easier. While a file is open, Python also supports operations such as reading, writing, and appending data.
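As a minimal sketch of these operations, the snippet below writes to, appends to, and then reads a small file. The file name demo.txt and the temporary directory are assumptions made only for illustration.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# "w" creates the file (or empties an existing one) and writes to it.
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\n")

# "a" appends to the end of the file without erasing its content.
with open(path, "a", encoding="utf-8") as f:
    f.write("second line\n")

# "r" opens the file for reading; iterating yields one line at a time.
with open(path, "r", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)
```

The with statement closes each file automatically, even if an error occurs inside the block.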

To remove duplicate lines from a text file, we use Python's file handling methods. If you pass only a file name, Python looks for the file in the current working directory. Otherwise, give the full path to the file, as the examples in this article do.

Algorithm

Following is an approach to eliminate repeated lines from a file using Python −

  • Open the input file in read-only mode, since we will only be reading its content.

  • Open the output file in write mode, since the result will be written to it.

  • Read the input file line by line and, for each line, check whether the same line has already been written to the output file.

  • If not, write the line to the output file and save the line's hash value in a set. Rather than storing and comparing the entire line, we store and compare only its hash value. Each stored hash has a fixed size, so this can save memory when the file is large and its lines are long.

  • Skip the line if its hash value is already in the set.

  • When everything is done, the output file contains every distinct line from the input file exactly once, in its original order.
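The steps above can be sketched as a small reusable function. This is one possible arrangement, not the only one: the function name remove_duplicate_lines, the temporary directory, and the three sample lines are assumptions made for illustration, and MD5 is used only as a fingerprint for lines, not for security.

```python
import hashlib
import os
import tempfile


def remove_duplicate_lines(in_path, out_path):
    """Copy in_path to out_path, keeping only the first copy of each line.

    Returns the number of distinct lines written.
    """
    seen_hashes = set()
    with open(in_path, "r", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Hash the line without trailing whitespace or the newline.
            digest = hashlib.md5(line.rstrip().encode("utf-8")).hexdigest()
            if digest not in seen_hashes:
                dst.write(line)
                seen_hashes.add(digest)
    return len(seen_hashes)


# Demo with hypothetical files in a temporary directory.
work_dir = tempfile.mkdtemp()
in_path = os.path.join(work_dir, "input.txt")
out_path = os.path.join(work_dir, "output.txt")

with open(in_path, "w", encoding="utf-8") as f:
    f.write("alpha\nbeta\nalpha\n")

unique_count = remove_duplicate_lines(in_path, out_path)

with open(out_path, "r", encoding="utf-8") as f:
    result = f.read()

print(unique_count)
print(result)
```

Because each line is written as soon as it is first seen, the output keeps the original order of the lines.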

Here, the input file "File.txt" contains the following data −

Welcome to TutorialsPoint.
Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
eliminate repeated lines.
eliminate repeated lines.
Skip the line.

Example

Following is an example that eliminates repeated lines from a file by comparing the hash value of each line −

import hashlib

# paths of the input and output files
# (raw strings keep the backslashes in Windows paths literal)
OutFile = r'C:\Users\Lenovo\Downloads\Work TP\pre.txt'
InFile = r'C:\Users\Lenovo\Downloads\Work TP\File.txt'

# holds the hash values of the lines already seen
lines_present = set()

# open the input file in read mode and the output file in write mode
with open(InFile, "r") as The_Input_File, open(OutFile, "w") as The_Output_File:
    for l in The_Input_File:
        # remove trailing spaces and the newline from the line,
        # then compute its hash value with the hashlib library
        hash_value = hashlib.md5(l.rstrip().encode('utf-8')).hexdigest()
        if hash_value not in lines_present:
            The_Output_File.write(l)
            lines_present.add(hash_value)

# both files are closed automatically when the with block ends

Output

As the following output shows, every repeated line from the input file has been eliminated, and the output file contains only the unique lines −

Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.

Example

Following is another example, which compares the lines themselves instead of their hash values −

# paths of the input and output files
# (raw strings keep the backslashes in Windows paths literal)
# open the output file in write mode
OutFile = open(r'C:\Users\Lenovo\Downloads\Work TP\pre.txt', "w")
# open the input file in read mode
InFile = open(r'C:\Users\Lenovo\Downloads\Work TP\File.txt', "r")

# holds the lines already seen
lines_present = set()

# iterate over every line present in the input file
for l in InFile:
    # check whether the line has been seen before
    if l not in lines_present:
        # write the unique line to the output file
        OutFile.write(l)
        # remember the line in lines_present
        lines_present.add(l)

# close both files
OutFile.close()
InFile.close()

Output

As the following output shows, the output file again contains each line from the input file only once −

Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.
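The two examples do not treat trailing whitespace the same way: the first hashes each line after rstrip(), while the second compares the raw lines. The sketch below illustrates the difference on a list of strings rather than on files. The helper names dedupe_exact and dedupe_normalized and the sample data are hypothetical, chosen only for this illustration.

```python
def dedupe_exact(lines):
    """Keep the first copy of each line, comparing lines exactly as read."""
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


def dedupe_normalized(lines):
    """Keep the first copy of each line, ignoring trailing whitespace."""
    seen = set()
    result = []
    for line in lines:
        key = line.rstrip()
        if key not in seen:
            seen.add(key)
            result.append(line)
    return result


# The last line of a file often has no trailing newline.
sample = ["Skip the line.\n", "Skip the line."]

exact = dedupe_exact(sample)
normalized = dedupe_normalized(sample)

print(len(exact))
print(len(normalized))
```

With exact comparison both lines are kept, because one ends in a newline and the other does not. Normalizing with rstrip() treats them as the same line.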

Updated on: 09-Sep-2023
