Remove Lines Which Appear in File B From Another File A in Linux

Removing lines from one file that appear in another file is a common task in Linux system administration and data processing. This operation, also known as set difference, can be accomplished using several command-line utilities, each with its own advantages and use cases.

Using the grep Command

The grep command is the most straightforward approach for this task. It uses pattern matching to filter lines.

grep -v -f fileB.txt fileA.txt > outputFile.txt

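This command uses the -v option to invert the match (print only non-matching lines) and -f to read the patterns to exclude from a file. By default, each line of fileB.txt is treated as a regular expression that can match anywhere in a line, so a pattern such as app would also remove apple. For exact line matching, add the -x option, and add -F so the patterns are treated as literal strings rather than regular expressions:

grep -vxFf fileB.txt fileA.txt > outputFile.txt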
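The difference between regular-expression and literal matching can be checked with a small experiment. The sketch below is only an illustration: the scratch directory and the sample lines (a.c, abc, xyz) are made up for the demo, and the -F option is added here to request fixed-string matching.

```shell
#!/bin/sh
# Hypothetical demo files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'a.c' 'abc' 'xyz' > "$dir/fileA.txt"
printf '%s\n' 'a.c' > "$dir/fileB.txt"

# Without -F, "a.c" is a regular expression in which "." matches any
# character, so both "a.c" and "abc" are removed.
regex_result=$(grep -vxf "$dir/fileB.txt" "$dir/fileA.txt")

# With -F, patterns are literal strings; only the exact line "a.c" is removed.
fixed_result=$(grep -vxFf "$dir/fileB.txt" "$dir/fileA.txt")

printf 'regex match keeps:\n%s\n' "$regex_result"
printf 'fixed-string match keeps:\n%s\n' "$fixed_result"

rm -rf "$dir"
```

If fileB.txt may contain characters such as ., *, or [, the fixed-string form is usually the safer choice.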

Example

Given two files:

# fileA.txt
apple
banana
cherry
date
elderberry

# fileB.txt
banana
date

Running the exact-match grep command above on these files produces:

apple
cherry
elderberry

Using the comm Command

The comm command compares two sorted files line by line. Both files must be sorted before using this method.

sort fileA.txt > fileA_sorted.txt
sort fileB.txt > fileB_sorted.txt
comm -23 fileA_sorted.txt fileB_sorted.txt > outputFile.txt

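The -23 option suppresses columns 2 and 3, showing only lines unique to the first file. Once the inputs are sorted, comm streams through both files without holding either one in memory, which suits large files. Note that the output is in sorted order, not in the original order of fileA.txt, and that both files must be sorted with the same locale settings.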
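The comm workflow can be tried end to end on throwaway data. This is a minimal sketch under stated assumptions: the scratch directory and sample lines are invented for the demo, and LC_ALL=C is set here so that sort and comm agree on the collation order.

```shell
#!/bin/sh
# Hypothetical demo files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$dir/fileA.txt"
printf '%s\n' banana > "$dir/fileB.txt"

# Sort both inputs under the same locale so comm sees a consistent order.
LC_ALL=C sort "$dir/fileA.txt" > "$dir/fileA_sorted.txt"
LC_ALL=C sort "$dir/fileB.txt" > "$dir/fileB_sorted.txt"

# Keep only column 1: lines that appear in fileA but not in fileB.
comm_result=$(LC_ALL=C comm -23 "$dir/fileA_sorted.txt" "$dir/fileB_sorted.txt")
printf '%s\n' "$comm_result"

rm -rf "$dir"
```

The result lists apple before cherry even though cherry came first in the input, because comm works on the sorted copies.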

Using the awk Command

The awk command provides a powerful solution that reads both files and creates an associative array for comparison.

awk 'FNR==NR{a[$0];next} !($0 in a)' fileB.txt fileA.txt > outputFile.txt

This command first reads fileB.txt and stores all lines in array a, then processes fileA.txt and prints lines not found in the array.
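One edge case is worth checking: when the exclusion file is empty, FNR==NR stays true while awk reads the second file, so nothing is printed. The sketch below demonstrates this on made-up sample data and shows one possible guard, comparing FILENAME with ARGV[1]; the guard is a suggested variant, not part of the original command.

```shell
#!/bin/sh
# Hypothetical demo files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' cherry apple cherry banana > "$dir/fileA.txt"
printf '%s\n' banana > "$dir/fileB.txt"
: > "$dir/empty.txt"   # an empty exclusion file

# Normal case: original order and duplicates of fileA are preserved.
awk_result=$(awk 'FNR==NR{a[$0];next} !($0 in a)' "$dir/fileB.txt" "$dir/fileA.txt")

# Empty exclusion file: every fileA line satisfies FNR==NR, so none print.
empty_result=$(awk 'FNR==NR{a[$0];next} !($0 in a)' "$dir/empty.txt" "$dir/fileA.txt")

# Guarded variant: decide by file name instead of record counters.
guarded_result=$(awk 'FILENAME==ARGV[1]{a[$0];next} !($0 in a)' "$dir/empty.txt" "$dir/fileA.txt")

printf 'normal:\n%s\n' "$awk_result"
printf 'empty exclusion file, FNR==NR:\n%s\n' "$empty_result"
printf 'empty exclusion file, guarded:\n%s\n' "$guarded_result"

rm -rf "$dir"
```

The guarded form assumes the two file arguments are different paths; if the same file is passed twice, both variants need a different approach.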

Using the join Command

The join command can also perform this operation on sorted files:

sort fileA.txt > fileA_sorted.txt
sort fileB.txt > fileB_sorted.txt
join -v 1 fileA_sorted.txt fileB_sorted.txt > outputFile.txt

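The -v 1 option prints only the lines from the first file whose join field has no match in the second file. By default the join field is the first whitespace-separated field, so join compares keys rather than whole lines; the two are equivalent only when each line consists of a single field.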
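Because join matches on a key field, its behavior on multi-field lines differs from a whole-line comparison. The following sketch uses invented name/number records to illustrate this; the file contents and the LC_ALL=C setting are assumptions made for the demo.

```shell
#!/bin/sh
# Hypothetical demo files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'bob 25' 'alice 30' > "$dir/fileA.txt"
printf '%s\n' 'bob 99' > "$dir/fileB.txt"

# Sort both inputs under the same locale, as join expects.
LC_ALL=C sort "$dir/fileA.txt" > "$dir/fileA_sorted.txt"
LC_ALL=C sort "$dir/fileB.txt" > "$dir/fileB_sorted.txt"

# "bob 25" is dropped because its first field "bob" has a match in fileB,
# even though the full lines "bob 25" and "bob 99" differ.
join_result=$(LC_ALL=C join -v 1 "$dir/fileA_sorted.txt" "$dir/fileB_sorted.txt")
printf '%s\n' "$join_result"

rm -rf "$dir"
```

This makes join a better fit for files of keyed records than for arbitrary text lines.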

Performance Comparison

Method  Preprocessing    Best For               Memory Usage
grep    None             Small to medium files  Low
comm    Sort both files  Large files            Low
awk     None             Complex processing     Medium
join    Sort both files  Structured data        Low

Key Points

  • File order matters: In the awk command, list the exclusion file (fileB) before the source file (fileA).

  • Sorting requirement: The comm and join commands require pre-sorted input files.

  • Exact matching: Use grep -x for whole-line matching to avoid partial matches, and add -F if fileB contains regular-expression metacharacters.

  • Duplicate handling: grep and awk keep every copy of a fileA line that is absent from fileB and drop every copy of a line that is present in fileB. comm pairs lines one-to-one, so surplus copies in fileA remain in its output.
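Duplicate handling is easy to verify on a small input before relying on it. The sketch below uses made-up data in which the line x appears twice in fileA and once in fileB, then compares grep, awk, and comm; the -F option and LC_ALL=C are additions made for the demo.

```shell
#!/bin/sh
# Hypothetical demo files in a scratch directory (already in sorted order).
dir=$(mktemp -d)
printf '%s\n' x x y > "$dir/fileA.txt"
printf '%s\n' x > "$dir/fileB.txt"

# grep and awk test each fileA line for membership in fileB,
# so both copies of "x" are removed.
grep_dup=$(grep -vxFf "$dir/fileB.txt" "$dir/fileA.txt")
awk_dup=$(awk 'FNR==NR{a[$0];next} !($0 in a)' "$dir/fileB.txt" "$dir/fileA.txt")

# comm pairs lines one-to-one: one "x" is matched, the second is left over.
comm_dup=$(LC_ALL=C comm -23 "$dir/fileA.txt" "$dir/fileB.txt")

printf 'grep:\n%s\n' "$grep_dup"
printf 'awk:\n%s\n' "$awk_dup"
printf 'comm:\n%s\n' "$comm_dup"

rm -rf "$dir"
```

If surplus copies must also be removed when using comm, one option is to deduplicate with sort -u while sorting.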

Conclusion

Multiple Linux commands can remove lines from one file that appear in another. The grep command offers simplicity for most use cases, while comm provides better performance for large sorted files. Choose the method that best fits your data size and processing requirements.

Updated on: 2026-03-17T09:01:38+05:30
