Remove Lines Which Appear in File B From Another File A in Linux
Removing lines from one file that appear in another file is a common task in Linux system administration and data processing. This operation, also known as set difference, can be accomplished using several command-line utilities, each with its own advantages and use cases.
Using the grep Command
The grep command is the most straightforward approach for this task. It uses pattern matching to filter lines.
grep -v -f fileB.txt fileA.txt > outputFile.txt
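This command uses the -v option to invert the match (show non-matching lines) and -f to read the patterns to exclude from a file. By default each pattern can match anywhere in a line and is interpreted as a regular expression. For exact line matching, add the -x option, and add -F so that the lines of fileB.txt are treated as literal strings:
grep -Fvxf fileB.txt fileA.txt > outputFile.txt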
Example
Given two files:
# fileA.txt
apple
banana
cherry
date
elderberry

# fileB.txt
banana
date

The commands above produce:
apple
cherry
elderberry
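A minimal sketch of why literal matching can matter, assuming hypothetical sample data in which fileB.txt contains a regular-expression metacharacter. The temporary directory and the output file names (regex_out.txt, fixed_out.txt) are illustrative and not part of the original example.

```shell
# Hypothetical sample data: the exclusion line "a.c" contains a regex metacharacter.
workdir=$(mktemp -d)
printf '%s\n' 'a.c' 'abc' 'a+b' 'keep me' > "$workdir/fileA.txt"
printf '%s\n' 'a.c' > "$workdir/fileB.txt"

# Without -F, "a.c" is a regular expression in which "." matches any character,
# so the unrelated line "abc" is removed together with "a.c".
grep -vxf "$workdir/fileB.txt" "$workdir/fileA.txt" > "$workdir/regex_out.txt"

# With -F, each line of fileB.txt is matched as a literal string,
# so only the line "a.c" is removed.
grep -Fvxf "$workdir/fileB.txt" "$workdir/fileA.txt" > "$workdir/fixed_out.txt"

echo "regex matching:"; cat "$workdir/regex_out.txt"
echo "literal matching:"; cat "$workdir/fixed_out.txt"
```

If the lines in fileB.txt never contain characters such as ".", "*", or "[", both forms should give the same result.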
Using the comm Command
The comm command compares two sorted files line by line. Both files must be sorted before using this method.
sort fileA.txt > fileA_sorted.txt
sort fileB.txt > fileB_sorted.txt
comm -23 fileA_sorted.txt fileB_sorted.txt > outputFile.txt
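The -23 option suppresses columns 2 and 3, showing only lines unique to the first file. Because comm reads both inputs sequentially, it scales well to large files, but the output appears in sorted order rather than in the original order of fileA.txt.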
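The sort and comm steps can be wrapped in a small helper, sketched here under a few assumptions: the function name lines_only_in_first and the sample data are hypothetical, and LC_ALL=C is used so that sort and comm agree on the collation order. This is one possible arrangement, not the only way to run comm.

```shell
# Hypothetical helper: print the lines of file $1 that do not occur in file $2.
# Output comes out in sorted (byte) order, not in the original order of $1.
lines_only_in_first() {
    sorted_a=$(mktemp) || return 1
    sorted_b=$(mktemp) || { rm -f "$sorted_a"; return 1; }
    # Sort and compare under the same collation so that comm receives
    # input ordered the way it expects.
    LC_ALL=C sort "$1" > "$sorted_a"
    LC_ALL=C sort "$2" > "$sorted_b"
    LC_ALL=C comm -23 "$sorted_a" "$sorted_b"
    # Remove the temporary sorted copies.
    rm -f "$sorted_a" "$sorted_b"
}

# Hypothetical sample data, deliberately unsorted.
workdir=$(mktemp -d)
printf '%s\n' cherry apple elderberry banana date > "$workdir/fileA.txt"
printf '%s\n' date banana > "$workdir/fileB.txt"

lines_only_in_first "$workdir/fileA.txt" "$workdir/fileB.txt" > "$workdir/outputFile.txt"
cat "$workdir/outputFile.txt"
```

Keeping the sorted copies in temporary files leaves fileA.txt and fileB.txt untouched.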
Using the awk Command
The awk command provides a powerful solution that reads both files and creates an associative array for comparison.
awk 'FNR==NR{a[$0];next} !($0 in a)' fileB.txt fileA.txt > outputFile.txt
This command first reads fileB.txt and stores all lines in array a, then processes fileA.txt and prints lines not found in the array.
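One edge case worth checking is an empty fileB.txt: FNR==NR then stays true while fileA.txt is being read, so every line is stored and nothing is printed. The sketch below compares the command with a variant that identifies the exclusion file by name instead. The variant, the sample data, and the output file names are illustrative assumptions, and the variant assumes the two files are passed under different names.

```shell
# Hypothetical sample data: a non-empty source file and an empty exclusion file.
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/fileA.txt"
: > "$workdir/fileB.txt"

# FNR==NR form: no records are read from the empty fileB.txt, so the condition
# also holds for every line of fileA.txt and the output is empty.
awk 'FNR==NR{a[$0];next} !($0 in a)' \
    "$workdir/fileB.txt" "$workdir/fileA.txt" > "$workdir/nr_out.txt"

# Variant: treat a line as an exclusion entry only while awk is reading the
# first file argument (ARGV[1]); all other lines are filtered against the array.
awk 'FILENAME==ARGV[1]{a[$0];next} !($0 in a)' \
    "$workdir/fileB.txt" "$workdir/fileA.txt" > "$workdir/name_out.txt"

wc -l < "$workdir/nr_out.txt"
cat "$workdir/name_out.txt"
```

When fileB.txt is guaranteed to be non-empty, the two forms should behave the same.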
Using the join Command
The join command can also perform this operation on sorted files:
sort fileA.txt > fileA_sorted.txt
sort fileB.txt > fileB_sorted.txt
join -v 1 fileA_sorted.txt fileB_sorted.txt > outputFile.txt
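The -v 1 option prints only the lines of the first file that have no match in the second. Note that join matches on the first whitespace-separated field by default rather than on the whole line, so this command behaves as a whole-line comparison only when each line consists of a single field.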
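For lines that contain spaces, one possible workaround is to give join a field separator that does not occur in the data, so that the whole line becomes the join field. The sketch below uses a tab for this and assumes that no line contains a tab; the sample data and output file names are hypothetical.

```shell
# Hypothetical sample data with multi-word lines.
workdir=$(mktemp -d)
printf '%s\n' 'apple pie' 'apple tart' 'banana' > "$workdir/fileA.txt"
printf '%s\n' 'apple pie' > "$workdir/fileB.txt"
LC_ALL=C sort "$workdir/fileA.txt" > "$workdir/fileA_sorted.txt"
LC_ALL=C sort "$workdir/fileB.txt" > "$workdir/fileB_sorted.txt"

# Default separator: the join field is the first word, so "apple tart" pairs
# with "apple pie" from fileB and is dropped even though the lines differ.
LC_ALL=C join -v 1 "$workdir/fileA_sorted.txt" "$workdir/fileB_sorted.txt" \
    > "$workdir/field_out.txt"

# Tab as separator (assumption: no line contains a tab): the whole line is
# the join field, so only the identical line "apple pie" is removed.
tab=$(printf '\t')
LC_ALL=C join -t "$tab" -v 1 "$workdir/fileA_sorted.txt" "$workdir/fileB_sorted.txt" \
    > "$workdir/line_out.txt"

echo "first-field matching:"; cat "$workdir/field_out.txt"
echo "whole-line matching:"; cat "$workdir/line_out.txt"
```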
Performance Comparison
| Method | Preprocessing | Best For | Memory Usage |
|---|---|---|---|
| grep | None | Small to medium files | Low |
| comm | Sort both files | Large files | Low |
| awk | None | Complex processing | Medium |
| join | Sort both files | Structured data | Low |
Key Points
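- File order matters: Always specify the exclusion file (fileB) before the source file (fileA) in awk commands.
- Sorting requirement: The comm and join commands require pre-sorted input files.
- Exact matching: Use grep -x for whole-line matching to avoid partial matches.
- Duplicate handling: With grep, awk, and join, duplicate lines in fileA are preserved unless the line also exists in fileB, in which case every copy is removed. comm pairs lines one-to-one, so copies in fileA beyond the number present in fileB remain in the output.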
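Duplicate handling is easy to check on a small input. The sketch below, using hypothetical sample data and output file names, runs grep and comm on a fileA.txt in which a line that also occurs once in fileB.txt appears twice.

```shell
# Hypothetical sample data, already sorted so comm can read it directly.
workdir=$(mktemp -d)
printf '%s\n' apple apple banana banana cherry > "$workdir/fileA.txt"
printf '%s\n' banana > "$workdir/fileB.txt"

# grep removes every copy of a line that occurs in fileB.txt.
grep -Fvxf "$workdir/fileB.txt" "$workdir/fileA.txt" > "$workdir/grep_out.txt"

# comm pairs lines one-to-one: a single "banana" in fileB.txt cancels only
# one of the two copies in fileA.txt.
LC_ALL=C comm -23 "$workdir/fileA.txt" "$workdir/fileB.txt" > "$workdir/comm_out.txt"

echo "grep:"; cat "$workdir/grep_out.txt"
echo "comm:"; cat "$workdir/comm_out.txt"
```

If the inputs never contain repeated lines, the two methods should agree apart from output order.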
Conclusion
Multiple Linux commands can remove lines from one file that appear in another. The grep command offers simplicity for most use cases, while comm provides better performance for large sorted files. Choose the method that best fits your data size and processing requirements.
