Remove Lines Which Appear in File B From Another File A in Linux


You can use the grep command in Linux to remove the lines from file A that appear in file B.

The basic syntax is −

grep -v -f fileB.txt fileA.txt > outputFile.txt

This command uses the -v option to invert the match, so grep prints only the lines of fileA.txt that match none of the patterns. The -f option tells grep to read those patterns from fileB.txt, one per line. The output is redirected to a new file called outputFile.txt.

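Alternatively, you can use the sed command. sed cannot read a plain list of lines to delete, so first convert fileB.txt into a sed script (here saved under the arbitrary name delete.sed) and then apply it −

sed 's|[][\\/.*^$]|\\&|g; s|.*|/^&$/d|' fileB.txt > delete.sed
sed -i -f delete.sed fileA.txt

The first command escapes the regex-special characters in each line of fileB.txt and wraps the line in /^...$/d, which deletes any line that matches it exactly. The second command uses the -f option to run that script and the GNU sed -i option to edit fileA.txt in place.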
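To make the sed approach concrete, here is a minimal sketch that turns each line of file B into an anchored delete command and applies the result to file A. The sample contents, the scratch directory, and the script name delete.sed are assumptions made for illustration, and the in-place -i option is used the way GNU sed accepts it.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample data; fileB.txt contains regex-special characters.
printf '%s\n' 'apple' 'a.c' 'abc' 'path/to/x' > fileA.txt
printf '%s\n' 'a.c' 'path/to/x' > fileB.txt

# Escape special characters, then wrap each line as /^line$/d.
sed 's|[][\\/.*^$]|\\&|g; s|.*|/^&$/d|' fileB.txt > delete.sed
cat delete.sed
# /^a\.c$/d
# /^path\/to\/x$/d

# Apply the generated script to fileA.txt in place (GNU sed).
sed -i -f delete.sed fileA.txt
cat fileA.txt
# apple
# abc
```

Because the dot is escaped, the line abc survives even though the unescaped pattern a.c would have matched it.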

You can also use the awk command −

awk 'FNR==NR{a[$0];next} !($0 in a)' fileB.txt fileA.txt > outputFile.txt

This command reads fileB.txt first and stores each of its lines as an array key, then prints the lines from fileA.txt that do not exist in fileB.txt into outputFile.txt.

Using the comm and sort Commands

You can use the comm and sort commands in Linux to remove the lines from file A that appear in file B.

First, you need to sort both files −

sort fileA.txt > fileA_sorted.txt
sort fileB.txt > fileB_sorted.txt

Then, use the comm command to compare the two sorted files −

comm -23 fileA_sorted.txt fileB_sorted.txt > outputFile.txt

The -23 option tells comm to suppress column 2 (lines only in file B) and column 3 (lines in both files), so it prints only the lines that are unique to file A. The output is redirected to a new file called outputFile.txt.
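The following sketch runs the sort and comm steps on two small files. The file contents are invented for illustration, and LC_ALL=C is set only so that sort and comm agree on the collation order.

```shell
# Work in a scratch directory with hypothetical sample data.
cd "$(mktemp -d)"
LC_ALL=C; export LC_ALL   # same collation order for sort and comm

printf '%s\n' 'pear' 'kiwi' 'apple' 'kiwi' > fileA.txt
printf '%s\n' 'kiwi' 'fig' > fileB.txt

sort fileA.txt > fileA_sorted.txt
sort fileB.txt > fileB_sorted.txt

# Keep only column 1: lines that are unique to file A.
comm -23 fileA_sorted.txt fileB_sorted.txt > outputFile.txt
cat outputFile.txt
# apple
# kiwi
# pear
```

Two things are visible in the result: the output is in sorted order rather than the original order of file A, and comm pairs duplicate lines one for one, so the second kiwi in file A is kept because file B contains kiwi only once.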

For the reverse comparison, you can use −

comm -13 fileA_sorted.txt fileB_sorted.txt > outputFile.txt

This prints only the lines that appear in file B but not in file A, so it does not remove anything from file A.
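As a quick check of which column each option keeps, this sketch (again with invented sample data and arbitrary output names) runs comm -13 and comm -12 on the same pair of sorted files.

```shell
# Work in a scratch directory with hypothetical sample data.
cd "$(mktemp -d)"
LC_ALL=C; export LC_ALL

printf '%s\n' 'pear' 'apple' 'kiwi' > fileA.txt
printf '%s\n' 'kiwi' 'fig' > fileB.txt
sort fileA.txt > fileA_sorted.txt
sort fileB.txt > fileB_sorted.txt

# -13 keeps column 2: lines found only in file B.
comm -13 fileA_sorted.txt fileB_sorted.txt > only_in_B.txt
cat only_in_B.txt
# fig

# -12 keeps column 3: lines found in both files.
comm -12 fileA_sorted.txt fileB_sorted.txt > in_both.txt
cat in_both.txt
# kiwi
```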

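It's important to note that both files need to be sorted, in the same locale, before using the comm command. As a result the output is in sorted order, not in the original order of file A, and duplicate lines are matched one for one rather than all being removed.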

Using the join and sort Commands

You can use the join and sort commands in Linux to remove the lines from file A that appear in file B.

First, you need to sort both files −

sort fileA.txt > fileA_sorted.txt
sort fileB.txt > fileB_sorted.txt

Then, use the join command to compare the two sorted files −

join -v 1 fileA_sorted.txt fileB_sorted.txt > outputFile.txt

The -v 1 option tells join to print only the unpairable lines from file A, that is, lines whose join field (by default the first whitespace-separated field) does not appear as a join field in file B. The output is redirected to a new file called outputFile.txt.

Alternatively, you can also use

join -v 2 fileA_sorted.txt fileB_sorted.txt > outputFile.txt

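This prints only the lines whose join field appears in file B but not in file A.

It's important to note that both files need to be sorted on the join field before using the join command. join compares only that field, not the whole line, so this method gives the same result as the others only when each line consists of a single field or the first field identifies the line. If the files do not share a common field, you need to add one first or use one of the other commands.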
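The field-based behaviour of join is easy to miss, so here is a small sketch with invented two-column data. It assumes the default join field (the first whitespace-separated field) and sets LC_ALL=C so that sort and join use the same ordering.

```shell
# Work in a scratch directory with hypothetical sample data.
cd "$(mktemp -d)"
LC_ALL=C; export LC_ALL

printf '%s\n' 'apple red' 'kiwi green' 'pear yellow' > fileA.txt
printf '%s\n' 'kiwi brown' 'fig purple' > fileB.txt
sort fileA.txt > fileA_sorted.txt
sort fileB.txt > fileB_sorted.txt

# Print lines of file A whose first field has no partner in file B.
join -v 1 fileA_sorted.txt fileB_sorted.txt > outputFile.txt
cat outputFile.txt
# apple red
# pear yellow
```

The line kiwi green is dropped even though the full line does not occur in file B, because only the key kiwi is compared.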

Using the grep Command

You can use the grep command in Linux to remove the lines from file A that appear in file B.

The basic syntax is −

grep -v -f fileB.txt fileA.txt > outputFile.txt

Here -v inverts the match and -f reads the patterns from fileB.txt. Each line of file B is treated as a regular expression that may match anywhere in a line of file A, so a pattern such as cat also removes a line such as concatenate. The output is redirected to a new file called outputFile.txt.

Alternatively, you can also use

grep -vxf fileB.txt fileA.txt > outputFile.txt

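This command also uses the -v option to invert the match and the -f option to read the patterns from file B, and adds the -x option so that a pattern must match the whole line. If the lines in file B may contain regex metacharacters such as . or *, add the -F option as well (grep -vxFf) so that they are compared as fixed strings.

Duplicate lines are handled predictably: every copy of a line in file A is removed if the line appears in file B, and kept otherwise. The cases where you might remove lines that you want to keep are partial matches and metacharacters, and a blank line in fileB.txt, which without -x acts as an empty pattern that matches every line.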
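The difference between pattern matching and whole-line literal matching can be seen in a short sketch. The sample contents and the two output file names are assumptions chosen for illustration.

```shell
# Work in a scratch directory with hypothetical sample data.
cd "$(mktemp -d)"

printf '%s\n' 'cat' 'concatenate' 'a.c' 'abc' > fileA.txt
printf '%s\n' 'cat' 'a.c' > fileB.txt

# Regex, substring matching: "cat" hits "concatenate", "a.c" hits "abc".
# grep exits with status 1 when it prints nothing, hence "|| true".
grep -v -f fileB.txt fileA.txt > regex_output.txt || true
wc -l < regex_output.txt    # 0 lines: everything was removed

# Fixed strings (-F) that must match the whole line (-x).
grep -vxFf fileB.txt fileA.txt > exact_output.txt
cat exact_output.txt
# concatenate
# abc
```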

Using the awk Command

You can use the awk command in Linux to remove the lines from file A that appear in file B.

The basic syntax is −

awk 'FNR==NR{a[$0];next} !($0 in a)' fileB.txt fileA.txt > outputFile.txt

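While awk reads the first file (fileB.txt), FNR==NR is true, so each line is stored as a key of the array a and skipped. For the second file (fileA.txt), the condition !($0 in a) prints only the lines that do not exist in fileB.txt into outputFile.txt.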
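A small run on invented sample data shows how the awk command treats order and duplicates.

```shell
# Work in a scratch directory with hypothetical sample data.
cd "$(mktemp -d)"

printf '%s\n' 'pear' 'apple' 'kiwi' 'apple' > fileA.txt
printf '%s\n' 'kiwi' 'fig' > fileB.txt

# First pass stores the lines of fileB.txt; second pass filters fileA.txt.
awk 'FNR==NR{a[$0];next} !($0 in a)' fileB.txt fileA.txt > outputFile.txt
cat outputFile.txt
# pear
# apple
# apple
```

The surviving lines keep their original order, and both copies of apple remain because apple is not in file B. No sorting is needed.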

You will also see the condition written with the operands swapped −

awk 'NR==FNR{a[$0];next} !($0 in a)' fileB.txt fileA.txt > outputFile.txt

NR==FNR and FNR==NR are the same test, so this command behaves identically to the previous one.

It's important to note that the comparison is an exact, whole-line match: a line in file A is removed only if an identical line exists in file B, so differences in trailing whitespace or line endings prevent a match. Duplicate lines are handled predictably: every copy of a line in file A is removed if the line appears in file B, and kept otherwise.
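One edge case worth knowing is an empty file B. In that case NR==FNR stays true while awk reads file A, so nothing is printed. The sketch below contrasts this with a variant that tests FILENAME against ARGV[1] instead. That variant is a suggested workaround rather than part of the commands above, and it assumes the two file names differ.

```shell
# Work in a scratch directory with hypothetical sample data.
cd "$(mktemp -d)"

printf '%s\n' 'pear' 'apple' > fileA.txt
: > fileB.txt    # empty file B

# With an empty first file, NR==FNR is true for every line of fileA.txt.
awk 'NR==FNR{a[$0];next} !($0 in a)' fileB.txt fileA.txt > naive_output.txt
wc -l < naive_output.txt    # 0 lines

# Testing the file name avoids treating fileA.txt as the first file.
awk 'FILENAME==ARGV[1]{a[$0];next} !($0 in a)' fileB.txt fileA.txt > guarded_output.txt
cat guarded_output.txt
# pear
# apple
```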

Conclusion

There are several ways to remove the lines from file A that appear in file B in Linux, such as using the grep, comm, join, sed, and awk commands. The grep command uses the -v option to invert the match and the -f option to read the patterns from file B; adding -x and -F restricts it to literal whole-line matches. The comm and join commands require both files to be sorted first, and join compares only the join field rather than the whole line. The sed command needs file B converted into a script of /.../d delete commands, which can then be applied in place with the -i option. The awk command uses FNR==NR{a[$0];next} !($0 in a), or the equivalent NR==FNR form, to store the lines of file B and print the lines from file A that do not exist in file B into outputFile.txt. The methods also differ in how they treat order and duplicates: grep, sed, and awk keep the original order and remove every copy of a matching line, while comm produces sorted output and pairs duplicate lines one for one.

Updated on: 24-Jan-2023
