How to compare two sorted files line by line in Linux?

To compare two sorted files line by line in Linux, we use the comm command. The comm command compares two sorted files and displays the differences and similarities in a structured three-column format.

The comm command writes output to three tab-separated columns: the first column contains lines unique to the first file, the second column contains lines unique to the second file, and the third column contains lines common to both files. Both input files must be sorted for comm to work correctly.
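As a minimal sketch of the sorting requirement, the snippet below sorts two unsorted lists before passing them to comm. The file names and sample names are hypothetical, and forcing LC_ALL=C for both sort and comm is an assumption made here so that both tools agree on the ordering.

```shell
# Hypothetical sample data: two unsorted name lists in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' Vikash Annie Abhishek Bidu > "$workdir/first.txt"
printf '%s\n' Wandra Bidu Annie > "$workdir/second.txt"

# Sort each file with the same locale that comm will use.
LC_ALL=C sort "$workdir/first.txt" > "$workdir/first.sorted"
LC_ALL=C sort "$workdir/second.txt" > "$workdir/second.sorted"

# Compare the sorted copies: column 1 has no indent, column 2 has one tab,
# column 3 has two tabs.
result=$(LC_ALL=C comm "$workdir/first.sorted" "$workdir/second.sorted")
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -r "$workdir"
```

With this sample data, Abhishek and Vikash land in the first column, Wandra in the second, and Annie and Bidu in the third.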

Syntax

The general syntax of the comm command is −

comm [OPTION]... FILE1 FILE2

Options

Option                   Description
-1                       Suppress column 1 (lines unique to FILE1)
-2                       Suppress column 2 (lines unique to FILE2)
-3                       Suppress column 3 (lines common to both files)
--check-order            Check that the input is correctly sorted, even if all input lines are pairable
--nocheck-order          Do not check that the input is correctly sorted
--output-delimiter=STR   Separate columns with the string STR instead of a tab
--total                  Print a final summary line with the line count of each column
-z, --zero-terminated    Use the NUL character, not newline, as the line delimiter
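The options can be combined. The sketch below, which assumes GNU coreutils (the long options are not part of every comm implementation), hides the common column with -3 and replaces the tab separator with a pipe character. The file names and contents are hypothetical.

```shell
# Hypothetical sample data, already in sorted order.
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/a.txt"
printf '%s\n' banana cherry date > "$workdir/b.txt"

# -3 hides the common column; the remaining two columns are separated by '|'
# instead of a tab (GNU coreutils long option).
only=$(LC_ALL=C comm -3 --output-delimiter='|' "$workdir/a.txt" "$workdir/b.txt")
printf '%s\n' "$only"

# Clean up the scratch directory.
rm -r "$workdir"
```

Here the line unique to the first file is printed as is, and the line unique to the second file is prefixed with the delimiter.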

Examples

Basic Comparison

Compare two sorted files and display all three columns −

comm file_first file_second
Abhishek
Anand
		Annie
		Bidu
Bruce
Celesy
		Chiku
		Sayani
Vikash
		Wandra

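In this output, the unindented lines (first column) are unique to file_first, and the lines indented by two tabs (third column) appear in both files. The second column, which would be indented by one tab, is empty because file_second contains no lines that are missing from file_first.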

Suppressing Columns

Show only lines unique to the second file and common lines (suppress first column) −

comm -1 file_first file_second
	Annie
	Bidu
	Chiku
	Sayani
	Wandra

Show only lines unique to the first file and common lines (suppress second column) −

comm -2 file_first file_second
Abhishek
Anand
	Annie
	Bidu
Bruce
Celesy
	Chiku
	Sayani
Vikash
	Wandra

Finding Common Lines Only

Display only lines that appear in both files (suppress first and second columns) −

comm -12 file_first file_second
Annie
Bidu
Chiku
Sayani
Wandra

Key Points

  • Sorted input required − Both files must be sorted, in the same collation order (locale) that comm itself uses, for an accurate comparison

  • Three-column output − Column 1 (unique to file1), Column 2 (unique to file2), Column 3 (common)

  • Tab separation − Columns are separated by tab characters by default

  • Case sensitivity − Comparison is case-sensitive
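Because the comparison is case-sensitive, "Annie" and "annie" count as different lines. One possible workaround, sketched here under the assumption that losing the original capitalization is acceptable, is to lower-case both inputs before sorting. The file names and contents are hypothetical.

```shell
# Hypothetical sample data with inconsistent capitalization.
workdir=$(mktemp -d)
printf '%s\n' Annie bidu Chiku > "$workdir/first.txt"
printf '%s\n' annie Bidu Wandra > "$workdir/second.txt"

# Lower-case each file, then sort, so that "Annie" and "annie" compare equal.
tr '[:upper:]' '[:lower:]' < "$workdir/first.txt" | LC_ALL=C sort > "$workdir/first.norm"
tr '[:upper:]' '[:lower:]' < "$workdir/second.txt" | LC_ALL=C sort > "$workdir/second.norm"

# Print only the lines common to both normalized files.
common=$(LC_ALL=C comm -12 "$workdir/first.norm" "$workdir/second.norm")
printf '%s\n' "$common"

# Clean up the scratch directory.
rm -r "$workdir"
```

The normalization has to happen before the sort, since changing case can change the sort order.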

Comparison with diff

Feature             comm                                  diff
Input requirement   Files must be sorted                  No sorting required
Output format       Three tab-separated columns           Normal (default), context, or unified format
Primary use         Set operations on sorted data         Line-by-line differences
Performance         Single linear pass over sorted input  Computes an edit script; typically slower on large files
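The set operations mentioned in the table map onto pairs of suppression flags. The following sketch uses hypothetical file names and contents, with LC_ALL=C assumed so that the sample data counts as sorted.

```shell
# Hypothetical sample data, already in sorted order.
workdir=$(mktemp -d)
printf '%s\n' Annie Bidu Bruce Vikash > "$workdir/a.txt"
printf '%s\n' Annie Bidu Wandra > "$workdir/b.txt"

# Intersection: lines present in both files (hide columns 1 and 2).
intersection=$(LC_ALL=C comm -12 "$workdir/a.txt" "$workdir/b.txt")

# Difference a minus b: lines only in the first file (hide columns 2 and 3).
only_a=$(LC_ALL=C comm -23 "$workdir/a.txt" "$workdir/b.txt")

# Difference b minus a: lines only in the second file (hide columns 1 and 3).
only_b=$(LC_ALL=C comm -13 "$workdir/a.txt" "$workdir/b.txt")

printf 'both:\n%s\nonly a:\n%s\nonly b:\n%s\n' "$intersection" "$only_a" "$only_b"

# Clean up the scratch directory.
rm -r "$workdir"
```

When only one column remains, it is printed without leading tabs, so the result can be piped straight into other tools.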

Conclusion

The comm command is an efficient tool for comparing sorted files in Linux, providing a clear three-column output showing unique and common lines. It's particularly useful for set operations and analyzing differences between sorted datasets, complementing the more general-purpose diff command.

Updated on: 2026-03-17T09:01:38+05:30
