How to compare two sorted files line by line in the Linux system?
To compare two sorted files line by line in Linux, we use the comm command. The comm command compares two sorted files and displays the differences and similarities in a structured three-column format.
The comm command writes output to three tab-separated columns: the first column contains lines unique to the first file, the second column contains lines unique to the second file, and the third column contains lines common to both files. Both input files must already be sorted, in the same collating order that comm uses (set by the locale); otherwise comm can report matching lines as unique.
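The sorting requirement can be sketched as follows. This is a minimal illustration, not part of the original example: the file names a.txt and b.txt and their contents are hypothetical, and LC_ALL=C is set on both sort and comm so that they agree on the ordering:

```shell
# Hypothetical sample data; the file names and contents are illustrative only.
workdir=$(mktemp -d)
printf '%s\n' Chiku Annie Bruce > "$workdir/a.txt"
printf '%s\n' Wandra Annie Chiku > "$workdir/b.txt"

# Sort both files under the same locale that comm will run with.
LC_ALL=C sort "$workdir/a.txt" > "$workdir/a.sorted"
LC_ALL=C sort "$workdir/b.txt" > "$workdir/b.sorted"

# Column 1: only in a.sorted; column 2: only in b.sorted; column 3: in both.
result=$(LC_ALL=C comm "$workdir/a.sorted" "$workdir/b.sorted")
printf '%s\n' "$result"

rm -r "$workdir"
```

Here Bruce appears with no indentation (first column), Wandra after one tab (second column), and Annie and Chiku after two tabs (third column).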
Syntax
The general syntax of the comm command is −
comm [OPTION]... FILE1 FILE2
Options
| Option | Description |
|---|---|
| -1 | Suppress first column (lines unique to FILE1) |
| -2 | Suppress second column (lines unique to FILE2) |
| -3 | Suppress third column (lines common to both files) |
| --check-order | Check that the input is correctly sorted, even if all input lines are pairable |
| --nocheck-order | Do not check that the input is correctly sorted |
| --output-delimiter=STR | Separate columns with the string STR instead of a tab |
| --total | Print a final summary line with the number of lines in each column |
| -z, --zero-terminated | Use the NUL character as the line delimiter instead of newline |
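As a rough sketch of two of the long options, the following assumes the GNU coreutils version of comm (other implementations may not support them) and uses hypothetical files a.sorted and b.sorted:

```shell
# Hypothetical, already sorted sample files.
workdir=$(mktemp -d)
printf '%s\n' Annie Bruce Chiku > "$workdir/a.sorted"
printf '%s\n' Annie Chiku Wandra > "$workdir/b.sorted"

# Replace each tab separator with a visible '|' character.
delimited=$(LC_ALL=C comm --output-delimiter='|' "$workdir/a.sorted" "$workdir/b.sorted")
printf '%s\n' "$delimited"

# --total appends one last line holding the line count of each column;
# tail keeps only that summary line.
summary=$(LC_ALL=C comm --total "$workdir/a.sorted" "$workdir/b.sorted" | tail -n 1)
printf '%s\n' "$summary"

rm -r "$workdir"
```

With this data, the summary line reports one line unique to each file and two common lines.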
Examples
Basic Comparison
Compare two sorted files and display all three columns −
comm file_first file_second
Abhishek
Anand
		Annie
		Bidu
Bruce
Celesy
		Chiku
		Sayani
Vikash
		Wandra
In this output, lines in the first column are unique to file_first, lines in the second column are unique to file_second, and lines in the third column appear in both files.
Suppressing Columns
Show only lines unique to the second file and common lines (suppress first column) −
comm -1 file_first file_second
	Annie
	Bidu
	Chiku
	Sayani
	Wandra
Show only lines unique to the first file and common lines (suppress second column) −
comm -2 file_first file_second
Abhishek
Anand
	Annie
	Bidu
Bruce
Celesy
	Chiku
	Sayani
Vikash
	Wandra
Finding Common Lines Only
Display only lines that appear in both files (suppress first and second columns) −
comm -12 file_first file_second
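Combining the suppression flags gives the usual set operations. The sketch below uses hypothetical files a.sorted and b.sorted rather than the files from this article:

```shell
# Hypothetical, already sorted sample files.
workdir=$(mktemp -d)
printf '%s\n' Annie Bruce Chiku > "$workdir/a.sorted"
printf '%s\n' Annie Chiku Wandra > "$workdir/b.sorted"

# Intersection: keep only column 3 (lines present in both files).
common=$(LC_ALL=C comm -12 "$workdir/a.sorted" "$workdir/b.sorted")

# Difference: keep only column 1 (lines present only in a.sorted).
only_a=$(LC_ALL=C comm -23 "$workdir/a.sorted" "$workdir/b.sorted")

# Difference the other way: keep only column 2 (lines present only in b.sorted).
only_b=$(LC_ALL=C comm -13 "$workdir/a.sorted" "$workdir/b.sorted")

printf 'common:\n%s\n' "$common"
printf 'only in a: %s\n' "$only_a"
printf 'only in b: %s\n' "$only_b"

rm -r "$workdir"
```

When only one column remains, comm prints it without leading tabs, so the result can be piped straight into other tools.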
Key Points
Sorted input required − Both files must be sorted for accurate comparison
Three-column output − Column 1 (unique to file1), Column 2 (unique to file2), Column 3 (common)
Tab separation − Columns are separated by tab characters by default
Case sensitivity − Comparison is case-sensitive
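Two of these points can be checked from the shell. The sketch below is one possible approach, using hypothetical file names: sort -c verifies that a file is sorted before it is passed to comm, and a second comparison shows that lines differing only in case are not reported as common:

```shell
# Hypothetical sample files.
workdir=$(mktemp -d)
printf '%s\n' Annie Bruce > "$workdir/sorted.txt"
printf '%s\n' Bruce Annie > "$workdir/unsorted.txt"

# sort -c only checks the order and reports the result through its exit status.
if LC_ALL=C sort -c "$workdir/sorted.txt" 2>/dev/null; then sorted_ok=yes; else sorted_ok=no; fi
if LC_ALL=C sort -c "$workdir/unsorted.txt" 2>/dev/null; then unsorted_ok=yes; else unsorted_ok=no; fi
echo "sorted.txt: $sorted_ok, unsorted.txt: $unsorted_ok"

# "Annie" and "annie" are different lines, so the common column is empty.
printf '%s\n' Annie > "$workdir/upper.txt"
printf '%s\n' annie > "$workdir/lower.txt"
case_common=$(LC_ALL=C comm -12 "$workdir/upper.txt" "$workdir/lower.txt")
echo "common lines across case: '$case_common'"

rm -r "$workdir"
```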
Comparison with diff
| Feature | comm | diff |
|---|---|---|
| Input requirement | Files must be sorted | No sorting required |
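| Output format | Three tab-separated columns | Change hunks in normal (default), context, or unified format |
| Primary use | Set operations on sorted data | Line-by-line differences |
| Performance | Single pass over both sorted files | Searches for a small set of changes, which typically costs more on large files |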
Conclusion
The comm command is an efficient tool for comparing sorted files in Linux, providing a clear three-column output showing unique and common lines. It's particularly useful for set operations and analyzing differences between sorted datasets, complementing the more general-purpose diff command.
