Linux comm Command
The comm command compares two sorted files line by line. By default it prints three columns: lines found only in the first file, lines found only in the second file, and lines found in both. It is useful for comparing datasets and for finding entries that are missing from, or shared between, two lists.
Syntax
comm [OPTION]... FILE1 FILE2
Where FILE1 and FILE2 are the two sorted files to be compared.
Common Options
-1 Suppress column 1 (lines unique to FILE1)
-2 Suppress column 2 (lines unique to FILE2)
-3 Suppress column 3 (lines common to both files)
-i Ignore case distinctions in comparisons (BSD and macOS comm only; GNU comm on Linux does not provide this option)
--check-order Verify that input files are correctly sorted
How It Works
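The comm command prints each line once and uses leading tabs to place it in one of three columns: a line in column 1 has no leading tab, a line in column 2 has one, and a line in column 3 has two.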
| Column 1 | Column 2 | Column 3 |
|---|---|---|
| Lines unique to FILE1 | Lines unique to FILE2 | Lines common to both files |
Example: Basic Comparison
Consider two sorted files with the following content:
file1.txt:
apple
banana
grape
mango
orange
file2.txt:
apple
banana
cherry
mango
watermelon
Comparing these files:
comm file1.txt file2.txt
Output:
		apple
		banana
	cherry
grape
		mango
orange
	watermelon
In this output, grape and orange are unique to file1.txt, cherry and watermelon are unique to file2.txt, while apple, banana, and mango are common to both.
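The tab indentation that separates the columns is easy to miss on a terminal. The following sketch recreates the two sample files in a temporary directory (the `workdir` and `marked` names are illustrative assumptions, not part of `comm`) and replaces every tab with a visible `>` so the column of each line is explicit.

```shell
# Use a fixed collation so the sort order is predictable (an assumption
# made for reproducibility; comm follows the current locale).
export LC_ALL=C

# Recreate the sample files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' apple banana grape mango orange > "$workdir/file1.txt"
printf '%s\n' apple banana cherry mango watermelon > "$workdir/file2.txt"

# Turn each tab into '>' to make the columns visible:
#   no marker = column 1, '>' = column 2, '>>' = column 3
marked=$(comm "$workdir/file1.txt" "$workdir/file2.txt" | tr '\t' '>')
printf '%s\n' "$marked"

rm -rf "$workdir"
```

Lines with no marker are unique to file1.txt, lines with one marker are unique to file2.txt, and lines with two markers appear in both files.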
Suppressing Columns
You can suppress specific columns to focus on particular comparisons:
# Show only lines unique to file1
comm -23 file1.txt file2.txt
Output:
grape
orange
# Show only common lines
comm -12 file1.txt file2.txt
Output:
apple
banana
mango
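The same pattern covers the remaining combinations. As a sketch using the sample files from the basic comparison (recreated here in a temporary directory so the snippet runs on its own; the variable names are illustrative), `-13` keeps only the lines unique to the second file, and `-3` drops the common lines while keeping both "unique" columns.

```shell
export LC_ALL=C   # fixed collation for a predictable sort order

workdir=$(mktemp -d)
printf '%s\n' apple banana grape mango orange > "$workdir/file1.txt"
printf '%s\n' apple banana cherry mango watermelon > "$workdir/file2.txt"

# Lines unique to file2: suppress columns 1 and 3.
only_in_file2=$(comm -13 "$workdir/file1.txt" "$workdir/file2.txt")
printf '%s\n' "$only_in_file2"

# Lines that differ between the files: suppress column 3 only.
# Lines from file2 still carry one leading tab because column 1 is kept.
differing=$(comm -3 "$workdir/file1.txt" "$workdir/file2.txt")
printf '%s\n' "$differing"

rm -rf "$workdir"
```

When only one column remains, as with `-13`, the output has no leading tabs and can be piped straight into other tools.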
Handling Unsorted Files
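Important: comm expects both files to be sorted in the same collation order (the one defined by the current locale) and can produce misleading results otherwise. Sort unsorted files first, either into temporary files or, in shells that support it such as bash, with process substitution: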
# Sort files before comparison
sort file1.txt > sorted_file1.txt
sort file2.txt > sorted_file2.txt
comm sorted_file1.txt sorted_file2.txt
# Or use process substitution
comm <(sort file1.txt) <(sort file2.txt)
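To see why sorting matters, the sketch below feeds `comm` a deliberately unsorted file. It assumes GNU `comm`, where `--check-order` makes the command exit with a non-zero status when the input is out of order. The file names and the `check_status` variable are made up for illustration.

```shell
export LC_ALL=C   # sort and comm must agree on the collation order

workdir=$(mktemp -d)
printf '%s\n' orange apple mango > "$workdir/unsorted.txt"
printf '%s\n' apple cherry mango > "$workdir/other.txt"

# With --check-order, comm stops with an error on out-of-order input.
if comm --check-order "$workdir/unsorted.txt" "$workdir/other.txt" >/dev/null 2>&1; then
    check_status=ok
else
    check_status=failed
fi
echo "check on unsorted input: $check_status"

# Sort first, then compare; -12 prints only the common lines.
sort "$workdir/unsorted.txt" > "$workdir/sorted.txt"
common=$(comm -12 "$workdir/sorted.txt" "$workdir/other.txt")
printf '%s\n' "$common"

rm -rf "$workdir"
```

Setting `LC_ALL=C` for both `sort` and `comm` is one way to guarantee they use the same ordering; any locale works as long as it is the same for both commands.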
Case-Insensitive Comparison
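GNU comm, the version shipped with most Linux distributions, has no -i option. The BSD and macOS versions do, and there it makes the comparison case-insensitive:
comm -i file1.txt file2.txt
With that option, "Apple" and "apple" are treated as identical lines. On Linux, convert both files to a single case before sorting and comparing them.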
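Where `-i` is not available (GNU `comm` does not provide it), a common workaround is to normalize case before sorting. The following is a sketch of that approach rather than a built-in feature of `comm`. The file names and contents are hypothetical, and the output is lowercased as a side effect.

```shell
export LC_ALL=C   # fixed collation for a predictable sort order

workdir=$(mktemp -d)
printf '%s\n' Apple banana Mango > "$workdir/list1.txt"
printf '%s\n' apple BANANA cherry > "$workdir/list2.txt"

# Fold to lowercase first, then sort, so both inputs share one ordering.
tr '[:upper:]' '[:lower:]' < "$workdir/list1.txt" | sort > "$workdir/list1.lower"
tr '[:upper:]' '[:lower:]' < "$workdir/list2.txt" | sort > "$workdir/list2.lower"

# Lines common to both files, ignoring case.
common=$(comm -12 "$workdir/list1.lower" "$workdir/list2.lower")
printf '%s\n' "$common"

rm -rf "$workdir"
```

The case folding has to happen before `sort`, because folding can change the order of the lines.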
Practical Use Cases
- Finding unique entries: Identify items present in one file but not the other
- Data validation: Verify completeness of datasets by finding missing records
- Set operations: Perform union, intersection, and difference operations on sorted lists
- Log analysis: Compare log files to identify changes or differences
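As a sketch of the data-validation and set-operation cases, the script below compares a list of expected record IDs with the IDs actually received. The IDs, file names, and variable names are hypothetical. The union is computed with `sort -u` rather than `comm`, since `comm` would indent its output by column.

```shell
export LC_ALL=C   # fixed collation for a predictable sort order

workdir=$(mktemp -d)
# Hypothetical record IDs: what was expected vs. what arrived.
printf '%s\n' id-003 id-001 id-002 id-004 > "$workdir/expected.txt"
printf '%s\n' id-002 id-004 id-002 id-005 > "$workdir/received.txt"

# sort -u sorts and drops duplicates, so each list behaves like a set.
sort -u "$workdir/expected.txt" > "$workdir/expected.sorted"
sort -u "$workdir/received.txt" > "$workdir/received.sorted"

# Difference: expected but never received.
missing=$(comm -23 "$workdir/expected.sorted" "$workdir/received.sorted")
# Difference the other way: received but not expected.
unexpected=$(comm -13 "$workdir/expected.sorted" "$workdir/received.sorted")
# Intersection: present in both lists.
present=$(comm -12 "$workdir/expected.sorted" "$workdir/received.sorted")
# Union: every ID that appears in either list (sort -u, not comm).
all_ids=$(sort -u "$workdir/expected.txt" "$workdir/received.txt")

echo "missing:";    printf '%s\n' "$missing"
echo "unexpected:"; printf '%s\n' "$unexpected"
echo "present:";    printf '%s\n' "$present"

rm -rf "$workdir"
```

Removing duplicates before the comparison matters here: `comm` matches lines one-to-one, so a line repeated in only one file would otherwise show up as unique.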
Conclusion
The comm command is an efficient way to compare sorted files and identify unique or common lines. Once you understand its column-based output and the column-suppression options, you can use it for set-style comparisons of lists and datasets on Linux.
