Fastest way to tell if two files have the same contents in Unix/Linux
Let's say that we have two files inside a directory called dir1, and that the text they contain is not the same.
The files in the folder −
immukul@192 dir1 % ls -ltr
total 16
-rw-r--r-- 1 immukul staff 7 Jul 7 10:37 2.txt
-rw-r--r-- 1 immukul staff 8 Jul 8 19:05 3.txt
The contents of the first file (2.txt) look like this −
immukul@192 dir1 % cat 2.txt
orange
The contents of the second file (3.txt) look like this −
immukul@192 dir1 % cat 3.txt
uorange
Methods to Compare Files
Using diff Command
We can use the diff command to check whether the two files differ. Consider the command shown below −
diff 2.txt 3.txt
Output
1c1
< orange
---
> uorange
If the contents of the two files are exactly the same, diff prints nothing and exits with status 0.
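Because diff stays silent for identical files, a script usually checks its exit status rather than its output. The following is a minimal sketch of that idea; the function name files_match is made up for illustration, and the -q option (report only whether the files differ) is assumed to be available, as it is in GNU and BSD diff.

```shell
#!/bin/sh
# files_match is a hypothetical helper name, not a standard command.
# It returns diff's own exit status:
#   0  = no differences, 1 = files differ, >1 = trouble (e.g. missing file).
# -q asks diff only whether the files differ (GNU/BSD diff; an assumption
# about the local diff implementation).
files_match() {
    diff -q "$1" "$2" >/dev/null 2>&1
}
```

It could be used as files_match 2.txt 3.txt && echo "No differences", which prints the message only when diff exits with 0.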
Using cmp Command (Fastest Method)
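When you only need to know whether two files are identical, the cmp command is the better choice. cmp compares two files byte by byte and stops at the first difference, so it skips the work diff does to build a line-by-line change listing. With the --silent flag (-s is the portable short form), cmp prints nothing and reports the result only through its exit status: 0 if the files are identical and 1 if they differ.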
cmp --silent 2.txt 3.txt || echo "Difference in Files"
Output
immukul@192 dir1 % cmp --silent 2.txt 3.txt || echo "Difference in Files"
Difference in Files
Alternative Methods
Using Checksums
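Checksums do not avoid reading the files: computing a hash reads every byte of both files, so for a single pair cmp is usually faster. Checksums are useful when one file must be compared against many others, or when the files are on different machines and only the hashes need to be exchanged −
md5sum file1.txt file2.txt
sha256sum file1.txt file2.txt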
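To compare the hashes in a script rather than by eye, the two digests can be captured and tested for equality. This is a sketch under stated assumptions: same_checksum is a made-up helper name, and sha256sum is assumed to be installed (on systems without it, shasum -a 256 is a common substitute).

```shell
#!/bin/sh
# same_checksum is a hypothetical helper name used for illustration.
# Assumes the sha256sum utility is available.
# Returns 0 if the digests match, 1 if they differ, 2 if a file is unreadable.
same_checksum() {
    # Refuse to compare if either file cannot be read.
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        return 2
    fi
    # Feed each file on stdin so the output is "<hash>  -",
    # then keep only the hash field.
    sum1=$(sha256sum < "$1" | cut -d ' ' -f 1)
    sum2=$(sha256sum < "$2" | cut -d ' ' -f 1)
    [ "$sum1" = "$sum2" ]
}
```

A call such as same_checksum file1.txt file2.txt && echo "Checksums match" prints the message only when both digests are equal. Note that both files are still read in full.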
Using cmp in an if Statement
You can also use the exit status of cmp directly in an if statement for conditional checking −
if cmp --silent file1.txt file2.txt; then
    echo "Files are identical"
else
    echo "Files are different"
fi
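The if/else form treats every non-zero exit status as "different", including the case where a file is missing or unreadable. A slightly more careful sketch separates the three outcomes cmp can report; compare_files is a made-up name for illustration, and -s is used as the portable spelling of --silent.

```shell
#!/bin/sh
# compare_files is a hypothetical helper name, not a standard command.
# cmp exits with 0 (identical), 1 (different) or >1 (error, such as a
# missing or unreadable file).
compare_files() {
    cmp -s "$1" "$2"
    case $? in
        0) echo "Files are identical" ;;
        1) echo "Files are different" ;;
        *) echo "Could not compare files" >&2
           return 2 ;;
    esac
}
```

Run as compare_files file1.txt file2.txt, it prints one of the two messages on standard output, or an error on standard error if cmp could not read a file.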
Comparison of Methods
| Method | Speed | Memory Usage | Best For |
|---|---|---|---|
| cmp | Fastest | Low | Quick identical check |
| diff | Moderate | Higher | Detailed differences |
| Checksums | Slower for one pair (reads both files fully) | Low | Many files or remote copies |
Conclusion
The cmp command with the --silent flag is the fastest way to determine if two files have identical contents in Unix/Linux. It performs a byte-by-byte comparison and stops immediately when it finds the first difference, making it more efficient than diff for simple identical content checking.
