The uniq Command in Linux
The uniq command in Linux is a text-processing utility that filters out duplicate lines. It works by comparing adjacent lines and removing consecutive duplicates, making it an essential tool for data cleaning and text manipulation. Because only adjacent lines are compared, the input is usually sorted first.
Syntax
The basic syntax of the uniq command is straightforward:
uniq [options] [input_file] [output_file]
Where options are command-line switches that modify the behavior, input_file is the file to process (defaults to stdin), and output_file is where results are written (defaults to stdout).
Important Note
Critical: The uniq command only removes adjacent duplicate lines. For unsorted data, you typically need to sort first:
sort file.txt | uniq
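The adjacency rule is easy to see with a small example (demo.txt is a throwaway file created just for this sketch):

```shell
# Create a file whose duplicate lines are NOT adjacent.
printf 'apple\nbanana\napple\n' > demo.txt

# uniq alone keeps both "apple" lines, because "banana" separates them.
uniq demo.txt

# Sorting first makes the duplicates adjacent, so uniq can collapse them.
sort demo.txt | uniq
```

The first command prints all three lines unchanged; the second prints apple and banana once each.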
Options
| Option | Description | Example Output |
|---|---|---|
| -c | Prefix lines with occurrence count | 3 apple |
| -d | Show only duplicate lines | Shows repeated lines once |
| -u | Show only unique lines (non-duplicated) | Lines that appear exactly once |
| -i | Ignore case when comparing | Treats "Apple" and "apple" as same |
| -f N | Skip first N fields | Compare starting from field N+1 |
| -s N | Skip first N characters | Compare starting from character N+1 |
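The -f option is useful when lines differ only in a leading field, such as a timestamp. A minimal sketch (the log file name and its contents are invented for illustration):

```shell
# Two log lines that are identical except for the timestamp in field 1.
printf '10:00 ERROR disk full\n10:05 ERROR disk full\n' > demo.log

# Skip the first field when comparing, so the lines count as duplicates;
# uniq keeps the first of the pair.
uniq -f 1 demo.log
```

This prints only the first line, 10:00 ERROR disk full.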
Examples
Example 1: Basic Duplicate Removal
Given a sorted file fruits.txt:
cat fruits.txt
apple
apple
banana
banana
orange
Remove duplicates:
uniq fruits.txt
apple
banana
orange
Example 2: Counting Occurrences
uniq -c fruits.txt
2 apple
2 banana
1 orange
Example 3: Show Only Duplicates
uniq -d fruits.txt
apple
banana
Example 4: Show Only Unique Lines
uniq -u fruits.txt
orange
Example 5: Case-Insensitive Processing
Given a file with mixed case:
uniq -i -c mixed_case.txt
3 Apple
2 Banana
Practical Use Cases
Log Analysis: Remove duplicate entries from log files for cleaner analysis
Data Cleaning: Eliminate duplicate records from CSV files or datasets
System Administration: Find unique IP addresses in access logs
Text Processing: Remove repeated lines from configuration files
Shell Scripting: Create unique lists for automated processing
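The access-log use case above can be sketched as follows. The sample log lines are invented for the demo; a real run would read an actual log file instead:

```shell
# Build a tiny sample log: the client IP is the first whitespace-separated field.
printf '1.2.3.4 GET /\n5.6.7.8 GET /\n1.2.3.4 GET /about\n' > access_sample.log

# Extract IPs, sort so duplicates are adjacent, count them, sort by frequency.
awk '{print $1}' access_sample.log | sort | uniq -c | sort -nr
```

The most frequent IP (here 1.2.3.4, with 2 requests) appears first.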
Common Patterns
Process Unsorted Data
sort data.txt | uniq -c | sort -nr
This sorts the data, counts duplicates, then sorts by frequency (highest first).
Find Most Common Lines
sort access.log | uniq -c | sort -nr | head -10
Skip Headers When Processing
tail -n +2 file.csv | sort | uniq
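For example, deduplicating a single-column CSV while keeping the header out of the comparison (the file name and contents here are hypothetical):

```shell
# A CSV with a header row and one duplicated data row.
printf 'name\nalice\nbob\nalice\n' > people.csv

# tail -n +2 drops the header; sort + uniq removes the duplicate "alice".
tail -n +2 people.csv | sort | uniq
```

This prints alice and bob, without the header line.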
Common Errors
"uniq: missing operand" Provide an input file or use stdin
"uniq: output file is same as input file" Use different output filename
"uniq: cannot open file" Check file path and permissions
Unexpected results Remember to sort data first for non-adjacent duplicates
Conclusion
The uniq command is essential for text processing and data analysis in Linux. It efficiently removes adjacent duplicate lines and provides options for counting, filtering, and case-insensitive processing. Remember that uniq only works on adjacent lines, so combine it with sort for complete duplicate removal from unsorted data.
