Count Duplicate Lines in a Text File on Linux

There are several reasons why you might want to count duplicate lines in a text file on Linux. You may need to identify data inconsistencies, optimize files by removing duplicates, or analyze log files for repeated entries. Linux provides multiple powerful command-line tools to accomplish this task efficiently.

Preparation

Let's create a sample text file to demonstrate the different methods. Open a terminal and create a test file:

$ touch test.txt

Add the following content to the file using your preferred text editor:

Hello
World
Hello
Linux
Linux
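
Alternatively, the file can be created in a single step without opening an editor. This is just a convenience sketch using printf; a heredoc or any editor works equally well:

```shell
# printf '%s\n' prints each argument on its own line,
# producing the five-line sample file in one command
printf '%s\n' Hello World Hello Linux Linux > test.txt
```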

Method 1: Using uniq Command

The uniq command filters out duplicate adjacent lines and can count occurrences with the -c flag. However, uniq only works on adjacent duplicates, so the input must be sorted first for accurate results.

$ sort test.txt | uniq -c
      2 Hello
      2 Linux
      1 World

The output shows each unique line prefixed with its occurrence count; any line with a count greater than 1 is a duplicate, while lines with a count of 1 appear only once.
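
If you want the most frequently repeated lines listed first, a second sort on the counts does the trick (sort -rn sorts numerically in reverse order):

```shell
# Sort lines so duplicates are adjacent, count them,
# then order the results by count, highest first
sort test.txt | uniq -c | sort -rn
```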

Method 2: Using awk Command

The awk command provides a more flexible approach, using associative arrays to track line occurrences:

$ awk '{count[$0]++} END {for (line in count) if (count[line] > 1) print count[line], line}' test.txt
2 Hello
2 Linux

To count the total number of duplicate occurrences (the extra copies beyond each line's first appearance):

$ awk '{seen[$0]++} END {duplicates=0; for (line in seen) if (seen[line] > 1) duplicates += seen[line]-1; print duplicates}' test.txt
2

Method 3: Using sort, uniq, and wc Commands

To count only the lines that appear more than once, combine multiple commands:

$ sort test.txt | uniq -d | wc -l
2

The uniq -d flag displays only duplicate lines (one copy of each), and wc -l counts them.
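
If you instead need every occurrence of the duplicated lines (all copies, not one per group), GNU uniq offers the -D flag, which prints all repeated lines:

```shell
# Print every occurrence of each duplicated line, then count them
# (for the sample file: Hello x2 + Linux x2 -> prints 4)
sort test.txt | uniq -D | wc -l
```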

Method 4: Advanced awk for Detailed Analysis

For more detailed duplicate analysis, use this awk command ?

$ awk '{count[$0]++} END {
    total_duplicates = 0
    unique_duplicate_lines = 0
    for (line in count) {
        if (count[line] > 1) {
            unique_duplicate_lines++
            total_duplicates += count[line] - 1
            print "\"" line "\" appears " count[line] " times"
        }
    }
    print "Total duplicate occurrences: " total_duplicates
    print "Unique lines with duplicates: " unique_duplicate_lines
}' test.txt
"Hello" appears 2 times
"Linux" appears 2 times
Total duplicate occurrences: 2
Unique lines with duplicates: 2
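
As a quick spot check for a single known line, grep can count exact matches directly; -x anchors the match to the whole line and -F treats the pattern literally:

```shell
# Count how many times the exact line "Hello" appears in test.txt
grep -cxF 'Hello' test.txt
```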

Comparison of Methods

Method                   Advantages                            Best Use Case
sort | uniq -c           Simple; shows counts for every line   Quick overview of all line frequencies
awk                      Flexible and programmable             Complex duplicate analysis
sort | uniq -d | wc -l   Returns a single number               Counting how many distinct lines are duplicated

Conclusion

Linux offers multiple approaches to count duplicate lines in text files, each suited for different scenarios. The sort | uniq -c combination provides a quick overview, while awk offers maximum flexibility for complex analysis. Choose the method that best fits your specific duplicate counting needs.

Updated on: 2026-03-17T09:01:38+05:30
