Count occurrences of a char in a text file

We'll learn how to use Linux commands to count occurrences of a specific character in a text file. This tutorial covers three approaches, using the grep, tr, and awk commands, and compares their performance.

We're assuming familiarity with common Linux commands including grep, awk, and tr. For demonstration, we'll use a sample file tpoint.txt containing:

$ cat tpoint.txt
"I Love Tpoint!!!"
"Tpoint is great!!!"

Using the grep Command

The grep command searches for specific patterns in text files. To count character occurrences, we use the -o option to output each match on a separate line, then pipe to wc -l to count lines:

$ grep -o 'e' tpoint.txt | wc -l
2

This command searches for the letter 'e' in tpoint.txt. The -o option displays each matched character on a separate line, and wc -l counts the total number of lines (occurrences).
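Note that grep's -c option is not a shortcut for this pipeline: -c counts matching lines, not total matches, so it undercounts when a character occurs more than once on a line. A quick sketch of the difference, using the same sample file:

```shell
# -c counts lines that contain at least one match, not total matches
$ grep -c 'o' tpoint.txt
2
# -o emits one match per line, so wc -l yields the true occurrence count
$ grep -o 'o' tpoint.txt | wc -l
3
```

Here 'o' occurs twice on the first line, which -c counts only once.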

Case-Insensitive Searching

Use the -i option for case-insensitive counting. Without it, grep -o 'l' matches nothing here, because the file's only 'l' is the uppercase one in "Love":

$ grep -o -i 'l' tpoint.txt | wc -l
1

Multiple Input Files

The grep command can process multiple files simultaneously:

$ cat > dummy.txt
This is dummy text.
$ grep -o -i 'e' tpoint.txt dummy.txt | wc -l
3

This counts the letter 'e' across both files and returns the combined total.
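The same combined total can be obtained by concatenating the files first; this sketch pipes cat's output into the pipeline from earlier:

```shell
# Concatenate both files, then count as before; note that this loses the
# per-file attribution grep has when given the filenames directly
$ cat tpoint.txt dummy.txt | grep -o -i 'e' | wc -l
3
```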

Using the tr Command

The tr command performs character-based transformations. By combining the -c (complement) and -d (delete) options, we can isolate specific characters:

$ tr -c -d 'l' < tpoint.txt | wc -c
0

Key options explained:

  • -c: Takes the complement of the specified character set

  • -d: Deletes all characters in the specified set

When combined, -cd deletes everything except the specified characters, leaving only occurrences of 'l'; the result is piped to wc -c to count them. Here the count is 0, since the file's only 'l' is the uppercase one in "Love".
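The set passed to tr may contain any number of characters, so the same pattern counts a whole group of characters at once; for example, all lowercase vowels in the sample file:

```shell
# Delete every byte that is not a lowercase vowel, then count what remains
$ tr -cd 'aeiou' < tpoint.txt | wc -c
9
```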

Case-Insensitive Searching with tr

Include both uppercase and lowercase versions in the character set:

$ tr -cd 'lL' < tpoint.txt | wc -c
1

Using the awk Command

The awk command uses field separation to count character occurrences. It treats the target character as a field separator and counts the resulting fields:

$ awk -F 'e' '{s+=(NF-1)} END {print s}' tpoint.txt
2

This approach works by:

  • Using -F 'e' to set 'e' as the field separator

  • Counting fields (NF) on each line and subtracting 1

  • Accumulating the count across all lines

  • Printing the final sum using END
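One caveat with the field-separator approach: on an empty line NF is 0, so NF-1 subtracts one and the total undercounts files containing blank lines. An equivalent awk sketch that avoids this uses gsub's return value, which is the number of substitutions made:

```shell
# gsub(/e/, "") replaces every 'e' on the line and returns how many it replaced;
# empty lines simply contribute 0
$ awk '{s += gsub(/e/, "")} END {print s}' tpoint.txt
2
```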

Performance Comparison

Performance differences become significant with large files. Here's a comparison using a 1.1 GB test file:

Method    Real Time    Performance Rank
grep      40.733s      3rd (slowest)
tr         2.542s      1st (fastest)
awk       11.080s      2nd (moderate)
For example, the tr measurement on that file:

$ time tr -c -d 'e' < large.txt | wc -c
82256735

real 0m2.542s
user 0m1.892s
sys 0m0.433s

The tr command significantly outperforms both grep and awk for large files, making it the optimal choice for performance-critical character counting operations.
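To reproduce the comparison yourself, a throwaway test file can be generated with yes and head; the 1.1 GB figure above comes from the original measurement, while the size here is an assumption chosen for a quicker run. Absolute times depend on your hardware, but the ranking tends to hold:

```shell
# Generate roughly 17 MB (1,000,000 copies of one 17-byte line), then
# time each method on it
$ yes 'I Love Tpoint!!!' | head -n 1000000 > large.txt
$ time tr -cd 'e' < large.txt | wc -c
$ time awk -F 'e' '{s+=(NF-1)} END {print s}' large.txt
$ time grep -o 'e' large.txt | wc -l
```

All three commands should report the same count (one 'e' per line, so 1000000 here).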

Key Points

  • grep: Most intuitive approach; supports regex patterns and multiple files

  • tr: Fastest performance; ideal for simple character counting

  • awk: Most flexible; allows complex text processing logic

  • All methods support case-insensitive searches with appropriate options

Conclusion

Linux provides multiple approaches for counting character occurrences in text files. While grep offers intuitive pattern matching and awk provides programming flexibility, the tr command delivers superior performance for large files, making it the preferred choice for high-volume character counting tasks.

Updated on: 2026-03-17T09:01:38+05:30
