How to find all the distinct file extensions in a folder hierarchy (Linux)?


Linux offers several ways to get the extension of a single file. To list every distinct file extension in a folder hierarchy, however, we first need to understand the find and sed commands, since together they can print the distinct extensions found in a folder and all of its subfolders.

The two Linux utility commands that we must be aware of are −

• find − used to locate a particular file or directory

• sed − short for stream editor; used to search, edit and replace text in a stream of input.
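
To see what each of the two commands contributes on its own, here is a minimal sketch. The temporary directory and the file names (notes.txt, photo.jpg, archive.tar.gz) are made up for illustration and are not part of the original example.

```shell
# Create a throwaway directory tree to experiment in (hypothetical file names).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
touch "$demo_dir/notes.txt" "$demo_dir/sub/photo.jpg"

# find: list every regular file below demo_dir, one path per line.
found=$(find "$demo_dir" -type f | sort)
printf '%s\n' "$found"

# sed: delete everything up to and including the last dot on the line,
# which leaves only the final extension.
ext=$(printf '%s\n' "archive.tar.gz" | sed -e 's/.*\.//')
printf '%s\n' "$ext"    # prints: gz

# Clean up the temporary directory.
rm -rf "$demo_dir"
```

Because `.*` is greedy, the sed expression keeps only the part after the last dot, so archive.tar.gz yields gz rather than tar.gz.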

For a single folder we don't even need the find command: we can loop over the files in the shell, print the extension of each one, and pipe the result through sort -u to remove the duplicates.

Let's say I have a directory called dir1 and I want to know the distinct file extensions in it.

For that, I will run the command shown below inside that directory.

Command

for file in *.*; do printf "%s\n" "${file##*.}"; done | sort -u

Output

immukul@192 dir1 % for file in *.*; do printf "%s\n" "${file##*.}"; done | sort -u
app
c
dmg
doc
docx
epub
go
h
htm
jnlp
jpeg
jpg
json
mp4
o
odt
pdf
png
srt
torrent
txt
webm
xlsx
zip
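
The loop above relies on the parameter expansion `${file##*.}`, which removes the longest prefix matching `*.` and therefore keeps only the text after the last dot. The following is a small sketch of that behaviour. The file names are hypothetical, and the `[ -f ... ]` guard is an optional addition, not part of the original command. It is there because the `*.*` pattern also matches directories whose names contain a dot.

```shell
# ${name##*.} strips everything up to and including the last dot.
sample="report.final.pdf"
last_ext=${sample##*.}
printf '%s\n' "$last_ext"    # prints: pdf

# Hypothetical test directory: three files and one directory with a dot in its name.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.tar.gz"
mkdir "$demo_dir/bundle.app"

exts=$(
  cd "$demo_dir" || exit 1
  for file in *.*; do
    [ -f "$file" ] || continue    # assumption: we only want regular files
    printf '%s\n' "${file##*.}"
  done | sort -u
)
printf '%s\n' "$exts"

# Clean up the temporary directory.
rm -rf "$demo_dir"
```

With these files the loop prints gz and txt. Without the guard, app would be listed as well.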

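As you can see, each extension appears only once in the output above. To list the distinct file extensions across a whole folder hierarchy, we need to search recursively, and this is where find comes in. In the command below, find prints the path of every regular file, the first sed expression strips everything up to the last dot, the second strips any directory part that is still left, and sort -u removes the duplicates.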

Command

find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u

Output

immukul@192 dir1 % find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u
app
c
dmg
bz2
callgrind
case-hosts
cc
cfg
cgi
conf
config
contention
cov
cpu
crash
crt
css
csv
dat
debug_rnglists
demangle-expected
dep
description
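
One thing to be aware of is that the pipeline above prints the file name itself for files that have no extension, because the second sed expression only removes the leading directory path. If that is not wanted, a possible variant is to let find select only names that contain a dot. The sketch below compares the two on a made-up directory tree. The file names and the `-name '*.*'` filter are assumptions for illustration, not part of the original command.

```shell
# Hypothetical tree: two files without an extension, one inside a dotted directory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src/v1.2"
touch "$demo_dir/Makefile" "$demo_dir/src/main.c" "$demo_dir/src/util.c" \
      "$demo_dir/src/v1.2/README" "$demo_dir/src/v1.2/data.json"

# Original pipeline: extensionless files show up under their own names.
original=$(cd "$demo_dir" && find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u)
printf 'original:\n%s\n' "$original"

# Variant: only consider files whose name contains a dot,
# so the second sed expression is no longer needed.
filtered=$(cd "$demo_dir" && find . -type f -name '*.*' | sed -e 's/.*\.//' | sort -u)
printf 'filtered:\n%s\n' "$filtered"

# Clean up the temporary directory.
rm -rf "$demo_dir"
```

Here the original pipeline lists Makefile and README next to c and json, while the filtered variant lists only c and json. Hidden files such as .gitignore still match `*.*` and would be reported as gitignore.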
Published on 29-Jul-2021 11:27:37