Calculate an MD5 Checksum of a Directory in Linux


Introduction

In day-to-day Linux use, we may want to check whether any file in a directory has changed. Or we might want to confirm that the contents of one directory match those of another directory in a different location, on a different disk, or on another system. In this tutorial, we will learn how to compute an MD5 checksum of an entire directory tree on Linux: a single hash value that covers all of the directory's contents and can be used for comparison.

Get the list of all files in a directory tree

To compute the collective hash of all files in a directory tree, we first need a list of these files. We will use the find command for this.

Let's run the tree command to see our example directory structure −

$ tree
.
├── file1.png
├── folder1
│   ├── file2.jpg
│   └── folder3
│       └── file3.txt
└── folder2
    └── file4.sh

As we can see, we have files in multiple subdirectories. Now we can use the find command with the -type f option to list all files in our directory and its subdirectories, excluding directories and symbolic links −

$ find . -type f
./folder2/file4.sh
./folder1/folder3/file3.txt
./folder1/file2.jpg
./file1.png

Now we can get a list of all files in a directory and its subdirectories by running a single command.
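
To try this without touching real data, the example tree can be rebuilt in a scratch directory. This is a minimal sketch: the `demo` variable, the temporary location from `mktemp -d`, and the symlink name `link-to-file1` are assumptions made for illustration.

```shell
# Build a throwaway copy of the example tree (names are illustrative).
demo=$(mktemp -d)
mkdir -p "$demo/folder1/folder3" "$demo/folder2"
touch "$demo/file1.png" "$demo/folder1/file2.jpg" \
      "$demo/folder1/folder3/file3.txt" "$demo/folder2/file4.sh"

# Add a symbolic link to show that -type f does not list it.
ln -s file1.png "$demo/link-to-file1"

# List regular files only; directories and the symlink are skipped.
file_list=$(cd "$demo" && find . -type f)
printf '%s\n' "$file_list"

# Remove the scratch directory again.
rm -rf "$demo"
```

The listing should contain the four regular files and nothing else, in whatever order find happens to visit them.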

Using sort and the "locale problem"

Now that we can get a list of all our files, our next steps are −

  • Run the md5sum command on each file in that list

  • Create a string containing the list of file paths along with their hashes

  • And finally, run md5sum on this string we just created to get a single hash value

With this approach, if anything in our directory changes, including file contents, file paths, or the set of files, the hash also changes. But there is a problem. The find command does not sort its output. For efficiency, it prints each result as it traverses the file system, so the order can differ between systems, locations, or even runs. The final hash would then differ even when two directories are exactly the same.
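
The effect of ordering can be seen without any real files. The following sketch hashes the same two lines in two different orders; the variable names `order_a` and `order_b` are illustrative.

```shell
# Two listings with identical lines but in a different order.
order_a=$(printf '%s\n' './file1.png' './folder1/file2.jpg' | md5sum)
order_b=$(printf '%s\n' './folder1/file2.jpg' './file1.png' | md5sum)

# md5sum hashes the byte stream, so reordering the lines changes the digest.
if [ "$order_a" != "$order_b" ]; then
    echo "same lines, different order, different hash"
fi
```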

We can fix this by sorting the output of find with the sort command −

$ find . -type f | sort
./file1.png
./folder1/file2.jpg
./folder1/folder3/file3.txt
./folder2/file4.sh

But we are still missing something. Sorting is more complex than it seems. The way letters, numbers, and dates are ordered can change from locale to locale. This can change our results for directories that reside on two systems with different locale settings. We can solve this problem by overriding the locale with the LC_ALL environment variable −

$ find . -type f | LC_ALL=C sort
./file1.png
./folder1/file2.jpg
./folder1/folder3/file3.txt
./folder2/file4.sh

By using the C locale for our sorting operations, we eliminate ordering differences caused by locale settings.
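
As a small illustration of what the C locale does, the sketch below sorts four single-letter lines. In the C locale, sort compares raw byte values, so uppercase ASCII letters come before lowercase ones; other locales may interleave them. The variable name `sorted` is illustrative.

```shell
# Sort mixed-case letters by byte value, independent of the system locale.
sorted=$(printf '%s\n' b A a B | LC_ALL=C sort)
printf '%s\n' "$sorted"
# A
# B
# a
# b
```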

Put it all Together

We can use the -exec parameter of the find command to run the md5sum command on each file found −

$ find . -type f -exec md5sum {} +
7d2186aaeed78b24f00f782f2346e5f9  ./folder2/file4.sh
d41d8cd98f00b204e9800998ecf8427e  ./folder1/folder3/file3.txt
c6aa7ce9967680b77ea7e72d96949303  ./folder1/file2.jpg
46ffe26d56fe5164570ad43cc79b59d3  ./file1.png
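
The `+` at the end of `-exec` tells find to pass many file names to one invocation of the command, while the alternative terminator `\;` starts the command once per file. The sketch below makes the difference visible by substituting echo for md5sum and counting output lines; the scratch directory and the names `scratch`, `per_file`, and `batched` are assumptions for illustration.

```shell
# Create three empty files in a scratch directory.
scratch=$(mktemp -d)
touch "$scratch/a" "$scratch/b" "$scratch/c"

# With \; echo runs once per file, so it prints three lines.
per_file=$(find "$scratch" -type f -exec echo {} \; | wc -l)

# With + echo runs once with all three names, so it prints one line.
batched=$(find "$scratch" -type f -exec echo {} + | wc -l)

echo "lines with \\; : $per_file"
echo "lines with +  : $batched"

# Remove the scratch directory again.
rm -rf "$scratch"
```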

We use curly braces ({}) to specify where the file names are passed to the md5sum command as arguments. We also add the plus sign (+) at the end so that our files are passed as arguments to a single md5sum command (md5sum file1 file2 file3...) instead of running a separate md5sum process for each file. To get the final hash, we sort this list of hashes and file paths using the C locale, so that the order is always the same, and then pass the result to the md5sum command −

$ find . -type f -exec md5sum {} + | LC_ALL=C sort | md5sum
1d0e4d4ed4e4f3c3d0d9a3900b13f3e7  -

The final hash of our directory tree is 1d0e4d4ed4e4f3c3d0d9a3900b13f3e7.
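
For repeated use, the pipeline can be wrapped in a small shell function. This is a sketch under stated assumptions, not a standard tool: the function name `dir_md5` is hypothetical, and it changes into the target directory first so that the recorded paths are relative. Without that step, two identical trees stored under different parent paths would produce different hashes.

```shell
# dir_md5 DIR: print a single MD5 digest covering every regular file under DIR.
# The function name is illustrative, not part of any standard package.
dir_md5() {
    (
        # Work inside DIR so that paths in the listing start with "./".
        cd "$1" || exit 1
        # Hash each file, sort the lines in the C locale, hash the sorted list,
        # and keep only the digest field of the final md5sum output.
        find . -type f -exec md5sum {} + | LC_ALL=C sort | md5sum | cut -d ' ' -f 1
    )
}

# Usage sketch (the two paths are placeholders):
# [ "$(dir_md5 /path/to/original)" = "$(dir_md5 /path/to/copy)" ] && echo "identical"
```

Because the listing comes from find with -type f, empty directories and symbolic links do not contribute to the result.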

Conclusion

In this tutorial, we learned how to compute an MD5 checksum of an entire directory tree on Linux. We used the find command to list the files in a directory and its subdirectories, and its -exec option to run the md5sum command on each file found. We then sorted the resulting list of hashes and file paths using the C locale, to avoid differences caused by traversal order and locale settings, and passed it to the md5sum command to get the final hash of the directory tree. This approach is useful when we want to verify that the files in a directory have not changed, for example after copying them to another location.

Updated on: 20-Jan-2023
