File and Directory Comparisons in Python


Python’s standard library includes the filecmp module, which defines functions for comparing files and directories. Depending on the options used, a comparison can rely on file metadata (type, size, and modification time) as well as on the data in the files.

The examples in this article use the following file and directory structure.

Two directories, dir1 and dir2, are first created under the current working directory. They contain the following files.

--dir1/newfile.txt--
This is a file in dir1
--dir1/file1.txt--
Hello Python
--dir1/file2.txt--
Python Standard Library
--dir2/file1.txt--
Hello Python
--dir2/file2.txt--
Python Library
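The layout above can be recreated with a short script. This is a minimal sketch: the helper name make_sample_dirs and the SAMPLE_FILES mapping are hypothetical names introduced here, and the trailing newline in each file is an assumption, since the listing does not show line endings.

```python
from pathlib import Path

# File contents copied from the listing above.
# Assumption: each file ends with a single newline.
SAMPLE_FILES = {
    "dir1/newfile.txt": "This is a file in dir1\n",
    "dir1/file1.txt": "Hello Python\n",
    "dir1/file2.txt": "Python Standard Library\n",
    "dir2/file1.txt": "Hello Python\n",
    "dir2/file2.txt": "Python Library\n",
}


def make_sample_dirs(base="."):
    """Create dir1 and dir2 with the sample files under `base` (hypothetical helper)."""
    base = Path(base)
    for relative_path, text in SAMPLE_FILES.items():
        target = base / relative_path
        # Create dir1/ or dir2/ on first use; do nothing if it already exists.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
```

Calling make_sample_dirs() with no argument creates both directories in the current working directory, which is what the remaining examples expect.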

Let us now describe the comparison functions in the filecmp module.

filecmp.cmp(f1, f2, shallow=True)

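This function compares the two files and returns True if they seem equal, False otherwise. The shallow parameter is True by default. In that case, files whose os.stat() signatures (file type, size, and modification time) are identical are taken to be equal without reading their contents. If the signatures differ, or if shallow is set to False, the files are treated as equal only when their sizes and contents match.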

Based on our file structure, the following code yields the output shown −

>>> import filecmp
>>> filecmp.cmp('dir1/file1.txt', 'dir2/file1.txt')
True
>>> filecmp.cmp('dir1/file1.txt', 'dir2/file1.txt', shallow=False)
True
>>> filecmp.cmp('dir1/file2.txt', 'dir2/file2.txt')
False
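The effect of the shallow parameter is easiest to see with two files that have the same size and modification time but different contents. The sketch below is illustrative only: the file names a.txt and b.txt and the timestamp value are arbitrary choices, and os.utime is used to force identical modification times.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "a.txt")
    second = os.path.join(tmp, "b.txt")

    # Same length, different contents.
    with open(first, "w") as handle:
        handle.write("aaaa")
    with open(second, "w") as handle:
        handle.write("bbbb")

    # Force identical access and modification times (arbitrary value).
    stamp = 1_000_000_000
    os.utime(first, (stamp, stamp))
    os.utime(second, (stamp, stamp))

    # Shallow: the os.stat() signatures match, so contents are never read.
    shallow_result = filecmp.cmp(first, second)
    # Deep: the contents are read and found to differ.
    deep_result = filecmp.cmp(first, second, shallow=False)

print(shallow_result)  # True
print(deep_result)     # False
```

When correctness matters more than speed, passing shallow=False avoids this kind of false match.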

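filecmp.cmpfiles(dir1, dir2, common, shallow=True)

This function compares the files named in the common list in the two directories and returns a three-item tuple of lists of file names. The first list contains the files that match, the second contains the files that do not match, and the third contains the files that could not be compared, for example because a file is missing from one of the directories or cannot be read. The shallow parameter has the same meaning as in filecmp.cmp().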

>>> match, mismatch, errors = filecmp.cmpfiles('dir1', 'dir2', ['file1.txt', 'file2.txt'])
>>> match
['file1.txt']
>>> mismatch
['file2.txt']
>>> errors
[]
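The third list is populated when a name in the common list cannot be compared. The following sketch rebuilds the sample layout in a temporary directory (an assumption made so that the example is self-contained; the helper write_file is a hypothetical name) and asks cmpfiles to compare newfile.txt, which exists only in dir1.

```python
import filecmp
import os
import tempfile


def write_file(directory, name, text):
    """Write `text` to directory/name (hypothetical helper for this example)."""
    with open(os.path.join(directory, name), "w") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as tmp:
    dir1 = os.path.join(tmp, "dir1")
    dir2 = os.path.join(tmp, "dir2")
    os.mkdir(dir1)
    os.mkdir(dir2)

    write_file(dir1, "file1.txt", "Hello Python\n")
    write_file(dir2, "file1.txt", "Hello Python\n")
    write_file(dir1, "file2.txt", "Python Standard Library\n")
    write_file(dir2, "file2.txt", "Python Library\n")
    write_file(dir1, "newfile.txt", "This is a file in dir1\n")

    # newfile.txt is missing from dir2, so it cannot be compared.
    match, mismatch, errors = filecmp.cmpfiles(
        dir1, dir2, ["file1.txt", "file2.txt", "newfile.txt"], shallow=False
    )

print(match)     # ['file1.txt']
print(mismatch)  # ['file2.txt']
print(errors)    # ['newfile.txt']
```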

The filecmp module also defines the dircmp class. An instance of this class is a directory comparison object that compares the files in two directories, referred to as the left and right directories. The object provides the methods and attributes described below −

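filecmp.dircmp(a, b, ignore=None, hide=None)

This is the constructor. a and b are the directories to be compared. Two optional arguments control which names are skipped: ignore is a list of names to ignore and defaults to filecmp.DEFAULT_IGNORES, and hide is a list of names to hide and defaults to [os.curdir, os.pardir].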

>>> result = filecmp.dircmp('dir1', 'dir2')
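As an illustration of the ignore argument, the sketch below compares two directories twice, once with the defaults and once with newfile.txt ignored. The temporary directory and the build_dir helper are assumptions made to keep the example self-contained. Note that dircmp attributes are computed lazily, so they are read here while the directories still exist.

```python
import filecmp
import os
import tempfile


def build_dir(path, files):
    """Create `path` and write each name -> text pair into it (hypothetical helper)."""
    os.mkdir(path)
    for name, text in files.items():
        with open(os.path.join(path, name), "w") as handle:
            handle.write(text)


with tempfile.TemporaryDirectory() as tmp:
    dir1 = os.path.join(tmp, "dir1")
    dir2 = os.path.join(tmp, "dir2")
    build_dir(dir1, {"file1.txt": "Hello Python\n", "newfile.txt": "This is a file in dir1\n"})
    build_dir(dir2, {"file1.txt": "Hello Python\n"})

    default_cmp = filecmp.dircmp(dir1, dir2)
    filtered_cmp = filecmp.dircmp(dir1, dir2, ignore=["newfile.txt"])

    # Read the lazily computed attributes before the directories are removed.
    default_left_only = default_cmp.left_only
    filtered_left_only = filtered_cmp.left_only
    filtered_left_list = filtered_cmp.left_list

print(default_left_only)   # ['newfile.txt']
print(filtered_left_only)  # []
print(filtered_left_list)  # ['file1.txt']
```

Passing ignore replaces the default list rather than extending it, so names from filecmp.DEFAULT_IGNORES have to be included explicitly if they should still be skipped.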

The methods and attributes of the dircmp class are as follows −

report()

This method prints the result of the comparison between the two directories to standard output.

>>> result = filecmp.dircmp('dir1', 'dir2')
>>> result.report()
diff dir1 dir2
Only in dir1 : ['newfile.txt']
Identical files : ['file1.txt']
Differing files : ['file2.txt']
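Because report() prints its result instead of returning it, the text has to be captured if it is needed as a string, for example for logging. One possible approach, sketched here with contextlib.redirect_stdout, is shown below; the temporary directory layout and the build_dir helper are assumptions made to keep the example self-contained.

```python
import contextlib
import filecmp
import io
import os
import tempfile


def build_dir(path, files):
    """Create `path` and write each name -> text pair into it (hypothetical helper)."""
    os.mkdir(path)
    for name, text in files.items():
        with open(os.path.join(path, name), "w") as handle:
            handle.write(text)


with tempfile.TemporaryDirectory() as tmp:
    dir1 = os.path.join(tmp, "dir1")
    dir2 = os.path.join(tmp, "dir2")
    build_dir(dir1, {
        "file1.txt": "Hello Python\n",
        "file2.txt": "Python Standard Library\n",
        "newfile.txt": "This is a file in dir1\n",
    })
    build_dir(dir2, {
        "file1.txt": "Hello Python\n",
        "file2.txt": "Python Library\n",
    })

    # Redirect everything report() prints into an in-memory buffer.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        filecmp.dircmp(dir1, dir2).report()
    report_text = buffer.getvalue()

# The first line contains the temporary paths, so it varies between runs.
print(report_text)
```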

left, right

These attributes hold the paths of the first and second directories passed to the dircmp constructor.

>>> result.left
'dir1'
>>> result.right
'dir2'

left_list, right_list

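These attributes hold sorted lists of the files and subdirectories in the left and right directories, after the names given by hide and ignore have been filtered out.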

>>> result.left_list
['file1.txt', 'file2.txt', 'newfile.txt']
>>> result.right_list
['file1.txt', 'file2.txt']

common, common_files, common_dirs

These attributes hold, respectively, the names of the files and subdirectories present in both directories, the files present in both, and the subdirectories present in both.

>>> result.common
['file1.txt', 'file2.txt']
>>> result.common_files
['file1.txt', 'file2.txt']
>>> result.common_dirs
[]

same_files, diff_files

These attributes hold the lists of common files that are identical and common files that differ, as determined by the file comparison the dircmp class performs, which is a shallow comparison by default.

>>> result.same_files
['file1.txt']
>>> result.diff_files
['file2.txt']
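Since same_files is based on a shallow comparison by default, two files with equal size and modification time can be reported as identical even when their contents differ. One way to guard against this, sketched below, is to re-check the common files with filecmp.cmpfiles and shallow=False. The helper name deep_recheck, the directory names, and the forced timestamp are assumptions made for illustration.

```python
import filecmp
import os
import tempfile


def deep_recheck(result):
    """Re-compare the common files of a dircmp object by content (hypothetical helper)."""
    return filecmp.cmpfiles(
        result.left, result.right, result.common_files, shallow=False
    )


with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")

    # Same name, same size, same modification time, different contents.
    for directory, text in ((left, "aaaa"), (right, "bbbb")):
        os.mkdir(directory)
        path = os.path.join(directory, "data.txt")
        with open(path, "w") as handle:
            handle.write(text)
        os.utime(path, (1_000_000_000, 1_000_000_000))

    result = filecmp.dircmp(left, right)
    shallow_same = result.same_files
    match, mismatch, errors = deep_recheck(result)

print(shallow_same)  # ['data.txt']
print(match)         # []
print(mismatch)      # ['data.txt']
```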

This article has discussed the file comparison functions defined in the filecmp module, along with the dircmp class and its methods and attributes.

Updated on: 25-Jun-2020
