How to compare files in Python


Problem.

You need to compare files in Python.

Solution.

The filecmp module in the Python standard library can be used to compare files and directories. It provides the following functions and classes.

cmp(file1, file2[, shallow])

Compares the files file1 and file2 and returns True if they appear identical, False if not. If shallow is not provided (or is True), files that have the same os.stat() signature (file type, size, and modification time) are considered equal without reading them. Otherwise, the files are treated as different if their sizes or contents differ.

cmpfiles(dir1, dir2, common[, shallow])

Compares the files named in the list common in the two directories dir1 and dir2. cmpfiles returns a tuple of three lists of filenames: match, mismatch, and errors.

  • match - lists the files that are the same in both directories.

  • mismatch - lists the files that don't match.

  • errors - lists the files that could not be compared for some reason.
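
To make the three result lists concrete, here is a minimal sketch that builds two throwaway directories with tempfile and runs cmpfiles() on them. The helper name write_file, the directory layout, and the file names are invented for illustration. shallow=False is passed so that the result depends only on file contents.

```python
import filecmp
import os
import tempfile


def write_file(path, text):
    """Hypothetical helper: write text to path."""
    with open(path, 'w') as f:
        f.write(text)


with tempfile.TemporaryDirectory() as top:
    dir1 = os.path.join(top, 'dir1')
    dir2 = os.path.join(top, 'dir2')
    os.mkdir(dir1)
    os.mkdir(dir2)

    # same.txt has identical content in both directories
    write_file(os.path.join(dir1, 'same.txt'), 'hello')
    write_file(os.path.join(dir2, 'same.txt'), 'hello')

    # changed.txt exists in both directories with different content
    write_file(os.path.join(dir1, 'changed.txt'), 'version one')
    write_file(os.path.join(dir2, 'changed.txt'), 'version two, longer')

    # missing.txt exists only in dir1, so it cannot be compared
    write_file(os.path.join(dir1, 'missing.txt'), 'only here')

    match, mismatch, errors = filecmp.cmpfiles(
        dir1,
        dir2,
        ['same.txt', 'changed.txt', 'missing.txt'],
        shallow=False,
    )

print('match   :', match)
print('mismatch:', mismatch)
print('errors  :', errors)
```

A name that is absent from one of the directories lands in errors rather than raising an exception, which is why the common list is usually computed first.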

dircmp(dir1, dir2[, ignore[, hide]])

Creates a directory comparison object that can be used to perform various comparison operations on the directories dir1 and dir2.

  • ignore - a list of filenames to ignore; defaults to filecmp.DEFAULT_IGNORES, which includes 'RCS', 'CVS', and 'tags'.

  • hide - a list of filenames to hide; defaults to [os.curdir, os.pardir] (['.', '..'] on UNIX).
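
As a rough illustration of the ignore argument, the sketch below compares two temporary directories twice: once with the defaults and once with ignore=['scratch.log']. The file names are invented for the example. Note that passing ignore replaces the default list rather than extending it.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    dir1 = os.path.join(top, 'dir1')
    dir2 = os.path.join(top, 'dir2')
    os.mkdir(dir1)
    os.mkdir(dir2)

    # data.txt is present on both sides; scratch.log exists only in dir1
    for d in (dir1, dir2):
        with open(os.path.join(d, 'data.txt'), 'w') as f:
            f.write('shared')
    with open(os.path.join(dir1, 'scratch.log'), 'w') as f:
        f.write('temporary output')

    default_cmp = filecmp.dircmp(dir1, dir2)
    filtered_cmp = filecmp.dircmp(dir1, dir2, ignore=['scratch.log'])

    # Attributes are computed lazily, so read them while the directories exist
    left_only_default = default_cmp.left_only
    left_only_filtered = filtered_cmp.left_only

print('default ignore :', left_only_default)
print('custom ignore  :', left_only_filtered)
```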

Instances of filecmp.dircmp implement the following methods, which print detailed reports to sys.stdout:

  • report(): Prints a comparison between the two directories.

  • report_partial_closure(): Prints a comparison of the two directories as well as of their common immediate subdirectories.

  • report_full_closure(): Prints a comparison of the two directories, all of their common subdirectories, all the subdirectories of those subdirectories, and so on (i.e., recursively).

Instances also provide the following attributes:

  • left_list: files and subdirectories found in directory dir1, not including names filtered out by hide and ignore.

  • right_list: files and subdirectories found in directory dir2, not including names filtered out by hide and ignore.

  • common: files and subdirectories that are in both directory dir1 and directory dir2.

  • left_only: files and subdirectories that are in directory dir1 only.

  • right_only: files and subdirectories that are in directory dir2 only.

  • common_dirs: subdirectories that are in both directory dir1 and directory dir2.

  • common_files: files that are in both directory dir1 and directory dir2.

  • same_files: names of files that are identical in both directory dir1 and directory dir2.

  • diff_files: names of files that are in both directory dir1 and directory dir2 but whose contents differ.

  • funny_files: names of files that are in both directory dir1 and directory dir2 but could not be compared for some reason.

  • subdirs: A dictionary that maps names in common_dirs to dircmp objects.
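
The attributes above can be combined to walk a directory tree without printing a report. The following sketch assumes a hypothetical helper named collect_differences and an invented directory layout. It follows subdirs recursively and gathers the relative paths of files that differ or that exist on one side only.

```python
import filecmp
import os
import tempfile


def collect_differences(dc, prefix=''):
    """Hypothetical helper: recursively gather differing and one-sided names."""
    found = {'left_only': [], 'right_only': [], 'diff_files': []}
    for key in found:
        found[key] = [os.path.join(prefix, name) for name in getattr(dc, key)]
    # subdirs maps each common subdirectory name to its own dircmp object
    for name, sub_dc in dc.subdirs.items():
        sub = collect_differences(sub_dc, os.path.join(prefix, name))
        for key in found:
            found[key].extend(sub[key])
    return found


with tempfile.TemporaryDirectory() as top:
    left = os.path.join(top, 'left')
    right = os.path.join(top, 'right')
    for root in (left, right):
        os.makedirs(os.path.join(root, 'sub'))

    # Same name, different sizes, so the files are reported as differing
    with open(os.path.join(left, 'sub', 'config.txt'), 'w') as f:
        f.write('short')
    with open(os.path.join(right, 'sub', 'config.txt'), 'w') as f:
        f.write('a much longer body')

    # Present on the left side only
    with open(os.path.join(left, 'extra.txt'), 'w') as f:
        f.write('extra')

    result = collect_differences(filecmp.dircmp(left, right))

print(result)
```

Returning plain lists instead of printing makes the comparison usable from other code, for example in a test or a sync script.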

Preparing test data for comparison.

import os


def makefile(filename, text=None):
    """
    Function: make a file
    params : file name, body (defaults to the file name itself)
    """
    with open(filename, 'w') as f:
        f.write(text or filename)


def makedirectory(directory_name):
    """
    Function: make a directory and fill it with test data
    params : directory name
    """
    if not os.path.exists(directory_name):
        os.mkdir(directory_name)

    # Remember the current working directory
    present_directory = os.getcwd()

    # Change to the directory provided
    os.chdir(directory_name)

    # Make two directories
    os.mkdir('dir1')
    os.mkdir('dir2')

    # Make a subdirectory with the same name in each directory
    os.mkdir('dir1/common_dir')
    os.mkdir('dir2/common_dir')

    # Make a subdirectory that is unique to each directory
    os.mkdir('dir1/dir_only_in_dir1')
    os.mkdir('dir2/dir_only_in_dir2')

    # Make a file that is unique to each directory
    makefile('dir1/file_only_in_dir1')
    makefile('dir2/file_only_in_dir2')

    # Make a file with the same name and the same content in each directory
    makefile('dir1/common_file', 'Hello, Writing Same Content')
    makefile('dir2/common_file', 'Hello, Writing Same Content')

    # Make a file with the same name but different content in each directory
    # (each file contains its own path)
    makefile('dir1/not_the_same')
    makefile('dir2/not_the_same')

    # Same name on both sides, but a file in dir1 and a directory in dir2
    makefile('dir1/file_in_dir1', 'This is a file in dir1')
    os.mkdir('dir2/file_in_dir1')

    # Return to the original working directory
    os.chdir(present_directory)


if __name__ == '__main__':
    # Running this twice fails because os.mkdir() raises if a directory exists
    makedirectory('example')
    makedirectory('example/dir1/common_dir')
    makedirectory('example/dir2/common_dir')

  • filecmp example: The shallow argument tells cmp() whether to look at the contents of the file, in addition to its metadata.

The default is to perform a shallow comparison using the information available from os.stat(). If the file type, size, and modification time match, the files are considered the same without being read. Thus, files of the same size with the same modification time are reported as the same, even if their contents differ.

When shallow is False, the files are considered equal only if their contents match.

import filecmp

print('Output \n *** Common File :', end=' ')
print(filecmp.cmp('example/dir1/common_file',
                  'example/dir2/common_file'), end=' ')
print(filecmp.cmp('example/dir1/common_file',
                  'example/dir2/common_file', shallow=False))

print(' *** Different Files :', end=' ')
print(filecmp.cmp('example/dir1/not_the_same',
                  'example/dir2/not_the_same'), end=' ')
print(filecmp.cmp('example/dir1/not_the_same',
                  'example/dir2/not_the_same', shallow=False))

print(' *** Identical Files :', end=' ')
print(filecmp.cmp('example/dir1/file_only_in_dir1',
                  'example/dir1/file_only_in_dir1'), end=' ')
print(filecmp.cmp('example/dir1/file_only_in_dir1',
                  'example/dir1/file_only_in_dir1', shallow=False))

Output

*** Common File : True True
*** Different Files : False False
*** Identical Files : True True
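
The caveat about shallow comparisons can be reproduced deliberately. This sketch is separate from the example directory above and uses invented file names. It writes two files of equal size with different contents and then forces an identical modification time with os.utime(). Under those assumptions the os.stat() signatures match, so the shallow comparison reports the files as equal while shallow=False does not.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    first = os.path.join(top, 'first.txt')
    second = os.path.join(top, 'second.txt')

    # Same length (5 bytes each), different contents
    with open(first, 'w') as f:
        f.write('AAAAA')
    with open(second, 'w') as f:
        f.write('BBBBB')

    # Force identical access and modification times on both files
    os.utime(first, (1_000_000_000, 1_000_000_000))
    os.utime(second, (1_000_000_000, 1_000_000_000))

    # Shallow: the matching stat signatures are trusted, contents are not read
    shallow_result = filecmp.cmp(first, second)
    # Deep: the contents are read and compared
    deep_result = filecmp.cmp(first, second, shallow=False)

print('shallow=True :', shallow_result)
print('shallow=False:', deep_result)
```

When correctness matters more than speed, for example when verifying a backup, passing shallow=False is the safer choice.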
  • cmpfiles Example:

Use cmpfiles() to compare a set of files in two directories without recursing.

import filecmp
import os

# Determine the items that exist in both directories.
dir1_contents = set(os.listdir('example/dir1'))
dir2_contents = set(os.listdir('example/dir2'))
common = list(dir1_contents & dir2_contents)

common_files = [f for f in common if os.path.isfile(os.path.join('example/dir1', f))]

print(f' *** Common files are : {common_files}')

# Now, let us compare the directories
match, mismatch, errors = filecmp.cmpfiles(
    'example/dir1',
    'example/dir2',
    common_files,
)

print(f' *** Matched files are : {match}')
print(f' *** mismatch files are : {mismatch}')
print(f' *** errors files are : {errors}')

Output

*** Common files are : ['file_in_dir1', 'not_the_same', 'common_file']
*** Matched files are : ['common_file']
*** mismatch files are : ['file_in_dir1', 'not_the_same']
*** errors files are : []

  • dircmp example: Comparing directories.

import filecmp

dc = filecmp.dircmp('example/dir1', 'example/dir2')
print("output \n *** Printing detailed report: \n ")
# report() and report_full_closure() print directly and return None,
# so they are called without print().
dc.report()
print("\n")
dc.report_full_closure()

Output

*** Printing detailed report:

diff example/dir1 example/dir2
Only in example/dir1 : ['dir_only_in_dir1', 'file_only_in_dir1']
Only in example/dir2 : ['dir_only_in_dir2', 'file_only_in_dir2']
Identical files : ['common_file']
Differing files : ['not_the_same']
Common subdirectories : ['common_dir']
Common funny cases : ['file_in_dir1']

diff example/dir1 example/dir2
Only in example/dir1 : ['dir_only_in_dir1', 'file_only_in_dir1']
Only in example/dir2 : ['dir_only_in_dir2', 'file_only_in_dir2']
Identical files : ['common_file']
Differing files : ['not_the_same']
Common subdirectories : ['common_dir']
Common funny cases : ['file_in_dir1']

diff example/dir1\common_dir example/dir2\common_dir
Common subdirectories : ['dir1', 'dir2']

diff example/dir1\common_dir\dir1 example/dir2\common_dir\dir1
Identical files : ['common_file', 'file_in_dir1', 'file_only_in_dir1', 'not_the_same']
Common subdirectories : ['common_dir', 'dir_only_in_dir1']

diff example/dir1\common_dir\dir1\common_dir example/dir2\common_dir\dir1\common_dir

diff example/dir1\common_dir\dir1\dir_only_in_dir1 example/dir2\common_dir\dir1\dir_only_in_dir1

diff example/dir1\common_dir\dir2 example/dir2\common_dir\dir2
Identical files : ['common_file', 'file_only_in_dir2', 'not_the_same']
Common subdirectories : ['common_dir', 'dir_only_in_dir2', 'file_in_dir1']

diff example/dir1\common_dir\dir2\common_dir example/dir2\common_dir\dir2\common_dir

diff example/dir1\common_dir\dir2\dir_only_in_dir2 example/dir2\common_dir\dir2\dir_only_in_dir2

diff example/dir1\common_dir\dir2\file_in_dir1 example/dir2\common_dir\dir2\file_in_dir1

You can also try the other dircmp methods and attributes described earlier (such as report_partial_closure(), left_only, same_files, and subdirs) to see how each one behaves.

Updated on: 09-Nov-2020
