Read and write tar archive files using Python (tarfile)


The ‘tar’ utility was originally introduced for UNIX operating system. Its purpose is to collect multiple files in a single archive file often called tarball which makes it easy to distribute the files. Functions in tarfile module of Python’s standard library help in creating tar archives and extracting from the tarball as required. The archives can be constructed with gzip, bz2 and lzma compressions or without any compression at all.

Main function defined in this module is main() using which writing to tar file or reading from it is accomplished.

Open()

This function returns a TarFile object corresponding to file name which is provided to it as parameter. The function requires another parameter called mode, which by default is ‘r’ indicating no compression. Other modes are listed below

Sr.No.Mode & action
1'r' or 'r:*'
Open for reading with transparent compression.
2'r:'
Open for reading without compression.
3'r:gz'
Open for reading with gzip compression.
4'r:bz2'
Open for reading with bzip2 compression.
5'r:xz'
Open for reading with lzma compression.
6'x' or 'x:'
Create a tarfile exclusively without compression.
7'x:gz'
Create a tarfile with gzip compression.
8'x:bz2'
Create a tarfile with bzip2 compression.
9'x:xz'
Create a tarfile with lzma compression.
10'a' or 'a:'
Open for appending with no compression.
11'w' or 'w:'
Open for uncompressed writing.
12'w:gz'
Open for gzip compressed writing.
13'w:bz2'
Open for bzip2 compressed writing.
14'w:xz'
Open for lzma compressed writing.

The module defines TarFile class. Instead of open() function, TarFile object can be instantiated by calling constructor.

TarFile()

This constructor also needs a file name and mode parameter. Possible values of mode parameter are as above.

Other methods in this class are as follows

add()

This method adds a file to the archive. The method needs a name which can be name of file, directory, symbolic link,shortcut etc. Directories are recursively added by default. To prevent recursive addition set recursive parameter to False.

addfile()

This method adds TarInfo object to the archive.

extractall()

This method extracts all members of archive into current path if any other path is not explicitly provided.

extract()

This method extracts specified member to given path, default is current path.

Following example opens a tar file for compression with gzip algorithm and adds a file in it.

>>> fp = tarfile.open("zen.tar.gz","w:gz")
>>> fp.add("zen.txt")
>>> fp.close()

Assuming that ‘zen.txt’ file is present in current working directory, it will be added in ‘zen.tar.gz’ file.

Following code extracts the files from the tar archive and extracts all files (in this case there is only on) and puts them in current folder. To verify the result, you may delete or rename ‘zen.txt’ in current folder.

>>> fp = tarfile.open("zen.tar.gz","r:gz")
>>> fp.extractall()
>>> fp.close()

You will find that ‘zen.txt’ file will appear in the current directory.

To create a tar consisting of all files in current directory, use following code

import tarfile, glob
>>> fp=tarfile.open('file.tar','w')
>>> for file in glob.glob('*.*'):
fp.add(file)
>>> fp.close()

Command line interface

Creation and extraction of tar files can be achieved through command line interface. For example ‘lines.txt’ file is added in a tar file by following command executed in command window

C:\python36 >python -m tarfile -c line.tar lines.txt

Following command line options can be used.

-l or --listList files in a tarfile.
-c or --createCreate tarfile from source files.
-e or --extractExtract tarfile into the current directory if output_dir is not specified.
-t or --testTest whether the tarfile is valid or not.
-v or --verboseVerbose output.

Following command line will extract line.tar in newdir folder under current directory.

C:\python36>python -m tarfile -e line.tar newdir/

Following command line will list all files in the tar archive.

C:\python36>python -m tarfile -l files.tar

This article on tarfile module explained classes and functions defined in it.

Updated on: 26-Jun-2020

2K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements