Python Support for gzip files (gzip)


GZip application is used for compression and decompression of files. It is a part of GNU project. Python’s gzip module is the interface to GZip application. The gzip data compression algorithm itself is based on zlib module.

The gzip module contains definition of GzipFile class along with its methods. It also caontains convenience function open(), compress() and decompress().

Easiest way to achieve compression and decompression is by using above mentioned functions.

open()

This function opens a gzip-compressed file in binary or text mode and returns a file like object, which may be physical file, a string or byte object. By default, the file is opened in ‘rb’ mode i.e. reading binary data, however, the mode parameter to this function can take other modes as listed below.

binary mode: 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x', 'xb'
text mode : 'rt', 'at', 'wt', or 'xt'

This function also defines compression level whose acceptable value is between 0 to 9. When the file is opened in text mode, the GzipFile object is wrapped in TextIOWrapper object.

compress()

This function applies compression on the data given to it as argument and returns compressed byte object. By default compression level is 9.

decompress()

This function decompresses the byte object and returns uncompressed data.

Following example creates a gzip file by writing compressed data in it.

>>> import gzip
>>> data = b'Python - Batteries included'
>>> with gzip.open("test.txt.gz", "wb") as f:
f.write(data)

This will create “test.txt.gz” file in current directory. This gzip archive contains “test.txt” which you can verify using any unzipping utility.

To programmatically read this compressed file

>>> with gzip.open("test.txt.gz", "rb") as f:
data = f.read()
>>> data
b'Python - Batteries included'

To compress an existing file to a gzip archive, read text in it and convert it to a bytearray. This bytearray object is then written to a gzip file. In the example below, ‘zen.txt’ file is assumed to be present in current directory.

fp = open("zen.txt","rb")
>>> data = fp.read()
>>> bindata = bytearray(data)
>>> with gzip.open("zen.txt.gz", "wb") as f:
f.write(bindata)

To retrieve the uncompressed file from gzip archive

>>> fp = open("zen1.txt", "wb")
>>> with gzip.open("zen.txt.gz", "rb") as f:
bindata = f.read()
>>> fp.write(bindata)
>>> fp.close()

Above code will create ‘zen1.txt’ in current directory which contain same data as in ‘zen.txt’

In addition to these convenience functions, gzip module also has GzipFile class which defines the compress() and decompress() methods. The constructor of this class takes file, mode and compressionlevel arguments exactly with same meaning as above.

When mode parameter is given as ‘w’ or ‘wb’ or ‘wt’, the GipFile object will provide write() method to compress the given data and write to a gzip file.

>>> f = gzip.GzipFile("testnew.txt.gz","wb")
>>> data = b'Python - Batteries included'
>>> f.write(data)
>>> f.close()

This will create a testnew.txt.gz file. You can unzip it using any utility to see that it contains testnew.txt with ‘Python – Batteries included’ text in it.

To uncompress the gzip file using GzipFile object,create it with ‘rb’ value to mode parameter and read the uncompressed data by read() method

>>> f = gzip.GzipFile("testnew.txt.gz","rb")
>>> data = f.read()
>>> data
b'Python - Batteries included'

In this article we learned how gzip library can be implemented by Python’s gzip module.

Updated on: 25-Jun-2020

11K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements