Compression using the LZMA algorithm in Python (lzma)


The Lempel–Ziv–Markov chain algorithm (LZMA) performs lossless data compression using a dictionary compression scheme. It typically achieves a higher compression ratio than older general-purpose algorithms such as DEFLATE (used by gzip) or bzip2, usually at the cost of slower compression. Python's lzma module consists of classes and convenience functions for compressing and decompressing data with the LZMA algorithm.

Although the functionality in this module is similar to that of the bz2 module, note that the LZMAFile class is not thread-safe. If a single LZMAFile instance is shared between threads, access to it must be protected with a lock.
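The locking advice can be illustrated with a minimal sketch. The names writer_lock and log_line are made up for this illustration, and an in-memory io.BytesIO stands in for a real file so that the snippet leaves nothing on disk.

```python
import io
import lzma
import threading

buffer = io.BytesIO()                       # in-memory stand-in for a real file
writer = lzma.LZMAFile(buffer, mode="wb")
writer_lock = threading.Lock()              # hypothetical name: guards every call on `writer`

def log_line(worker_id: int) -> None:
    """Write one record; the lock keeps two threads from calling write() at once."""
    line = f"worker {worker_id} done\n".encode("ascii")
    with writer_lock:
        writer.write(line)

threads = [threading.Thread(target=log_line, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
writer.close()                              # finishes the stream; `buffer` itself stays open

# The order of the records depends on thread scheduling, so sort before comparing.
lines = sorted(lzma.decompress(buffer.getvalue()).splitlines())
print(lines)
```

Only the calls on the shared LZMAFile object need the lock; preparing each record can happen outside it.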

Here again, the open() function in the lzma module is the easiest way to open an LZMA-compressed file object.

open()

This function opens an LZMA-compressed file and returns a file object. The function takes two main parameters: the file name and the mode. The mode parameter defaults to "rb" and can take any of the following values:

binary mode - "r", "rb", "w", "wb", "x", "xb", "a" or "ab"
text mode - "rt", "wt", "xt", or "at"
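As a small sketch of the text modes, the snippet below writes and reads str values instead of bytes. The file name notes.txt.xz and the temporary directory are arbitrary choices for the illustration; the encoding argument is the one accepted by lzma.open() in text mode.

```python
import lzma
import os
import tempfile

# Arbitrary location for the demo file.
path = os.path.join(tempfile.mkdtemp(), "notes.txt.xz")

# "wt" accepts str; the text is encoded and then compressed.
with lzma.open(path, "wt", encoding="utf-8") as f:
    f.write("first line\n")
    f.write("second line\n")

# "rt" decompresses and decodes back to str.
with lzma.open(path, "rt", encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)
```

In the binary modes the same file would yield bytes objects instead.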

compress()

This function compresses the given data using the LZMA algorithm and returns a bytes object. It optionally takes a format argument that decides the container format. Possible values are FORMAT_XZ (default), FORMAT_ALONE (the legacy .lzma container) and FORMAT_RAW (a raw stream without a container, which requires an explicit filter chain).

decompress()

This function decompresses the data and returns the uncompressed data as a bytes object.
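A short round trip shows the two functions working together. This is only a sketch: the payload is made up, and the printed sizes will vary with the Python and liblzma versions in use.

```python
import lzma

payload = b"Welcome to Python " * 100       # repetitive sample data

xz_blob = lzma.compress(payload)                               # FORMAT_XZ is the default
alone_blob = lzma.compress(payload, format=lzma.FORMAT_ALONE)  # legacy .lzma container

# decompress() detects either container automatically by default.
restored_xz = lzma.decompress(xz_blob)
restored_alone = lzma.decompress(alone_blob)

print(len(payload), len(xz_blob), len(alone_blob))
```

Both containers hold the same data; they differ in their headers and in the integrity checks they carry.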

The above functions are used in the following examples. To write LZMA-compressed data to a file:

>>> import lzma
>>> data = b"Welcome to TutorialsPoint"
>>> f = lzma.open("test.xz", "wb")
>>> f.write(data)
>>> f.close()

A 'test.xz' file will be created in the current working directory. To fetch the uncompressed data from this file, use the following code.

>>> import lzma
>>> f = lzma.open("test.xz","rb")
>>> data = f.read()
>>> data
b'Welcome to TutorialsPoint'

To perform compression using the object-oriented API of the lzma module, we have to use the LZMAFile class.

LZMAFile()

This is the constructor for the LZMAFile class. It takes a file (a file name or an already open file object) and a mode, which defaults to "r". An object opened with mode 'w' or 'wb' can use the write() method.

write()

This method compresses the given data and writes it to the underlying file.

>>> data = b'Welcome to TutorialsPoint'
>>> obj = lzma.LZMAFile("test.xz", mode="wb")
>>> obj.write(data)
>>> obj.close()

The compressed file is read and the uncompressed data is retrieved by the read() method of an LZMAFile object created with the mode='rb' parameter.

read()

This method reads data from the compressed file and returns it in uncompressed form.

>>> obj = lzma.LZMAFile("test.xz", mode="rb")
>>> data = obj.read()
>>> data
b'Welcome to TutorialsPoint'
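LZMAFile objects also work as context managers, which closes the file even if an error occurs part-way through. The sketch below assumes a throwaway file named demo.xz in a temporary directory and also shows read() with a size argument.

```python
import lzma
import os
import tempfile

# Arbitrary location for the demo file.
path = os.path.join(tempfile.mkdtemp(), "demo.xz")
data = b"Welcome to Python"

# The with-block closes the file and finishes the compressed stream.
with lzma.LZMAFile(path, mode="wb") as writer:
    writer.write(data)

with lzma.LZMAFile(path, mode="rb") as reader:
    first = reader.read(7)      # first 7 uncompressed bytes
    rest = reader.read()        # everything that remains

print(first, rest)
```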

The lzma module also allows writing compressed data to an already open file. In the following example, 'test.txt' is opened normally (using the built-in open() function) in 'wb' mode and some text is written to it. Afterwards, the same file object is used to write compressed data.

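>>> f = open("test.txt", "wb")
>>> f.write(b"Hello world")
>>> fp = lzma.open(f, "wb")
>>> fp.write(b"Welcome to Python")
>>> fp.close()
>>> f.write(b"Thank you")
>>> f.close()

Note that the compressed file object fp is closed before the final write and before the underlying file f is closed; otherwise the end of the compressed stream is never written. Closing fp does not close f. When the above code is executed, 'test.txt' appears in the current directory. It contains a mix of compressed and uncompressed data. Opened in a text editor it looks roughly as below (the binary stream in the middle is abbreviated here)

Hello worldý7zXZ ... YZThank you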
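To get the pieces of such a mixed file back, one possible approach is sketched below. It builds the same layout (plain bytes, a complete .xz stream, plain bytes) in memory, looks for the six-byte .xz signature, and relies on the unused_data attribute of LZMADecompressor for whatever follows the stream. Searching for the signature is a heuristic assumed for this demo only; a real file format would record the offset explicitly.

```python
import io
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"      # signature that starts every .xz stream

# Build the mixed layout in memory: plain bytes, .xz stream, plain bytes.
raw_file = io.BytesIO()
raw_file.write(b"Hello world")
xz_part = lzma.open(raw_file, "wb")
xz_part.write(b"Welcome to Python")
xz_part.close()                 # finishes the stream; raw_file stays open
raw_file.write(b"Thank you")
blob = raw_file.getvalue()

# Split the blob back into its three parts.
start = blob.find(XZ_MAGIC)     # heuristic: assumes the prefix never contains the signature
prefix = blob[:start]
decomp = lzma.LZMADecompressor()
middle = decomp.decompress(blob[start:])
suffix = decomp.unused_data     # bytes found after the end of the stream

print(prefix, middle, suffix)
```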

As in the bz2 module, the lzma module also has incremental compressor and decompressor classes.

LZMACompressor()

This is a constructor that returns an incremental compressor object. Multiple chunks can be fed to it one after another; the pieces of output it returns, concatenated in order, form a single compressed stream that can be written to a file.

compress()

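This method compresses the given data and returns a bytes object. The result may be empty, because the compressor can hold input in its internal buffer until more data arrives or flush() is called.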

flush()

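This method finishes the compression process and returns the compressed data still held in the internal buffer as a bytes object. The compressor cannot be used after flush() has been called.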

The following example compresses the items of a list of bytes objects using an incremental compressor object.

>>> data = [b'Hello World', b'How are you?', b'welcome to Python']
>>> obj = lzma.LZMACompressor()
>>> bindata = []
>>> for i in data:
...     bindata.append(obj.compress(i))
...
>>> bindata.append(obj.flush())
>>> bindata
[b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x02\x00!\x01\x16\x00\x00\x00t/\xe5\xa3', b'', b'', b"\x01\x00'Hello WorldHow are you?welcome to Python\x00\xf5\xc6\xc1d|\xf3\x8ey\x00\x01@(\xd4RJ\xe5\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZ"]

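The above code builds bindata as a list of the pieces of output returned by the compressor. The pieces do not correspond one-to-one to the items of the original list: the first holds the stream header, the next two are empty, and the last one, returned by flush(), holds the compressed data. To retrieve the uncompressed data using an LZMADecompressor object, join the pieces and use the following statements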

>>> obj = lzma.LZMADecompressor()
>>> binstr = b''.join(bindata)
>>> obj.decompress(binstr)
b'Hello WorldHow are you?welcome to Python'
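The decompressor can also be fed a little at a time, which is useful when the compressed data arrives from a network or a large file. The sketch below assumes an arbitrary chunk size of 16 bytes and made-up sample data; the eof attribute reports when the end of the stream has been reached.

```python
import lzma

original = b"Hello World" * 1000
compressed = lzma.compress(original)

decomp = lzma.LZMADecompressor()
pieces = []
CHUNK = 16                      # arbitrary chunk size chosen for the demo

# Feed the compressed bytes piece by piece, as if they arrived over time.
for offset in range(0, len(compressed), CHUNK):
    pieces.append(decomp.decompress(compressed[offset:offset + CHUNK]))
    if decomp.eof:              # end of stream reached; further calls would raise EOFError
        break

restored = b"".join(pieces)
print(len(compressed), len(restored))
```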

In this article, the classes and functions of the lzma module have been explained with examples.

Updated on: 26-Jun-2020
