Python - Serialization



The term "object serialization" refers to the process of converting the state of an object into a byte stream. Once created, this byte stream can be stored in a file or transmitted over a socket, among other uses. Reconstructing the object from the byte stream is called deserialization.

Python's terminology for serialization and deserialization is pickling and unpickling respectively. The pickle module available in Python's standard library provides functions for serialization (dump() and dumps()) and deserialization (load() and loads()).

The pickle module uses a Python-specific data format. Hence, programs not written in Python may not be able to deserialize the encoded (pickled) data properly. Also, unpickling data from an untrusted or unauthenticated source is not secure, because a maliciously crafted pickle can execute arbitrary code during unpickling.
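
The security warning can be made concrete with a harmless sketch. The class below is hypothetical (the name Demo is not part of the pickle module); its __reduce__() method tells pickle which callable to invoke when the data is loaded. Here the callable is the built-in len(), but a hostile payload could name any importable function in the same way.

```python
import pickle

class Demo:
    """Hypothetical class used only to illustrate the risk."""

    def __reduce__(self):
        # On unpickling, pickle calls len("payload") instead of rebuilding a Demo
        return (len, ("payload",))

payload = pickle.dumps(Demo())

# loads() runs the callable named in the byte stream and returns its result
result = pickle.loads(payload)
print(result)
```

The value returned by loads() is 7, the result of the call, rather than a Demo instance. This is why pickled data should only be loaded from sources you trust.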

Pickle Protocols

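Protocols are the conventions used for constructing and deconstructing Python objects to and from binary data. The pickle module defines the 5 protocols listed below, versions 0 through 4; Python 3.8 and later also define protocol version 5, which adds support for out-of-band data −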

Sr.No.  Protocol & Description
1       Protocol version 0: the original "human-readable" protocol, backwards compatible with earlier versions of Python.
2       Protocol version 1: an old binary format, also compatible with earlier versions of Python.
3       Protocol version 2: introduced in Python 2.3; provides efficient pickling of new-style classes.
4       Protocol version 3: added in Python 3.0; recommended when compatibility with other Python 3 versions is required.
5       Protocol version 4: added in Python 3.4; adds support for very large objects.

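To find the highest and default protocol versions supported by your Python installation, use the following constants defined in the pickle module. The values shown below are those reported by Python 3.4 to 3.7; newer releases report higher numbers −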

>>> import pickle
>>> pickle.HIGHEST_PROTOCOL
4
>>> pickle.DEFAULT_PROTOCOL
3
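
Both dump() and dumps() accept an optional protocol argument. The following sketch, which uses an illustrative dictionary, pickles the same object with protocol 0 and with the highest available protocol, and checks that both load back to the same value.

```python
import pickle

data = {"name": "Ravi", "age": 23}

# Protocol 0 produces printable ASCII output
ascii_pickle = pickle.dumps(data, protocol=0)

# The highest protocol produces a compact binary format
binary_pickle = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Pickles made with protocol 2 or later start with b"\x80" and the version number
print(binary_pickle[0:1] == b"\x80", binary_pickle[1] == pickle.HIGHEST_PROTOCOL)

# loads() detects the protocol automatically, so no argument is needed
print(pickle.loads(ascii_pickle) == pickle.loads(binary_pickle) == data)
```

A lower protocol is mainly useful when the data must be read by an older Python version; otherwise the default is usually a reasonable choice.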

The dump() and load() functions of the pickle module pickle and unpickle Python data. The dump() function writes a pickled object to a file, and the load() function reads pickled data from a file and reconstructs the Python object.

dump() and load()

The following program pickles a dictionary object into a binary file.

import pickle

dct = {"name": "Ravi", "age": 23, "Gender": "M", "marks": 75}
# Pickled data is binary, so the file must be opened in "wb" mode
with open("data.txt", "wb") as f:
    pickle.dump(dct, f)

When the above code is executed, the dictionary object's byte representation is stored in the data.txt file.

To unpickle, or deserialize, the data from the binary file back into a dictionary, run the following program.

import pickle

# Open the file in binary mode for reading
with open("data.txt", "rb") as f:
    d = pickle.load(f)
print(d)

The Python console shows the dictionary object read from the file.

{'name': 'Ravi', 'age': 23, 'Gender': 'M', 'marks': 75}
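
A single file can hold more than one pickled object. Each call to dump() writes one object to the stream and each call to load() reads the next one, raising EOFError when nothing is left. The sketch below assumes a throwaway file in a temporary directory; the file name records.pkl and the sample records are illustrative.

```python
import os
import pickle
import tempfile

records = [{"name": "Ravi", "marks": 75}, {"name": "Anil", "marks": 82}]
loaded = []

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "records.pkl")

    # Write the objects one after another into the same file
    with open(path, "wb") as f:
        for record in records:
            pickle.dump(record, f)

    # Read them back in the same order until the file is exhausted
    with open(path, "rb") as f:
        while True:
            try:
                loaded.append(pickle.load(f))
            except EOFError:
                break

print(loaded == records)
```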

dumps() and loads()

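The pickle module also provides the dumps() function, which returns the pickled data as a bytes object instead of writing it to a file. The exact bytes depend on the protocol in use; the output below was produced with protocol 3.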

>>> from pickle import dumps
>>> dct={"name":"Ravi", "age":23, "Gender":"M","marks":75}
>>> dctstring=dumps(dct)
>>> dctstring
b'\x80\x03}q\x00(X\x04\x00\x00\x00nameq\x01X\x04\x00\x00\x00Raviq\x02X\x03\x00\x00\x00ageq\x03K\x17X\x06\x00\x00\x00Genderq\x04X\x01\x00\x00\x00Mq\x05X\x05\x00\x00\x00marksq\x06KKu.'

Use the loads() function to unpickle the bytes object and obtain the original dictionary object.

from pickle import loads
dct = loads(dctstring)
print(dct)

It will produce the following output −

{'name': 'Ravi', 'age': 23, 'Gender': 'M', 'marks': 75}
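
Because loads() rebuilds the object from bytes, the result is a new, independent object rather than a reference to the original. A minimal sketch with an illustrative nested dictionary:

```python
import pickle

original = {"name": "Ravi", "scores": [75, 80]}
clone = pickle.loads(pickle.dumps(original))

# Changing the clone does not affect the original, even for nested objects
clone["scores"].append(90)

print(original["scores"])
print(clone["scores"])
```

Only the clone's list gains the extra score; the original list is left unchanged.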

Pickler Class

The pickle module also defines the Pickler and Unpickler classes. The Pickler class writes pickle data to a file. The Unpickler class reads binary data from a file and constructs a Python object.

To write a Python object's pickled data to a file −

from pickle import Pickler

dct = {'name': 'Ravi', 'age': 23, 'Gender': 'M', 'marks': 75}
# Pickler wraps a file opened for binary writing
with open("data.txt", "wb") as f:
    Pickler(f).dump(dct)

Unpickler Class

To read the data back by unpickling the binary file −

from pickle import Unpickler

# Unpickler wraps a file opened for binary reading
with open("data.txt", "rb") as f:
    dct = Unpickler(f).load()
print(dct)
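
Unpickler can also be subclassed. One documented use is overriding find_class() to limit which globals a pickle may load, which reduces (but does not remove) the risk of unpickling untrusted data. The sketch below follows the pattern from the pickle documentation; the names RestrictedUnpickler, SAFE_BUILTINS and restricted_loads are illustrative choices, not part of the module.

```python
import builtins
import io
import pickle

# Illustrative allow-list of built-in names a pickle may refer to
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Allow only the listed built-ins; reject every other global
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name)
        )

def restricted_loads(data):
    """Hypothetical helper, analogous to pickle.loads()."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain data and allow-listed built-ins load normally
print(restricted_loads(pickle.dumps([1, 2, range(15)])))
```

A pickle that refers to any other global, for example the built-in len(), makes restricted_loads() raise pickle.UnpicklingError instead of loading it.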

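Objects of most Python built-in data types are picklable, including numbers, strings, bytes, and lists, tuples, sets and dictionaries that contain only picklable objects. Moreover, instances of a custom class can also be pickled and unpickled, provided the class is defined at the top level of a module so that it can be found again when unpickling.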

import pickle

class Person:
    def __init__(self):
        self.name = "XYZ"
        self.age = 22

    def show(self):
        print("name:", self.name, "age:", self.age)

p1 = Person()
# Pickle the instance to a binary file
with open("data.txt", "wb") as f:
    pickle.dump(p1, f)

print("unpickled")
# Read the instance back; the Person class must be available here
with open("data.txt", "rb") as f:
    p1 = pickle.load(f)
p1.show()
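
Pickle stores an instance's attributes together with a reference to its class, not the class code itself, and it does not call __init__() when rebuilding the instance. The following sketch uses a hypothetical Person class with a call counter to make this visible; it assumes the code runs as a script so that the class can be found again at load time.

```python
import pickle

class Person:
    init_calls = 0  # counts how often __init__ runs

    def __init__(self, name, age):
        Person.init_calls += 1
        self.name = name
        self.age = age

original = Person("XYZ", 22)
restored = pickle.loads(pickle.dumps(original))

# The attributes are restored without running __init__ a second time
print(restored.name, restored.age)
print(Person.init_calls)
```

The counter stays at 1 because only the original object was created through __init__(); the restored object is a separate instance of the same class with the same attribute values.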

Python's standard library also has the marshal module, which is used for internal serialization of Python objects, such as reading and writing the compiled bytecode in .pyc files.
