What is the difference between encode/decode in Python?



Representing a unicode string as a string of bytes is known as encoding. Converting a string of bytes back to a unicode string is known as decoding. You typically encode a unicode string whenever you need to use it for I/O, for instance to transfer it over the network or save it to a disk file. You typically decode a string of bytes whenever you receive string data from the network or read it from a disk file.
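Here is a rough sketch of the I/O case in Python 3, where unicode text has type `str` and raw data has type `bytes`. The helpers `save_text` and `load_text` are hypothetical names used only for illustration: one encodes before writing and the other decodes after reading. `io.BytesIO` stands in for a file opened in binary mode or a network connection, which keeps the example self-contained.

```python
import io

def save_text(stream, text, encoding="utf-8"):
    """Hypothetical helper: encode text to bytes, then write the bytes."""
    data = text.encode(encoding)   # str -> bytes (encoding)
    stream.write(data)
    return len(data)               # number of bytes written

def load_text(stream, encoding="utf-8"):
    """Hypothetical helper: read raw bytes, then decode them to text."""
    data = stream.read()           # bytes exactly as they were stored
    return data.decode(encoding)   # bytes -> str (decoding)

# io.BytesIO stands in for a binary file or socket.
buf = io.BytesIO()
written = save_text(buf, "æøå")
buf.seek(0)                        # rewind before reading back
restored = load_text(buf)
print(written, restored)           # 6 æøå
```

Three characters become six bytes because each of æ, ø and å takes two bytes in UTF-8.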

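The interpreter sessions below use Python 2, where u'...' is a unicode string and '...' is a byte string. To encode a unicode string with a given encoding, call its encode(encoding) method: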

>>> u'æøå'.encode('utf8')
'\xc3\xa6\xc3\xb8\xc3\xa5'

To decode a string of bytes, call decode(encoding) with the same encoding that was used to encode it. For example:

>>> '\xc3\xa6\xc3\xb8\xc3\xa5'.decode('utf8')
u'\xe6\xf8\xe5'

The result, u'\xe6\xf8\xe5', is the same unicode string as u'æøå'; the interpreter simply displays the non-ASCII characters as escape sequences.
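A minimal Python 3 sketch of why the decoding step must use the same encoding as the encoding step: the same text produces different bytes under UTF-8 and Latin-1, and decoding with the wrong codec either silently yields garbled text or raises `UnicodeDecodeError`. Latin-1 is chosen as the mismatched codec purely for illustration.

```python
text = "æøå"  # Python 3 str: three Unicode code points

# The same text encodes to different bytes under different encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
print(utf8_bytes)    # b'\xc3\xa6\xc3\xb8\xc3\xa5'
print(latin1_bytes)  # b'\xe6\xf8\xe5'

# Decoding with the encoding that produced the bytes restores the text.
print(utf8_bytes.decode("utf-8") == text)      # True
print(latin1_bytes.decode("latin-1") == text)  # True

# Decoding with a different encoding can silently give the wrong text,
# because Latin-1 maps every possible byte value to some character.
wrong = utf8_bytes.decode("latin-1")
print(wrong)  # Ã¦Ã¸Ã¥

# ...or fail outright when the bytes are not valid in that encoding.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```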

