Python Unicode Database


The unicodedata module provides access to the Unicode Character Database (UCD), which defines the character properties of all Unicode characters.

To use this module, we need to import it in our code.

import unicodedata

Unicode Database Methods

Some functions of the unicodedata module are described here.

Function (unicodedata.lookup(name)) −

This function looks up a character by name. If a character with the given name exists, it is returned. Otherwise a KeyError is raised.
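
As a minimal sketch of the KeyError behaviour described above, the lookup can be wrapped so that an unknown name yields a fallback value instead of an exception. The helper name safe_lookup and its fallback parameter are hypothetical and are not part of unicodedata.

```python
import unicodedata

def safe_lookup(name, fallback=None):
    """Return the character called `name`, or `fallback` if the name is unknown.

    Hypothetical helper for illustration; only unicodedata.lookup is a real API.
    """
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # lookup() raises KeyError when no character has this name
        return fallback

print(safe_lookup('GREEK SMALL LETTER ALPHA'))
print(safe_lookup('NOT A REAL NAME', '?'))
```

Catching the exception at one place keeps calling code free of repeated try/except blocks when names come from untrusted input.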

Function (unicodedata.name(chr[, default])) −

This function returns the name assigned to the given character as a string. If the character has no name in the database, default is returned when it was given. Otherwise a ValueError is raised.
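
The effect of the default argument can be sketched as follows. Control characters such as the newline have no name in the database, so they make a convenient test case. The helper name describe and the placeholder string '<unnamed>' are assumptions chosen for illustration.

```python
import unicodedata

def describe(ch):
    """Return the Unicode name of `ch`, or a placeholder if it has none.

    Hypothetical helper; '<unnamed>' is an arbitrary placeholder value.
    """
    return unicodedata.name(ch, '<unnamed>')

for ch in ['a', '€', '\n']:
    print(repr(ch), describe(ch))

# Without a default, an unnamed character raises ValueError
try:
    unicodedata.name('\n')
except ValueError as exc:
    print('ValueError:', exc)
```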

Function (unicodedata.digit(chr[, default])) −

This function returns the digit value assigned to the given character as an integer. If no such value is defined for the character, default is returned when it was given. Otherwise a ValueError is raised.
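
To illustrate how digit() relates to the decimal() and numeric() functions used in the example code further down, the following sketch queries all three with a default of None, so that missing values show up as None instead of raising ValueError. The helper name numeric_values is hypothetical.

```python
import unicodedata

def numeric_values(ch):
    """Return (decimal, digit, numeric) values of `ch`; None where undefined.

    Hypothetical helper that calls the three real unicodedata functions.
    """
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

# An ASCII digit, a superscript digit, a fraction and a letter
for ch in ['7', '²', '½', 'x']:
    print(repr(ch), numeric_values(ch))
```

A superscript such as '²' has a digit value but no decimal value, while a fraction such as '½' only has a numeric value.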

Function (unicodedata.category(chr)) −

This function returns the general category assigned to the character as a string. Categories starting with ‘L’ are letters: for example, an uppercase letter returns ‘Lu’ and a lowercase letter returns ‘Ll’. An opening bracket returns ‘Ps’ (Punctuation, open), and so on.
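
One possible use of category() is to tally the kinds of characters in a string. The sketch below is only an illustration; the helper name category_counts and the sample text are assumptions.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count how many characters of `text` fall into each general category.

    Hypothetical helper built on unicodedata.category.
    """
    return Counter(unicodedata.category(ch) for ch in text)

counts = category_counts('Hi (42)!')
for category, count in sorted(counts.items()):
    print(category, count)
```

Here 'Nd' counts the two decimal digits, while 'Ps' and 'Pe' count the opening and closing parentheses.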

Function (unicodedata.mirrored(chr)) −

This function returns the mirrored property assigned to the character as an integer. It returns 1 if the character is identified as a mirrored character in bidirectional text, such as ‘(’ and ‘)’, and 0 otherwise.
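
Because mirrored() returns 1 or 0, its result can be used directly as a truth value. The following sketch filters the mirrored characters out of a string. The helper name mirrored_chars and the sample expression are assumptions for illustration.

```python
import unicodedata

def mirrored_chars(text):
    """Return the characters of `text` whose mirrored property is 1.

    Hypothetical helper built on unicodedata.mirrored.
    """
    return [ch for ch in text if unicodedata.mirrored(ch)]

print(mirrored_chars('f(x) < [a + b]'))
```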

Example Code

import unicodedata as ud

# Look up characters by name
print(ud.lookup('ASTERISK'))
print(ud.lookup('Latin Capital letter G'))

# Get the Unicode name of a character
print(ud.name('x'))
print(ud.name('°'))

# Get the decimal and numeric values of a character
print(ud.decimal('6'))
print(ud.numeric('9'))

# Get the general category of a character
print(ud.category('A'))  # Lu: uppercase letter
print(ud.category('9'))  # Nd: decimal number
print(ud.category('['))  # Ps: open punctuation

# Check whether a character is mirrored in bidirectional text
print(ud.mirrored('A'))
print(ud.mirrored('<'))

Output

*
G
LATIN SMALL LETTER X
DEGREE SIGN
6
9.0
Lu
Nd
Ps
0
1

Updated on: 30-Jul-2019
