Python Program to Determine the Unicode Code Point at a given index

A Unicode code point is a unique number that identifies a character in the Unicode standard, which defines over 130,000 characters, including letters, symbols, and emojis. This article shows four ways to find the code point of the character at a given index in a Python string: the built-in ord() function, the codecs module, the unicodedata module, and the array module.

What is a Unicode Code Point?

Every Unicode character has a unique numeric identifier called a code point, an integer from 0 to 0x10FFFF. Code points are usually written as "U+" followed by at least four hexadecimal digits (e.g., U+0065 for 'e' and U+1F31F for '🌟').
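Since a code point is just an integer, the U+ form is only a display convention. The sketch below uses a hypothetical format_code_point() helper that prints an ord() result in that notation with the 04X format spec (uppercase hex, padded to at least four digits), and the built-in chr() to convert a code point back to its character.

```python
def format_code_point(char):
    # Hypothetical helper: render a character's code point in U+ notation,
    # padded to at least four uppercase hex digits (more above U+FFFF)
    return f"U+{ord(char):04X}"

print(format_code_point("e"))    # U+0065
print(format_code_point("🌟"))   # U+1F31F
print(chr(0x65))                 # e (chr() is the inverse of ord())
```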

Method 1: Using ord() Function

The built-in ord() function is the simplest way to get the Unicode code point of a character at a given index:

Syntax

code_point = ord(string[index])

Example

def get_unicode_code_point(string, index):
    char = string[index]
    code_point = ord(char)
    return code_point

# Test the function
string = "Hello, World!"
index = 1
code_point = get_unicode_code_point(string, index)
print(f"Character '{string[index]}' at index {index} has Unicode code point U+{code_point:04X}")
Character 'e' at index 1 has Unicode code point U+0065
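In Python 3, strings are indexed by code point, so a character outside the Basic Multilingual Plane, such as an emoji, occupies a single index. Negative indices count from the end, and an out-of-range index raises IndexError. The sketch below redefines the get_unicode_code_point() function from the example above so the block runs on its own, and applies it to an illustrative string.

```python
def get_unicode_code_point(string, index):
    # str indexing works on code points, so string[index] is one whole character
    return ord(string[index])

text = "Hi 🌟"
print(len(text))                                     # 4: the emoji is one code point
print(f"U+{get_unicode_code_point(text, -1):04X}")   # U+1F31F (negative index counts from the end)

try:
    get_unicode_code_point(text, 10)
except IndexError as exc:
    print(f"IndexError: {exc}")                      # IndexError: string index out of range
```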

Method 2: Using codecs Module

The codecs module provides encoding and decoding functions. Encoding a character as UTF-32 big-endian produces exactly four bytes that hold its code point, which int.from_bytes() can turn back into an integer:

Example

import codecs

string = "Hello, World!"
index = 1
char = string[index]

# UTF-32-BE stores each code point as exactly four big-endian bytes
utf32_bytes = codecs.encode(char, 'utf-32-be')
code_point = int.from_bytes(utf32_bytes, 'big')
print(f"Character '{char}' at index {index} has Unicode code point U+{code_point:04X}")

# For comparison: the UTF-8 encoding of the same character
byte_string = codecs.encode(char, 'utf-8')
print(f"UTF-8 bytes: {byte_string}")
Character 'e' at index 1 has Unicode code point U+0065
UTF-8 bytes: b'e'
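The UTF-8 output above hints at a common pitfall: once text contains non-ASCII characters, a byte index is no longer a character index, because UTF-8 uses one to four bytes per code point. This sketch, using an illustrative byte string, decodes UTF-8 bytes with codecs.decode() first and only then looks up the code point.

```python
import codecs

raw = b"caf\xc3\xa9"                 # UTF-8 bytes for "café"; é takes two bytes
text = codecs.decode(raw, "utf-8")   # decode first, then index by character

print(len(raw), len(text))           # 5 4
print(f"U+{ord(text[3]):04X}")       # U+00E9
print(raw[3])                        # 195: indexing bytes gives a byte value, not a code point
```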

Method 3: Using unicodedata Module

The unicodedata module provides additional information about a character, such as its official Unicode name:

Example

import unicodedata

string = "Hello, World! 🌟"
index = 14  # Glowing star emoji

char = string[index]
code_point = ord(char)

try:
    name = unicodedata.name(char)
    print(f"Character '{char}' at index {index}")
    print(f"Unicode code point: U+{code_point:04X}")
    print(f"Character name: {name}")
except ValueError:
    print(f"Character '{char}' has no Unicode name")
Character '🌟' at index 14
Unicode code point: U+1F31F
Character name: GLOWING STAR
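Beyond names, unicodedata exposes other character properties. The sketch below uses a hypothetical describe() helper that combines ord(), unicodedata.name() with a default value (so characters without a name, such as control characters, do not raise ValueError), and unicodedata.category(). It also uses unicodedata.lookup() to go from a name back to a character.

```python
import unicodedata

def describe(char):
    # Hypothetical helper: collect a few Unicode properties for one character
    return {
        "code_point": f"U+{ord(char):04X}",
        "name": unicodedata.name(char, "<no name>"),  # default avoids ValueError
        "category": unicodedata.category(char),       # e.g. 'Ll' = lowercase letter
    }

print(describe("e"))    # {'code_point': 'U+0065', 'name': 'LATIN SMALL LETTER E', 'category': 'Ll'}
print(describe("\n"))   # {'code_point': 'U+000A', 'name': '<no name>', 'category': 'Cc'}
print(f"U+{ord(unicodedata.lookup('GLOWING STAR')):04X}")  # U+1F31F
```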

Method 4: Using array Module

The array module can store the code points of every character in a compact typed array, which is useful when you need code points at many indices:

Example

import array

string = "Hello, World!"
index = 1

# Create array of Unicode code points for all characters
code_points = array.array('I', [ord(char) for char in string])
code_point = code_points[index]

print(f"Character '{string[index]}' at index {index} has Unicode code point U+{code_point:04X}")
print(f"All code points: {[f'U+{cp:04X}' for cp in code_points[:5]]}...")
Character 'e' at index 1 has Unicode code point U+0065
All code points: ['U+0048', 'U+0065', 'U+006C', 'U+006C', 'U+006F']...
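The Python documentation lists a minimum size of only 2 bytes for the 'I' typecode, although it is 4 bytes on common platforms, while 'L' is guaranteed to be at least 4 bytes, which is enough for the largest code point, U+10FFFF. The variant below is a sketch that uses 'L' and shows that chr() turns the stored code points back into the original string.

```python
import array

text = "Hello, 🌟"
# 'L' (unsigned long) is documented as at least 4 bytes, wide enough for U+10FFFF
code_points = array.array("L", map(ord, text))

print(f"U+{code_points[-1]:04X}")               # U+1F31F
print("".join(map(chr, code_points)) == text)   # True: chr() reverses ord()
print(code_points.itemsize >= 4)                # True
```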

Comparison

| Method | Complexity | Best For |
|---|---|---|
| ord() | Simple | Single-character lookup |
| codecs module | Complex | Encoding/decoding operations |
| unicodedata module | Medium | Character names and properties |
| array module | Medium | Bulk operations on multiple characters |

Conclusion

The built-in ord() function is the most direct way to get the Unicode code point at a given index. Use the codecs module when working with encoded bytes, unicodedata for character names and properties, and the array module to store code points for many characters compactly.

Updated on: 2026-03-27T07:28:36+05:30
