Write a Python program to separate the letters and digits in a Series and convert them to a DataFrame

When working with mixed alphanumeric data in Pandas, you often need to separate alphabetic and numeric parts into different columns. This is commonly done using the str.extract() method with regular expressions.

Problem Statement

Given a Pandas Series containing strings with both letters and digits, we need to separate them into two columns in a DataFrame:

Original Series:
0    abx123
1     bcd25
2     cxy30
dtype: object

Expected DataFrame:
     0    1
0  abx  123
1  bcd   25
2  cxy   30

Solution Using str.extract()

The str.extract() method uses regular expressions with capturing groups to extract parts of strings. Each group in parentheses becomes a separate column:

import pandas as pd

# Create a series with mixed alphanumeric data
series = pd.Series(['abx123', 'bcd25', 'cxy30'])
print("Original series:")
print(series)

# Extract alphabets and digits using regex
df = series.str.extract(r'([a-z]+)(\d+)')
print("\nDataFrame after extraction:")
print(df)

Original series:
0    abx123
1     bcd25
2     cxy30
dtype: object

DataFrame after extraction:
     0    1
0  abx  123
1  bcd   25
2  cxy   30

Understanding the Regular Expression

The pattern r'([a-z]+)(\d+)' consists of two capturing groups:

([a-z]+) - Captures one or more lowercase letters
(\d+) - Captures one or more digits
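
To see how the two groups behave, the following minimal sketch applies the same pattern to a single string with Python's re module, then to a hypothetical series (not part of the original example) that contains an uppercase entry and an entry without digits. It assumes that rows which do not match should appear as NaN, and that passing flags=re.IGNORECASE is an acceptable way to accept uppercase letters.

```python
import re

import pandas as pd

pattern = r'([a-z]+)(\d+)'

# On a single string, the two capturing groups come back as a tuple
match = re.search(pattern, 'abx123')
groups = match.groups()
print(groups)  # ('abx', '123')

# Hypothetical data: one uppercase entry and one entry with no digits
mixed = pd.Series(['abx123', 'BCD25', 'xyz'])

# With the pattern as written, rows that do not match become NaN
strict = mixed.str.extract(pattern)
print(strict)

# re.IGNORECASE lets [a-z] match uppercase letters as well
relaxed = mixed.str.extract(pattern, flags=re.IGNORECASE)
print(relaxed)
```

In this sketch 'BCD25' is missing from the strict result but extracted in the relaxed one, while 'xyz' stays NaN in both because the second group requires at least one digit.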

Adding Column Names

You can assign meaningful column names to make the DataFrame more readable:

import pandas as pd

series = pd.Series(['abx123', 'bcd25', 'cxy30'])

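# Extract the two groups; expand=True (the default) returns a DataFrame
df = series.str.extract(r'([a-z]+)(\d+)', expand=True)

# Replace the default labels 0 and 1 with descriptive names
df.columns = ['Letters', 'Numbers']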

print("DataFrame with column names:")
print(df)

DataFrame with column names:
  Letters Numbers
0     abx     123
1     bcd      25
2     cxy      30
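
Note that the extracted digits are still strings, not numbers. As a small sketch, assuming every row matched the pattern (so there are no missing values), the Numbers column can be converted with astype(int) before doing arithmetic on it:

```python
import pandas as pd

series = pd.Series(['abx123', 'bcd25', 'cxy30'])

df = series.str.extract(r'([a-z]+)(\d+)')
df.columns = ['Letters', 'Numbers']

# Both columns hold strings at this point, even the digits
print(df.dtypes)

# Convert the digit column to integers so arithmetic works as expected
df['Numbers'] = df['Numbers'].astype(int)
print(df['Numbers'].sum())  # 178
```

If some rows might not match, astype(int) would fail on the missing values; pd.to_numeric(df['Numbers']) is one alternative that tolerates NaN.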

Alternative Approach

For more complex patterns, you can use named groups in the regex. Each group name becomes the column label, so no separate renaming step is needed:

import pandas as pd

series = pd.Series(['abx123', 'bcd25', 'cxy30'])

# Using named groups
df = series.str.extract(r'(?P<text>[a-z]+)(?P<number>\d+)')
print("DataFrame with named groups:")
print(df)

DataFrame with named groups:
  text number
0  abx    123
1  bcd     25
2  cxy     30

Conclusion

Use str.extract() with regex capturing groups to separate mixed alphanumeric data into DataFrame columns. Named groups provide more descriptive column names automatically.

Vani Nalliappan

Updated on: 2026-03-25T16:19:45+05:30
