Fetch only capital words from DataFrame in Pandas

In Pandas, you can extract capital words from a DataFrame by testing each cell against a regular expression. Python's re module provides the pattern matching used to identify cells that contain uppercase letters.

Setting Up the Data

First, let's create a sample DataFrame with mixed-case words:

import re
import pandas as pd

# Create sample data with mixed case words
data = [['computer', 'mobile phone', 'ELECTRONICS', 'electronics'],
        ['KEYBOARD', 'charger', 'SMARTTV', 'camera']]

df = pd.DataFrame(data, columns=['Col1', 'Col2', 'Col3', 'Col4'])
print("Original DataFrame:")
print(df)
Original DataFrame:
        Col1         Col2         Col3         Col4
0   computer  mobile phone  ELECTRONICS  electronics
1   KEYBOARD      charger      SMARTTV       camera

Method 1: Using Nested Loops with Regex

This approach walks through the DataFrame column by column and prints every cell that matches the uppercase pattern:

import re
import pandas as pd

data = [['computer', 'mobile phone', 'ELECTRONICS', 'electronics'],
        ['KEYBOARD', 'charger', 'SMARTTV', 'camera']]

df = pd.DataFrame(data)

print("Capital words found:")
for i in range(df.shape[1]):
    for element in df[i]:
        if bool(re.match(r'\w*[A-Z]\w*', str(element))):
            print(element)
Capital words found:
KEYBOARD
ELECTRONICS
SMARTTV

Method 2: Using applymap() with Regex

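A more Pandas-friendly approach uses applymap() to apply the regex to every cell and build a boolean mask. Note that applymap() has been deprecated since pandas 2.1 in favor of DataFrame.map(), which accepts the same function: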

import re
import pandas as pd

data = [['computer', 'mobile phone', 'ELECTRONICS', 'electronics'],
        ['KEYBOARD', 'charger', 'SMARTTV', 'camera']]

df = pd.DataFrame(data)

# Create boolean mask for capital words
mask = df.applymap(lambda x: bool(re.match(r'\w*[A-Z]\w*', str(x))))

# Extract capital words using the mask
capital_words = df[mask].stack().dropna().tolist()
print("Capital words:", capital_words)
Capital words: ['ELECTRONICS', 'KEYBOARD', 'SMARTTV']
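
The same mask can be built in a way that runs on both older and newer pandas releases. The following is a minimal sketch, not a definitive implementation: the helper name has_capital and the variable elementwise are illustrative choices, and the NumPy boolean indexing is a swapped-in alternative to df[mask].stack().dropna() rather than part of the method above.

```python
import re
import pandas as pd

data = [['computer', 'mobile phone', 'ELECTRONICS', 'electronics'],
        ['KEYBOARD', 'charger', 'SMARTTV', 'camera']]
df = pd.DataFrame(data)

def has_capital(value):
    """Hypothetical helper: True if the leading word contains an uppercase letter."""
    return bool(re.match(r'\w*[A-Z]\w*', str(value)))

# DataFrame.map is available from pandas 2.1; older versions only have applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
mask = elementwise(has_capital)

# Index the underlying 2-D array with the boolean mask.
# NumPy returns the selected cells flattened in row-major order.
capital_words = df.to_numpy()[mask.to_numpy(dtype=bool)].tolist()
print("Capital words:", capital_words)
```

Because the cells are collected row by row, ELECTRONICS (row 0) is listed before KEYBOARD and SMARTTV (row 1), unlike the column-by-column order of the nested loops.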

Understanding the Regex Pattern

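The pattern r'\w*[A-Z]\w*' breaks down as follows:

  • \w* - Matches zero or more word characters before the uppercase letter
  • [A-Z] - Matches one uppercase ASCII letter
  • \w* - Matches zero or more word characters after the uppercase letter

Because re.match() anchors at the start of the string, the pattern succeeds whenever the first run of word characters contains at least one uppercase letter. A mixed-case value such as 'Computer' would therefore match as well, not only fully uppercase words.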
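
To see what the pattern accepts in practice, it can be tried on a few strings. This is a small sketch under stated assumptions: the sample strings are made up for illustration, and the stricter rule using re.fullmatch(r'[A-Z]+', ...) is an assumed alternative for readers who want only fully uppercase words, not something the methods above use.

```python
import re

PATTERN = r'\w*[A-Z]\w*'

# Illustrative inputs, not taken from the DataFrame above.
samples = ['KEYBOARD', 'Computer', 'mobile Phone', 'camera']

# re.match anchors at the start, so only the leading run of word
# characters is tested; 'mobile Phone' fails because 'mobile' has no capital.
contains_upper = [s for s in samples if re.match(PATTERN, s)]
print(contains_upper)  # ['KEYBOARD', 'Computer']

# Assumed stricter rule: every character must be an uppercase ASCII letter.
all_upper = [s for s in samples if re.fullmatch(r'[A-Z]+', s)]
print(all_upper)  # ['KEYBOARD']
```

With the sample data in this article both rules select the same three words, since none of the cells is mixed case.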

Comparison of Methods

Method         Readability    Performance    Best For
Nested Loops   Simple         Slower         Small DataFrames
applymap()     More Complex   Faster         Large DataFrames

Conclusion

Use nested loops for simple cases where printing each match is enough. For larger DataFrames, build a boolean mask with applymap() and the regex pattern, then use the mask to collect the capital words in a single step.

Updated on: 2026-03-26T02:57:33+05:30
