Fetch only capital words from DataFrame in Pandas
In Pandas, you can extract the capital (uppercase) words from a DataFrame using regular expressions. Python's built-in re module provides the pattern matching used to identify cells whose first word contains an uppercase letter.
Setting Up the Data
First, let's create a sample DataFrame with mixed case words:
import re
import pandas as pd
# Create sample data with mixed case words
data = [['computer', 'mobile phone', 'ELECTRONICS', 'electronics'],
        ['KEYBOARD', 'charger', 'SMARTTV', 'camera']]
df = pd.DataFrame(data, columns=['Col1', 'Col2', 'Col3', 'Col4'])
print("Original DataFrame:")
print(df)
Original DataFrame:
       Col1          Col2         Col3         Col4
0  computer  mobile phone  ELECTRONICS  electronics
1  KEYBOARD       charger      SMARTTV       camera
Method 1: Using Nested Loops with Regex
This approach iterates through each cell, column by column, and checks whether it contains an uppercase letter:
import re
import pandas as pd
data = [['computer', 'mobile phone', 'ELECTRONICS', 'electronics'],
        ['KEYBOARD', 'charger', 'SMARTTV', 'camera']]
df = pd.DataFrame(data)
print("Capital words found:")
# Walk the columns by their default integer labels (0, 1, 2, 3)
for i in range(df.shape[1]):
    for element in df[i]:
        # re.match anchors at the start of the string
        if re.match(r'\w*[A-Z]\w*', str(element)):
            print(element)
Capital words found:
KEYBOARD
ELECTRONICS
SMARTTV
Method 2: Using applymap() with Regex
A more Pandas-friendly approach uses applymap() to build a boolean mask from the regex pattern. In pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map(), which accepts the same function:
import re
import pandas as pd
data = [['computer', 'mobile phone', 'ELECTRONICS', 'electronics'],
        ['KEYBOARD', 'charger', 'SMARTTV', 'camera']]
df = pd.DataFrame(data)
# Create boolean mask for capital words
mask = df.applymap(lambda x: bool(re.match(r'\w*[A-Z]\w*', str(x))))
# Extract capital words using the mask; stack() flattens row by row
capital_words = df[mask].stack().dropna().tolist()
print("Capital words:", capital_words)
Capital words: ['ELECTRONICS', 'KEYBOARD', 'SMARTTV']
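As a complement to the mask-based approach, the same filtering can be sketched without a Python-level lambda by flattening the DataFrame and using the vectorized string accessor. This is a minimal sketch, not part of the original methods: it assumes every cell can be treated as a string, and it swaps in a stricter pattern, [A-Z]+ with fullmatch, so that only cells consisting entirely of uppercase ASCII letters are kept.

```python
import pandas as pd

data = [['computer', 'mobile phone', 'ELECTRONICS', 'electronics'],
        ['KEYBOARD', 'charger', 'SMARTTV', 'camera']]
df = pd.DataFrame(data)

# Flatten the DataFrame into one Series of strings (row by row)
cells = df.stack().astype(str)

# Assumption: "capital word" means the whole cell is made of A-Z only
all_caps = cells[cells.str.fullmatch(r'[A-Z]+')].tolist()

print("Capital words:", all_caps)
```

For the sample data this keeps ELECTRONICS, KEYBOARD and SMARTTV, in row order.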
Understanding the Regex Pattern
The pattern r'\w*[A-Z]\w*' breaks down as follows:
- \w* - Matches zero or more word characters before the uppercase letter
- [A-Z] - Matches any uppercase letter
- \w* - Matches zero or more word characters after the uppercase letter
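The breakdown above implies two behaviors worth checking on a few strings: the pattern accepts any word with at least one uppercase letter (including mixed case such as "Computer"), and because re.match anchors at the start of the string, only the first word of a cell is tested. The sketch below illustrates this with sample strings that are not from the article's data; the stricter [A-Z]+ pattern is an assumed alternative for readers who want fully uppercase words only.

```python
import re

loose = re.compile(r'\w*[A-Z]\w*')  # the pattern used in this article
strict = re.compile(r'[A-Z]+')      # assumed alternative: A-Z only

# Hypothetical sample strings chosen to show the edge cases
samples = ['KEYBOARD', 'Computer', 'mobile Phone', 'camera']

# match() tests only from the start of the string
loose_hits = [s for s in samples if loose.match(s)]

# fullmatch() requires the entire string to fit the pattern
strict_hits = [s for s in samples if strict.fullmatch(s)]

print(loose_hits)   # ['KEYBOARD', 'Computer']
print(strict_hits)  # ['KEYBOARD']
```

Note that 'mobile Phone' is rejected by the article's pattern even though it contains an uppercase letter, because the first word, 'mobile', has none.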
Comparison of Methods
| Method | Readability | Performance | Best For |
|---|---|---|---|
| Nested Loops | Simple | Slower | Small DataFrames |
| applymap() | More Complex | Faster | Large DataFrames |
Conclusion
Use nested loops for simple cases where you only need to print the matches. For large DataFrames, prefer applymap() with a regex pattern: it avoids explicit Python loops and builds a reusable boolean mask that you can use to filter the capital words.
