Replacing strings with numbers in Python for Data Analysis
Many machine learning algorithms and statistical routines accept only numeric input, so categorical strings usually have to be converted to numbers before analysis. Python provides several ways to map string values to integers.
Consider this sample dataset with stock recommendations:
| Company | Industry | Recommendation |
|---|---|---|
| HDFC Bank | Finance | Hold |
| Apollo | Healthcare | Buy |
| Hero | Automobile | Underperform |
| Yes Bank | Finance | Hold |
| M&M | Automobile | Underperform |
| Fortis | Healthcare | Buy |
We need to convert the Recommendation column to numerical values: Buy=1, Hold=2, Underperform=3.
Method 1: Using Dictionary Mapping with List Comprehension
Create a mapping dictionary and apply it using list comprehension:
import pandas as pd
# Sample data
data = {
    'Company': ['HDFC Bank', 'Apollo', 'Hero', 'Yes Bank', 'M&M', 'Fortis'],
    'Industry': ['Finance', 'Healthcare', 'Automobile', 'Finance', 'Automobile', 'Healthcare'],
    'Recommendation': ['Hold', 'Buy', 'Underperform', 'Hold', 'Underperform', 'Buy']
}
dataframe = pd.DataFrame(data)
# Create mapping dictionary
recommendation_map = {'Buy': 1, 'Hold': 2, 'Underperform': 3}
# Apply mapping using list comprehension
dataframe['Recommendation'] = [recommendation_map[item] for item in dataframe['Recommendation']]
print(dataframe)
     Company    Industry  Recommendation
0  HDFC Bank     Finance               2
1     Apollo  Healthcare               1
2       Hero  Automobile               3
3   Yes Bank     Finance               2
4        M&M  Automobile               3
5     Fortis  Healthcare               1
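One caveat with the list comprehension is that indexing the dictionary raises a KeyError as soon as the column contains a label the mapping does not cover. A minimal sketch of a more forgiving lookup follows; the 'Sell' label and the -1 fallback are assumptions made for illustration and are not part of the sample dataset.

```python
import pandas as pd

recommendation_map = {'Buy': 1, 'Hold': 2, 'Underperform': 3}

# 'Sell' is a hypothetical label that the mapping does not cover
labels = pd.Series(['Hold', 'Buy', 'Sell'])

# dict.get returns the fallback (-1, an arbitrary sentinel) instead of raising KeyError
codes = [recommendation_map.get(item, -1) for item in labels]
print(codes)  # [2, 1, -1]
```

Whether a sentinel or an exception is preferable depends on the analysis: a KeyError surfaces dirty data early, while a sentinel lets the pipeline continue.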
Method 2: Using Pandas map() Function
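The map() method performs the dictionary lookup inside pandas and returns a new Series, which suits large datasets. Labels missing from the dictionary become NaN instead of raising a KeyError: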
import pandas as pd
# Sample data
data = {
    'Company': ['HDFC Bank', 'Apollo', 'Hero', 'Yes Bank', 'M&M', 'Fortis'],
    'Industry': ['Finance', 'Healthcare', 'Automobile', 'Finance', 'Automobile', 'Healthcare'],
    'Recommendation': ['Hold', 'Buy', 'Underperform', 'Hold', 'Underperform', 'Buy']
}
dataframe = pd.DataFrame(data)
# Create mapping dictionary
recommendation_map = {'Buy': 1, 'Hold': 2, 'Underperform': 3}
# Apply mapping using map() function
dataframe['Recommendation'] = dataframe['Recommendation'].map(recommendation_map)
print(dataframe)
     Company    Industry  Recommendation
0  HDFC Bank     Finance               2
1     Apollo  Healthcare               1
2       Hero  Automobile               3
3   Yes Bank     Finance               2
4        M&M  Automobile               3
5     Fortis  Healthcare               1
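When a label is absent from the dictionary, map() silently produces NaN, and the result column becomes floating point. The sketch below shows one way to detect and handle that case; the 'Sell' label and the -1 fill value are assumptions chosen for illustration.

```python
import pandas as pd

recommendation_map = {'Buy': 1, 'Hold': 2, 'Underperform': 3}

# 'Sell' is a hypothetical label that the mapping does not cover
labels = pd.Series(['Hold', 'Buy', 'Sell'])

# Unmapped labels become NaN rather than raising an error
mapped = labels.map(recommendation_map)
print(mapped.isna().sum())  # 1

# Replace the gaps with a sentinel (-1 is arbitrary) and restore an integer dtype
cleaned = mapped.fillna(-1).astype(int)
print(cleaned.tolist())  # [2, 1, -1]
```

Checking `isna().sum()` right after the mapping is a cheap way to confirm that the dictionary covers every label in the column.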
Method 3: Using Conditional Assignment
Directly assign values based on conditions:
import pandas as pd
# Sample data
data = {
    'Company': ['HDFC Bank', 'Apollo', 'Hero', 'Yes Bank', 'M&M', 'Fortis'],
    'Industry': ['Finance', 'Healthcare', 'Automobile', 'Finance', 'Automobile', 'Healthcare'],
    'Recommendation': ['Hold', 'Buy', 'Underperform', 'Hold', 'Underperform', 'Buy']
}
dataframe = pd.DataFrame(data)
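# Use object dtype so integers can be written into the column
# (newer pandas versions may use a string dtype that rejects them)
dataframe['Recommendation'] = dataframe['Recommendation'].astype(object)
# Apply conditional assignments, one label at a time
dataframe.loc[dataframe['Recommendation'] == 'Buy', 'Recommendation'] = 1
dataframe.loc[dataframe['Recommendation'] == 'Hold', 'Recommendation'] = 2
dataframe.loc[dataframe['Recommendation'] == 'Underperform', 'Recommendation'] = 3
# The column is still object dtype at this point, so convert it to integers
dataframe['Recommendation'] = dataframe['Recommendation'].astype(int)
print(dataframe)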
     Company    Industry  Recommendation
0  HDFC Bank     Finance               2
1     Apollo  Healthcare               1
2       Hero  Automobile               3
3   Yes Bank     Finance               2
4        M&M  Automobile               3
5     Fortis  Healthcare               1
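Conditional assignment becomes more useful when the rule depends on more than one column, which a plain dictionary lookup cannot express. The following is a small sketch under an invented rule: the 'Priority' column and the "Buy in Healthcare" condition are hypothetical and serve only to show how boolean masks combine.

```python
import pandas as pd

dataframe = pd.DataFrame({
    'Industry': ['Finance', 'Healthcare', 'Automobile', 'Finance'],
    'Recommendation': ['Hold', 'Buy', 'Underperform', 'Buy'],
})

# Start with a default value in a new integer column (hypothetical name)
dataframe['Priority'] = 0

# Combine two boolean masks with &; each comparison needs its own parentheses
is_target = (dataframe['Recommendation'] == 'Buy') & (dataframe['Industry'] == 'Healthcare')
dataframe.loc[is_target, 'Priority'] = 1

print(dataframe['Priority'].tolist())  # [0, 1, 0, 0]
```

Writing the result to a separate column keeps the original labels intact, which makes the rule easier to audit afterwards.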
Comparison
| Method | Performance | Best For |
|---|---|---|
| Dictionary + List Comprehension | Good | Small datasets, explicit mapping; raises KeyError on unmapped labels |
| map() function | Excellent | Large datasets; unmapped labels become NaN |
| Conditional Assignment | Fair (one pass over the column per category) | Few categories, conditions spanning several columns |
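The performance ratings above are qualitative, and the actual gap depends on the pandas version, the column dtype, and the data size. A rough way to measure it on your own machine is sketched below; the synthetic column and its size are arbitrary assumptions, and no timings are quoted because they vary between environments.

```python
import timeit
import pandas as pd

recommendation_map = {'Buy': 1, 'Hold': 2, 'Underperform': 3}

# Synthetic column: the three labels repeated (the size is an arbitrary choice)
labels = pd.Series(['Hold', 'Buy', 'Underperform'] * 100_000)

def with_comprehension():
    # Method 1: plain Python loop over the column
    return [recommendation_map[item] for item in labels]

def with_map():
    # Method 2: lookup performed by pandas
    return labels.map(recommendation_map)

# Both approaches should produce the same codes
assert with_comprehension() == with_map().tolist()

for func in (with_comprehension, with_map):
    seconds = timeit.timeit(func, number=5)
    print(f'{func.__name__}: {seconds:.3f} s for 5 runs')
```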
Conclusion
Use the map() function for efficient string-to-number conversion in data analysis. Dictionary mapping with a list comprehension keeps the code explicit and readable, while conditional assignment suits transformations that depend on more complex conditions.
