How to Convert String to Integer in Pandas DataFrame?
Converting string data to integer data in Pandas DataFrames is a common task in data analysis. When working with datasets, numeric columns are often imported as strings, requiring conversion for mathematical operations and analysis.
In this tutorial, we'll explore two effective methods for converting string columns to integers in Pandas DataFrames: astype() and to_numeric().
Using the astype() Function
The astype() method is the most straightforward way to convert data types in Pandas. It returns a copy of the column cast to the specified type, so the result must be assigned back to the DataFrame.
Example
import pandas as pd
# Creating sample DataFrame
df = pd.DataFrame({'Name': ['Prince', 'Mukul', 'Divyang', 'Rohit'],
                   'Age': ['25', '30', '35', '40'],
                   'Salary': ['50000', '60000', '70000', '80000']})
print("Original DataFrame:")
print(df.dtypes)
print(df)
# Converting Age column to integer using astype()
df['Age'] = df['Age'].astype(int)
print("\nAfter conversion:")
print(df.dtypes)
print(df)
Original DataFrame:
Name object
Age object
Salary object
dtype: object
Name Age Salary
0 Prince 25 50000
1 Mukul 30 60000
2 Divyang 35 70000
3 Rohit 40 80000
After conversion:
Name object
Age int64
Salary object
dtype: object
Name Age Salary
0 Prince 25 50000
1 Mukul 30 60000
2 Divyang 35 70000
3 Rohit 40 80000
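The example above converts a single column. As a minimal sketch, assuming both Age and Salary hold clean integer strings, astype() also accepts a column-to-dtype mapping so that several columns can be converted in one call. The shortened two-row DataFrame here is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Prince', 'Mukul'],
    'Age': ['25', '30'],
    'Salary': ['50000', '60000'],
})

# Pass a {column: dtype} mapping to convert several columns at once.
# 'int64' is spelled out so the result does not depend on the platform default.
df = df.astype({'Age': 'int64', 'Salary': 'int64'})

print(df.dtypes)
```

Columns that are not listed in the mapping, such as Name, keep their original type.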
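Important: astype(int) raises a ValueError if any string cannot be parsed as an integer (for example 'unknown', '25.5', or an empty string). It also fails when the column contains missing values such as NaN, because NaN cannot be stored in a plain integer column.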
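To illustrate the failure mode, the following sketch wraps the conversion in a try/except block. The sample values and the `converted` flag are hypothetical and exist only to show the behavior.

```python
import pandas as pd

s = pd.Series(['25', '30', 'unknown', '40'])

try:
    s = s.astype('int64')
    converted = True
except ValueError as exc:
    # 'unknown' cannot be parsed as an integer, so the whole cast fails
    converted = False
    print(f"Conversion failed: {exc}")
```

Because the cast is all-or-nothing, one bad value leaves the entire column unconverted.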
Using the to_numeric() Function
The pd.to_numeric() function provides more flexibility and error handling options when converting strings to numeric types.
Example
import pandas as pd
# Creating DataFrame with potentially problematic data
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
                   'Age': ['25', '30', 'unknown', '40'],
                   'Score': ['85', '92', '78', '']})
print("Original DataFrame:")
print(df.dtypes)
print(df)
# Converting with error handling
df['Age_numeric'] = pd.to_numeric(df['Age'], errors='coerce')
df['Score_numeric'] = pd.to_numeric(df['Score'], errors='coerce')
print("\nAfter conversion with error handling:")
print(df.dtypes)
print(df)
Original DataFrame:
Name object
Age object
Score object
dtype: object
Name Age Score
0 Alice 25 85
1 Bob 30 92
2 Charlie unknown 78
3 Diana 40
After conversion with error handling:
Name object
Age object
Score object
Age_numeric float64
Score_numeric float64
dtype: object
Name Age Score Age_numeric Score_numeric
0 Alice 25 85 25.0 85.0
1 Bob 30 92 30.0 92.0
2 Charlie unknown 78 NaN 78.0
3 Diana 40 40.0 NaN
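The new columns above are float64 because each contains a NaN. As a small sketch of how the result type depends on the data, the snippet below compares a clean column with one that needs coercion. The series names `clean` and `dirty` are illustrative, and the explicit object dtype is an assumption chosen to mirror string columns as they are typically loaded.

```python
import pandas as pd

# Explicit object dtype mirrors string columns as they usually arrive from a file
clean = pd.to_numeric(pd.Series(['10', '20', '30'], dtype=object))
dirty = pd.to_numeric(pd.Series(['10', 'unknown', '30'], dtype=object),
                      errors='coerce')

print(clean.dtype)  # int64: every value is a whole number
print(dirty.dtype)  # float64: the coerced NaN forces a float column
```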
Comparison of Methods
| Method | Error Handling | Best For | Output Type |
|---|---|---|---|
| astype() | Raises ValueError | Clean data with valid integers | Exact type specified |
| to_numeric() | Set through errors= (raise by default, or coerce; ignore is deprecated since pandas 2.2) | Data with potential invalid values | Inferred from the data: int64 for whole numbers, float64 if decimals or NaN are present |
Converting to Integer After to_numeric()
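to_numeric() infers the result type: it returns int64 when every value is a whole number and float64 when the column contains decimals or NaN. Chaining astype(int) makes the integer type explicit, and it works as long as no NaN values exist: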
import pandas as pd
df = pd.DataFrame({'Values': ['10', '20', '30', '40']})
# Convert to numeric then to integer
df['Values'] = pd.to_numeric(df['Values']).astype(int)
print(df.dtypes)
print(df)
Values int64
dtype: object
Values
0 10
1 20
2 30
3 40
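The example above assumes there are no missing values. When coercion does produce NaN, two common options are sketched below: casting to the nullable 'Int64' dtype (capital I), or filling the gaps before casting to a plain int64. The column names Age_nullable and Age_filled and the placeholder value 0 are assumptions for illustration, and whether a placeholder is appropriate depends on the analysis.

```python
import pandas as pd

df = pd.DataFrame({'Age': ['25', '30', 'unknown', '40']})

# Invalid strings become NaN, which makes the intermediate result float64
numeric = pd.to_numeric(df['Age'], errors='coerce')

# Option 1: the nullable integer dtype keeps the gap as <NA>
df['Age_nullable'] = numeric.astype('Int64')

# Option 2: replace missing values with a placeholder, then use plain int64
df['Age_filled'] = numeric.fillna(0).astype('int64')

print(df.dtypes)
print(df)
```

The nullable dtype preserves the information that a value was missing, whereas the filled column cannot distinguish a real 0 from a placeholder.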
Conclusion
Use astype(int) for clean string data that you're confident contains only valid integers. Use pd.to_numeric() with errors='coerce' for data that may contain invalid values: it converts problematic values to NaN instead of raising an error, at the cost of producing a float64 column whenever a NaN is present.
