How to Convert String to Integer in Pandas DataFrame?

Converting string data to integer data in Pandas DataFrames is a common task in data analysis. When working with datasets, numeric columns are often imported as strings, requiring conversion for mathematical operations and analysis.

In this tutorial, we'll explore two effective methods for converting string columns to integers in Pandas DataFrames: astype() and to_numeric().

Using the astype() Function

The astype() function is the most straightforward method for converting data types in Pandas. It directly changes the data type of a column to the specified type.

Example

import pandas as pd

# Creating sample DataFrame
df = pd.DataFrame({'Name': ['Prince', 'Mukul', 'Divyang', 'Rohit'],
                   'Age': ['25', '30', '35', '40'],
                   'Salary': ['50000', '60000', '70000', '80000']})

print("Original DataFrame:")
print(df.dtypes)
print(df)

# Converting Age column to integer using astype()
df['Age'] = df['Age'].astype(int)

print("\nAfter conversion:")
print(df.dtypes)
print(df)

Output

Original DataFrame:
Name      object
Age       object
Salary    object
dtype: object
      Name Age Salary
0   Prince  25  50000
1    Mukul  30  60000
2  Divyang  35  70000
3    Rohit  40  80000

After conversion:
Name       object
Age         int64
Salary     object
dtype: object
      Name  Age Salary
0   Prince   25  50000
1    Mukul   30  60000
2  Divyang   35  70000
3    Rohit   40  80000
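
The example above converts only the Age column, so Salary is still stored as strings. astype() also accepts a dictionary that maps column names to target types, which converts several columns in one call. A minimal sketch, assuming the same column names as above (the data is re-created so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Prince', 'Mukul'],
                   'Age': ['25', '30'],
                   'Salary': ['50000', '60000']})

# Pass a {column: dtype} mapping to convert several columns at once;
# columns not listed in the mapping (Name) are left unchanged
df = df.astype({'Age': 'int64', 'Salary': 'int64'})

print(df.dtypes)
```

Naming 'int64' explicitly, rather than int, keeps the resulting width the same across platforms.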

Important: astype(int) raises a ValueError if any string cannot be parsed as an integer, for example 'unknown' or an empty string. Missing values also make the conversion fail, because NumPy integer types cannot represent NaN.
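
To see this failure mode in practice, the sketch below wraps astype(int) in a try/except block. The helper name try_astype_int is hypothetical and exists only for illustration; it is not part of pandas.

```python
import pandas as pd

def try_astype_int(series):
    """Hypothetical helper: return the converted Series, or None on failure."""
    try:
        return series.astype(int)
    except ValueError as exc:
        # A single unparseable value rejects the whole conversion
        print(f"Conversion failed: {exc}")
        return None

clean = try_astype_int(pd.Series(['25', '30']))       # converts normally
dirty = try_astype_int(pd.Series(['25', 'unknown']))  # prints the error, returns None
```

Because the conversion is all-or-nothing, one bad value is enough to leave the column unconverted.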

Using the to_numeric() Function

The pd.to_numeric() function provides more flexibility and error handling options when converting strings to numeric types.

Example

import pandas as pd

# Creating DataFrame with potentially problematic data
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
                   'Age': ['25', '30', 'unknown', '40'],
                   'Score': ['85', '92', '78', '']})

print("Original DataFrame:")
print(df.dtypes)
print(df)

# Converting with error handling
df['Age_numeric'] = pd.to_numeric(df['Age'], errors='coerce')
df['Score_numeric'] = pd.to_numeric(df['Score'], errors='coerce')

print("\nAfter conversion with error handling:")
print(df.dtypes)
print(df)

Output

Original DataFrame:
Name     object
Age      object
Score    object
dtype: object
      Name      Age Score
0    Alice       25    85
1      Bob       30    92
2  Charlie  unknown    78
3    Diana       40      

After conversion with error handling:
Name             object
Age              object
Score            object
Age_numeric     float64
Score_numeric   float64
dtype: object
      Name      Age Score  Age_numeric  Score_numeric
0    Alice       25    85         25.0           85.0
1      Bob       30    92         30.0           92.0
2  Charlie  unknown    78          NaN           78.0
3    Diana       40              40.0            NaN
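
The result type of to_numeric() depends on the data, and its default behavior is errors='raise' rather than 'coerce'. The following sketch, using made-up sample values, contrasts the cases:

```python
import pandas as pd

clean = pd.Series(['10', '20', '30'])
dirty = pd.Series(['10', 'unknown', '30'])

# Every value parses as a whole number, so the result has an integer dtype
clean_num = pd.to_numeric(clean)
print(clean_num.dtype)

# Coercion replaces 'unknown' with NaN, which forces a float dtype
coerced = pd.to_numeric(dirty, errors='coerce')
print(coerced.dtype)

# With the default errors='raise', the invalid value stops the conversion
try:
    pd.to_numeric(dirty)
except ValueError as exc:
    print(f"Conversion failed: {exc}")
```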

Comparison of Methods

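Method | Error Handling | Best For | Output Type
astype() | Raises ValueError on invalid values | Clean data with valid integers | Exact type specified
to_numeric() | Flexible (errors='raise' or 'coerce'; 'ignore' is deprecated since pandas 2.2) | Data with potential invalid values | Inferred from the data: int64 when all values are whole numbers, float64 when NaN or decimals are present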

Converting to Integer After to_numeric()

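to_numeric() infers the result type from the data: it returns int64 when every value parses as a whole number, and float64 when coercion introduces NaN. Chaining astype(int) makes the integer target explicit, but it only works when the column contains no NaN values: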

import pandas as pd

df = pd.DataFrame({'Values': ['10', '20', '30', '40']})

# Convert to numeric then to integer
df['Values'] = pd.to_numeric(df['Values']).astype(int)

print(df.dtypes)
print(df)

Output

Values    int64
dtype: object
   Values
0      10
1      20
2      30
3      40
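
When the column does contain invalid or missing values, astype(int) fails on the resulting NaN. One option, shown here as a sketch rather than the only approach, is the pandas nullable integer dtype 'Int64' (capital I), which stores missing entries as <NA> while keeping the valid values as integers:

```python
import pandas as pd

ages = pd.Series(['25', '30', 'unknown', '40'])

# Coerce invalid strings to NaN, then switch to the nullable integer dtype
ages_int = pd.to_numeric(ages, errors='coerce').astype('Int64')

print(ages_int.dtype)
print(ages_int)
```

If a placeholder value is acceptable instead, filling the gaps first (for example with fillna(0)) and then calling astype(int) is an alternative, at the cost of no longer distinguishing missing entries from real zeros.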

Conclusion

Use astype(int) for clean string data that you're confident contains only valid integers. Use pd.to_numeric() with errors='coerce' for data that may contain invalid values, as it provides better error handling and converts problematic values to NaN.

---
Updated on: 2026-03-27T09:37:47+05:30
