How to Convert String to Integer in Pandas DataFrame?


Python is one of the most popular programming languages for data analysis and manipulation, and for good reason. With its intuitive syntax and rich ecosystem of libraries, Python provides a powerful platform for working with data. One such library is Pandas, a highly versatile tool for data manipulation and analysis. Pandas allows us to easily manipulate and transform data in a variety of ways, making it an essential part of any data analyst's or data scientist's toolkit.

In this tutorial, we'll be focusing on one specific problem that often arises in data analysis: converting string data to integer data in Pandas DataFrames. As data analysts, we often encounter data that is stored as strings, even though it would be more useful as numeric data. In the next section of the article, we'll explore a few methods for converting string data to integer data in Pandas, including using the astype() and to_numeric() functions. We'll also discuss some best practices and considerations to keep in mind when working with data conversions.

How to Convert String to Integer in Pandas DataFrame?

Converting string data to integer data in Pandas can be done in several ways. This section covers the two most common: the astype() method and the to_numeric() function.

Method 1: Using the astype() Function

The astype() function in Pandas allows us to change the data type of a column in a DataFrame. This method is straightforward and useful when converting string data to integer data in Pandas. We can apply the astype() function to the desired column and specify the desired data type, which in this case would be 'int'.

To use the astype() function for data conversion, we call it on the column we want to convert, pass the target data type, and assign the result back to that column. For example, the following code demonstrates how to convert a string column "Age" to an integer column using the astype() function:

Example

# Importing required libraries
import pandas as pd

# Creating sample DataFrame
df = pd.DataFrame({'Name': ['Prince', 'Mukul', 'Divyang', 'Rohit'],
                   'Age': ['25', '30', '35', '40'],
                   'Salary': ['50000', '60000', '70000', '80000']})

# Converting Age column to integer using astype() function
df['Age'] = df['Age'].astype(int)

# Output
print(df.dtypes)
print(df)

In the above code, we created a sample DataFrame named "df" in which every column holds strings. Next, we use the "astype()" function to convert the "Age" column to an integer. We pass "int" as the argument so that the values are converted to an integer data type. astype() returns a new column rather than modifying the existing one, so we assign the result back to df['Age'].

Finally, we print the data types of the columns using the "dtypes" attribute and the DataFrame using the "print()" function to see the changes made to the "Age" column.

Output

The output of the above code will look something like this:

Name      object
Age        int32
Salary    object
dtype: object

       Name  Age Salary
0    Prince   25  50000
1     Mukul   30  60000
2   Divyang   35  70000
3     Rohit   40  80000

As we can see from the output above, the "Age" column has been successfully converted to an integer data type, represented by the "int32" value in the data types output. The exact width depends on the platform and NumPy version, so on many systems this will show as "int64" instead. The DataFrame still has three columns (Name, Age, and Salary), but the Age column now contains integer values instead of string values.
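The same approach extends to more than one column. The following is a minimal sketch, assuming we also want "Salary" as an integer; it passes a dictionary of column names and target types to DataFrame.astype() and names 'int64' explicitly so that the result does not depend on the platform default. The shortened sample data is only for illustration.

```python
import pandas as pd

# Shortened sample data, all columns stored as strings
df = pd.DataFrame({'Name': ['Prince', 'Mukul'],
                   'Age': ['25', '30'],
                   'Salary': ['50000', '60000']})

# A dict maps each column to its target dtype; columns not listed are unchanged
df = df.astype({'Age': 'int64', 'Salary': 'int64'})

print(df.dtypes)
```

Once both columns are integers, arithmetic such as df['Salary'].sum() operates on numbers rather than concatenating strings.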

It's important to note that when using the astype() function, every string value must be convertible to an integer; otherwise, a ValueError is raised and nothing is converted. This means the method fails if the column contains non-numeric characters, and it also cannot produce a plain integer column when values are missing.
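To illustrate the failure case, here is a small sketch with a made-up column in which one entry ('thirty') is not a valid integer. The try/except wrapper and the conversion_failed flag are illustrative additions, not part of the Pandas API.

```python
import pandas as pd

# Hypothetical column with one value that cannot be parsed as an integer
ages = pd.Series(['25', 'thirty', '35'])

try:
    ages.astype(int)
    conversion_failed = False
except ValueError as exc:
    # astype() rejects the whole column as soon as one value is invalid
    conversion_failed = True
    print(f"Conversion failed: {exc}")
```

Because the conversion is all-or-nothing, the original column is left untouched when the error is raised.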

Method 2: Using the to_numeric() Function

The to_numeric() function in Pandas is another useful method for converting string columns to integer data type. This function allows us to convert various data types to numeric type, including strings. It provides more flexibility in handling conversion errors and offers additional parameters to customize the conversion process.

To use the to_numeric() function, we pass it the column we want to convert. Unlike astype(), it does not take a target data type; it infers a suitable numeric type from the values, and optional parameters control error handling and downcasting. For example, the following code demonstrates how to convert a string column "Age" to an integer column using the to_numeric() function:

Example

# Importing required libraries
import pandas as pd

# Creating sample DataFrame
df = pd.DataFrame({'Name': ['Prince', 'Mukul', 'Divyang', 'Rohit'],
                   'Age': ['25', '30', '35', '40'],
                   'Salary': ['50000', '60000', '70000', '80000']})

# Converting Age column to integer using to_numeric() function
df['Age'] = pd.to_numeric(df['Age'], errors='coerce', downcast='integer')

# Output
print(df.dtypes)
print(df)

In the code above, we use the pd.to_numeric() function to convert the 'Age' column of the DataFrame from string to integer. The function takes the column as its first argument, along with two optional parameters: errors='coerce' and downcast='integer'. The errors='coerce' parameter replaces any value that cannot be parsed with NaN (Not a Number) instead of raising an error. The downcast='integer' parameter reduces memory usage by downcasting the result to the smallest integer data type that can hold the values. Note that if any value is coerced to NaN, the column becomes a floating-point column and is not downcast to an integer type.

Output

The output of the above code will look something like this:

Name      object
Age         int8
Salary    object
dtype: object

      Name  Age Salary
0   Prince   25  50000
1    Mukul   30  60000
2  Divyang   35  70000
3    Rohit   40  80000

In the output above, we can see that the 'Age' column has been converted to the int8 data type, an 8-bit signed integer type with a range from -128 to 127. This is the smallest integer type that can hold the sample ages. The 'Name' and 'Salary' columns remain as object (string) data types.

The DataFrame itself is displayed with the updated 'Age' column, where the string values have been converted to their corresponding integer values.
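The two parameters can be seen more clearly on data that does not fit neatly into int8. The sketch below uses made-up Series: one with larger values to show how downcast='integer' picks a wider type, and one with an unparseable entry ('unknown') to show what errors='coerce' does. Converting to the nullable 'Int64' type at the end is one possible way to keep integers alongside missing values, not the only one.

```python
import pandas as pd

# downcast='integer' picks the smallest integer dtype that fits the values
salary = pd.to_numeric(pd.Series(['50000', '60000', '70000', '80000']),
                       downcast='integer')
print(salary.dtype)  # these values exceed the int8 and int16 ranges

# errors='coerce' turns unparseable entries into NaN, which forces a float dtype
ages = pd.to_numeric(pd.Series(['25', 'unknown', '35']), errors='coerce')
print(ages.dtype)

# Assumption: we want integers plus a missing-value marker, so we use
# the nullable Int64 extension dtype (capital "I"), which stores <NA>
ages_nullable = ages.astype('Int64')
print(ages_nullable)
```

Note the difference in capitalization: 'int64' is the regular NumPy integer type, which cannot hold missing values, while 'Int64' is the Pandas nullable integer type.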

We have now converted a string column to an integer column in a Pandas DataFrame using both the astype() function and the to_numeric() function.

Conclusion

In this tutorial, we explored two methods for converting string data to integer data in Pandas DataFrames: astype() and to_numeric(). astype() is a good fit when we know the string column contains only valid integers; if any value cannot be converted, it raises a ValueError. to_numeric() is more flexible in handling conversion errors: with errors='coerce', non-numeric values become NaN instead of raising an error. Keep in mind that a column containing NaN is stored as a floating-point type, so missing values have to be handled before the column can hold a plain integer data type. We have provided an example for each method. Overall, the right choice depends on how clean the data is and on the specific requirements of the project.

Updated on: 24-Jul-2023
