Conversion Functions in Pandas DataFrame
Pandas is one of the most powerful Python libraries for high-performance data manipulation and analysis. It lets us work with tabular data, such as spreadsheets, CSV files, and SQL tables, through its DataFrame structure.
A DataFrame is a two-dimensional labeled data structure that organizes data into rows and columns. Each column can hold a different data type.
DataFrame:
   Integers  Floats Strings      Dates
0       1.0   1.300       p 2023-05-07
1       2.0     NaN       y 2023-05-14
2       5.0   4.600       t 2023-05-21
3       3.0   1.020       h 2023-05-28
4       6.0   0.300       o 2023-06-04
5       NaN   0.001       n 2023-06-11
The DataFrame shown above has 6 rows and 4 columns, and each column holds a different type of data.
Conversion functions change the data type of the elements in a DataFrame object. In this article, we will discuss two type-conversion functions for pandas DataFrames: DataFrame.convert_dtypes() and DataFrame.astype().
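The table above can be rebuilt with a short sketch (values copied from the table; the construction code itself is an assumption, since the original does not show it). It shows why the "Integers" column prints as 1.0, 2.0, and so on: a single missing value forces an otherwise whole-number column to float64, which is exactly the kind of situation conversion functions address.

```python
import numpy as np
import pandas as pd

# Values copied from the sample table; the NaN in "Integers" is deliberate
df = pd.DataFrame({
    "Integers": [1, 2, 5, 3, 6, np.nan],
    "Floats": [1.3, np.nan, 4.6, 1.02, 0.3, 0.001],
    "Strings": ["p", "y", "t", "h", "o", "n"],
    "Dates": pd.to_datetime(["2023-05-07", "2023-05-14", "2023-05-21",
                             "2023-05-28", "2023-06-04", "2023-06-11"]),
})

print(df.shape)               # (6, 4)
print(df["Integers"].dtype)   # float64: the NaN prevents an integer column
print(df.dtypes)
```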
Input Output Scenarios
Let’s see the input-output scenarios to understand how typecasting can be done by using the conversion functions.
Suppose we have a DataFrame whose columns have different data types. The output is a DataFrame with the same values but updated column data types.
Input DataFrame:
   ints  strs  ints2  floats
0     1     x   10.0     NaN
1     2     y    NaN   100.5
2     3   NaN   20.0   200.0

Data types of each column:
ints        int64
strs       object
ints2     float64
floats    float64

Output DataFrame:
   ints  strs  ints2  floats
0     1     x     10    <NA>
1     2     y   <NA>   100.5
2     3  <NA>     20   200.0

Data types of the resulting DataFrame:
ints        Int64
strs       string
ints2       Int64
floats    Float64
The DataFrame.convert_dtypes() function
The pandas DataFrame.convert_dtypes() function converts each column to the best possible data type that supports the missing-value marker pd.NA, and returns a new DataFrame object with the updated dtypes.
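A minimal sketch of the two properties just described, using a hypothetical one-column DataFrame: the converted column uses pd.NA for its missing value, and the original DataFrame is left untouched because a new object is returned.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column DataFrame: strings with a NaN gap
df = pd.DataFrame({"d": ["h", "i", np.nan]})

result = df.convert_dtypes()

print(result["d"].dtype)            # string
print(result.loc[2, "d"] is pd.NA)  # True: the gap is now pd.NA

# The original DataFrame is not modified; its gap is still NaN
print(df.loc[2, "d"])               # nan
print(df.loc[2, "d"] is pd.NA)      # False
```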
Syntax
DataFrame.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True)
Parameters
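All parameters default to True. Each one switches a particular kind of conversion on or off:
infer_objects: whether object dtypes should be converted to the best possible types.
convert_string: whether object dtypes should be converted to StringDtype().
convert_integer: whether, if possible, conversion can be done to integer extension types.
convert_boolean: whether object dtypes should be converted to BooleanDtype().
convert_floating: whether, if possible, conversion can be done to floating extension types. If convert_integer is also True, integer dtypes are preferred when the floats can be faithfully cast to integers.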
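To see how these flags interact, here is a sketch (the small DataFrame is hypothetical) that switches two of them off. With convert_integer=False, a float column of whole numbers falls back to the nullable Float64 dtype instead of Int64; with convert_boolean=False, an object column of booleans keeps its object dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c": [True, False, np.nan],  # object column: booleans plus a gap
    "e": [10, np.nan, 20],       # float64 column holding whole numbers
})

default = df.convert_dtypes()
no_int = df.convert_dtypes(convert_integer=False)
no_bool = df.convert_dtypes(convert_boolean=False)

print(default.dtypes)  # c -> boolean, e -> Int64
print(no_int.dtypes)   # e falls back to Float64
print(no_bool.dtypes)  # c stays object
```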
Example
In this example, we will convert the datatype of the DataFrame columns using the .convert_dtypes() method.
import pandas as pd
import numpy as np

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": ["x", "y", "z"],
                   "c": [True, False, np.nan],
                   "d": ["h", "i", np.nan],
                   "e": [10, np.nan, 20],
                   "f": [np.nan, 100.5, 200]})
print("Input DataFrame:")
print(df)
print('Data Types of the each column is: ')
print(df.dtypes)

# Convert the data type of columns
result = df.convert_dtypes()
print("Output DataFrame:")
print(result)
print('Data Types of the resultant DataFrame is: ')
print(result.dtypes)
Output
Input DataFrame:
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0
Data Types of the each column is: 
a      int64
b     object
c     object
d     object
e    float64
f    float64
dtype: object
Output DataFrame:
   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0
Data Types of the resultant DataFrame is: 
a      Int64
b     string
c    boolean
d     string
e      Int64
f    Float64
dtype: object
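First, we check the data types of the DataFrame columns using the dtypes attribute. The convert_dtypes() method then converts column “a” to Int64, columns “b” and “d” to string, column “c” to boolean, column “e” to Int64 (its non-missing values are all whole numbers), and column “f” to Float64. These are pandas' nullable extension dtypes (note the capital letters in Int64 and Float64), so the missing values now appear as <NA>.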
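The distinction between the nullable Int64 dtype and NumPy's int64 matters in practice. A short sketch, reusing the values of column “e”: casting a float column that contains NaN to int64 fails because int64 has no way to represent a missing value, while Int64 stores the gap as pd.NA.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 20])  # float64: the NaN forces a float column

# NumPy's int64 cannot represent a missing value, so this cast fails
try:
    s.astype("int64")
    int64_failed = False
except ValueError as exc:
    int64_failed = True
    print("int64 cast failed:", exc)

# The nullable Int64 extension dtype keeps the gap as pd.NA instead
nullable = s.astype("Int64")
print(nullable)
```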
The DataFrame.astype() function
The pandas DataFrame.astype() function casts a pandas object to a specified dtype. Following is the syntax:
DataFrame.astype(dtype, copy, errors)
Parameters
dtype: a data type (a string alias such as 'object', a numpy.dtype, a pandas extension dtype, or a Python type) to cast the entire DataFrame to, or a dict {col: dtype, …}, where col is a column label, to cast only the listed columns.
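copy: Defaults to True, in which case a copy of the data is returned. Be careful with copy=False: the result may then share data with the original, so changes to its values can propagate to other pandas objects. Either way, astype() does not change the dtypes of the original DataFrame; it returns the converted result.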
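errors: Defaults to 'raise', which raises an exception when a value cannot be cast to the requested dtype. With errors='ignore', exceptions are suppressed and the original object is returned unchanged.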
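The parameters above can be exercised with a small sketch (the DataFrame and its column names are hypothetical). It passes dtype as a NumPy dtype and as a Python type, then shows that errors='raise' stops on a value that cannot be cast, while errors='ignore' hands back the original data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"n": [1, 2, 3], "s": ["1", "2", "x"]})

# dtype given as a NumPy dtype and as a Python type
as_float = df["n"].astype(np.float64)
as_str = df["n"].astype(str)
print(as_float.dtype, as_str.tolist())

# errors='raise' (the default): "x" cannot become an integer, so astype raises
try:
    df.astype({"s": "int64"})
    raised = False
except ValueError:
    raised = True
print("raised:", raised)

# errors='ignore': the exception is suppressed and the original data comes back
unchanged = df.astype({"s": "int64"}, errors="ignore")
print(unchanged["s"].tolist())
```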
Example
In this example, we will convert the data type of all columns to an object type using the astype() function.
import pandas as pd

df = pd.DataFrame({'Integers': [1, 2, 5, 3, 6, 0],
                   'Floats': [1.3, None, 4.6, 1.02, 0.3, 0.001],
                   'Strings': ['p', 'y', 't', 'h', 'o', 'n'],
                   'Dates': pd.date_range('2023-05-04', periods=6, freq='W')})
print("Input DataFrame:")
print(df)
print('Data Types of each column is: ')
print(df.dtypes)

# Convert the data type of columns
result = df.astype('object')
print("Output DataFrame:")
print(result)
print('Data Types of the resultant DataFrame is: ')
print(result.dtypes)
Output
Input DataFrame:
   Integers  Floats Strings      Dates
0         1   1.300       p 2023-05-07
1         2     NaN       y 2023-05-14
2         5   4.600       t 2023-05-21
3         3   1.020       h 2023-05-28
4         6   0.300       o 2023-06-04
5         0   0.001       n 2023-06-11
Data Types of each column is: 
Integers             int64
Floats             float64
Strings             object
Dates       datetime64[ns]
dtype: object
Output DataFrame:
  Integers Floats Strings                Dates
0        1    1.3       p  2023-05-07 00:00:00
1        2    NaN       y  2023-05-14 00:00:00
2        5    4.6       t  2023-05-21 00:00:00
3        3   1.02       h  2023-05-28 00:00:00
4        6    0.3       o  2023-06-04 00:00:00
5        0  0.001       n  2023-06-11 00:00:00
Data Types of the resultant DataFrame is: 
Integers    object
Floats      object
Strings     object
Dates       object
dtype: object
The data type of every column has been converted to object.
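Casting everything to object discards the specialized dtypes, but the values themselves are unchanged. As a sketch, assuming we want to go back, DataFrame.infer_objects() (another pandas conversion method, not covered in the original examples) re-infers better dtypes for object columns. The "Strings" column is left out here to keep the check simple.

```python
import pandas as pd

df = pd.DataFrame({
    "Integers": [1, 2, 5, 3, 6, 0],
    "Floats": [1.3, None, 4.6, 1.02, 0.3, 0.001],
    "Dates": pd.date_range("2023-05-04", periods=6, freq="W"),
})

as_object = df.astype("object")
print(as_object.dtypes)  # every column is object

# Re-infer concrete dtypes from the Python objects stored in each column
restored = as_object.infer_objects()
print(restored.dtypes)   # int64, float64, datetime64 again
```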
Example
Let’s take another example to convert the dtype of a few columns by using a dictionary.
import pandas as pd

df = pd.DataFrame({'Integers': [1, 2, 5, 3, 6, 0],
                   'Floats': [1.3, None, 4.6, 1.02, 0.3, 0.001],
                   'Strings': ['p', 'y', 't', 'h', 'o', 'n'],
                   'Dates': pd.date_range('2023-05-04', periods=6, freq='W')})
print("Input DataFrame:")
print(df)
print('Data Types of each column is: ')
print(df.dtypes)

# Convert the data type of columns
result = df.astype({'Floats': 'object', 'Strings': 'category'})
print("Output DataFrame:")
print(result)
print('Data Types of the resultant DataFrame is: ')
print(result.dtypes)
Output
Input DataFrame:
   Integers  Floats Strings      Dates
0         1   1.300       p 2023-05-07
1         2     NaN       y 2023-05-14
2         5   4.600       t 2023-05-21
3         3   1.020       h 2023-05-28
4         6   0.300       o 2023-06-04
5         0   0.001       n 2023-06-11
Data Types of each column is: 
Integers             int64
Floats             float64
Strings             object
Dates       datetime64[ns]
dtype: object
Output DataFrame:
   Integers Floats Strings      Dates
0         1    1.3       p 2023-05-07
1         2    NaN       y 2023-05-14
2         5    4.6       t 2023-05-21
3         3   1.02       h 2023-05-28
4         6    0.3       o 2023-06-04
5         0  0.001       n 2023-06-11
Data Types of the resultant DataFrame is: 
Integers             int64
Floats              object
Strings           category
Dates       datetime64[ns]
dtype: object
The Floats and Strings columns are converted to the object and category dtypes respectively, while the Integers and Dates columns keep their original dtypes.
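The category dtype stores each distinct value once and keeps a small integer code per row. A minimal sketch, reusing the "Strings" values from the example, shows the inferred categories (sorted, since strings are sortable) and the codes that point into them.

```python
import pandas as pd

df = pd.DataFrame({"Strings": ["p", "y", "t", "h", "o", "n"]})
result = df.astype({"Strings": "category"})

# The distinct values become the categories, in sorted order
print(list(result["Strings"].cat.categories))  # ['h', 'n', 'o', 'p', 't', 'y']

# Each row stores an integer code indexing into that list
print(result["Strings"].cat.codes.tolist())    # [3, 5, 4, 0, 2, 1]
```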