Conversion Functions in Pandas DataFrame


Pandas is one of the most powerful Python libraries for high-performance data manipulation and analysis. It lets us work with tabular data such as spreadsheets, CSV files, and SQL tables through the DataFrame object.

A DataFrame is a two-dimensional labeled data structure that represents data in rows and columns. Each column can hold a different data type.

DataFrame:
   Integers  Floats Strings      Dates
0       1.0   1.300       p 2023-05-07
1       2.0     NaN       y 2023-05-14
2       5.0   4.600       t 2023-05-21
3       3.0   1.020       h 2023-05-28
4       6.0   0.300       o 2023-06-04
5       NaN   0.001       n 2023-06-11

The DataFrame above has 6 rows and 4 columns, and each column holds a different data type.
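As a rough sketch, a DataFrame like the one above could be built as follows. The variable name df and the freq="7D" spacing are choices made here for illustration. Note that the missing value in the Integers column forces that column to a floating-point dtype, which is why its values display as 1.0, 2.0, and so on.

```python
import numpy as np
import pandas as pd

# Build a small DataFrame whose columns hold different kinds of data.
df = pd.DataFrame({
    "Integers": [1, 2, 5, 3, 6, np.nan],            # NaN forces float64
    "Floats": [1.3, np.nan, 4.6, 1.02, 0.3, 0.001],
    "Strings": list("python"),
    "Dates": pd.date_range("2023-05-07", periods=6, freq="7D"),
})

print(df.shape)   # (6, 4): 6 rows and 4 columns
print(df.dtypes)  # one dtype per column
```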

Conversion functions change the data type of the elements in a DataFrame object. This article discusses two type-conversion functions of the Pandas DataFrame: convert_dtypes() and astype().

Input Output Scenarios

Let’s look at an input-output scenario to understand how typecasting can be done with the conversion functions.

Assume we have a DataFrame with a few columns of different data types. The output is a DataFrame with updated column data types.

Input DataFrame:
   ints strs  ints2  floats
0     1    x   10.0     NaN
1     2    y    NaN   100.5
2     3  NaN   20.0   200.0

Data types of each column:
ints        int64
strs       object
ints2     float64
floats    float64

Output DataFrame:
   ints  strs  ints2  floats
0     1     x     10    <NA>
1     2     y   <NA>   100.5
2     3  <NA>     20   200.0

Data types of the resultant DataFrame:
ints        Int64
strs       string
ints2       Int64
floats    Float64

The DataFrame.convert_dtypes() function

The pandas DataFrame.convert_dtypes() function converts each column to the best possible data type, using dtypes that support pd.NA. It returns a new DataFrame object with the updated dtypes.

Syntax

DataFrame.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True)

Parameters

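All of these parameters default to True. infer_objects indicates whether object dtypes should be converted to the best possible types. convert_string and convert_boolean indicate whether object dtypes should be converted to the string and boolean dtypes. convert_integer and convert_floating indicate whether, if possible, columns can be converted to the integer and floating extension types. When both are True, integer dtypes are preferred if the floats can be faithfully cast to integers.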
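To see the effect of one of these flags, the following minimal sketch compares the default call with convert_integer=False. The column names and the variables default and no_int are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["x", "y", None],
    "count": [10, np.nan, 20],   # float64 because of the NaN
})

# All flags left at True: whole-number floats become nullable integers.
default = df.convert_dtypes()

# Integer conversion switched off: the column stays floating point,
# but still moves to the nullable Float64 extension type.
no_int = df.convert_dtypes(convert_integer=False)

print(default["count"].dtype)  # Int64
print(no_int["count"].dtype)   # Float64
```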

Example

In this example, we will convert the datatype of the DataFrame columns using the .convert_dtypes() method.

import pandas as pd
import numpy as np

df = pd.DataFrame({"a":[1, 2, 3],
   "b": ["x", "y", "z"],
   "c": [True, False, np.nan],
   "d": ["h", "i", np.nan],
   "e": [10, np.nan, 20],
   "f": [np.nan, 100.5, 200]})
print("Input DataFrame:")
print(df)
print('Data Types of the each column is: ')
print(df.dtypes)

# Convert the data type of columns
result = df.convert_dtypes()
print("Output DataFrame:")
print(result)
print('Data Types of the resultant DataFrame is: ')
print(result.dtypes)

Output

Input DataFrame:
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0

Data Types of the each column is: 
a      int64
b     object
c     object
d     object
e    float64
f    float64
dtype: object

Output DataFrame:
   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0

Data Types of the resultant DataFrame is: 
a      Int64
b     string
c    boolean
d     string
e      Int64
f    Float64
dtype: object

Initially, we check the data types of the DataFrame columns using the dtypes attribute. Then the convert_dtypes() method converts column “a” to Int64, “b” and “d” to string, “c” to boolean, “e” to Int64, and “f” to Float64. Missing values in the converted columns are represented by pd.NA.
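The difference between the two representations of missing data can be sketched on a single column. This example uses Series.convert_dtypes(), the Series counterpart of the method above; the names s_float and s_int are illustrative only.

```python
import numpy as np
import pandas as pd

s_float = pd.Series([10, np.nan, 20])  # float64, missing value is NaN
s_int = s_float.convert_dtypes()       # Int64, missing value is pd.NA

print(s_float.dtype)          # float64
print(s_int.dtype)            # Int64
print(s_int.isna().tolist())  # [False, True, False]
print(s_int.sum())            # 30, missing values are skipped by default
```

The values stay integers after conversion, while the missing entry is still detected by isna().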

The DataFrame.astype() function

The pandas DataFrame.astype() function casts a pandas object to a specified dtype. Following is the syntax –

DataFrame.astype(dtype, copy, errors)

Parameters

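  • dtype: A single data type, or a dict {col: dtype, …} where col is a column label and dtype is a numpy.dtype or Python type. Pass a single type to cast the entire DataFrame, or a dict to cast one or more specific columns.

  • copy: Whether to return a copy. The default value is True in the pandas releases this article targets; newer releases may differ. astype() does not modify the original DataFrame in place. With copy=False the result may share data with the original, so changes to the values may then propagate to other pandas objects.

  • errors: The default value is ‘raise’, which raises an exception when the cast fails. With ‘ignore’, the exception is suppressed and the original object is returned.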
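The behaviour of the errors parameter can be sketched as follows. The column name qty and the variables raised and unchanged are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({"qty": ["1", "2", "three"]})

# errors='raise' (the default): an invalid value aborts the cast.
try:
    df.astype({"qty": "int64"})
    raised = False
except ValueError as exc:
    raised = True
    print("Conversion failed:", exc)

# errors='ignore': the exception is suppressed and the data comes
# back with its original dtype.
unchanged = df.astype({"qty": "int64"}, errors="ignore")
print(unchanged["qty"].tolist())
```

Because ‘ignore’ hides the failure, it is usually worth checking the resulting dtypes afterwards.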

Example

In this example, we will convert the data type of all columns to an object type using the astype() function.

import pandas as pd
df = pd.DataFrame({'Integers':[1, 2, 5, 3, 6, 0],
   'Floats': [1.3, None, 4.6, 1.02, 0.3, 0.001],
   'Strings': ['p', 'y', 't', 'h', 'o', 'n'],
   'Dates': pd.date_range('2023-05-04', periods=6, freq='W')})
print("Input DataFrame:")
print(df)
print('Data Types of each column is: ')
print(df.dtypes)

# Convert the data type of columns
result = df.astype('object')
print("Output DataFrame:")
print(result)
print('Data Types of the resultant DataFrame is: ')
print(result.dtypes)

Output

Input DataFrame:
   Integers  Floats Strings      Dates
0         1   1.300       p 2023-05-07
1         2     NaN       y 2023-05-14
2         5   4.600       t 2023-05-21
3         3   1.020       h 2023-05-28
4         6   0.300       o 2023-06-04
5         0   0.001       n 2023-06-11

Data Types of each column is: 
Integers             int64
Floats             float64
Strings             object
Dates       datetime64[ns]
dtype: object

Output DataFrame:
  Integers Floats Strings                Dates
0        1    1.3       p  2023-05-07 00:00:00
1        2    NaN       y  2023-05-14 00:00:00
2        5    4.6       t  2023-05-21 00:00:00
3        3   1.02       h  2023-05-28 00:00:00
4        6    0.3       o  2023-06-04 00:00:00
5        0  0.001       n  2023-06-11 00:00:00

Data Types of the resultant DataFrame is: 
Integers    object
Floats      object
Strings     object
Dates       object
dtype: object

All the columns are converted to the object dtype.

Example

Let’s take another example to convert the dtype of a few columns by using a dictionary.

import pandas as pd
df = pd.DataFrame({'Integers':[1, 2, 5, 3, 6, 0],
   'Floats': [1.3, None, 4.6, 1.02, 0.3, 0.001],
   'Strings': ['p', 'y', 't', 'h', 'o', 'n'],
   'Dates': pd.date_range('2023-05-04', periods=6, freq='W')})
print("Input DataFrame:")
print(df)
print('Data Types of each column is: ')
print(df.dtypes)

# Convert the data type of columns
result = df.astype({'Floats':'object', 'Strings': 'category'})
print("Output DataFrame:")
print(result)
print('Data Types of the resultant DataFrame is: ')
print(result.dtypes)

Output

Input DataFrame:
   Integers  Floats Strings      Dates
0         1   1.300       p 2023-05-07
1         2     NaN       y 2023-05-14
2         5   4.600       t 2023-05-21
3         3   1.020       h 2023-05-28
4         6   0.300       o 2023-06-04
5         0   0.001       n 2023-06-11

Data Types of each column is: 
Integers             int64
Floats             float64
Strings             object
Dates       datetime64[ns]
dtype: object

Output DataFrame:
   Integers Floats Strings      Dates
0         1    1.3       p 2023-05-07
1         2    NaN       y 2023-05-14
2         5    4.6       t 2023-05-21
3         3   1.02       h 2023-05-28
4         6    0.3       o 2023-06-04
5         0  0.001       n 2023-06-11

Data Types of the resultant DataFrame is: 
Integers             int64
Floats              object
Strings           category
Dates       datetime64[ns]
dtype: object

The Floats column is converted to the object dtype and the Strings column to the category dtype. The other columns keep their original dtypes.
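A dictionary can also target the nullable dtypes described earlier. The sketch below is a variation on the example above, not part of it: it uses whole-number floats so that the cast to Int64 is lossless, and it inspects the categories that pandas infers for the Strings column.

```python
import pandas as pd

df = pd.DataFrame({
    "Strings": ["p", "y", "t", "h", "o", "n"],
    "Floats": [1.0, None, 4.0, 1.0, 0.0, 2.0],  # whole numbers plus one gap
})

# Cast each column to a different dtype in a single call.
result = df.astype({"Strings": "category", "Floats": "Int64"})

print(result.dtypes)
print(result["Strings"].cat.categories.tolist())  # sorted unique values
print(result["Floats"].isna().sum())              # 1 missing value kept
```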

Updated on: 30-May-2023
