Get the data type of a column in Pandas - Python


Pandas is a popular and powerful Python library for data analysis and manipulation. Its core data structures, the Series and the DataFrame, are designed for working with tabular and time-series data. (The older Panel structure has been removed from current versions of Pandas.)

A Pandas DataFrame is a two-dimensional tabular data structure in which each column can hold a different data type. We often need to know a column's data type before cleaning, converting, or computing on it, so this article goes through several ways to determine it.

Before moving forward, let's create a sample DataFrame whose column data types we will inspect.

import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamorghini'],'price': [5000000, 600000, 7000000]})

print(df)

Output

This Python script prints the DataFrame that we created.

  Vehicle name    price
0        Supra  5000000
1        Honda   600000
2   Lamorghini  7000000

The task can be approached in the following ways.

Approaches

  • Using the dtypes attribute

  • Using select_dtypes()

  • Using the info() method

  • Using the describe() function

Now let's discuss each approach and how it can be used to get the data type of a column in Pandas.

Method 1: Using the dtypes attribute

The dtypes attribute gives the data type of every column in the DataFrame. It returns a Series indexed by column name, with each column's data type as the value. The syntax is:

Syntax

df.dtypes

Return type: a Series containing the data type of each column in the DataFrame.

Algorithm

  • Import the Pandas library.

  • Create a DataFrame using the pd.DataFrame() function and pass the sample data as a dictionary.

  • Use the dtypes attribute to get the data types of each column in the DataFrame.

  • Print the result to check the data types of each column.

Example 1

# import the Pandas library
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamorghini'],'price': [5000000, 600000, 7000000]})

# print the dataframe
print("DataFrame:\n", df)

# get the data types of each column
print("\nData types of each column:")
print(df.dtypes)

Output

DataFrame:
   Vehicle name    price
0        Supra  5000000
1        Honda   600000
2   Lamorghini  7000000

Data types of each column:
Vehicle name    object
price            int64
dtype: object
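Because df.dtypes is an ordinary Series, the usual Series operations apply to it. The sketch below is only an illustration (the variable name dtype_names is an assumption, not part of the original example); it converts the result into a plain dictionary that maps each column name to the name of its data type.

```python
import pandas as pd

# same sample dataframe as in the example above
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamorghini'],
                   'price': [5000000, 600000, 7000000]})

# df.dtypes is a Series: index = column names, values = dtype objects.
# Converting the values to strings gives a simple name -> type mapping.
dtype_names = df.dtypes.astype(str).to_dict()
print(dtype_names)

# look up the type name of one column
print(dtype_names['price'])
```

A dictionary like this can be convenient for logging or for comparing the column types of two DataFrames.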

Example 2

In this example, we get the data type of a single column of the DataFrame.

# import the Pandas library
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamorghini'],'price': [5000000, 600000, 7000000]})

# print the dataframe
print("DataFrame:\n", df)

# get the data type of the column named price
print("\nData type of the column named price:")
print(df.dtypes['price'])

Output

DataFrame:
   Vehicle name    price
0        Supra  5000000
1        Honda   600000
2   Lamorghini  7000000

Data type of the column named price:
int64
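The data type of a single column can also be read from the column itself through the dtype attribute of the Series. The following sketch shows this and, as an optional extra, uses the is_numeric_dtype helper from pandas.api.types to test the type instead of comparing names by hand. The variable name price_dtype is chosen here for illustration only.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# same sample dataframe as in the example above
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamorghini'],
                   'price': [5000000, 600000, 7000000]})

# read the data type directly from the column (a Series)
price_dtype = df['price'].dtype
print("Data type of price:", price_dtype)

# test whether a column holds numeric data
print("price is numeric:", is_numeric_dtype(df['price']))
print("Vehicle name is numeric:", is_numeric_dtype(df['Vehicle name']))
```

Using a helper such as is_numeric_dtype avoids listing every integer and float width explicitly when all that matters is whether the column is numeric.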

Method 2: Using select_dtypes()

The select_dtypes() method returns the subset of a DataFrame's columns whose data types match those supplied through its include and exclude parameters. We can use it to pick out the columns of a specific data type and then read the data type of each selected column.

Algorithm

  • Import the Pandas library.

  • Create a DataFrame using the pd.DataFrame() function and pass the given data as a dictionary.

  • Print the DataFrame to check the created data.

  • Use the select_dtypes() method to select all the numeric columns from the DataFrame. Pass the list of data types that we want to select through the include parameter.

  • Loop over the selected columns and print the data type of each one.

Example

# import the Pandas library
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamorghini'],'price': [5000000, 600000, 7000000]})

# print the dataframe
print("DataFrame:\n", df)

# select the numeric columns
numeric_cols = df.select_dtypes(include=['float64', 'int64']).columns

# get the data type of each numeric column
for col in numeric_cols:
    print("Data Type of column", col, "is", df[col].dtype)

Output

DataFrame:
   Vehicle name    price
0        Supra  5000000
1        Honda   600000
2   Lamorghini  7000000
Data Type of column price is int64
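Listing 'float64' and 'int64' explicitly would miss columns stored with other widths, such as int32 or float32. As a hedged alternative, select_dtypes() also accepts the generic selector 'number', and its exclude parameter returns the complementary set of columns. The variable names below are illustrative.

```python
import pandas as pd

# same sample dataframe as in the example above
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamorghini'],
                   'price': [5000000, 600000, 7000000]})

# 'number' matches every numeric data type, whatever its width
numeric_df = df.select_dtypes(include='number')

# exclude returns the columns that are NOT numeric
non_numeric_df = df.select_dtypes(exclude='number')

print("Numeric columns:", list(numeric_df.columns))
print("Non-numeric columns:", list(non_numeric_df.columns))
```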

Method 3: Using the info() method

We can also use the info() method for our task. It prints a concise summary of a DataFrame, including the data type and non-null count of each column. The syntax is:

Syntax

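DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)

Return value: None. The summary is printed to standard output, or written to buf if one is given. (Older Pandas versions named the last parameter null_counts; it was replaced by show_counts and removed in Pandas 2.0.)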

Algorithm

  • Import the Pandas library.

  • Create a DataFrame using the pd.DataFrame() function and pass the above data as a dictionary.

  • Print the DataFrame to check the created data.

  • Call the info() method to get information about the DataFrame.

  • The method prints the summary itself and returns None, so no separate print() call is needed.

Example

# import the Pandas library
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamorghini'],'price': [5000000, 600000, 7000000]})

# print the dataframe
print("DataFrame:\n", df)

# use the info() method to get the data type of each column
# (info() prints the summary itself and returns None)
df.info()

Output

DataFrame:
   Vehicle name    price
0        Supra  5000000
1        Honda   600000
2   Lamorghini  7000000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Vehicle name  3 non-null      object
 1   price         3 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
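Since info() writes its summary to standard output and returns None, the text cannot be captured from the return value. One way to obtain it as a string, sketched here with an in-memory buffer, is to pass a writable object through the buf parameter. The names buffer, result, and summary are assumptions made for this illustration.

```python
import io
import pandas as pd

# same sample dataframe as in the example above
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamorghini'],
                   'price': [5000000, 600000, 7000000]})

# info() writes into the buffer instead of printing to the console
buffer = io.StringIO()
result = df.info(buf=buffer)

# the summary text, including the Dtype column, is now a normal string
summary = buffer.getvalue()
print(summary)

# the return value is still None
print("Return value:", result)
```

Capturing the summary this way can be useful when it needs to be written to a log file rather than shown on screen.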

Method 4: Using the describe() function

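The describe() method generates descriptive statistics of a DataFrame. It does not report the data types of the original columns directly, but the dtypes of the summary it returns give an indirect hint: numeric columns appear as float64 and non-numeric columns as object. For the exact type of a column, such as the int64 of price, use the dtypes attribute of the original DataFrame.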

Algorithm

  • Import the Pandas library using the import statement.

  • Create a DataFrame using the pd.DataFrame() function and pass the given data as a dictionary.

  • Print the DataFrame to check the created data.

  • Use the describe() method to get the descriptive statistics of the DataFrame.

  • Set the include parameter of the describe() method to 'all' so that every column is included in the descriptive statistics.

  • Get the data type of each column of the resulting summary using the dtypes attribute.

  • Print the data type of each column of the summary.

Example

# import the Pandas library
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamorghini'],'price': [5000000, 600000, 7000000]})

# print the dataframe
print("DataFrame:\n", df)

# use the describe() method to get the descriptive statistics of the dataframe
desc_stats = df.describe(include='all')

# get the data type of each column of the summary
dtypes = desc_stats.dtypes

# print the data type of each column
print("Data type of each column in the descriptive statistics:\n", dtypes)

Output

DataFrame:
   Vehicle name    price
0        Supra  5000000
1        Honda   600000
2   Lamorghini  7000000
Data type of each column in the descriptive statistics:
 Vehicle name     object
price           float64
dtype: object
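Note that price is reported as float64 in this output although the original column is int64, because the summary stores statistics such as the mean and standard deviation. The sketch below places the two sets of data types side by side to make the difference visible. The comparison DataFrame and its column labels are illustrative names.

```python
import pandas as pd

# same sample dataframe as in the example above
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamorghini'],
                   'price': [5000000, 600000, 7000000]})

# data types of the original columns vs. data types of the describe() summary
comparison = pd.DataFrame({
    'original': df.dtypes,
    'describe': df.describe(include='all').dtypes,
})
print(comparison)
```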

Conclusion

Knowing the data type of each column helps us carry out data manipulation and analysis tasks efficiently. Each approach has its own strengths: the dtypes attribute is the most direct way to read column types, select_dtypes() filters columns by type, info() prints the types alongside non-null counts and memory usage, and describe() reflects only the types of its summary statistics. Choose the method that fits the level of detail you need and your preferred coding style.

Updated on: 29-May-2023
