What are the various text data types in Python pandas?


There are two ways to store textual data in Python pandas (from version 1.0.0 through 1.2.4, the latest version at the time of writing). In other words, pandas has two data types for text: object and StringDtype.

In versions of pandas before 1.0, object was the only dtype available for text. StringDtype was introduced in pandas 1.0 to overcome some disadvantages of the object dtype, and it is the recommended way to store textual data in newer versions. We can still use both object and StringDtype for text data.
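
One commonly cited disadvantage of the object dtype is that it accepts any Python object, so a "text" column can silently hold non-text values. The following is a minimal sketch of that behavior; the variable names `mixed` and `text` are illustrative only.

```python
import pandas as pd

# An object column accepts any Python object, so non-text values can slip in.
mixed = pd.Series(["a", 1, 2.5], dtype="object")
print(mixed.dtype)
print([type(v).__name__ for v in mixed])  # element types differ per row

# Casting to the string dtype turns every element into a Python str.
text = mixed.astype("string")
print(text.dtype)
print([type(v).__name__ for v in text])
```

After the cast, every element is a string, which makes the column's contents predictable for text operations.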

Let’s take an example: create a DataFrame from text data and check which dtype pandas assigns to it by default.

Object dtype

Create a pandas DataFrame with text data and verify the dtype of the data.

Example

import pandas as pd

dict_ = {'A': ['a', 'Aa'], 'B': ['b', 'Bb']} # declaring a dictionary

df = pd.DataFrame(dict_) # creating a DataFrame from the dictionary

print(df['A']) # printing column A values
print() # blank line between the two outputs
print(df['B']) # printing column B values

Explanation

In the above code, we create a dictionary of string data and assign it to the dict_ variable, then use dict_ to create a pandas DataFrame. This DataFrame has 2 columns and 2 rows, and all of the data in it is string data.

The last three lines of the code print each column. The dtype appears at the end of each column's output. Let’s verify the output below.

Output

0     a
1    Aa
Name: A, dtype: object

0     b
1    Bb
Name: B, dtype: object

The above output shows the values of column A and column B from our DataFrame, separated by a blank line. Here we can see that the dtype of each column is object by default. To use StringDtype, we need to specify it explicitly.
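
For data that already exists with the object dtype, one way to specify the string dtype explicitly is to cast with `astype("string")`. This is a small sketch reusing the same data as the example above; the names `col_a` and `df_text` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "Aa"], "B": ["b", "Bb"]})

# Convert a single column to the string dtype.
col_a = df["A"].astype("string")
print(col_a.dtype)

# Convert every column of the DataFrame at once.
df_text = df.astype("string")
print(df_text.dtypes)
```

Note that `astype` returns a new object and leaves the original `df` unchanged.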

String dtype

To use the string dtype, we pass either the string 'string' or a pd.StringDtype() instance to the dtype parameter. Let’s see some examples below.

Example

import pandas as pd

list_ = ['python', 'sample', 'string']
ds = pd.Series(list_, dtype='string')
print(ds)

Explanation

Here we create a pandas Series from a list of strings using pd.Series. Passing 'string' to the dtype parameter changes the dtype from the default object to string.

Output

0    python
1    sample
2    string
dtype: string

The above block is the output of the Series; here the dtype of the data is string. We can also use pd.StringDtype() to set the dtype to string. Let’s take another example.

Example

import pandas as pd

data = ['john', 'dev', 'philip'] # creating a list
ds = pd.Series(data, dtype=pd.StringDtype()) # Series creation
print(ds)

In this example we again create a pandas Series from a list of strings, this time passing pd.StringDtype() to the dtype parameter.

Output

0      john
1       dev
2    philip
dtype: string

The block above shows the output when pd.StringDtype() is passed to the dtype parameter.
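
To check that the two spellings are interchangeable, we can build the same Series both ways and compare the results. This is a minimal sketch; the names `by_alias` and `by_class` are illustrative only.

```python
import pandas as pd

names = ["john", "dev", "philip"]

# Same data, dtype given once as the 'string' alias and once as pd.StringDtype().
by_alias = pd.Series(names, dtype="string")
by_class = pd.Series(names, dtype=pd.StringDtype())

# Compare the dtypes and the contents of the two Series.
same_dtype = by_alias.dtype == by_class.dtype
same_values = by_alias.equals(by_class)
print(same_dtype, same_values)
```

Both comparisons should come out true, so the choice between 'string' and pd.StringDtype() is mostly a matter of style.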

Updated on: 18-Nov-2021
