Replacing strings with numbers in Python for Data Analysis


In data analysis, it is sometimes necessary to convert strings to numbers (int/float). We can assign a unique integer value to each distinct string to tell the values apart.

For this, we use data stored in a comma-separated values (CSV) file. Suppose we have a CSV file containing the following data:

Company      Industry      Recommendation
HDFC Bank    Finance       Hold
Apollo       Healthcare    Buy
Hero         Automobile    Underperform
Yes Bank     Finance       Hold
M&M          Automobile    Underperform
Fortis       Healthcare    Buy
Maruti       Automobile    Underperform

The rows above are just a few lines from a large dataset. We need to assign an integer value to each recommendation (Buy, Hold, Underperform, and so on), which will link to our metadata. For the input above, the expected output looks something like this:

Company      Industry      Recommendation
HDFC Bank    Finance       2
Apollo       Healthcare    1
Hero         Automobile    3
Yes Bank     Finance       2
M&M          Automobile    3
Fortis       Healthcare    1
Maruti       Automobile    3

Here is one way to replace the string values in a column with integers.

Code 1

# Import the required library
import pandas as pd
# Load the CSV file into a DataFrame using read_csv() from pandas
dataframe = pd.read_csv("data_pandas1.csv")
# Create a dictionary of key-value pairs, where each key is
# an old value (string) and each value is its new value (integer)
Recommendation = {'Buy': 1, 'Hold': 2, 'Underperform': 3}
# Look up every entry of the Recommendation column in the dictionary above
dataframe.Recommendation = [Recommendation[item] for item in dataframe.Recommendation]
# Print the new table
print(dataframe)

Result

    Company      Industry       Recommendation
0    HDFC Bank    Finance        2
1    Apollo       Healthcare     1
2    Hero         Automobile     3
3    Yes Bank     Finance        2
4    M&M          Automobile     3
5    Fortis       Healthcare     1
6    Maruti       Automobile     3
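
The dictionary lookup above can also be sketched with the pandas Series.map() method. This is only an illustration, not the method used in this article: the table is built inline instead of being read from data_pandas1.csv, and the names mapping, codes and unmapped are made up for the example. Unlike the list comprehension, map() does not raise an error for a value missing from the dictionary; it returns NaN, so the sketch checks for that before overwriting the column.

```python
import pandas as pd

# Inline copy of the sample table (assumption: stands in for data_pandas1.csv)
dataframe = pd.DataFrame({
    "Company": ["HDFC Bank", "Apollo", "Hero", "Yes Bank", "M&M", "Fortis", "Maruti"],
    "Industry": ["Finance", "Healthcare", "Automobile", "Finance",
                 "Automobile", "Healthcare", "Automobile"],
    "Recommendation": ["Hold", "Buy", "Underperform", "Hold",
                       "Underperform", "Buy", "Underperform"],
})

# Old value (string) -> new value (integer)
mapping = {"Buy": 1, "Hold": 2, "Underperform": 3}

# map() looks up every entry in the dictionary; unknown entries become NaN
codes = dataframe["Recommendation"].map(mapping)

# Stop early if any recommendation has no integer assigned to it
unmapped = dataframe.loc[codes.isna(), "Recommendation"].unique()
if len(unmapped) > 0:
    raise ValueError(f"No integer assigned to: {list(unmapped)}")

# Every entry was mapped, so the column can safely be stored as integers
dataframe["Recommendation"] = codes.astype(int)
print(dataframe)
```

Keeping the mapping in one dictionary means a new recommendation type only needs one new entry.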

The same code can be written without a dictionary: we directly assign a new value to the entries of a column (Recommendation here) wherever a condition matches.

Code 2

# Import the required library
import pandas as pd
# Load the CSV file into a DataFrame using read_csv() from pandas
dataf = pd.read_csv("data_pandas1.csv")
# Use object dtype so the column can hold integers next to the remaining strings
dataf["Recommendation"] = dataf["Recommendation"].astype(object)
# Directly assign an integer to the entries of the Recommendation column
# where a condition matches, i.e. wherever the column holds "Buy"
# we assign the integer 1 to it.
dataf.loc[dataf.Recommendation == 'Buy', 'Recommendation'] = 1
dataf.loc[dataf.Recommendation == 'Hold', 'Recommendation'] = 2
dataf.loc[dataf.Recommendation == 'Underperform', 'Recommendation'] = 3
print(dataf)

Result

    Company      Industry       Recommendation
0    HDFC Bank    Finance        2
1    Apollo       Healthcare     1
2    Hero         Automobile     3
3    Yes Bank     Finance        2
4    M&M          Automobile     3
5    Fortis       Healthcare     1
6    Maruti       Automobile     3
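
A variation on the condition-based approach, sketched here under a few assumptions: the table is built inline rather than read from data_pandas1.csv, and the integers are written to a separate column with the made-up name RecommendationCode, so the original strings stay available for checking. The value 0 is an arbitrary placeholder for rows that match no condition.

```python
import pandas as pd

# Inline copy of two columns of the sample table (assumption: stands in for the CSV file)
dataf = pd.DataFrame({
    "Company": ["HDFC Bank", "Apollo", "Hero", "Yes Bank", "M&M", "Fortis", "Maruti"],
    "Recommendation": ["Hold", "Buy", "Underperform", "Hold",
                       "Underperform", "Buy", "Underperform"],
})

# Start with 0 everywhere; 0 marks rows that no condition has matched yet
dataf["RecommendationCode"] = 0

# Build one boolean mask per label and write the integer into the matching rows
for label, code in [("Buy", 1), ("Hold", 2), ("Underperform", 3)]:
    mask = dataf["Recommendation"] == label
    dataf.loc[mask, "RecommendationCode"] = code

# Any row still holding 0 had a recommendation outside the three labels
unmatched_rows = int((dataf["RecommendationCode"] == 0).sum())
print(dataf)
print("Unmatched rows:", unmatched_rows)
```

Writing to a new column keeps the integer column free of mixed string and integer values while the assignments are in progress.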

Above I've shown just a couple of ways to replace string data in your table (a CSV file) with integer values. The same requirement, changing a data field from string to integer, comes up in many situations.
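
When no predefined mapping exists, pandas can also assign the integers itself. The sketch below uses pandas.factorize(), which is not covered in this article; the variable names are made up for the example. Note that the codes start at 0 and follow the order in which each string first appears, so they will not match the 1/2/3 values used above.

```python
import pandas as pd

# Recommendation column of the sample table, built inline for illustration
recommendations = pd.Series(
    ["Hold", "Buy", "Underperform", "Hold", "Underperform", "Buy", "Underperform"]
)

# factorize() returns one integer code per entry plus the distinct labels
codes, labels = pd.factorize(recommendations)

# Dictionary to translate a code back to its original string
lookup = dict(enumerate(labels))

print(list(codes))
print(lookup)
```

The lookup dictionary plays the role of the metadata table: it records which string each integer stands for.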

karthikeya Boyini

Updated on: 30-Jul-2019
