Replacing strings with numbers in Python for Data Analysis

Data analysis sometimes requires converting string values to numbers (int or float). One approach is to assign each distinct string a unique integer, so that the values can still be told apart.

For this, we use data stored in a comma-separated values (CSV) file. Say we have a CSV file, for example one opened in Excel, containing rows such as:

HDFC Bank,Finance,Hold
Yes Bank,Finance,Hold

These are just a few lines from a larger dataset. We need to give each recommendation (Buy, Hold, Underperform, etc.) an integer value that links to our metadata. For the input above, the expected output is something like:

HDFC Bank,Finance,2
Yes Bank,Finance,2

Here is one way to replace the strings in a column with integers.

Code 1

# Import the required library
import pandas as pd
# Load the CSV file into a DataFrame using read_csv() from pandas
dataframe = pd.read_csv("data_pandas1.csv")
# Create a dictionary of key-value pairs, where each key is
# an old value (string) and each value is the new value (integer)
Recommendation = {'Buy': 1, 'Hold': 2, 'Underperform': 3}
# Replace each string in the Recommendation column with its integer from the dictionary
dataframe.Recommendation = [Recommendation[item] for item in dataframe.Recommendation]
# New table
print(dataframe)


          Company         Industry        Recommendation
   0    HDFC Bank          Finance         2
   1    Apollo             Healthcare      1
   2    Hero               Automobile      3
   3    Yes Bank           Finance         2
   4    M&M                Automobile      3
   5    Fortis             Healthcare      1 
   6    Maruti             Automobile      3
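
The dictionary lookup can also be written with pandas' `Series.map`. The following is a minimal sketch rather than the article's own method. It assumes the Recommendation strings implied by the output table, and it builds a few rows in memory (the name `df` is illustrative) so that it runs without data_pandas1.csv.

```python
import pandas as pd

# A few sample rows, built in memory so the sketch runs without the CSV file.
# The Recommendation strings are inferred from the integer output shown above.
df = pd.DataFrame({
    "Company": ["HDFC Bank", "Apollo", "Hero", "Yes Bank"],
    "Industry": ["Finance", "Healthcare", "Automobile", "Finance"],
    "Recommendation": ["Hold", "Buy", "Underperform", "Hold"],
})

mapping = {"Buy": 1, "Hold": 2, "Underperform": 3}

# Series.map looks each value up in the dictionary. A value that is missing
# from the dictionary becomes NaN instead of raising a KeyError.
df["Recommendation"] = df["Recommendation"].map(mapping)
print(df)
```

One difference from the list comprehension: an unexpected string in the column produces a missing value here, whereas indexing the dictionary directly would stop with a KeyError.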

The same result can be achieved without a dictionary: assign the new value directly to the entries of the Recommendation column that match a condition.

# Import the required library
import pandas as pd
# Load the CSV file into a DataFrame using read_csv() from pandas
dataf = pd.read_csv("data_pandas1.csv")
# Assign an integer directly to the entries of the Recommendation column that
# match a condition, i.e. wherever the column contains "Buy", set it to 1.
# Using .loc selects rows and column in one step and avoids chained assignment.
dataf.loc[dataf.Recommendation == 'Buy', 'Recommendation'] = 1
dataf.loc[dataf.Recommendation == 'Hold', 'Recommendation'] = 2
dataf.loc[dataf.Recommendation == 'Underperform', 'Recommendation'] = 3
# New table
print(dataf)


    Company      Industry       Recommendation
0    HDFC Bank    Finance        2
1    Apollo       Healthcare     1
2    Hero         Automobile     3
3    Yes Bank     Finance        2
4    M&M          Automobile     3
5    Fortis       Healthcare     1
6    Maruti       Automobile     3
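
When the particular integers do not matter and each distinct string only needs some unique number, pandas can generate the codes itself. The sketch below uses `pd.factorize` on an illustrative series (`recommendations` is a made-up name, not part of the article's dataset file). Note that the codes start at 0 and follow the order of first appearance, so they will not match the 1/2/3 mapping used above.

```python
import pandas as pd

# Illustrative values; in practice this would be a DataFrame column.
recommendations = pd.Series(["Hold", "Buy", "Underperform", "Hold"])

# factorize returns an array of integer codes and the unique values.
# uniques[code] recovers the original string for any code.
codes, uniques = pd.factorize(recommendations)

print(list(codes))
print(list(uniques))
```

Keeping `uniques` around makes the encoding reversible, which helps when the integers must later be linked back to their labels.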

These are only a couple of ways to replace string data in a table (loaded from a CSV file) with integer values. The need to convert a field from string to integer comes up often in data analysis.

karthikeya Boyini
