How to convert strings in an R data frame to unique integers?


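To convert the strings in an R data frame to unique integers, first extract the unique strings across all columns. Then rebuild the data frame with the data.frame function, converting each column with the factor function (using the unique strings as its levels) wrapped in as.numeric. Each string is then replaced by its position in the vector of unique strings.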
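As a minimal sketch of the two steps just described, the snippet below uses a tiny hand-written data frame (the name df, its columns a and b, and its values are assumptions made for illustration) so that the result is deterministic:

```r
# Tiny illustrative data frame (hypothetical values, not from the examples)
df <- data.frame(a = c("Hot", "Cold", "Hot"),
                 b = c("Normal", "Hot", "Cold"))

# Step 1: collect the unique strings in order of first appearance
lev <- unique(c(as.character(df$a), as.character(df$b)))
lev      # "Hot" "Cold" "Normal"

# Step 2: a factor stores each value as an integer code that indexes its
# levels, so as.numeric() returns the position of each string in lev
codes_a <- as.numeric(factor(df$a, levels = lev))
codes_b <- as.numeric(factor(df$b, levels = lev))
codes_a  # 1 2 1
codes_b  # 3 1 2
```

Because the same levels vector is passed to every factor() call, a given string maps to the same integer in every column.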

Check out the examples below to understand how it works.

Example 1

Consider the data frame below −

x1<-sample(c("Hot","Normal","Cold"),20,replace=TRUE)
x2<-sample(c("Hot","Normal","Cold"),20,replace=TRUE)
x3<-sample(c("Hot","Normal","Cold"),20,replace=TRUE)
df1<-data.frame(x1,x2,x3)
df1

The following data frame is created (sample draws at random and no seed is set, so your values will differ) −

       x1      x2      x3
1     Hot  Normal     Hot
2     Hot     Hot  Normal
3     Hot  Normal     Hot
4     Hot    Cold  Normal
5     Hot    Cold     Hot
6     Hot     Hot    Cold
7     Hot    Cold    Cold
8  Normal    Cold    Cold
9     Hot  Normal    Cold
10    Hot     Hot     Hot
11 Normal  Normal     Hot
12 Normal  Normal  Normal
13    Hot     Hot    Cold
14 Normal    Cold    Cold
15    Hot     Hot     Hot
16   Cold     Hot  Normal
17    Hot     Hot     Hot
18    Hot     Hot    Cold
19    Hot    Cold    Cold
20   Cold     Hot     Hot

To extract the unique values in the data frame df1 created above, add the following code to the above snippet −

# Reuse df1 from above (calling sample again would generate different data).
# Combine all three columns and keep each distinct string once,
# in order of first appearance
Unique_df1<-unique(c(as.character(df1$x1),as.character(df1$x2),as.character(df1$x3)))
Unique_df1

Output

Running all the above snippets as a single program produces the following output −

[1] "Hot" "Normal" "Cold"

To convert the string values in df1 to unique numeric values, add the following code to the above snippet −

# Reuse df1 and Unique_df1 from the previous snippets.
# Each string is replaced by its position in Unique_df1
df1<-data.frame(
x1=as.numeric(factor(df1$x1,levels=Unique_df1)),
x2=as.numeric(factor(df1$x2,levels=Unique_df1)),
x3=as.numeric(factor(df1$x3,levels=Unique_df1)))
df1

Output

Running all the above snippets as a single program produces the following output −

   x1 x2 x3
1   1  2  1
2   1  1  2
3   1  2  1
4   1  3  2
5   1  3  1
6   1  1  3
7   1  3  3
8   2  3  3
9   1  2  3
10  1  1  1
11  2  2  1
12  2  2  2
13  1  1  3
14  2  3  3
15  1  1  1
16  3  1  2
17  1  1  1
18  1  1  3
19  1  3  3
20  3  1  1
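Listing every column by hand becomes tedious for wider data frames. The sketch below is one possible generalization, not part of the original example: the helper name strings_to_ints and the data frame demo are hypothetical, and as.integer is used in place of as.numeric so the codes are stored as integers.

```r
# Hypothetical helper: encode every column of a data frame against one shared
# set of levels, taken in order of first appearance (column by column)
strings_to_ints <- function(df) {
  # unlist() flattens the columns one after another; as.character() also
  # covers the case where the columns are factors rather than strings
  lev <- unique(as.character(unlist(df, use.names = FALSE)))
  codes <- as.data.frame(
    lapply(df, function(col) as.integer(factor(col, levels = lev)))
  )
  # Return the lookup vector too, so the codes can be decoded later
  list(codes = codes, levels = lev)
}

# Small illustrative data frame (hypothetical values)
demo <- data.frame(x1 = c("Hot", "Hot", "Normal"),
                   x2 = c("Normal", "Cold", "Cold"),
                   x3 = c("Hot", "Normal", "Cold"))

encoded <- strings_to_ints(demo)
encoded$levels  # "Hot" "Normal" "Cold"
encoded$codes   # x1: 1 1 2, x2: 2 3 3, x3: 1 2 3
```

Returning the levels alongside the codes keeps the mapping from integer back to string available after the original strings are gone.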

Example 2

The following snippet creates a sample data frame −

y1<-sample(c("Summer","Rainy","Winter","Spring"),20,replace=TRUE)
y2<-sample(c("Summer","Rainy","Winter","Spring"),20,replace=TRUE)
y3<-sample(c("Summer","Rainy","Winter","Spring"),20,replace=TRUE)
df2<-data.frame(y1,y2,y3)
df2

The following data frame is created (sample draws at random and no seed is set, so your values will differ) −

       y1     y2     y3
1   Rainy Winter  Rainy
2  Summer  Rainy Summer
3  Summer Spring Summer
4  Summer Spring Winter
5  Winter Winter  Rainy
6  Summer  Rainy Winter
7  Winter Winter  Rainy
8  Winter Summer Spring
9  Spring Summer Winter
10 Summer Summer Spring
11  Rainy  Rainy Spring
12  Rainy Winter Summer
13 Summer Spring Spring
14 Summer Summer Winter
15 Spring Spring Winter
16 Spring Spring Spring
17 Winter Spring Spring
18 Winter  Rainy Summer
19 Winter Spring Winter
20 Winter Summer Summer

To extract the unique values in the data frame df2 created above, add the following code to the above snippet −

# Reuse df2 from above (calling sample again would generate different data).
# Combine all three columns and keep each distinct string once,
# in order of first appearance
Unique_df2<-unique(c(as.character(df2$y1),as.character(df2$y2),as.character(df2$y3)))
Unique_df2

Output

Running all the above snippets as a single program produces the following output −

[1] "Rainy" "Summer" "Winter" "Spring"

To convert the string values in df2 to unique numeric values, add the following code to the above snippet −

# Reuse df2 and Unique_df2 from the previous snippets.
# Each string is replaced by its position in Unique_df2
df2<-data.frame(
y1=as.numeric(factor(df2$y1,levels=Unique_df2)),
y2=as.numeric(factor(df2$y2,levels=Unique_df2)),
y3=as.numeric(factor(df2$y3,levels=Unique_df2)))
df2

Output

Running all the above snippets as a single program produces the following output −

   y1 y2 y3
1   1  3  1
2   2  1  2
3   2  4  2
4   2  4  3
5   3  3  1
6   2  1  3
7   3  3  1
8   3  2  4
9   4  2  3
10  2  2  4
11  1  1  4
12  1  3  2
13  2  4  4
14  2  2  3
15  4  4  3
16  4  4  4
17  3  4  4
18  3  1  2
19  3  4  3
20  3  2  2
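Since each integer is just a position in the vector of unique strings, the conversion can be reversed by indexing that vector. The sketch below illustrates this with the unique strings from Example 2. The codes vector is a hypothetical sample, and match is shown as an alternative base R route to the same codes, not as the method used above.

```r
# Unique strings from Example 2, in order of first appearance
lev <- c("Rainy", "Summer", "Winter", "Spring")

# Hypothetical encoded values
codes <- c(1, 3, 2, 4, 4)

# Indexing the level vector by the codes recovers the original strings
decoded <- lev[codes]
decoded  # "Rainy" "Winter" "Summer" "Spring" "Spring"

# match() returns the position of each string in lev as an integer,
# giving the same codes as as.numeric(factor(x, levels = lev))
recoded <- match(decoded, lev)
recoded  # 1 3 2 4 4
```

Keeping the vector of unique strings is therefore worthwhile whenever the integer codes need to be translated back later.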

Updated on: 28-Oct-2021
