How to create an ID column in R based on categories?


If an R data frame has a categorical column, that column can be used to create an ID column in which every category gets its own numeric ID.

To do this, convert the categorical column to a factor with as.factor and then convert the factor to its integer codes with as.numeric, as shown in the examples below. The IDs follow the order of the factor levels, which as.factor sorts for character data, so the numbering does not depend on which category appears first in the column.
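The conversion can be tried on a small vector before applying it to a data frame. The following is a minimal sketch in base R; the vector x and the names f and ids are hypothetical and used only for illustration.

```r
# Hypothetical character vector with two categories
x <- c("Male", "Female", "Female", "Male")

# as.factor stores the distinct values as sorted levels
f <- as.factor(x)
levels(f)

# as.numeric returns the position of each value in levels(f)
ids <- as.numeric(f)
ids
```

Here "Female" sorts before "Male", so "Female" maps to 1 and "Male" maps to 2 even though "Male" comes first in x.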

Example 1

The following snippet creates a sample data frame −

Group<-sample(c("Male","Female"),20,replace=TRUE)
Score<-sample(20:50,20)
df1<-data.frame(Group,Score)
df1

Output

The following data frame is created −

   Group   Score
1  Female  20
2  Female  27
3  Female  29
4  Male    50
5  Male    42
6  Female  41
7  Male    32
8  Male    25
9  Female  21
10 Female  49
11 Female  31
12 Female  28
13 Female  36
14 Female  26
15 Male    43
16 Female  45
17 Male    23
18 Female  46
19 Male    48
20 Male    33

To create an ID column based on Group in df1, add the following code to the above snippet −

# Convert Group to a factor and use its integer codes as the ID
df1$ID<-as.numeric(as.factor(df1$Group))
df1

Output

Running the above snippets as a single program produces output like the following. The sampled values change from run to run because no seed is set −

   Group   Score ID
1  Female  20   1
2  Female  27   1
3  Female  29   1
4  Male    50   2
5  Male    42   2
6  Female  41   1
7  Male    32   2
8  Male    25   2
9  Female  21   1
10 Female  49   1
11 Female  31   1
12 Female  28   1
13 Female  36   1
14 Female  26   1
15 Male    43   2
16 Female  45   1
17 Male    23   2
18 Female  46   1
19 Male    48   2
20 Male    33   2
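In the output above, Female receives ID 1 and Male receives ID 2 because the factor levels are sorted. If the IDs should follow the order of first appearance or a chosen order instead, one possible approach in base R is sketched below. The vector grp and the result names are hypothetical, not part of the original example.

```r
# Hypothetical vector in which "Male" appears first
grp <- c("Male", "Female", "Female", "Male")

# IDs by order of first appearance: match each value to its
# position among the unique values
id_appearance <- match(grp, unique(grp))
id_appearance

# IDs by an explicitly chosen level order
id_custom <- as.integer(factor(grp, levels = c("Male", "Female")))
id_custom
```

Both variants assign 1 to "Male" here, whereas as.numeric(as.factor(grp)) would assign 2.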

Example 2

The following snippet creates a sample data frame −

Class<-sample(c("First","Second","Third"),20,replace=TRUE)
Rank<-sample(1:10,20,replace=TRUE)
df2<-data.frame(Class,Rank)
df2

Output

The following data frame is created −

   Class Rank
1  Third   5
2  Third   7
3  First   3
4  Third   8
5  Second  9
6  Third   9
7  First   3
8  Second 10
9  First   4
10 Third   2
11 Third   8
12 Third   1
13 Third  10
14 First   6
15 Third   5
16 Second  6
17 Third   7
18 Third   5
19 Third   2
20 Second  5

To create an ID column based on Class in df2, add the following code to the above snippet −

# Convert Class to a factor and use its integer codes as the ID
df2$ID<-as.numeric(as.factor(df2$Class))
df2

Output

Running the above snippets as a single program produces output like the following. The sampled values change from run to run because no seed is set −

   Class   Rank ID
1  Third   5   3
2  Third   7   3
3  First   3   1
4  Third   8   3
5  Second  9   2
6  Third   9   3
7  First   3   1
8  Second 10   2
9  First   4   1
10 Third   2   3
11 Third   8   3
12 Third   1   3
13 Third  10   3
14 First   6   1
15 Third   5   3
16 Second  6   2
17 Third   7   3
18 Third   5   3
19 Third   2   3
20 Second  5   2
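Once IDs are assigned, it can help to keep a record of which ID belongs to which category. The following is a minimal sketch in base R, assuming a hypothetical vector cls in place of the Class column; the names lookup and recovered are also illustrative.

```r
# Hypothetical vector standing in for the Class column
cls <- c("Third", "First", "Second", "Third")

f <- as.factor(cls)
id <- as.numeric(f)

# Lookup table: one row per level, ID equals the level's position
lookup <- data.frame(ID = seq_along(levels(f)), Class = levels(f))
lookup

# Indexing the levels by ID recovers the original labels
recovered <- levels(f)[id]
recovered
```

Because the ID is the position of the category in levels(f), the lookup table and the ID column stay consistent as long as both come from the same factor.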

Updated on: 03-Nov-2021
