How to create a binary column based on a condition on another variable in an R data frame?

Sometimes we need to create an extra variable that adds information to the existing data. This is common in feature engineering: if we learn that something may affect the response, we prefer to include it as a variable, so we derive it from the data we already have. For example, we can create a new variable by applying a condition to an existing one, such as a binary variable for goodness that depends on whether the frequency meets a certain criterion.

Example

Consider the below data frame −

set.seed(100)
Group<-rep(c("A","B","C","D","E"),times=4)
Frequency<-sample(20:30,20,replace=TRUE)
df1<-data.frame(Group,Frequency)
df1

Output

   Group Frequency
1      A        29
2      B        26
3      C        25
4      D        22
5      E        28
6      A        29
7      B        26
8      C        25
9      D        25
10     E        23
11     A        26
12     B        25
13     C        21
14     D        26
15     E        26
16     A        26
17     B        30
18     C        27
19     D        21
20     E        22

Creating a column Category with two levels, Good and Bad, where Good is assigned to rows with Frequency greater than 25 and Bad to all other rows −

Example

df1$Category<-ifelse(df1$Frequency>25,"Good","Bad")
df1

Output

   Group Frequency Category
1      A        29     Good
2      B        26     Good
3      C        25      Bad
4      D        22      Bad
5      E        28     Good
6      A        29     Good
7      B        26     Good
8      C        25      Bad
9      D        25      Bad
10     E        23      Bad
11     A        26     Good
12     B        25      Bad
13     C        21      Bad
14     D        26     Good
15     E        26     Good
16     A        26     Good
17     B        30     Good
18     C        27     Good
19     D        21      Bad
20     E        22      Bad
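
The Good/Bad column above is stored as character text. If a numeric 0/1 indicator or a factor is more convenient for modelling, the same condition can be reused. The following is a minimal sketch on a small hand-made data frame; the names df and IsGood and the four Frequency values are assumptions chosen for illustration, not part of the original example.

```r
# Small hand-made data frame (values chosen for illustration, not sampled)
df <- data.frame(Group = c("A", "B", "C", "D"),
                 Frequency = c(29, 25, 22, 26))

# The comparison returns TRUE/FALSE; as.integer() turns it into 1/0
df$IsGood <- as.integer(df$Frequency > 25)

# Same condition stored as a factor with an explicit level order
df$Category <- factor(ifelse(df$Frequency > 25, "Good", "Bad"),
                      levels = c("Bad", "Good"))

df
table(df$Category)
```

Fixing the level order in factor() keeps Bad as the first (reference) level regardless of which label appears first in the data.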

Let’s have a look at another example −

Example

Class<-rep(c("Lower","Middle","Upper Middle","Higher"),times=5)
Ratings<-sample(1:10,20,replace=TRUE)
df2<-data.frame(Class,Ratings)
df2

Output

          Class Ratings
1         Lower       3
2        Middle       8
3  Upper Middle       2
4        Higher       9
5         Lower       2
6        Middle       3
7  Upper Middle       4
8        Higher       4
9         Lower       4
10       Middle       5
11 Upper Middle       7
12       Higher       9
13        Lower       4
14       Middle       2
15 Upper Middle       6
16       Higher       7
17        Lower       1
18       Middle       6
19 Upper Middle       9
20       Higher       9
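
The code that builds df2 does not call set.seed(), so sample() will most likely return different ratings on each run and the output above will not be reproduced exactly. A minimal sketch of fixing the seed before sampling follows; the helper name make_ratings and the seed value 123 are arbitrary choices for illustration, not part of the original example.

```r
# Hypothetical helper: fix the seed, then draw 20 ratings between 1 and 10
make_ratings <- function(seed) {
  set.seed(seed)
  sample(1:10, 20, replace = TRUE)
}

first_draw  <- make_ratings(123)
second_draw <- make_ratings(123)

# The same seed gives the same ratings on every run
identical(first_draw, second_draw)
```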

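Creating a column Group with two levels, Royal and Standard, where Royal is assigned to rows with Ratings greater than 5 and Standard to all other rows −

Example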

df2$Group<-ifelse(df2$Ratings>5,"Royal","Standard")
df2

Output

          Class Ratings    Group
1         Lower       3 Standard
2        Middle       8    Royal
3  Upper Middle       2 Standard
4        Higher       9    Royal
5         Lower       2 Standard
6        Middle       3 Standard
7  Upper Middle       4 Standard
8        Higher       4 Standard
9         Lower       4 Standard
10       Middle       5 Standard
11 Upper Middle       7    Royal
12       Higher       9    Royal
13        Lower       4 Standard
14       Middle       2 Standard
15 Upper Middle       6    Royal
16       Higher       7    Royal
17        Lower       1 Standard
18       Middle       6    Royal
19 Upper Middle       9    Royal
20       Higher       9    Royal
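
Both examples assume that the column being tested has no missing values. When the condition evaluates to NA, ifelse() returns NA for that row instead of either label. The sketch below shows this behaviour and one possible way to handle it; the data frame ratings_df, the column GroupFilled and the label "Unknown" are hypothetical names introduced only for illustration.

```r
# Hand-made data with one missing rating (illustrative values)
ratings_df <- data.frame(Class = c("Lower", "Middle", "Higher"),
                         Ratings = c(3, NA, 9))

# A missing rating makes the condition NA, so the new column is NA for that row
ratings_df$Group <- ifelse(ratings_df$Ratings > 5, "Royal", "Standard")

# One option: test for NA first and assign an explicit label
ratings_df$GroupFilled <- ifelse(is.na(ratings_df$Ratings), "Unknown",
                                 ifelse(ratings_df$Ratings > 5, "Royal", "Standard"))

ratings_df
```

Whether a missing value should get its own label, be left as NA, or be dropped depends on the analysis, so the "Unknown" label here is only one choice.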
raja
Published on 09-Sep-2020 08:00:43