How to deal with the error “undefined columns selected” when subsetting a data frame in R?


The error “undefined columns selected” means that a column index R was asked for does not exist in the data frame. It most often appears when the comma is left out while subsetting rows with single square brackets: with only one index inside the brackets, df[i] selects columns rather than rows. It also appears when a column name is misspelled or a column number is out of range.

Example

Consider the below data frame −

> set.seed(99)
> x1<-rnorm(20,0.5)
> x2<-rpois(20,2)
> x3<-runif(20,2,10)
> x4<-rnorm(20,0.2)
> x5<-rpois(20,5)
> df<-data.frame(x1,x2,x3,x4,x5)
> df
           x1 x2       x3          x4 x5
1   0.7139625  4 9.321058  0.33297863  4
2   0.9796581  2 4.298837 -1.47926432 11
3   0.5878287  3 7.389898 -0.07847958  5
4   0.9438585  4 7.873764 -1.35241100  6
5   0.1371621  2 5.534758 -1.17969925  4
6   0.6226740  4 8.786676 -1.15705659  5
7  -0.3638452  1 6.407712 -0.72113718  5
8   0.9896243  2 9.374095 -0.66681774  9
9   0.1358831  2 2.086996  1.85664439  3
10 -0.7942420  0 8.730721  0.04492028  3
11 -0.2457690  3 2.687042 -1.37655243  2
12  1.4215504  3 7.075115  0.82408260  4
13  1.2500544  3 5.373809  0.53022068  5
14 -2.0085540  5 5.287499 -0.19812226 12
15 -2.5409341  1 6.217131 -0.88139693  5
16  0.5002658  3 2.723290  0.12307794  6
17  0.1059810  0 6.288451 -0.32553662  4
18 -1.2450277  2 2.942365  0.59128965  5
19  0.9986315  4 7.012492 -0.48045326  6
20  0.7709538  1 7.801093 -0.54869693  5

Now suppose you want to select the rows where x2 is greater than 2 and you type the following code −

> df[df$x2>2]
Error in `[.data.frame`(df, df$x2 > 2) : undefined columns selected

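R throws the “undefined columns selected” error because the comma after the condition is missing. Without it, the logical vector df$x2>2 (20 values, one per row) is used as a column index, and df has only 5 columns. The appropriate way to select the rows where x2 is greater than 2 is shown below −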

> df[df$x2>2,]
           x1 x2       x3          x4 x5
1   0.7139625  4 9.321058  0.33297863  4
3   0.5878287  3 7.389898 -0.07847958  5
4   0.9438585  4 7.873764 -1.35241100  6
6   0.6226740  4 8.786676 -1.15705659  5
11 -0.2457690  3 2.687042 -1.37655243  2
12  1.4215504  3 7.075115  0.82408260  4
13  1.2500544  3 5.373809  0.53022068  5
14 -2.0085540  5 5.287499 -0.19812226 12
16  0.5002658  3 2.723290  0.12307794  6
19  0.9986315  4 7.012492 -0.48045326  6
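
To see why the comma matters, the following minimal sketch contrasts the two forms on a small toy data frame. The names d, keep, rows and msg are hypothetical and chosen only for illustration; they are not part of the example above. tryCatch() is used so that the error message is captured instead of stopping the script.

```r
# Hypothetical toy data frame (not the article's df): 4 rows, 2 columns
d <- data.frame(a = c(1, 5, 3, 4), b = c("w", "x", "y", "z"))

# One logical value per row: FALSE TRUE TRUE TRUE
keep <- d$a > 2

# With the comma, `keep` is the row index and every column is kept
rows <- d[keep, ]

# Without the comma, `keep` is used as a column index. It asks for
# columns 2, 3 and 4, but `d` has only 2 columns, so R signals an error.
# tryCatch() captures the error message instead of halting the script.
msg <- tryCatch(d[keep], error = function(e) conditionMessage(e))

print(rows)
print(msg)
```

Note that the missing comma does not always produce an error. If the TRUE values happen to fall only at positions that correspond to existing columns, R may quietly return those columns instead of the intended rows, so it is worth checking the shape of the result.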

Similarly, the rows where x2 is less than 2 can be selected as follows −

> df[df$x2<2,]
           x1 x2       x3          x4 x5
7  -0.3638452  1 6.407712 -0.72113718  5
10 -0.7942420  0 8.730721  0.04492028  3
15 -2.5409341  1 6.217131 -0.88139693  5
17  0.1059810  0 6.288451 -0.32553662  4
20  0.7709538  1 7.801093 -0.54869693  5

In the same way, the selection of rows where x2 is greater than 1 is as follows −

> df[df$x2>1,]
           x1 x2       x3          x4 x5
1   0.7139625  4 9.321058  0.33297863  4
2   0.9796581  2 4.298837 -1.47926432 11
3   0.5878287  3 7.389898 -0.07847958  5
4   0.9438585  4 7.873764 -1.35241100  6
5   0.1371621  2 5.534758 -1.17969925  4
6   0.6226740  4 8.786676 -1.15705659  5
8   0.9896243  2 9.374095 -0.66681774  9
9   0.1358831  2 2.086996  1.85664439  3
11 -0.2457690  3 2.687042 -1.37655243  2
12  1.4215504  3 7.075115  0.82408260  4
13  1.2500544  3 5.373809  0.53022068  5
14 -2.0085540  5 5.287499 -0.19812226 12
16  0.5002658  3 2.723290  0.12307794  6
18 -1.2450277  2 2.942365  0.59128965  5
19  0.9986315  4 7.012492 -0.48045326  6
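
If the trailing comma is easy to forget, one possible alternative is base R's subset() function, which always applies its condition to rows. The sketch below rebuilds the data frame from the example and compares the two approaches; the names by_bracket and by_subset are hypothetical and used only for illustration.

```r
# Recreate the data frame from the example above
set.seed(99)
x1 <- rnorm(20, 0.5)
x2 <- rpois(20, 2)
x3 <- runif(20, 2, 10)
x4 <- rnorm(20, 0.2)
x5 <- rpois(20, 5)
df <- data.frame(x1, x2, x3, x4, x5)

# Bracket form: the trailing comma marks the condition as a row index
by_bracket <- df[df$x2 > 2, ]

# subset() form: the condition is applied to rows, so there is no comma to forget
by_subset <- subset(df, x2 > 2)

# Check whether both approaches return the same rows
print(all.equal(by_bracket, by_subset))
```

The two forms are expected to agree here because x2 contains no missing values. When the condition evaluates to NA for some rows, subset() drops those rows, while the bracket form returns them as rows of NA.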
raja
Published on 11-Aug-2020 09:02:53