How to select rows of a data frame that are not in another data frame in R?

Instead of finding the common rows, we sometimes need to find the uncommon rows between two data frames, that is, the rows of one data frame whose values do not appear in the other. This is mostly used when we expect a large number of rows to be uncommon rather than only a few. We can do this by combining the negation operator, written as an exclamation mark (!), with the %in% operator inside the subset function.
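As a minimal sketch of the idea on plain vectors (the names a and b are illustrative and are not part of the data frames used in this article), %in% returns TRUE for each element of the left vector that occurs in the right vector, and ! flips that result so that only the unmatched elements are kept:

```r
# Illustrative vectors; a and b are assumed names for this sketch only
a <- c(1, 2, 3, 4, 5)
b <- c(2, 4, 6)

in_b <- a %in% b          # TRUE where an element of a also occurs in b
not_in_b <- !in_b         # negation: TRUE where it does not occur in b
only_in_a <- a[not_in_b]  # elements of a that are absent from b

print(in_b)       # FALSE  TRUE FALSE  TRUE FALSE
print(only_in_a)  # 1 3 5
```

The same logical vector can be passed to subset as the row condition, which is what the data frame examples do.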

Example

Consider the following data frames −

> x1<-sample(1:10,20,replace=TRUE)
> y1<-sample(1:10,20,replace=TRUE)
> df1<-data.frame(x1,y1)
> df1

Output

 x1 y1
1 10 6
2 5 9
3 10 10
4 4 10
5 1 6
6 1 4
7 9 3
8 5 10
9 10 3
10 8 2
11 6 10
12 6 3
13 9 3
14 3 6
15 6 9
16 9 1
17 7 9
18 3 8
19 2 5
20 4 9

Example

> x2<-sample(1:10,20,replace=TRUE)
> y2<-sample(1:10,20,replace=TRUE)
> df2<-data.frame(x2,y2)
> df2

Output

 x2 y2
1 6 10
2 3 6
3 9 6
4 9 10
5 10 10
6 3 2
7 3 3
8 2 9
9 7 5
10 1 1
11 10 10
12 1 6
13 3 4
14 4 2
15 6 3
16 1 7
17 2 2
18 4 6
19 4 1
20 1 8

Now suppose we want the rows of df2 whose y2 values are not in y1 of df1. This can be done as follows −

> subset(df2,!(y2%in%df1$y1))
 x2 y2
16 1 7

Similarly, the rows of df2 whose y2 values are not in x1 of df1 can be selected as follows. Every y2 value here also occurs in x1, so the result has no rows −

> subset(df2,!(y2%in%df1$x1))
[1] x2 y2
<0 rows> (or 0-length row.names)
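Because the data frames above are built with sample, their contents change on every run. The following sketch uses small fixed data frames (the values are made up for illustration, and the names by_subset, by_index and none_left are assumptions) to show that the same rows can also be selected with bracket indexing, and that an empty result can be detected with nrow:

```r
# Small fixed data frames so the result is reproducible (illustrative values)
df1 <- data.frame(x1 = c(1, 2, 3, 4), y1 = c(5, 6, 7, 8))
df2 <- data.frame(x2 = c(9, 8, 7, 6), y2 = c(5, 1, 7, 2))

# Rows of df2 whose y2 value does not occur in df1$y1, using subset()
by_subset <- subset(df2, !(y2 %in% df1$y1))

# The same selection written with bracket indexing
by_index <- df2[!(df2$y2 %in% df1$y1), ]

print(by_subset)  # rows 2 and 4 of df2, where y2 is 1 and 2

# When every value is matched, the result has zero rows
none_left <- subset(df2, !(y2 %in% c(1, 2, 5, 7)))
print(nrow(none_left))  # 0
```

Bracket indexing is often preferred inside functions, since subset evaluates column names in a non-standard way; for interactive use the two forms should give the same rows.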

Let’s have a look at one more example −

Example

> x1<-rep(1:10,2)
> df1<-data.frame(x1)
> df1

Output

 x1
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 1
12 2
13 3
14 4
15 5
16 6
17 7
18 8
19 9
20 10

> x2<-rep(1:5,4)
> df2<-data.frame(x2)
> df2

Output

 x2
1 1
2 2
3 3
4 4
5 5
6 1
7 2
8 3
9 4
10 5
11 1
12 2
13 3
14 4
15 5
16 1
17 2
18 3
19 4
20 5

Now, to select the rows of df1 whose x1 values are not in x2 of df2 −

> subset(df1,!(x1%in%df2$x2))

Output

 x1
6 6
7 7
8 8
9 9
10 10
16 6
17 7
18 8
19 9
20 10
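The examples above compare a single column at a time. If whole rows need to be compared across all columns, one possible base R sketch is to build a text key for each row and apply the same !(... %in% ...) test to the keys. The data frames a and b, the helper row_key and the separator "\r" are assumptions made for illustration; the sketch also assumes that both data frames have the same columns in the same order and that no value contains the separator.

```r
# Illustrative data frames with the same columns in the same order
a <- data.frame(x = c(1, 2, 3, 4), y = c("a", "b", "c", "d"))
b <- data.frame(x = c(2, 4, 5), y = c("b", "z", "e"))

# Hypothetical helper: paste all columns of a row into one key string
row_key <- function(df) {
  do.call(paste, c(unname(as.list(df)), sep = "\r"))
}

# Keep rows of a whose key does not occur among the keys of b
only_in_a <- a[!(row_key(a) %in% row_key(b)), ]
print(only_in_a)  # rows 1, 3 and 4 of a
```

Row 4 of a is kept even though its x value occurs in b, because the pair (4, "d") as a whole does not appear in b.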
raja
Published on 04-Sep-2020 11:24:51