How to check whether specific columns of an R data frame are equal to a given column?


If a data frame holds a large amount of data and we suspect that some columns are repeated, or that some of them are equal to a particular column, we can check this with the sapply function in base R. Each candidate column is compared element-wise with the reference column, and all() reduces the comparisons to a single TRUE or FALSE. In this way, we can find and remove duplicated columns that do not help our data analysis objective.
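The removal step mentioned above can be sketched as follows. This is a minimal illustration, not part of the original examples: the data frame `df` and its columns `a`, `b`, `c` are hypothetical, and the approach assumes that "duplicated" means identical in every value.

```r
# Illustrative data frame (hypothetical, not the article's data):
# columns a and b are identical, c differs in its last value.
df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3), c = c(1, 2, 4))

# duplicated() on the list of columns flags each column that repeats an earlier one
dup <- duplicated(as.list(df))

# Keep only the first occurrence of each distinct column
df_unique <- df[!dup]
print(names(df_unique))
```

Here the first copy of a repeated column is kept and later copies are dropped, so the column order decides which name survives.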

Example1

Consider the below data frame:

> set.seed(354)
> x1<-rpois(20,5)
> x2<-rpois(20,5)
> x3<-rpois(20,5)
> x4<-rpois(20,5)
> x5<-rpois(20,5)
> df1<-data.frame(x1,x2,x3,x4,x5)
> df1

Output

   x1 x2 x3 x4 x5
1   4  5  4  4  6
2   6  4  8  7  5
3   5  6  4  7  6
4   5  2  6 13  3
5   3  4  3  5  2
6   9 10  7  5  3
7   8  5  8  2  8
8   5  1  6  4  3
9   1  3  3  9  6
10  2  9  7  9  6
11 11  5  8  5  6
12  8  3  1  2 10
13  5  8  4  4  6
14  3  5  3  5  4
15  7 11  9  6  8
16  5  2  4  6  4
17  6  3  7  4  3
18  7  5  2  6  2
19  9  3  1  5  4
20  7  5  6  4  6

Checking whether columns x1 and x5 are equal:

Example

> all(sapply(list(df1$x5),function(x) x==df1$x1))

Output

[1] FALSE

Example

> all(sapply(list(df1$x2,df1$x3),function(x) x==df1$x1))

Output

[1] FALSE

Example

> all(sapply(list(df1$x1,df1$x3),function(x) x==df1$x2))

Output

[1] FALSE

Example

> all(sapply(list(df1$x1,df1$x2,df1$x3),function(x) x==df1$x5))

Output

[1] FALSE

Example

> all(sapply(list(df1$x1,df1$x2,df1$x3),function(x) x==df1$x4))

Output

[1] FALSE
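The calls above collapse everything into a single FALSE, which does not say which column caused the mismatch. A possible variation, sketched here with hypothetical data (`d`, `ref`, `p`, `q` are not from the examples above), returns one result per candidate column by moving all() inside the function:

```r
# Illustrative data (hypothetical): p matches ref, q differs in the second row
d <- data.frame(ref = c(4, 6, 5), p = c(4, 6, 5), q = c(4, 7, 5))

# One TRUE/FALSE per candidate column instead of a single overall answer
per_column <- vapply(d[c("p", "q")], function(x) all(x == d$ref), logical(1))
print(per_column)

# Names of the columns that are equal to the reference column
print(names(per_column)[per_column])
```

vapply is used instead of sapply only so that the result is guaranteed to be a logical vector; sapply would give the same values here.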

Example2

> y1<-rep(c(1,2,3,4),5)
> y2<-rep(c(1,2,3,4),5)
> y3<-rep(c(1,2,3,4),5)
> y4<-rep(c(1,2,4,5),5)
> df2<-data.frame(y1,y2,y3,y4)
> df2

Output

   y1 y2 y3 y4
1   1  1  1  1
2   2  2  2  2
3   3  3  3  4
4   4  4  4  5
5   1  1  1  1
6   2  2  2  2
7   3  3  3  4
8   4  4  4  5
9   1  1  1  1
10  2  2  2  2
11  3  3  3  4
12  4  4  4  5
13  1  1  1  1
14  2  2  2  2
15  3  3  3  4
16  4  4  4  5
17  1  1  1  1
18  2  2  2  2
19  3  3  3  4
20  4  4  4  5

Example

> all(sapply(list(df2$y2, df2$y3),function(x) x==df2$y1))

Output

[1] TRUE

Example

> all(sapply(list(y2,y3),function(x) x==df2$y1))

Output

[1] TRUE

Example

> all(sapply(list(df2$y3),function(x) x==df2$y2))

Output

[1] TRUE

Example

> all(sapply(list(df2$y2,df2$y3),function(x) x==df2$y4))

Output

[1] FALSE
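One caveat with the == comparison used in these examples: it propagates missing values and compares floating-point numbers exactly. The sketch below, using hypothetical vectors `u`, `v` and `w`, shows how identical() and all.equal() from base R behave in those two situations. Whether either is the better choice depends on the data.

```r
u <- c(1, NA, 3)
v <- c(1, NA, 3)

# Element-wise == gives NA where a value is missing, so all() returns NA, not TRUE
print(all(u == v))

# identical() compares the vectors as whole objects; matching NA positions count as equal
print(identical(u, v))

# Exact comparison fails on floating-point rounding; all.equal() allows a small tolerance
w <- c(0.1 + 0.2, 1)
print(all(w == c(0.3, 1)))
print(isTRUE(all.equal(w, c(0.3, 1))))
```

Note that identical() is strict about type and attributes as well, so an integer column and a numeric column holding the same values are not identical.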

Updated on: 21-Nov-2020
