How to combine columns by excluding missing values in R?


If we have a data set that contains missing values at alternate places for each column then we might want to combine the columns by excluding those missing values, this will reduce the data set and the analysis is likely to become easier.

For this purpose, we can use na.exclude function along with apply function as shown in the below given examples.

Example 1

Following snippet creates a sample data frame −

x1<-rep(c(NA,2,10),times=c(5,10,5))
x2<-rep(c(1,3,5,NA),times=c(5,5,5,5))
x3<-rep(c(10,NA,3),times=c(5,10,5))
df1<-data.frame(x1,x2,x3)
df1

The following dataframe is created −

   x1  x2 x3
1  NA  1 10
2  NA  1 10
3  NA  1 10
4  NA  1 10
5  NA  1 10
6   2  3 NA
7   2  3 NA
8   2  3 NA
9   2  3 NA
10  2  3 NA
11  2  5 NA
12  2  5 NA
13  2  5 NA
14  2  5 NA
15  2  5 NA
16 10 NA  3
17 10 NA  3
18 10 NA  3
19 10 NA  3
20 10 NA  3

To exclude NA’s from df1 and combine the columns, add the following code to the above snippet −

x1<-rep(c(NA,2,10),times=c(5,10,5))
x2<-rep(c(1,3,5,NA),times=c(5,5,5,5))
x3<-rep(c(10,NA,3),times=c(5,10,5))
df1<-data.frame(x1,x2,x3)
t(apply(df1,1,na.exclude))

Output

If you execute all the above given snippets as a single program, it generates the following output −

      [,1][,2]
[1,]   1   10
[2,]   1   10
[3,]   1   10
[4,]   1   10
[5,]   1   10
[6,]   2    3
[7,]   2    3
[8,]   2    3
[9,]   2    3
[10,]  2    3
[11,]  2    5
[12,]  2    5
[13,]  2    5
[14,]  2    5
[15,]  2    5
[16,] 10    3
[17,] 10    3
[18,] 10    3
[19,] 10    3
[20,] 10    3

Example 2

Following snippet creates a sample data frame −

y1<-rep(c(NA,rnorm(5)),times=c(5,2,3,3,3,4))
y2<-rep(c(rnorm(2),NA),times=c(10,5,5))
y3<-rep(c(rnorm(1),NA,rnorm(1)),times=c(5,10,5))
df2<-data.frame(y1,y2,y3)
df2

The following dataframe is created −

    y1           y2          y3
1   NA          0.1152603  -0.9838989
2   NA          0.1152603  -0.9838989
3   NA          0.1152603  -0.9838989
4   NA          0.1152603  -0.9838989
5   NA          0.1152603  -0.9838989
6  -0.74142593  0.1152603   NA
7  -0.74142593  0.1152603   NA
8  -1.88274271  0.1152603   NA
9  -1.88274271  0.1152603   NA
10 -1.88274271  0.1152603   NA
11 -0.09684216 -1.2886519   NA
12 -0.09684216 -1.2886519   NA
13 -0.09684216 -1.2886519   NA
14 -0.08528031 -1.2886519   NA
15 -0.08528031 -1.2886519   NA
16 -0.08528031  NA          0.1967864
17 -0.80126932  NA          0.1967864
18 -0.80126932  NA          0.1967864
19 -0.80126932  NA          0.1967864
20 -0.80126932  NA          0.1967864

To exclude NA’s from df2 and combine the columns, add the following code to the above snippet −

y1<-rep(c(NA,rnorm(5)),times=c(5,2,3,3,3,4))
y2<-rep(c(rnorm(2),NA),times=c(10,5,5))
y3<-rep(c(rnorm(1),NA,rnorm(1)),times=c(5,10,5))
df2<-data.frame(y1,y2,y3)
t(apply(df2,1,na.exclude))

Output

If you execute all the above given snippets as a single program, it generates the following output −

          [,1]       [,2]
[1,]   0.11526026  -0.9838989
[2,]   0.11526026  -0.9838989
[3,]   0.11526026  -0.9838989
[4,]   0.11526026  -0.9838989
[5,]   0.11526026  -0.9838989
[6,]  -0.74142593   0.1152603
[7,]  -0.74142593   0.1152603
[8,]  -1.88274271   0.1152603
[9,]  -1.88274271   0.1152603
[10,] -1.88274271   0.1152603
[11,] -0.09684216  -1.2886519
[12,] -0.09684216  -1.2886519
[13,] -0.09684216  -1.2886519
[14,] -0.08528031  -1.2886519
[15,] -0.08528031  -1.2886519
[16,] -0.08528031   0.1967864
[17,] -0.80126932   0.1967864
[18,] -0.80126932   0.1967864
[19,] -0.80126932   0.1967864
[20,] -0.80126932   0.1967864

Updated on: 22-Nov-2021

129 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements