How to find the percentage of missing values in each column of an R data frame?


To find the percentage of missing values in each column of an R data frame, we can combine the colMeans function with the is.na function. is.na marks every missing value as TRUE, and colMeans then returns the proportion of TRUE values in each column. Multiplying that result by 100 gives the percentage.
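The steps described above can be sketched on a small hand-built data frame. The name df_demo and its values are hypothetical, chosen only so that the result is easy to check by eye: column a has 2 of 4 values missing and column b has 1 of 4.

```r
# Hypothetical 4-row data frame used only for illustration
df_demo <- data.frame(a = c(1, NA, 3, NA),
                      b = c(NA, 2, 3, 4))

# Step 1: logical matrix, TRUE wherever a value is missing
na_flags <- is.na(df_demo)

# Step 2: TRUE counts as 1 and FALSE as 0, so the column means
# are the fraction of missing values in each column
na_fraction <- colMeans(na_flags)

# Step 3: scale the fractions to percentages
na_percent <- na_fraction * 100
print(na_percent)
#  a  b
# 50 25
```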

The examples below show how this is done.

Example 1

The following snippet creates a sample data frame −

x1<-sample(c(NA,1,2),20,replace=TRUE)
x2<-sample(c(NA,5),20,replace=TRUE)
x3<-sample(c(NA,10,12),20,replace=TRUE)
df1<-data.frame(x1,x2,x3)
df1

Output

The following dataframe is created −

   x1 x2 x3
1  NA NA 12
2   2  5 10
3   2  5 12
4   1  5 12
5   1  5 NA
6  NA  5 10
7   1 NA 10
8  NA  5 10
9   2 NA 12
10  2 NA NA
11 NA NA NA
12 NA  5 12
13 NA NA 10
14  1 NA NA
15  2 NA 12
16  1  5 NA
17 NA  5 10
18  2  5 10
19 NA  5 12
20 NA  5 12

To find the percentage of NA in each column of df1, add the following line to the above snippet. Do not repeat the sample() calls, because they would replace df1 with newly drawn values −

colMeans(is.na(df1))*100

Output

For the data frame shown above, this gives the following output (your values will differ, because sample() draws at random on each run) −

x1 x2 x3
45 40 25
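Because sample() draws at random, rerunning Example 1 gives a different data frame and different percentages each time. One possible way to make a run repeatable is to call set.seed() before sampling. In the sketch below, the helper name make_df1 and the seed value 123 are arbitrary assumptions for illustration and are not part of the original example.

```r
# Hypothetical helper: rebuilds the Example 1 data frame from a given seed
make_df1 <- function(seed) {
  set.seed(seed)  # fix the random number generator state
  x1 <- sample(c(NA, 1, 2), 20, replace = TRUE)
  x2 <- sample(c(NA, 5), 20, replace = TRUE)
  x3 <- sample(c(NA, 10, 12), 20, replace = TRUE)
  data.frame(x1, x2, x3)
}

# Two runs with the same (arbitrary) seed
pct_a <- colMeans(is.na(make_df1(123))) * 100
pct_b <- colMeans(is.na(make_df1(123))) * 100

# The same seed reproduces the same draws, so the percentages match
identical(pct_a, pct_b)  # TRUE
```

The actual percentages depend on the seed and on the R version's sampling algorithm, so none are shown here.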

Example 2

The following snippet creates a sample data frame −

y1<-sample(c(NA,rnorm(2)),20,replace=TRUE)
y2<-sample(c(NA,rnorm(2)),20,replace=TRUE)
df2<-data.frame(y1,y2)
df2

Output

The following dataframe is created −

          y1          y2
1  -1.407410          NA
2  -1.771819          NA
3  -1.771819          NA
4         NA -0.05582021
5         NA          NA
6  -1.407410 -0.05582021
7         NA          NA
8         NA -0.05582021
9  -1.407410  1.19697209
10 -1.407410          NA
11 -1.771819 -0.05582021
12        NA          NA
13 -1.771819          NA
14 -1.771819 -0.05582021
15        NA -0.05582021
16 -1.407410  1.19697209
17 -1.771819 -0.05582021
18        NA          NA
19 -1.407410 -0.05582021
20 -1.407410  1.19697209

To find the percentage of NA in each column of df2, add the following line to the above snippet. Do not repeat the sample() calls, because they would replace df2 with newly drawn values −

colMeans(is.na(df2))*100

Output

For the data frame shown above, this gives the following output (your values will differ, because sample() and rnorm() draw at random on each run) −

y1 y2
35 45

Example 3

Following snippet creates a sample data frame −

z1<-sample(c(NA,round(runif(2,1,5),2)),20,replace=TRUE)
z2<-sample(c(NA,round(runif(2,2,10),2)),20,replace=TRUE)
z3<-sample(c(NA,round(runif(2,5,10),2)),20,replace=TRUE)
df3<-data.frame(z1,z2,z3)
df3

Output

The following dataframe is created −

     z1   z2   z3
1  1.69 2.76   NA
2    NA 7.59   NA
3    NA 2.76 9.13
4  4.24   NA 9.13
5  1.69   NA 9.13
6    NA 2.76 8.85
7    NA 7.59   NA
8    NA   NA 9.13
9    NA 7.59   NA
10 1.69 2.76   NA
11 4.24 7.59 8.85
12 1.69   NA 8.85
13 4.24   NA   NA
14   NA   NA 8.85
15 4.24 7.59 9.13
16 4.24 7.59   NA
17 1.69 2.76 9.13
18   NA   NA 9.13
19 4.24 2.76 8.85
20 4.24   NA   NA

To find the percentage of NA in each column of df3, add the following line to the above snippet. Do not repeat the sample() calls, because they would replace df3 with newly drawn values −

colMeans(is.na(df3))*100

Output

For the data frame shown above, this gives the following output (your values will differ, because sample() and runif() draw at random on each run) −

z1 z2 z3
40 40 40
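As a cross-check, the df3 values printed above can be typed in by hand, which removes the randomness and lets the result be verified directly. The sketch below also compares colMeans with two equivalent formulations, colSums divided by nrow and sapply with mean. The names df3_fixed, by_colmeans, by_colsums and by_sapply are hypothetical and used only here.

```r
# Values copied from the df3 printout above, so the result is fixed
z1 <- c(1.69, NA, NA, 4.24, 1.69, NA, NA, NA, NA, 1.69,
        4.24, 1.69, 4.24, NA, 4.24, 4.24, 1.69, NA, 4.24, 4.24)
z2 <- c(2.76, 7.59, 2.76, NA, NA, 2.76, 7.59, NA, 7.59, 2.76,
        7.59, NA, NA, NA, 7.59, 7.59, 2.76, NA, 2.76, NA)
z3 <- c(NA, NA, 9.13, 9.13, 9.13, 8.85, NA, 9.13, NA, NA,
        8.85, 8.85, NA, 8.85, 9.13, NA, 9.13, 9.13, 8.85, NA)
df3_fixed <- data.frame(z1, z2, z3)

# The approach used in this article
by_colmeans <- colMeans(is.na(df3_fixed)) * 100

# Equivalent: count the NAs per column, then divide by the row count
by_colsums <- colSums(is.na(df3_fixed)) / nrow(df3_fixed) * 100

# Equivalent: apply mean(is.na(...)) to each column in turn
by_sapply <- sapply(df3_fixed, function(col) mean(is.na(col)) * 100)

print(by_colmeans)
# z1 z2 z3
# 40 40 40
```

Each column has 8 missing values out of 20, so all three formulations agree on 40 percent.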

Updated on: 06-Nov-2021
