 
Why mean is NaN even if na.rm is set to TRUE using dplyr in R?
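When na.rm is set to TRUE, mean() drops the missing values before averaging. If every value in a group is NA, nothing is left to average, so the mean of the resulting empty vector is 0/0 and R returns NaN. This is base R behaviour that shows up inside dplyr's summarise(), not something dplyr adds. Without na.rm, the same group returns NA instead. Follow the steps below to see the difference between the two −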
- First of all, create a data frame.
- Summarise the data frame with na.rm set to TRUE if NA exists in the data frame.
- Summarise the data frame without setting na.rm to TRUE.
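Before working through the steps, the effect can be reproduced in base R alone. The following is a minimal sketch, assuming a vector that holds only missing values like the First group used below; the names x, kept, with_rm and without_rm are illustrative only.

```r
# A vector that contains only missing values, like the "First" group
x <- c(NA_real_, NA_real_, NA_real_)

# With na.rm = TRUE every NA is dropped first, leaving an empty vector
kept <- x[!is.na(x)]
length(kept)

# The mean of an empty numeric vector is 0 / 0, which R reports as NaN
sum(kept) / length(kept)
with_rm <- mean(x, na.rm = TRUE)
with_rm

# Without na.rm the missing values propagate and the result stays NA
without_rm <- mean(x)
without_rm
```

Note that is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN, so is.nan() is the function to use when the two need to be told apart.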
Create the data frame
Let's create a data frame as shown below −
Group<-rep(c("First","Second","Third"),times=c(3,10,7))
Response<-rep(c(NA,3,4,5,7,8),times=c(3,2,5,2,4,4))
df<-data.frame(Group,Response)
df
On executing, the above script generates the below output −
    Group Response
1   First       NA
2   First       NA
3   First       NA
4  Second        3
5  Second        3
6  Second        4
7  Second        4
8  Second        4
9  Second        4
10 Second        4
11 Second        5
12 Second        5
13 Second        7
14  Third        7
15  Third        7
16  Third        7
17  Third        8
18  Third        8
19  Third        8
20  Third        8
Summarising data frame with na.rm set to TRUE
Load the dplyr package and summarise the data frame df with the mean of Response per group. The First group contains only NA values, so its mean is NaN −
library(dplyr)
Group<-rep(c("First","Second","Third"),times=c(3,10,7))
Response<-rep(c(NA,3,4,5,7,8),times=c(3,2,5,2,4,4))
df<-data.frame(Group,Response)
df%>%group_by(Group)%>%summarise(mean=mean(Response,na.rm=TRUE))
# A tibble: 3 x 2
  Group    mean
  <chr>   <dbl>
1 First  NaN
2 Second   4.3
3 Third    7.57
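If the goal is to ignore missing values where possible but still report NA for a group that has no observed values at all, one option is a small wrapper around mean(). The sketch below is an assumption about what such a wrapper could look like, not part of dplyr: mean_or_na is a hypothetical helper name, and the grouping is done with base R's tapply() so the example runs without extra packages.

```r
# Hypothetical helper: mean that ignores NAs but returns NA (not NaN)
# when there are no non-missing values to average
mean_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

# Same data as in the article
Group <- rep(c("First", "Second", "Third"), times = c(3, 10, 7))
Response <- rep(c(NA, 3, 4, 5, 7, 8), times = c(3, 2, 5, 2, 4, 4))

# Base R grouped summary: apply the helper to Response within each Group
group_means <- tapply(Response, Group, mean_or_na)
group_means
```

Inside a dplyr pipeline the same helper could be called as summarise(mean=mean_or_na(Response)).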
Summarising the data frame without setting na.rm to TRUE
Summarise the data frame df with the mean of Response per group without setting na.rm to TRUE. The missing values now propagate, so the First group returns NA rather than NaN −
Group<-rep(c("First","Second","Third"),times=c(3,10,7))
Response<-rep(c(NA,3,4,5,7,8),times=c(3,2,5,2,4,4))
df<-data.frame(Group,Response)
df%>%group_by(Group)%>%summarise(mean=mean(Response))
# A tibble: 3 x 2
  Group   mean
  <chr>  <dbl>
1 First  NA
2 Second  4.3
3 Third   7.57
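To see in advance which groups would produce NaN under na.rm=TRUE, it can help to count the non-missing values per group. This is a minimal sketch in base R, assuming the same Group and Response vectors as above; n_valid and empty_groups are illustrative names.

```r
# Same data as in the article
Group <- rep(c("First", "Second", "Third"), times = c(3, 10, 7))
Response <- rep(c(NA, 3, 4, 5, 7, 8), times = c(3, 2, 5, 2, 4, 4))

# Count the non-missing Response values in each group
n_valid <- tapply(!is.na(Response), Group, sum)
n_valid

# Groups with a count of 0 are the ones whose mean becomes NaN
# when na.rm = TRUE is used
empty_groups <- names(n_valid)[n_valid == 0]
empty_groups
```

In dplyr, the equivalent count could be added to the summary as summarise(n_valid=sum(!is.na(Response))).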