How to find the standard deviation if NAs are present in a column of an R data frame?
If a vector or a column of an R data frame contains an NA, the sd command for standard deviation returns NA, because missing values propagate through the calculation. To solve this problem, we need to pass na.rm=TRUE, just as we do for vectors that contain missing values. For example, if a data frame df has a column x that contains missing values, then the standard deviation of x can be calculated as sd(df$x,na.rm=TRUE).
Example
Consider the below data frame:
> set.seed(3521)
> x<-c(NA,rnorm(19,5,0.34))
> df1<-data.frame(x)
> df1
Output
          x
1        NA
2  5.107864
3  4.797851
4  5.184345
5  4.680958
6  5.245151
7  5.760667
8  4.924365
9  5.770071
10 5.313064
11 4.564939
12 4.139275
13 4.997252
14 4.991125
15 5.402940
16 5.020513
17 4.644727
18 4.766003
19 5.658426
20 4.939198
Example
> sd(df1$x)
Output
[1] NA
Finding the standard deviation of x by ignoring the NA value:
Example
> sd(df1$x,na.rm=TRUE)
Output
[1] 0.4210732
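To make clear what na.rm=TRUE does, the following minimal sketch compares it with a by-hand calculation on the non-missing values only. The data frame df and its values are hypothetical and chosen only for illustration; they are not the data used above.

```r
# Hypothetical data frame with one missing value (illustration only)
df <- data.frame(x = c(2, 4, 4, 4, 5, 5, 7, 9, NA))

# Standard deviation with missing values removed by sd() itself
sd_narm <- sd(df$x, na.rm = TRUE)

# The same calculation by hand: drop the NA, then apply the
# sample standard deviation formula with n - 1 in the denominator
vals <- df$x[!is.na(df$x)]
n <- length(vals)
sd_manual <- sqrt(sum((vals - mean(vals))^2) / (n - 1))

# Both approaches should agree
all.equal(sd_narm, sd_manual)
```

Note that n here counts only the non-missing values, so the NA is excluded from both the mean and the denominator.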
Let’s have a look at another example:
Example
> z<-sample(c(NA,5,8,7,4,1),20,replace=TRUE)
> df2<-data.frame(z)
> df2
Output
    z
1  NA
2   8
3   4
4   4
5  NA
6   8
7  NA
8   1
9   8
10  8
11  1
12 NA
13  7
14  4
15  1
16  5
17  4
18  5
19 NA
20  7
Example
> sd(df2$z,na.rm=TRUE)
Output
[1] 2.618615
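When several columns of a data frame contain NA values, the same argument can be passed through sapply to get one standard deviation per column. This is only a sketch under assumed data: the data frame df and its columns a and b are hypothetical, and every column is assumed to be numeric.

```r
# Hypothetical data frame with missing values in two numeric columns
df <- data.frame(
  a = c(1, 2, 3, NA, 5),
  b = c(NA, 10, 20, 30, NA)
)

# sapply() forwards na.rm = TRUE to sd() for every column
sds <- sapply(df, sd, na.rm = TRUE)

# Number of missing values ignored in each column
missing_counts <- colSums(is.na(df))

sds
missing_counts
```

Checking missing_counts alongside the result is useful because na.rm=TRUE silently reduces the number of observations each standard deviation is based on.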