How to deal with error “Error in shapiro.test(…) : sample size must be between 3 and 5000” in R?
In R, shapiro.test accepts only samples of between 3 and 5000 observations and stops with this error for anything outside that range. For larger samples, an alternative is the Anderson-Darling normality test. To perform it, load the nortest package and use the ad.test function, as shown in the examples below.
Consider the following data frame:
Example
x <- rnorm(1000000)
df1 <- data.frame(x)
head(df1, 20)
Output
             x
1   1.27305105
2   1.79910461
3  -1.05456918
4   0.27247323
5  -1.22709375
6   1.87211271
7  -0.98918543
8  -0.98504275
9   0.55901414
10  1.17920161
11 -0.16612397
12 -0.89614357
13 -0.70229748
14  1.16583130
15 -0.17427556
16  0.05428080
17  1.26193927
18  0.63517470
19 -0.02052002
20 -1.23316924
Performing shapiro.test on x −
shapiro.test(df1$x)
Error in shapiro.test(df1$x) : sample size must be between 3 and 5000
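In a script, this error can be caught instead of halting execution. The following is a minimal sketch using base R's tryCatch; the names big and msg are illustrative and do not come from the example above.

```r
# Illustrative sketch: capture the error shapiro.test raises for an oversized sample
set.seed(42)                  # arbitrary seed, for reproducibility only
big <- rnorm(5001)            # one observation over the 5000 limit

msg <- tryCatch(
  {
    shapiro.test(big)
    "no error"                # reached only if the test runs
  },
  error = function(e) conditionMessage(e)  # return the error text instead of stopping
)
print(msg)
```

Here msg holds the text of the error, which can then be logged or used to decide on a fallback test.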
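Loading the nortest package and performing the Anderson-Darling test on x:
library(nortest)
ad.test(df1$x)

        Anderson-Darling normality test

data:  df1$x
A = 0.21458, p-value = 0.8496

The p-value of 0.8496 is well above 0.05, so the test gives no evidence against normality, as expected for values generated with rnorm.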
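The choice between the two tests can be wrapped in a small function. This is only a sketch under stated assumptions: normality_test is a hypothetical helper, not part of base R or nortest, and the large-sample branch assumes the nortest package is installed.

```r
# Hypothetical helper (not part of base R or nortest): choose a normality test by sample size
normality_test <- function(v) {
  v <- v[!is.na(v)]                 # drop missing values before counting
  n <- length(v)
  if (n < 3) {
    stop("at least 3 non-missing observations are required")
  }
  if (n <= 5000) {
    return(shapiro.test(v))         # within the range shapiro.test accepts
  }
  if (!requireNamespace("nortest", quietly = TRUE)) {
    stop("package 'nortest' is needed for samples larger than 5000")
  }
  nortest::ad.test(v)               # fallback for larger samples
}

set.seed(1)                         # arbitrary seed, for reproducibility only
res_small <- normality_test(rnorm(100))
print(res_small$method)
```

Calling normality_test(df1$x) on the million-row column above would take the ad.test branch, while a sample of 100 values goes through shapiro.test.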
Example
y <- sample(0:9, 500000, replace = TRUE)
df2 <- data.frame(y)
head(df2, 20)
Output
   y
1  8
2  9
3  7
4  0
5  3
6  4
7  9
8  3
9  1
10 5
11 9
12 4
13 5
14 9
15 5
16 7
17 1
18 0
19 4
20 4
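Performing the Anderson-Darling test on y:
ad.test(df2$y)

        Anderson-Darling normality test

data:  df2$y
A = 8634.6, p-value < 2.2e-16

The p-value is below 2.2e-16, so normality is rejected, as expected for discrete values sampled uniformly from 0 to 9.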