How to deal with error “Error in shapiro.test(…) : sample size must be between 3 and 5000” in R?

The shapiro.test function in R accepts only samples of between 3 and 5000 observations, so calling it on a larger (or smaller) sample raises this error. For samples larger than 5000, one alternative is the Anderson-Darling normality test. To perform this test, we need to load the nortest package and use the ad.test function, as shown in the examples below.

Consider the below data frame −

Example

x<-rnorm(1000000)
df1<-data.frame(x)
head(df1,20)

Output

     x
1    1.27305105
2    1.79910461
3   -1.05456918
4    0.27247323
5   -1.22709375
6    1.87211271
7   -0.98918543
8   -0.98504275
9    0.55901414
10   1.17920161
11  -0.16612397
12  -0.89614357
13  -0.70229748
14   1.16583130
15  -0.17427556
16   0.05428080
17   1.26193927
18   0.63517470
19  -0.02052002
20  -1.23316924

Performing shapiro.test on x −

shapiro.test(df1$x)

Error in shapiro.test(df1$x) : sample size must be between 3 and 5000
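
The error comes from a size check that shapiro.test runs before computing anything. As a minimal sketch, the check can be reproduced up front, or the error can be caught rather than allowed to stop the script. The helper name shapiro_size_ok and the vectors x_small and x_large are hypothetical and used only for illustration.

```r
# Hypothetical helper (not part of base R):
# TRUE when shapiro.test() would accept the size of v
shapiro_size_ok <- function(v) {
  n <- sum(!is.na(v))   # shapiro.test() discards missing values before counting
  n >= 3 && n <= 5000
}

x_small <- rnorm(100)     # within the allowed range
x_large <- rnorm(10000)   # above the 5000 limit

shapiro_size_ok(x_small)
shapiro_size_ok(x_large)

# tryCatch() returns the error message as a string instead of halting the script
result <- tryCatch(
  shapiro.test(x_large),
  error = function(e) conditionMessage(e)
)
result
```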

Loading the nortest package (install it first with install.packages("nortest") if it is not already available) and performing the Anderson-Darling test on x −

library(nortest)
ad.test(df1$x)
   Anderson-Darling normality test
data: df1$x
A = 0.21458, p-value = 0.8496
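
The choice between the two tests can also be made automatically from the sample size. The following is only a sketch, assuming the nortest package is installed; the function name normality_test is hypothetical and is not part of base R or nortest.

```r
library(nortest)  # assumes nortest is installed: install.packages("nortest")

# Hypothetical helper: use shapiro.test() when the sample size allows it,
# otherwise fall back to the Anderson-Darling test from nortest
normality_test <- function(v) {
  v <- v[!is.na(v)]          # drop missing values before counting
  n <- length(v)
  if (n < 3) {
    stop("need at least 3 non-missing values")
  } else if (n <= 5000) {
    shapiro.test(v)
  } else {
    ad.test(v)
  }
}

small_result <- normality_test(rnorm(100))    # Shapiro-Wilk is used
large_result <- normality_test(rnorm(6000))   # Anderson-Darling is used
small_result$method
large_result$method
```

Both functions return an object of class "htest", so the result can be handled in the same way whichever test was run.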

Example

y<-sample(0:9,500000,replace=TRUE)
df2<-data.frame(y)
head(df2,20)

Output

Performing the Anderson-Darling test on y −

ad.test(df2$y)
   Anderson-Darling normality test
data: df2$y
A = 8634.6, p-value < 2.2e-16
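
The printed result can also be used programmatically. As a small sketch, again assuming nortest is installed, the object returned by ad.test stores the test statistic and the p-value, which can be compared with a chosen significance level. The seed and the 0.05 threshold are arbitrary choices made for this illustration.

```r
library(nortest)  # assumes nortest is installed

set.seed(123)     # arbitrary seed so the sample is reproducible
y <- sample(0:9, 500000, replace = TRUE)

res <- ad.test(y)
res$statistic     # the Anderson-Darling statistic A
res$p.value       # the p-value as a plain number

# TRUE here means normality is rejected at the 5% level
reject_normality <- res$p.value < 0.05
reject_normality
```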

Nizamuddin Siddiqui

Updated on: 2021-02-08T05:22:48+05:30
