How to deal with the error “Error in shapiro.test(…) : sample size must be between 3 and 5000” in R?


In R, shapiro.test accepts only samples containing between 3 and 5000 observations; any other sample size triggers this error. For larger samples, one alternative is the Anderson-Darling normality test. To perform it, load the nortest package and use the ad.test function, as shown in the examples below.

Consider the below data frame −

Example

x<-rnorm(1000000)
df1<-data.frame(x)
head(df1,20)

Output

     x
1    1.27305105
2    1.79910461
3   -1.05456918
4    0.27247323
5   -1.22709375
6    1.87211271
7   -0.98918543
8   -0.98504275
9    0.55901414
10   1.17920161
11  -0.16612397
12  -0.89614357
13  -0.70229748
14   1.16583130
15  -0.17427556
16   0.05428080
17   1.26193927
18   0.63517470
19  -0.02052002
20  -1.23316924

Performing shapiro.test on x −

shapiro.test(df1$x)

Error in shapiro.test(df1$x) : sample size must be between 3 and 5000
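
If staying with shapiro.test is preferred, one possible workaround is to test a random subsample of at most 5000 values. The snippet below is a minimal sketch, not part of the original example; it assumes that a simple random subsample is representative of the full vector, and the seed and the name x_sub are arbitrary choices made for illustration.

```r
# Assumption: a simple random subsample represents the full data well enough.
set.seed(123)                   # arbitrary seed, only for reproducibility
x <- rnorm(1000000)

# shapiro.test accepts at most 5000 values, so draw exactly that many
x_sub <- sample(x, 5000)

res <- shapiro.test(x_sub)      # no sample-size error now
res$p.value
```

The result depends on which values are drawn, so a different seed gives a slightly different p-value.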

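Loading the nortest package and performing the Anderson-Darling test on x −

library(nortest)
ad.test(df1$x)

Output

   Anderson-Darling normality test
data: df1$x
A = 0.21458, p-value = 0.8496

The p-value of 0.8496 gives no evidence against normality, which is expected because x was generated with rnorm.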
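
The choice between the two tests can also be made automatically from the sample size. The function below is a rough sketch under stated assumptions: check_normality is a hypothetical helper name (it is not part of base R or nortest), missing values are simply dropped, and the Anderson-Darling branch only works if the nortest package is installed.

```r
# Hypothetical helper: pick a normality test based on the sample size.
check_normality <- function(x) {
  x <- x[!is.na(x)]             # assumption: missing values are dropped
  n <- length(x)
  if (n < 3) {
    stop("need at least 3 non-missing values")
  }
  if (n <= 5000) {
    return(shapiro.test(x))     # within the range shapiro.test allows
  }
  # Larger samples: fall back to the Anderson-Darling test from nortest
  if (!requireNamespace("nortest", quietly = TRUE)) {
    stop("install the nortest package to test samples larger than 5000")
  }
  nortest::ad.test(x)
}

set.seed(1)                     # arbitrary seed for reproducibility
small <- check_normality(rnorm(100))
small$method
```

Calling check_normality(df1$x) on the data frame from the example would take the Anderson-Darling branch, because df1$x has more than 5000 values.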

Example

y<-sample(0:9,500000,replace=TRUE)
df2<-data.frame(y)
head(df2,20)

Output

   y
1  8
2  9
3  7
4  0 
5  3
6  4
7  9
8  3
9  1
10 5
11 9
12 4
13 5
14 9
15 5
16 7
17 1
18 0
19 4
20 4

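Performing the Anderson-Darling test on y −

ad.test(df2$y)

Output

   Anderson-Darling normality test
data: df2$y
A = 8634.6, p-value < 2.2e-16

The p-value is below 2.2e-16, so normality is rejected, which is expected because y contains discrete values sampled uniformly from 0 to 9.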
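
Both shapiro.test and ad.test return an object of class "htest", so the p-value can be read from the result and compared with a significance level in code. The following is a minimal sketch: interpret_normality is a hypothetical helper, the 0.05 threshold is an assumed convention, and shapiro.test on 5000 values stands in for ad.test so that the snippet runs without nortest.

```r
# Hypothetical helper: turn an "htest" result into a plain-language decision.
interpret_normality <- function(test_result, alpha = 0.05) {
  stopifnot(inherits(test_result, "htest"))
  if (test_result$p.value < alpha) {
    "reject normality"
  } else {
    "no evidence against normality"
  }
}

set.seed(42)                    # arbitrary seed for reproducibility
y_small <- sample(0:9, 5000, replace = TRUE)
decision <- interpret_normality(shapiro.test(y_small))
decision
```

The same helper should work on the result of ad.test(df2$y), since it only relies on the p.value element.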

Updated on: 08-Feb-2021
