Statistics - Goodness of Fit



The goodness of fit test checks whether sample data fit a hypothesized population distribution, such as the normal distribution or the Weibull distribution. In simple words, it indicates whether the sample represents the data we would expect to find in the actual population. The following tests are generally used by statisticians:

  • Chi-square

  • Kolmogorov-Smirnov

  • Anderson-Darling

  • Shapiro-Wilk

Chi-square Test

The chi-square test is the most commonly used goodness of fit test. It is used for discrete distributions such as the binomial distribution and the Poisson distribution, whereas the Kolmogorov-Smirnov and Anderson-Darling goodness of fit tests are used for continuous distributions.

Formula

${ X^2 = \sum {[ \frac{(O_i - E_i)^2}{E_i}]} }$

Where −

  • ${O_i}$ = observed value of the i-th level of the variable.

  • ${E_i}$ = expected value of the i-th level of the variable.

  • ${X^2}$ = chi-square test statistic.
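The formula can be sketched as a small Python function. The name `chi_square_statistic` and its argument names are illustrative choices, not part of any library. The sketch assumes both lists have the same length and that every expected count is positive.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all levels of the variable."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    # One term per level: squared deviation scaled by the expected count.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

Each level contributes one term, so a level whose observed count is far from its expected count increases the statistic the most.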

Example

A toy company builds football player toys. It claims that 30% of the toys are mid-fielders, 60% are defenders, and 10% are forwards. A random sample of 100 toys has 50 mid-fielders, 45 defenders, and 5 forwards. At the 0.05 level of significance, can you justify the company's claim?

Solution:

Determine Hypotheses

  • Null hypothesis $ H_0 $ - The proportion of mid-fielders, defenders, and forwards is 30%, 60% and 10%, respectively.

  • Alternative hypothesis $ H_1 $ - At least one of the proportions in the null hypothesis is false.

Determine Degrees of Freedom

The degrees of freedom (DF) equal the number of levels (k) of the categorical variable minus 1: DF = k - 1. Here there are 3 levels. Thus

${ DF = k - 1 \\[7pt] \, = 3 -1 = 2 }$

Determine chi-square test statistic

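With a sample of 100 toys, the expected counts under the null hypothesis are 30 mid-fielders, 60 defenders, and 10 forwards. Thus

${ X^2 = \sum {[ \frac{(O_i - E_i)^2}{E_i}]} \\[7pt] \, = [\frac{(50-30)^2}{30}] + [\frac{(45-60)^2}{60}] + [\frac{(5-10)^2}{10}] \\[7pt] \, = \frac{400}{30} + \frac{225}{60} + \frac{25}{10} \\[7pt] \, = 13.33 + 3.75 + 2.50 \\[7pt] \, = 19.58 }$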
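The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that uses only the numbers from the example; the variable names are illustrative.

```python
# Observed counts in the sample: mid-fielders, defenders, forwards.
observed = [50, 45, 5]
# Proportions claimed by the company, in percent.
claimed_percent = [30, 60, 10]

n = sum(observed)  # sample size, 100 toys
# Expected count for each level under the null hypothesis.
expected = [n * pct / 100 for pct in claimed_percent]

# One term (O_i - E_i)^2 / E_i per level, then their sum.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(terms)

print([round(t, 2) for t in terms])  # [13.33, 3.75, 2.5]
print(round(chi_square, 2))          # 19.58
```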

Determine p-value

The P-value is the probability that a chi-square statistic $ X^2 $ with 2 degrees of freedom is more extreme than 19.58. Use the Chi-Square Distribution Calculator to find $ { P(X^2 \gt 19.58) \approx 0.0001 } $.
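If no calculator is at hand, the P-value for this example can be approximated in Python. The sketch below relies on a special case: for exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exp(-x/2). The helper name `chi_square_sf_df2` is hypothetical, and the function is not valid for other degrees of freedom, where a statistics library would be needed.

```python
import math


def chi_square_sf_df2(x):
    """Upper-tail probability P(X^2 > x), valid for 2 degrees of freedom only."""
    return math.exp(-x / 2)


p_value = chi_square_sf_df2(19.58)
print(f"{p_value:.6f}")  # 0.000056
```

Rounded to four decimal places this gives 0.0001, which is far below the 0.05 significance level.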

Interpret results

As the P-value (0.0001) is far less than the significance level (0.05), the null hypothesis is rejected. Thus the sample does not support the company's claim.
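The same decision can be reached by comparing the test statistic with a critical value instead of comparing the P-value with the significance level. The sketch below is an illustration under one assumption: with exactly 2 degrees of freedom, the critical value at significance level alpha is -2 ln(alpha). The variable names are illustrative.

```python
import math

alpha = 0.05        # significance level from the example
chi_square = 19.58  # test statistic computed in the example

# Critical value for 2 degrees of freedom only: solves exp(-x / 2) = alpha.
critical_value = -2 * math.log(alpha)

# Reject the null hypothesis when the statistic exceeds the critical value.
reject_null = chi_square > critical_value

print(round(critical_value, 2), reject_null)  # 5.99 True
```

Both approaches are equivalent: the statistic exceeds the critical value exactly when the P-value is below the significance level.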
