Mathematical statistics functions in Python


The statistics module in Python's standard library provides functions for calculating common statistics of numeric data, including values of the Fraction and Decimal types.

The following import statement is needed to use the functions described in this article.

>>> from statistics import *

The following functions calculate the central tendency of sample data.

mean() − This function calculates the arithmetic mean of data supplied as a sequence or iterator.

>>> from statistics import mean
>>> numbers = [12,34,21,7,56]
>>> mean(numbers)

The sample data may contain Decimal or Fraction objects.

>>> from decimal import Decimal
>>> numbers = [12,34,21,Decimal('7'),56]
>>> mean(numbers)
>>> from fractions import Fraction
>>> numbers = [12,20.55,Fraction(4,5),21,56]
>>> mean(numbers)

harmonic_mean() − The harmonic mean is calculated by taking the arithmetic mean of the reciprocals of the elements in the sample data and then taking the reciprocal of that mean.

Sample = [1,2,3,4,5]

Reciprocals = [1/1, 1/2, 1/3, 1/4, 1/5]

Sum of reciprocals = 2.28333333333

Mean of reciprocals = 2.28333333333/5 = 0.45666666667

Harmonic mean = 1 / 0.45666666667 = 2.18978 (approximately)

>>> harmonic_mean([1,2,3,4,5])
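The hand calculation can be reproduced in code. This is a minimal sketch that uses Fraction to keep the reciprocals exact; the names sample, reciprocals and manual_hm are illustrative and not part of the statistics module.

```python
from fractions import Fraction
from statistics import harmonic_mean, mean

sample = [1, 2, 3, 4, 5]

# Exact reciprocals, so no rounding error enters the intermediate steps.
reciprocals = [Fraction(1, x) for x in sample]

# Arithmetic mean of the reciprocals, then the reciprocal of that mean.
manual_hm = 1 / mean(reciprocals)

print(manual_hm)              # exact value as a fraction
print(float(manual_hm))       # the same value as a float
print(harmonic_mean(sample))  # library result for comparison
```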

median() − The median is the middle value of the sample data. The data is sorted into ascending order automatically before the median is found. If the count of elements is odd, the median is the middle value. If the count is even, the median is the mean of the two middle values.

>>> median([2,5,4,8,6])
>>> median([11,33,66,55,88,22])
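The odd and even cases can be sketched by hand as follows. The helper manual_median is a hypothetical name used only for illustration, and it assumes the input is non-empty.

```python
from statistics import median

def manual_median(data):
    """Median of a non-empty collection of numbers, computed by hand."""
    ordered = sorted(data)       # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [2, 5, 4, 8, 6]
even_sample = [11, 33, 66, 55, 88, 22]

print(manual_median(odd_sample), median(odd_sample))
print(manual_median(even_sample), median(even_sample))
```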

mode() − This function returns the most common value in the sample. This function can be applied to numeric or non-numeric data.

>>> mode((4,7,8,4,9,7,12,4,8))
>>> mode(['cc','aa','dd','cc','ff','cc'])
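One way to see what "most common value" means is to count occurrences with collections.Counter. This is only a sketch of the idea, not necessarily how mode() is implemented, and most_common_value is a hypothetical helper name. Note that when several values tie for the highest count, the behaviour of mode() depends on the Python version: before Python 3.8 it raises StatisticsError, while later versions return the first such value encountered.

```python
from collections import Counter
from statistics import mode

def most_common_value(data):
    """Return the value with the highest count in a non-empty iterable."""
    # most_common(1) yields a one-element list of (value, count) pairs.
    value, _count = Counter(data).most_common(1)[0]
    return value

numbers = (4, 7, 8, 4, 9, 7, 12, 4, 8)
labels = ['cc', 'aa', 'dd', 'cc', 'ff', 'cc']

print(most_common_value(numbers), mode(numbers))
print(most_common_value(labels), mode(labels))
```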

The following functions measure the dispersion of the elements in the sample around the central value.

variance() − This function reflects the variability or dispersion of data in the sample. Large variance means data is scattered. Smaller variance indicates that data is closely clustered.

The procedure for finding the variance is as follows:

  • Find arithmetic mean of all elements in the sample.
  • Find the square of the difference between the mean and each element and add the squares.
  • Divide the sum by n-1 if the sample size is n to get the variance

Mathematically, the above procedure is represented by the following formula −

$$s^2 = \frac{1}{n-1}\sum\limits_{i=1}^{n}(x_{i}-\overline{x})^2$$

Thankfully, the variance() function performs this computation for you.

>>> num = [4, 9, 2, 11, 5, 22, 90, 32, 56, 70]
>>> variance(num)
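
The three steps of the procedure can be followed literally in code and compared with the library result. This is a minimal sketch; x_bar, squared_deviations and manual_variance are illustrative names.

```python
import math
from statistics import mean, variance

num = [4, 9, 2, 11, 5, 22, 90, 32, 56, 70]

# Step 1: arithmetic mean of the sample.
x_bar = mean(num)

# Step 2: sum of the squared differences from the mean.
squared_deviations = sum((x - x_bar) ** 2 for x in num)

# Step 3: divide by n - 1, where n is the sample size.
manual_variance = squared_deviations / (len(num) - 1)

print(manual_variance)
print(variance(num))
print(math.isclose(manual_variance, variance(num)))
```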

stdev() − This function returns the standard deviation of data in the sample. Standard deviation is the square root of the variance.

>>> num = [4, 9, 2, 11, 5, 22, 90, 32, 56, 70]
>>> stdev(num)
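
The relationship between the two functions can be checked directly. The sketch below compares stdev() with the square root of variance(), using math.isclose to allow for floating-point rounding.

```python
import math
from statistics import stdev, variance

num = [4, 9, 2, 11, 5, 22, 90, 32, 56, 70]

sample_stdev = stdev(num)
root_of_variance = math.sqrt(variance(num))

print(sample_stdev)
print(root_of_variance)
print(math.isclose(sample_stdev, root_of_variance))
```
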
Published on 05-Apr-2019 09:38:57