SciPy - Stats

SciPy Stats is a module within the SciPy library in Python specifically designed for statistical analysis. SciPy is a powerful library used for scientific and numerical computations and the scipy.stats module provides a wide range of statistical tools, probability distributions and functions for conducting statistical operations and analysis.

Key Features of SciPy Stats

The key features of SciPy Stats include a wide range of statistical tools and functions designed to facilitate data analysis and hypothesis testing. Following are the main features of the scipy.stats module −

Probability Distributions in SciPy Stats

The scipy.stats module provides a comprehensive set of probability distributions including continuous and discrete distributions. These distributions allow for probability calculations, data modeling and statistical analysis in Python.

Types of Probability Distributions in SciPy Stats

The scipy.stats module provides a variety of probability distributions, categorized into two main types which are mentioned as follows −

Continuous Probability Distributions: Distributions of variables that can take any value within a continuous range, described by a probability density function.
Discrete Probability Distributions: Distributions of variables that take specific, countable values, described by a probability mass function.
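
As a minimal sketch of the difference between the two types, the snippet below evaluates a density for a continuous distribution and a point probability for a discrete one. The choice of the standard normal and of 10 fair coin flips is an illustrative assumption, not something taken from the text above:

```python
from scipy import stats

# Continuous: the normal distribution exposes a probability density function (pdf).
# This is a density, not a probability, evaluated at x = 0 for the standard normal.
density_at_zero = stats.norm.pdf(0.0)

# Discrete: the binomial distribution exposes a probability mass function (pmf).
# This is P(X = 3) for 10 trials with success probability 0.5.
prob_three_successes = stats.binom.pmf(3, 10, 0.5)

print(f"norm.pdf(0)          = {density_at_zero:.4f}")
print(f"binom.pmf(3, 10, .5) = {prob_three_successes:.4f}")
```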

Each distribution is exposed as an object in scipy.stats, for example scipy.stats.norm for the normal distribution. The following core methods, shown here for norm, are available on most distributions in SciPy. Discrete distributions provide pmf() in place of pdf(), and fit() is available only for continuous distributions −

S.No	Function & Description
1	scipy.stats.norm.pdf() Evaluates the probability density function of a continuous random variable at a specific point.
2	scipy.stats.norm.cdf() Calculates the probability that a random variable is ≤ x.
3	scipy.stats.norm.ppf() Returns the value corresponding to a given cumulative probability.
4	scipy.stats.norm.sf() Returns the probability that a random variable is > x (1 - CDF).
5	scipy.stats.norm.isf() Returns the value corresponding to a given upper-tail probability (the inverse of sf).
6	scipy.stats.norm.rvs() Generates random samples from the normal distribution.
7	scipy.stats.norm.fit() Estimates the mean and standard deviation of the given data.
8	scipy.stats.norm.mean() Returns the theoretical mean of the normal distribution.
9	scipy.stats.norm.var() Returns the theoretical variance of the normal distribution.
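
The methods in the table can be tried together in a short sketch. The value 1.96, the parameters loc=5.0 and scale=2.0, and the fixed seed are illustrative assumptions chosen for this example:

```python
from scipy import stats

x = 1.96
p_below = stats.norm.cdf(x)        # P(X <= 1.96) for the standard normal
p_above = stats.norm.sf(x)         # P(X > 1.96), equal to 1 - cdf
x_from_cdf = stats.norm.ppf(p_below)  # inverse of cdf, recovers x
x_from_sf = stats.norm.isf(p_above)   # inverse of sf, also recovers x

# Draw a reproducible sample, then estimate loc (mean) and scale (std dev) from it.
sample = stats.norm.rvs(loc=5.0, scale=2.0, size=1000, random_state=0)
loc_hat, scale_hat = stats.norm.fit(sample)

# Theoretical moments of the same distribution.
theoretical_mean = stats.norm.mean(loc=5.0, scale=2.0)
theoretical_var = stats.norm.var(loc=5.0, scale=2.0)

print(f"cdf = {p_below:.4f}, sf = {p_above:.4f}")
print(f"ppf(cdf) = {x_from_cdf:.2f}, isf(sf) = {x_from_sf:.2f}")
print(f"fitted loc = {loc_hat:.3f}, fitted scale = {scale_hat:.3f}")
print(f"mean = {theoretical_mean}, var = {theoretical_var}")
```

Passing random_state makes the generated sample reproducible, which is convenient when comparing fitted parameters against the theoretical ones.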

Statistical Tests (Hypothesis Testing)

Statistical Tests (Hypothesis Testing) refer to the process of making inferences or decisions about a population based on sample data. The core concept involves comparing two hypotheses: the null hypothesis (H₀), which states that there is no effect or difference, and the alternative hypothesis (H₁), which states that there is an effect or difference.

Based on the data, a statistical test evaluates the strength of the evidence against the null hypothesis. Below are the key functions available in scipy.stats module to perform Statistical Tests −

S.No.	Function & Description
1	scipy.stats.ttest_1samp() Performs a one-sample t-test to compare the sample mean to a known population mean.
2	scipy.stats.ttest_ind() Performs an independent two-sample t-test to compare means from two independent groups.
3	scipy.stats.ttest_rel() Performs a paired t-test to compare means from two related samples.
4	scipy.stats.chi2_contingency() Performs a Chi-square test for independence on a contingency table.
5	scipy.stats.f_oneway() Performs a one-way ANOVA test to compare means of two or more groups.
6	scipy.stats.levene() Tests for equality of variances across groups i.e., homogeneity of variance test.
7	scipy.stats.shapiro() Performs the Shapiro-Wilk test for normality of the dataset.
8	scipy.stats.ks_1samp() Performs a one-sample Kolmogorov-Smirnov test to compare a sample with a distribution.
9	scipy.stats.ks_2samp() Performs a two-sample Kolmogorov-Smirnov test to compare two independent samples.
10	scipy.stats.mannwhitneyu() Performs the Mann-Whitney U test, a non-parametric test for comparing two independent samples.
11	scipy.stats.wilcoxon() Performs the Wilcoxon signed-rank test for comparing two related samples.
12	scipy.stats.pearsonr() Computes Pearson's correlation coefficient and p-value for testing non-correlation.
13	scipy.stats.spearmanr() Computes Spearman's rank correlation coefficient.
14	scipy.stats.kruskal() Performs the Kruskal-Wallis H-test for comparing two or more independent samples.
15	scipy.stats.friedmanchisquare() Performs the Friedman test for repeated measures across multiple conditions.
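
A minimal sketch of how two of these tests might be called is shown below. The two groups are synthetic data generated with NumPy, and their sizes, means and the seed are assumptions made only for illustration:

```python
import numpy as np
from scipy import stats

# Synthetic data: two independent groups drawn from normal distributions.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=50.0, scale=5.0, size=40)
group_b = rng.normal(loc=55.0, scale=5.0, size=40)

# Independent two-sample t-test: H0 states that both groups have the same mean.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Shapiro-Wilk test: H0 states that the sample comes from a normal distribution.
w_stat, p_normal = stats.shapiro(group_a)

print(f"t-test:  statistic = {t_stat:.3f}, p-value = {p_value:.4g}")
print(f"shapiro: statistic = {w_stat:.3f}, p-value = {p_normal:.4g}")
```

A small p-value is evidence against the null hypothesis of the corresponding test. The threshold used to reject it (commonly 0.05) is a choice made by the analyst, not by SciPy.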

Descriptive Statistics

Descriptive statistics involves techniques to summarize and present the key characteristics of a data set. It helps to interpret the data by highlighting its distribution, central tendency and variability. This branch of statistics includes various summary measures such as central tendency indicators (mean, median, mode), measures of spread (range, variance, standard deviation) and characteristics of distribution (skewness, kurtosis).

Here are the functions available in scipy.stats module which are used to perform Descriptive Statistics −

S.No	Function & Description
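1	scipy.stats.tmean(data) Computes the arithmetic mean of the dataset, optionally restricted to values within given limits (trimmed mean).
2	numpy.median(data) Finds the middle value of the dataset when sorted. scipy.stats has no median() function, so the NumPy function is used.
3	scipy.stats.mode(data) Returns the most frequently occurring value in the dataset.
4	scipy.stats.tvar(data) Calculates the sample variance of the dataset, optionally restricted to values within given limits.
5	scipy.stats.tstd(data) Computes the sample standard deviation of the dataset, optionally restricted to values within given limits.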
6	scipy.stats.iqr(data) Computes the interquartile range (IQR) of the dataset.
7	scipy.stats.skew(data) Measures the asymmetry of the data distribution.
8	scipy.stats.kurtosis(data) Evaluates the "tailedness" of the distribution.
9	scipy.stats.scoreatpercentile(data, q) Returns the value below which a certain percentage of observations fall.
10	scipy.stats.mstats.mquantiles(data) Computes the quantiles of the dataset.
11	scipy.stats.trim_mean(data, proportiontocut) Calculates the mean after removing a proportion of the smallest and largest values.
12	scipy.stats.tmin(data) Returns the minimum value in the dataset, optionally ignoring values below a lower limit.
13	scipy.stats.tmax(data) Returns the maximum value in the dataset, optionally ignoring values above an upper limit.
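
The following sketch applies several of these functions to a small array. The data values are made up for illustration, and the median is taken from NumPy because scipy.stats does not provide one:

```python
import numpy as np
from scipy import stats

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = stats.tmean(data)          # arithmetic mean (no limits given)
median = np.median(data)          # middle value of the sorted data
mode_result = stats.mode(data)    # most frequent value and its count
variance = stats.tvar(data)       # sample variance (divides by n - 1)
std_dev = stats.tstd(data)        # sample standard deviation
iqr = stats.iqr(data)             # 75th percentile minus 25th percentile
skewness = stats.skew(data)       # asymmetry of the distribution
kurt = stats.kurtosis(data)       # excess kurtosis (0 for a normal distribution)
smallest = stats.tmin(data)       # minimum value
largest = stats.tmax(data)        # maximum value

print(f"mean = {mean}, median = {median}, mode = {mode_result.mode}")
print(f"variance = {variance:.4f}, std = {std_dev:.4f}, IQR = {iqr}")
print(f"skew = {skewness:.4f}, kurtosis = {kurt:.4f}")
print(f"min = {smallest}, max = {largest}")
```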

Correlation & Regression Analysis in SciPy Stats

Correlation and regression analysis are powerful statistical methods used to examine the relationship between two or more variables. These techniques help identify patterns, assess the strength of associations and make predictions based on data.

S.No.	Function and Description
1	scipy.stats.pearsonr() Calculates the Pearson correlation coefficient and p-value for testing non-correlation.
2	scipy.stats.spearmanr() Computes the Spearman rank-order correlation coefficient.
3	scipy.stats.kendalltau() Calculates Kendall's tau, a correlation measure for ordinal data.
4	scipy.stats.linregress() Performs simple linear regression and returns slope, intercept, and other statistics.
5	scipy.stats.pointbiserialr() Computes the point-biserial correlation coefficient for binary and continuous data.
6	scipy.stats.variation() Calculates the coefficient of variation (CV), which measures relative variability.
7	scipy.stats.ttest_ind() Performs an independent t-test to compare means of two independent samples.
8	scipy.stats.ttest_rel() Performs a paired t-test to compare means of related samples.
9	scipy.stats.f_oneway() Performs a one-way ANOVA test to compare means of multiple groups.
10	scipy.stats.chisquare() Performs the chi-square test for goodness-of-fit.
11	scipy.stats.chi2_contingency() Performs the chi-square test for independence between categorical variables.
12	scipy.stats.mannwhitneyu() Performs the Mann-Whitney U test for comparing two independent distributions.
13	scipy.stats.wilcoxon() Performs the Wilcoxon signed-rank test for paired samples.
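
As a rough illustration of the correlation and regression functions, the sketch below uses a small hand-made dataset in which y grows roughly as 2x. The data values are an assumption chosen for the example:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# Each correlation function returns the coefficient and a p-value.
r, p_r = stats.pearsonr(x, y)        # linear correlation
rho, p_rho = stats.spearmanr(x, y)   # rank-order correlation
tau, p_tau = stats.kendalltau(x, y)  # ordinal association

# Simple linear regression of y on x.
fit = stats.linregress(x, y)

print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}, Kendall tau = {tau:.4f}")
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}, "
      f"r-squared = {fit.rvalue ** 2:.4f}")
```

Because y increases strictly with x in this dataset, the two rank-based coefficients equal 1, while Pearson's r is slightly below 1 since the points do not lie exactly on a line.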

Random Sampling in SciPy Stats

Random sampling is a fundamental technique in statistics used to select a subset of individuals from a population or dataset for analysis. SciPy provides various methods for generating random samples from different probability distributions −

S.No.	Function and Description
1	scipy.stats.uniform.rvs() Generates random samples from a uniform distribution over the interval [loc, loc + scale], which is [0, 1] by default.
2	scipy.stats.norm.rvs() Generates random samples from a normal (Gaussian) distribution with a given mean (loc) and standard deviation (scale).
3	scipy.stats.randint.rvs() Generates random integers from a discrete uniform distribution between low (inclusive) and high (exclusive).
4	scipy.stats.binom.rvs() Generates random samples from a binomial distribution with parameters n (number of trials) and p (probability of success).
5	scipy.stats.poisson.rvs() Generates random samples from a Poisson distribution with rate parameter mu (mean number of events).
6	scipy.stats.expon.rvs() Generates random samples from an exponential distribution with a given scale parameter (inverse of rate).
7	scipy.stats.beta.rvs() Generates random samples from a Beta distribution with shape parameters a and b.
8	scipy.stats.gamma.rvs() Generates random samples from a Gamma distribution with shape and scale parameters.
9	scipy.stats.chi2.rvs() Generates random samples from a Chi-square distribution with df degrees of freedom.
10	scipy.stats.f.rvs() Generates random samples from an F-distribution with dfn (numerator) and dfd (denominator) degrees of freedom.
11	scipy.stats.t.rvs() Generates random samples from a Student's t-distribution with df degrees of freedom.
12	scipy.stats.weibull_min.rvs() Generates random samples from a Weibull distribution with shape parameter c.
13	scipy.stats.dirichlet.rvs() Generates random samples from a Dirichlet distribution with concentration parameter alpha.
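
A short sketch of drawing samples from a few of these distributions follows. The parameter values, sample sizes and the seed are illustrative assumptions. A shared NumPy Generator is passed as random_state so that the results are reproducible:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)

# Normal distribution with mean 10 and standard deviation 2.
normal_draws = stats.norm.rvs(loc=10.0, scale=2.0, size=5, random_state=rng)

# Uniform distribution on [loc, loc + scale] = [2, 5].
uniform_draws = stats.uniform.rvs(loc=2.0, scale=3.0, size=5, random_state=rng)

# Integers from 1 (inclusive) to 7 (exclusive), like rolling a die.
dice_rolls = stats.randint.rvs(1, 7, size=10, random_state=rng)

# Number of successes in 20 trials with success probability 0.3.
successes = stats.binom.rvs(20, 0.3, size=5, random_state=rng)

# Two draws from a 3-component Dirichlet distribution; each row sums to 1.
weights = stats.dirichlet.rvs(alpha=[1.0, 2.0, 3.0], size=2, random_state=rng)

print(normal_draws)
print(uniform_draws)
print(dice_rolls)
print(successes)
print(weights)
```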

Data Ranking & Scaling in SciPy Stats

Data ranking and scaling are important techniques in statistics to adjust the scale of data for comparison or to assess the relative positions of observations. Ranking involves ordering data, while scaling adjusts the range or distribution to standardize or normalize it.

S.No.	Function and Description
1	scipy.stats.rankdata() Ranks the values in an array, with ties receiving the average rank.
2	scipy.stats.zscore() Standardizes an array by scaling it to have zero mean and unit variance.
3	scipy.stats.mstats.rankdata() Ranks data using masked arrays by handling missing or invalid values properly.
4	scipy.stats.mstats.zscore() Standardizes masked data arrays by transforming to zero mean and unit variance.
5	scipy.stats.percentileofscore() Calculates the percentile rank of a score within a given dataset.
6	scipy.stats.trim_mean() Computes the mean of a dataset after removing a given proportion of the smallest and largest values.
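
The ranking and scaling functions can be sketched on a small set of scores. The values are invented for illustration and include a tie to show how average ranks are assigned:

```python
import numpy as np
from scipy import stats

scores = np.array([70, 85, 85, 90, 100])

# Tied values (the two 85s) share the average of the ranks they occupy.
ranks = stats.rankdata(scores)

# Standardize to zero mean and unit variance (population std, ddof=0, by default).
z = stats.zscore(scores)

# Percentile rank of the score 85 within the dataset (default kind='rank').
pct = stats.percentileofscore(scores, 85)

# Mean after cutting 20% of the values from each end of the sorted data.
trimmed = stats.trim_mean(scores, 0.2)

print("ranks:", ranks)
print("z-scores:", np.round(z, 3))
print("percentile of 85:", pct)
print("20% trimmed mean:", trimmed)
```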
