List a few statistical methods available for a NumPy array
NumPy provides a set of statistical functions for analyzing numerical data efficiently. This article covers the most commonly used ones for NumPy arrays: minimum and maximum, peak-to-peak range, percentiles, median, mean, weighted average, standard deviation, and variance.
Statistics involves collecting, analyzing, and interpreting data. NumPy's statistical functions are fundamental tools for data analysis and scientific computing in Python.
Finding Minimum and Maximum Values
The numpy.amin() and numpy.amax() functions return the minimum and maximum values of an array, either over all elements or along a specified axis. They are equivalent to numpy.min() and numpy.max():
import numpy as np
# Input array
inputArray = np.array([[2, 6, 3], [1, 5, 4], [8, 12, 9]])
print('Input Array:')
print(inputArray)
print()
print("Minimum element:", np.amin(inputArray))
print("Maximum element:", np.amax(inputArray))
print()
print('Minimum along axis 0 (columns):')
print(np.amin(inputArray, axis=0))
print('Minimum along axis 1 (rows):')
print(np.amin(inputArray, axis=1))
print()
print('Maximum along axis 0 (columns):')
print(np.amax(inputArray, axis=0))
print('Maximum along axis 1 (rows):')
print(np.amax(inputArray, axis=1))
Output
Input Array:
[[ 2  6  3]
 [ 1  5  4]
 [ 8 12  9]]

Minimum element: 1
Maximum element: 12

Minimum along axis 0 (columns):
[1 5 3]
Minimum along axis 1 (rows):
[2 1 8]

Maximum along axis 0 (columns):
[ 8 12  9]
Maximum along axis 1 (rows):
[ 6  5 12]
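As a small cross-check of the example above, the following sketch compares the function form with the array's own min() and max() methods, and shows the keepdims option. The variable names are illustrative and not part of the original example.

```python
import numpy as np

# Same sample array as in the example above
data = np.array([[2, 6, 3], [1, 5, 4], [8, 12, 9]])

# Function form and method form of the column-wise minimum
col_min_func = np.min(data, axis=0)
col_min_method = data.min(axis=0)

# keepdims=True keeps the reduced axis with length 1, so the result
# still lines up with the rows of the original array
row_max = np.max(data, axis=1, keepdims=True)

print(col_min_func)    # column-wise minima
print(col_min_method)  # same values via the ndarray method
print(row_max.shape)   # the result stays two-dimensional
```

Keeping the reduced axis is convenient when the result is later combined with the original array, for example to scale each row by its maximum.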
Peak-to-Peak Range
The numpy.ptp() function calculates the range (maximum minus minimum) of values. The name "ptp" stands for peak-to-peak:
import numpy as np
inputArray = np.array([[2, 6, 3], [1, 5, 4], [8, 12, 9]])
print('Input Array:')
print(inputArray)
print()
print('Peak-to-peak range of entire array:')
print(np.ptp(inputArray))
print()
print('Range along axis 1 (rows):')
print(np.ptp(inputArray, axis=1))
print()
print('Range along axis 0 (columns):')
print(np.ptp(inputArray, axis=0))
Output
Input Array:
[[ 2  6  3]
 [ 1  5  4]
 [ 8 12  9]]

Peak-to-peak range of entire array:
11

Range along axis 1 (rows):
[4 4 4]

Range along axis 0 (columns):
[7 7 6]
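To make the definition concrete, this short sketch checks that the peak-to-peak range is the maximum minus the minimum along the same axis. The variable names are illustrative.

```python
import numpy as np

data = np.array([[2, 6, 3], [1, 5, 4], [8, 12, 9]])

# Peak-to-peak range computed directly by NumPy
ptp_rows = np.ptp(data, axis=1)

# The same quantity written out as maximum minus minimum
manual_rows = data.max(axis=1) - data.min(axis=1)

print(ptp_rows)
print(manual_rows)
print(np.array_equal(ptp_rows, manual_rows))
```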
Percentiles
A percentile indicates the value below which a given percentage of observations fall. The numpy.percentile() function computes the q-th percentile of the data, either over the whole array or along a specified axis.
Syntax
numpy.percentile(a, q, axis=None)
Parameters
| Parameter | Description |
|---|---|
| a | Input array |
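| q | Percentile or sequence of percentiles to compute, each between 0 and 100 |
| axis | Axis along which to calculate; by default the percentile is computed over the flattened array |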
import numpy as np
inputArray = np.array([[20, 45, 70], [30, 25, 50], [10, 80, 90]])
print('Input Array:')
print(inputArray)
print()
print('10th percentile of entire array:')
print(np.percentile(inputArray, 10))
print()
print('10th percentile along axis 1 (rows):')
print(np.percentile(inputArray, 10, axis=1))
print()
print('10th percentile along axis 0 (columns):')
print(np.percentile(inputArray, 10, axis=0))
Output
Input Array:
[[20 45 70]
 [30 25 50]
 [10 80 90]]

10th percentile of entire array:
18.0

10th percentile along axis 1 (rows):
[25. 26. 24.]

10th percentile along axis 0 (columns):
[12. 29. 54.]
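The value 18.0 in the output above is not an element of the array, because numpy.percentile() interpolates linearly between sorted values by default. The sketch below reproduces that calculation by hand. The helper percentile_linear is hypothetical, written only for illustration, and handles a single scalar q over the flattened data.

```python
import numpy as np

data = np.array([[20, 45, 70], [30, 25, 50], [10, 80, 90]])

def percentile_linear(values, q):
    """Hypothetical helper: q-th percentile via linear interpolation."""
    ordered = np.sort(np.asarray(values, dtype=float).ravel())
    # Fractional position of the percentile in the sorted data
    position = (q / 100.0) * (ordered.size - 1)
    lower = int(np.floor(position))
    upper = min(lower + 1, ordered.size - 1)
    fraction = position - lower
    # Interpolate between the two neighbouring sorted values
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

manual = percentile_linear(data, 10)
builtin = np.percentile(data, 10)

print(manual, builtin)
print(np.isclose(manual, builtin))
```

For the 10th percentile of nine values, the position is 0.1 * 8 = 0.8, which lies between the two smallest sorted values, 10 and 20, giving 10 + 0.8 * 10 = 18.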
Median Values
The median is the middle value that separates the higher and lower halves of a dataset. The numpy.median() function calculates medians for arrays:
import numpy as np
inputArray = np.array([[20, 45, 70], [30, 25, 50], [10, 80, 90]])
print('Input Array:')
print(inputArray)
print()
print('Median of entire array:')
print(np.median(inputArray))
print()
print('Median along axis 0 (columns):')
print(np.median(inputArray, axis=0))
print()
print('Median along axis 1 (rows):')
print(np.median(inputArray, axis=1))
Output
Input Array:
[[20 45 70]
 [30 25 50]
 [10 80 90]]

Median of entire array:
45.0

Median along axis 0 (columns):
[20. 45. 70.]

Median along axis 1 (rows):
[45. 30. 80.]
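The example above always has an odd number of values along each axis. As a complementary sketch, with made-up sample arrays, the following shows what happens for an even count, where there is no single middle value, and how the median relates to the 50th percentile.

```python
import numpy as np

odd_count = np.array([20, 45, 70, 30, 25])   # five values
even_count = np.array([20, 45, 70, 30])      # four values

# Odd count: the middle value of the sorted data (20, 25, 30, 45, 70)
median_odd = np.median(odd_count)

# Even count: the average of the two middle values (30 and 45)
median_even = np.median(even_count)

# The median coincides with the 50th percentile
p50_even = np.percentile(even_count, 50)

print(median_odd, median_even, p50_even)
```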
Arithmetic Mean
The arithmetic mean is the sum of elements divided by the count. The numpy.mean() function calculates means over the whole array or along a specified axis:
import numpy as np
inputArray = np.array([[20, 45, 70], [30, 25, 50], [10, 80, 90]])
print('Input Array:')
print(inputArray)
print()
print('Mean of entire array:')
print(np.mean(inputArray))
print()
print('Mean along axis 0 (columns):')
print(np.mean(inputArray, axis=0))
print()
print('Mean along axis 1 (rows):')
print(np.mean(inputArray, axis=1))
Output
Input Array:
[[20 45 70]
 [30 25 50]
 [10 80 90]]

Mean of entire array:
46.666666666666664

Mean along axis 0 (columns):
[20. 50. 70.]

Mean along axis 1 (rows):
[45. 35. 60.]
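The definition "sum divided by count" can be checked directly. This sketch recomputes the overall mean and the column means by hand and prints them next to the numpy.mean() results. The variable names are illustrative.

```python
import numpy as np

data = np.array([[20, 45, 70], [30, 25, 50], [10, 80, 90]])

# Overall mean: total of all elements divided by the element count
manual_mean = data.sum() / data.size

# Column means: column totals divided by the number of rows
manual_col_means = data.sum(axis=0) / data.shape[0]

print(manual_mean, np.mean(data))
print(manual_col_means, np.mean(data, axis=0))
```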
Weighted Average
The numpy.average() function computes weighted averages when weights are specified. Without weights, it behaves like mean():
import numpy as np
inputArray = np.array([1, 2, 3, 4])
print('Input Array:')
print(inputArray)
print()
print('Average of all elements:')
print(np.average(inputArray))
print()
# With weights
weights = np.array([1, 2, 3, 4])
print('Weighted average:')
print(np.average(inputArray, weights=weights))
Output
Input Array:
[1 2 3 4]

Average of all elements:
2.5

Weighted average:
3.0
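The weighted result of 3.0 follows from the formula sum(weight * value) / sum(weight). A minimal sketch of that calculation, using the same values and weights as the example above:

```python
import numpy as np

values = np.array([1, 2, 3, 4])
weights = np.array([1, 2, 3, 4])

# Weighted average = sum(weight * value) / sum(weight)
manual = np.sum(weights * values) / np.sum(weights)
builtin = np.average(values, weights=weights)

print(manual, builtin)
```

Here the numerator is 1 + 4 + 9 + 16 = 30 and the weights sum to 10, so larger values, which carry larger weights, pull the average above the unweighted 2.5.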
Standard Deviation and Variance
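Standard deviation measures how far values spread around the mean and is the square root of the variance. Variance is the average of the squared deviations from the mean. By default, numpy.std() and numpy.var() divide by the number of elements N (ddof=0), which is the population formula.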
import numpy as np
inputArray = [1, 2, 3, 4]
print("Input Array =", inputArray)
print("Standard deviation =", np.std(inputArray))
print("Variance =", np.var(inputArray))
print()
# Manual calculation for verification
mean_val = np.mean(inputArray)
print(f"Mean = {mean_val}")
print(f"Manual variance = {np.mean((np.asarray(inputArray) - mean_val)**2)}")
Output
Input Array = [1, 2, 3, 4]
Standard deviation = 1.118033988749895
Variance = 1.25

Mean = 2.5
Manual variance = 1.25
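The results above use NumPy's default divisor N. When the data is a sample rather than a full population, the divisor N - 1 is commonly used instead, which numpy.std() and numpy.var() support through the ddof argument. The sketch below contrasts the two. The variable names are illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4])

# Default (ddof=0): divide by N, the population formula
population_var = np.var(data)
population_std = np.std(data)

# ddof=1: divide by N - 1, the usual sample estimate
sample_var = np.var(data, ddof=1)
sample_std = np.std(data, ddof=1)

print(population_var, sample_var)
print(population_std, sample_std)
```

With four values, the sum of squared deviations is 5.0, so the variance is 5 / 4 = 1.25 with the default and 5 / 3 with ddof=1.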
Conclusion
NumPy's statistical functions cover the common descriptive statistics: minimum, maximum, peak-to-peak range, percentiles, median, mean, weighted average, standard deviation, and variance. All of them accept an axis argument, so the same call works on a whole array or along the rows or columns of multi-dimensional data.
