Compute the histogram of a set of data using NumPy in Python


A histogram is a graphical representation of the distribution of a dataset. It displays the data as a series of bars, where each bar covers a range of data values and the height of the bar represents the frequency of the values that fall within that range.

Histograms are mainly used to show the distribution of numerical data, such as grades in a class, the distribution of a population, or the incomes of employees.

In a histogram, the x-axis represents the range of data values, divided into intervals (bins), and the y-axis represents the frequency of the data values within each bin. Histograms can be normalized by dividing the frequency of each bin by the total number of data values, which results in a relative frequency histogram where the y-axis represents the proportion of data values in each bin.
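As a small illustration of this normalization, the sketch below divides the bin counts by the total number of values to obtain relative frequencies. The sample grades and the choice of 4 bins are assumptions made only for this example.

```python
import numpy as np

# Hypothetical sample data: eight grades (assumed for illustration)
grades = np.array([55, 62, 68, 71, 75, 78, 84, 95])

# Count how many grades fall into each of 4 equal-width bins
counts, edges = np.histogram(grades, bins=4)

# Relative frequency = bin count / total number of values
relative = counts / counts.sum()

print("Counts:", counts)
print("Relative frequencies:", relative)
print("Sum of relative frequencies:", relative.sum())
```

The relative frequencies always add up to 1, whatever the number of bins.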

Calculating a histogram using Python NumPy

In Python, histograms can be created with the NumPy, Matplotlib and seaborn libraries. NumPy provides the histogram() function, which computes the histogram data (the counts and the bin edges) without drawing a plot.

Syntax

Following is the syntax of the histogram() function.

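numpy.histogram(arr, bins=10, range=None, density=None, weights=None)

Where,

  • arr is the input array; the histogram is computed over the flattened array

  • bins is the number of equal-width bins (10 by default); it can also be a sequence of bin edges

  • range is a (lower, upper) tuple that defines the range of the bins; values outside it are ignored, and by default it spans the minimum and maximum of arr

  • normed is a deprecated parameter that has been replaced by density; recent NumPy versions no longer accept it

  • weights is an optional array, of the same shape as arr, giving the weight that each data value contributes to its bin

  • density is the parameter that, when set to True, normalizes the histogram so that it forms a probability density

The histogram() function returns a tuple containing the histogram counts and the bin edges. There is always one more bin edge than there are bins.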
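The following sketch shows how the returned tuple can be unpacked and how the density and weights parameters change the result. The data values and the constant weight of 0.5 are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data and weights (assumed for illustration)
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])
weights = np.full(data.shape, 0.5)

# The return value is a tuple, so it can be unpacked directly
counts, edges = np.histogram(data, bins=4, range=(1, 5))

# density=True rescales the counts so the area under the histogram is 1
density, _ = np.histogram(data, bins=4, range=(1, 5), density=True)
area = np.sum(density * np.diff(edges))

# With weights, each value adds its weight (here 0.5) instead of 1
weighted, _ = np.histogram(data, bins=4, range=(1, 5), weights=weights)

print("Counts:", counts)
print("Bin edges:", edges)
print("Area under the density histogram:", area)
print("Weighted counts:", weighted)
```

Unpacking into two names is usually more convenient than indexing the tuple, since the counts and the edges have different lengths.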

Example

In the following example, we create a histogram using the NumPy histogram() function. We pass an array as the input and set bins to 10, so the histogram is computed with 10 bins; the remaining parameters keep their default values.

import numpy as np
# Six sample values to be binned
arr = np.array([10, 20, 25, 40, 35, 23])
# Split the data range [10, 40] into 10 equal-width bins
hist = np.histogram(arr, bins=10)
print("The histogram created:", hist)

Output

The histogram created: (array([1, 0, 0, 1, 1, 1, 0, 0, 1, 1], dtype=int64), array([10., 13., 16., 19., 22., 25., 28., 31., 34., 37., 40.]))
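To see where these counts come from: with an integer bins value, the edges are equally spaced between the minimum and the maximum of the data, each bin is half-open (it includes its left edge but not its right edge), and the last bin also includes its right edge, which is why the value 40 is counted. The sketch below rebuilds the edges by hand with np.linspace; this is an equivalent construction assumed for illustration, not a description of NumPy's internals.

```python
import numpy as np

arr = np.array([10, 20, 25, 40, 35, 23])
counts, edges = np.histogram(arr, bins=10)

# 10 bins need 11 equally spaced edges between the min and the max
manual_edges = np.linspace(arr.min(), arr.max(), 10 + 1)

print("Edges match:", np.allclose(edges, manual_edges))
# The maximum value (40) lands in the last bin, which is closed on the right
print("Count in the last bin:", counts[-1])
print("Total values counted:", counts.sum())
```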

Example

Let’s see another example of the histogram() function of the NumPy library, this time with a two-dimensional input array and 20 bins.

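import numpy as np
# A 3x3 array; histogram() works on its nine values as a whole
arr = np.array([[20, 20, 25], [40, 35, 23], [34, 22, 1]])
# Split the data range [1, 40] into 20 equal-width bins
hist = np.histogram(arr, bins=20)
print("The histogram created:", hist)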

Output

The histogram created: (array([1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1],
      dtype=int64), array([ 1.  ,  2.95,  4.9 ,  6.85,  8.8 , 10.75, 12.7 , 14.65, 16.6 ,
       18.55, 20.5 , 22.45, 24.4 , 26.35, 28.3 , 30.25, 32.2 , 34.15,
       36.1 , 38.05, 40.  ]))
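The input in this example is two-dimensional, yet the result is a single one-dimensional histogram, because histogram() is computed over the flattened array. The sketch below checks this by comparing the result with the one obtained from an explicitly flattened copy; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[20, 20, 25], [40, 35, 23], [34, 22, 1]])

# Histogram of the 2-D array and of its flattened 1-D version
counts_2d, edges_2d = np.histogram(arr, bins=20)
counts_1d, edges_1d = np.histogram(arr.ravel(), bins=20)

print("Same counts:", np.array_equal(counts_2d, counts_1d))
print("Same edges:", np.array_equal(edges_2d, edges_1d))
# Every one of the nine values is counted exactly once
print("Total values counted:", counts_2d.sum())
```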

Example

In this example, we create a histogram by specifying both the number of bins and the range of data to be used. Values that fall outside the range are ignored.

import numpy as np
arr = np.array([[20, 20, 25], [40, 35, 23], [34, 22, 1]])
# Only values between 1 and 10 are binned; the rest are ignored
hist = np.histogram(arr, bins=20, range=(1, 10))
print("The histogram created:", hist)

Output

The histogram created: (array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
      dtype=int64), array([ 1.  ,  1.45,  1.9 ,  2.35,  2.8 ,  3.25,  3.7 ,  4.15,  4.6 ,
        5.05,  5.5 ,  5.95,  6.4 ,  6.85,  7.3 ,  7.75,  8.2 ,  8.65,
        9.1 ,  9.55, 10.  ]))
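Only one of the nine values (the value 1) lies inside the range (1, 10), so the counts add up to 1 rather than 9; out-of-range values are dropped, not moved into the end bins. The sketch below verifies this by comparing the sum of the counts with the number of values inside the range. The names low, high and in_range are illustrative.

```python
import numpy as np

arr = np.array([[20, 20, 25], [40, 35, 23], [34, 22, 1]])
low, high = 1, 10

counts, edges = np.histogram(arr, bins=20, range=(low, high))

# Number of values that fall inside the closed interval [low, high]
in_range = np.count_nonzero((arr >= low) & (arr <= high))

print("Values counted by the histogram:", counts.sum())
print("Values inside the range:", in_range)
print("First and last edge:", edges[0], edges[-1])
```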

Updated on: 07-Aug-2023
