Which function of the scipy.cluster.vq module is used to normalize observations on each feature dimension?

Before running a k-means algorithm, it is usually beneficial to rescale each feature dimension of the observation set. The function scipy.cluster.vq.whiten(obs, check_finite = True) is used for this purpose. It divides each feature dimension (column) of the observations by its standard deviation (SD), computed across all observations, so that every feature has unit variance. The mean is not subtracted; the values are only rescaled.
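The rescaling described above can be reproduced by hand, which may help make clear what whiten does. The following is a minimal sketch, assuming a small made-up observation set (the values and the variable names are illustrative, not from the original text); it divides each column by its population standard deviation and compares the result with the library function.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical observation set: 4 observations (rows), 3 features (columns).
obs = np.array([[1.0, 2.0, 10.0],
                [2.0, 4.0, 20.0],
                [3.0, 6.0, 40.0],
                [4.0, 9.0, 80.0]])

# Standard deviation of each column, computed across the observations.
std_per_feature = obs.std(axis=0)

# Manual rescaling: divide every column by its own standard deviation.
manual = obs / std_per_feature

# Library result for comparison.
whitened = whiten(obs)

# True when the manual rescaling and whiten agree.
print(np.allclose(manual, whitened))
```

Because the features here have very different ranges, the third column would otherwise dominate the Euclidean distances that k-means relies on; after rescaling, all three features contribute on a comparable scale.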


The parameters of the function scipy.cluster.vq.whiten(obs, check_finite = True) are given below −

  • obs − ndarray

It is the array to be rescaled, where each row is an observation and each column is a feature measured in that observation. An example is given below −

obs = [[ 1., 1., 1.],
   [ 2., 2., 2.],
   [ 3., 3., 3.],
   [ 4., 4., 4.]]
  • check_finite − bool, optional

This parameter controls whether the function checks that the input array contains only finite numbers. Disabling the check may give a performance gain, but it may also result in problems such as crashes or non-termination if the observations do contain infinities or NaNs. The default value of this parameter is True.
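The effect of check_finite can be seen with a small experiment. This is only a sketch under stated assumptions: the array below is made up, and the expectation that the default check rejects non-finite input with a ValueError reflects the behavior of common SciPy releases rather than a guarantee for every version.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical data containing one infinite entry.
bad_obs = np.array([[1.0, 2.0],
                    [np.inf, 4.0],
                    [3.0, 6.0]])

try:
    # check_finite defaults to True, so the input is validated first.
    whiten(bad_obs)
    rejected = False
except ValueError as exc:
    rejected = True
    print("Input rejected:", exc)
```

Leaving the check enabled is the safer choice unless the data has already been validated elsewhere and the extra pass over the array is a measurable cost.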


The function returns an array containing the values in obs, with each column divided by its SD.
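As a quick check of the return value, the sketch below applies whiten to the example obs array from the parameter description and inspects the result. The variable name scaled is an illustrative choice; the check assumes only that the returned array has the same shape as the input and that each column ends up with a standard deviation close to 1.

```python
import numpy as np
from scipy.cluster.vq import whiten

# The example observation set from the parameter description.
obs = np.array([[1., 1., 1.],
                [2., 2., 2.],
                [3., 3., 3.],
                [4., 4., 4.]])

scaled = whiten(obs)

# The result has the same shape as the input.
print(scaled.shape)

# Standard deviation of each column after rescaling.
print(scaled.std(axis=0))
```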


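import numpy as np
from scipy.cluster.vq import whiten

# Three observations (rows), each with three features (columns)
observations = np.array([[2.9, 1.3, 1.9],
   [1.7, 3.2, 1.1],
   [1.0, 0.2, 1.7]])

# Divide each column by its SD; an interactive session displays the result
whiten(observations)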


array([[3.69627581, 1.04908478, 5.58930985],
   [2.16678237, 2.58236253, 3.23591623],
   [1.27457787, 0.16139766, 5.00096145]])