 
Scipy Articles - Page 4 of 5
 
 
			
The scipy.cluster.vq module provides two functions that implement k-means clustering: kmeans() and kmeans2(). The two differ significantly in how they work.

scipy.cluster.vq.kmeans(obs, k_or_guess, iter=20, thresh=1e-05, check_finite=True) − The kmeans() function forms k clusters by running the k-means algorithm on a set of observation vectors. To decide whether the centroids are stable, it compares the change in average Euclidean distance between the observations and their corresponding centroids against a threshold value. Its output is a code book that maps codes to centroids and vice versa.

scipy.cluster.vq.kmeans2(data, k, iter=10, thresh=1e-05, minit='random', missing='warn', check_finite=True) − The ...
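A minimal sketch of the difference in return values, assuming two synthetic, well-separated blobs (the data, the seed, and the choice of starting centroids are illustrative, not from the original article). Explicit starting centroids are passed to both functions so the result does not depend on random initialization: kmeans() returns the code book together with the distortion, while kmeans2() returns the centroids together with a label for every observation.

```python
import numpy as np
from scipy.cluster.vq import kmeans, kmeans2, whiten

# Hypothetical data: two well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
obs = whiten(np.vstack([blob_a, blob_b]))

# One starting centroid taken from each blob (an assumption for determinism).
init = obs[[0, 50]]

# kmeans(): passing an array as k_or_guess uses it as the initial centroids.
codebook, distortion = kmeans(obs, init)

# kmeans2(): minit='matrix' tells it to treat k as the initial centroids.
centroids2, labels = kmeans2(obs, init, minit='matrix')

print("kmeans  code book:\n", codebook)
print("kmeans  distortion:", float(distortion))
print("kmeans2 centroids:\n", centroids2)
print("kmeans2 label counts:", np.bincount(labels))
```

On data this cleanly separated both functions should settle on essentially the same centroids; on harder data they can differ, because kmeans() stops on the distortion threshold while kmeans2() runs a fixed number of iterations.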
 
 
			
scipy.cluster.vq.kmeans2(data, k, iter=10, thresh=1e-05, minit='random', missing='warn', check_finite=True) − The kmeans2() function classifies a set of observation vectors into k clusters by running the k-means algorithm. Unlike kmeans(), it does not use a threshold value to check for convergence. It has additional parameters that select how the centroids are initialized, how empty clusters are handled, and whether the input matrices are checked to contain only finite numbers.

A detailed explanation of its parameters is given below −

Parameters
data − ndarray
It is an ‘M’ by ‘N’ array of M observations in N dimensions.
k − int or ndarray
This parameter represents the number of clusters to form and the centroids ...
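As a rough illustration of the initialization and empty-cluster parameters described above, the sketch below runs kmeans2() with two of the documented minit options on synthetic data. The data and seed are assumptions made for the example; because the initial centroids are chosen at random, the centroid order and label numbering can change between runs.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, whiten

# Hypothetical data: two 2-D blobs, rescaled to unit variance per feature.
rng = np.random.default_rng(0)
data = whiten(np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
]))

results = {}
for method in ("points", "++"):
    # minit picks the initialization scheme; missing='warn' only warns
    # (instead of raising) if a cluster ends up empty.
    centroids, labels = kmeans2(data, 2, iter=10, minit=method, missing="warn")
    results[method] = (centroids, labels)
    print(method, centroids.shape, np.bincount(labels, minlength=2))
```

Setting missing='raise' instead turns an empty cluster into an exception, which can be preferable when an empty cluster indicates a poor choice of k.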
 
 
			
The scipy.cluster.vq.kmeans(obs, k_or_guess, iter=20, thresh=1e-05, check_finite=True) function forms k clusters by running the k-means algorithm on a set of observation vectors. To decide whether the centroids are stable, it compares the change in average Euclidean distance between the observations and their corresponding centroids against a threshold value. Its output is a code book that maps codes to centroids and vice versa.

A detailed explanation of its parameters is given below −

Parameters
obs − ndarray
It is an ‘M’ by ‘N’ array where each row is an observation, and the columns are the features seen during each observation. Before using, these features ...
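A small sketch of calling kmeans() with an integer k, assuming synthetic two-blob data (the data and seed are illustrative). With an integer k the starting centroids are chosen at random, the algorithm is run iter times, and the code book with the lowest distortion is kept, so the row order of the code book may vary between runs.

```python
import numpy as np
from scipy.cluster.vq import kmeans, whiten

# Hypothetical observations: rows are observations, columns are features.
rng = np.random.default_rng(1)
obs = whiten(np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
]))

# Ask for 2 clusters; iter and thresh are spelled out only to show
# where the parameters from the signature go.
codebook, distortion = kmeans(obs, 2, iter=20, thresh=1e-05)

print("Code book:\n", codebook)
print("Distortion:", float(distortion))
```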
 
 
			
The scipy.cluster.vq.vq(obs, code_book, check_finite=True) function is used to assign a code from a code book to each observation. It first compares each observation vector in the ‘M’ by ‘N’ obs array with the centroids in the code book, and then assigns each observation the code of its closest centroid. The features in the obs array should have unit variance, which we can achieve by passing them through the scipy.cluster.vq.whiten(obs, check_finite=True) function.

Parameters
Below are given the parameters of the function scipy.cluster.vq.vq(obs, code_book, check_finite=True) −
obs − ndarray
It is an ‘M’ by ‘N’ array where each row is an observation, and ...
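The assignment step can be sketched with a tiny hand-written code book. The arrays are made up for illustration and are left unwhitened so the numbers are easy to follow; in practice the observations would normally be passed through whiten() first.

```python
import numpy as np
from scipy.cluster.vq import vq

# Hypothetical code book with two centroids (one per row).
code_book = np.array([[0.0, 0.0],
                      [5.0, 5.0]])

# Hypothetical observations (one per row).
obs = np.array([[0.1, -0.2],
                [4.8, 5.3],
                [0.4, 0.3],
                [5.5, 4.9]])

# codes[i] is the row index of the centroid closest to obs[i];
# dists[i] is the Euclidean distance to that centroid.
codes, dists = vq(obs, code_book)

print(codes)
print(dists)
```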
 
 
			
Before running a k-means algorithm, it is beneficial to rescale each feature dimension of the observation set. The function scipy.cluster.vq.whiten(obs, check_finite=True) is used for this purpose. To give each feature unit variance, it divides each feature dimension of the observations by its standard deviation (SD).

Parameters
Below are given the parameters of the function scipy.cluster.vq.whiten(obs, check_finite=True) −
obs − ndarray
It is the array to be rescaled, where each row is an observation, and the columns are the features seen during each observation. An example is given below −
obs = [[ 1., 1., 1.], [ 2., 2., 2.], ...
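The rescaling can be checked directly against the description above. This is a minimal sketch with a made-up 3 by 3 array (not the truncated array from the article): the result of whiten() should match dividing each column by its standard deviation, and every column should then have unit standard deviation.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical observations: 3 rows (observations) by 3 columns (features).
obs = np.array([[1.9, 2.3, 1.7],
                [1.5, 2.5, 2.2],
                [0.8, 0.6, 1.7]])

whitened = whiten(obs)

# The same rescaling done by hand: divide each column by its SD.
by_hand = obs / obs.std(axis=0)

print(whitened)
print("Column SDs after whitening:", whitened.std(axis=0))
```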
 
 
			
If you are unsure how to use a particular function or variable in NumPy or SciPy, you can call up its documentation with ‘?’. In a Jupyter notebook or the IPython shell, the documentation can be displayed as follows −

Example
If you want to learn about the NumPy sin() function, you can use the code below −
import numpy as np
np.sin?

Output
This displays the details of the sin() function.

We can also view the source with a double question mark (??) as follows −
import numpy as np
np.sin??

Similarly, if you want to see the ...
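The ‘?’ syntax is specific to IPython and Jupyter. Outside those environments, a rough equivalent (offered here as an alternative, not as what the article itself shows) is to read the docstring through the standard library, or to call the built-in help() function.

```python
import inspect

import numpy as np

# inspect.getdoc returns the cleaned-up docstring as a plain string.
doc = inspect.getdoc(np.sin)

# Print only the first few lines to keep the output short.
print("\n".join(doc.splitlines()[:5]))

# In an interactive session, help(np.sin) shows the same text in a pager.
```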
 
 
			
The Pima Indian Diabetes dataset, which we will be using here, is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. Based on the following diagnostic factors, this dataset can be used to place a patient in either the diabetic cluster or the non-diabetic cluster −
- Pregnancies
- Glucose
- Blood Pressure
- Skin Thickness
- Insulin
- BMI
- Diabetes Pedigree Function
- Age

You can get this dataset in .CSV format from the Kaggle website.

Example
The example below uses the SciPy library to create two clusters, diabetic and non-diabetic, from the Pima Indian Diabetes dataset.
#importing the required Python libraries:
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq
...
 
 
			
Yes, we can also use the K-means clustering algorithm to split random data into 3 clusters. Let us understand with the example below −

Example
#importing the required Python libraries:
import numpy as np
from numpy import vstack, array
from numpy.random import rand
from scipy.cluster.vq import whiten, kmeans, vq
from pylab import plot, show

#Random data generation:
data = vstack((rand(200, 2) + array([.5, .5]), rand(150, 2)))

#Normalizing the data:
data = whiten(data)

# computing K-Means with K = 3 (3 clusters)
centroids, mean_value = kmeans(data, 3)
print("Code book :", centroids, "")
print("Mean of Euclidean distances :", mean_value.round(4))
...
 
 
			
The K-means clustering algorithm, also called flat clustering, is a method of computing the clusters and cluster centers (centroids) in a set of unlabeled data. It iterates until the centroids settle. A cluster can be thought of as a group of data points whose inter-point distances are small compared with the distances to points outside that cluster. The ‘K’ in K-means is the number of clusters to be identified in the unlabeled data.

Given an initial set of K centers, K-means clustering can be carried out with the SciPy library through the following steps −

Step 1 − Data point ...
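The step list is cut off in this excerpt, so the sketch below follows the standard k-means iteration as an assumption about what those steps describe: assign every point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. The function name simple_kmeans, the data, and the starting centroids are all hypothetical; this is plain NumPy written for illustration, not SciPy's implementation.

```python
import numpy as np

def simple_kmeans(data, init_centroids, max_iter=100):
    """Illustrative k-means loop; returns (centroids, labels)."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(max_iter):
        # Assignment: distance from every point to every centroid, shape (M, K).
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Update: move each centroid to the mean of its points.
        # A centroid with no points is left where it is.
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = data[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)

        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: two well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.5, size=(50, 2))
blob_b = rng.normal(5.0, 0.5, size=(50, 2))
data = np.vstack([blob_a, blob_b])

# Start with one centroid taken from each blob.
centroids, labels = simple_kmeans(data, data[[0, 50]])
print(centroids)
print(np.bincount(labels))
```

With SciPy, the same loop is what kmeans() or kmeans2() performs internally in optimized form, so in practice those functions would be used instead of a hand-written loop.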
 
 
			
Clustering is one of the most useful unsupervised ML methods. It is used to find relationship patterns and similarities among the input data samples. After finding these patterns, an unsupervised algorithm groups the data samples that are similar to one another into clusters.

Anomaly detection, image segmentation, medical imaging, social network analysis, and market segmentation are some common applications of clustering. K-means and hierarchical clustering are the two most common forms of clustering.

To implement clustering, SciPy provides a clustering package (scipy.cluster), which in turn has two modules, as given below −

scipy.cluster.vq module − This SciPy module provides functions for k-means ...
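K-means is covered by scipy.cluster.vq; for the hierarchical form mentioned above, SciPy's other clustering module is scipy.cluster.hierarchy. The sketch below is a minimal illustration with six made-up points: linkage() builds the merge tree and fcluster() cuts it into a requested number of flat clusters. The choice of Ward linkage and of two clusters is an assumption for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical points: three near the origin, three near (5, 5).
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])

# Build the hierarchy; each row of Z records one merge.
Z = linkage(points, method="ward")

# Cut the tree so that at most 2 flat clusters remain.
# fcluster numbers clusters starting from 1.
labels = fcluster(Z, t=2, criterion="maxclust")

print(Z.shape)
print(labels)
```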