What is K-means clustering?

K-means clustering is one of the most common partitioning algorithms. K-means assigns each data point in the dataset to exactly one of the clusters formed. A record or data point is assigned to the nearest cluster using a measure of distance or similarity.

The k-means algorithm takes the input parameter, k, and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster, which can be viewed as the cluster’s centroid or center of gravity.
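The centroid described above is the coordinate-wise mean of the objects in a cluster. The following is a minimal sketch, assuming each object is a tuple of numbers; the function name centroid is illustrative and does not come from the text.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    dims = len(points[0])
    # Average each coordinate separately across all points in the cluster.
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 0.0)]
print(centroid(cluster))  # (3.0, 2.0)
```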

The following steps are used in K-means clustering −

  • Select K initial cluster centroids c1, c2, c3, …, ck.

  • Assign each instance x in the dataset S to the cluster whose centroid is nearest to x.

  • For each cluster, recompute its centroid based on the elements contained in that cluster.

  • Repeat the assignment and recomputation steps until convergence.

The resulting clustering has the following properties −

  • The objects (data points) are separated into K clusters.

  • Each cluster center (centroid) is the average of all the data points in the cluster.

  • Each point is assigned to the cluster whose centroid is nearest (using a distance function).
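The assignment step depends on which distance function is used. The sketch below is illustrative only: the names euclidean, manhattan and assign_to_nearest are assumptions, and the text does not prescribe a particular distance. It shows that the same point can land in different clusters under different distance functions.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.dist(a, b)


def manhattan(a, b):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign_to_nearest(points, centroids, distance=euclidean):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: distance(p, centroids[i]))
        for p in points
    ]


points = [(0.0, 0.0), (4.0, 4.0), (6.0, 0.0)]
centroids = [(3.0, 3.0), (5.0, 0.0)]
print(assign_to_nearest(points, centroids))             # [0, 0, 1]
print(assign_to_nearest(points, centroids, manhattan))  # [1, 0, 1]
```

Here the point (0, 0) is closer to the first centroid in Euclidean distance but closer to the second in Manhattan distance.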

The initial values for the means are assigned arbitrarily. They can be assigned randomly, or the values of the first k input items can be used. The convergence criterion can be based on the squared error, but it need not be. For example, the algorithm can stop when no items (or only very few) are assigned to different clusters. Other termination techniques simply stop after a fixed number of iterations. A maximum number of iterations can be included to ensure stopping even without convergence.
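The initialization and termination options above can be sketched as small helper functions. All names here (init_first_k, init_random, squared_error, should_stop) are hypothetical, and the squared error is assumed to be the sum of squared Euclidean distances from each point to the mean of its assigned cluster.

```python
import random


def init_first_k(points, k):
    """Use the first k input items as the initial means."""
    return list(points[:k])


def init_random(points, k, seed=None):
    """Pick k distinct input items at random as the initial means."""
    return random.Random(seed).sample(points, k)


def squared_error(points, labels, means):
    """Sum of squared distances from each point to its cluster mean."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, means[lab]))
        for p, lab in zip(points, labels)
    )


def should_stop(old_labels, new_labels, iteration, max_iter):
    """Stop when no item changed cluster or the iteration cap is reached."""
    return old_labels == new_labels or iteration >= max_iter


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
means = [(1.0, 1.5), (8.5, 8.0)]
print(squared_error(points, [0, 0, 1, 1], means))  # 1.0
```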



Input −

D = {t1, t2, …, tn} // Set of elements
k // Number of desired clusters

Output −

K // Set of clusters

K-means algorithm −

   assign initial values for means m1, m2, …, mk
   repeat
      assign each item ti to the cluster which has the closest mean
      calculate the new mean for each cluster
   until convergence criteria are met
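The pseudocode above can be turned into a short runnable sketch. This is one possible reading, not a definitive implementation: it assumes items are tuples of numbers, uses the first k items as initial means, measures closeness with Euclidean distance, stops when no item changes cluster or after max_iter passes, and leaves the mean of an empty cluster unchanged. The function name kmeans and the parameter max_iter are illustrative.

```python
import math


def kmeans(items, k, max_iter=100):
    """Cluster items into k groups; return (labels, means)."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and the number of items")

    # assign initial values for means m1, m2, ..., mk (assumption: first k items)
    means = [tuple(t) for t in items[:k]]
    labels = None

    for _ in range(max_iter):
        # assign each item ti to the cluster which has the closest mean
        new_labels = [
            min(range(k), key=lambda j: math.dist(t, means[j])) for t in items
        ]
        # convergence criterion: no item was assigned to a different cluster
        if new_labels == labels:
            break
        labels = new_labels

        # calculate the new mean for each cluster
        for j in range(k):
            members = [t for t, lab in zip(items, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))

    return labels, means


items = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
labels, means = kmeans(items, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```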

For example, with k = 3, the algorithm arbitrarily selects three objects as the three initial cluster centers, where cluster centers are denoted by a “+”. Each object is assigned to the cluster whose center is nearest to it.

Next, the cluster centers are updated. The mean value of each cluster is recomputed based on the current objects in the cluster. Using the new cluster centers, the objects are redistributed to the clusters according to which cluster center is nearest. Such a redistribution forms new silhouettes encircled by dashed curves.

The process of iteratively reassigning objects to clusters to improve the partitioning is referred to as iterative relocation. Eventually, no redistribution of the objects in any cluster occurs, and so the process terminates. The resulting clusters are returned by the clustering process.
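The relocation process can be traced pass by pass. The sketch below is a hypothetical illustration on one-dimensional values with three arbitrarily chosen objects as initial centers; the data, the function name relocation_trace and the choice of absolute difference as the distance are all assumptions. It reports how many objects changed cluster in each pass and stops when none do.

```python
def relocation_trace(values, centers):
    """Yield (centers, labels, moved) for each pass until no object moves."""
    centers = list(centers)
    labels = [None] * len(values)
    while True:
        # Redistribute every object to the cluster with the nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            for v in values
        ]
        moved = sum(old != new for old, new in zip(labels, new_labels))
        yield list(centers), new_labels, moved
        if moved == 0:
            return  # no redistribution occurred, so the process terminates
        labels = new_labels
        # Update each center to the mean of the objects now in its cluster.
        for j in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
for centers, labels, moved in relocation_trace(values, [2.0, 11.0, 12.0]):
    print(centers, labels, moved)
# [2.0, 11.0, 12.0] [0, 0, 0, 1, 1, 2, 2, 2, 2] 9
# [2.0, 10.5, 18.75] [0, 0, 0, 1, 1, 1, 2, 2, 2] 1
# [2.0, 11.0, 21.0] [0, 0, 0, 1, 1, 1, 2, 2, 2] 0
```

In the first pass every object counts as moved because nothing was assigned before; in the second pass only the value 12.0 is relocated, and the third pass confirms that the partition is stable.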