What is Bisecting K-Means?


The bisecting K-means algorithm is a straightforward extension of the basic K-means algorithm. It is based on a simple idea: to obtain K clusters, split the set of all points into two clusters, select one of these clusters to split, and so on, until K clusters have been produced.

The k-means algorithm takes the input parameter, k, and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster, which can be viewed as the cluster’s centroid or center of gravity.
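As a sketch of the quantity behind this idea, the sum of squared errors (SSE) used later in the article can be written as below. The notation is an assumption made here for illustration, not taken from the article: C_i is the i-th cluster, c_i is its centroid, and x ranges over the points of C_i.

```latex
c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x,
\qquad
\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^{2}
```

A low SSE means the points sit close to their own centroid, which corresponds to high intracluster similarity.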

The initial values for the means are assigned arbitrarily. They can be assigned randomly or taken from the first k input items themselves. The convergence criterion can be based on the squared error, but it does not have to be. For instance, the algorithm can stop when no points (or only very few) are reassigned to a different cluster. Other termination methods simply stop after a fixed number of iterations. A maximum number of iterations can be included to ensure stopping even without convergence.
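The following is a minimal sketch of basic K-means in plain Python, written only to illustrate the initialization and stopping rules described above. The names `kmeans`, `sq_dist`, `mean`, `max_iter`, and `seed` are illustrative choices, and the sketch assumes points are given as tuples of numbers and that there are at least k of them.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means sketch; returns (centroids, labels, sse)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initial means: k randomly chosen points
    labels = [None] * len(points)
    for _ in range(max_iter):           # iteration cap ensures stopping
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:        # no point was reassigned: stop
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                 # an empty cluster keeps its old centroid
                centroids[j] = mean(members)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse
```

Here the stopping rule is "no point changed cluster", with `max_iter` as the fallback. A squared-error threshold could be used instead without changing the rest of the loop.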

The bisecting K-means algorithm is as follows:

  • Initialize the list of clusters to contain the cluster consisting of all points.

  • repeat

  • Remove a cluster from the list of clusters.

  • {Perform several "trial" bisections of the chosen cluster.}

  • for i = 1 to number of trials do

  • Bisect the selected cluster using basic K-means.

  • end for

  • Select the two clusters from the bisection with the lowest total SSE.

  • Add these two clusters to the list of clusters.

  • until the list of clusters contains K clusters.
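The steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than a reference implementation: points are distinct tuples of numbers, there are at least k of them, and the cluster chosen for splitting is the one with the largest SSE (only one of several possible rules). The names `bisecting_kmeans`, `two_means`, `cluster_sse`, and `n_trials` are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def cluster_sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    c = mean(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(points, rng, max_iter=50):
    """Bisect one cluster with basic K-means (k = 2).

    Assumes the points are distinct, so neither half ends up empty.
    """
    centroids = rng.sample(points, 2)
    halves = None
    for _ in range(max_iter):
        new_halves = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
            new_halves[nearer].append(p)
        if new_halves == halves:            # assignments unchanged: converged
            break
        halves = new_halves
        centroids = [mean(h) for h in halves]
    return halves

def bisecting_kmeans(points, k, n_trials=5, seed=0):
    """Return a list of k clusters (each a list of points)."""
    rng = random.Random(seed)
    clusters = [list(points)]               # one cluster holding all points
    while len(clusters) < k:
        # Assumed selection rule: split the cluster with the largest SSE.
        i = max(range(len(clusters)), key=lambda j: cluster_sse(clusters[j]))
        target = clusters.pop(i)
        best = None
        for _ in range(n_trials):           # several trial bisections
            halves = two_means(target, rng)
            total = cluster_sse(halves[0]) + cluster_sse(halves[1])
            if best is None or total < best[0]:
                best = (total, halves)
        clusters.extend(best[1])            # keep the lowest-SSE bisection
    return clusters
```

Calling `bisecting_kmeans(points, 3)` on a list of 2-D tuples returns three lists of points. Because each trial starts from different random centroids, `n_trials` trades running time for a better chance of a good split.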

There are several ways to choose which cluster to split. We can choose the largest cluster at each step, choose the one with the largest SSE, or use a criterion based on both size and SSE. Different choices result in different clusters.
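The three selection rules could be sketched as small functions that return the index of the cluster to split. The function names are illustrative, and the combined rule in particular is a hypothetical weighting invented for this example, since the text does not specify how size and SSE should be combined.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cluster_sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = tuple(sum(c) / len(points) for c in zip(*points))
    return sum(sq_dist(p, center) for p in points)

def pick_largest(clusters):
    """Index of the cluster with the most points."""
    return max(range(len(clusters)), key=lambda i: len(clusters[i]))

def pick_highest_sse(clusters):
    """Index of the cluster with the largest SSE."""
    return max(range(len(clusters)), key=lambda i: cluster_sse(clusters[i]))

def pick_combined(clusters, weight=0.5):
    """Hypothetical blend: weighted sum of each cluster's share of the
    points and its share of the total SSE (weight=1.0 is pure size,
    weight=0.0 is pure SSE)."""
    total_n = sum(len(c) for c in clusters)
    total_sse = sum(cluster_sse(c) for c in clusters) or 1.0  # avoid dividing by 0

    def score(i):
        size_share = len(clusters[i]) / total_n
        sse_share = cluster_sse(clusters[i]) / total_sse
        return weight * size_share + (1 - weight) * sse_share

    return max(range(len(clusters)), key=score)
```

With one tight cluster of many points and one spread-out cluster of few points, `pick_largest` and `pick_highest_sse` select different clusters, which is why the final clustering depends on the rule used.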

We often refine the resulting clusters by using their centroids as the initial centroids for the basic K-means algorithm. This is necessary because, although the K-means algorithm is guaranteed to find a clustering that represents a local minimum with respect to the SSE, bisecting K-means uses the K-means algorithm "locally," i.e., to bisect individual clusters. Hence, the final set of clusters is not guaranteed to be a local minimum with respect to the total SSE.
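A minimal sketch of this refinement step is shown below, assuming the clusters are lists of numeric tuples. The function name `refine` and the parameter `max_iter` are illustrative. It runs basic K-means over all points, starting from the centroids of the clusters it is given.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def refine(clusters, max_iter=100):
    """Run basic K-means on all points, seeded with the given clusters' centroids."""
    points = [p for c in clusters for p in c]
    centroids = [mean(c) for c in clusters]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:        # no reassignment: local minimum reached
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                 # an empty cluster keeps its old centroid
                centroids[j] = mean(members)
    refined = [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]
    return [c for c in refined if c]    # drop any cluster that became empty
```

For example, given the one-dimensional clusters `[(0,), (1,), (4,)]` and `[(5,), (10,), (11,)]`, the point `(5,)` is closer to the first centroid than to its own, so refinement moves it and the total SSE drops.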

Finally, by recording the sequence of clusterings produced as K-means bisects clusters, we can also use bisecting K-means to produce a hierarchical clustering.

Updated on 14-Feb-2022 11:32:59