What is the SOM Algorithm?

SOM stands for Self-Organizing Feature Map. It is a clustering and data visualization technique based on a neural network viewpoint. Despite its neural network origins, SOM is most easily presented as a variation of prototype-based clustering.

The algorithm of SOM is as follows −

  1. Initialize the centroids.

  2. repeat

  3. Choose the next object.

  4. Determine the closest centroid to the object.

  5. Update this centroid and the centroids that are close, i.e., in a specified neighborhood.

  6. until the centroids don't change much or a threshold is exceeded.

  7. Assign each object to its closest centroid and return the centroids and clusters.
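The steps listed above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation. The rectangular grid, the Gaussian neighborhood, the linear decay schedules, the fixed step budget used as the stopping threshold, and all names (`train_som`, `nearest`, `alpha0`, `sigma0`) are assumptions made for this example.

```python
import math
import random

def nearest(centroids, p):
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(centroids[k], p))

def train_som(data, grid_rows, grid_cols, steps=2000, alpha0=0.5, sigma0=1.0, seed=0):
    """Minimal SOM sketch on a rectangular grid (illustrative names and schedules)."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(grid_rows) for c in range(grid_cols)]
    # Line 1: initialize each centroid with a randomly chosen data point.
    centroids = [list(rng.choice(data)) for _ in grid]

    for t in range(steps):                # lines 2 and 6: stop after a fixed step budget
        frac = 1.0 - t / steps            # decays linearly from 1 toward 0
        alpha = alpha0 * frac             # assumed learning-rate schedule
        sigma = max(sigma0 * frac, 1e-3)  # assumed neighborhood-width schedule
        p = rng.choice(data)              # line 3: choose the next object
        j = nearest(centroids, p)         # line 4: closest centroid
        for k, m in enumerate(centroids): # line 5: update it and its grid neighbors
            d2 = (grid[k][0] - grid[j][0]) ** 2 + (grid[k][1] - grid[j][1]) ** 2
            h = alpha * math.exp(-d2 / (2 * sigma ** 2))
            for i in range(len(m)):
                m[i] += h * (p[i] - m[i])

    # Line 7: assign each object to its closest centroid.
    labels = [nearest(centroids, p) for p in data]
    return centroids, labels
```

For example, `train_som(points, 2, 2)` on a list of 2-D tuples returns four centroids and one cluster index per point. A convergence test on centroid movement could replace the fixed step budget.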

Initialization − This step (line 1) can be implemented in multiple ways. One method is to select each element of a centroid randomly from the range of values observed in the data for that element.

While this method works, it is not necessarily the best one, especially for achieving rapid convergence. Another method is to randomly select the initial centroids from the available data points. This is very much like randomly choosing centroids for K-means.
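Both initialization methods can be sketched as follows. The function names `init_from_ranges` and `init_from_points` are made up for this example, and the data is assumed to be a list of equal-length numeric tuples.

```python
import random

def init_from_ranges(data, k, rng):
    """Draw each centroid component uniformly from that attribute's observed range."""
    lows = [min(col) for col in zip(*data)]
    highs = [max(col) for col in zip(*data)]
    return [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in range(k)]

def init_from_points(data, k, rng):
    """Pick k distinct data points as the initial centroids (requires k <= len(data))."""
    return [list(p) for p in rng.sample(data, k)]
```

The first method may place centroids in empty regions of the attribute space, while the second always starts from positions where data actually exists.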

Selection of an object − The first step in the loop (line 3) is the selection of the next object. This is mostly straightforward, but there are some difficulties. Because convergence can require many steps, each data object may be used several times, particularly if the number of objects is small. However, if the number of objects is large, then not every object needs to be used. It is also possible to increase the influence of certain groups of objects by increasing their frequency in the training set.
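One simple way to realize this selection step is to sample objects with replacement, optionally with per-object weights. The sketch below is one possible approach rather than a prescribed one. The name `object_stream` and the `weights` parameter are assumptions for this example.

```python
import random

def object_stream(data, steps, weights=None, seed=0):
    """Yield one object per training step, sampling with replacement."""
    rng = random.Random(seed)
    for _ in range(steps):
        # weights=None gives uniform sampling; larger weights are drawn more often.
        yield rng.choices(data, weights=weights, k=1)[0]
```

With a small data set and many steps, every object is likely to be drawn several times. Raising the weights of a group of objects has the same effect as increasing their frequency in the training set.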

Assignment − The determination of the closest centroid (line 4) is straightforward, although it requires the specification of a distance metric. The Euclidean distance metric is often used, as is the dot product metric. When using the dot product, the data vectors are typically normalized beforehand and the reference vectors are normalized at each step. In such cases, using the dot product metric is equivalent to using the cosine measure.
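The two metrics can be compared with a short sketch. The helper names are illustrative, and all vectors are assumed to be non-zero so that they can be normalized.

```python
import math

def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def nearest_euclidean(centroids, p):
    """Index of the centroid with the smallest Euclidean distance to p."""
    return min(range(len(centroids)), key=lambda k: math.dist(centroids[k], p))

def nearest_dot(centroids, p):
    """Index of the centroid with the largest dot product after normalization."""
    p = normalize(p)
    # On unit vectors the dot product equals the cosine of the angle between them.
    return max(range(len(centroids)),
               key=lambda k: sum(a * b for a, b in zip(normalize(centroids[k]), p)))
```

The two functions can disagree when vectors differ greatly in length. For the centroids `[10, 0]` and `[0, 1]` and the point `[2, 0.5]`, the Euclidean version selects the second centroid while the normalized dot product selects the first.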

Update − The update step (line 5) is the most involved. Let m1, ..., mk be the centroids. For time step t, let p(t) be the current object (point) and assume that the closest centroid to p(t) is mj. Then, for time t+1, the jth centroid is updated using the following equation, in which hj(t) determines how much effect the difference between p(t) and mj(t) has.

$$m_j(t + 1) = m_j(t) + h_j(t)\,(p(t) - m_j(t))$$
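A single application of this update can be checked by hand. In the sketch below, `update_centroid` is a hypothetical helper, and h is passed in as a plain number. How h is computed from the time step and the neighborhood is deliberately left out.

```python
def update_centroid(m, p, h):
    """Return m + h * (p - m), applied component by component."""
    return [mi + h * (pi - mi) for mi, pi in zip(m, p)]

# With h = 0.5 the centroid moves halfway toward the object.
print(update_centroid([0.0, 0.0], [1.0, 2.0], 0.5))  # prints [0.5, 1.0]
```

With h = 0 the centroid stays where it is, and with h = 1 it jumps onto the object, so values between 0 and 1 keep the updated centroid on the segment between the two.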

Termination − Deciding when the centroids are close enough to a stable set is an important issue. Ideally, iteration should continue until convergence occurs, that is, until the reference vectors either do not change or change very little. The rate of convergence depends on a number of factors, including the data and the learning rate parameter α(t).
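A stopping test of this kind might look like the following sketch. The tolerance `tol`, the step limit `max_steps`, and both function names are assumptions for this example, not values taken from the algorithm itself.

```python
import math

def has_converged(old_centroids, new_centroids, tol=1e-4):
    """True when no centroid moved farther than tol between two checks."""
    return all(math.dist(a, b) <= tol for a, b in zip(old_centroids, new_centroids))

def should_stop(old_centroids, new_centroids, step, max_steps=10000, tol=1e-4):
    """Stop on convergence or once the step threshold is reached."""
    return step >= max_steps or has_converged(old_centroids, new_centroids, tol)
```

Because a single object only moves a few centroids, it may be more informative to compare centroids across a full pass over the data than after every individual update.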

Updated on 14-Feb-2022 12:27:03