# What is model-based clustering?

Model-based clustering is a statistical approach to data clustering. The observed (multivariate) data are assumed to have been generated by a finite mixture of component models. Each component model is a probability distribution, usually a parametric multivariate distribution.

For instance, in a multivariate Gaussian mixture model, each component is a multivariate Gaussian distribution. The component responsible for generating a particular observation determines the cluster to which the observation belongs.
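The generative view described above can be made concrete with a small sketch. The function name `sample_gaussian_mixture` and the example weights, means, and covariances are illustrative assumptions, not part of any particular library; only NumPy is used.

```python
import numpy as np

def sample_gaussian_mixture(n, weights, means, covs, seed=0):
    """Draw n points from a Gaussian mixture; return (points, component labels)."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    # Pick the generating component for each observation.
    labels = rng.choice(len(weights), size=n, p=weights)
    dim = len(means[0])
    points = np.empty((n, dim))
    for k in range(len(weights)):
        idx = np.flatnonzero(labels == k)
        # Draw the observations of component k from its own Gaussian.
        points[idx] = rng.multivariate_normal(means[k], covs[k], size=len(idx))
    return points, labels

# Hypothetical two-component mixture in two dimensions.
X, z = sample_gaussian_mixture(
    500,
    weights=[0.3, 0.7],
    means=[[0.0, 0.0], [5.0, 5.0]],
    covs=[np.eye(2), np.eye(2)],
)
```

Here `z` holds the component that generated each row of `X`, which is exactly the cluster label that model-based clustering tries to recover when only `X` is observed.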

Model-based clustering attempts to optimize the fit between the given data and some mathematical model. It is based on the assumption that the data are generated by a mixture of underlying probability distributions.

The main types of model-based clustering are as follows −

Statistical approach − Expectation maximization (EM) is a popular iterative refinement algorithm. It can be viewed as an extension of k-means −

• It assigns each object to a cluster according to a weight, namely the probability that the object belongs to that cluster.

• New means are computed from these weighted assignments.

The basic idea is as follows −

• Start with an initial estimate of the parameter vector.

• Iteratively rescore the patterns (data points) against the mixture density produced by the parameter vector.

• Use the rescored patterns to update the parameter estimates.

• Patterns belong to the same cluster if their scores place them in the same component.

## Algorithm

• Initially, assign k cluster centers randomly.

• Iteratively refine the clusters using the following two steps −

Expectation step − Assign each data point Xi to cluster Ck with the following probability

$$\mathrm{P(X_{i}\in\:C_{k})\:=\:P(C_k\arrowvert\:X_i)\:=\:\frac{P(C_k)P(X_i\arrowvert\:C_k)}{P(X_i)}}$$
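As a minimal sketch of the expectation step, the code below applies Bayes' rule exactly as in the formula above, under the assumption that each cluster is a multivariate Gaussian. The function names `gaussian_pdf` and `e_step` and the example values are hypothetical.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Density of a multivariate Gaussian evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    # Mahalanobis term diff^T cov^{-1} diff for every row.
    maha = np.sum((diff @ np.linalg.inv(cov)) * diff, axis=1)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def e_step(X, priors, means, covs):
    """Return an (N, K) array whose entry [i, k] is P(C_k | X_i)."""
    # Numerator of Bayes' rule: P(C_k) * P(X_i | C_k).
    joint = np.column_stack([
        priors[k] * gaussian_pdf(X, means[k], covs[k])
        for k in range(len(priors))
    ])
    # Denominator P(X_i) is the sum of the numerators over all clusters.
    return joint / joint.sum(axis=1, keepdims=True)

# Three points and two hypothetical clusters centred at (0, 0) and (4, 4).
X = np.array([[0.0, 0.0], [4.0, 4.0], [2.0, 2.0]])
resp = e_step(
    X,
    priors=[0.5, 0.5],
    means=[np.zeros(2), np.full(2, 4.0)],
    covs=[np.eye(2), np.eye(2)],
)
```

Each row of `resp` sums to 1. The point halfway between the two centres receives equal probability for both clusters, which is the soft assignment that distinguishes EM from k-means.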

Maximization step − Use the probabilities from the expectation step to re-estimate the model parameters, for example the mean of each cluster

$$\mathrm{m_k\:=\:\frac{1}{N}\displaystyle\sum\limits_{i=1}^N \frac{X_{i}P(X_i\:\in\:C_k)}{\sum\limits_{j}P(X_i\:\in\:C_j)}}$$
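The two steps can be combined into a short loop. The sketch below is one possible implementation for Gaussian clusters, not a definitive one: the function `fit_gmm`, its parameters, and the optional `init_means` argument are assumptions made for illustration. It uses the standard EM update, in which each mean is the probability-weighted average of the points divided by the total weight assigned to that cluster, and it also re-estimates the cluster priors and covariances. It omits the log-space tricks a robust implementation would need.

```python
import numpy as np

def fit_gmm(X, k, n_iter=50, init_means=None, seed=0):
    """Fit a k-component Gaussian mixture to X (N x d) with plain EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if init_means is None:
        # Random initialisation: k distinct data points as cluster centers.
        means = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        means = np.array(init_means, dtype=float)
    # Every cluster starts with the covariance of the whole data set.
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    priors = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # Expectation step: resp[i, j] = P(C_j | X_i) via Bayes' rule.
        joint = np.empty((n, k))
        for j in range(k):
            diff = X - means[j]
            maha = np.sum((diff @ np.linalg.inv(covs[j])) * diff, axis=1)
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(covs[j]))
            joint[:, j] = priors[j] * np.exp(-0.5 * maha) / norm
        resp = joint / joint.sum(axis=1, keepdims=True)

        # Maximization step: re-estimate parameters from the weighted points.
        nk = resp.sum(axis=0)               # total weight of each cluster
        priors = nk / n
        means = (resp.T @ X) / nk[:, None]  # probability-weighted means
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j]
            covs[j] += 1e-6 * np.eye(d)     # keep the covariance invertible
    return priors, means, covs, resp

# Two synthetic blobs, used only to exercise the function.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(6.0, 1.0, size=(200, 2))])
priors, means, covs, resp = fit_gmm(X, k=2)
labels = resp.argmax(axis=1)  # hard labels from the soft assignments
```

Because the initial centers are random, different seeds can lead to different local optima; running the fit from several starting points and keeping the best result is a common precaution.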

Machine learning approach − Machine learning builds algorithms that process large volumes of data, learn from experience, and make predictions.

These algorithms improve as they are given more training data. The main objective of machine learning is to learn from data and to build models that humans can understand and use.

A well-known method of this type is COBWEB, an incremental conceptual clustering algorithm that produces a hierarchical clustering in the form of a classification tree. Each node represents a concept and holds a probabilistic description of that concept.
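To illustrate what a probabilistic description of a concept can look like, here is a minimal sketch of a classification-tree node in the spirit of conceptual clustering systems such as COBWEB. The class `ConceptNode` and its methods are hypothetical, and the sketch covers only the per-node statistics, not the tree-building procedure.

```python
from collections import Counter, defaultdict

class ConceptNode:
    """One node of a classification tree, summarised by attribute-value counts."""

    def __init__(self):
        self.count = 0                            # instances seen at this node
        self.value_counts = defaultdict(Counter)  # attribute -> value -> count
        self.children = []                        # more specific sub-concepts

    def add(self, instance):
        """Incrementally update the node with one instance (a dict)."""
        self.count += 1
        for attr, value in instance.items():
            self.value_counts[attr][value] += 1

    def probability(self, attr, value):
        """Estimate P(attr = value | concept) from the stored counts."""
        if self.count == 0:
            return 0.0
        return self.value_counts[attr][value] / self.count

# Hypothetical instances described by nominal attributes.
node = ConceptNode()
node.add({"color": "red", "shape": "round"})
node.add({"color": "red", "shape": "square"})
node.add({"color": "blue", "shape": "round"})
p_red = node.probability("color", "red")
```

Storing one table of counts per attribute treats the attributes as independent of each other, which is a simplifying assumption of this kind of representation.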

Limitations

• The assumption that the attributes are independent of each other is often too strong because correlation can exist.

• It is not well suited to clustering data in large databases, because the classification tree can become skewed and the probability distributions are expensive to store and update.

Neural network approach − The neural network approach represents each cluster as an exemplar, which acts as a prototype of the cluster. New objects are assigned to the cluster whose exemplar is most similar according to some distance measure.
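The assignment rule described above can be sketched in a few lines. The function names, the Euclidean distance, and the winner-take-all update with a fixed `learning_rate` are assumptions chosen for illustration; the text does not prescribe a specific distance measure or learning rule.

```python
import numpy as np

def nearest_prototype(x, prototypes):
    """Index of the prototype closest to x under Euclidean distance."""
    distances = np.linalg.norm(prototypes - x, axis=1)
    return int(np.argmin(distances))

def competitive_update(x, prototypes, learning_rate=0.1):
    """Move only the winning prototype a small step toward x (in place)."""
    winner = nearest_prototype(x, prototypes)
    prototypes[winner] += learning_rate * (x - prototypes[winner])
    return winner

# Two hypothetical prototypes; the new object is closer to the first one.
prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])
winner = competitive_update(np.array([1.0, 1.0]), prototypes)
```

After the call, the first prototype has moved a tenth of the way toward the new object while the second is unchanged, so repeated updates gradually pull each prototype toward the objects assigned to it.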