What is Expectation-Maximization?

The EM (Expectation-Maximization) algorithm is a popular iterative refinement algorithm that can be used for finding parameter estimates. It can be viewed as an extension of the k-means paradigm, which assigns an object to the cluster it is most similar to, based on the cluster mean.

EM assigns each object to a cluster according to a weight representing its probability of membership. In other words, there are no strict boundaries between clusters. Thus, new means are computed based on weighted measures.
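To make the contrast with k-means concrete, here is a minimal Python sketch for one-dimensional data. It assumes two clusters whose means are held fixed at 0 and 3, a shared variance of 1.0, and equal cluster priors. The helper names (`gaussian_pdf`, `hard_assign`, `soft_assign`, `weighted_means`) are hypothetical. A k-means style assignment gives each point a one-hot weight vector. An EM-style assignment gives every point a nonzero weight in every cluster, so each new mean is a weighted average over all points.

```python
import math


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of the 1-D normal distribution N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def hard_assign(x: float, means: list[float]) -> list[float]:
    """k-means style: weight 1 for the nearest mean, 0 for all others."""
    nearest = min(range(len(means)), key=lambda j: abs(x - means[j]))
    return [1.0 if j == nearest else 0.0 for j in range(len(means))]


def soft_assign(x: float, means: list[float], var: float = 1.0) -> list[float]:
    """EM style: weights proportional to each cluster's density at x.

    Assumes equal priors and a shared variance, so the priors cancel out.
    """
    dens = [gaussian_pdf(x, m, var) for m in means]
    total = sum(dens)
    return [d / total for d in dens]


def weighted_means(points: list[float], weights: list[list[float]]) -> list[float]:
    """New mean of each cluster: the weight-averaged position of all points."""
    k = len(weights[0])
    return [
        sum(w[j] * x for x, w in zip(points, weights)) / sum(w[j] for w in weights)
        for j in range(k)
    ]


points = [0.0, 1.0, 1.4, 4.0]
means = [0.0, 3.0]  # held fixed for this illustration

hard = [hard_assign(x, means) for x in points]
soft = [soft_assign(x, means) for x in points]

print(hard[2], [round(w, 3) for w in soft[2]])
print(weighted_means(points, hard), weighted_means(points, soft))
```

In this sketch the point at 1.4 lies slightly closer to the first mean. k-means therefore gives it entirely to that cluster, while the soft weights split it roughly 57/43. With hard assignment, the single point at 4.0 alone fixes the second mean at 4.0. With soft assignment, that mean is also pulled toward the borderline points.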

EM starts with an initial estimate, or “guess”, of the parameters of the mixture model (collectively referred to as the parameter vector). It iteratively rescores the objects against the mixture density produced by the parameter vector, and then uses the rescored objects to update the parameter estimates. Each object is assigned a probability that it would possess a certain set of attribute values given that it was a member of a given cluster. The algorithm is described as follows −

  • Make an initial guess of the parameter vector − This involves randomly selecting k objects to represent the cluster means or centers (as in k-means partitioning), as well as making guesses for the additional parameters.

  • Iteratively refine the parameters (or clusters) based on the following two steps −

  • (a) Expectation Step − Assign each object $x_i$ to cluster $C_k$ with the probability

    $$P(x_{i}\in C_{k})=p(C_{k}|x_{i})=\frac{p(C_{k})\,p(x_{i}|C_{k})}{p(x_{i})}$$

    where $p(x_i|C_k) = N(x_i;\, m_k, E_k)$ is the normal (i.e., Gaussian) density with mean $m_k$ and covariance $E_k$, evaluated at $x_i$, and $p(x_i) = \sum_j p(C_j)\,p(x_i|C_j)$ is the mixture density. In other words, this step computes the probability that object $x_i$ belongs to each of the clusters. These probabilities are the “expected” cluster memberships for object $x_i$.

  • (b) Maximization Step − Use the probability estimates from above to re-estimate (or refine) the model parameters. For example,

    $$m_{k}=\frac{\sum_{i=1}^{n}x_{i}\,P(x_{i}\in C_{k})}{\sum_{i=1}^{n}P(x_{i}\in C_{k})}$$

    that is, each new mean is the average of all objects, weighted by their probabilities of belonging to $C_k$.

This step is the “maximization” of the likelihood of the distributions given the data.
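The two steps above can be sketched in Python for one-dimensional data. This is a minimal illustration under stated assumptions, not AutoClass or any library's implementation. It assumes every cluster is a 1-D Gaussian. The names `gaussian_pdf` and `em_gmm_1d`, the optional `init_means` argument, the variance floor, the stopping tolerance, and the synthetic data are all hypothetical choices. Besides the means, the M-step also re-estimates each cluster's variance and prior $p(C_k)$, which the initialization step must also guess. Each mean is updated as the membership-weighted average of all objects.

```python
import math
import random


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of the 1-D normal distribution N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, k, init_means=None, max_iter=200, tol=1e-8, seed=0):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative sketch).

    Returns (means, variances, priors, loglik_history).
    """
    rng = random.Random(seed)
    n = len(data)

    # Initial guess: k randomly selected objects serve as cluster means
    # (as in k-means) unless init_means is given; every cluster starts with
    # the overall data variance and an equal prior p(C_k) = 1/k.
    means = list(init_means) if init_means is not None else rng.sample(data, k)
    mu = sum(data) / n
    overall_var = sum((x - mu) ** 2 for x in data) / n
    variances = [overall_var] * k
    priors = [1.0 / k] * k
    history = []

    for _ in range(max_iter):
        # E-step: P(x_i in C_k) = p(C_k) p(x_i | C_k) / p(x_i), where p(x_i)
        # is the mixture density summed over all clusters.
        resp = []
        loglik = 0.0
        for x in data:
            joint = [priors[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            px = sum(joint)
            loglik += math.log(px)
            resp.append([p / px for p in joint])
        history.append(loglik)  # log-likelihood of the current parameters

        # M-step: re-estimate each cluster from membership-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)  # expected number of members
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a zero variance
            priors[j] = nj / n

        # Stop once the log-likelihood has essentially stopped improving.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break

    return means, variances, priors, history


# Synthetic data: two well-separated Gaussian clusters centered at 0 and 6.
gen = random.Random(42)
data = [gen.gauss(0.0, 1.0) for _ in range(200)] + [gen.gauss(6.0, 1.0) for _ in range(200)]

# One object from each end of the data as starting means, for a reproducible demo.
means, variances, priors, history = em_gmm_1d(data, k=2, init_means=[data[0], data[-1]])
print("means:", sorted(round(m, 2) for m in means))
print("priors:", [round(p, 2) for p in priors])
print("iterations:", len(history))
```

The recorded log-likelihood should not decrease from one iteration to the next, which is EM's monotonic-improvement property. The final answer can still depend on the starting means. For example, if random selection happens to place both initial means inside the same cluster, EM may converge slowly or stop at a poor local optimum. The usage therefore passes explicit `init_means`. Omitting that argument falls back to the random selection described in the initialization step, controlled by `seed`.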

The EM algorithm is simple and easy to implement. In practice, it converges quickly but may not reach the global optimum. Convergence is guaranteed for certain forms of optimization functions. The computational complexity is linear in d (the number of input features), n (the number of objects), and t (the number of iterations).

Bayesian clustering methods focus on computing class-conditional probability densities. They are commonly used in the statistics community.

In industry, AutoClass is a popular Bayesian clustering method that uses a variant of the EM algorithm. The best clustering maximizes the ability to predict the attributes of an object given the correct cluster of the object. AutoClass can also estimate the number of clusters. It has been applied in several domains and was able to discover a new class of stars based on infrared astronomy data.