What Is Cluster Analysis?


Grouping similar things together is an essential human activity. Cluster analysis forms groups, or clusters, of similar records based on various measurements made on those records. The key design goal is to define the clusters in ways that are useful for the objective of the analysis. The technique has been applied in many areas, such as astronomy, archaeology, medicine, chemistry, education, psychology, linguistics, and sociology.

Cluster analysis is a branch of statistics that has been studied extensively for many years. Its main benefit is that interesting structures or clusters can be discovered directly from the data without relying on background knowledge, such as a concept hierarchy.

Clustering algorithms used in statistics, such as PAM (Partitioning Around Medoids) and CLARA (Clustering LARge Applications), are reported to be computationally inefficient. To address this efficiency concern, a newer algorithm called CLARANS (Clustering Large Applications based upon Randomized Search) was developed for cluster analysis.
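
To make the efficiency concern concrete, the following Python sketch contrasts two ways of searching for medoids (representative records). It is a simplified illustration under stated assumptions, not the published PAM or CLARANS procedures: `exhaustive_swap_search` tries every possible medoid swap in the spirit of PAM, while `randomized_swap_search` tries only randomly chosen swaps in the spirit of CLARANS. The function names, the `max_neighbors` parameter, and the sample points are all hypothetical.

```python
import random


def distance(a, b):
    """Euclidean distance between two records given as tuples of numbers."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids):
    """Sum of each point's distance to its nearest medoid (medoids are indices)."""
    return sum(min(distance(p, points[m]) for m in medoids) for p in points)


def exhaustive_swap_search(points, k, seed=0):
    """Try every (medoid, non-medoid) swap until no swap lowers the cost.

    Each pass evaluates about k * (n - k) swaps, and each evaluation
    scans all n points, which is what makes this style of search slow.
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return sorted(medoids), best


def randomized_swap_search(points, k, max_neighbors=20, seed=0):
    """Try random swaps; stop after max_neighbors consecutive failures."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    best = total_cost(points, medoids)
    failures = 0
    while failures < max_neighbors:
        i = rng.randrange(k)
        candidate = rng.randrange(len(points))
        if candidate in medoids:
            failures += 1  # counted as a failed try so the loop always ends
            continue
        trial = medoids[:i] + [candidate] + medoids[i + 1:]
        cost = total_cost(points, trial)
        if cost < best:
            medoids, best, failures = trial, cost, 0
        else:
            failures += 1
    return sorted(medoids), best


# Hypothetical data: two well-separated groups of three records each.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]

exhaustive_medoids, exhaustive_cost = exhaustive_swap_search(points, k=2)
random_medoids, random_cost = randomized_swap_search(points, k=2)
print(exhaustive_medoids, round(exhaustive_cost, 4))
print(random_medoids, round(random_cost, 4))
```

The randomized variant trades a guarantee of checking every swap for far fewer cost evaluations, so on large data it may settle for a slightly worse set of medoids in much less time.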

One well-known use of cluster analysis in marketing is market segmentation: users are segmented based on demographic and transaction history data, and marketing techniques are tailored to each segment.

Another use is market structure analysis: identifying groups of similar products according to competitive measures of similarity. In marketing and political forecasting, clustering of neighborhoods by U.S. postal zip code has been used widely to group neighborhoods by lifestyle.

In finance, cluster analysis can be used to build balanced portfolios. Given data on several investment opportunities, such as stocks, it can discover clusters based on financial performance variables such as return (daily, weekly, or monthly), volatility, and beta, as well as characteristics such as industry and market capitalization.

Another application of cluster analysis in finance is market analysis. For a given industry, the goal is to find groups of similar firms based on measures such as growth rate, profitability, industry size, product range, and presence in several international markets. These groups can then be analyzed to understand the market structure and to decide, for example, who is a competitor.

Cluster analysis can also be applied to large amounts of data. For example, Internet search engines use clustering methods to group the queries that users submit. The resulting clusters can then be used in developing search algorithms.

Generally, the basic data used for clustering is a table of measurements on several variables, where each column represents a variable and each row represents a record. The aim is to form groups so that similar records are in the same group. The number of clusters can be pre-specified or determined from the data.
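
As a rough illustration of this setup, the Python sketch below clusters a small table of records with a pre-specified number of clusters. It uses k-means, a common clustering method that this article does not otherwise cover, chosen here only because it is short to write. The `kmeans` function, its parameters, and the sample records are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two records (rows of the table)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(records, k, iterations=100, seed=0):
    """Group records into k clusters; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(records, k)  # start from k randomly chosen records
    labels = None
    for _ in range(iterations):
        # Assign each record to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(r, centers[c]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of the records assigned to it.
        for c in range(k):
            members = [r for r, label in zip(records, labels) if label == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical table: each row is a record, each column is a variable.
records = [
    (1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
    (8.0, 8.2), (7.9, 8.1), (8.3, 7.7),
]

labels, centers = kmeans(records, k=2)
print(labels)
print(centers)
```

Records that receive the same label belong to the same cluster. Here `k` is supplied by the caller, which corresponds to the pre-specified case; choosing the number of clusters from the data would need an additional criterion that this sketch leaves out.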

Updated on 14-Feb-2022 09:58:19