What is Clustering?


Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects. A cluster is a set of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. In many applications, a cluster of data objects can be treated collectively as one group. Cluster analysis is an essential human activity.

Cluster analysis is used to form groups, or clusters, of similar records based on several measurements made on these records. The key idea is to define the clusters in ways that are useful for the objective of the analysis. This approach has been used in many areas, such as astronomy, archaeology, medicine, chemistry, education, psychology, linguistics, and sociology.

One well-known use of cluster analysis in marketing is market segmentation: users are segmented based on demographic and transaction history data, and marketing techniques are tailored to each segment.

Another use is market structure analysis: identifying groups of similar products according to competitive measures of similarity. In marketing and political forecasting, clustering of neighborhoods using U.S. postal zip codes has been used successfully to group neighborhoods by lifestyle.

In finance, cluster analysis can be used to build balanced portfolios. Given data on several investment opportunities (e.g., stocks), one can find clusters based on financial performance variables such as return (daily, weekly, or monthly), volatility, and beta, and on other characteristics such as industry and market capitalization. Selecting securities from different clusters can help create a balanced portfolio.
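As a rough illustration of the last step, the sketch below picks one security from each cluster. The ticker names and cluster labels are made up, the labels are assumed to come from an earlier clustering step, and taking the alphabetically first name in each cluster is only a placeholder for a real selection rule.

```python
from collections import defaultdict

# Hypothetical tickers mapped to cluster labels; the labels are assumed
# to have been produced by an earlier clustering step.
cluster_of = {"AAA": 0, "BBB": 0, "CCC": 1, "DDD": 2, "EEE": 1}


def pick_one_per_cluster(cluster_of):
    """Return one security name from every cluster."""
    groups = defaultdict(list)
    for name, cluster in cluster_of.items():
        groups[cluster].append(name)
    # Placeholder rule: take the alphabetically first name in each cluster.
    return [sorted(names)[0] for _, names in sorted(groups.items())]


portfolio = pick_one_per_cluster(cluster_of)
print(portfolio)
```

In practice the selection within each cluster would rest on some financial criterion rather than on the name.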

Another application of cluster analysis in finance is market analysis. For a given industry, the interest is in finding groups of similar firms based on measures such as growth rate, profitability, industry size, product range, and presence in several international markets. These groups can then be analyzed to understand the market structure and to determine, for example, who is a competitor.

Cluster analysis can be applied to large amounts of data. For example, Internet search engines use clustering methods to cluster the queries that users submit. These clusters can then be used to develop search algorithms.

Generally, the basic data used for clustering is a table of measurements on several variables, where each column represents a variable and each row represents a record. The aim is to form groups of records so that similar records are in the same group. The number of clusters can be pre-specified or determined from the data.
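To make the table-of-records idea concrete, here is a minimal sketch of k-means, one common clustering method in which the number of clusters is pre-specified. The article does not name a particular algorithm, so the choice of k-means is an assumption. The data values are invented, and the function name and the use of the first k rows as starting centroids are illustrative choices.

```python
import math


def kmeans(rows, k, iterations=100):
    """Group rows (lists of numbers) into k clusters.

    Returns (labels, centroids), with one label per row.
    """
    # Assumption: start from the first k rows so the sketch is deterministic.
    centroids = [list(row) for row in rows[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(row, centroids[c]))
            for row in rows
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Each row is a record, each column a variable (made-up measurements).
table = [
    [1.0, 1.1],
    [5.0, 5.2],
    [1.2, 0.9],
    [5.1, 4.9],
    [0.9, 1.0],
    [4.8, 5.0],
]

labels, centroids = kmeans(table, k=2)
print(labels)
```

Records with the same label belong to the same group. Other methods, such as hierarchical clustering, can instead let the number of clusters be decided from the data.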

Published on 24-Nov-2021 06:30:02