What is Conceptual Clustering?

Conceptual clustering is a form of clustering in machine learning that, given a set of unlabeled objects, produces a classification scheme over the objects. Unlike conventional clustering, which primarily identifies groups of similar objects, conceptual clustering goes one step further by also discovering a characteristic description for each group, where each group represents a concept or class.

Therefore, conceptual clustering is a two-step process: clustering is performed first, followed by characterization. Thus, clustering quality is not solely a function of the individual objects; it also depends on how well the resulting concepts can be described. Most conceptual clustering techniques adopt a statistical approach that uses probability measurements to determine the concepts or clusters.

Probabilistic descriptions are generally used to define each derived concept. COBWEB is a famous and simple method of incremental conceptual clustering. Its input objects are defined by categorical attribute-value pairs. COBWEB makes a hierarchical clustering in the form of a classification tree.

A classification tree differs from a decision tree. Each node in a classification tree defines a concept and includes a probabilistic description of that concept, which summarizes the objects classified under the node. The probabilistic description contains the probability of the concept and conditional probabilities of the form $P(A_{i}=v_{ij}|C_{k})$, where $A_{i}=v_{ij}$ is an attribute-value pair (the $i$th attribute takes its $j$th possible value) and $C_{k}$ is the concept class.
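As a concrete illustration, here is a minimal Python sketch of how a node's probabilistic description could be computed from raw counts. The toy objects, attribute names, and the `concept_description` helper are all hypothetical, not part of COBWEB's actual code:

```python
from collections import Counter

# Hypothetical toy data: each object is a dict of categorical attribute-value pairs.
objects = [
    {"color": "red",  "shape": "round"},
    {"color": "red",  "shape": "square"},
    {"color": "blue", "shape": "round"},
]

def concept_description(members, total_objects):
    """Summarize a concept node: P(C_k) and P(A_i = v_ij | C_k) from raw counts."""
    p_concept = len(members) / total_objects
    counts = Counter((attr, val) for obj in members for attr, val in obj.items())
    cond_probs = {pair: c / len(members) for pair, c in counts.items()}
    return p_concept, cond_probs

# A concept covering the first two objects:
p_ck, cond = concept_description(objects[:2], len(objects))
# P(C_k) = 2/3, P(color=red | C_k) = 1.0, P(shape=round | C_k) = 0.5
```

In a real implementation the node would store the raw counts and update them incrementally as objects are added, rather than recomputing probabilities from scratch.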

COBWEB uses a heuristic evaluation measure known as category utility to guide the construction of the tree. Category Utility (CU) is defined as

$$CU=\frac{\sum_{k=1}^{n}P(C_{k})\left [\sum_{i}\sum_{j}P(A_{i}=v_{ij}|C_{k})^{2}-\sum_{i}\sum_{j}P(A_{i}=v_{ij})^{2}\right ]}{n}$$

where $n$ is the number of nodes, concepts, or “categories” forming a partition, $\{C_{1},C_{2},\ldots,C_{n}\}$, at the given level of the tree. In other words, category utility is the increase in the expected number of attribute values that can be correctly guessed given a partition (this expected number corresponds to the term $P(C_{k})\sum_{i}\sum_{j}P(A_{i}=v_{ij}|C_{k})^{2}$) over the expected number of correct guesses with no such knowledge (corresponding to the term $\sum_{i}\sum_{j}P(A_{i}=v_{ij})^{2}$). Although there is no room here to show the derivation, category utility rewards intraclass similarity and interclass dissimilarity, where −
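The formula above can be sketched directly in Python. The function below is an illustrative implementation of CU for a partition given as a list of clusters, where each object is a dict of categorical attribute-value pairs (the data and helper names are hypothetical):

```python
from collections import Counter

def category_utility(partition):
    """Category utility of a partition: a list of clusters, each a list of
    objects given as dicts of categorical attribute-value pairs."""
    all_objs = [obj for cluster in partition for obj in cluster]
    total = len(all_objs)
    n = len(partition)

    def sum_sq(objs):
        # sum_i sum_j P(A_i = v_ij)^2 estimated from counts within `objs`
        counts = Counter((a, v) for obj in objs for a, v in obj.items())
        return sum((c / len(objs)) ** 2 for c in counts.values())

    base = sum_sq(all_objs)  # expected correct guesses with no cluster knowledge
    gain = sum(len(c) / total * (sum_sq(c) - base) for c in partition)
    return gain / n

# A partition that separates colors perfectly scores higher than one that mixes them.
red  = [{"color": "red"},  {"color": "red"}]
blue = [{"color": "blue"}, {"color": "blue"}]
pure  = category_utility([red, blue])                        # 0.25
mixed = category_utility([[red[0], blue[0]], [red[1], blue[1]]])  # 0.0
```

The pure partition scores higher because within each cluster every attribute value becomes perfectly predictable, while the mixed partition offers no improvement over guessing from the overall distribution.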

Intraclass similarity − It is the probability $P(A_{i}=v_{ij}|C_{k})$. The higher this value is, the higher the proportion of class members that share this attribute-value pair and the more predictable the pair is of class members.

Interclass dissimilarity − It is the probability $P(C_{k}|A_{i}=v_{ij})$. The higher this value is, the fewer the objects in contrasting classes that share this attribute-value pair and the more predictive the pair is of the class.

COBWEB descends the tree along an appropriate path, updating counts along the way, in search of the “best host”, the node at which to classify the object. This decision is based on temporarily placing the object in each node and computing the category utility of the resulting partition. The placement that yields the highest category utility is chosen as the best host for the object.
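The host-selection step could be sketched as follows. This is a simplified illustration that only compares placements among existing children (real COBWEB also considers creating a new node, and merging or splitting nodes); the helper names and toy data are hypothetical:

```python
from collections import Counter

def category_utility(partition):
    """CU of a partition: clusters of dicts of categorical attribute-value pairs."""
    all_objs = [o for c in partition for o in c]
    def sum_sq(objs):
        counts = Counter((a, v) for o in objs for a, v in o.items())
        return sum((k / len(objs)) ** 2 for k in counts.values())
    base = sum_sq(all_objs)
    gain = sum(len(c) / len(all_objs) * (sum_sq(c) - base) for c in partition)
    return gain / len(partition)

def best_host(children, obj):
    """Temporarily place `obj` in each child cluster and keep the placement
    that yields the highest category utility."""
    scored = []
    for i in range(len(children)):
        trial = [c + [obj] if j == i else list(c) for j, c in enumerate(children)]
        scored.append((category_utility(trial), i))
    cu, idx = max(scored)
    return idx, cu

children = [[{"color": "red"}, {"color": "red"}], [{"color": "blue"}]]
idx, cu = best_host(children, {"color": "blue"})
# The blue object joins the blue cluster (index 1), since that placement
# keeps both clusters internally homogeneous.
```

An incremental implementation would keep running counts at each node so that each trial placement is an O(attributes) update rather than a full recomputation.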