Conceptual clustering is a form of clustering in machine learning that, given a set of unlabeled objects, produces a classification scheme over the objects. Unlike conventional clustering, which primarily identifies groups of similar objects, conceptual clustering goes one step further by also finding a characteristic description for each group, where each group represents a concept or class.

Therefore, conceptual clustering is a two-step process − clustering is performed first, followed by characterization. Thus, clustering quality is not solely a function of the individual objects. Most conceptual clustering techniques adopt a statistical approach that uses probability measurements to determine the concepts or clusters.

Probabilistic descriptions are generally used to represent each derived concept. COBWEB is a popular and simple method of incremental conceptual clustering. Its input objects are described by categorical attribute-value pairs. COBWEB creates a hierarchical clustering in the form of a classification tree.

A classification tree differs from a decision tree. Each node in a classification tree represents a concept and contains a probabilistic description of that concept, which summarizes the objects classified under the node. The probabilistic description contains the probability of the concept, $P(C_{k})$, and conditional probabilities of the form $P(A_{i}=v_{ij}|C_{k})$, where $A_{i}=v_{ij}$ is an attribute-value pair (the $i$-th attribute takes its $j$-th possible value) and $C_{k}$ is the concept class.
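As a rough illustration of what such a probabilistic description might hold, the sketch below computes $P(C_{k})$ and the conditional probabilities $P(A_{i}=v_{ij}|C_{k})$ for one node by simple counting. The function name `describe_concept`, the dictionary representation of objects, and the sample animal data are assumptions made for this example; they are not part of COBWEB itself.

```python
from collections import Counter, defaultdict

def describe_concept(members, total_objects):
    """Return (P(C_k), {attribute: {value: P(A_i = v_ij | C_k)}}) for one node.

    members: non-empty list of objects classified under the node,
             each a dict mapping attribute name -> categorical value.
    total_objects: number of objects at the same level of the tree.
    """
    p_concept = len(members) / total_objects

    # Count how often each value of each attribute occurs among the members.
    counts = defaultdict(Counter)
    for obj in members:
        for attr, value in obj.items():
            counts[attr][value] += 1

    # Turn the counts into conditional probabilities.
    conditionals = {
        attr: {value: c / len(members) for value, c in value_counts.items()}
        for attr, value_counts in counts.items()
    }
    return p_concept, conditionals

# Hypothetical node holding 3 of the 6 objects seen so far.
mammals = [
    {"cover": "fur", "legs": "four"},
    {"cover": "fur", "legs": "four"},
    {"cover": "fur", "legs": "two"},
]
p_concept, conditionals = describe_concept(mammals, total_objects=6)
print(p_concept)
print(conditionals)
```

Storing counts rather than probabilities would make incremental updates cheaper; probabilities are used here only to mirror the notation in the text.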

COBWEB uses a heuristic evaluation measure known as category utility to guide the construction of the tree. Category Utility (CU) is defined as

$$\frac{\sum_{k=1}^{n}P(C_{k})\left [\sum_{i}\sum_{j}P(A_{i}=v_{ij}|C_{k})^{2}-\sum_{i}\sum_{j}P(A_{i}=v_{ij})^{2}\right ]}{n}$$

where $n$ is the number of nodes, concepts, or “categories” forming a partition, $\{C_{1},C_{2},\ldots,C_{n}\}$, at the given level of the tree. In other words, category utility is the increase in the expected number of attribute values that can be correctly guessed given a partition (where this expected number corresponds to the term $P(C_{k})\sum_{i}\sum_{j}P(A_{i}=v_{ij}|C_{k})^{2}$) over the expected number of correct guesses with no such knowledge (corresponding to the term $\sum_{i}\sum_{j}P(A_{i}=v_{ij})^{2}$). Although the derivation is not shown here, category utility rewards intraclass similarity and interclass dissimilarity, where −

**Intraclass similarity** − It is the probability $P(A_{i}=v_{ij}|C_{k})$. The higher this value is, the higher the proportion of class members that share this attribute-value pair and the more predictable the pair is of class members.

**Interclass dissimilarity** − It is the probability $P(C_{k}|A_{i}=v_{ij})$. The higher this value is, the fewer the objects in contrasting classes that share this attribute-value pair and the more predictive the pair is of the class.
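The category utility formula and the two probabilities above can be sketched in a few lines of Python. This is a minimal illustration that estimates every probability by counting; the function names, the list-of-dicts representation of a partition, and the sample data are assumptions made for the example.

```python
from collections import Counter

def sum_sq_probs(objects):
    """Sum over attributes i and values j of P(A_i = v_ij)^2 within `objects`."""
    counts = Counter((a, v) for obj in objects for a, v in obj.items())
    return sum((c / len(objects)) ** 2 for c in counts.values())

def category_utility(partition):
    """CU of a partition given as a list of non-empty clusters (lists of dicts)."""
    everything = [obj for cluster in partition for obj in cluster]
    baseline = sum_sq_probs(everything)  # expected correct guesses with no partition
    gain = 0.0
    for cluster in partition:
        p_ck = len(cluster) / len(everything)  # P(C_k)
        gain += p_ck * (sum_sq_probs(cluster) - baseline)
    return gain / len(partition)  # divide by n, the number of categories

def intraclass_similarity(cluster, attr, value):
    """P(A_i = v_ij | C_k): share of the cluster's members with this pair."""
    return sum(1 for o in cluster if o.get(attr) == value) / len(cluster)

def interclass_dissimilarity(partition, k, attr, value):
    """P(C_k | A_i = v_ij): share of objects with this pair that lie in cluster k."""
    in_k = sum(1 for o in partition[k] if o.get(attr) == value)
    overall = sum(1 for c in partition for o in c if o.get(attr) == value)
    return in_k / overall if overall else 0.0

# Hypothetical data: two furry four-legged objects, two feathered two-legged ones.
a = {"cover": "fur", "legs": "four"}
b = {"cover": "feathers", "legs": "two"}
coherent = [[a, a], [b, b]]  # each cluster is internally uniform
mixed = [[a, b], [a, b]]     # each cluster mirrors the whole data set

print(category_utility(coherent))  # 0.5
print(category_utility(mixed))     # 0.0
```

In the coherent partition every attribute value is fully predictable inside its cluster, so the partition scores well; in the mixed partition knowing the cluster tells nothing beyond the overall frequencies, so the gain is zero.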

COBWEB descends the tree along an appropriate path, updating counts along the way, in search of the “best host” or node in which to classify the object. This decision is made by temporarily placing the object in each node and computing the category utility of the resulting partition. The placement that results in the highest category utility is taken as the best host for the object.
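The placement step described above can be sketched as follows. This is a simplified illustration for a single level of the tree, not a full COBWEB implementation: it only tries the new object in each existing child node and keeps the placement with the highest category utility. The names `best_host` and `category_utility`, and the sample data, are assumptions made for the example.

```python
from collections import Counter

def _sum_sq(objects):
    """Sum over attribute-value pairs of their squared relative frequency."""
    counts = Counter((a, v) for obj in objects for a, v in obj.items())
    return sum((c / len(objects)) ** 2 for c in counts.values())

def category_utility(partition):
    """Category utility of a partition given as a list of non-empty clusters."""
    everything = [o for cluster in partition for o in cluster]
    base = _sum_sq(everything)
    weighted = sum(
        len(c) / len(everything) * (_sum_sq(c) - base) for c in partition
    )
    return weighted / len(partition)

def best_host(children, new_obj):
    """Return (index, cu) of the child that scores highest with new_obj added.

    children: list of non-empty clusters (lists of attribute -> value dicts).
    The input lists are left unchanged; each placement is tried on a copy.
    """
    best_index, best_cu = None, float("-inf")
    for i in range(len(children)):
        # Temporarily place the object in child i only.
        trial = [c + [new_obj] if j == i else c for j, c in enumerate(children)]
        cu = category_utility(trial)
        if cu > best_cu:
            best_index, best_cu = i, cu
    return best_index, best_cu

# Hypothetical children of the current node and a new object to classify.
furry = [{"cover": "fur", "legs": "four"}, {"cover": "fur", "legs": "four"}]
feathered = [{"cover": "feathers", "legs": "two"}, {"cover": "feathers", "legs": "two"}]
index, cu = best_host([furry, feathered], {"cover": "fur", "legs": "four"})
print(index, cu)
```

Here the new object matches the first child on both attributes, so placing it there yields the higher category utility and index 0 is returned. Recomputing category utility from scratch for every trial keeps the sketch short; an incremental version would update the stored counts instead.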
