How does COBWEB work?


COBWEB incrementally incorporates objects into a classification tree. To add an object, COBWEB descends the tree along an appropriate path, updating counts along the way, in search of the "best host", that is, the node at which to classify the object.
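A minimal sketch of the count bookkeeping described above, assuming objects are Python dicts of nominal attribute values. The `Node` class, its `update_counts` and `prob` methods, and the `choose_child` callback are hypothetical names for illustration: `choose_child` stands in for the best-host test, and the sketch deliberately omits creating, merging, or splitting nodes.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """Hypothetical tree node: an object count plus (attribute, value) counts."""
    count: int = 0
    av_counts: Counter = field(default_factory=Counter)
    children: list[Node] = field(default_factory=list)

    def update_counts(self, obj: dict) -> None:
        # Record one more object and each of its attribute values.
        self.count += 1
        self.av_counts.update(obj.items())

    def prob(self, attr: str, value: str) -> float:
        # Estimated P(attr = value) among the objects seen at this node.
        return self.av_counts[(attr, value)] / self.count if self.count else 0.0


def descend(root: Node, obj: dict, choose_child: Callable[[Node, dict], Node]) -> list[Node]:
    """Walk from the root to a leaf, updating counts at every node on the path."""
    path = []
    node = root
    while True:
        node.update_counts(obj)
        path.append(node)
        if not node.children:
            return path
        node = choose_child(node, obj)  # stand-in for the best-host test


# Usage: a root with two empty children, always descending into the first one.
root = Node(children=[Node(), Node()])
path = descend(root, {"color": "red", "shape": "square"}, lambda n, o: n.children[0])
```

Because every node on the path keeps its own counts, the probabilities needed for category utility can be read directly from each node without revisiting earlier objects.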

This decision is based on temporarily placing the object in each node and computing the category utility of the resulting partition. The placement that yields the highest category utility should be a good host for the object.
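For nominal attributes, category utility is commonly written as below, where the partition has n classes C_1 to C_n and V_ij is the j-th value of attribute A_i. The first inner sum is the expected number of attribute values that can be guessed correctly when the class is known; the second is the same expectation without class information. Dividing by n keeps the measure from simply favoring partitions with many small classes.

$$
CU(C_1, \dots, C_n) = \frac{1}{n} \sum_{k=1}^{n} P(C_k) \left[ \sum_{i} \sum_{j} P(A_i = V_{ij} \mid C_k)^2 - \sum_{i} \sum_{j} P(A_i = V_{ij})^2 \right]
$$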

COBWEB also computes the category utility of the partition that would result if a new node were created for the object, and compares it with the placements in existing nodes. The object is then placed in an existing class, or a new class is created for it, depending on which partition has the highest category utility value. As a result, COBWEB can automatically adjust the number of classes in a partition; it does not need to rely on the user to provide such an input parameter.
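The best-host search and the new-class comparison can be sketched for a single level of the tree as follows. This assumes each child class is summarized only by its object count and a `Counter` of (attribute, value) pairs, and computes category utility with the formula for nominal attributes; the function names (`summarize`, `category_utility`, `add_object`, `choose_host`) are hypothetical.

```python
from collections import Counter


def summarize(objects):
    """Summarize a class as (number of objects, Counter of (attribute, value) pairs)."""
    counts = Counter()
    for obj in objects:
        counts.update(obj.items())
    return len(objects), counts


def expected_correct(counts, n):
    # Sum over attributes i and values j of P(A_i = V_ij)^2 for a group of n objects.
    return sum((c / n) ** 2 for c in counts.values())


def category_utility(clusters):
    """Category utility of a partition given as a list of (n, counts) summaries."""
    total = sum(n for n, _ in clusters)
    parent = Counter()
    for _, counts in clusters:
        parent.update(counts)
    base = expected_correct(parent, total)
    gain = sum((n / total) * (expected_correct(c, n) - base) for n, c in clusters)
    return gain / len(clusters)


def add_object(cluster, obj):
    # Return a new summary with obj added; the original summary is left unchanged.
    n, counts = cluster
    return n + 1, counts + Counter(obj.items())


def choose_host(children, obj):
    """Compare placing obj in each existing child with creating a new child."""
    best_cu, best_choice = float("-inf"), None
    for i in range(len(children)):
        trial = list(children)
        trial[i] = add_object(children[i], obj)  # temporary placement
        cu = category_utility(trial)
        if cu > best_cu:
            best_cu, best_choice = cu, ("existing", i)
    with_new = list(children) + [add_object((0, Counter()), obj)]
    cu_new = category_utility(with_new)
    if cu_new > best_cu:
        best_cu, best_choice = cu_new, ("new", None)
    return best_choice, best_cu


children = [
    summarize([{"color": "red", "shape": "square"}] * 2),
    summarize([{"color": "blue", "shape": "circle"}]),
]
print(choose_host(children, {"color": "red", "shape": "square"}))
print(choose_host(children, {"color": "green", "shape": "triangle"}))
```

With these sample classes, the red square is placed in the first existing class, while the green triangle, which matches neither class, scores highest as a new class. No number of classes is supplied anywhere; it falls out of the comparison.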

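Placing an object in an existing class or creating a new class for it is sensitive to the order in which objects arrive. COBWEB has two additional operators that help make it less sensitive to input order: merging and splitting. When an object is incorporated, the two best hosts are considered for merging into a single class.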

COBWEB also considers splitting the best host, that is, removing it and promoting its children to the level of the existing categories. These decisions are based on category utility. The merging and splitting operators allow COBWEB to perform a bidirectional search; for example, a merge can undo a previous split.
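A sketch of how the two operators can be scored against keeping the partition as it is, reusing count-based class summaries. It assumes the indices of the two best hosts (`best`, `second`) and the best host's own children are already known; `evaluate_operators` is a hypothetical helper, and a full implementation would also fold the incoming object into the merged node, which this sketch leaves out.

```python
from collections import Counter


def summarize(objects):
    # (number of objects, Counter of (attribute, value) pairs)
    counts = Counter()
    for obj in objects:
        counts.update(obj.items())
    return len(objects), counts


def expected_correct(counts, n):
    return sum((c / n) ** 2 for c in counts.values())


def category_utility(clusters):
    total = sum(n for n, _ in clusters)
    parent = Counter()
    for _, counts in clusters:
        parent.update(counts)
    base = expected_correct(parent, total)
    gain = sum((n / total) * (expected_correct(c, n) - base) for n, c in clusters)
    return gain / len(clusters)


def evaluate_operators(children, best, second, best_children=None):
    """Score keeping the partition, merging the two best hosts, or splitting the best host."""
    scores = {"keep": category_utility(children)}
    # Merge: replace the two best hosts by one class holding both sets of counts.
    merged = (children[best][0] + children[second][0],
              children[best][1] + children[second][1])
    others = [c for k, c in enumerate(children) if k not in (best, second)]
    scores["merge"] = category_utility(others + [merged])
    # Split: remove the best host and promote its children to this level.
    if best_children:
        others = [c for k, c in enumerate(children) if k != best]
        scores["split"] = category_utility(others + list(best_children))
    return scores


rs, bc = {"color": "red", "shape": "square"}, {"color": "blue", "shape": "circle"}
scores = evaluate_operators([summarize([rs]), summarize([rs]), summarize([bc])], 0, 1)
print(max(scores, key=scores.get), scores)
```

Here two classes that hold identical objects score higher once merged. Conversely, a class that mixes a red square and a blue circle scores higher when split into its two children, which is how a later split can undo an earlier merge.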

Limitations of COBWEB

The limitations of COBWEB are as follows −

It is based on the assumption that probability distributions on separate attributes are statistically independent of one another. This assumption does not always hold, because correlation between attributes often exists.
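The independence assumption shows up in what a node stores. In this small sketch (objects as dicts, hypothetical helper `attribute_value_counts`), two groups with opposite color-shape pairings produce identical per-attribute counts, so a node summary of this kind cannot tell them apart.

```python
from collections import Counter


def attribute_value_counts(objects):
    # Per-attribute value counts, the kind of summary a COBWEB node keeps.
    counts = Counter()
    for obj in objects:
        counts.update(obj.items())
    return counts


correlated = [{"color": "red", "shape": "square"}, {"color": "blue", "shape": "circle"}]
swapped = [{"color": "red", "shape": "circle"}, {"color": "blue", "shape": "square"}]

# Same counts per attribute, even though the color-shape pairing is reversed.
print(attribute_value_counts(correlated) == attribute_value_counts(swapped))
```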

Furthermore, the probability distribution representation of clusters makes it quite expensive to update and store the clusters. This is especially true when attributes have a large number of values, because the time and space complexities depend not only on the number of attributes but also on the number of values for each attribute.

Moreover, the classification tree is not height-balanced for skewed input data, which can cause the time and space complexity to degrade dramatically.

CLASSIT is an extension of COBWEB for incremental clustering of continuous (or real-valued) data. It stores a continuous normal distribution (i.e., mean and standard deviation) for each individual attribute in each node and uses a modified category utility measure that is an integral over continuous attributes instead of a sum over discrete attributes as in COBWEB.
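For a normal distribution, the squared-probability sum in category utility becomes the integral of the squared density, which equals 1/(2σ√π), so the continuous measure depends on 1/σ terms per attribute. A minimal sketch, assuming clusters are lists of real-valued tuples: `GaussianSummary` keeps a running mean and variance per attribute, and `classit_cu` scores a partition. These names are illustrative, and the floor parameter `acuity` (named after CLASSIT's minimum-standard-deviation parameter, with an arbitrary default of 1.0) is needed because a single-object class has σ = 0.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianSummary:
    """Running mean and variance of one attribute (Welford's online algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self, acuity: float) -> float:
        # Population standard deviation, floored at acuity so 1/std stays finite.
        var = self.m2 / self.n if self.n else 0.0
        return max(math.sqrt(var), acuity)


def summarize(points):
    # One GaussianSummary per attribute, updated incrementally point by point.
    stats = [GaussianSummary() for _ in points[0]]
    for p in points:
        for s, x in zip(stats, p):
            s.add(x)
    return stats


def classit_cu(clusters, acuity=1.0):
    """Continuous category utility of a partition; each cluster is a list of tuples."""
    all_points = [p for c in clusters for p in c]
    parent_term = sum(1.0 / s.std(acuity) for s in summarize(all_points))
    gain = 0.0
    for pts in clusters:
        child_term = sum(1.0 / s.std(acuity) for s in summarize(pts))
        gain += (len(pts) / len(all_points)) * (child_term - parent_term)
    return gain / (len(clusters) * 2 * math.sqrt(math.pi))


good = [[(0.0,), (0.1,)], [(10.0,), (10.1,)]]
mixed = [[(0.0,), (10.0,)], [(0.1,), (10.1,)]]
print(classit_cu(good), classit_cu(mixed))
```

The partition that groups nearby values scores higher than the one that mixes distant values, mirroring how the discrete measure rewards classes with predictable attribute values.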

Updated on 17-Feb-2022 10:58:38