What is the Categorization of Constraints in data mining?



Constraint-based algorithms use constraints to reduce the search space in the frequent itemset generation phase (the association rule generation step is identical to that of exhaustive algorithms).

The role of constraints is well defined: they ensure that only association rules interesting to users are generated. The method is quite straightforward. The rule space is reduced so that only rules satisfying the constraints remain.
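As a minimal sketch of how a constraint can shrink the search space during frequent itemset generation, the Python below pushes a hypothetical anti-monotone constraint (the total price of an itemset must not exceed a budget) into an Apriori-style level-wise search. The `prices` and `budget` inputs and the function name are assumptions for illustration only. Because every superset of an itemset that breaks an anti-monotone constraint also breaks it, such itemsets can be discarded before their support is counted.

```python
from itertools import combinations

def constrained_apriori(transactions, min_support, prices, budget):
    """Level-wise frequent itemset mining with a hypothetical anti-monotone
    constraint (sum of item prices <= budget) pushed into the search."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    def satisfies(itemset):
        # Anti-monotone: if an itemset violates this, all supersets do too.
        return sum(prices[i] for i in itemset) <= budget

    items = sorted({i for t in transactions for i in t})
    # Level 1: check the constraint first, so violating items are never counted.
    level = [frozenset([i]) for i in items
             if satisfies(frozenset([i])) and support(frozenset([i])) >= min_support]
    result = list(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: combine (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset must be frequent, then apply the constraint,
        # and only then pay for support counting.
        level = [c for c in candidates
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))
                 and satisfies(c)
                 and support(c) >= min_support]
        result.extend(level)
        k += 1
    return result
```

For example, with `prices = {"a": 1, "b": 2, "c": 5}` and `budget = 3`, item `c` is pruned at level 1 even if it is frequent, so none of its supersets are ever generated or counted.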

Constraints in cluster analysis can be categorized into three types, as follows −

Constraints on instances − A constraint on instances specifies how a pair or a set of instances must be grouped in the cluster analysis. There are two types of constraints in this category −

  • Must-link constraints − If a must-link constraint is defined on two objects x and y, then x and y must be grouped into the same cluster in the output of the cluster analysis. Must-link constraints are transitive: if must-link(x, y) and must-link(y, z), then must-link(x, z).

  • Cannot-link constraints − Cannot-link constraints are the opposite of must-link constraints. If a cannot-link constraint is defined on two objects x and y, then x and y must belong to different clusters in the output of the cluster analysis. Cannot-link constraints can be entailed: if cannot-link(x, y), must-link(x, x′), and must-link(y, y′), then cannot-link(x′, y′).
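The must-link transitivity and cannot-link entailment rules above can be sketched in Python with a union-find structure: must-link pairs are merged into groups (their transitive closure), and each cannot-link between two groups is expanded to every pair of their members. The function names are hypothetical, and objects are assumed to be hashable identifiers.

```python
def must_link_groups(objects, must_links):
    """Transitive closure of must-link constraints via union-find.
    Returns a dict mapping each object to its group representative."""
    parent = {o: o for o in objects}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in must_links:
        parent[find(x)] = find(y)  # merge the two groups
    return {o: find(o) for o in objects}

def entailed_cannot_links(objects, must_links, cannot_links):
    """Expand each cannot-link to all pairs across the two must-link groups."""
    root = must_link_groups(objects, must_links)
    members = {}
    for o in objects:
        members.setdefault(root[o], []).append(o)
    entailed = set()
    for x, y in cannot_links:
        if root[x] == root[y]:
            raise ValueError(f"inconsistent constraints: {x} and {y} are must-linked")
        for a in members[root[x]]:
            for b in members[root[y]]:
                entailed.add(frozenset((a, b)))
    return entailed

def satisfies_instance_constraints(labels, must_links, cannot_links):
    """True if a clustering (object -> cluster label) respects both constraint types."""
    return (all(labels[x] == labels[y] for x, y in must_links)
            and all(labels[x] != labels[y] for x, y in cannot_links))
```

With must-link(x, x2), must-link(y, y2), and cannot-link(x, y), the entailed set contains the pair (x2, y2) along with the other three cross-group pairs.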

Constraints on clusters − A constraint on clusters specifies a requirement on the clusters, possibly using attributes of the clusters. For instance, a constraint can specify the minimum number of objects in a cluster, the maximum diameter of a cluster, or the shape of a cluster (e.g., convex). The number of clusters specified for partitioning clustering methods can be regarded as a constraint on clusters.
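A minimal sketch of checking constraints on clusters, assuming points are numeric tuples and using Euclidean distance to define a cluster's diameter as its largest pairwise distance. The function and parameter names (`k`, `min_size`, `max_diameter`) are illustrative assumptions, not a standard API.

```python
import math
from itertools import combinations

def check_cluster_constraints(points, labels, k=None, min_size=1, max_diameter=math.inf):
    """Return True if a clustering meets the (hypothetical) cluster constraints."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    # Number of clusters treated as a constraint on clusters.
    if k is not None and len(clusters) != k:
        return False
    for members in clusters.values():
        if len(members) < min_size:
            return False
        # Diameter: largest pairwise Euclidean distance inside the cluster.
        diameter = max((math.dist(a, b) for a, b in combinations(members, 2)), default=0.0)
        if diameter > max_diameter:
            return False
    return True
```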

Constraints on similarity measurement − A similarity measure, such as Euclidean distance, is used to compute the similarity between objects in a cluster analysis. In some applications, exceptions apply. A constraint on similarity measurement specifies a requirement that the similarity computation must respect.

For instance, suppose people are clustered as moving objects in a plaza. Euclidean distance is used to give the walking distance between two points, but a constraint on similarity measurement is that the trajectory implementing the shortest distance cannot cross a wall.
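The plaza example can be sketched by modeling the plaza as a hypothetical grid in which `#` cells are walls and people move between adjacent cells. Breadth-first search then gives a walking distance that respects the wall, whereas Euclidean distance ignores it. The grid layout and the 4-connected movement are assumptions for illustration.

```python
import math
from collections import deque

def walking_distance(grid, start, goal):
    """Shortest 4-connected path length on a grid; '#' cells are walls.
    Returns math.inf if the goal cannot be reached."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), d = queue.popleft()
        if (r, c) == goal:
            return d
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            # Only step onto in-bounds, non-wall cells not visited yet.
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), d + 1))
    return math.inf

plaza = [
    "..#..",
    "..#..",
    "..#..",
    ".....",
]
print(math.dist((0, 1), (0, 3)))                # straight line through the wall
print(walking_distance(plaza, (0, 1), (0, 3)))  # detour around the wall
```

Here the two points are only 2.0 apart in Euclidean terms, but the shortest path that does not cross the wall is 8 steps long.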

Another approach to categorizing clustering constraints considers how strictly the constraints must be respected. A constraint is hard if a clustering that violates the constraint is unacceptable. A constraint is soft if a clustering that violates the constraint is not preferable but acceptable when no better solution can be found. Soft constraints are also known as preferences.
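One common way to combine hard and soft constraints, shown here as a hedged sketch rather than a specific published algorithm, is to treat any hard violation as infeasible and to add a penalty for each soft violation to the clustering objective. The within-cluster sum of squared distances, the `penalty` weight, and the choice of hard cannot-links and soft must-links are all assumptions for illustration; `x` and `y` in the constraint pairs are point indices.

```python
import math

def constrained_cost(points, labels, centers, hard_cannot_links, soft_must_links, penalty=10.0):
    """Cost of a clustering under hard and soft constraints (illustrative only).
    Hard violation -> infinite cost (unacceptable); soft violation -> finite penalty."""
    if any(labels[x] == labels[y] for x, y in hard_cannot_links):
        return math.inf  # a hard constraint is broken: reject this clustering
    # Within-cluster sum of squared Euclidean distances to the assigned centers.
    sse = sum(math.dist(p, centers[c]) ** 2 for p, c in zip(points, labels))
    # Soft constraints (preferences) only make the clustering less desirable.
    violations = sum(1 for x, y in soft_must_links if labels[x] != labels[y])
    return sse + penalty * violations

points = [(0, 0), (0, 2), (10, 0)]
centers = {0: (0, 1), 1: (10, 0)}
print(constrained_cost(points, [0, 0, 1], centers, [(0, 2)], [(1, 2)]))
```

A search procedure would then prefer the clustering with the lowest cost, so a soft violation is tolerated only when no lower-cost clustering exists, while a hard violation is never chosen.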
