# What are the methods for Clustering with Constraints?

Specific kinds of constraints call for specific techniques. The general principles for handling hard and soft constraints are as follows −

**Handling Hard Constraints** − A general method for handling hard constraints is to strictly respect the constraints in the cluster assignment process. Given a data set and a set of constraints on instances (i.e., must-link or cannot-link constraints), how can we extend the k-means method to satisfy such constraints? The COP-k-means algorithm works as follows −

**Generate super instances for must-link constraints** − Compute the transitive closure of the must-link constraints. Here, the must-link constraints are treated as an equivalence relation. The closure gives one or more subsets of objects, where all objects in a subset must be assigned to the same cluster.

To represent such a subset, replace all the objects in the subset by their mean. The resulting super instance also carries a weight, which is the number of objects it represents. After this step, the must-link constraints are always satisfied.
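The super-instance step can be sketched with a union-find structure. This is a minimal illustration, not a reference implementation: the function name `build_super_instances`, the tuple-based point format, and the returned `(mean, weight, members)` layout are all assumptions made for the example.

```python
from collections import defaultdict


def build_super_instances(points, must_links):
    """Merge must-linked points into weighted super instances.

    points: list of equal-length tuples of floats (assumed format)
    must_links: list of (i, j) index pairs into points
    Returns a list of (mean_point, weight, member_indices).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, shortening paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union the endpoints of every must-link pair. The resulting sets are
    # the equivalence classes of the transitive closure.
    for i, j in must_links:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    groups = defaultdict(list)
    for idx in range(len(points)):
        groups[find(idx)].append(idx)

    super_instances = []
    for members in groups.values():
        dim = len(points[members[0]])
        # The super instance is the mean of its members.
        mean = tuple(
            sum(points[m][d] for m in members) / len(members)
            for d in range(dim)
        )
        # The weight is the number of objects the super instance represents.
        super_instances.append((mean, len(members), members))
    return super_instances
```

Objects that appear in no must-link constraint end up as super instances of weight 1, so the later clustering step can treat every input uniformly.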

**Conduct modified k-means clustering** − In k-means, an object is assigned to the closest center. To respect cannot-link constraints, the center assignment process in k-means is changed to a closest feasible center assignment.

When the objects are assigned to centers in sequence, each step ensures that the assignments made so far do not violate any cannot-link constraint. An object is assigned to the closest center such that the assignment respects all cannot-link constraints.

Because COP-k-means ensures that no constraint is violated at any step, it does not need any backtracking. It is a greedy algorithm for generating a clustering that satisfies all constraints, provided that no conflicts exist among the constraints.
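A single closest-feasible-center assignment pass might look like the sketch below. It covers only the assignment step, not the center update or the outer k-means loop. The function name `assign_closest_feasible`, the index-pair format for cannot-link constraints, and the choice to return `None` when no feasible center exists are assumptions for illustration.

```python
import math


def assign_closest_feasible(points, centers, cannot_links):
    """One assignment pass: each point goes to its closest feasible center.

    Returns a list of center indices, or None if some point has no
    feasible center (the greedy pass gives up; it never backtracks).
    """
    # Map each point index to the indices it must not share a cluster with.
    conflicts = {i: set() for i in range(len(points))}
    for i, j in cannot_links:
        conflicts[i].add(j)
        conflicts[j].add(i)

    labels = [None] * len(points)
    for i, p in enumerate(points):
        # Try centers from nearest to farthest.
        order = sorted(range(len(centers)),
                       key=lambda k: math.dist(p, centers[k]))
        for k in order:
            # Center k is feasible if no conflicting point that has
            # already been assigned sits in cluster k.
            if all(labels[j] != k for j in conflicts[i]):
                labels[i] = k
                break
        if labels[i] is None:
            return None
    return labels
```

Because earlier assignments are never revisited, the result depends on the order in which the points are processed.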

**Handling Soft Constraints** − Clustering with soft constraints is an optimization problem. When a clustering violates a soft constraint, a penalty is imposed on the clustering. Hence, the optimization goal has two parts: optimizing the clustering quality and minimizing the constraint violation penalty. The overall objective function is a combination of the clustering quality score and the penalty score.

Given a data set and a set of soft constraints on instances, the CVQE (Constrained Vector Quantization Error) algorithm conducts k-means clustering while enforcing constraint violation penalties. The objective function used in CVQE is the sum of the distances used in k-means, adjusted by the constraint violation penalties, which are calculated as follows −

**Penalty of a must-link violation** − If there is a must-link constraint on objects x and y, but they are assigned to two different centers, c_{1} and c_{2}, respectively, then the constraint is violated. As a result, dist(c_{1}, c_{2}), the distance between c_{1} and c_{2}, is added to the objective function as the penalty.

**Penalty of a cannot-link violation** − If there is a cannot-link constraint on objects x and y, but they are assigned to a common center, c, then the constraint is violated. The distance, dist(c, c^{′}), between c and c^{′} is added to the objective function as the penalty, where c^{′} is the closest other center to c that can accommodate x or y.
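The two penalties can be combined with the k-means distance term as in the sketch below. It follows the description above rather than any particular published formulation, and it rests on several assumptions: the base term uses squared Euclidean distance, the penalties use plain Euclidean distance between centers, and the second center in the cannot-link penalty is simply taken to be the nearest other center. The function name `cvqe_style_objective` is hypothetical.

```python
import math


def cvqe_style_objective(points, centers, labels, must_links, cannot_links):
    """Distance term plus constraint-violation penalties (illustrative)."""
    # Base k-means term: squared distance from each point to its center
    # (assumption: squared Euclidean distance).
    total = sum(math.dist(p, centers[k]) ** 2
                for p, k in zip(points, labels))

    # Must-link violated: the two points sit in different clusters, so the
    # distance between their two centers is added.
    for i, j in must_links:
        if labels[i] != labels[j]:
            total += math.dist(centers[labels[i]], centers[labels[j]])

    # Cannot-link violated: the two points share a cluster c.
    for i, j in cannot_links:
        if labels[i] == labels[j]:
            c = labels[i]
            others = [k for k in range(len(centers)) if k != c]
            if others:
                # Assumption: the second center is the nearest other
                # center to c.
                total += min(math.dist(centers[c], centers[k])
                             for k in others)
    return total
```

For example, with points `(0, 0)`, `(1, 0)`, `(4, 0)`, centers `(0, 0)` and `(4, 0)`, and labels `[0, 0, 1]`, the base term is 1. A violated must-link between the second and third points adds the distance between the centers, 4, and a violated cannot-link between the first two points adds another 4.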
