What are the requirements of clustering in data mining?
The typical requirements of clustering in data mining are as follows −
Scalability − Many clustering algorithms work well on small data sets containing fewer than a few hundred data objects. A large database, however, can contain millions of objects. Clustering on only a sample of such a data set can lead to biased results, so highly scalable clustering algorithms are required.
Ability to deal with different types of attributes − Many algorithms are designed to cluster interval-based (numerical) data. However, applications can require clustering other types of data, including binary, categorical (nominal), and ordinal data, or a combination of these data types.
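One common way to compare records that mix attribute types is to average a per-attribute dissimilarity, in the spirit of Gower's coefficient. The sketch below is only an illustration under stated assumptions: the function name `mixed_dissimilarity`, the `kinds` labels, and the sample records are hypothetical and do not come from the article.

```python
def mixed_dissimilarity(a, b, kinds, ranges):
    """Average per-attribute dissimilarity in [0, 1] for two records.

    kinds  : one of "numeric", "binary", "nominal", "ordinal" per attribute
    ranges : for "numeric", the value range (max - min) of that attribute;
             for "ordinal", the ordered list of levels; otherwise ignored
    """
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "numeric":
            # Range-normalized absolute difference
            total += abs(x - y) / rng if rng else 0.0
        elif kind == "ordinal":
            # Map levels to ranks, then normalize like a numeric attribute
            span = len(rng) - 1
            total += abs(rng.index(x) - rng.index(y)) / span if span else 0.0
        else:
            # Binary and nominal attributes: simple mismatch (0 or 1)
            total += 0.0 if x == y else 1.0
    return total / len(kinds)


# Hypothetical records: (age, smoker, favourite colour, shirt size)
kinds = ("numeric", "binary", "nominal", "ordinal")
ranges = (40, None, None, ["small", "medium", "large"])
a = (30, True, "red", "small")
b = (50, True, "blue", "large")

d = mixed_dissimilarity(a, b, kinds, ranges)  # (0.5 + 0 + 1 + 1) / 4
```

A dissimilarity like this can be passed to any clustering method that accepts a precomputed distance matrix, which is one way an algorithm built for numerical data can be extended to mixed data.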
Discovery of clusters with arbitrary shape − Many clustering algorithms determine clusters based on Euclidean or Manhattan distance measures. Algorithms that rely on such distance measures tend to find spherical clusters of similar size and density. However, a cluster can be of any shape. It is essential to develop algorithms that can recognize clusters of arbitrary shape.
Minimal requirements for domain knowledge to determine input parameters − Many clustering algorithms require users to supply specific parameters for cluster analysis (such as the number of desired clusters). The clustering results can be quite sensitive to these input parameters. Parameters are often difficult to determine, especially for data sets containing high-dimensional objects. This not only burdens users, but also makes the quality of clustering difficult to control.
Ability to deal with noisy data − Most real-world databases contain outliers or missing, unknown, or erroneous data. Some clustering algorithms are sensitive to such data and may produce clusters of poor quality.
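To see why noise matters, consider how a single outlier moves a cluster center. The sketch below compares a mean-based center (as used by centroid methods such as k-means) with a coordinate-wise median. The data points and the `centroid` helper are made up for illustration.

```python
from statistics import mean, median


def centroid(points, reducer):
    """Reduce each coordinate of a list of points with the given function."""
    return tuple(reducer(coords) for coords in zip(*points))


# Four nearby points plus one erroneous record far away (hypothetical data)
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
noisy = cluster + [(100.0, 100.0)]

clean_center = centroid(cluster, mean)     # center without the outlier
mean_center = centroid(noisy, mean)        # pulled far toward the outlier
median_center = centroid(noisy, median)    # stays close to the real cluster
```

The mean-based center drifts well outside the original group, while the median-based center stays near it. This is only one of several ways to reduce sensitivity to outliers.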
Incremental clustering and insensitivity to the order of input records − Some clustering algorithms cannot incorporate newly inserted data (i.e., database updates) into existing clustering structures and, instead, must compute a new clustering from scratch.
Some clustering algorithms are also sensitive to the order of input records. Given the same set of data objects, such an algorithm can return dramatically different clusterings depending on the order in which the objects are presented. It is essential to develop incremental clustering algorithms and algorithms that are insensitive to the order of input.
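The trade-off between incremental updates and order sensitivity can be sketched with a simple single-pass "leader"-style scheme. It is chosen here only for illustration and is not taken from the article: each record joins the first cluster whose leader lies within a threshold, otherwise it starts a new cluster. The function name and the sample values are assumptions.

```python
def leader_clustering(values, threshold):
    """Single-pass clustering of 1-D values.

    Each value joins the first cluster whose leader (its first member)
    is within `threshold`; otherwise it becomes the leader of a new cluster.
    """
    leaders = []
    clusters = []
    for v in values:
        for i, leader in enumerate(leaders):
            if abs(v - leader) <= threshold:
                clusters[i].append(v)
                break
        else:
            # No existing leader is close enough: start a new cluster
            leaders.append(v)
            clusters.append([v])
    return clusters


# The same four records, presented in two different orders
order_a = leader_clustering([1.0, 2.0, 3.0, 4.0], threshold=1.5)
order_b = leader_clustering([2.0, 1.0, 3.0, 4.0], threshold=1.5)

print(order_a)  # [[1.0, 2.0], [3.0, 4.0]]
print(order_b)  # [[2.0, 1.0, 3.0], [4.0]]
```

Because it processes one record at a time, this scheme can absorb new records without starting over, yet the two runs show that the resulting clusters depend on the order of presentation.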
High dimensionality − A database or a data warehouse can contain many dimensions or attributes. Many clustering algorithms are good at handling low-dimensional data involving only two or three dimensions, and human eyes can judge the quality of a clustering only for up to three dimensions. Finding clusters of data objects in high-dimensional space is challenging, especially considering that such data can be sparse and highly skewed.
