What are the various Issues regarding Classification and Prediction in data mining?
The following pre-processing steps can be applied to the data to help improve the accuracy, efficiency, and scalability of the classification or prediction process −
Data cleaning − This refers to the pre-processing of data to remove or reduce noise (by applying smoothing methods) and to handle missing values (e.g., by replacing a missing value with the most commonly occurring value for that attribute, or with the most probable value based on statistics). Although most classification algorithms have some mechanism for handling noisy or missing data, this step can help reduce confusion during learning.
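As a minimal sketch of the missing-value strategy described above, the helper below (a hypothetical function, not part of any particular library) replaces missing entries in an attribute column with that attribute's most frequent observed value:

```python
from collections import Counter

def impute_with_mode(values):
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

# Hypothetical attribute column with two missing entries
colour = ["red", "blue", None, "red", None, "red"]
print(impute_with_mode(colour))  # ['red', 'blue', 'red', 'red', 'red', 'red']
```

A statistics-based alternative would replace the missing value with the most probable value given the other attributes, for example via a regression or Bayesian model.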
Relevance analysis − Some attributes in the data may be irrelevant to the classification or prediction task. For instance, data recording the day of the week on which a bank loan application was filed is unlikely to be relevant to the success of the application. Furthermore, other attributes may be redundant.
Therefore, relevance analysis can be performed on the data to remove any irrelevant or redundant attributes from the learning process. In machine learning, this step is known as feature selection. Including such attributes may otherwise slow down, and possibly mislead, the learning step.
Ideally, the time spent on relevance analysis, when added to the time spent on learning from the resulting "reduced" feature subset, should be less than the time that would have been spent on learning from the original set of features. Hence, such analysis can help improve classification efficiency and scalability.
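One common way to rank attributes for relevance, assuming categorical attributes and class labels, is information gain: the reduction in class-label entropy obtained by splitting on the attribute. The sketch below illustrates how an irrelevant attribute (like the day of the week above) scores near zero while a relevant one scores high:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(attribute, labels):
    """Entropy reduction in the class labels after splitting on the attribute."""
    n = len(labels)
    partitions = {}
    for a, y in zip(attribute, labels):
        partitions.setdefault(a, []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# Hypothetical toy data: loan outcome vs. two candidate attributes
outcome = ["yes", "yes", "no", "no"]
day     = ["mon", "tue", "mon", "tue"]    # irrelevant: gain is 0 bits
income  = ["high", "high", "low", "low"]  # relevant: gain is 1 bit
print(info_gain(day, outcome), info_gain(income, outcome))
```

Attributes whose gain falls below a chosen threshold would then be dropped before learning.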
Data transformation − The data can be generalized to higher-level concepts. Concept hierarchies can be used for this purpose. This is particularly useful for continuous-valued attributes. For instance, numeric values for the attribute income can be generalized to the discrete ranges low, medium, and high. Likewise, nominal-valued attributes, such as street, can be generalized to higher-level concepts, such as city.
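A generalization step like the income example can be sketched as simple threshold-based binning; the cut-off values below are hypothetical and would in practice come from the concept hierarchy:

```python
def generalize_income(value, low_cut=30000, high_cut=70000):
    """Map a numeric income to a discrete level using hypothetical cut-offs."""
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"

incomes = [12000, 45000, 98000]
print([generalize_income(v) for v in incomes])  # ['low', 'medium', 'high']
```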
Because generalization compresses the original training data, fewer input/output operations may be required during learning. The data may also be normalized, particularly when neural networks or methods involving distance measurements are used in the learning step.
Normalization involves scaling all values for a given attribute so that they fall within a small specified range, such as -1.0 to 1.0, or 0 to 1.0. In methods that use distance measurements, for instance, this can prevent attributes with initially large ranges (such as income) from outweighing attributes with initially smaller ranges.
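The scaling just described can be implemented as min-max normalization, a standard linear rescaling into a target range:

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they fall within [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

incomes = [20000, 50000, 80000]
print(min_max_normalize(incomes))            # [0.0, 0.5, 1.0]
print(min_max_normalize(incomes, -1.0, 1.0)) # [-1.0, 0.0, 1.0]
```

After this step, income contributes to a distance computation on the same scale as attributes that were already small-valued.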