What is the example of data generalization and analytical generalization?
Data generalization summarizes data by replacing relatively low-level values (such as numeric values for the attribute age) with higher-level concepts (such as young, middle-aged, and senior). It is therefore a process that abstracts a large set of task-relevant data in a database from a relatively low conceptual level to higher conceptual levels.
The following are two approaches for the efficient and flexible generalization of large data sets −
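As a minimal sketch of the age example above, the function below maps a numeric age to one of the three higher-level concepts. The cut-off values (30 and 60) and the name `generalize_age` are illustrative assumptions; the source does not define where one concept ends and the next begins.

```python
def generalize_age(age: int) -> str:
    """Replace a low-level numeric age with a higher-level concept.

    The boundaries 30 and 60 are assumed for illustration only.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"


ages = [23, 37, 45, 61, 19, 72]
generalized = [generalize_age(a) for a in ages]
print(generalized)
```

Six distinct numeric values collapse into three concepts, which is the abstraction from a lower to a higher conceptual level described above.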
OLAP approach − The data cube technology can be treated as a data warehouse-based, pre-computation-oriented, materialized view approach. It performs offline aggregation before an OLAP or data mining query is submitted for processing.
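The pre-computation idea can be sketched in a few lines. The example below is a toy illustration, not how a real OLAP engine is built: it aggregates a measure for every combination of dimensions ahead of time, so that a later query becomes a dictionary lookup. The function `build_cube` and the `sales` rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-compute the sum of `measure` for every subset of `dimensions`."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


# Hypothetical fact table.
sales = [
    {"city": "Delhi", "year": 2023, "amount": 100},
    {"city": "Delhi", "year": 2024, "amount": 150},
    {"city": "Mumbai", "year": 2023, "amount": 200},
]

# Offline step: materialize all aggregates once.
cube = build_cube(sales, ["city", "year"], "amount")

# Query step: answered by lookup, without rescanning the rows.
print(cube[("city",)][("Delhi",)])  # total for Delhi across all years
print(cube[()][()])                 # grand total
```

The trade-off is storage and build time against query latency: the number of materialized group-bys grows exponentially with the number of dimensions.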
Attribute-oriented induction approach − It is a relational database query-oriented, generalization-based, online data analysis approach. In attribute-oriented induction, the task-relevant data is first collected using a relational database query, and generalization is then performed based on an examination of the number of distinct values of each attribute in the relevant set of data.
The generalization is performed by either attribute removal or attribute generalization. Aggregation is performed by merging identical generalized tuples and accumulating their respective counts. This reduces the size of the generalized data set, and the resulting generalized relation can then be presented to users interactively.
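The steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not a full attribute-oriented induction implementation: the `students` relation, the concept hierarchies (`age_group`, `MAJOR_TO_FIELD`), and the distinct-value threshold of 2 are all hypothetical.

```python
from collections import Counter

# Hypothetical task-relevant relation (the result of a database query).
students = [
    {"name": "Asha", "age": 21, "major": "Physics"},
    {"name": "Ben", "age": 22, "major": "Chemistry"},
    {"name": "Chen", "age": 45, "major": "History"},
    {"name": "Dina", "age": 47, "major": "Physics"},
    {"name": "Eli", "age": 21, "major": "History"},
]


def age_group(age):
    # Assumed cut-offs, for illustration only.
    if age < 30:
        return "young"
    return "middle-aged" if age < 60 else "senior"


MAJOR_TO_FIELD = {"Physics": "Science", "Chemistry": "Science", "History": "Arts"}

# Generalization operators available per attribute; "name" has none.
generalizers = {"age": age_group, "major": MAJOR_TO_FIELD.get}


def attribute_oriented_induction(rows, generalizers, threshold):
    """Return the kept attributes and the merged generalized tuples with counts."""
    plan = {}  # attribute -> function applied to each value
    for attr in rows[0]:
        distinct = {row[attr] for row in rows}
        if len(distinct) <= threshold:
            plan[attr] = lambda v: v          # few values: keep as is
        elif attr in generalizers:
            plan[attr] = generalizers[attr]   # attribute generalization
        # otherwise: attribute removal (many values, no operator)
    kept = list(plan)
    # Merge identical generalized tuples and accumulate their counts.
    counts = Counter(tuple(plan[a](row[a]) for a in kept) for row in rows)
    return kept, counts


kept, counts = attribute_oriented_induction(students, generalizers, threshold=2)
print(kept)
for generalized_tuple, count in counts.items():
    print(generalized_tuple, count)
```

Here `name` is removed because it has many distinct values and no generalization operator, while `age` and `major` are generalized. Five original tuples shrink to four generalized tuples, each carrying a count.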
Basic principles of the attribute-oriented induction approach −
- Data focusing − The data must be task-relevant, including the dimensions; the result is the initial relation.
- Attribute removal − Remove attribute A if there is a large set of distinct values for A but either there is no generalization operator on A, or A's higher-level concepts are expressed in terms of other attributes.
- Attribute generalization − If there is a large set of distinct values for A, and there exists a set of generalization operators on A, then select an operator and generalize A.
- Analytical characterization − It is a statistical approach for preprocessing data to filter out irrelevant attributes or rank the relevant ones. Measures of attribute relevance can be used to identify irrelevant attributes that can be excluded from the concept description process. The inclusion of this preprocessing step in class characterization or comparison is referred to as analytical characterization.
Reasons for attribute relevance analysis
The reasons for attribute relevance analysis are as follows −
It can determine which dimensions should be included.
It can determine how high a level of generalization should be reached.
It can reduce the number of attributes, which helps us understand patterns more easily.
The basic idea behind attribute relevance analysis is to compute some measure that quantifies the relevance of an attribute with respect to a given class or concept. Such measures include information gain, uncertainty, and the correlation coefficient.
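As one example of such a measure, the sketch below computes information gain (the reduction in class entropy after partitioning on an attribute). The toy rows and the attribute names `major`, `gender`, and `status` are hypothetical, chosen so that one attribute is fully relevant to the class and the other is not.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, class_attr):
    """Entropy of the class minus the weighted entropy after splitting on `attribute`."""
    base = entropy([r[class_attr] for r in rows])
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[class_attr])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder


# Hypothetical data: "status" is the class to be described.
rows = [
    {"major": "Science", "gender": "F", "status": "graduate"},
    {"major": "Science", "gender": "M", "status": "graduate"},
    {"major": "Arts", "gender": "F", "status": "undergraduate"},
    {"major": "Arts", "gender": "M", "status": "undergraduate"},
]

for attr in ("major", "gender"):
    print(attr, information_gain(rows, attr, "status"))
```

In this toy data `major` separates the classes perfectly and gets the maximum gain of 1 bit, while `gender` gets a gain of 0, so it would be a candidate for exclusion before characterization.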