What are the methods for Data Generalization and Concept Description?
Data generalization summarizes data by replacing relatively low-level values (such as numeric values for an attribute age) with higher-level concepts (such as young, middle-aged, and senior). Given the large amount of data stored in databases, it is useful to be able to describe concepts in concise and succinct terms at generalized (rather than low) levels of abstraction.
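As a minimal sketch of this idea, the snippet below maps numeric ages onto the three higher-level concepts named above. The function name `generalize_age` and the cut-off ages (40 and 60) are assumptions chosen for illustration; the text does not specify where one concept ends and the next begins.

```python
def generalize_age(age: int) -> str:
    """Replace a low-level numeric age with a higher-level concept."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # Hypothetical boundaries: below 40 is "young", 40 to 59 is "middle-aged".
    if age < 40:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"


ages = [23, 35, 47, 52, 61, 78]
generalized = [generalize_age(a) for a in ages]
print(generalized)
# ['young', 'young', 'middle-aged', 'middle-aged', 'senior', 'senior']
```

Six distinct values collapse into three concepts, which is what makes the generalized data easier to summarize.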
Allowing data sets to be generalized at multiple levels of abstraction helps users examine the general behavior of the data. Given the AllElectronics database, for instance, rather than examining individual customer transactions, sales managers may prefer to view the data generalized to higher levels, such as summarized by customer groups according to geographic region, frequency of purchases per group, and customer income. This leads us to the notion of concept description, which is a form of data generalization.
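The sales-manager view described above might look roughly like the following sketch, which rolls individual transactions up to one summary row per geographic region. The records, field layout, and the function name `summarize_by_region` are all hypothetical; they are not taken from the AllElectronics database.

```python
from collections import defaultdict

# Hypothetical transaction records: (customer_id, region, amount).
transactions = [
    ("c1", "North", 120.0),
    ("c2", "North", 80.0),
    ("c1", "North", 40.0),
    ("c3", "South", 200.0),
    ("c4", "South", 50.0),
]


def summarize_by_region(rows):
    """Generalize single transactions into per-region summaries."""
    groups = defaultdict(lambda: {"purchases": 0, "total": 0.0, "customers": set()})
    for customer, region, amount in rows:
        group = groups[region]
        group["purchases"] += 1          # frequency of purchases per group
        group["total"] += amount         # total amount spent in the region
        group["customers"].add(customer) # distinct customers seen so far
    # Replace each customer set with its size so the summary is plain data.
    return {
        region: {
            "purchases": g["purchases"],
            "total": g["total"],
            "customers": len(g["customers"]),
        }
        for region, g in groups.items()
    }


region_summary = summarize_by_region(transactions)
for region, stats in region_summary.items():
    print(region, stats)
```

Five transaction rows become two summary rows, one per region, which is the level at which a manager would typically read the data.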
A concept generally refers to a set of data, such as frequent buyers or graduate students. As a data mining task, concept description is not a simple enumeration of the data. Instead, concept description generates descriptions for the characterization and comparison of the data. It is also known as class description when the concept to be described refers to a class of objects.
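To make the two kinds of description concrete, here is a small sketch: characterization summarizes one class on its own, while comparison places the summaries of two classes side by side. The student records, the age groups, and the function names `characterize` and `compare` are assumptions made for this example only.

```python
from collections import Counter

# Hypothetical records, already generalized: (student_type, age_group).
students = [
    ("graduate", "25-30"),
    ("graduate", "25-30"),
    ("graduate", "over 30"),
    ("undergraduate", "under 25"),
    ("undergraduate", "under 25"),
    ("undergraduate", "25-30"),
]


def characterize(rows, target):
    """Summarize one class as the share of each age group within it."""
    counts = Counter(age for kind, age in rows if kind == target)
    total = sum(counts.values())
    return {age: n / total for age, n in counts.items()}


def compare(rows, target, contrast):
    """Pair the two class summaries, one entry per age group."""
    a = characterize(rows, target)
    b = characterize(rows, contrast)
    return {
        age: (a.get(age, 0.0), b.get(age, 0.0))
        for age in sorted(set(a) | set(b))
    }


print(characterize(students, "graduate"))
print(compare(students, "graduate", "undergraduate"))
```

Neither result lists the individual students; both describe the class through aggregated, generalized values, which is the sense in which concept description differs from enumeration.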
Characterization provides a concise and succinct summarization of the given set of data, while concept or class comparison (also referred to as discrimination) provides descriptions comparing two or more sets of data. The following issues need to be considered −
Complex data types and aggregation − Data warehouses and OLAP tools are based on a multidimensional data model that views data in the form of a data cube, consisting of dimensions (or attributes) and measures (aggregate functions).
However, several current OLAP systems confine dimensions to non-numeric data and measures to numeric data, whereas a database can include attributes of various data types, such as numeric, non-numeric, spatial, text, or image, all of which should be covered by concept description.
User-control versus automation − On-line analytical processing in data warehouses is a user-controlled process. The selection of dimensions and the application of OLAP operations, such as drill-down, roll-up, slicing, and dicing, are generally directed and controlled by the users.
Although the controls in several OLAP systems are user-friendly, users do need a good understanding of the importance of each dimension. Moreover, to find a satisfactory description of the data, users may be required to specify a long series of OLAP operations.
It is desirable to have a more automated process that helps users decide which dimensions (or attributes) should be included in the analysis, and the degree to which the given data set should be generalized in order to produce an interesting summarization of the data.
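The user-directed operations mentioned above can be sketched on a toy data cube. In this illustration the cube is a plain dictionary keyed by three dimensions, with units sold as the single numeric measure; the data and the helper names `roll_up` and `slice_cube` are hypothetical and do not correspond to the API of any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical cube: each cell is keyed by (region, quarter, item)
# and stores one numeric measure, the number of units sold.
DIMENSIONS = ("region", "quarter", "item")
cube = {
    ("North", "Q1", "TV"): 10,
    ("North", "Q1", "phone"): 25,
    ("North", "Q2", "TV"): 12,
    ("South", "Q1", "TV"): 7,
    ("South", "Q2", "phone"): 30,
}


def roll_up(cells, keep):
    """Sum the measure over every dimension that is not listed in keep."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in positions)] += value
    return dict(totals)


def slice_cube(cells, dimension, value):
    """Keep only the cells where one dimension equals the given value."""
    i = DIMENSIONS.index(dimension)
    return {key: v for key, v in cells.items() if key[i] == value}


by_region = roll_up(cube, ["region"])
print(by_region)  # {('North',): 47, ('South',): 37}

q1_by_item = roll_up(slice_cube(cube, "quarter", "Q1"), ["item"])
print(q1_by_item)  # {('TV',): 17, ('phone',): 25}
```

Every call here encodes a decision the user has to make: which dimensions to keep and which value to slice on. A more automated process would aim to suggest those choices rather than leave the whole sequence to the user.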