What are the techniques of Discretization and Concept Hierarchy Generation for Categorical Data?

Categorical data are discrete data. Categorical attributes have a finite number of distinct values, with no ordering among the values. Examples include geographic location, job category, and item type. There are several methods for generating concept hierarchies for categorical data, as follows −

  • Specification of a partial ordering of attributes explicitly at the schema level by users or experts − Concept hierarchies for categorical attributes or dimensions generally involve a group of attributes. A user or expert can easily define a concept hierarchy by specifying a partial or total ordering of the attributes at the schema level.

For instance, a relational database or a location dimension of a data warehouse can include the following group of attributes: street, city, province or state, and country. A hierarchy can be defined by specifying the total ordering among these attributes at the schema level, such as street < city < province or state < country.
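The schema-level ordering above can be sketched in a few lines of Python. This is only an illustration: the list name `LOCATION_HIERARCHY`, the helper `roll_up`, and the sample record are assumptions made for the example, not part of any particular database system.

```python
# Hypothetical schema-level hierarchy, ordered from lowest to highest level:
# street < city < province_or_state < country
LOCATION_HIERARCHY = ["street", "city", "province_or_state", "country"]


def roll_up(record, level, hierarchy=LOCATION_HIERARCHY):
    """Keep only the attributes at `level` and above in the hierarchy."""
    start = hierarchy.index(level)
    return {attr: record[attr] for attr in hierarchy[start:]}


# Illustrative record (made-up data).
record = {
    "street": "12 Main St",
    "city": "Vancouver",
    "province_or_state": "British Columbia",
    "country": "Canada",
}

city_view = roll_up(record, "city")
print(city_view)
```

Because the ordering is stored once at the schema level, generalizing a record to any level only requires dropping the attributes below that level.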

  • Specification of a portion of a hierarchy by explicit data grouping − This is the manual definition of a portion of a concept hierarchy. In a large database, it is unrealistic to define a whole concept hierarchy by explicit value enumeration. It is, however, easy to specify explicit groupings for a small portion of intermediate-level data.
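As a rough sketch of explicit data grouping, a user might list a few intermediate-level groups by hand and leave the rest of the data ungrouped. The group names and the helper `generalize` below are hypothetical and chosen only for illustration.

```python
# Hypothetical, partial grouping supplied by a user.
# Only some provinces are covered; the rest are left ungrouped.
REGION_GROUPS = {
    "Prairies_Canada": {"Alberta", "Saskatchewan", "Manitoba"},
    "Maritimes_Canada": {"New Brunswick", "Nova Scotia", "Prince Edward Island"},
}

# Invert the groups into a value -> parent lookup table.
PARENT_OF = {
    province: region
    for region, members in REGION_GROUPS.items()
    for province in members
}


def generalize(province):
    """Return the intermediate-level group, or None if no group was defined."""
    return PARENT_OF.get(province)


print(generalize("Alberta"))
print(generalize("Ontario"))
```

Values without a user-defined group simply have no intermediate level here, which reflects the fact that only a portion of the hierarchy is specified manually.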

  • Specification of a set of attributes, but not of their partial ordering − A user can specify a set of attributes forming a concept hierarchy, but omit an explicit statement of their partial ordering. The system can then try to generate the attribute ordering automatically to construct a meaningful concept hierarchy.

The automatic ordering relies on an observation: an attribute that defines a high concept level (such as country) usually has fewer distinct values than an attribute that defines a lower level (such as street). Based on this observation, a concept hierarchy can be generated automatically from the number of distinct values per attribute in the given attribute set. The attribute with the most distinct values is placed at the lowest level of the hierarchy. The fewer distinct values an attribute has, the higher it is in the generated concept hierarchy. This heuristic rule works well in some cases. Users or experts can apply local-level swapping or adjustments, when necessary, after examining the generated hierarchy.
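The distinct-value heuristic can be sketched as follows. This is a minimal illustration, assuming the data is available as a list of dictionaries; the function name `order_by_distinct_values` and the sample rows are made up for the example.

```python
def order_by_distinct_values(rows, attributes):
    """Order attributes from lowest level (most distinct values) to highest."""
    counts = {attr: len({row[attr] for row in rows}) for attr in attributes}
    ordered = sorted(attributes, key=lambda attr: counts[attr], reverse=True)
    return ordered, counts


# Illustrative rows (made-up data).
rows = [
    {"street": "12 Main St", "city": "Vancouver", "province_or_state": "British Columbia", "country": "Canada"},
    {"street": "88 Oak Ave", "city": "Vancouver", "province_or_state": "British Columbia", "country": "Canada"},
    {"street": "5 Pine Rd", "city": "Victoria", "province_or_state": "British Columbia", "country": "Canada"},
    {"street": "40 Elm St", "city": "Calgary", "province_or_state": "Alberta", "country": "Canada"},
    {"street": "7 Lake Dr", "city": "Edmonton", "province_or_state": "Alberta", "country": "Canada"},
]

# The attribute set is given in no particular order.
ordered, counts = order_by_distinct_values(
    rows, ["country", "street", "province_or_state", "city"]
)
print(" < ".join(ordered))  # street < city < province_or_state < country
print(counts)
```

The result is only a proposal. If two attributes have similar counts, or the data does not follow the usual pattern, a user would still need to review and adjust the generated order.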

  • Specification of only a partial set of attributes − The user might have only a vague idea of what should be included in the hierarchy. For example, for a name the user may specify only first name and last name and leave out middle name. Such a partially specified hierarchy is handled by embedding data semantics in the database schema, so that attributes with tight semantic connections are pinned together.
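One way to picture "pinning" is a piece of schema metadata that lists attributes belonging together, so that naming any of them pulls in the whole group. The sketch below assumes such metadata exists; `PINNED_GROUPS` and `complete_hierarchy` are hypothetical names, not features of a specific database product.

```python
# Hypothetical schema metadata: each list is a group of attributes
# that are pinned together because of their tight semantic connection.
PINNED_GROUPS = [
    ["street", "city", "province_or_state", "country"],
    ["first_name", "middle_name", "last_name"],
]


def complete_hierarchy(partial):
    """Expand a partial attribute set to every pinned group it touches."""
    requested = set(partial)
    completed = []
    for group in PINNED_GROUPS:
        if requested & set(group):
            completed.extend(group)
    # Keep requested attributes that no pinned group knows about.
    completed.extend(attr for attr in partial if attr not in completed)
    return completed


print(complete_hierarchy(["first_name", "last_name"]))
print(complete_hierarchy(["city"]))
```

Here a request that names only first and last name is completed with middle name, and a request that names only city is completed with the rest of the location attributes.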