What are the rules of Attribute Generalization?

Attribute generalization is based on the following rule: if there is a large set of distinct values for an attribute in the initial working relation, and there exists a set of generalization operators on the attribute, then a generalization operator should be selected and applied to the attribute.

This rule is based on the following reasoning. Using a generalization operator to generalize an attribute value within a tuple, or rule, in the working relation makes the rule cover more of the original data tuples, thereby generalizing the concept it represents. This corresponds to the generalization rule known as climbing generalization trees in learning from examples, or concept tree ascension.
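Concept tree ascension can be illustrated with a minimal Python sketch. The city-to-country hierarchy, the sample tuples, and the `climb` helper are all hypothetical and chosen only for illustration. Each tuple carries a count, so that tuples which become identical after generalization are merged and the generalized tuple records how many original tuples it covers.

```python
from collections import Counter

# Hypothetical one-level concept hierarchy (child -> parent).
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Seattle": "USA",
    "Chicago": "USA",
}

def climb(relation, attr_index, hierarchy):
    """Replace one attribute by its parent concept in every tuple and
    merge tuples that become identical, adding up their counts."""
    merged = Counter()
    for row, count in relation.items():
        generalized = list(row)
        # Values with no parent in the hierarchy are left unchanged.
        generalized[attr_index] = hierarchy.get(row[attr_index], row[attr_index])
        merged[tuple(generalized)] += count
    return merged

# Working relation: (city, major) -> number of original tuples.
working_relation = Counter({
    ("Vancouver", "Science"): 1,
    ("Toronto", "Science"): 1,
    ("Seattle", "Science"): 1,
    ("Chicago", "Arts"): 1,
})

generalized_relation = climb(working_relation, 0, CITY_TO_COUNTRY)

# Four tuples collapse to three; ("Canada", "Science") now covers two of them.
for row, count in generalized_relation.items():
    print(row, count)
```

After the climb, the single generalized tuple `("Canada", "Science")` covers both the Vancouver and the Toronto tuples, which is what it means for a generalized rule to cover more of the original data.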

Depending on the attributes or application involved, a user may prefer some attributes to remain at a rather low abstraction level while others are generalized to higher levels. Deciding how high an attribute should be generalized is typically quite subjective. The control of this process is known as attribute generalization control.

If the attribute is generalized “too high,” it can lead to overgeneralization, and the resulting rules may not be very descriptive. On the other hand, if the attribute is not generalized to a “sufficiently high level,” undergeneralization can result, and the rules obtained may not be informative either. Therefore, a balance must be struck in attribute-oriented generalization.

The generalization process can be controlled in several ways. Two common techniques are as follows −

Attribute generalization threshold control − The first technique, known as attribute generalization threshold control, either sets one generalization threshold for all of the attributes, or sets a separate threshold for each attribute. If the number of distinct values in an attribute is greater than the attribute threshold, further attribute removal or attribute generalization should be performed.

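Data mining systems typically have a default attribute threshold value, generally ranging from 2 to 8, and should allow experts and users to modify the threshold values as well. If a user feels that the generalization reaches too high a level for a particular attribute, the threshold can be increased, which corresponds to drilling down along the attribute. Conversely, lowering the threshold generalizes the attribute further, which corresponds to rolling up.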
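Attribute threshold control can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the `generalize_attribute` function, the city/province/country hierarchy, and the sample values are hypothetical, and the hierarchy is assumed to be an acyclic child-to-parent mapping. The attribute is climbed one level at a time until its number of distinct values no longer exceeds the threshold. If no generalization operator is left to apply while the count is still too high, the attribute is flagged for removal.

```python
# Hypothetical multi-level concept hierarchy (child -> parent).
HIERARCHY = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
    "Ottawa": "Ontario",
    "Seattle": "Washington",
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "Washington": "USA",
}

def generalize_attribute(values, hierarchy, threshold):
    """Climb the hierarchy level by level until the attribute has at most
    `threshold` distinct values.

    Returns (values, keep). `keep` is False when the attribute still has
    too many distinct values and cannot be generalized any further, in
    which case it should be removed from the relation.
    """
    current = list(values)
    while len(set(current)) > threshold:
        climbed = [hierarchy.get(v, v) for v in current]
        if climbed == current:
            return current, False  # no operator left: remove the attribute
        current = climbed
    return current, True

cities = ["Vancouver", "Victoria", "Toronto", "Ottawa", "Seattle"]

# Threshold 3 stops at the province/state level (3 distinct values).
by_three, keep_three = generalize_attribute(cities, HIERARCHY, 3)

# Threshold 2 forces one more climb, to the country level (2 distinct values).
by_two, keep_two = generalize_attribute(cities, HIERARCHY, 2)

# An attribute with no hierarchy (for example, a name) is flagged for removal.
names, keep_names = generalize_attribute(["Ann", "Bob", "Cy"], {}, 2)
```

Raising the threshold from 2 to 3 in this sketch leaves the attribute at the lower province/state level, which is the drill-down effect of a larger threshold.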

Generalized relation threshold control − The second technique, known as generalized relation threshold control, sets a threshold for the generalized relation. If the number of (distinct) tuples in the generalized relation is greater than the threshold, further generalization should be performed.

Otherwise, no further generalization should be performed. Such a threshold can also be preset in the data mining system (generally within a range of 10 to 30), or set by an expert or user, and should be adjustable. For instance, if a user feels that the generalized relation is too small, they can increase the threshold, which implies drilling down. Conversely, to generalize the relation further, the threshold can be reduced, which implies rolling up.
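Generalized relation threshold control can be sketched in the same style. The helper names, the sample data, and the hierarchies below are hypothetical. The policy for choosing which attribute to climb next (the one with the most distinct values that can still be generalized) is an assumption made for this example and is not prescribed by the technique itself. The thresholds are kept far below the usual 10 to 30 range so that the example stays small.

```python
from collections import Counter

def climb_attribute(relation, index, hierarchy):
    """Generalize one attribute by one level and merge identical tuples."""
    merged = Counter()
    for row, count in relation.items():
        parent = hierarchy.get(row[index], row[index])
        merged[row[:index] + (parent,) + row[index + 1:]] += count
    return merged

def control_relation_size(relation, hierarchies, threshold):
    """Generalize until the relation has at most `threshold` distinct tuples.

    `hierarchies` maps an attribute index to a child -> parent dict.
    Assumed policy: climb the attribute that currently has the most
    distinct values among those that can still be generalized.
    """
    current = Counter(relation)
    while len(current) > threshold:
        best = None  # (distinct value count, attribute index)
        for index, hierarchy in hierarchies.items():
            if any(row[index] in hierarchy for row in current):
                distinct = len({row[index] for row in current})
                if best is None or distinct > best[0]:
                    best = (distinct, index)
        if best is None:
            break  # nothing left to generalize
        current = climb_attribute(current, best[1], hierarchies[best[1]])
    return current

# Hypothetical hierarchies for a (city, age range) relation.
hierarchies = {
    0: {"Vancouver": "Canada", "Toronto": "Canada",
        "Seattle": "USA", "Chicago": "USA"},
    1: {"17-20": "young", "21-25": "young", "26-30": "adult"},
}

relation = Counter({
    ("Vancouver", "17-20"): 2,
    ("Toronto", "17-20"): 1,
    ("Toronto", "21-25"): 3,
    ("Seattle", "21-25"): 1,
    ("Chicago", "26-30"): 2,
})

# Threshold 4: generalizing the city attribute alone is enough (4 tuples).
relaxed = control_relation_size(relation, hierarchies, 4)

# Threshold 3: the age attribute must be generalized as well (3 tuples).
strict = control_relation_size(relation, hierarchies, 3)
```

With the larger threshold the age ranges survive in the result, while the smaller threshold rolls them up to `young` and `adult`. The total count is preserved in both cases because merged tuples accumulate their counts.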

Updated on: 16-Feb-2022

