What is multilevel association rule mining from transactional databases?

The approaches to mining multilevel association rules are based on the support-confidence framework. A top-down strategy is employed: counts are accumulated to compute the frequent itemsets at each concept level, starting at concept level 1 and working down toward the more specific concept levels until no more frequent itemsets can be found. At each level, any algorithm for discovering frequent itemsets, such as Apriori, can be used.

Data can be generalized by replacing low-level concepts within the data with their higher-level concepts, or ancestors, from a concept hierarchy. A concept hierarchy is represented as a tree whose root is D, the task-relevant data.
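The generalization step can be sketched in a few lines of Python. This is only an illustration: the hierarchy, the item names, and the generalize helper are assumptions made for the example, not part of any particular library.

```python
# Hypothetical concept hierarchy stored as child -> parent.
# The root "D" stands for the task-relevant data.
PARENT = {
    "milk": "D",
    "bread": "D",
    "2% milk": "milk",
    "skim milk": "milk",
    "white bread": "bread",
    "wheat bread": "bread",
}

def generalize(item, levels_up=1):
    """Replace an item with its ancestor, never climbing up to the root."""
    for _ in range(levels_up):
        parent = PARENT.get(item)
        if parent is None or parent == "D":
            break
        item = parent
    return item

# Made-up transactions at the lowest concept level.
transactions = [
    {"2% milk", "white bread"},
    {"skim milk", "wheat bread"},
    {"2% milk", "wheat bread"},
]

# Each low-level item is replaced by its level-1 ancestor.
generalized = [{generalize(item) for item in t} for t in transactions]
print(generalized)
```

After generalization all three transactions contain the same two high-level items, so an itemset that is rare at the low level can become frequent at the higher level.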

A popular application area for multilevel association is market basket analysis, which studies the buying habits of customers by searching for sets of items that are frequently purchased together. The items involved are organized in a concept hierarchy.

Each node indicates an item or itemset that has been examined. There are various approaches to finding frequent itemsets at any level of abstraction. Some of the methods in use are uniform minimum support for all levels, reduced minimum support at lower levels, and level-by-level independent search.

Multilevel mining works on a hierarchy-encoded transaction table rather than the original transaction table. This is useful when we are interested in only a portion of the transaction database, such as food, instead of all the items. This way we can first collect the relevant set of data and then work repeatedly on the task-relevant set. Thus, in the encoded transaction table, each item is encoded as a sequence of digits.
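One possible way to build such an encoded table is sketched below. The item names, the three-level paths, and the build_codes helper are hypothetical. The sketch assumes one digit per hierarchy level and fewer than ten children under any node, so that each level fits in a single digit.

```python
# Hypothetical items described by their path in the hierarchy:
# (level-1 category, level-2 content, level-3 brand).
ITEM_PATHS = {
    "Dairyland 2% milk": ("milk", "2%", "Dairyland"),
    "Foremost 2% milk": ("milk", "2%", "Foremost"),
    "Dairyland skim milk": ("milk", "skim", "Dairyland"),
    "Wonder white bread": ("bread", "white", "Wonder"),
}

def build_codes(item_paths):
    """Assign each item a digit string, one digit per hierarchy level."""
    child_digits = {}  # path prefix -> {child name: digit}
    codes = {}
    for item, path in item_paths.items():
        code = ""
        for depth, name in enumerate(path):
            siblings = child_digits.setdefault(path[:depth], {})
            # A child seen for the first time gets the next free digit.
            digit = siblings.setdefault(name, len(siblings) + 1)
            code += str(digit)
        codes[item] = code
    return codes

codes = build_codes(ITEM_PATHS)

# Original transaction table (made-up data) and its encoded counterpart.
transactions = {
    "T1": ["Dairyland 2% milk", "Wonder white bread"],
    "T2": ["Foremost 2% milk", "Dairyland skim milk"],
}
encoded = {
    tid: sorted(codes[item] for item in items)
    for tid, items in transactions.items()
}
print(codes)
print(encoded)
```

With this encoding, the ancestor of an item at a given level is simply a prefix of its code, for example code[:1] for level 1, which makes counting at higher levels cheap.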

Using uniform minimum support for all levels − The same minimum support threshold is used when mining at each level of abstraction, which simplifies the search procedure. An optimization technique can be adopted, based on the knowledge that an ancestor is a superset of its descendants: the search avoids examining itemsets containing any item whose ancestor does not have minimum support.
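The pruning idea can be illustrated with a minimal sketch. It assumes transactions whose items are already encoded as digit strings (one digit per level), uses a made-up absolute support count, and counts single items only in order to stay short. A full miner would also generate larger itemsets at each level, for example with Apriori.

```python
from collections import Counter

# Hypothetical encoded transactions: one digit per hierarchy level.
transactions = [
    {"111", "211"},
    {"111", "121"},
    {"112", "211"},
    {"111", "221"},
]
MIN_SUP = 2    # uniform absolute support count (assumed value)
MAX_LEVEL = 3

def frequent_items_uniform(transactions, min_sup, max_level):
    """Top-down search for frequent single items with one threshold."""
    frequent = {}  # level -> {item prefix: support count}
    for level in range(1, max_level + 1):
        counts = Counter()
        for t in transactions:
            # Generalize items to this level; the set prevents counting
            # the same ancestor twice within one transaction.
            for prefix in {item[:level] for item in t}:
                # Skip any item whose ancestor lacks minimum support.
                if level == 1 or prefix[:-1] in frequent[level - 1]:
                    counts[prefix] += 1
        frequent[level] = {p: c for p, c in counts.items() if c >= min_sup}
    return frequent

result = frequent_items_uniform(transactions, MIN_SUP, MAX_LEVEL)
print(result)
```

In this toy data, the level-2 items "12" and "22" fall below the threshold, so their descendants "121" and "221" are never counted at level 3.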

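The main drawback of the uniform support approach is that items at lower levels of abstraction are unlikely to occur as frequently as those at higher levels of abstraction. If the minimum support threshold is set too high, meaningful associations at low abstraction levels can be missed; if it is set too low, too many uninteresting associations can be generated at high abstraction levels.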

Using reduced minimum support at lower levels − Each level of abstraction has its own minimum support threshold. The lower the abstraction level, the smaller the corresponding threshold. The following search strategies can be used for mining multiple-level associations with reduced support −

  • Level-by-level independent − This is a full-breadth search in which no background knowledge of frequent itemsets is used for pruning. Each node is examined regardless of whether its parent node is found to be frequent.

  • Level cross-filtering by a single item − An item at the ith level is examined if and only if its parent node at the (i-1)th level is frequent.

  • Level cross-filtering by k-itemset − A k-itemset at the ith level is examined if and only if its corresponding parent k-itemset at the (i-1)th level is frequent.
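The difference between the first two strategies can be seen in a small sketch. It assumes digit-encoded items (one digit per level) and made-up per-level thresholds, and it counts single items only. Cross-filtering by k-itemset is not shown. The mine_items function and its cross_filter flag are hypothetical names chosen for the example.

```python
from collections import Counter

# Hypothetical encoded transactions: one digit per hierarchy level.
transactions = [
    {"111", "211"},
    {"111", "121"},
    {"112", "211"},
    {"111", "221"},
]
# Reduced support: the threshold shrinks at lower levels (assumed values).
MIN_SUP = {1: 4, 2: 2, 3: 1}

def mine_items(transactions, min_sup, cross_filter):
    """Return (frequent, examined) single items per level.

    cross_filter=False: level-by-level independent search.
    cross_filter=True:  level cross-filtering by a single item.
    """
    frequent, examined = {}, {}
    for level in sorted(min_sup):
        counts = Counter()
        for t in transactions:
            for prefix in {item[:level] for item in t}:
                parent_ok = level == 1 or prefix[:-1] in frequent[level - 1]
                if cross_filter and not parent_ok:
                    continue  # parent is infrequent, so skip this node
                counts[prefix] += 1
        examined[level] = set(counts)
        frequent[level] = {p for p, c in counts.items() if c >= min_sup[level]}
    return frequent, examined

ind_freq, ind_exam = mine_items(transactions, MIN_SUP, cross_filter=False)
cf_freq, cf_exam = mine_items(transactions, MIN_SUP, cross_filter=True)
print("independent:", ind_freq, ind_exam)
print("cross-filtered:", cf_freq, cf_exam)
```

In this toy data the independent search examines every node, while cross-filtering examines fewer nodes but never looks at "21" or "211", although both would meet the thresholds of their own levels. This illustrates the trade-off between search cost and missed low-level patterns.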