How can we mine closed frequent itemsets?

A naïve approach is to mine the complete set of frequent itemsets first and then remove every frequent itemset that is a proper subset of, and has the same support as, another frequent itemset.
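The naïve approach can be sketched on a toy dataset as follows. This is a minimal illustration, not an efficient miner: the function names (`support_count`, `frequent_itemsets`, `closed_only`), the four sample transactions, and the minimum support of 2 are all assumptions made for the example, and the brute-force enumeration is only practical for a handful of items.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all frequent itemsets (toy data only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = support_count(candidate, transactions)
            if count >= min_support:
                result[candidate] = count
    return result

def closed_only(frequent):
    """Drop every itemset that has a proper superset with the same support."""
    return {
        x: sx
        for x, sx in frequent.items()
        if not any(x < y and sx == sy for y, sy in frequent.items())
    }

# Hypothetical sample data: four transactions over the items a, b, c, d.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ad"),
]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_only(frequent)
```

On this data, seven itemsets are frequent but only {a}, {a, b}, and {a, b, c} survive the filtering step, which shows how much of the enumerated output is redundant.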

The naïve method is very costly. Because every non-empty subset of a frequent itemset is itself frequent, it must first derive 2^100 − 1 frequent itemsets to obtain a single length-100 frequent itemset, all before it can start to remove redundant itemsets. A recommended technique is to search for closed frequent itemsets directly during the mining process. This requires pruning the search space as soon as closed itemsets can be identified during mining. Pruning strategies include the following −

Item merging − If every transaction containing a frequent itemset X also contains an itemset Y but not any proper superset of Y, then X ∪ Y forms a frequent closed itemset and there is no need to search for any itemset containing X but not Y.

Sub-itemset pruning − If a frequent itemset X is a proper subset of an already found frequent closed itemset Y and support_count(X) = support_count(Y), then X and all of X’s descendants in the set enumeration tree cannot be frequent closed itemsets and can therefore be pruned.

Item skipping − In the depth-first mining of closed itemsets, at each level there is a prefix itemset X associated with a header table and a projected database. If a local frequent item p has the same support in several header tables at different levels, p can safely be pruned from the header tables at higher levels.
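The first two strategies can be illustrated with a short sketch. It assumes transactions are stored as frozensets; the helper names `merge_items` and `can_prune` and the sample data are hypothetical and chosen only for this example. Item merging is modelled by intersecting all transactions that contain X, which yields X ∪ Y directly, and sub-itemset pruning is modelled as a lookup against the closed itemsets found so far.

```python
def merge_items(x, transactions):
    """Item merging: return (X ∪ Y, support_count(X)).

    Y is whatever every transaction containing X has in common beyond X,
    so X ∪ Y is the intersection of those transactions.
    """
    projected = [t for t in transactions if x <= t]
    if not projected:
        return x, 0
    return frozenset.intersection(*projected), len(projected)

def can_prune(x, x_support, closed_found):
    """Sub-itemset pruning: X is a proper subset of an already found
    closed itemset Y with the same support count."""
    return any(
        x < y and x_support == y_support
        for y, y_support in closed_found.items()
    )

# Hypothetical sample data.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ad"),
]

# Every transaction containing c also contains a and b, so {c} merges to {a, b, c}.
merged, support = merge_items(frozenset("c"), transactions)
closed_found = {merged: support}

# {b, c} has support 2 and lies inside {a, b, c} (support 2): prune it.
prune_bc = can_prune(frozenset("bc"), 2, closed_found)
# {a} has support 4, which differs from 2: keep exploring it.
prune_a = can_prune(frozenset("a"), 4, closed_found)
```

Item skipping depends on the header tables of a concrete FP-tree implementation and is therefore left out of this sketch.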

When a new frequent itemset is derived, it is necessary to perform two kinds of closure checking, which are as follows −

  • Superset checking − It tests whether the new frequent itemset is a superset of some already found closed itemset with the same support.

  • Subset checking − It tests whether the newly found itemset is a subset of an already found closed itemset with the same support.
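The two checks above can be combined into one update routine over the set of closed itemsets found so far. The sketch below is an assumption-laden illustration: the function name `insert_with_closure_check`, the dictionary representation, and the order in which candidates arrive are all hypothetical, and a linear scan stands in for whatever index a real miner would use.

```python
def insert_with_closure_check(closed, itemset, support):
    """Update `closed` (dict: itemset -> support) in place.

    Returns True if `itemset` was kept as a closed candidate.
    """
    # Subset checking: the candidate is covered by an existing closed
    # itemset with the same support, so it cannot be closed.
    for y, y_support in closed.items():
        if itemset <= y and support == y_support:
            return False
    # Superset checking: the candidate absorbs earlier itemsets that it
    # contains and that have the same support.
    absorbed = [y for y, y_support in closed.items()
                if y < itemset and y_support == support]
    for y in absorbed:
        del closed[y]
    closed[itemset] = support
    return True

# Hypothetical stream of frequent itemsets with their support counts.
candidates = [
    (frozenset("a"), 4),
    (frozenset("b"), 3),
    (frozenset("ab"), 3),   # absorbs {b}
    (frozenset("c"), 2),
    (frozenset("abc"), 2),  # absorbs {c}
    (frozenset("bc"), 2),   # rejected: subset of {a, b, c}
]
closed = {}
kept = [insert_with_closure_check(closed, s, n) for s, n in candidates]
```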

If the item merging pruning technique is adopted under a divide-and-conquer framework, superset checking is effectively built in and there is no need to perform it explicitly. This is because if a frequent itemset X ∪ Y is found later than itemset X and has the same support as X, it must be in X’s projected database and must have been generated during item merging.

To assist in subset checking, a compressed pattern-tree can be constructed to maintain the set of closed itemsets mined so far. The pattern-tree is similar in structure to the FP-tree, except that all of the closed itemsets found are stored explicitly in the corresponding tree branches.
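As a rough illustration of how such a tree supports subset checking, the sketch below stores each closed itemset as a sorted path in a plain prefix tree and records its support at the node where the path ends. This is a simplified stand-in rather than the structure used by any particular algorithm: the class names `PatternNode` and `PatternTree`, the alphabetical item order, and the method `has_superset` are assumptions made for the example.

```python
class PatternNode:
    def __init__(self):
        self.children = {}   # item -> PatternNode
        self.support = None  # set only where a stored closed itemset ends

class PatternTree:
    def __init__(self):
        self.root = PatternNode()

    def insert(self, itemset, support):
        """Store a closed itemset as a path of items in sorted order."""
        node = self.root
        for item in sorted(itemset):
            node = node.children.setdefault(item, PatternNode())
        node.support = support

    def has_superset(self, itemset, support):
        """True if a stored itemset contains `itemset` with the same support."""
        return self._search(self.root, sorted(itemset), 0, support)

    def _search(self, node, query, pos, support):
        # All query items matched and a stored itemset ends here.
        if pos == len(query) and node.support == support:
            return True
        for item, child in node.children.items():
            next_pos = pos
            if pos < len(query):
                if item > query[pos]:
                    # Paths are sorted, so query[pos] cannot occur below.
                    continue
                if item == query[pos]:
                    next_pos = pos + 1
            if self._search(child, query, next_pos, support):
                return True
        return False

tree = PatternTree()
tree.insert(frozenset("a"), 4)
tree.insert(frozenset("ab"), 3)
tree.insert(frozenset("abc"), 2)

covered = tree.has_superset(frozenset("bc"), 2)      # inside {a, b, c}, support 2
not_covered = tree.has_superset(frozenset("bc"), 3)  # no stored match with support 3
```

Because itemsets that share a prefix share a branch, the tree stays compact, and the sorted order lets the search skip branches that can no longer contain the next query item.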