What are the methods for generating frequent itemsets?

Apriori is one of the first algorithms to have successfully addressed the combinatorial explosion of frequent itemset generation. It does this by applying the Apriori principle to prune the exponential search space. Despite this significant performance improvement, the algorithm still incurs considerable I/O overhead because it must make several passes over the transaction data set.

The performance of the Apriori algorithm can also degrade significantly for dense data sets because of the increasing width of transactions. Several methods have been developed to overcome these limitations and improve the efficiency of the Apriori algorithm.

The following is a high-level description of these methods −

Traversal of Itemset Lattice − A search for frequent itemsets can be viewed as a traversal of the itemset lattice. The search strategy employed by an algorithm dictates how the lattice structure is traversed during the frequent itemset generation phase. Some search strategies work better than others, depending on how the frequent itemsets are arranged in the lattice.

General-to-Specific versus Specific-to-General − The Apriori algorithm uses a general-to-specific search strategy, where pairs of frequent (k-1)-itemsets are merged to obtain candidate k-itemsets. This general-to-specific search strategy is effective, provided the maximum length of a frequent itemset is not too long.
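The general-to-specific strategy described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (`support_count`, `generate_candidates`, `apriori`), the absolute support threshold `minsup`, and the sample market-basket transactions are all assumptions made for the example. Candidates are built by merging pairs of frequent (k-1)-itemsets that share their first k-2 items, and each candidate is kept only if all of its (k-1)-subsets are frequent.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def generate_candidates(frequent_prev):
    """Merge pairs of frequent (k-1)-itemsets into candidate k-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in frequent_prev)
    prev_set = set(prev)
    candidates = set()
    for a, b in combinations(prev, 2):
        # Merge only when the two itemsets agree on their first k-2 items.
        if a[:-1] == b[:-1]:
            cand = a + (b[-1],)
            # Apriori principle: every (k-1)-subset must itself be frequent.
            if all(sub in prev_set
                   for sub in combinations(cand, len(cand) - 1)):
                candidates.add(cand)
    return candidates


def apriori(transactions, minsup):
    """Return {itemset (sorted tuple): support count}, found level by level."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    # Level 1: frequent single items.
    level = {(i,) for i in items
             if support_count(frozenset((i,)), transactions) >= minsup}
    frequent = {}
    while level:
        for s in level:
            frequent[s] = support_count(frozenset(s), transactions)
        # One pass over the data per level to count the new candidates.
        level = {c for c in generate_candidates(level)
                 if support_count(frozenset(c), transactions) >= minsup}
    return frequent


# Hypothetical sample data, used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
for itemset, count in sorted(apriori(transactions, minsup=3).items()):
    print(itemset, count)
```

With these sample transactions and `minsup=3`, the search stops after the third level: the only candidate 3-itemset, (bread, diapers, milk), appears in just two transactions. Each iteration of the `while` loop corresponds to one pass over the data, which is where the I/O overhead mentioned earlier comes from.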

A specific-to-general search strategy looks for the more specific frequent itemsets first, before finding the more general frequent itemsets. This strategy is useful for discovering maximal frequent itemsets in dense transactions, where the frequent itemset border is located near the bottom of the lattice.

The Apriori principle can be used to prune all subsets of maximal frequent itemsets. Specifically, if a candidate k-itemset is maximal frequent, none of its subsets of size k-1 needs to be examined. But if the candidate k-itemset is infrequent, all of its (k-1)-subsets must be checked in the next iteration.
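The specific-to-general search and its pruning rule can be sketched as follows. This is an illustrative sketch under stated assumptions, not an established algorithm from the text: the function name `maximal_frequent_top_down`, the absolute threshold `minsup`, and the sample data are hypothetical. The search starts from the itemset containing every item and moves toward smaller itemsets. A frequent candidate is recorded as maximal and its subsets are never expanded; an infrequent candidate contributes all of its (k-1)-subsets to the next iteration.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def maximal_frequent_top_down(transactions, minsup):
    """Find maximal frequent itemsets by searching from large to small."""
    transactions = [frozenset(t) for t in transactions]
    all_items = frozenset().union(*transactions)
    level = {all_items}          # start at the most specific itemset
    maximal = []
    while level:
        next_level = set()
        for cand in level:
            # Subsets of a known maximal frequent itemset are frequent
            # by the Apriori principle, so they need no examination.
            if any(cand <= m for m in maximal):
                continue
            if support_count(cand, transactions) >= minsup:
                maximal.append(cand)
            elif len(cand) > 1:
                # Infrequent: check every (k-1)-subset in the next iteration.
                for sub in combinations(cand, len(cand) - 1):
                    next_level.add(frozenset(sub))
        level = next_level
    return maximal


# Hypothetical dense sample data: items are single letters.
transactions = [set("abcd"), set("abcd"), set("abce"), set("abce"), set("cde")]
for itemset in maximal_frequent_top_down(transactions, minsup=2):
    print(sorted(itemset))
```

On this dense sample the two maximal frequent itemsets, {a, b, c, d} and {a, b, c, e}, are found at the second level, and most smaller itemsets are skipped because they are subsets of one of them. On sparse data the same search would have to walk through many large infrequent itemsets first, which is why this direction suits cases where the border lies near the bottom of the lattice.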

Another approach is to combine the general-to-specific and specific-to-general search strategies. This bidirectional approach requires more space to store the candidate itemsets, but it can help to identify the frequent itemset border rapidly.

Equivalence Classes − Another way to envision the traversal is to first partition the lattice into disjoint groups of nodes (or equivalence classes). A frequent itemset generation algorithm searches for frequent itemsets within a particular equivalence class first before moving to another equivalence class.

For example, the level-wise strategy used in the Apriori algorithm can be regarded as partitioning the lattice on the basis of itemset sizes; i.e., the algorithm discovers all frequent 1-itemsets first before proceeding to larger-sized itemsets. Equivalence classes can also be defined according to the prefix or suffix labels of an itemset.
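To make the idea of equivalence classes concrete, the sketch below partitions the itemset lattice over four items in three ways: by itemset size, by prefix label, and by suffix label. The helper names (`all_itemsets`, `partition_by_size`, `partition_by_prefix`, `partition_by_suffix`) and the `length` parameter are assumptions introduced for illustration, and itemsets are represented as sorted tuples so that prefixes and suffixes are well defined.

```python
from collections import defaultdict
from itertools import combinations


def all_itemsets(items):
    """Yield every non-empty itemset of the lattice as a sorted tuple."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        yield from combinations(items, k)


def partition_by_size(itemsets):
    """Group itemsets by size, as the level-wise strategy does."""
    classes = defaultdict(list)
    for s in itemsets:
        classes[len(s)].append(s)
    return dict(classes)


def partition_by_prefix(itemsets, length=1):
    """Group itemsets that share their first `length` items."""
    classes = defaultdict(list)
    for s in itemsets:
        classes[s[:length]].append(s)
    return dict(classes)


def partition_by_suffix(itemsets, length=1):
    """Group itemsets that share their last `length` items."""
    classes = defaultdict(list)
    for s in itemsets:
        classes[s[-length:]].append(s)
    return dict(classes)


lattice = list(all_itemsets("abcd"))
for prefix, members in partition_by_prefix(lattice).items():
    print(prefix, members)
```

Each partition covers all 15 itemsets exactly once, so an algorithm can finish one class before moving to the next. With prefix labels of length 1, the class for `a` holds eight itemsets, while the class for `d` holds only `(d,)`.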

Updated on 11-Feb-2022 13:30:47