What is Support Counting?



Support counting is the process of determining the frequency of occurrence of each candidate itemset that survives the candidate pruning step of the apriori-gen function.

One approach is to compare each transaction against every candidate itemset and to update the support counts of the candidates contained in the transaction. This approach is computationally expensive, particularly when the numbers of transactions and candidate itemsets are large.
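The brute-force comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `count_support_brute_force` and the sample transactions and candidates are assumptions made for this example, not part of the Apriori algorithm's definition.

```python
def count_support_brute_force(transactions, candidates):
    """Compare every transaction against every candidate itemset."""
    # Hypothetical helper: one counter per candidate, keyed by a frozenset.
    counts = {frozenset(c): 0 for c in candidates}
    for t in transactions:
        items = set(t)
        for c in counts:
            # A candidate is supported when all of its items occur in t.
            if c <= items:
                counts[c] += 1
    return counts


# Illustrative data only (not taken from a real dataset).
transactions = [{1, 2, 3, 5, 6}, {1, 2, 4}, {2, 3, 5}]
candidates = [(1, 2, 3), (2, 3, 5), (1, 4, 7)]
counts = count_support_brute_force(transactions, candidates)

for itemset, support in counts.items():
    print(sorted(itemset), support)
```

The nested loops make the cost proportional to the number of transactions multiplied by the number of candidates, which is why this approach scales poorly.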

A second approach is to enumerate the itemsets contained in each transaction and use them to update the support counts of their respective candidate itemsets. Consider a transaction t that contains five items, {1, 2, 3, 5, 6}. There are $\binom{5}{3} = 10$ itemsets of size 3 contained in this transaction.

Some of these itemsets may correspond to the candidate 3-itemsets under investigation, in which case their support counts are incremented. The other subsets of t, which do not correspond to any candidate, can be ignored.
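As a rough illustration of this second approach, the sketch below enumerates the 3-itemsets of the example transaction with Python's standard `itertools.combinations` and looks each one up in a dictionary of candidates. The function name `count_support_by_enumeration` and the candidate list are assumptions made for this example.

```python
from itertools import combinations


def count_support_by_enumeration(transactions, candidates, k):
    """Enumerate the k-itemsets of each transaction and look them up."""
    # Candidates are stored as sorted tuples so they can be matched directly.
    counts = {tuple(sorted(c)): 0 for c in candidates}
    for t in transactions:
        for subset in combinations(sorted(t), k):
            if subset in counts:      # subsets matching no candidate are ignored
                counts[subset] += 1
    return counts


t = {1, 2, 3, 5, 6}
subsets = list(combinations(sorted(t), 3))
print(len(subsets))  # 10, i.e. "5 choose 3"

# Illustrative candidates only.
candidates = [(1, 2, 3), (2, 3, 5), (1, 5, 6), (1, 4, 7)]
counts = count_support_by_enumeration([t], candidates, 3)
print(counts)
```

Here the first three candidates are each found once among the ten subsets, while (1, 4, 7) keeps a count of zero because the transaction contains neither 4 nor 7.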

There is a systematic way to enumerate the 3-itemsets contained in t. Assuming that each itemset keeps its items in increasing lexicographic order, an itemset can be enumerated by specifying the smallest item first, followed by the larger items.

For example, given t = {1, 2, 3, 5, 6}, all the 3-itemsets contained in t must begin with item 1, 2, or 3. It is not possible to construct a 3-itemset that begins with item 5 or 6 because there are only two items in t whose labels are greater than or equal to 5.

The prefix structure shows how the itemsets contained in a transaction can be systematically enumerated, i.e., by specifying their items one by one, from the leftmost item to the rightmost item.
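The prefix-based enumeration can be sketched as a small recursive function. This is a minimal sketch under the assumption that items are comparable integers; the names `enumerate_by_prefix` and `extend` are hypothetical and chosen only for this example.

```python
def enumerate_by_prefix(transaction, k):
    """List the k-itemsets of a transaction by fixing items left to right."""
    items = sorted(transaction)
    results = []

    def extend(prefix, start):
        remaining = k - len(prefix)
        if remaining == 0:
            results.append(tuple(prefix))
            return
        # The next item must leave at least (remaining - 1) items to its right,
        # so positions beyond this index can never start a valid completion.
        last_start = len(items) - remaining
        for i in range(start, last_start + 1):
            extend(prefix + [items[i]], i + 1)

    extend([], 0)
    return results


itemsets = enumerate_by_prefix({1, 2, 3, 5, 6}, 3)
print(itemsets)
print(sorted({s[0] for s in itemsets}))  # [1, 2, 3]
```

The bound on the loop is what rules out 3-itemsets beginning with 5 or 6: at the first level only the positions holding items 1, 2, and 3 are tried.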

It must still be determined whether each enumerated 3-itemset corresponds to an existing candidate itemset. If it matches one of the candidates, then the support count of the corresponding candidate is incremented.

In the Apriori algorithm, candidate itemsets are partitioned into different buckets and stored in a hash tree. During support counting, the itemsets contained in each transaction are also hashed into their appropriate buckets. Rather than comparing each itemset in the transaction with every candidate itemset, it is matched only against the candidate itemsets that belong to the same bucket.

Each internal node of the tree uses the following hash function, h(p) = p mod 3, to determine which branch of the current node should be followed next. For instance, items 1, 4, and 7 are hashed to the same branch (i.e., the leftmost branch) because they have the same remainder after being divided by 3. All candidate itemsets are stored at the leaf nodes of the hash tree.
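The following is a simplified sketch of a hash tree that uses h(p) = p mod 3 at every internal node and keeps candidates only in its leaves. It is not a reference implementation: the class names, the `max_leaf_size` threshold, the candidate list, and the choice to look up each enumerated 3-itemset separately (rather than traversing the tree once per transaction) are all assumptions made for this example.

```python
from itertools import combinations


class HashTreeNode:
    def __init__(self):
        self.children = {}    # branch index -> HashTreeNode (internal nodes)
        self.itemsets = []    # candidates stored here (leaf nodes only)
        self.is_leaf = True


class HashTree:
    def __init__(self, k, max_leaf_size=3, num_branches=3):
        self.k = k                          # size of the candidate itemsets
        self.max_leaf_size = max_leaf_size  # assumed split threshold
        self.num_branches = num_branches
        self.root = HashTreeNode()

    def _hash(self, item):
        return item % self.num_branches     # h(p) = p mod 3 by default

    def insert(self, itemset):
        itemset = tuple(sorted(itemset))
        node, depth = self.root, 0
        while not node.is_leaf:             # hash one item per level
            branch = self._hash(itemset[depth])
            node = node.children.setdefault(branch, HashTreeNode())
            depth += 1
        node.itemsets.append(itemset)
        if len(node.itemsets) > self.max_leaf_size and depth < self.k:
            self._split(node, depth)

    def _split(self, node, depth):
        # Turn an overfull leaf into an internal node and redistribute.
        stored, node.itemsets, node.is_leaf = node.itemsets, [], False
        for itemset in stored:
            branch = self._hash(itemset[depth])
            node.children.setdefault(branch, HashTreeNode()).itemsets.append(itemset)
        for child in node.children.values():
            if len(child.itemsets) > self.max_leaf_size and depth + 1 < self.k:
                self._split(child, depth + 1)

    def find_leaf(self, itemset):
        node, depth = self.root, 0
        while not node.is_leaf:
            node = node.children.get(self._hash(itemset[depth]))
            if node is None:                # no candidate shares this bucket
                return None
            depth += 1
        return node


def count_support(tree, transactions):
    counts = {}
    for t in transactions:
        for subset in combinations(sorted(t), tree.k):
            leaf = tree.find_leaf(subset)
            # Compare only against candidates in the same bucket (leaf).
            if leaf is not None and subset in leaf.itemsets:
                counts[subset] = counts.get(subset, 0) + 1
    return counts


# Illustrative candidate 3-itemsets, chosen for this sketch.
candidates = [(1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8),
              (1, 5, 9), (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5),
              (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8)]
tree = HashTree(k=3)
for c in candidates:
    tree.insert(c)

counts = count_support(tree, [{1, 2, 3, 5, 6}])
print(counts)
```

With this candidate list, the transaction {1, 2, 3, 5, 6} increments the counts of (1, 2, 5), (1, 3, 6), and (3, 5, 6) only. Each enumerated itemset is compared with the handful of candidates in one leaf instead of with all fifteen.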

