What is the Apriori Algorithm?

Apriori is a seminal algorithm developed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules. The name reflects the fact that the algorithm uses prior knowledge of frequent itemset properties.

Apriori uses an iterative method called a level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item and keeping those items that satisfy minimum support. The resulting set is denoted L1.

Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found. Finding each Lk requires one complete scan of the database.
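The level-wise loop can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes the database is a list of item sets, that minimum support is given as an absolute count, and it generates candidates by a simple pairwise union instead of the join and prune steps described later. The names `apriori_levels`, `scan`, `transactions`, and `min_count` are hypothetical.

```python
from itertools import combinations

def apriori_levels(transactions, min_count):
    """Return [L1, L2, ...], each a dict mapping a frequent itemset to its count."""
    transactions = [frozenset(t) for t in transactions]

    def scan(candidates):
        # One complete pass over the database to count every candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate is contained in the transaction
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: every single item is a candidate.
    current = scan({frozenset([i]) for t in transactions for i in t})
    levels = []
    k = 1
    while current:
        levels.append(current)
        k += 1
        # Simplified candidate generation: unions of two frequent
        # (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        current = scan(candidates)
    return levels

# Hypothetical toy database used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
levels = apriori_levels(transactions, min_count=3)
```

With this toy data and a minimum count of 3, the loop stops after the second level, because no 3-itemset appears in at least three transactions.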

To improve the efficiency of the level-wise generation of frequent itemsets, an important property called the Apriori property is used to reduce the search space.

Apriori property − All nonempty subsets of a frequent itemset must also be frequent.

The Apriori property is based on the following observation. By definition, if an itemset I does not satisfy the minimum support threshold, min_sup, then I is not frequent; that is, P(I) < min_sup.

If an item A is added to the itemset I, then the resulting itemset (i.e., I ∪ A) cannot occur more frequently than I. Therefore, I ∪ A is not frequent either; that is, P(I ∪ A) < min_sup.

This property belongs to a special category of properties called antimonotone, in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well. It is called antimonotone because the property is monotonic in the context of failing a test.
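The antimonotone behavior can be checked numerically. The short Python sketch below assumes support is measured as the fraction of transactions containing the itemset; the helper `support`, the toy `transactions`, and the threshold value are illustrative assumptions.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Hypothetical toy database used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
min_sup = 0.6  # assumed threshold

base = {"bread", "beer"}    # contained in 2 of 5 transactions
extended = base | {"milk"}  # contained in 1 of 5 transactions

base_support = support(base, transactions)
extended_support = support(extended, transactions)

# Adding an item can never raise support, so an infrequent itemset
# stays infrequent no matter what is added to it.
still_infrequent = extended_support <= base_support < min_sup
```

Here `base` already falls below the threshold, so `extended` and every other superset of `base` can be discarded without counting them.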

A two-step process is followed, consisting of join and prune actions, as follows −

The join step − To find Lk, a set of candidate k-itemsets is generated by joining Lk−1 with itself. This set of candidates is denoted Ck. Let l1 and l2 be itemsets in Lk−1. The notation li[j] refers to the jth item in li (e.g., l1[k−2] refers to the second to the last item in l1). With the items in each itemset kept in sorted order, l1 and l2 are joined if their first k−2 items are identical.
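The join can be sketched in Python as below. This is an illustrative version that assumes each itemset is stored as a sorted tuple and that two (k−1)-itemsets are joined when they agree on their first k−2 items; the function name `join_step` and the sample L2 are hypothetical.

```python
def join_step(prev_frequent):
    """Build candidate k-itemsets Ck by joining L(k-1) with itself.

    prev_frequent: iterable of (k-1)-itemsets, each a sorted tuple of items.
    """
    prev = sorted(prev_frequent)
    candidates = []
    for i, l1 in enumerate(prev):
        for l2 in prev[i + 1:]:
            # Join only when the first k-2 items match; requiring
            # l1[-1] < l2[-1] keeps the result sorted and avoids duplicates.
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                candidates.append(l1 + (l2[-1],))
    return candidates

# Hypothetical L2, with items in alphabetical order inside each tuple.
L2 = [
    ("beer", "diapers"),
    ("bread", "diapers"),
    ("bread", "milk"),
    ("diapers", "milk"),
]
C3 = join_step(L2)
```

Only ("bread", "diapers") and ("bread", "milk") share a first item, so this L2 yields a single candidate 3-itemset.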

The prune step − Ck is a superset of Lk, i.e., its members may or may not be frequent, but all of the frequent k-itemsets are included in Ck. A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk (i.e., all candidates having a count no less than the minimum support count are frequent by definition, and therefore belong to Lk). Ck can be huge, however, so this scan can involve heavy computation. To reduce the size of Ck, the Apriori property is applied: any candidate that has a (k−1)-subset not in Lk−1 cannot be frequent and can be removed from Ck before counting.
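A common way to shrink Ck before the counting scan is to drop every candidate that has an infrequent (k−1)-subset, since by the Apriori property such a candidate cannot be frequent. The Python sketch below illustrates this under stated assumptions: itemsets are sorted tuples, minimum support is an absolute count, and the names `prune_step`, `count_candidates`, and the sample data are hypothetical.

```python
from itertools import combinations

def prune_step(candidates, prev_frequent):
    """Keep only candidates whose (k-1)-subsets are all in L(k-1)."""
    prev = set(prev_frequent)
    return [
        c for c in candidates
        # combinations() preserves order, so subsets of a sorted tuple are sorted.
        if all(s in prev for s in combinations(c, len(c) - 1))
    ]

def count_candidates(candidates, transactions, min_count):
    """Scan the database once and return {candidate: count} for frequent ones."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if set(c) <= t:
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= min_count}

# Hypothetical toy data used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
L2 = [("beer", "diapers"), ("bread", "diapers"), ("bread", "milk"), ("diapers", "milk")]
C3 = [("beer", "bread", "diapers"), ("bread", "diapers", "milk")]

pruned = prune_step(C3, L2)  # ("beer", "bread") is not in L2, so the first candidate is dropped
L3 = count_candidates(pruned, transactions, min_count=3)
```

In this example pruning removes one of the two candidates without touching the database, and the remaining candidate is then rejected by the scan because it occurs in only two transactions.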

Updated on 16-Feb-2022 11:26:46