What is the Apriori Algorithm?


Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules. The name reflects the fact that the algorithm uses prior knowledge of frequent itemset properties.

Apriori uses an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item and collecting those items that satisfy minimum support. The resulting set is denoted L₁.
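The first pass described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `frequent_1_itemsets`, the parameter `min_support_count`, and the four sample transactions are assumptions made up for the example.

```python
from collections import Counter

def frequent_1_itemsets(transactions, min_support_count):
    """Return {frozenset({item}): count} for items meeting the minimum support count."""
    counts = Counter()
    for transaction in transactions:
        # set() ensures an item repeated inside one transaction is counted once
        counts.update(set(transaction))
    return {
        frozenset([item]): count
        for item, count in counts.items()
        if count >= min_support_count
    }

# Hypothetical toy database of four transactions
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

# With a minimum support count of 3, "beer" (count 2) is left out
L1 = frequent_1_itemsets(transactions, min_support_count=3)
```

Here support is expressed as an absolute count; a relative threshold would divide each count by the number of transactions.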

Next, L₁ is used to find L₂, the set of frequent 2-itemsets, which is used to find L₃, and so on, until no more frequent k-itemsets can be found. Finding each L_k requires one full scan of the database.
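The level-wise loop can be sketched as follows. To keep the focus on the iteration itself, candidates are generated naively by taking unions of pairs of frequent (k−1)-itemsets; this is a simplification chosen for the sketch, not the algorithm's own candidate generation. The name `apriori_levels` and the sample data are hypothetical.

```python
from collections import Counter

def apriori_levels(transactions, min_support_count):
    """Return [L1, L2, ...], each a dict mapping frozenset -> support count."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: one scan to count single items
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    current = {s: c for s, c in counts.items() if c >= min_support_count}
    levels = []
    k = 1
    while current:
        levels.append(current)
        k += 1
        # Naive candidates: unions of two frequent (k-1)-itemsets that have size k
        prev = list(current)
        candidates = {
            a | b
            for i, a in enumerate(prev)
            for b in prev[i + 1:]
            if len(a | b) == k
        }
        # One full scan of the database per level
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {s: c for s, c in counts.items() if c >= min_support_count}
    return levels

# Hypothetical toy database
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
levels = apriori_levels(transactions, min_support_count=2)
```

On this data the loop stops after the second level, because no 3-itemset appears in at least two transactions.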

To improve the efficiency of the level-wise generation of frequent itemsets, an important property called the Apriori property is used to reduce the search space.

Apriori property − All nonempty subsets of a frequent itemset must also be frequent.

The Apriori property is based on the following observation. By definition, if an itemset I does not satisfy the minimum support threshold, min_sup, then I is not frequent; that is, P(I) < min_sup.

If an item A is added to the itemset I, the resulting itemset (i.e., I ∪ A) cannot occur more frequently than I. Therefore, I ∪ A is not frequent either; that is, P(I ∪ A) < min_sup.

This property belongs to a special category of properties called antimonotone, in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well. It is called antimonotone because the property is monotonic in the context of failing a test.
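A small check makes the antimonotone behaviour concrete. The sketch below computes support as the fraction of transactions containing an itemset and shows that adding an item never raises it. The helper `support`, the threshold value, and the sample data are assumptions for illustration only.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= frozenset(t))
    return hits / len(transactions)

# Hypothetical toy database
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
min_sup = 0.5  # assumed relative threshold

base = {"bread", "beer"}        # appears in 1 of 4 transactions
extended = base | {"diapers"}   # a superset of `base`

base_support = support(base, transactions)
extended_support = support(extended, transactions)

# The superset can never be more frequent than the set it extends,
# so an infrequent set rules out all of its supersets.
superset_is_infrequent = extended_support <= base_support < min_sup
```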

A two-step process is followed, consisting of join and prune actions, which are as follows −

The join step − To find L_k, a set of candidate k-itemsets is generated by joining L_k−1 with itself. This set of candidates is denoted C_k. Let l₁ and l₂ be itemsets in L_k−1. The notation l_i[j] refers to the jth item in l_i (e.g., l₁[k−2] refers to the second to the last item in l₁).
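The join can be sketched in Python with itemsets stored as sorted tuples, so that indexing into an itemset is well defined. The text above does not spell out the join condition; the sketch assumes the standard one, in which two (k−1)-itemsets are joinable when their first k−2 items agree. The function name `join_step` and the sample input are hypothetical.

```python
def join_step(prev_frequent):
    """Generate candidate k-itemsets from frequent (k-1)-itemsets (sorted tuples)."""
    prev = sorted(prev_frequent)
    candidates = []
    for i, first in enumerate(prev):
        for second in prev[i + 1:]:
            # Joinable when the first k-2 items agree. Because `prev` is sorted
            # and the itemsets are distinct, first[-1] < second[-1] holds here,
            # so each candidate is produced once and stays sorted.
            if first[:-1] == second[:-1]:
                candidates.append(first + (second[-1],))
    return candidates

# Hypothetical frequent 2-itemsets, each tuple sorted alphabetically
L2 = [
    ("bread", "milk"),
    ("bread", "diapers"),
    ("diapers", "milk"),
    ("beer", "diapers"),
]
C3 = join_step(L2)
```

Only the two itemsets starting with "bread" share a prefix, so a single candidate 3-itemset is produced from this input.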

The prune step − C_k is a superset of L_k; that is, its members may or may not be frequent, but all of the frequent k-itemsets are included in C_k. A scan of the database to determine the count of each candidate in C_k results in the determination of L_k (i.e., all candidates having a count no less than the minimum support count are frequent by definition, and therefore belong to L_k). C_k can be very large, however, so this scan can involve heavy computation.
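One way to keep the scan manageable is to apply the Apriori property before counting: any candidate with an infrequent (k−1)-subset cannot be frequent and can be dropped without touching the database. The sketch below shows that filter followed by the counting scan. Both function names, the tuple representation, and the sample data are assumptions for illustration.

```python
from itertools import combinations

def prune_by_subsets(candidates, prev_frequent):
    """Drop candidates that have any (k-1)-subset missing from the previous level."""
    prev = set(prev_frequent)
    return [
        c for c in candidates
        # combinations() of a sorted tuple yields sorted tuples
        if all(sub in prev for sub in combinations(c, len(c) - 1))
    ]

def count_and_filter(candidates, transactions, min_support_count):
    """One database scan: keep candidates whose count is no less than the minimum."""
    counts = {c: 0 for c in candidates}
    for transaction in transactions:
        items = set(transaction)
        for c in candidates:
            if items.issuperset(c):
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= min_support_count}

# Hypothetical data: frequent 2-itemsets and two candidate 3-itemsets
L2 = [("beer", "diapers"), ("bread", "diapers"), ("bread", "milk"), ("diapers", "milk")]
C3 = [("bread", "diapers", "milk"), ("beer", "bread", "diapers")]
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

pruned = prune_by_subsets(C3, L2)  # ("beer", "bread") is not in L2
L3 = count_and_filter(pruned, transactions, min_support_count=2)
```

In this example the subset check removes one candidate up front, and the scan then finds that the remaining one falls short of the minimum support count.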

Ginni

Updated on: 16-Feb-2022
