What is the complexity of the Apriori Algorithm?

The computational complexity of the Apriori algorithm is influenced by the following factors −

Support Threshold − Lowering the support threshold results in more itemsets being declared frequent. This adversely affects the computational complexity of the algorithm because more candidate itemsets must be generated and counted.

The maximum size of frequent itemsets also tends to increase with lower support thresholds. As the maximum size of the frequent itemsets increases, the algorithm needs to make more passes over the data set.
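
The effect of the threshold can be observed on a toy data set. The following is a minimal sketch, not Apriori itself: it enumerates every possible itemset by brute force (feasible only for a handful of items) and compares two support thresholds. The function name `frequent_itemsets_bruteforce` and the sample transactions are assumptions made for illustration.

```python
from itertools import combinations

def frequent_itemsets_bruteforce(transactions, minsup_count):
    """Return {itemset: support count} for every itemset meeting minsup_count.

    Brute force over all non-empty subsets of the item universe, so it is
    exponential in the number of items and only suitable for tiny examples.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            support = sum(1 for t in transactions if set(itemset) <= t)
            if support >= minsup_count:
                frequent[itemset] = support
    return frequent

# Hypothetical market-basket data used only for this illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

high = frequent_itemsets_bruteforce(transactions, minsup_count=3)
low = frequent_itemsets_bruteforce(transactions, minsup_count=2)

print(len(high), max(len(s) for s in high))
print(len(low), max(len(s) for s in low))
```

Every itemset that is frequent at the higher threshold is also frequent at the lower one, so lowering the threshold can only add itemsets. In this data set it also raises the maximum itemset size, which in Apriori would mean an additional pass over the data.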

Number of Items (Dimensionality) − As the number of items increases, more space is needed to store the support counts of the items. If the number of frequent items also grows with the dimensionality of the data, the computation and I/O costs increase because the algorithm generates a larger number of candidate itemsets.

Number of Transactions − Because the Apriori algorithm makes repeated passes over the data set, its run time increases with the number of transactions.

Average Transaction Width − For dense data sets, the average transaction width can be very large. This affects the complexity of the Apriori algorithm in two ways. First, the maximum size of frequent itemsets tends to increase as the average transaction width increases. Second, as the transaction width increases, more itemsets are contained in each transaction, which increases the number of hash tree traversals performed during support counting.
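
To get a feel for the second effect, note that a transaction of width w contains C(w, k) distinct k-itemsets, which is an upper bound on how many subsets support counting may have to match against the candidates. The sketch below simply tabulates that bound; the helper name `max_subsets_per_transaction` is an assumption, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def max_subsets_per_transaction(width, k):
    """Number of distinct k-itemsets contained in a transaction of the given width."""
    return comb(width, k)

# Upper bound on the 3-itemsets enumerated per transaction for several widths.
subset_counts = {w: max_subsets_per_transaction(w, 3) for w in (5, 10, 20)}
print(subset_counts)  # {5: 10, 10: 120, 20: 1140}
```

Doubling the width from 10 to 20 multiplies the bound by almost ten, which is why wide transactions make support counting expensive.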

Generation of frequent 1-itemsets − For each transaction, the support count of every item present in the transaction must be updated. If w is the average transaction width, this operation requires O(Nw) time, where N is the total number of transactions.
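
A minimal sketch of this first pass is shown below. It touches every item of every transaction exactly once, so the work is proportional to the sum of the transaction widths, that is, O(Nw). The function name `frequent_1_itemsets`, the `minsup_count` parameter, and the sample data are assumptions made for illustration.

```python
from collections import Counter

def frequent_1_itemsets(transactions, minsup_count):
    """Single pass over the data: count each item once per transaction."""
    counts = Counter()
    for transaction in transactions:
        # set() guards against an item listed twice in one transaction
        counts.update(set(transaction))
    return {item: c for item, c in counts.items() if c >= minsup_count}

# Hypothetical transactions: N = 5, average width w = 3.6
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

frequent_items = frequent_1_itemsets(transactions, minsup_count=3)
print(frequent_items)
```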

Candidate generation − To generate candidate k-itemsets, pairs of frequent (k - 1)-itemsets are merged to determine whether they have at least k - 2 items in common. Each merging operation requires at most k - 2 equality comparisons. In the best-case scenario, every merging step produces a viable candidate k-itemset.

In the worst-case scenario, the algorithm must merge every pair of frequent (k - 1)-itemsets found in the previous iteration. Hence, the overall cost of merging frequent itemsets is

$$\mathrm{\displaystyle\sum\limits_{k=2}^w\:(k-2)|C_{k}|<\:Cost\:of\:merging\:<\displaystyle\sum\limits_{k=2}^w\:(k-2)|F_{k-1}|^2}$$
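
The merging step and its comparison count can be sketched as follows. This is an illustrative version that assumes each itemset is stored as a lexicographically sorted tuple and that two itemsets are merged when their first k - 2 items agree; the function name `merge_candidates` and the sample input are hypothetical.

```python
def merge_candidates(frequent_prev):
    """Merge pairs of frequent (k-1)-itemsets that share their first k-2 items.

    frequent_prev: iterable of sorted tuples, all of length k-1.
    Returns (candidates, comparisons), where comparisons counts the
    item-by-item equality tests actually performed.
    """
    frequent_prev = sorted(frequent_prev)
    candidates = []
    comparisons = 0
    for i in range(len(frequent_prev)):
        for j in range(i + 1, len(frequent_prev)):
            a, b = frequent_prev[i], frequent_prev[j]
            prefix_matches = True
            # at most k-2 equality comparisons per pair
            for x, y in zip(a[:-1], b[:-1]):
                comparisons += 1
                if x != y:
                    prefix_matches = False
                    break
            if prefix_matches:
                candidates.append(a + (b[-1],))
    return candidates, comparisons

# Hypothetical frequent 2-itemsets (k = 3, so k - 2 = 1 comparison per pair).
frequent_2 = [
    ("beer", "diapers"),
    ("bread", "diapers"),
    ("bread", "milk"),
    ("diapers", "milk"),
]

candidates_3, comparisons = merge_candidates(frequent_2)
print(candidates_3)   # [('bread', 'diapers', 'milk')]
print(comparisons)    # 6
```

Here the 6 comparisons fall between the lower bound (k - 2)|C_3| = 1 and the upper bound (k - 2)|F_2|^2 = 16 for this single value of k.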

A hash tree is also constructed during candidate generation to store the candidate itemsets. Because the maximum depth of the tree is k, the cost of populating the hash tree with candidate itemsets is O($\mathrm{\displaystyle\sum\limits_{k=2}^w\:k|C_{k}|}$).

During candidate pruning, it is necessary to verify that the remaining k - 2 subsets of each candidate k-itemset are frequent. Because the cost of looking up a candidate in a hash tree is O(k), and k - 2 such lookups are made per candidate, the candidate pruning step requires O($\mathrm{\displaystyle\sum\limits_{k=2}^w\:k(k-2)|C_{k}|}$) time.
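
The pruning check can be sketched as below. As a simplification, this version stores the frequent (k - 1)-itemsets in a Python `set` of sorted tuples rather than a hash tree; hashing a tuple still takes time proportional to its length, so each lookup remains O(k). It assumes each candidate was formed by merging the two (k - 1)-subsets that drop its last and second-to-last items, so only the other k - 2 subsets are checked. The function name `prune_candidates` and the sample data are hypothetical.

```python
def prune_candidates(candidates, frequent_prev):
    """Keep only candidates whose remaining k-2 subsets of size k-1 are frequent."""
    frequent_prev = set(frequent_prev)
    kept = []
    for cand in candidates:
        k = len(cand)
        # Dropping index k-1 or k-2 gives the two merged parents, which are
        # frequent by construction; only indices 0 .. k-3 need a lookup.
        if all(cand[:i] + cand[i + 1:] in frequent_prev for i in range(k - 2)):
            kept.append(cand)
    return kept

# Hypothetical frequent 2-itemsets and the candidate 3-itemsets merged from them.
frequent_2 = [
    ("beer", "bread"),
    ("beer", "diapers"),
    ("beer", "milk"),
    ("bread", "diapers"),
    ("bread", "milk"),
]
candidates_3 = [
    ("beer", "bread", "diapers"),
    ("beer", "bread", "milk"),
    ("beer", "diapers", "milk"),
    ("bread", "diapers", "milk"),
]

surviving = prune_candidates(candidates_3, frequent_2)
print(surviving)
```

The last two candidates are pruned because their subset `("diapers", "milk")` is not frequent.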

raja
Updated on 11-Feb-2022 13:21:18
