What are the techniques for Mining Negative Patterns?


The first class of techniques developed for mining infrequent patterns treats each item as a symmetric binary variable. The transaction data can be binarized by augmenting it with negative items. For example, every item that is absent from a transaction is recorded explicitly as a negative item, so the original data becomes a set of transactions containing both positive and negative items. By applying existing frequent itemset generation algorithms such as Apriori to the augmented transactions, negative itemsets can be derived.
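The augmentation step can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `augment_with_negatives`, the `not_` prefix for negative items, and the toy transactions are all assumptions made for the example.

```python
def augment_with_negatives(transactions, items):
    """Return a copy of the transactions in which every absent item
    is recorded explicitly as a negative item (hypothetical 'not_' prefix)."""
    augmented = []
    for transaction in transactions:
        present = set(transaction)
        row = set(present)
        for item in items:
            if item not in present:
                row.add("not_" + item)
        augmented.append(row)
    return augmented


# Toy data, assumed for illustration only.
items = ["bread", "milk", "tea"]
transactions = [{"bread", "milk"}, {"tea"}, {"bread"}]

augmented = augment_with_negatives(transactions, items)
for row in augmented:
    print(sorted(row))
```

The augmented rows can then be passed to any frequent itemset algorithm that accepts sets of item labels.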

Such an approach is feasible only if a few variables are treated as symmetric binary (i.e., we look for negative patterns that involve the negation of only a small number of items). If every item must be treated as symmetric binary, the problem becomes computationally intractable for the following reasons.

The number of items doubles when each item is augmented with its corresponding negative item. Rather than exploring an itemset lattice of size 2^d, where d is the number of items in the original data set, the algorithm must explore a much larger lattice built over 2d items.

Support-based pruning is no longer effective when negative items are added. For every variable x, either x or its negation x̄ has support greater than or equal to 50%. Therefore, even if the support threshold is as high as 50%, half of the items will still be frequent.

For lower thresholds, many more items, and possibly the itemsets containing them, will be frequent. The support-based pruning strategy employed by Apriori is effective only when the support for most itemsets is low; otherwise, the number of frequent itemsets grows exponentially.
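The 50% observation follows from the fact that the supports of an item and its negation sum to 1. A small sketch, using assumed toy data and hypothetical helper names (`item_support`, `frequent_literals`), shows that at least one literal per item survives a 50% threshold:

```python
def item_support(transactions, item):
    """Fraction of transactions that contain the item."""
    return sum(1 for t in transactions if item in t) / len(transactions)


def frequent_literals(transactions, items, minsup):
    """List the positive and negative single items whose support meets minsup.
    The support of a negative item is taken as 1 minus the positive support."""
    result = []
    for item in items:
        s = item_support(transactions, item)
        if s >= minsup:
            result.append(item)
        if 1 - s >= minsup:
            result.append("not_" + item)
    return result


# Toy data, assumed for illustration only.
items = ["a", "b", "c"]
transactions = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}]

survivors = frequent_literals(transactions, items, minsup=0.5)
print(survivors)
```

With d original items, at least d of the 2d literals remain frequent at this threshold, so the first pruning pass removes at most half of them.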

The width of each transaction increases when negative items are added. Suppose there are d items available in the original data set. For sparse data sets such as market basket transactions, the width of each transaction tends to be much smaller than d.

Accordingly, the maximum size of a frequent itemset, which is bounded by the maximum transaction width, w_max, tends to be relatively small. When negative items are included, the width of every transaction increases to d, because each item is either present in the transaction or absent from it, but not both.

Because the maximum transaction width has grown from w_max to d, the number of frequent itemsets can increase exponentially. As a result, many existing algorithms tend to break down when they are applied to the extended data set.
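The effect on transaction width can be checked on a small example. The sketch below assumes a toy data set with d = 10 items and a hypothetical `augment` helper; it compares the number of non-empty itemsets a single transaction can contain, 2^w - 1 for width w, before and after augmentation.

```python
def max_width(transactions):
    """Largest number of items in any single transaction."""
    return max(len(t) for t in transactions)


def augment(transaction, items):
    """Add a negative item (hypothetical 'not_' prefix) for each absent item."""
    return set(transaction) | {"not_" + i for i in items if i not in transaction}


# Toy sparse data, assumed for illustration only: d = 10 items.
items = ["item" + str(i) for i in range(10)]
transactions = [{"item0", "item3"}, {"item1"}, {"item2", "item5", "item7"}]

w_max = max_width(transactions)
augmented = [augment(t, items) for t in transactions]
d = max_width(augmented)

# A transaction of width w contains 2**w - 1 non-empty itemsets.
subsets_before = 2 ** w_max - 1
subsets_after = 2 ** d - 1
print(w_max, d, subsets_before, subsets_after)
```

Even in this tiny case the per-transaction bound grows from 7 to 1023 itemsets, which is the growth the paragraphs above describe.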

The brute-force approach above is computationally expensive because it forces us to determine the support for a very large number of positive and negative patterns. Rather than augmenting the data set with negative items, another approach is to determine the support of the negative itemsets from the support of their corresponding positive items.
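One way to derive negative supports from positive ones is the inclusion-exclusion identity: the support of an itemset X together with the negation of every item in Y equals the sum, over all subsets Z of Y, of (-1)^|Z| times the support of X ∪ Z. The sketch below is an illustration under that identity; the function names `support` and `negative_support` and the toy data are assumptions, and a real miner would look up stored supports instead of rescanning the data.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset.
    The empty itemset has support 1."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def negative_support(transactions, positive, negated):
    """Support of 'all items in positive present, all items in negated absent',
    computed only from supports of positive itemsets (inclusion-exclusion)."""
    positive = set(positive)
    negated = list(negated)
    total = 0.0
    for k in range(len(negated) + 1):
        for subset in combinations(negated, k):
            total += (-1) ** k * support(transactions, positive | set(subset))
    return total


# Toy data, assumed for illustration only.
transactions = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}]

# a present, b absent: s(a) - s(a, b)
a_not_b = negative_support(transactions, {"a"}, {"b"})
# a present, b and c absent: s(a) - s(a, b) - s(a, c) + s(a, b, c)
a_not_b_not_c = negative_support(transactions, {"a"}, {"b", "c"})
print(a_not_b, a_not_b_not_c)
```

The number of terms grows as 2^|Y|, so this route is most practical when only a few items are negated.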

raja
Updated on 14-Feb-2022 09:52:28
