What is the representation of an FP-tree?



An FP-tree is a compact representation of the input data. It is constructed by reading the data set one transaction at a time and mapping each transaction onto a path in the FP-tree. Because different transactions can have several items in common, their paths may overlap.

The more the paths overlap with one another, the more compression can be achieved using the FP-tree structure. If the FP-tree is small enough to fit into main memory, frequent itemsets can be extracted directly from the in-memory structure rather than by making repeated passes over the data stored on disk.

Each node in the tree contains the label of an item along with a counter that shows the number of transactions mapped onto the given path. Initially, the FP-tree contains only the root node, represented by the null symbol.
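A minimal sketch of such a node in Python may help make this concrete. The class name `FPNode` and its fields `item`, `count`, and `children` are illustrative assumptions, not names taken from any particular library; `None` is used here to stand for the null label of the root.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FPNode:
    """One FP-tree node: an item label plus a transaction counter."""
    item: Optional[str]  # None stands for the null root (an assumption of this sketch)
    count: int = 0       # number of transactions mapped onto the path through this node
    children: Dict[str, "FPNode"] = field(default_factory=dict)


# Initially the tree holds only the root node, labelled null.
root = FPNode(item=None)
print(root.item, root.count, len(root.children))  # None 0 0
```

Keeping the children in a dictionary keyed by item label makes it cheap to check whether a path can be reused when the next transaction is read.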

The FP-tree is then extended in the following steps −

The data set is scanned once to determine the support count of each item. Infrequent items are discarded, while the frequent items are sorted in decreasing order of support count.
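This first pass can be sketched as follows. The function name `frequent_item_order`, the `min_support` parameter, and the alphabetical tie-breaking are assumptions made for illustration. The data set consists only of the three transactions used in this walkthrough, and the thresholds are arbitrary.

```python
from collections import Counter


def frequent_item_order(transactions, min_support):
    """First pass: count item support, drop infrequent items, sort the rest."""
    # set(t) ensures an item is counted at most once per transaction.
    support = Counter(item for t in transactions for item in set(t))
    frequent = [item for item, n in support.items() if n >= min_support]
    # Decreasing support count; ties broken alphabetically (an assumption of this sketch).
    return sorted(frequent, key=lambda item: (-support[item], item))


transactions = [{"a", "b"}, {"b", "c", "d"}, {"a", "c", "d", "e"}]
print(frequent_item_order(transactions, min_support=1))  # ['a', 'b', 'c', 'd', 'e']
print(frequent_item_order(transactions, min_support=2))  # ['a', 'b', 'c', 'd']
```

With a threshold of 2, item e appears in only one of these three transactions and would be discarded. The walkthrough keeps e, so it corresponds to a setting in which e is frequent.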

The algorithm makes a second pass over the data to construct the FP-tree. After reading the first transaction, {a, b}, the nodes labeled a and b are created. A path is then formed from null→a→b to encode the transaction. Each node along the path has a frequency count of 1.

After reading the second transaction, {b, c, d}, a new set of nodes is created for items b, c, and d. A path is then formed to represent the transaction by linking the nodes null→b→c→d.

Each node along this path also has a frequency count equal to one. Although the first two transactions have an item in common, which is b, their paths are disjoint because the transactions do not share a common prefix.

The third transaction, {a, c, d, e}, shares a common prefix item (which is a) with the first transaction. As a result, the path for the third transaction, null→a→c→d→e, overlaps with the path for the first transaction, null→a→b. Because of their overlapping path, the frequency count for node a is incremented to two, while the frequency counts for the newly created nodes, c, d, and e, are equal to one.
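The three insertions described above can be reproduced with a short sketch. The names `FPNode`, `insert`, and `paths` are hypothetical helpers written for this illustration, and the transactions are assumed to be already sorted in the item order produced by the first pass.

```python
class FPNode:
    def __init__(self, item=None):
        self.item = item      # None marks the null root
        self.count = 0        # transactions mapped onto the path through this node
        self.children = {}    # item label -> child FPNode


def insert(root, ordered_items):
    """Map one transaction onto a path, reusing any shared prefix."""
    node = root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:             # no shared prefix from here on: create a node
            child = FPNode(item)
            node.children[item] = child
        child.count += 1              # one more transaction passes through this node
        node = child


def paths(node, prefix=()):
    """Yield (path, count) for every non-root node, in depth-first order."""
    for item, child in sorted(node.children.items()):
        here = prefix + (item,)
        yield "null→" + "→".join(here), child.count
        yield from paths(child, here)


root = FPNode()
for transaction in (["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"]):
    insert(root, transaction)

for path, count in paths(root):
    print(path, count)
# null→a 2
# null→a→b 1
# null→a→c 1
# null→a→c→d 1
# null→a→c→d→e 1
# null→b 1
# null→b→c 1
# null→b→c→d 1
```

Node a is the only node with a count of two, because it is the only prefix shared by more than one transaction. Item b appears under two different parents, since the first and second transactions do not share a prefix.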

This process continues until every transaction has been mapped onto one of the paths in the FP-tree. The size of an FP-tree is typically smaller than the size of the uncompressed data because many transactions in market basket data share a few items in common.
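As a rough way to see the compression, the number of non-root nodes in the tree can be compared with the total number of items in the transactions. The sketch below relies on the fact that every non-root node corresponds to one distinct non-empty prefix of some ordered transaction. The function name `count_tree_nodes` is an assumption for illustration, and counting nodes ignores the extra memory each node needs for its counter and links.

```python
def count_tree_nodes(ordered_transactions):
    """Count non-root FP-tree nodes as distinct non-empty transaction prefixes."""
    prefixes = set()
    for t in ordered_transactions:
        for i in range(1, len(t) + 1):
            prefixes.add(tuple(t[:i]))
    return len(prefixes)


walkthrough = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"]]
print(sum(len(t) for t in walkthrough), count_tree_nodes(walkthrough))  # 9 8

# Best case: identical transactions collapse onto a single path.
identical = [["a", "b"]] * 3
print(sum(len(t) for t in identical), count_tree_nodes(identical))      # 6 2
```

For the three walkthrough transactions the saving is small (8 nodes for 9 items) because only the prefix a is shared. The gain grows as more transactions share longer prefixes.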

