What are the steps involved in the Association Rule Clustering System?


The steps involved in the Association Rule Clustering System (ARCS) are as follows −

Binning − Quantitative attributes can have a very wide range of values defining their domain. Consider how big a 2-D grid would be if age and income were plotted as axes, where every possible value of age was assigned a unique position on one axis and, similarly, every possible value of income was assigned a unique position on the other axis.

To keep grids down to a manageable size, the ranges of quantitative attributes are instead partitioned into intervals. These intervals are dynamic in that they can later be combined during the mining process. The partitioning process is referred to as binning, that is, the intervals are considered “bins.”
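To make the size argument concrete, the following Python sketch compares the number of cells in a raw age/income grid with the number left after equal-width binning. The attribute ranges (ages 18 to 90, whole-dollar incomes 0 to 199,999), the bin sizes (10 years and 20,000 dollars), and the function names are hypothetical values chosen only for illustration.

```python
# Hypothetical illustration: how binning shrinks the 2-D age/income grid.
# The attribute ranges and bin sizes below are made up.

def distinct_integer_values(low: int, high: int) -> int:
    """Positions on an axis if every integer value in [low, high] gets its own slot."""
    return high - low + 1


def equal_width_bin_count(low: float, high: float, bin_size: float) -> int:
    """Number of equal-width bins needed so every value in [low, high] falls
    into some bin, using bin index floor((v - low) / bin_size)."""
    return int((high - low) // bin_size) + 1


age_low, age_high = 18, 90
income_low, income_high = 0, 199_999

raw_cells = (distinct_integer_values(age_low, age_high)
             * distinct_integer_values(income_low, income_high))
binned_cells = (equal_width_bin_count(age_low, age_high, 10)
                * equal_width_bin_count(income_low, income_high, 20_000))

print(raw_cells)     # 14600000
print(binned_cells)  # 80
```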

Three common binning strategies are as follows −

Equal-width binning − The interval size of each bin is the same.

Equal-frequency binning − Each bin has approximately the same number of tuples assigned to it.

Clustering-based binning − Clustering is performed on the quantitative attribute to group neighboring points (judged according to some distance measure) into the same bin.
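The three strategies can be sketched in Python as follows. This is a minimal illustration rather than part of ARCS itself: the function names and sample ages are hypothetical, and the clustering-based variant uses one simple choice of clustering (single-linkage grouping of sorted values with a user-chosen maximum gap), where other distance measures or clustering algorithms could be substituted.

```python
from typing import List


def equal_width_bins(values: List[float], num_bins: int) -> List[int]:
    """Equal-width binning: every bin spans the same interval size."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins
    if width == 0:  # all values identical: put everything in bin 0
        return [0] * len(values)
    # The maximum value would land at index num_bins, so clamp it into the last bin.
    return [min(int((v - low) / width), num_bins - 1) for v in values]


def equal_frequency_bins(values: List[float], num_bins: int) -> List[int]:
    """Equal-frequency binning: each bin gets roughly len(values) / num_bins tuples.
    Tied values may be split across neighboring bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * num_bins // len(values)
    return bins


def clustering_based_bins(values: List[float], max_gap: float) -> List[int]:
    """Clustering-based binning via 1-D single-linkage grouping: sort the values
    and start a new bin whenever the distance to the previous value exceeds max_gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    current = 0
    for k, i in enumerate(order):
        if k > 0 and values[i] - values[order[k - 1]] > max_gap:
            current += 1
        bins[i] = current
    return bins


ages = [21, 23, 25, 38, 40, 41, 60, 62]
print(equal_width_bins(ages, 3))       # [0, 0, 0, 1, 1, 1, 2, 2]
print(equal_frequency_bins(ages, 4))   # [0, 0, 1, 1, 2, 2, 3, 3]
print(clustering_based_bins(ages, 5))  # [0, 0, 0, 1, 1, 1, 2, 2]
```

Because ties can be split across neighboring bins, equal-frequency binning only guarantees approximately equal counts per bin.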

ARCS uses equal-width binning, where the bin size for each quantitative attribute is input by the user. A 2-D array for each possible bin combination involving both quantitative attributes is created.

Each array cell holds the corresponding count distribution for each possible class of the categorical attribute on the rule right-hand side. By creating this data structure, the task-relevant data need only be scanned once. The same 2-D array can be used to generate rules for any value of the categorical attribute, based on the same two quantitative attributes.
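A minimal sketch of this single-scan data structure, assuming tuples of the form (age, income in thousands, buys) with made-up values and user-supplied bin sizes. For brevity the 2-D array is stored sparsely as a dictionary keyed by (age bin, income bin), with a Counter per cell holding the class distribution; a dense nested list would work equally well. The names `bin_index` and `build_count_array` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def bin_index(value: float, low: float, bin_size: float) -> int:
    """Equal-width bin index, given the attribute's lower bound and user-chosen bin size."""
    return int((value - low) // bin_size)


def build_count_array(
    rows: Iterable[Tuple[float, float, str]],
    low_a: float, size_a: float,
    low_b: float, size_b: float,
) -> Dict[Cell, Counter]:
    """Single pass over (a, b, category) tuples. Each (bin_a, bin_b) cell keeps
    a count for every value of the categorical right-hand-side attribute."""
    grid: Dict[Cell, Counter] = {}
    for a, b, category in rows:
        cell = (bin_index(a, low_a, size_a), bin_index(b, low_b, size_b))
        grid.setdefault(cell, Counter())[category] += 1
    return grid


# Hypothetical tuples: (age, income in thousands, buys)
rows = [
    (34, 42, "yes"),
    (36, 48, "yes"),
    (35, 45, "no"),
    (52, 71, "no"),
    (55, 78, "no"),
]
grid = build_count_array(rows, low_a=20, size_a=10, low_b=0, size_b=10)
print(grid[(1, 4)])  # Counter({'yes': 2, 'no': 1})
print(grid[(3, 7)])  # Counter({'no': 2})
```

Because each cell keeps the full class distribution, the same grid can answer rules for buys = yes and buys = no without rescanning the tuples.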

Finding frequent predicate sets − Once the 2-D array containing the count distribution for each category is set up, it can be scanned to find the frequent predicate sets (those satisfying minimum support) that also satisfy minimum confidence.
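Scanning the array can be sketched as below, assuming the grid is stored as a dictionary mapping each (bin, bin) cell to a Counter of class counts, and using the usual definitions: support is the fraction of all tuples that fall in the cell and have the target class, and confidence is the fraction of the cell's tuples that have the target class. The function name, grid contents, and thresholds are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def strong_cells(
    grid: Dict[Cell, Counter],
    target: str,
    total_tuples: int,
    min_support: float,
    min_confidence: float,
) -> List[Cell]:
    """Cells whose rule (bin_a, bin_b) => category == target meets both thresholds.

    support    = count of target in cell / total_tuples
    confidence = count of target in cell / count of all tuples in cell
    """
    result = []
    for cell, counts in grid.items():
        cell_total = sum(counts.values())
        if cell_total == 0:
            continue
        hits = counts[target]  # a Counter returns 0 for a missing class
        if hits / total_tuples >= min_support and hits / cell_total >= min_confidence:
            result.append(cell)
    return sorted(result)


# Hypothetical grid built from 12 tuples in total.
grid = {
    (1, 4): Counter({"yes": 2, "no": 1}),
    (1, 5): Counter({"yes": 3}),
    (2, 5): Counter({"yes": 1, "no": 3}),
    (3, 7): Counter({"no": 2}),
}
print(strong_cells(grid, "yes", total_tuples=12, min_support=0.1, min_confidence=0.6))
# [(1, 4), (1, 5)]
```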

Clustering the association rules − The strong rules found in the previous step are mapped onto a 2-D grid, and the algorithm scans the grid, searching for rectangular clusters of rules. In this way, bins of the quantitative attributes occurring within a rule cluster can be combined, and hence dynamic discretization of the quantitative attributes occurs.
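The following sketch shows one simple way to group strong cells into rectangles and turn each rectangle back into merged attribute ranges. It is a greedy heuristic chosen for illustration and is not necessarily the clustering algorithm ARCS uses; the cell coordinates, bin sizes (age bins of 10 starting at 20, income bins of 10 thousand starting at 0), and function names are hypothetical.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (first_i, last_i, first_j, last_j), inclusive bin indices


def rectangular_clusters(strong: Set[Cell]) -> List[Rect]:
    """Greedily cover the strong cells with axis-aligned rectangles."""
    used: Set[Cell] = set()
    rects: List[Rect] = []
    for i, j in sorted(strong):
        if (i, j) in used:
            continue
        # Grow along the second axis while cells are strong and not yet covered.
        last_j = j
        while (i, last_j + 1) in strong and (i, last_j + 1) not in used:
            last_j += 1
        # Grow along the first axis while the whole span is strong and uncovered.
        last_i = i
        while all((last_i + 1, c) in strong and (last_i + 1, c) not in used
                  for c in range(j, last_j + 1)):
            last_i += 1
        used.update((r, c) for r in range(i, last_i + 1) for c in range(j, last_j + 1))
        rects.append((i, last_i, j, last_j))
    return rects


def to_ranges(rect: Rect, low_a: float, size_a: float, low_b: float, size_b: float):
    """Translate a rectangle of bins into merged half-open value ranges."""
    i0, i1, j0, j1 = rect
    return ((low_a + i0 * size_a, low_a + (i1 + 1) * size_a),
            (low_b + j0 * size_b, low_b + (j1 + 1) * size_b))


strong = {(1, 4), (1, 5), (2, 4), (2, 5), (5, 1)}
clusters = rectangular_clusters(strong)
print(clusters)                               # [(1, 2, 4, 5), (5, 5, 1, 1)]
print(to_ranges(clusters[0], 20, 10, 0, 10))  # ((30, 50), (40, 60))
```

Here the rectangle covering age bins 1 to 2 and income bins 4 to 5 becomes a single rule over age in [30, 50) and income in [40K, 60K), which is the kind of bin combination described above.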

The grid-based technique described here assumes that the initial association rules can be clustered into rectangular regions. Before performing the clustering, smoothing techniques can be used to help remove noise and outliers from the data. Rectangular clusters may, however, oversimplify the data.
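As one hypothetical smoothing step, isolated strong cells can be dropped before clustering, on the assumption that a cell with no strong neighbors is more likely noise than a real region. The text does not prescribe a particular smoothing method, so the neighbor threshold and function name below are assumptions.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]


def drop_isolated_cells(strong: Set[Cell], min_neighbors: int = 1) -> Set[Cell]:
    """Keep a strong cell only if at least `min_neighbors` of its eight
    surrounding cells are also strong; the rest are treated as noise."""
    kept: Set[Cell] = set()
    for i, j in strong:
        neighbors = sum(
            (i + di, j + dj) in strong
            for di in (-1, 0, 1)
            for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)
        )
        if neighbors >= min_neighbors:
            kept.add((i, j))
    return kept


strong = {(1, 4), (1, 5), (2, 4), (2, 5), (5, 1)}
print(sorted(drop_isolated_cells(strong)))  # [(1, 4), (1, 5), (2, 4), (2, 5)]
```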

A non-grid-based technique has been proposed to find more general quantitative association rules, where any number of quantitative and categorical attributes can appear on either side of the rules.

In this technique, quantitative attributes are dynamically partitioned using equal-frequency binning, and the partitions are combined based on a measure of partial completeness, which quantifies the information lost due to partitioning.
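The sketch below illustrates equal-frequency partitioning followed by combining adjacent partitions. It does not compute partial completeness: as a stated simplification, it enumerates every run of consecutive base intervals and keeps those whose combined support does not exceed a user-chosen maximum support. The sample ages, interval count, threshold, and function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[Tuple[float, float], int]  # ((smallest value, largest value), tuple count)


def equal_frequency_intervals(values: List[float], num_intervals: int) -> List[Interval]:
    """Split the sorted values into roughly equal-count base intervals."""
    ordered = sorted(values)
    n = len(ordered)
    result: List[Interval] = []
    for k in range(num_intervals):
        chunk = ordered[k * n // num_intervals:(k + 1) * n // num_intervals]
        if chunk:
            result.append(((chunk[0], chunk[-1]), len(chunk)))
    return result


def candidate_ranges(intervals: List[Interval], total: int, max_support: float) -> List[Interval]:
    """Every run of consecutive base intervals whose combined support
    (tuple count / total) does not exceed max_support."""
    ranges: List[Interval] = []
    for start in range(len(intervals)):
        count = 0
        for end in range(start, len(intervals)):
            count += intervals[end][1]
            if count / total > max_support:
                break  # longer runs from this start only grow the support
            ranges.append(((intervals[start][0][0], intervals[end][0][1]), count))
    return ranges


ages = [22, 25, 27, 31, 33, 38, 41, 45, 52, 58, 61, 67]
base = equal_frequency_intervals(ages, 4)
print(base)  # [((22, 27), 3), ((31, 38), 3), ((41, 52), 3), ((58, 67), 3)]
print(len(candidate_ranges(base, total=len(ages), max_support=0.5)))  # 7
```

With a 50% cap, two-interval runs such as 22 to 38 (6 of 12 tuples) survive, while three-interval runs exceed the cap and are discarded.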

Updated on 16-Feb-2022 11:34:44