How are measures computed in data mining?

Measures can be organized into three categories: distributive, algebraic, and holistic. The category depends on the type of aggregate function used to compute the measure.

Distributive − An aggregate function is distributive if it can be computed in a distributed manner, as follows. Suppose the data are partitioned into n sets. The function is applied to each partition, resulting in n aggregate values.

If the result obtained by applying the function to the n aggregate values is the same as that obtained by applying the function to the whole data set (without partitioning), the function can be computed in a distributed manner.

For instance, count() can be computed for a data cube by first partitioning the cube into a set of subcubes, computing count() for each subcube, and then summing the counts obtained for the subcubes. Therefore, count() is a distributive aggregate function.

A measure is distributive if it is obtained by applying a distributive aggregate function. Distributive measures can be computed efficiently because the computation can be partitioned and the partial results combined.
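The partition-then-combine idea can be sketched in a few lines of Python. This is only an illustration: the helper names (`partition`, `distributed_count`, `distributed_max`) and the sample data are assumptions made for this example, and max() is included as a second function that is well known to be distributive.

```python
def partition(data, n):
    """Split data into n non-overlapping partitions (hypothetical helper)."""
    return [data[i::n] for i in range(n)]

def distributed_count(partitions):
    # Step 1: apply count() to each partition independently.
    partial_counts = [len(p) for p in partitions]
    # Step 2: combine the n partial results; for count() this is a sum.
    return sum(partial_counts)

def distributed_max(partitions):
    # The maximum of the partial maxima equals the global maximum.
    return max(max(p) for p in partitions if p)

data = [7, 3, 9, 1, 4, 8, 2, 6, 5]   # made-up sample values
parts = partition(data, 3)

print(distributed_count(parts) == len(data))  # same answer as the unpartitioned count
print(distributed_max(parts) == max(data))    # same answer as the unpartitioned max
```

Each partition only has to hand back a single number, which is why such measures scale well across subcubes.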

Algebraic − An aggregate function is algebraic if it can be computed by an algebraic function with M arguments (where M is a bounded positive integer), each of which is obtained by applying a distributive aggregate function.

For instance, avg() (average) can be computed as sum()/count(), where both sum() and count() are distributive aggregate functions. Similarly, it can be shown that min_N() and max_N() (which find the N minimum and N maximum values, respectively, in a given set) and standard_deviation() are algebraic aggregate functions. A measure is algebraic if it is obtained by applying an algebraic aggregate function.
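As a rough sketch of this, the Python below computes avg() and a population standard deviation from a fixed number of distributive sub-aggregates per partition (count, sum, and sum of squares, so M = 3). The function names and the sample partitions are assumptions made for illustration.

```python
import math

def partial_stats(partition):
    # Three distributive sub-aggregates: count, sum, and sum of squares.
    return (len(partition), sum(partition), sum(x * x for x in partition))

def combine(stats):
    # Each sub-aggregate is combined across partitions by summation.
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    return n, total, total_sq

def avg(n, total):
    # avg() = sum() / count()
    return total / n

def population_std(n, total, total_sq):
    mean = total / n
    # Variance = E[x^2] - (E[x])^2; clamp at 0 to absorb rounding error.
    return math.sqrt(max(total_sq / n - mean * mean, 0.0))

partitions = [[2, 4, 4], [4, 5], [5, 7, 9]]   # made-up sample values
n, total, total_sq = combine([partial_stats(p) for p in partitions])

print(avg(n, total))                       # 5.0
print(population_std(n, total, total_sq))  # 2.0
```

No matter how many values a partition holds, it contributes only three numbers, which is the bounded storage that makes the function algebraic.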

Holistic − An aggregate function is holistic if there is no constant bound on the storage size needed to describe a subaggregate. That is, there does not exist an algebraic function with M arguments (where M is a constant) that characterizes the computation.

Examples of holistic functions include median(), mode(), and rank(). A measure is holistic if it is obtained by applying a holistic aggregate function.
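A small Python check makes the difficulty concrete: combining the per-partition medians does not, in general, give the median of the whole data set. The partitions below are made-up values chosen to show the mismatch.

```python
from statistics import median

partitions = [[1, 2, 3], [4, 5, 100], [6, 7, 8, 9, 10]]   # made-up sample values
all_values = [x for p in partitions for x in p]

# Exact median needs access to every value.
exact = median(all_values)

# Naive attempt: treat median() as if it were distributive.
median_of_medians = median([median(p) for p in partitions])

print(exact)              # 6
print(median_of_medians)  # 5
```

Because a single number per partition is not enough, an exact median generally has to see the full set of values, or a summary whose size grows with the data.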

Most large data cube applications require efficient computation of distributive and algebraic measures, and many efficient techniques for this exist. In contrast, it is difficult to compute holistic measures efficiently. Efficient techniques to approximate some holistic measures, however, do exist.

For instance, instead of computing the exact median(), approximation techniques can be used to estimate the median value for a huge data set. In many cases, such techniques are sufficient to overcome the difficulty of computing holistic measures efficiently.
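The text does not name a specific approximation technique. One common option, shown here only as an assumed example, is to keep fixed-width bucket counts (which are distributive) and interpolate inside the bucket that contains the middle value. The function names, bucket layout, and data are all hypothetical.

```python
def bucket_counts(values, lo, width, num_buckets):
    # Assumes every value lies in [lo, lo + width * num_buckets).
    counts = [0] * num_buckets
    for v in values:
        idx = min(int((v - lo) // width), num_buckets - 1)
        counts[idx] += 1
    return counts

def approximate_median(counts, lo, width):
    n = sum(counts)
    half = n / 2
    cumulative = 0
    for i, c in enumerate(counts):
        if c > 0 and cumulative + c >= half:
            # Linear interpolation inside the bucket holding the median.
            return lo + i * width + (half - cumulative) / c * width
        cumulative += c
    raise ValueError("cannot estimate the median of an empty data set")

# Two partitions of made-up data; each is summarized by 10 counters.
part_a = list(range(0, 60))
part_b = list(range(60, 100))
counts_a = bucket_counts(part_a, lo=0, width=10, num_buckets=10)
counts_b = bucket_counts(part_b, lo=0, width=10, num_buckets=10)

# Bucket counts merge by element-wise addition, like count().
merged = [a + b for a, b in zip(counts_a, counts_b)]
estimate = approximate_median(merged, lo=0, width=10)
```

The estimate is only as precise as the bucket width, so the number of buckets trades storage against accuracy.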

Updated on: 16-Feb-2022
