What are the theoretical foundations of Data Mining?

Several theories have been proposed as the basis of data mining, including the following −

Data reduction − In this theory, the basis of data mining is to reduce the data representation. Data reduction trades accuracy for speed, in response to the need to obtain fast approximate answers to queries on very large databases.

Data reduction methods include singular value decomposition (the driving component behind principal components analysis), wavelets, regression, log-linear models, histograms, clustering, sampling, and the construction of index trees.
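The accuracy-for-speed trade can be illustrated with sampling, one of the methods listed above. The following is a minimal sketch, not a production technique: the data is synthetic, and the function name approximate_mean and the sample size are assumptions chosen for illustration.

```python
import random

def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample."""
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    return sum(sample) / len(sample)

# Hypothetical large column: synthetic order amounts between 0 and 100.
data_rng = random.Random(42)
amounts = [data_rng.uniform(0, 100) for _ in range(200_000)]

exact = sum(amounts) / len(amounts)                      # scans every row
estimate = approximate_mean(amounts, sample_size=10_000)  # reads 5% of rows

print(f"exact={exact:.2f} estimate={estimate:.2f}")
```

The estimate reads only a fraction of the rows, so it is cheaper to compute, but it is only approximately equal to the exact answer.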

Data compression − According to this theory, the basis of data mining is to compress the given data by encoding it in terms of bits, association rules, decision trees, clusters, and so on.

Pattern discovery − In this theory, the basis of data mining is to discover patterns occurring in the database, such as associations, classification models, and sequential patterns. Several areas contribute to this theory, including machine learning, neural networks, association mining, sequential pattern mining, and clustering.
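As a small illustration of pattern discovery, the sketch below counts how often pairs of items occur together (support) and how often one item implies another (confidence). The transactions are invented for the example, and the helper names pair_support and confidence are assumptions, not part of any library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def pair_support(transactions):
    """Return the fraction of transactions containing each item pair."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)

support = pair_support(transactions)
print(support[("bread", "milk")])                # 3 of 5 baskets
print(confidence(transactions, "bread", "milk"))  # 3 of the 4 bread baskets
```

Real association miners such as Apriori prune the search space instead of enumerating every pair, but the quantities they report are the same ones computed here.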

Probability theory − This view is based on statistical theory. Here, the basis of data mining is to discover joint probability distributions of random variables, for instance through Bayesian belief networks or hierarchical Bayesian models.
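To make the idea of a joint probability distribution concrete, the sketch below estimates one from observed records by relative frequency and then derives a conditional probability from it. The two variables (weather, play) and the records are invented for illustration; Bayesian networks factor such a distribution more compactly, which is not shown here.

```python
from collections import Counter

# Hypothetical observations of two random variables: (weather, play).
records = [
    ("sunny", "yes"), ("sunny", "no"), ("rainy", "no"), ("sunny", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("sunny", "yes"), ("rainy", "no"),
]

def joint_distribution(records):
    """Estimate P(weather, play) by relative frequency."""
    counts = Counter(records)
    total = len(records)
    return {outcome: count / total for outcome, count in counts.items()}

def conditional_play(joint, weather, play):
    """Compute P(play | weather) from the joint distribution."""
    marginal = sum(p for (w, _), p in joint.items() if w == weather)
    if marginal == 0:
        return 0.0
    return joint.get((weather, play), 0.0) / marginal

joint = joint_distribution(records)
print(joint[("sunny", "yes")])                  # 3 of 8 records
print(conditional_play(joint, "sunny", "yes"))  # 3 of the 4 sunny records
```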

Microeconomic view − The microeconomic view considers data mining as the task of discovering patterns that are interesting only to the extent that they can be used in the decision-making process of some enterprise (e.g., regarding marketing strategies and production plans).

This view is one of utility, in which patterns are considered interesting if they can be acted on. Enterprises are regarded as facing optimization problems, where the objective is to maximize the utility or value of a decision. In this theory, data mining becomes a nonlinear optimization problem.
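The utility idea can be sketched with a toy mailing decision: a pattern (response rates that differ by customer segment) is worth exactly as much as it improves the enterprise's best decision. All figures, segment names, and function names below are hypothetical, and the example is deliberately a simple per-segment choice rather than a general nonlinear optimization.

```python
PROFIT_PER_RESPONSE = 50  # assumed profit from each responding customer
COST_PER_CONTACT = 1      # assumed cost of contacting one customer

# Hypothetical discovered pattern: responders per customer segment.
segments = {
    "A": {"size": 1000, "responders": 80},
    "B": {"size": 5000, "responders": 50},
}

def utility(size, responders):
    """Net value of contacting every customer in a group."""
    return responders * PROFIT_PER_RESPONSE - size * COST_PER_CONTACT

# Without the pattern the enterprise can only contact everyone or no one.
total_size = sum(s["size"] for s in segments.values())
total_responders = sum(s["responders"] for s in segments.values())
baseline = max(0, utility(total_size, total_responders))

# With the pattern it contacts only the segments with positive utility.
targeted = sum(max(0, utility(s["size"], s["responders"]))
               for s in segments.values())

pattern_value = targeted - baseline
print(baseline, targeted, pattern_value)
```

Under these assumed numbers the segment pattern is interesting because acting on it raises the value of the decision; a pattern that left the best decision unchanged would have a value of zero.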

Inductive databases − According to this theory, a database schema consists of data and patterns that are stored in the database. Data mining is the problem of performing induction on databases, where the task is to query the data and the theory (i.e., patterns) of the database. This view is popular among many researchers in database systems.
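A rough sketch of this idea using an in-memory SQLite database: patterns are derived from the data, stored next to it, and then both are queried with the same language. The table and column names are assumptions made for this example, and single-item support stands in for richer patterns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (tid INTEGER, item TEXT);  -- the data
    CREATE TABLE patterns  (item TEXT, support REAL); -- the theory
""")

# Hypothetical transactions: (transaction id, item).
rows = [(1, "bread"), (1, "milk"), (2, "bread"), (2, "beer"),
        (3, "milk"), (4, "bread"), (4, "milk")]
conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)

# Induction step: compute each item's support and store it as a pattern.
conn.execute("""
    INSERT INTO patterns
    SELECT item,
           COUNT(DISTINCT tid) * 1.0 / (SELECT COUNT(DISTINCT tid) FROM purchases)
    FROM purchases
    GROUP BY item
""")

# Query the theory: which items are frequent?
frequent = conn.execute(
    "SELECT item, support FROM patterns WHERE support >= 0.5 ORDER BY item"
).fetchall()

# Query data and theory together: transactions containing an infrequent item.
unusual = conn.execute("""
    SELECT DISTINCT p.tid
    FROM purchases AS p JOIN patterns AS q ON p.item = q.item
    WHERE q.support < 0.5
    ORDER BY p.tid
""").fetchall()

print(frequent, unusual)
```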

These theories are not mutually exclusive. For instance, pattern discovery can also be viewed as a form of data reduction or data compression. Ideally, a theoretical framework should be able to model typical data mining tasks (including association, classification, and clustering), have a probabilistic nature, be able to handle different forms of data, and account for the iterative and interactive essence of data mining. Further efforts are needed toward the establishment of a well-defined framework for data mining that satisfies these requirements.

Updated on: 17-Feb-2022
