What is KDD?

KDD stands for Knowledge Discovery in Databases. It refers to the overall process of discovering knowledge in data and emphasizes the high-level application of particular data mining techniques. It is an area of interest to researchers in several fields, such as artificial intelligence, machine learning, pattern recognition, databases, statistics, knowledge acquisition for expert systems, and data visualization.

The main objective of the KDD process is to extract knowledge from data in the context of large databases. It does this by using data mining algorithms to identify what is considered knowledge. Knowledge Discovery in Databases can be viewed as an automated, exploratory analysis and modeling of large data repositories. KDD is the organized process of identifying valid, useful, and understandable patterns in large and complex data sets.

KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. The term "process" implies that KDD involves many steps, including data preparation, search for patterns, knowledge evaluation, and refinement, all repeated over multiple iterations. "Non-trivial" means that some search or inference is involved; that is, it is not a straightforward computation of predefined quantities, such as calculating the average value of a set of numbers.
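To make the "non-trivial" distinction concrete, here is a minimal sketch that contrasts a predefined quantity (an average) with a small search over candidate rules. The toy records, the churn label, and both function names are hypothetical and chosen only for illustration; real pattern search works over far larger candidate spaces.

```python
from statistics import mean

# Hypothetical toy data: (purchase amount, customer churned?) pairs.
records = [
    (12.0, True), (15.5, True), (18.0, True),
    (22.0, False), (30.0, False), (41.0, False), (19.0, False),
]

def average_amount(rows):
    """A predefined quantity: a single pass over the data, no search."""
    return mean(amount for amount, _ in rows)

def best_threshold(rows):
    """Search candidate rules of the form 'amount < t implies churn'
    and return the threshold with the highest accuracy on the data."""
    best_t, best_acc = None, -1.0
    for t in sorted({amount for amount, _ in rows}):
        correct = sum((amount < t) == churned for amount, churned in rows)
        acc = correct / len(rows)
        if acc > best_acc:  # keep the first threshold reaching the best score
            best_t, best_acc = t, acc
    return best_t, best_acc

print(average_amount(records))
print(best_threshold(records))
```

The average is fixed by its definition, while the threshold rule has to be found by comparing alternatives, which is the kind of search or inference the definition refers to.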

Data mining is the core of the KDD process. It involves inferring algorithms that explore the data, develop a model, and discover previously unknown patterns. The model is then used to extract knowledge from the data, to analyze the data, and to make predictions.

Data mining is a step in the KDD process that consists of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns (or models) over the data.

The space of patterns is often infinite, and the enumeration of patterns involves some form of search in this space. Practical computational constraints place severe limits on the subspace that a data-mining algorithm can explore.
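As a rough illustration of searching a restricted part of the pattern space, the sketch below enumerates itemsets only up to a maximum size instead of all 2^n - 1 non-empty subsets. The pattern type (itemsets), the transactions, and the parameters max_size and min_support are assumptions made for this example, not something the text prescribes.

```python
from itertools import combinations

def enumerate_itemsets(items, max_size):
    """Yield every non-empty itemset with at most max_size items.
    Capping the size is one simple way to bound the search."""
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(items), size):
            yield frozenset(combo)

def frequent_itemsets(transactions, min_support, max_size):
    """Count how many transactions contain each candidate itemset and
    keep the candidates that appear at least min_support times."""
    items = set().union(*transactions)
    result = {}
    for candidate in enumerate_itemsets(items, max_size):
        support = sum(candidate <= t for t in transactions)
        if support >= min_support:
            result[candidate] = support
    return result

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

print(frequent_itemsets(transactions, min_support=2, max_size=2))
```

With four items there are 15 non-empty itemsets in total, but a size cap of 2 leaves only 10 candidates; the gap grows quickly as the number of items increases, which is why such limits matter in practice.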

The KDD process involves using the database along with any required selection, preprocessing, subsampling, and transformations of it; applying data-mining methods (algorithms) to enumerate patterns from it; and evaluating the products of data mining to identify the subset of the enumerated patterns deemed knowledge.
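The steps above can be sketched as a small pipeline. This is only an illustrative outline under stated assumptions: the record layout, the function names (select, preprocess, transform, mine, evaluate, kdd), the pair-counting miner, and the 0.6 support threshold are all hypothetical, and subsampling is omitted for brevity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw database rows.
raw_db = [
    {"id": 1, "region": "north", "basket": "Bread, Milk"},
    {"id": 2, "region": "north", "basket": "bread,butter , milk"},
    {"id": 3, "region": "south", "basket": "tea"},
    {"id": 4, "region": "north", "basket": ""},
    {"id": 5, "region": "north", "basket": "milk, bread"},
]

def select(db, region):
    """Selection: keep only the rows relevant to the analysis."""
    return [row for row in db if row["region"] == region]

def preprocess(rows):
    """Preprocessing: normalise case and whitespace, drop empty baskets."""
    cleaned = []
    for row in rows:
        items = [i.strip().lower() for i in row["basket"].split(",") if i.strip()]
        if items:
            cleaned.append(items)
    return cleaned

def transform(baskets):
    """Transformation: represent each basket as a set of items."""
    return [frozenset(b) for b in baskets]

def mine(transactions):
    """Data mining: enumerate item pairs and count their co-occurrences."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return counts

def evaluate(patterns, n, min_support):
    """Evaluation: keep only patterns whose relative support is high enough."""
    return {p: c / n for p, c in patterns.items() if c / n >= min_support}

def kdd(db, region="north", min_support=0.6):
    transactions = transform(preprocess(select(db, region)))
    return evaluate(mine(transactions), len(transactions), min_support)

print(kdd(raw_db))
```

Keeping each stage as a separate function mirrors the iterative nature of the process: a stage can be revised and rerun without touching the others.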

The data-mining component of the KDD process is concerned with the algorithmic means by which patterns are extracted and enumerated from data. The complete KDD process also includes the evaluation and possible interpretation of the mined patterns to decide which of them can be considered new knowledge.