KDD stands for Knowledge Discovery in Databases. It refers to the broad process of discovering knowledge in data and emphasizes the high-level application of particular data mining techniques. It is an area of interest to researchers in several fields, such as artificial intelligence, machine learning, pattern recognition, databases, statistics, knowledge acquisition for expert systems, and data visualization.
The knowledge discovery process is iterative and interactive, and comprises nine steps. The process is iterative at every stage, meaning that moving back to earlier steps may be required. The process also involves a degree of judgment, in the sense that one cannot present a single formula or a complete taxonomy of the correct decisions for each step and application type. It is therefore necessary to understand the process and the different requirements and possibilities at each stage.
Developing an understanding − This is the initial preparatory step. It sets the scene for deciding what should be done with the later choices, such as transformation, algorithms, and representation. The people in charge of a KDD project need to understand and define the goals of the end-user and the environment in which the knowledge discovery process will take place (including relevant prior knowledge).
Creating a target data set − This step involves choosing a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed. It is essential because data mining learns and discovers from the available data, which is the evidence base for building the models. If some important attributes are missing, the whole study can fail; in this respect, the more attributes that are considered, the better.
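As a rough illustration of this step, the sketch below keeps a chosen subset of attributes and draws a random sample of rows. The records, the field names, and the `select_target` helper are all hypothetical and exist only for this example; a real project would read from its own data source.

```python
import random

# Hypothetical raw records; the field names are illustrative assumptions.
records = [
    {"id": 1, "age": 34, "income": 52000, "city": "Lyon"},
    {"id": 2, "age": 45, "income": 61000, "city": "Turin"},
    {"id": 3, "age": 29, "income": 39000, "city": "Porto"},
    {"id": 4, "age": 52, "income": 75000, "city": "Ghent"},
    {"id": 5, "age": 38, "income": 58000, "city": "Graz"},
]


def select_target(rows, attributes, sample_size, seed=0):
    """Keep only the chosen attributes and draw a random sample of rows."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(rows, min(sample_size, len(rows)))
    return [{name: row[name] for name in attributes} for row in sample]


# Target data set: two attributes, three sampled records.
target = select_target(records, ["age", "income"], sample_size=3)
```

Fixing the seed is a design choice for reproducibility: because the process is iterative, returning to this step later should yield the same target set unless the selection is changed on purpose.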
Data cleaning and pre-processing − Data cleaning means preparing the data by filling in missing values, smoothing noisy data, identifying and removing outliers, and resolving inconsistencies in the data.
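Two of these operations can be sketched with the Python standard library: filling missing values and removing outliers. The example assumes missing values are encoded as `None`, fills them with the median, and drops points outside 1.5 times the interquartile range. Both choices are common conventions adopted here for illustration, not something the process prescribes, and the `clean` helper is hypothetical. Smoothing and inconsistency resolution are not covered.

```python
from statistics import median, quantiles


def clean(values, k=1.5):
    """Fill missing entries (None) with the median, then drop IQR outliers."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]

    # Quartiles of the filled data; points beyond k * IQR count as outliers.
    q1, _, q3 = quantiles(filled, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in filled if low <= v <= high]


# Hypothetical measurements with two gaps and one implausible reading.
raw = [12.0, 14.0, None, 13.0, 15.0, 400.0, 11.0, None, 14.0]
cleaned = clean(raw)
```

On this sample the reading of 400.0 is discarded and the two gaps are filled, leaving eight values.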
Exploratory analysis and model and hypothesis selection − This step involves selecting the data mining algorithm(s) and the method(s) to be used for searching for patterns in the data. It includes deciding which models and parameters are appropriate and matching a particular data mining method with the overall criteria of the KDD process.
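One simple way to choose between candidate models is to fit each on training data and compare errors on held-out data. The sketch below does this for two deliberately basic candidates, a constant predictor and a least-squares line. The data, the helper names, and the use of mean squared error as the criterion are assumptions made for this example; a real project would pick a criterion that reflects its own goals.

```python
from statistics import mean


def fit_constant(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares line through the training data."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


# Hypothetical training and validation data with a roughly linear trend.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
valid_x, valid_y = [7, 8], [14.1, 15.9]

candidates = {"constant": fit_constant, "line": fit_line}
scores = {
    name: mse(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}
best_name = min(scores, key=scores.get)  # candidate with the lowest error
```

Because the data follow a linear trend, the line has a much lower validation error than the constant predictor and is selected.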
Data mining − This step searches for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, and clustering. The user can significantly help the data mining method by carrying out the preceding steps correctly.
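Clustering, one of the representations mentioned above, can be illustrated with a minimal one-dimensional k-means sketch. The `kmeans_1d` helper, the sample points, and the choice to initialise the centres at the minimum and maximum values are assumptions made for this example. It is a teaching sketch, not a production implementation.

```python
from statistics import mean


def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest centre and
    moving each centre to the mean of its assigned points."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old centre if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical measurements forming two visibly separate groups.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centers, clusters = kmeans_1d(points, centers=[min(points), max(points)])
```

On this sample the points split into one group around 1 and another around 5. The result depends on clean, well-selected input, which is why the earlier steps matter so much.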
Acting on the discovered knowledge − This step involves using the knowledge directly, incorporating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. It also includes checking for and resolving potential conflicts with previously accepted (or extracted) knowledge.