What are the steps involved in data mining when viewed as a process of knowledge discovery?


KDD stands for Knowledge Discovery in Databases. It refers to the broad process of discovering knowledge in data and emphasizes the high-level application of particular data mining techniques. It is an area of interest to researchers in several fields, such as artificial intelligence, machine learning, pattern recognition, databases, statistics, knowledge acquisition for expert systems, and data visualization.

The main objective of the KDD process is to extract information from data in the context of large databases. It does this by using data mining algorithms to identify what is deemed knowledge.

Knowledge Discovery in Databases can be viewed as the automated, exploratory analysis and modeling of large data repositories. KDD is the organized process of identifying valid, useful, and understandable patterns in large and complex data sets.

Data mining is the core of the KDD process. It involves inferring the algorithms that explore the data, develop the model, and discover previously unknown patterns. The model is then used to extract knowledge from the data, analyze the data, and make predictions.

Data mining is a step in the KDD process that consists of applying data analysis and discovery algorithms that, within acceptable computational efficiency limits, produce a particular enumeration of patterns (or models) over the data.
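
To make "enumeration of patterns within computational efficiency limits" concrete, the following minimal sketch uses one possible pattern type, frequent itemsets, enumerated level by level in the style of the Apriori algorithm. The minimum-support threshold is what keeps the search tractable: a candidate that contains an infrequent subset is pruned without ever being counted. The function name `frequent_itemsets`, the `min_support` parameter, and the shopping-basket data are hypothetical illustrations, not part of the source.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) enumeration of every itemset whose support,
    the fraction of transactions containing it, is at least min_support."""
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / len(sets)

    # Level 1: single items that meet the threshold.
    items = sorted({item for t in sets for item in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    found = {s: support(s) for s in level}
    k = 2
    while level:
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates with an infrequent (k-1)-subset: they cannot be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in found for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_support]
        found.update((c, support(c)) for c in level)
        k += 1
    return found


# Hypothetical toy transactions, used only for illustration.
transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
for itemset, sup in sorted(frequent_itemsets(transactions, 0.5).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a threshold of 0.5, the single items and the pairs {bread, milk} and {bread, butter} are kept; {milk, butter} appears in only one transaction, so the only three-item candidate is pruned because it contains that pair.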

The KDD process involves using the database along with any required selection, preprocessing, subsampling, and transformation of it; applying data mining methods (algorithms) to enumerate patterns from it; and evaluating the products of data mining to identify the subset of the enumerated patterns deemed knowledge.
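
The last part of that description, keeping only the enumerated patterns deemed knowledge, can be sketched with association rules. Each enumerated itemset is split into candidate rules `lhs -> rhs`, and a rule is kept only if its confidence (the support of the whole itemset divided by the support of the left-hand side) reaches a threshold. Confidence is just one possible interestingness measure and is chosen here as an assumption; the function `rules_deemed_knowledge`, the `min_confidence` parameter, and the toy transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rules_deemed_knowledge(itemsets, transactions, min_confidence):
    """Split each enumerated itemset into lhs -> rhs rules and keep only
    the rules whose confidence is at least min_confidence."""
    kept = []
    for itemset in itemsets:
        itemset = frozenset(itemset)
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                lhs_support = support(lhs, transactions)
                if lhs_support == 0:
                    continue  # the rule never fires, so confidence is undefined
                confidence = support(itemset, transactions) / lhs_support
                if confidence >= min_confidence:
                    kept.append((set(lhs), set(itemset - lhs), confidence))
    return kept


# Hypothetical toy transactions and enumerated itemsets.
transactions = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"})]
enumerated = [{"bread", "milk"}, {"bread", "butter"}]
for lhs, rhs, conf in rules_deemed_knowledge(enumerated, transactions, 0.7):
    print(lhs, "->", rhs, round(conf, 2))
```

With a threshold of 0.7, only `{'butter'} -> {'bread'}` survives (confidence 1.0); the rules whose left-hand side is bread or milk reach only about 0.67 and are discarded.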

The steps involved in the knowledge discovery process are as follows −

  • Selection − The data needed for the data mining process may be collected from various sources, so the first step is to choose a dataset, or to focus on a subset of variables or data samples, on which discovery is to be performed.
  • Data cleaning and preprocessing − The data used by the process can contain missing or incorrect values. The second step of the KDD process therefore performs basic operations such as removing noise, collecting the information needed to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information.
  • Data transformation − This step involves finding useful features to represent the data, depending on the goal of the task. With dimensionality reduction or transformation methods, the effective number of variables under consideration can be reduced, or invariant representations of the data can be found.
  • Data mining − Depending on the data mining task being performed, this step applies an algorithm to the transformed data to search for patterns of interest in a particular representational form or a set of such representations, such as classification rules or trees, regression, and clustering.
  • Interpreting mined patterns − This step interprets the discovered patterns and can also involve visualizing the extracted patterns and models, or visualizing the data given the extracted models.
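
The five steps above can be walked through on a toy table as a minimal sketch in plain Python. Everything concrete here is an assumption for illustration: the hypothetical records, mean imputation as the cleaning strategy, min-max scaling as the transformation, k-means clustering with k = 2 as the mining algorithm, and per-cluster averages as the interpretation. The function names `select`, `clean`, `transform`, `mine`, and `interpret` are likewise hypothetical.

```python
# Hypothetical toy records, used only for illustration.
RECORDS = [
    {"id": "a", "region": "north", "age": 23, "spend": 120.0},
    {"id": "b", "region": "north", "age": None, "spend": 80.0},
    {"id": "c", "region": "south", "age": 45, "spend": 900.0},
    {"id": "d", "region": "north", "age": 51, "spend": 1100.0},
    {"id": "e", "region": "north", "age": 30, "spend": None},
    {"id": "f", "region": "north", "age": 48, "spend": 950.0},
]
FIELDS = ("age", "spend")


def select(records, region):
    """Selection: keep one region and only the variables of interest."""
    return [{f: r[f] for f in FIELDS} for r in records if r["region"] == region]


def clean(rows):
    """Cleaning: replace each missing value with the mean of its field."""
    means = {}
    for f in FIELDS:
        known = [r[f] for r in rows if r[f] is not None]
        means[f] = sum(known) / len(known)
    return [{f: (r[f] if r[f] is not None else means[f]) for f in FIELDS} for r in rows]


def transform(rows):
    """Transformation: min-max scale every field to [0, 1]."""
    lo = {f: min(r[f] for r in rows) for f in FIELDS}
    span = {f: (max(r[f] for r in rows) - lo[f]) or 1.0 for f in FIELDS}
    return [tuple((r[f] - lo[f]) / span[f] for f in FIELDS) for r in rows]


def mine(points, k=2, iterations=20):
    """Data mining: plain k-means seeded with evenly spaced input points."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centroids = [points[i * (len(points) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(pt, centroids[c])) for pt in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels


def interpret(rows, labels):
    """Interpretation: describe each cluster in the original (cleaned) units."""
    summary = {}
    for c in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        summary[c] = {f: sum(r[f] for r in members) / len(members) for f in FIELDS}
        summary[c]["size"] = len(members)
    return summary


selected = select(RECORDS, "north")
cleaned = clean(selected)
labels = mine(transform(cleaned), k=2)
for cluster, stats in interpret(cleaned, labels).items():
    print(cluster, stats)
```

On this data the north-region customers split into a lower-spending group (a, b, e) and a higher-spending group (d, f). The cluster summaries are computed from the cleaned but unscaled values, so they can be read directly in years and spend.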

Updated on: 15-Feb-2022
