Data mining is the process of discovering useful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies together with statistical and mathematical techniques. It is the analysis of observational datasets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.
A data mining task can be specified in the form of a data mining query, which is input to the data mining system. A data mining query is defined in terms of data mining task primitives. These primitives allow the user to interact with the data mining system during discovery, either to direct the mining process or to examine the findings from different angles or depths.
The data mining task primitives are as follows −
The set of task-relevant data to be mined − This specifies the portions of the database or the set of data in which the user is interested. It includes the database attributes or data warehouse dimensions of interest (referred to as the relevant attributes or dimensions).
The kind of knowledge to be mined − This specifies the data mining functions to be performed, such as characterization, discrimination, association or correlation analysis, classification, prediction, clustering, outlier analysis, or evolution analysis.
The background knowledge to be used in the discovery process − Knowledge about the domain to be mined is useful for guiding the knowledge discovery process and for evaluating the patterns found. Concept hierarchies are a popular form of background knowledge; they allow data to be mined at multiple levels of abstraction.
The interestingness measures and thresholds for pattern evaluation − These can be used to guide the mining process or, after discovery, to evaluate the discovered patterns. Different kinds of knowledge may have different interestingness measures. For example, association rules are commonly evaluated by their support and confidence.
The expected representation for visualizing the discovered patterns − This refers to the form in which discovered patterns are to be displayed, which can include rules, tables, charts, graphs, decision trees, and cubes.
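To make the five primitives concrete, here is a minimal Python sketch of how a query might bundle them and how a toy miner could consume them. Everything in it is an assumption made for illustration: the class MiningQuery, the functions mine_pair_rules and render, the field names, and the sample records are hypothetical and do not come from any real data mining system. The miner only looks for association rules between pairs of single items, using support and confidence as the interestingness measures.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class MiningQuery:
    """Hypothetical container holding the five task primitives."""
    relevant_attributes: list                 # task-relevant data
    knowledge_kind: str                       # kind of knowledge to be mined
    concept_hierarchy: dict = field(default_factory=dict)  # background knowledge
    min_support: float = 0.5                  # interestingness thresholds
    min_confidence: float = 0.8
    presentation: str = "rules"               # expected representation


def mine_pair_rules(records, query):
    """Return rules lhs -> rhs between single items that meet both thresholds."""
    if query.knowledge_kind != "association":
        raise ValueError("this sketch only handles association mining")

    # Keep only the relevant attributes and generalize each value
    # one level up the concept hierarchy when a mapping exists.
    transactions = []
    for record in records:
        items = set()
        for attr in query.relevant_attributes:
            value = record[attr]
            items.add((attr, query.concept_hierarchy.get(value, value)))
        transactions.append(items)

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(itemset <= t for t in transactions) / len(transactions)

    rules = []
    all_items = sorted(set().union(*transactions))
    for a, b in combinations(all_items, 2):
        pair_support = support({a, b})
        if pair_support < query.min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= query.min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


def render(rules, query):
    """Format rules according to the requested presentation (only 'rules' here)."""
    if query.presentation != "rules":
        raise ValueError("this sketch only renders textual rules")
    return [
        f"{lhs[0]}={lhs[1]} => {rhs[0]}={rhs[1]} "
        f"(support={s:.2f}, confidence={c:.2f})"
        for lhs, rhs, s, c in rules
    ]


# Made-up sample data; "age" is present but not task-relevant.
records = [
    {"city": "Vancouver", "product": "laptop", "age": 31},
    {"city": "Toronto", "product": "laptop", "age": 45},
    {"city": "Chicago", "product": "phone", "age": 27},
    {"city": "Toronto", "product": "phone", "age": 52},
]
query = MiningQuery(
    relevant_attributes=["city", "product"],
    knowledge_kind="association",
    concept_hierarchy={"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"},
)
output = render(mine_pair_rules(records, query), query)
print("\n".join(output))
# prints: product=laptop => city=Canada (support=0.50, confidence=1.00)
```

In this sketch the concept hierarchy lifts cities to countries before mining, so the rule is found at the country level even though no single city reaches the support threshold together with a product. Lowering min_confidence to 0.6 would also admit the reverse rule, which shows how the thresholds steer what is reported.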
A data mining query language can be designed to incorporate these primitives, allowing users to interact flexibly with data mining systems. A data mining query language also provides a foundation on which user-friendly graphical interfaces can be built. This facilitates a data mining system’s communication with other information systems and its integration with the overall information processing environment.
Designing a comprehensive data mining language is challenging because data mining covers a wide spectrum of tasks, from data characterization to evolution analysis. Each task has different requirements. Designing an effective data mining query language therefore requires a deep understanding of the power, limitations, and underlying mechanisms of the various kinds of data mining tasks.