Data mining faces several challenges, which are as follows −
Efficiency and scalability of data mining algorithms − To effectively extract information from a large amount of data in databases, knowledge discovery algorithms must be efficient and scalable to huge databases. Specifically, the running time of a data mining algorithm should be predictable and acceptable on huge databases. Algorithms with exponential or even medium-order polynomial complexity will not be of practical use.
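To make the complexity argument concrete, the following minimal sketch estimates running times for three growth rates on a database of one million records. The throughput of one billion basic operations per second is an assumption chosen only for illustration, and the names `OPS_PER_SECOND` and `estimated_seconds` are hypothetical.

```python
import math

# Assumed throughput, for illustration only: 1e9 basic operations per second.
OPS_PER_SECOND = 1e9


def estimated_seconds(operations: float) -> float:
    """Convert an operation count into seconds under the assumed throughput."""
    return operations / OPS_PER_SECOND


n = 1_000_000  # number of records in the (hypothetical) database

estimates = {
    "n": estimated_seconds(n),
    "n log n": estimated_seconds(n * math.log2(n)),
    "n^3": estimated_seconds(n ** 3),
}

for name, seconds in estimates.items():
    print(f"{name:>8}: {seconds:.3g} seconds")
```

Under these assumptions the linear and n log n algorithms finish in a fraction of a second, while the cubic one needs about 10^9 seconds, which is roughly three decades.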
Usefulness, certainty, and expressiveness of data mining results − The discovered knowledge should accurately portray the contents of the database and be useful for specific applications. Any imperfection should be expressed by measures of uncertainty, in the form of approximate rules or quantitative rules.
Noise and exceptional data must be handled gracefully in data mining systems. This also motivates a systematic study of measuring the quality of the discovered knowledge, including its interestingness and reliability, through the development of statistical, analytical, and simulative models and tools.
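One common way to attach a measure of uncertainty to an approximate rule is to report its support and confidence. The sketch below is only an illustration: the function name `rule_quality` and the sample transactions are hypothetical, and support and confidence are just two of many possible quality measures.

```python
from typing import FrozenSet, Iterable, List, Tuple


def rule_quality(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    left = frozenset(antecedent)
    both = left | frozenset(consequent)
    n_total = len(transactions)
    # Transactions that contain the antecedent, and those that contain both sides.
    n_left = sum(1 for t in transactions if left <= t)
    n_both = sum(1 for t in transactions if both <= t)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_left if n_left else 0.0
    return support, confidence


# Hypothetical market-basket data.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

support, confidence = rule_quality(transactions, {"bread"}, {"milk"})
print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}")
```

Here the rule "bread implies milk" holds in two of the four transactions (support 0.5) and in two of the three transactions that contain bread (confidence about 0.67), so it is an approximate rule rather than an exact one.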
Expression of various kinds of data mining results − Several kinds of knowledge can be discovered from a huge amount of data. Users may also want to examine the discovered knowledge from multiple views and display it in different forms.
This requires both the data mining requests and the discovered knowledge to be expressed in high-level languages or graphical user interfaces, so that non-experts can specify the data mining task and users can understand and directly use the discovered knowledge. It also requires the discovery system to adopt expressive knowledge representation techniques.
Interactive mining of knowledge at multiple abstraction levels − Because it is difficult to predict exactly what can be discovered from a database, a high-level data mining query should be treated as a probe that may disclose some interesting traces for further exploration.
Interactive discovery should be encouraged, since it enables a user to interactively refine a data mining request, dynamically change the data focus, progressively deepen a data mining process, and flexibly view the data and data mining results at several abstraction levels and from multiple angles.
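Viewing results at several abstraction levels can be illustrated by rolling the same records up a concept hierarchy. The sketch below assumes a hypothetical city, country, continent hierarchy and made-up sales figures; the names `generalize` and `sales_by_level` are illustrative, not part of any particular system.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical concept hierarchy: city -> country -> continent.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}
COUNTRY_TO_CONTINENT = {"Canada": "North America", "USA": "North America"}


def generalize(city: str, level: str) -> str:
    """Map a city to its value at the requested abstraction level."""
    if level == "city":
        return city
    country = CITY_TO_COUNTRY[city]
    if level == "country":
        return country
    if level == "continent":
        return COUNTRY_TO_CONTINENT[country]
    raise ValueError(f"unknown abstraction level: {level}")


def sales_by_level(records: List[Tuple[str, int]], level: str) -> Dict[str, int]:
    """Total the sales amounts after generalizing each city to the given level."""
    totals: Counter = Counter()
    for city, amount in records:
        totals[generalize(city, level)] += amount
    return dict(totals)


# Hypothetical (city, amount) sales records.
records = [("Vancouver", 120), ("Toronto", 80), ("Chicago", 200), ("Vancouver", 30)]

by_city = sales_by_level(records, "city")
by_country = sales_by_level(records, "country")
by_continent = sales_by_level(records, "continent")

for level, view in (("city", by_city), ("country", by_country), ("continent", by_continent)):
    print(level, view)
```

A user who starts at the continent level and sees one large total can drill down to the country or city level to find where that total comes from, which is the kind of progressive refinement described above.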
Mining information from different sources of data − Widely available local and wide-area computer networks, such as the Internet, connect various sources of data and form huge distributed, heterogeneous databases. Mining knowledge from multiple sources of formatted or unformatted data with diverse data semantics poses new requirements for data mining.
On the other hand, data mining can help disclose high-level data regularities in heterogeneous databases that can hardly be discovered by simple query systems. Furthermore, the huge size of the databases, the broad distribution of data, and the computational complexity of several data mining methods motivate the development of parallel and distributed data mining algorithms.
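The idea behind parallel and distributed mining can be sketched as counting patterns locally in each data partition and then merging the local counts. The example below is a simplified illustration, not a production algorithm: the partitions are small in-memory lists standing in for separate data sources, a thread pool stands in for separate machines, and the names `count_pairs` and `mine_frequent_pairs` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Dict, List, Tuple

Transaction = List[str]


def count_pairs(partition: List[Transaction]) -> Counter:
    """Count co-occurring item pairs within a single partition."""
    counts: Counter = Counter()
    for transaction in partition:
        # Sort the distinct items so each pair has one canonical form.
        for pair in combinations(sorted(set(transaction)), 2):
            counts[pair] += 1
    return counts


def mine_frequent_pairs(
    partitions: List[List[Transaction]], min_count: int
) -> Dict[Tuple[str, str], int]:
    """Count pairs per partition concurrently, merge, and keep the frequent ones."""
    with ThreadPoolExecutor() as pool:
        local_counts = list(pool.map(count_pairs, partitions))
    total: Counter = Counter()
    for counts in local_counts:
        total.update(counts)  # Counter.update adds counts together
    return {pair: n for pair, n in total.items() if n >= min_count}


# Two hypothetical partitions, as if stored at two different sites.
partitions = [
    [["bread", "milk"], ["bread", "butter", "milk"]],
    [["milk", "bread"], ["butter", "jam"]],
]

frequent = mine_frequent_pairs(partitions, min_count=3)
print(frequent)
```

Only the compact local counts need to travel between sites, not the raw transactions, which is one reason this count-and-merge pattern suits widely distributed data.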