Why do we need KDD?

The traditional method of turning data into knowledge relies on manual analysis and interpretation. For instance, in the healthcare industry, it is common for specialists to periodically analyze current trends and changes in healthcare data, for example every quarter.

The specialists then provide a report detailing the analysis to the sponsoring healthcare organization; this report becomes the basis for future decision making and planning for healthcare management. In a very different type of application, planetary geologists sift through remotely sensed images of planets and asteroids, carefully locating and cataloging geologic objects of interest such as impact craters.

This form of manual probing of a data set is slow, expensive, and highly subjective. As data volumes grow dramatically, this kind of manual data analysis is becoming completely impractical in many domains.

In business, the main KDD application areas include marketing, finance (especially investment), fraud detection, manufacturing, telecommunications, and web agents.

Marketing − In marketing, the primary application is database marketing systems, which analyze customer databases to identify different customer groups and forecast their behavior.

Investment − Many companies use data mining for investment, but most do not describe their systems. One exception is LBS Capital Management. Its system uses expert systems, neural nets, and genetic algorithms to manage portfolios totaling $600 million; since its start in 1993, the system has outperformed the broad stock market.

Fraud detection − The HNC Falcon and Nestor PRISM systems are used for monitoring credit card fraud, watching over millions of accounts. The FAIS system from the U.S. Treasury Financial Crimes Enforcement Network is used to identify financial transactions that might indicate money laundering activity.

Manufacturing − The CASSIOPEE troubleshooting system, developed as part of a joint venture between General Electric and SNECMA, was used by three major European airlines to diagnose and predict problems for the Boeing 737.

Telecommunications − The telecommunications alarm-sequence analyzer (TASA) was developed in cooperation with a manufacturer of telecommunications equipment and three telephone networks (Mannila, Toivonen, and Verkamo 1995). The system uses a novel framework for locating frequently occurring alarm episodes in the alarm stream and presenting them as rules.

Large sets of discovered rules can be explored with flexible data-retrieval tools that support interactivity and iteration. In this way, TASA provides pruning, grouping, and ordering tools to refine the results of a basic brute-force search for rules.

Data cleaning − The MERGE-PURGE system was applied to the identification of duplicate welfare claims (Hernandez and Stolfo 1995). It was used successfully on data from the Welfare Department of the State of Washington.
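
To make the duplicate-identification idea more concrete, here is a minimal Python sketch of one common approach, often called the sorted-neighborhood method: build a key for each record, sort the records by that key, and compare only records that fall within a small sliding window. This is an illustration under stated assumptions, not the actual MERGE-PURGE system. The record fields, the key in make_key, the similarity rule in is_match, and the threshold and window values are all hypothetical.

```python
from difflib import SequenceMatcher


def make_key(record):
    # Hypothetical sorting key: first three letters of the surname plus the birth year.
    return record["surname"][:3].upper() + record["birth_year"]


def is_match(a, b, threshold=0.85):
    # Hypothetical match rule: same birth year and sufficiently similar full names.
    if a["birth_year"] != b["birth_year"]:
        return False
    name_a = (a["given"] + " " + a["surname"]).lower()
    name_b = (b["given"] + " " + b["surname"]).lower()
    return SequenceMatcher(None, name_a, name_b).ratio() >= threshold


def find_duplicates(records, window=3):
    """Return the set of id pairs flagged as likely duplicates."""
    ordered = sorted(records, key=make_key)
    pairs = set()
    for i, current in enumerate(ordered):
        # Compare each record only with the next (window - 1) records in sorted order.
        for other in ordered[i + 1 : i + window]:
            if is_match(current, other):
                pairs.add(tuple(sorted((current["id"], other["id"]))))
    return pairs


# Made-up example records, for illustration only.
claims = [
    {"id": 1, "given": "Maria", "surname": "Gonzalez", "birth_year": "1961"},
    {"id": 2, "given": "John", "surname": "Smith", "birth_year": "1975"},
    {"id": 3, "given": "Maria", "surname": "Gonzales", "birth_year": "1961"},
    {"id": 4, "given": "Jon", "surname": "Smith", "birth_year": "1975"},
    {"id": 5, "given": "Alice", "surname": "Wong", "birth_year": "1980"},
]

duplicates = find_duplicates(claims)
print(sorted(duplicates))  # [(1, 3), (2, 4)]
```

Sorting first means each record is compared with only a few neighbors rather than with every other record, which is what makes this kind of search practical on large files. The trade-off is that duplicates whose keys sort far apart are missed, so the choice of key and window size matters.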