Data mining is widely used in diverse areas. There are a number of commercial data mining system available today and yet there are many challenges in this field. In this tutorial, we will discuss the applications and the trend of data mining.
Here is the list of areas where data mining is widely used −
The financial data in banking and financial industry is generally reliable and of high quality which facilitates systematic data analysis and data mining. Some of the typical cases are as follows −
Design and construction of data warehouses for multidimensional data analysis and data mining.
Loan payment prediction and customer credit policy analysis.
Classification and clustering of customers for targeted marketing.
Detection of money laundering and other financial crimes.
Data Mining has its great application in Retail Industry because it collects large amount of data from on sales, customer purchasing history, goods transportation, consumption and services. It is natural that the quantity of data collected will continue to expand rapidly because of the increasing ease, availability and popularity of the web.
Data mining in retail industry helps in identifying customer buying patterns and trends that lead to improved quality of customer service and good customer retention and satisfaction. Here is the list of examples of data mining in the retail industry −
Design and Construction of data warehouses based on the benefits of data mining.
Multidimensional analysis of sales, customers, products, time and region.
Analysis of effectiveness of sales campaigns.
Product recommendation and cross-referencing of items.
Today the telecommunication industry is one of the most emerging industries providing various services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data transmission, etc. Due to the development of new computer and communication technologies, the telecommunication industry is rapidly expanding. This is the reason why data mining is become very important to help and understand the business.
Data mining in telecommunication industry helps in identifying the telecommunication patterns, catch fraudulent activities, make better use of resource, and improve quality of service. Here is the list of examples for which data mining improves telecommunication services −
Multidimensional Analysis of Telecommunication data.
Fraudulent pattern analysis.
Identification of unusual patterns.
Multidimensional association and sequential patterns analysis.
Mobile Telecommunication services.
Use of visualization tools in telecommunication data analysis.
In recent times, we have seen a tremendous growth in the field of biology such as genomics, proteomics, functional Genomics and biomedical research. Biological data mining is a very important part of Bioinformatics. Following are the aspects in which data mining contributes for biological data analysis −
Semantic integration of heterogeneous, distributed genomic and proteomic databases.
Alignment, indexing, similarity search and comparative analysis multiple nucleotide sequences.
Discovery of structural patterns and analysis of genetic networks and protein pathways.
Association and path analysis.
Visualization tools in genetic data analysis.
The applications discussed above tend to handle relatively small and homogeneous data sets for which the statistical techniques are appropriate. Huge amount of data have been collected from scientific domains such as geosciences, astronomy, etc. A large amount of data sets is being generated because of the fast numerical simulations in various fields such as climate and ecosystem modeling, chemical engineering, fluid dynamics, etc. Following are the applications of data mining in the field of Scientific Applications −
Intrusion refers to any kind of action that threatens integrity, confidentiality, or the availability of network resources. In this world of connectivity, security has become the major issue. With increased usage of internet and availability of the tools and tricks for intruding and attacking network prompted intrusion detection to become a critical component of network administration. Here is the list of areas in which data mining technology may be applied for intrusion detection −
Development of data mining algorithm for intrusion detection.
Association and correlation analysis, aggregation to help select and build discriminating attributes.
Analysis of Stream data.
Distributed data mining.
Visualization and query tools.
There are many data mining system products and domain specific data mining applications. The new data mining systems and applications are being added to the previous systems. Also, efforts are being made to standardize data mining languages.
The selection of a data mining system depends on the following features −
Data Types − The data mining system may handle formatted text, record-based data, and relational data. The data could also be in ASCII text, relational database data or data warehouse data. Therefore, we should check what exact format the data mining system can handle.
System Issues − We must consider the compatibility of a data mining system with different operating systems. One data mining system may run on only one operating system or on several. There are also data mining systems that provide web-based user interfaces and allow XML data as input.
Data Sources − Data sources refer to the data formats in which data mining system will operate. Some data mining system may work only on ASCII text files while others on multiple relational sources. Data mining system should also support ODBC connections or OLE DB for ODBC connections.
Data Mining functions and methodologies − There are some data mining systems that provide only one data mining function such as classification while some provides multiple data mining functions such as concept description, discovery-driven OLAP analysis, association mining, linkage analysis, statistical analysis, classification, prediction, clustering, outlier analysis, similarity search, etc.
Coupling data mining with databases or data warehouse systems − Data mining systems need to be coupled with a database or a data warehouse system. The coupled components are integrated into a uniform information processing environment. Here are the types of coupling listed below −
Scalability − There are two scalability issues in data mining −
Row (Database size) Scalability − A data mining system is considered as row scalable when the number or rows are enlarged 10 times. It takes no more than 10 times to execute a query.
Column (Dimension) Salability − A data mining system is considered as column scalable if the mining query execution time increases linearly with the number of columns.
Visualization Tools − Visualization in data mining can be categorized as follows −
Data Mining query language and graphical user interface − An easy-to-use graphical user interface is important to promote user-guided, interactive data mining. Unlike relational database systems, data mining systems do not share underlying data mining query language.
Data mining concepts are still evolving and here are the latest trends that we get to see in this field −
Scalable and interactive data mining methods.
Integration of data mining with database systems, data warehouse systems and web database systems.
SStandardization of data mining query language.
Visual data mining.
New methods for mining complex types of data.
Biological data mining.
Data mining and software engineering.
Distributed data mining.
Real time data mining.
Multi database data mining.
Privacy protection and information security in data mining.
64 Lectures 10.5 hours