What are the basic concepts of data mining?

Data mining is the process of discovering useful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques. It is the analysis of observational datasets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.

The basic concepts of data mining are as follows −

Classification − Classification is the process of finding a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training records (i.e., data objects whose class label is known).
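As a rough illustration of the idea, the sketch below builds a very simple classifier from labeled training records and uses it to label a new object. It assumes a 1-nearest-neighbor rule, which is only one of many possible classification models; the feature values, the labels `low_risk` and `high_risk`, and the function name `classify` are hypothetical and chosen only for this example.

```python
import math

# Hypothetical training records: (feature vector, known class label).
training = [
    ((1.0, 1.2), "low_risk"),
    ((0.8, 1.0), "low_risk"),
    ((3.1, 3.0), "high_risk"),
    ((2.9, 3.4), "high_risk"),
]

def classify(point, records):
    """Return the label of the training record closest to `point` (1-NN)."""
    nearest = min(records, key=lambda rec: math.dist(point, rec[0]))
    return nearest[1]

# An object whose class label is unknown.
unknown = (3.0, 2.8)
predicted = classify(unknown, training)
print(predicted)
```

Here the "model" is simply the stored training set plus a distance rule. Other models, such as decision trees or neural networks, play the same role of mapping an unlabeled object to a class.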

Prediction − Prediction is similar to classification, except that for prediction, the results lie in the future.

Examples of prediction functions in business and research include −

  • Predicting the value of a stock three months into the future.

  • Predicting the percentage increase in traffic deaths next year if the speed limit is raised.

  • Predicting the winner of this fall’s baseball World Series, based on a comparison of team statistics.

  • Predicting whether a particular molecule in drug discovery will lead to a profitable new drug for a pharmaceutical company.
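A minimal sketch of a prediction task is shown below, assuming a straight-line trend fitted by ordinary least squares. This technique is an assumption made for illustration and is not named in the text above. The monthly values and the helper `fit_line` are hypothetical.

```python
# Hypothetical observations: a value recorded at the end of months 1..6.
months = [1, 2, 3, 4, 5, 6]
values = [10.1, 11.9, 14.2, 15.8, 18.1, 19.9]

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(months, values)

# Extrapolate three months past the last observation (month 9).
forecast = slope * 9 + intercept
print(round(forecast, 2))
```

Real forecasting problems, such as stock prices, rarely follow a straight line, so this only shows the shape of the task: fit a model to past data, then evaluate it at a future point.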

Association Rules and Recommendation Systems − Association rules, or affinity analysis, are designed to find general association patterns between items in large databases. The rules can be used in several ways. For example, grocery stores can use such information for product placement.

Stores can also use the rules for weekly promotional offers or for bundling products. Association rules derived from a hospital database on patients’ symptoms during consecutive hospitalizations can help find “which symptom is followed by what other symptom” and so help predict future symptoms for returning patients.
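The grocery-store case can be sketched as follows, assuming the common support and confidence measures for rules of the form "if A is bought, B is bought too". The baskets, the thresholds, and the function name `pair_rules` are hypothetical, and the sketch considers only pairs of items rather than larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return rules (lhs, rhs, support, confidence) over item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Fraction of baskets with lhs that also contain rhs.
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high support and high confidence, such as butter implying bread in this toy data, is the kind of pattern a store might use when deciding which products to shelve together or bundle.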

Data Reduction − Data mining is often applied to selected data from a very large database. When data analysis and mining are performed on a huge number of records, processing takes a very long time, which can make the analysis impractical or infeasible.

To reduce the processing time for data analysis, data reduction techniques are used to obtain a reduced representation of the dataset that is much smaller in volume while maintaining the integrity of the original data. Reducing the data improves the efficiency of the data mining process while producing the same, or almost the same, analytical results.

Data reduction aims to represent the data more compactly. When the data size is smaller, it is easier to apply sophisticated and computationally expensive algorithms. The data may be reduced in terms of the number of rows (records) or in terms of the number of columns (dimensions).
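Both kinds of reduction can be sketched in a few lines. The example below assumes random sampling to reduce rows and removal of zero-variance columns to reduce dimensions; these are only two of many possible techniques. The dataset and the helper names `sample_rows` and `drop_low_variance_columns` are hypothetical.

```python
import random
import statistics

# Hypothetical dataset: 1000 rows of (age, income, constant_flag).
data_rng = random.Random(0)
rows = [
    (data_rng.randint(18, 80), data_rng.uniform(20_000, 90_000), 1)
    for _ in range(1000)
]

def sample_rows(data, fraction, seed=42):
    """Row reduction: keep a random sample of the records."""
    rng = random.Random(seed)
    k = max(1, int(len(data) * fraction))
    return rng.sample(data, k)

def drop_low_variance_columns(data, threshold=0.0):
    """Column reduction: drop columns whose variance is <= threshold."""
    columns = list(zip(*data))
    keep = [
        i for i, col in enumerate(columns)
        if statistics.pvariance(col) > threshold
    ]
    return [tuple(row[i] for i in keep) for row in data], keep

fewer_rows = sample_rows(rows, 0.1)
reduced, kept_columns = drop_low_variance_columns(fewer_rows)
print(len(rows), "->", len(reduced), "rows; kept columns:", kept_columns)
```

The constant third column carries no information, so dropping it loses nothing, while the sampled rows only approximate the full dataset. Whether that approximation is acceptable depends on the analysis being run.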