How are metarules useful in data mining?

Data mining is the process of finding useful new correlations, patterns, and trends by transferring through a high amount of data saved in repositories, using pattern recognition technologies including statistical and mathematical techniques. It is the analysis of factual datasets to discover unsuspected relationships and to summarize the records in novel methods that are both logical and helpful to the data owner.

It is the procedure of selection, exploration, and modeling of high quantities of information to find regularities or relations that are at first unknown to obtain clear and beneficial results for the owner of the database.

Data Mining is similar to Data Science. It is carried out by a person, in a particular situation, on a specific data set, with an objective. This phase contains several types of services including text mining, web mining, audio and video mining, pictorial data mining, and social media mining. It is completed through software that is simple or greatly specific.

Metarules enables users to define the syntactic form of rules that they are involved in mining. The rule forms can be used as constraints to provide improve the effectiveness of the mining phase. Metarules can be based on the analyst’s experience, expectations, or intuition concerning the data or can be automatically generated depends on the database schema.

Metarule-guided mining − Consider that as a market analyst for AllElectronics, it can have access to the data defining customers (including customer age, address, and credit rating) and the list of customer transactions.

It can be finding associations among customer traits and the items that customers purchase. However, instead of finding some association rules reflecting these relationships, it is interested only in deciding which pairs of customer traits enhance the sale of office software.

An example of such a metarule is

P1(X, Y)∧ P2(X, W) ⇒ buys(X, “office software”)

where P1 and P2 are predicate variables that are instantiated to attributes from the given database during the mining phase, X is a variable defining a customer, and Y and W take on values of the attributes assigned to P1 and P2, accordingly.

Generally, a user can define a list of attributes to be treated for instantiation with P1and P2. Therefore, a default set can be used.

In general, a metarule forms a hypothesis concerning the relationships that the user is implicated in perceptive or confirming. The data mining system can search for rules that connect the given metarule. For example,

age(X, “30...39”)∧income(X, “41K...60K”) ⇒ buys(X, “office software”)