What are the basic methods of attribute subset selection?

Attribute subset selection reduces the data set size by removing irrelevant or redundant attributes (or dimensions). The goal of attribute subset selection is to find a minimum set of attributes such that the resulting probability distribution of the data classes is as close as possible to the original distribution obtained using all attributes. Mining on a reduced set of attributes has an additional benefit: it reduces the number of attributes appearing in the discovered patterns, which makes the patterns easier to understand.

For n attributes, there are 2^n possible subsets. An exhaustive search for the optimal subset of attributes can be prohibitively expensive, especially as n and the number of data classes increase. Therefore, heuristic methods that explore a reduced search space are commonly used for attribute subset selection.
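To make the cost concrete, here is a minimal Python sketch of exhaustive search. It assumes a hypothetical scoring function `score(subset)` that returns a higher value for a better attribute subset; the attribute names and weights in the usage example are made up purely for illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence, Tuple


def exhaustive_search(
    attributes: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float, int]:
    """Score every one of the 2**n subsets and keep the best one.

    Returns (best_subset, best_score, number_of_subsets_scored).
    """
    best_subset: FrozenSet[str] = frozenset()
    best_score = score(best_subset)
    scored = 1  # the empty subset has already been scored
    for k in range(1, len(attributes) + 1):
        for combo in combinations(attributes, k):
            subset = frozenset(combo)
            s = score(subset)
            scored += 1
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score, scored


# Hypothetical usefulness weights, minus a penalty for each attribute kept.
WEIGHTS = {"age": 0.4, "income": 0.5, "zip": 0.05, "id": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(WEIGHTS[a] for a in subset) - 0.1 * len(subset)


best, value, scored = exhaustive_search(list(WEIGHTS), toy_score)
print(sorted(best), round(value, 2), scored)  # ['age', 'income'] 0.7 16
```

With 4 attributes the search scores 16 subsets; with 8 it would score 256, and the count keeps doubling with every attribute added.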

These methods are typically greedy in that, while searching through attribute space, they always make what looks to be the best choice at that time. Their strategy is to make a locally optimal choice in the hope that this will lead to a globally optimal solution. Such greedy methods are effective in practice and may come close to estimating an optimal solution, although they are not guaranteed to find it.

The best (and worst) attributes are typically determined using tests of statistical significance, which assume that the attributes are independent of one another. Many other attribute evaluation measures can be used, such as the information gain measure used in building decision trees for classification.
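As a minimal sketch of one such measure, the Python code below computes the information gain of each attribute (the expected reduction in class entropy, in bits, from partitioning the data on that attribute) and ranks the attributes by it. The four-row weather data set is hypothetical, chosen so that `outlook` fully determines the class and `windy` carries no information about it.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, label):
    """Class entropy minus the weighted entropy after splitting on `attribute`."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[label])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy([row[label] for row in rows]) - remainder


# Hypothetical toy data: `outlook` decides `play`; `windy` is irrelevant.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
]
gains = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
ranking = sorted(gains, key=gains.get, reverse=True)
print(gains, ranking)  # {'outlook': 1.0, 'windy': 0.0} ['outlook', 'windy']
```

Each attribute is scored on its own here, so this measure by itself does not detect that two attributes are redundant with each other.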

Basic heuristic methods of attribute subset selection include the following −

  • Stepwise forward selection − The procedure starts with an empty set of attributes as the reduced set. The best of the original attributes is determined and added to the reduced set. At each subsequent iteration or step, the best of the remaining original attributes is added to the set.

  • Stepwise backward elimination − The procedure starts with the full set of attributes. At each step, it removes the worst attribute remaining in the set.

  • Combination of forward selection and backward elimination − The stepwise forward selection and backward elimination methods can be combined so that, at each step, the procedure selects the best attribute and removes the worst from among the remaining attributes.

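  • Decision tree induction − Decision tree algorithms such as ID3, C4.5, and CART were originally intended for classification. Decision tree induction constructs a flowchart-like structure where each internal (non-leaf) node denotes a test on an attribute, each branch corresponds to an outcome of the test, and each external (leaf) node denotes a class prediction. At each node, the algorithm chooses the “best” attribute to partition the data into individual classes. When decision tree induction is used for attribute subset selection, a tree is constructed from the given data; all attributes that do not appear in the tree are assumed to be irrelevant, and the set of attributes appearing in the tree forms the reduced subset of attributes.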
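The three stepwise methods in the list can be sketched in Python as follows. The text does not fix how the "best" and "worst" attributes are measured, so this sketch assumes one choice: a subset is scored by the information gain of partitioning the data on the joint values of its attributes, and a step is taken only if it raises that score (or, for a removal, does not lower it). The toy data, the `subset_gain` helper, and the `EPS` tolerance are all hypothetical. The class `y` is `q XOR r`, while `p` is a noisy copy of `y` that looks best on its own.

```python
from collections import Counter, defaultdict
from math import log2

EPS = 1e-9  # hypothetical tolerance for floating-point comparisons


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def subset_gain(rows, attrs, label):
    """Information gain of splitting `rows` on the joint values of `attrs`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].append(row[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[label] for row in rows]) - remainder


def forward_selection(rows, attrs, label):
    selected, current = [], 0.0
    while True:
        remaining = [a for a in attrs if a not in selected]
        if not remaining:
            return selected
        best = max(remaining, key=lambda a: subset_gain(rows, selected + [a], label))
        score = subset_gain(rows, selected + [best], label)
        if score <= current + EPS:  # no attribute improves the score: stop
            return selected
        selected.append(best)
        current = score


def backward_elimination(rows, attrs, label):
    selected = list(attrs)
    current = subset_gain(rows, selected, label)
    while selected:
        # The "worst" attribute is the one whose removal loses the least gain.
        worst = max(selected, key=lambda a: subset_gain(
            rows, [x for x in selected if x != a], label))
        reduced = [x for x in selected if x != worst]
        score = subset_gain(rows, reduced, label)
        if score < current - EPS:  # every removal loses information: stop
            break
        selected, current = reduced, score
    return selected


def combined_selection(rows, attrs, label):
    selected, current = [], 0.0
    while True:
        remaining = [a for a in attrs if a not in selected]
        if not remaining:
            return selected
        best = max(remaining, key=lambda a: subset_gain(rows, selected + [a], label))
        score = subset_gain(rows, selected + [best], label)
        if score <= current + EPS:
            return selected
        selected.append(best)
        current = score
        # Backward step: drop any earlier pick the new attribute made redundant.
        for a in selected[:-1]:
            reduced = [x for x in selected if x != a]
            if subset_gain(rows, reduced, label) >= current - EPS:
                selected = reduced


# Hypothetical toy data: y = q XOR r; p agrees with y in 6 of 8 rows.
ROWS = [dict(zip(("q", "r", "p", "y"), v)) for v in [
    (0, 0, 0, 0), (0, 0, 0, 0), (0, 1, 1, 1), (0, 1, 0, 1),
    (1, 0, 1, 1), (1, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 0),
]]
ATTRS = ["p", "q", "r"]

print(forward_selection(ROWS, ATTRS, "y"))     # ['p', 'r', 'q']
print(backward_elimination(ROWS, ATTRS, "y"))  # ['q', 'r']
print(combined_selection(ROWS, ATTRS, "y"))    # ['r', 'q']
```

Forward selection commits to `p` early and never revisits it, whereas backward elimination and the combined procedure both find that `p` becomes redundant once `q` and `r` are in the set. The combined variant here reads "remove the worst" as dropping an earlier pick whose removal costs no gain; other readings are equally reasonable.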
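Attribute selection by decision tree induction can be sketched in a similar way. The Python code below is a simplified ID3-style learner for categorical attributes, not a full ID3, C4.5, or CART implementation: at each node it picks the attribute with the highest information gain, stops at pure nodes or when no attribute has positive gain, and records every attribute used as a test. The weather data is hypothetical; `day` never appears in the resulting tree, so it is treated as irrelevant.

```python
from collections import Counter
from math import log2

EPS = 1e-9  # gains at or below this are treated as zero


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, attr, label):
    """Class entropy minus the weighted entropy after splitting on `attr`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[label] for row in rows]) - remainder


def build_tree(rows, attrs, label, used):
    """Grow a tree; each internal node is (attribute, {value: subtree}) and
    each leaf is a class label. Attributes chosen as tests go into `used`."""
    labels = [row[label] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: class prediction
    best = max(attrs, key=lambda a: info_gain(rows, a, label))
    if info_gain(rows, best, label) <= EPS:
        return majority  # no remaining attribute separates the classes
    used.add(best)
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, rest, label, used)
    return (best, branches)


def tree_attribute_subset(rows, attrs, label):
    """Return the tree and the attributes that appear in it (in input order)."""
    used = set()
    tree = build_tree(rows, attrs, label, used)
    return tree, [a for a in attrs if a in used]


# Hypothetical toy data: `day` has no bearing on `play`.
ROWS = [
    {"outlook": "sunny", "windy": "no", "day": "mon", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "day": "tue", "play": "no"},
    {"outlook": "overcast", "windy": "no", "day": "tue", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "day": "mon", "play": "yes"},
    {"outlook": "rain", "windy": "no", "day": "mon", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "day": "mon", "play": "no"},
]

tree, selected = tree_attribute_subset(ROWS, ["outlook", "windy", "day"], "play")
print(tree)
# ('outlook', {'overcast': 'yes', 'rain': ('windy', {'no': 'yes', 'yes': 'no'}), 'sunny': 'no'})
print(selected)  # ['outlook', 'windy']
```

On real data, a learner would usually also need pruning or a minimum-gain threshold; otherwise attributes that only fit noise may end up in the tree and survive the selection.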