Weka - Feature Selection



When a database contains a large number of attributes, there will be several attributes which do not become significant in the analysis that you are currently seeking. Thus, removing the unwanted attributes from the dataset becomes an important task in developing a good machine learning model.

You may examine the entire dataset visually and decide on the irrelevant attributes. This could be a huge task for databases containing a large number of attributes like the supermarket case that you saw in an earlier lesson. Fortunately, WEKA provides an automated tool for feature selection.

This chapter demonstrate this feature on a database containing a large number of attributes.

Loading Data

In the Preprocess tag of the WEKA explorer, select the labor.arff file for loading into the system. When you load the data, you will see the following screen −

Loading Data

Notice that there are 17 attributes. Our task is to create a reduced dataset by eliminating some of the attributes which are irrelevant to our analysis.

Features Extraction

Click on the Select attributesTAB.You will see the following screen −

Select Attributes

Under the Attribute Evaluator and Search Method, you will find several options. We will just use the defaults here. In the Attribute Selection Mode, use full training set option.

Click on the Start button to process the dataset. You will see the following output −

Start Dataset

At the bottom of the result window, you will get the list of Selected attributes. To get the visual representation, right click on the result in the Result list.

The output is shown in the following screenshot −

Screenshot Output

Clicking on any of the squares will give you the data plot for your further analysis. A typical data plot is shown below −

Data Plot

This is similar to the ones we have seen in the earlier chapters. Play around with the different options available to analyze the results.

What’s Next?

You have seen so far the power of WEKA in quickly developing machine learning models. What we used is a graphical tool called Explorer for developing these models. WEKA also provides a command line interface that gives you more power than provided in the explorer.

Clicking the Simple CLI button in the GUI Chooser application starts this command line interface which is shown in the screenshot below −

Gui Chooser

Type your commands in the input box at the bottom. You will be able to do all that you have done so far in the explorer plus much more. Refer to WEKA documentation (https://www.cs.waikato.ac.nz/ml/weka/documentation.html) for further details.

Lastly, WEKA is developed in Java and provides an interface to its API. So if you are a Java developer and keen to include WEKA ML implementations in your own Java projects, you can do so easily.

Conclusion

WEKA is a powerful tool for developing machine learning models. It provides implementation of several most widely used ML algorithms. Before these algorithms are applied to your dataset, it also allows you to preprocess the data. The types of algorithms that are supported are classified under Classify, Cluster, Associate, and Select attributes. The result at various stages of processing can be visualized with a beautiful and powerful visual representation. This makes it easier for a Data Scientist to quickly apply the various machine learning techniques on his dataset, compare the results and create the best model for the final use.

Advertisements