Weka - Introduction

The foundation of any Machine Learning application is data - not just a little data, but a huge amount of it, which is commonly termed Big Data.

Before you train a machine to analyze big data, the data has to meet several requirements −

  • The data must be clean.
  • It should not contain null values.

Besides, not all the columns in the data table will be useful for the type of analytics that you are trying to achieve. Irrelevant data columns, or ‘features’ as they are called in Machine Learning terminology, must be removed before the data is fed into a machine learning algorithm.
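The two cleaning steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the table, its column names and the helper functions drop_rows_with_nulls and drop_columns are all hypothetical, and discarding incomplete rows is just one possible way of handling null values.

```python
# Hypothetical toy table; column names and values are made up for illustration.
rows = [
    {"id": 1, "outlook": "sunny", "temperature": 85, "play": "no"},
    {"id": 2, "outlook": "overcast", "temperature": None, "play": "yes"},
    {"id": 3, "outlook": "rainy", "temperature": 70, "play": "yes"},
    {"id": 4, "outlook": None, "temperature": 65, "play": "no"},
]


def drop_rows_with_nulls(table):
    """Keep only the rows in which every value is present."""
    return [row for row in table if all(v is not None for v in row.values())]


def drop_columns(table, irrelevant):
    """Return a copy of the table without the named columns."""
    return [{k: v for k, v in row.items() if k not in irrelevant} for row in table]


# Assumption: "id" carries no predictive information, so it is treated as irrelevant.
clean = drop_columns(drop_rows_with_nulls(rows), irrelevant={"id"})

for row in clean:
    print(row)
```

Here only the rows with id 1 and 3 survive, and the id column itself is removed from them.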

In short, your big data needs a lot of preprocessing before it can be used for Machine Learning. Once the data is ready, you can apply various Machine Learning algorithms such as classification, regression, clustering and so on to solve your problem.

The type of algorithm that you apply depends largely on your domain knowledge. Even within the same type, for example classification, there are several algorithms available. You may want to test different algorithms of the same type to build an efficient machine learning model. While doing so, you will want to visualize the processed data, so you also need visualization tools.
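As a rough sketch of what testing different algorithms of the same type can look like, the following plain Python compares two very simple classifiers on the same held-out data. Everything here is an assumption made for illustration: the toy data points, the choice of a majority-class baseline and a 1-nearest-neighbour classifier, and the helper names. None of it is taken from Weka.

```python
from collections import Counter

# Hypothetical toy data: each item is (feature vector, class label).
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((3.0, 3.2), "b"), ((3.1, 2.9), "b")]
test = [((1.1, 0.9), "a"), ((2.9, 3.0), "b"),
        ((3.2, 3.1), "b"), ((0.8, 1.0), "a")]


def majority_classifier(train_set):
    """Always predict the most frequent label seen in training."""
    most_common = Counter(label for _, label in train_set).most_common(1)[0][0]
    return lambda x: most_common


def nearest_neighbour_classifier(train_set):
    """Predict the label of the closest training point (squared Euclidean distance)."""
    def predict(x):
        def sq_dist(item):
            point, _ = item
            return sum((a - b) ** 2 for a, b in zip(point, x))
        return min(train_set, key=sq_dist)[1]
    return predict


def accuracy(predict, data):
    """Fraction of items in data whose label is predicted correctly."""
    correct = sum(1 for x, label in data if predict(x) == label)
    return correct / len(data)


candidates = {
    "majority": majority_classifier,
    "1-NN": nearest_neighbour_classifier,
}
# Train every candidate on the same data and score it on the same test set.
results = {name: accuracy(build(train), test) for name, build in candidates.items()}

for name, score in results.items():
    print(name, score)
```

Because both classifiers are trained and scored on exactly the same data, their accuracies can be compared directly; on this toy data the nearest-neighbour classifier scores higher than the baseline.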

In the upcoming chapters, you will learn about Weka, a software tool that supports all of the above steps - preprocessing, applying machine learning algorithms and visualization - and lets you work with big data comfortably.