What are the methods for constructing an Ensemble Classifier?

The idea is to build multiple classifiers from the initial data and then aggregate their predictions when classifying unknown examples. An ensemble of classifiers can be constructed in several ways, which are as follows −
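As a rough sketch of the aggregation step, the predictions of the individual base classifiers can be combined by a simple majority vote (the labels below are hypothetical):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the class labels predicted by the base classifiers for
    one unknown example by picking the most frequent label."""
    return Counter(predictions).most_common(1)[0][0]

# Five hypothetical base classifiers vote on the same test instance.
print(majority_vote(["yes", "no", "yes", "yes", "no"]))  # prints "yes"
```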

By manipulating the training set − In this method, multiple training sets are generated by resampling the initial data according to some sampling distribution. The sampling distribution determines how likely it is that an instance will be selected for training, and it can vary from one trial to another. A classifier is then built from each training set using a particular learning algorithm. Bagging and boosting are examples of ensemble methods that manipulate their training sets.
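Bagging, for instance, can be sketched as below; the tiny 1-nearest-neighbour base learner and the one-feature toy data set are assumptions chosen only to keep the example self-contained:

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Resample the training set with replacement (a uniform sampling
    distribution); the sample has the same size as the original data."""
    return [rng.choice(data) for _ in data]

def train_1nn(sample):
    """Toy base learner: 1-nearest-neighbour on a single numeric feature."""
    return lambda x: min(sample, key=lambda p: abs(p[0] - x))[1]

# Six labelled instances: (feature value, class label).
data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
rng = random.Random(42)

# Train one base classifier per bootstrap sample, then combine by vote.
models = [train_1nn(bootstrap(data, rng)) for _ in range(11)]
vote = lambda x: Counter(m(x) for m in models).most_common(1)[0][0]
print(vote(2), vote(8))
```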

By manipulating the input features − In this method, a subset of the input features is selected to form each training set. The subset can be chosen randomly or based on the recommendations of domain experts. Several studies have shown that this method works very well with data sets that contain highly redundant features. Random forest is an ensemble method that manipulates its input features and uses decision trees as its base classifiers.
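A minimal sketch of random feature-subset selection, assuming instances are stored as (feature-list, label) pairs and that the subsets are drawn uniformly at random:

```python
import random

def random_subspace(instances, n_keep, rng):
    """Project every instance onto a randomly chosen subset of the
    input features; each base classifier is trained on one such view."""
    kept = sorted(rng.sample(range(len(instances[0][0])), n_keep))
    return [([x[i] for i in kept], y) for x, y in instances], kept

# Four input features per instance; the last two are redundant copies
# of the first, mimicking the highly redundant data sets mentioned above.
data = [([0, 0, 0, 0], "a"), ([1, 1, 1, 1], "b"), ([0, 1, 0, 0], "a")]
rng = random.Random(7)

# One projected view of the data per base classifier.
views = [random_subspace(data, 2, rng) for _ in range(3)]
for projected, kept in views:
    print("features kept:", kept)
```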

By manipulating the class labels − This method can be used when the number of classes is sufficiently large. The training data is transformed into a binary class problem by randomly partitioning the class labels into two disjoint subsets, say A0 and A1.

Training instances whose class labels belong to the subset A0 are assigned to class 0, while those that belong to the subset A1 are assigned to class 1. The relabeled instances are then used to train a base classifier. By repeating the class-relabeling and model-building steps several times, an ensemble of base classifiers is obtained.

When a test instance is presented, each base classifier Ci is used to predict its class label. If the test instance is predicted as class 0, all the classes that belong to A0 receive a vote; if it is predicted as class 1, all the classes that belong to A1 receive a vote.
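The relabeling and voting scheme described above can be sketched as follows; the class names and the three A1 subsets are hypothetical:

```python
from collections import Counter

def relabel(data, subset_a1):
    """Turn a multiclass problem into a binary one: labels in subset_a1
    become class 1, all remaining labels become class 0."""
    return [(x, 1 if y in subset_a1 else 0) for x, y in data]

def vote(partitions, binary_preds, classes):
    """Each binary prediction gives one vote to every original class in
    the matching subset; the class with the most votes wins."""
    votes = Counter()
    for a1, pred in zip(partitions, binary_preds):
        votes.update(a1 if pred == 1 else classes - a1)
    return votes.most_common(1)[0][0]

classes = {"cat", "dog", "bird", "fish"}
# Three random partitions of the class labels (the A1 subsets).
partitions = [{"cat", "dog"}, {"cat", "bird"}, {"cat", "fish"}]
# Suppose each base classifier predicted class 1 for the test instance:
# "cat" appears in every A1 subset, so it collects the most votes.
print(vote(partitions, [1, 1, 1], classes))  # prints "cat"
```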

By manipulating the learning algorithm − Many learning algorithms can be manipulated in such a way that applying the algorithm several times to the same training data results in multiple models. For instance, an artificial neural network can produce different models by changing its network topology or the initial weights of the connections between neurons. Similarly, an ensemble of decision trees can be constructed by injecting randomness into the tree-growing process.
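One way to sketch this is to run the same learning algorithm, a simple perceptron here, on the same training data several times, varying only the random initial weights; the data set and seeds are assumptions for illustration:

```python
import random
from collections import Counter

def train_perceptron(data, seed, epochs=50):
    """Same algorithm, same training data; only the random initial
    weights (and hence the resulting model) differ between runs."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(len(data[0][0]) + 1)]  # bias + weights
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) > 0 else 0
            if pred != y:  # standard perceptron update on a mistake
                w[0] += y - pred
                w[1:] = [wi + (y - pred) * xi for wi, xi in zip(w[1:], x)]
    return lambda x: 1 if w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) > 0 else 0

# A linearly separable toy problem: class depends on the first feature.
data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]

# One model per random initialisation; aggregate by majority vote.
models = [train_perceptron(data, seed) for seed in range(5)]
vote = lambda x: Counter(m(x) for m in models).most_common(1)[0][0]
print(vote([1.0, 0.2]), vote([0.0, 0.8]))
```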

The first three methods are generic techniques that are applicable to any classifier, whereas the fourth method depends on the type of classifier used. The base classifiers can be trained sequentially (one after another) or in parallel (all at once).

The first step is to create a training set from the initial data D. Depending on the type of ensemble method used, the training sets are either identical to or a slight modification of D. The size of the training set is kept the same as that of the initial data, but the distribution of instances may not be identical, i.e., some instances may appear several times in the training set, while others may not appear even once.
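A short sketch of this resampling step, assuming a uniform sampling distribution as in bagging; the instance ids stand in for labelled records:

```python
import random
from collections import Counter

# The initial data D: ten instance ids standing in for labelled records.
D = list(range(10))
rng = random.Random(1)

# A training set of the same size as D, drawn with replacement: some
# instances appear several times, while others may not appear at all.
training_set = [rng.choice(D) for _ in D]
counts = Counter(training_set)
print("training set:", sorted(training_set))
print("never drawn:", [i for i in D if i not in counts])
```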