How to construct a decision tree?

A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. The topmost node in a tree is the root node.
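As a rough illustration of this structure, the sketch below stores a tree as nested Python dictionaries and classifies an instance by following branches from the root to a leaf. The dictionary layout, the `classify` helper, and the weather-style attribute names are assumptions made for this example, not something the text prescribes.

```python
# Hypothetical representation: internal nodes are dicts, leaves are class labels.
tree = {
    "attribute": "outlook",              # test at the root node
    "branches": {                        # one branch per outcome of the test
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",               # leaf node holding a class
        "rainy": {
            "attribute": "windy",
            "branches": {"true": "no", "false": "yes"},
        },
    },
}


def classify(node, instance):
    """Follow the branch matching each test until a leaf is reached."""
    while isinstance(node, dict):
        value = instance[node["attribute"]]
        node = node["branches"][value]
    return node


example = {"outlook": "sunny", "humidity": "normal", "windy": "false"}
print(classify(tree, example))  # yes
```

Each step of the loop corresponds to one attribute test, so the number of tests for an instance equals the depth of the leaf it reaches.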

The problem of constructing a decision tree can be expressed recursively. First, select an attribute to place at the root node, and make one branch for each possible value. This splits the example set into subsets, one for each value of the attribute. The procedure is then repeated recursively for every branch, using only those instances that actually reach the branch. If all instances at a node have the same classification, stop developing that part of the tree.
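The splitting step and the stopping test can be sketched as follows. The helper names `partition` and `is_pure`, the `play` class attribute, and the five toy instances are all made up for illustration.

```python
from collections import defaultdict


def partition(instances, attribute):
    """Group instances by their value for `attribute` (one subset per branch)."""
    subsets = defaultdict(list)
    for inst in instances:
        subsets[inst[attribute]].append(inst)
    return dict(subsets)


def is_pure(instances, target="play"):
    """True when every instance has the same class, so the branch can stop."""
    return len({inst[target] for inst in instances}) <= 1


# Toy data, invented for this example.
data = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "no"},
]

for value, subset in partition(data, "outlook").items():
    print(value, len(subset), "stop" if is_pure(subset) else "split further")
# sunny 2 stop
# overcast 1 stop
# rainy 2 split further
```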

The measure of purity that we will use is called the information and is measured in units called bits. Associated with each node of the tree, it represents the expected amount of information that would be needed to specify whether a new instance should be classified as yes or no, given that the instance reached that node.
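One common way to compute this quantity is the entropy of the class distribution at the node. The sketch below assumes that definition; the function name `information` and the yes/no counts are chosen only for illustration.

```python
import math
from collections import Counter


def information(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    counts = Counter(labels)
    # sum over classes of p * log2(1 / p), with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# A node reached by 9 "yes" and 5 "no" instances (illustrative counts).
print(round(information(["yes"] * 9 + ["no"] * 5), 3))  # 0.94

# A pure node needs no further information.
print(information(["yes"] * 4))  # 0.0
```

A node split evenly between two classes gives exactly 1 bit, which is the maximum for a two-class problem.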

Pruning is a procedure that reduces the size of decision trees. It lowers the risk of overfitting by limiting the size of the tree or removing sections of the tree that provide little predictive power. Pruning works by trimming the branches that reflect anomalies in the training data caused by noise or outliers, revising the initial tree in a way that improves its generalization performance.

Pruning methods typically use statistical measures to remove the least reliable branches, which often results in faster classification and improves the ability of the tree to correctly classify independent test data.
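The text does not name a specific pruning method. As one possible illustration, the sketch below uses reduced-error pruning, which replaces a subtree with a leaf whenever doing so does not increase the error on a held-out validation set. The nested-dictionary tree layout, the `majority` field, and the toy data are assumptions for this example.

```python
def classify(node, instance):
    """Walk the tree; fall back to the node's majority class on unseen values."""
    while isinstance(node, dict):
        node = node["branches"].get(instance[node["attribute"]], node["majority"])
    return node


def errors(node, rows, target):
    """Number of rows the (sub)tree or leaf misclassifies."""
    return sum(classify(node, r) != r[target] for r in rows)


def prune(node, rows, target):
    """Bottom-up reduced-error pruning against the validation rows reaching node."""
    if not isinstance(node, dict):
        return node
    for value, child in node["branches"].items():
        reaching = [r for r in rows if r[node["attribute"]] == value]
        node["branches"][value] = prune(child, reaching, target)
    leaf = node["majority"]
    # Keep the simpler leaf unless the subtree is strictly better.
    # Note: a subtree that no validation row reaches is also pruned.
    if errors(leaf, rows, target) <= errors(node, rows, target):
        return leaf
    return node


# Hypothetical tree whose "windy = false" branch was fitted to a noisy example.
tree = {
    "attribute": "outlook", "majority": "yes",
    "branches": {
        "overcast": "yes",
        "sunny": {"attribute": "windy", "majority": "no",
                  "branches": {"true": "no", "false": "yes"}},
    },
}
validation = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
]

pruned = prune(tree, validation, "play")
print(pruned["branches"]["sunny"])  # no
```

Here the `windy` subtree is collapsed into a single leaf, while the root test on `outlook` is kept because removing it would increase the validation error.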

Algorithms for learning Decision Trees

Algorithm − Generate decision tree. Create a decision tree from the given training data.

Input − The training samples, samples, described by discrete-valued attributes; the set of candidate attributes, attribute-list.

Output − A decision tree.


  • Create a node N;

  • If samples are all of the same class, C, then

      • Return N as a leaf node labeled with the class C.

  • If attribute-list is empty, then

      • Return N as a leaf node labeled with the most frequent class in samples. // majority voting

  • Select test-attribute, the attribute in attribute-list with the highest information gain.

  • Label node N with test-attribute.

  • For each known value ai of test-attribute // partition the samples

      • Grow a branch from node N for the condition test-attribute = ai.

      • Let si be the set of samples in samples for which test-attribute = ai.

      • If si is empty, then

          • Attach a leaf labeled with the most common class in samples.

      • Else attach the node returned by Generate decision tree(si, attribute-list − test-attribute).
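The steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: samples are assumed to be dictionaries, the `target` parameter (name of the class attribute) and the `values` parameter (known values of each attribute) are additions needed to make the code self-contained, and the tree is returned as nested dictionaries with class labels as leaves.

```python
import math
from collections import Counter


def entropy(samples, target):
    """Information (in bits) of the class distribution in samples."""
    total = len(samples)
    counts = Counter(s[target] for s in samples)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def information_gain(samples, attribute, target, values):
    """Entropy before the split minus the weighted entropy after it."""
    remainder = 0.0
    for v in values[attribute]:
        subset = [s for s in samples if s[attribute] == v]
        if subset:
            remainder += len(subset) / len(samples) * entropy(subset, target)
    return entropy(samples, target) - remainder


def majority_class(samples, target):
    return Counter(s[target] for s in samples).most_common(1)[0][0]


def generate_decision_tree(samples, attribute_list, target, values):
    classes = {s[target] for s in samples}
    if len(classes) == 1:                       # all samples are of the same class C
        return classes.pop()
    if not attribute_list:                      # attribute-list is empty: majority voting
        return majority_class(samples, target)
    test_attribute = max(
        attribute_list,
        key=lambda a: information_gain(samples, a, target, values),
    )
    node = {"attribute": test_attribute, "branches": {}}
    remaining = [a for a in attribute_list if a != test_attribute]
    for v in values[test_attribute]:            # each known value a_i
        s_i = [s for s in samples if s[test_attribute] == v]
        if not s_i:                             # empty partition: most common class
            node["branches"][v] = majority_class(samples, target)
        else:
            node["branches"][v] = generate_decision_tree(s_i, remaining, target, values)
    return node


# Toy data, invented for this example.
values = {"outlook": ["sunny", "overcast", "rainy"], "windy": ["false", "true"]}
samples = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "overcast", "windy": "true", "play": "yes"},
    {"outlook": "rainy", "windy": "false", "play": "yes"},
    {"outlook": "rainy", "windy": "true", "play": "no"},
]

tree = generate_decision_tree(samples, ["outlook", "windy"], "play", values)
print(tree["attribute"])  # outlook
```

On this data `outlook` has the higher information gain, so it becomes the root test. Only the `rainy` branch needs a second test on `windy`, since the `sunny` and `overcast` subsets are already pure.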