How are decision trees used for classification?

Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node.

A typical decision tree represents the concept buys_computer; that is, it predicts whether a customer at AllElectronics is likely to buy a computer. Internal nodes are denoted by rectangles, and leaf nodes are denoted by ovals. Some decision tree algorithms produce only binary trees (where each internal node branches to exactly two other nodes), whereas others can produce non-binary trees.

Given a tuple, X, for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple. Decision trees can easily be converted to classification rules.
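This root-to-leaf traversal can be sketched in a few lines of Python. The nested-dictionary tree below is a hypothetical illustration loosely modeled on the buys_computer concept mentioned above; the attribute names and branch values are assumptions, not a definitive dataset.

```python
# Hypothetical decision tree for the buys_computer concept.
# Each internal node tests one attribute; each branch is labeled
# with an outcome; each leaf holds a class label.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {
                "yes": {"label": "buys_computer=yes"},
                "no": {"label": "buys_computer=no"},
            },
        },
        "middle_aged": {"label": "buys_computer=yes"},
        "senior": {
            "attribute": "credit_rating",
            "branches": {
                "fair": {"label": "buys_computer=yes"},
                "excellent": {"label": "buys_computer=no"},
            },
        },
    },
}

def classify(node, x):
    """Trace a path from the root to a leaf for tuple x."""
    while "label" not in node:              # stop when a leaf is reached
        outcome = x[node["attribute"]]      # test the node's attribute
        node = node["branches"][outcome]    # follow the matching branch
    return node["label"]

X = {"age": "youth", "student": "yes", "credit_rating": "fair"}
print(classify(tree, X))   # buys_computer=yes
```

Note that each root-to-leaf path also reads directly as a classification rule, e.g. IF age = youth AND student = yes THEN buys_computer = yes, which is why conversion to rules is straightforward.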

The construction of decision tree classifiers does not require any domain knowledge or parameter setting, and is therefore well suited to exploratory knowledge discovery.

Decision trees can handle high-dimensional data. Their representation of acquired knowledge in tree form is intuitive and generally easy for humans to interpret. The learning and classification steps of decision tree induction are simple and fast.

In general, decision tree classifiers have good accuracy. However, successful use may depend on the data at hand. Decision tree induction algorithms have been used for classification in many application areas, including medicine, manufacturing and production, financial analysis, astronomy, and molecular biology. Decision trees are the basis of several commercial rule induction systems.

During tree construction, attribute selection measures are used to select the attribute that best partitions the tuples into distinct classes. When decision trees are built, some of the branches may reflect noise or outliers in the training data. Tree pruning attempts to identify and remove such branches, with the goal of improving classification accuracy on unseen data.
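One widely used attribute selection measure is information gain (the measure used by ID3): the attribute that yields the largest reduction in entropy after the split is chosen. The sketch below implements this measure; the four-tuple dataset at the end is a made-up example for illustration only.

```python
import math
from collections import Counter

def entropy(labels):
    """Expected information (in bits) needed to classify a tuple."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by partitioning on attr."""
    n = len(labels)
    # Group the class labels by the attribute's value.
    split = {}
    for row, lab in zip(rows, labels):
        split.setdefault(row[attr], []).append(lab)
    # Weighted entropy of the partitions after the split.
    remainder = sum(len(part) / n * entropy(part)
                    for part in split.values())
    return entropy(labels) - remainder

# Hypothetical training tuples: does "student" separate the classes?
rows = [{"student": "yes"}, {"student": "yes"},
        {"student": "no"}, {"student": "no"}]
labels = ["yes", "yes", "no", "yes"]
print(information_gain(rows, labels, "student"))   # ≈ 0.311
```

During construction, the algorithm would compute this gain for every candidate attribute and split on the one with the highest value.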

ID3, C4.5, and CART adopt a greedy (i.e., non-backtracking) approach in which decision trees are constructed in a top-down, recursive, divide-and-conquer manner. Most algorithms for decision tree induction follow such a top-down approach, which starts with a training set of tuples and their associated class labels. The training set is recursively partitioned into smaller subsets as the tree is being built.
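The top-down recursive partitioning can be outlined as follows. This is a minimal ID3-style sketch under simplifying assumptions: it splits on the first remaining attribute rather than applying an attribute selection measure, and it performs no pruning; the tiny dataset at the end is hypothetical.

```python
from collections import Counter

def build_tree(rows, labels, attributes):
    """Top-down, greedy, recursive divide-and-conquer sketch."""
    # Base cases: all tuples share one class, or no attributes remain;
    # return a leaf labeled with the majority class.
    if len(set(labels)) == 1 or not attributes:
        return {"label": Counter(labels).most_common(1)[0][0]}
    # Greedy step: pick a splitting attribute (here simply the first;
    # a real algorithm would use an attribute selection measure).
    attr = attributes[0]
    node = {"attribute": attr, "branches": {}}
    # Partition the training tuples by the chosen attribute's values.
    parts = {}
    for row, lab in zip(rows, labels):
        parts.setdefault(row[attr], ([], []))
        parts[row[attr]][0].append(row)
        parts[row[attr]][1].append(lab)
    remaining = [a for a in attributes if a != attr]
    # Recurse on each smaller subset (divide and conquer).
    for value, (sub_rows, sub_labels) in parts.items():
        node["branches"][value] = build_tree(sub_rows, sub_labels, remaining)
    return node

# Hypothetical two-tuple training set.
rows = [{"outlook": "sunny"}, {"outlook": "rain"}]
tree = build_tree(rows, ["no", "yes"], ["outlook"])
```

Because the choice of splitting attribute is never revisited once made, the construction is greedy and non-backtracking, exactly as described above.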