What are the characteristics of decision tree induction?


The various characteristics of decision tree induction are as follows −

Decision tree induction is a nonparametric method for constructing classification models. In other words, it does not require any prior assumptions about the probability distributions satisfied by the class and the other attributes.

Finding an optimal decision tree is an NP-complete problem. Many decision tree algorithms therefore employ a greedy, heuristic-based approach to guide their search through the vast hypothesis space.
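As a rough illustration of such a greedy heuristic (a sketch, not any particular algorithm's implementation), each split can be chosen locally by maximizing information gain; `entropy`, `information_gain`, and `best_attribute` are illustrative names, not from the original text:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction obtained by splitting on the attribute at attr_index."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attr_indices):
    """Greedy heuristic: pick the attribute with the highest information gain."""
    return max(attr_indices, key=lambda i: information_gain(rows, labels, i))
```

The greedy choice is made once per node and never revisited, which is why the search is fast but not guaranteed to find the globally optimal tree.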

Various techniques have been developed for constructing computationally inexpensive decision trees, making it possible to build models quickly even when the training set is very large. Moreover, once a decision tree has been built, classifying a test record is extremely fast, with a worst-case complexity of O(w), where w is the maximum depth of the tree.
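The O(w) bound follows from the fact that classification is a single root-to-leaf walk, performing at most one attribute test per level. A minimal sketch (the `Node` structure and `classify` helper are illustrative assumptions, not code from the original text):

```python
class Node:
    """An internal node tests one attribute; a leaf stores a class label."""
    def __init__(self, attr=None, children=None, label=None):
        self.attr = attr                 # index of the attribute tested here
        self.children = children or {}   # attribute value -> child Node
        self.label = label               # class label if this is a leaf

def classify(node, record):
    """Walk from the root to a leaf: at most w attribute tests for depth w."""
    while node.label is None:
        node = node.children[record[node.attr]]
    return node.label
```

No training data is consulted at prediction time, which is why the cost depends only on the tree's depth, not on the training set size.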

Decision trees, particularly smaller-sized trees, are relatively easy to interpret. Their accuracy is also comparable to that of other classification techniques for many simple data sets.

Decision trees provide an expressive representation for learning discrete-valued functions. However, they do not generalize well to certain types of Boolean problems. A notable example is the parity function, whose value is 0 (1) when there is an odd (even) number of Boolean attributes with the value True.
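To see why parity is hard for decision trees, note that flipping any single attribute flips the class, so no attribute can be ignored: every root-to-leaf path must test all d attributes, forcing an exponentially large tree. A small sketch of the function as described in the text:

```python
from itertools import product

def parity(bits):
    """0 when an odd number of attributes are True, 1 when even (as in the text)."""
    return 0 if sum(bits) % 2 == 1 else 1

# Every one of the 2**d truth assignments gets its own leaf in an exact tree,
# because changing any single bit changes the output.
d = 3
table = {bits: parity(bits) for bits in product([False, True], repeat=d)}
```

The full truth table over d attributes has 2**d rows, and an exact decision tree for parity needs a distinct leaf for each of them.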

The presence of redundant attributes does not adversely affect the accuracy of decision trees. An attribute is redundant if it is strongly correlated with another attribute in the data. One of two redundant attributes will simply not be used for splitting once the other has been chosen.

However, if the data set contains many irrelevant attributes, i.e., attributes that are not useful for the classification task, some of them may be accidentally chosen during the tree-growing process, resulting in a decision tree that is larger than necessary. Feature selection techniques can help improve the accuracy of decision trees by eliminating irrelevant attributes during preprocessing.
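One simple filter-style feature selection sketch (a minimal illustration, not a specific library's method; the `min_gain` threshold and function names are hypothetical) is to keep only attributes whose information gain on the training data exceeds a small threshold:

```python
import math
from collections import Counter

def _entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def select_features(rows, labels, min_gain=0.01):
    """Keep only attribute indices whose information gain exceeds min_gain.

    min_gain is an illustrative threshold; practical feature-selection
    methods (e.g. mutual-information ranking) are more sophisticated.
    """
    n = len(labels)
    base = _entropy(labels)
    keep = []
    for i in range(len(rows[0])):
        parts = {}
        for row, y in zip(rows, labels):
            parts.setdefault(row[i], []).append(y)
        gain = base - sum(len(p) / n * _entropy(p) for p in parts.values())
        if gain > min_gain:
            keep.append(i)
    return keep
```

Attributes whose values tell us nothing about the class have near-zero gain and are dropped before the tree is ever grown.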

Because most decision tree algorithms employ a top-down, recursive partitioning approach, the number of records becomes smaller as we traverse down the tree. At the leaf nodes, the number of records may be too small to make a statistically significant decision about the class representation of those nodes. This is known as the data fragmentation problem. One possible solution is to disallow further splitting when the number of records falls below a certain threshold.
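The threshold-based stopping rule can be sketched as follows (a simplified illustration: the `min_records` parameter is a hypothetical threshold, and the attribute at each node is taken in order rather than chosen greedily):

```python
from collections import Counter

def grow_tree(rows, labels, attrs, min_records=5):
    """Recursive partitioning that stops splitting when too few records remain,
    a simple guard against data fragmentation."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop and return a majority-class leaf if the node is too small,
    # no attributes remain, or the node is already pure.
    if len(labels) < min_records or not attrs or len(set(labels)) == 1:
        return majority
    attr = attrs[0]  # placeholder choice; real algorithms pick greedily
    node = {'attr': attr, 'children': {}, 'default': majority}
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append((row, y))
    for value, group in parts.items():
        sub_rows = [r for r, _ in group]
        sub_labels = [y for _, y in group]
        node['children'][value] = grow_tree(sub_rows, sub_labels, attrs[1:], min_records)
    return node
```

With a high `min_records`, small nodes collapse into majority-class leaves instead of continuing to fragment the data.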

A subtree can be replicated multiple times within a decision tree. This makes the tree more complex than necessary and possibly harder to interpret. Such situations can arise from decision tree implementations that rely on a single attribute test condition at each internal node.

Since most decision tree algorithms use a divide-and-conquer partitioning strategy, the same test condition can end up being applied to different parts of the attribute space, leading to the subtree replication problem.

Updated on 11-Feb-2022 11:56:45