The Hoeffding tree algorithm is a decision tree learning method for stream data classification. It was initially used to track Web clickstreams and construct models to predict which Web hosts and Web sites a user is likely to access. It typically runs in sublinear time and produces a nearly identical decision tree to that of traditional batch learners.

It uses Hoeffding trees, which exploit the idea that a small sample can often be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound (or additive Chernoff bound).

Suppose we make N independent observations of a random variable r with range R, where r is an attribute selection measure. (For a probability, R is one, and for an information gain, it is log c, where c is the number of classes.) In the case of Hoeffding trees, r is information gain. If we compute the mean, r’, of this sample, the Hoeffding bound states that the true mean of r is at least r’−ε, with probability 1−δ, where δ is user-specified and

$$\varepsilon=\sqrt{\frac{R^{2}\ln\frac{1}{\delta}}{2N}}$$
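As a rough illustration of the formula, the sketch below evaluates ε for a few sample sizes. The function name `hoeffding_bound` and the chosen values of δ and N are assumptions made for this example, not part of the algorithm's definition.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Return epsilon for n observations of a variable with range value_range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# Information gain with c = 2 classes has range R = log2(2) = 1.
R = math.log2(2)
delta = 1e-7  # hypothetical user-specified value

# Epsilon shrinks as more examples are observed.
for n in (100, 1000, 10000):
    print(n, hoeffding_bound(R, delta, n))
```

Because N appears in the denominator under the square root, quadrupling the number of observations halves ε.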

The Hoeffding Tree algorithm uses the Hoeffding bound to determine, with high probability, the smallest number, N, of examples needed at a node when selecting a splitting attribute. The Hoeffding bound is independent of the probability distribution, unlike most other bound equations. This is desirable, as it may be impossible to know the probability distribution of the information gain, or whichever attribute selection measure is used.
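One way to read "the smallest number of examples" is to solve the bound for N given a target ε, which gives N ≥ R² ln(1/δ) / (2ε²). The sketch below does this under that assumption; the helper name `min_examples` and the sample values are hypothetical.

```python
import math

def min_examples(value_range, delta, epsilon):
    """Smallest integer N for which the Hoeffding bound is at most epsilon."""
    n = value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2)
    return math.ceil(n)

# Hypothetical setting: two classes (R = 1), delta = 1e-7, target epsilon = 0.1.
n_needed = min_examples(1.0, 1e-7, 0.1)
print(n_needed)
```

Note that the result depends only on R, δ, and ε, not on how the observations are distributed.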

The algorithm takes as input a sequence of training examples, S, described by attributes A, and the accuracy parameter, δ. An evaluation function G(A_{i}) is also supplied, which could be information gain, gain ratio, Gini index, or some other attribute selection measure. At each node in the decision tree, we need to find the attribute A_{i}, among those remaining, that maximizes G(A_{i}). The goal is to find the smallest number of tuples, N, for which the Hoeffding bound is satisfied.

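For a given node, let A_{a} be the attribute that achieves the highest G, and A_{b} be the attribute that achieves the second-highest G. If G(A_{a}) − G(A_{b}) > ε, where ε is calculated from the Hoeffding bound, then the difference is greater than zero with probability 1−δ, and A_{a} can be selected as the splitting attribute with that confidence. Otherwise, the node waits for more examples.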
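The split test can be sketched as a small function. This is a minimal illustration, assuming the G values for each attribute have already been computed from the examples seen at the node; the name `choose_split` and the sample gains are hypothetical.

```python
import math

def choose_split(gains, n, value_range, delta):
    """Return the attribute to split on, or None if more examples are needed.

    gains maps attribute name -> G value estimated from n examples.
    """
    if len(gains) < 2:
        return None  # simplification: this sketch needs a runner-up to compare
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    (best_attr, best_g), (_, second_g) = ranked[0], ranked[1]
    epsilon = math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))
    # Split only when the observed gap exceeds the Hoeffding bound.
    return best_attr if best_g - second_g > epsilon else None

gains = {"a": 0.30, "b": 0.10, "c": 0.05}  # hypothetical G estimates
early = choose_split(gains, n=100, value_range=1.0, delta=1e-7)
later = choose_split(gains, n=1000, value_range=1.0, delta=1e-7)
print(early, later)
```

With the same gap between the top two attributes, the decision changes only because ε shrinks as N grows.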

The only statistics that must be maintained in the Hoeffding tree algorithm are the counts n_{ijk} for the value v_{j} of attribute A_{i} with class label y_{k}. Therefore, if d is the number of attributes, v is the maximum number of values for any attribute, c is the number of classes, and l is the maximum depth (or the number of levels) of the tree, then the total memory required is O(ldvc).
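The counts n_{ijk} can be kept in a simple table that is updated once per arriving example. The sketch below is one possible layout, assuming discrete attribute values; the class name `LeafStats` and the sample data are hypothetical.

```python
from collections import defaultdict

class LeafStats:
    """Sufficient statistics for one node: counts n_ijk and examples seen."""

    def __init__(self):
        # Key is (attribute index i, attribute value v_j, class label y_k).
        self.counts = defaultdict(int)
        self.n = 0

    def update(self, example, label):
        """Record one training example; the example itself is not stored."""
        for i, value in enumerate(example):
            self.counts[(i, value, label)] += 1
        self.n += 1

stats = LeafStats()
stream = [
    (("sunny", "high"), "no"),
    (("sunny", "low"), "yes"),
    (("rainy", "high"), "no"),
]
for example, label in stream:
    stats.update(example, label)

print(stats.n, dict(stats.counts))
```

Each node holds at most d × v × c counters regardless of how many examples pass through it, which is where the dvc factor in the memory bound comes from.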
