# What is Hoeffding Tree Algorithm?

The Hoeffding tree algorithm is a decision tree learning method for stream data classification. It was initially used to track Web clickstreams and to construct models predicting which Web hosts and Web sites a user is likely to access. It typically runs in sublinear time and produces a decision tree nearly identical to the one a traditional batch learner would build.

It uses Hoeffding trees, which exploit the idea that a small sample can often be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound (or additive Chernoff bound).

Suppose we make N independent observations of a random variable r with range R, where r is an attribute selection measure. (For a probability, R is 1; for information gain, R is log c, where c is the number of classes.) In the case of Hoeffding trees, r is information gain. If we compute the mean, r’, of this sample, the Hoeffding bound states that the true mean of r is at least r’ − ε with probability 1 − δ, where δ is user-specified and

$$\varepsilon=\sqrt{\frac{R^{2}\ln\frac{1}{\delta}}{2N}}$$
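This bound can be computed directly. The sketch below is a minimal illustration; the function name `hoeffding_bound` and the example parameter values are assumptions, not part of any standard library.

```python
import math

def hoeffding_bound(R, delta, N):
    """Hoeffding bound epsilon for a variable with range R,
    after N independent observations, holding with probability 1 - delta."""
    return math.sqrt((R ** 2) * math.log(1.0 / delta) / (2.0 * N))

# Example: information gain with c = 2 classes has range R = log2(2) = 1.
eps = hoeffding_bound(R=1.0, delta=1e-7, N=1000)
```

Note that ε shrinks as N grows, so with enough examples the bound becomes tight enough to commit to a split.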

The Hoeffding Tree algorithm uses the Hoeffding bound to determine, with high probability, the smallest number, N, of examples needed at a node when selecting a splitting attribute. The Hoeffding bound is independent of the probability distribution, unlike most other bound equations. This is desirable, as it may be impossible to know the probability distribution of the information gain, or whichever attribute selection measure is used.

The algorithm takes as input a sequence of training examples, S, described by attributes A, and the accuracy parameter, δ. The evaluation function G(Ai) is supplied, which could be information gain, gain ratio, Gini index, or some other attribute selection measure. At each node in the decision tree, we need to maximize G(Ai) for one of the remaining attributes, Ai. The goal is to find the smallest number of tuples, N, for which the Hoeffding bound is satisfied.
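As a concrete choice of G, the sketch below computes information gain from class counts at a node. The helper name `information_gain` and its count-based interface are assumptions made for illustration only.

```python
import math

def information_gain(parent_counts, child_counts_list):
    """A hypothetical G(Ai): entropy of the parent's class counts minus the
    weighted entropy of the children produced by splitting on attribute Ai."""
    def entropy(counts):
        n = sum(counts)
        return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

    n = sum(parent_counts)
    child_entropy = sum((sum(cc) / n) * entropy(cc) for cc in child_counts_list)
    return entropy(parent_counts) - child_entropy
```

A perfect binary split of a balanced two-class node yields a gain of 1 bit, while a split that leaves the class mix unchanged yields a gain of 0.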


For a given node, let Aa be the attribute that achieves the highest G and Ab the attribute that achieves the second-highest G. If G(Aa) − G(Ab) > ε, where ε is computed from the Hoeffding bound above, then with probability 1 − δ, Aa is the correct choice of splitting attribute, and the node can be split on Aa.

The only statistics that must be maintained in the Hoeffding tree algorithm are the counts nijk for the value vj of attribute Ai with class label yk. Therefore, if d is the number of attributes, v is the maximum number of values for any attribute, c is the number of classes, and l is the maximum depth (or the number of levels) of the tree, then the total memory required is O(ldvc).
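These sufficient statistics can be kept per leaf as a simple counter keyed by (attribute, value, class). The class name `LeafStatistics` and its interface are assumptions made for this sketch, not part of the algorithm's specification.

```python
from collections import defaultdict

class LeafStatistics:
    """Per-leaf counts n_ijk, indexed by (attribute i, value j, class k).

    Each leaf stores at most d * v * c counts, so a tree of depth l
    needs O(l * d * v * c) memory overall, as stated above."""

    def __init__(self):
        self.n = defaultdict(int)

    def update(self, example, label):
        # example: dict mapping attribute name -> observed attribute value
        for attr, value in example.items():
            self.n[(attr, value, label)] += 1
```

Because only counts are stored, each training example can be discarded immediately after it updates a leaf, which is what makes the algorithm suitable for streams.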