Huffman Codes and Entropy in Data Structure

Huffman Code

A Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.

Such a code is found by means of Huffman coding, an algorithm developed by David A. Huffman while he was an Sc.D. student at MIT and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

The output from Huffman's algorithm can be viewed as a variable-length code table for encoding a source symbol (such as a character in a file). The algorithm derives this table from the estimated probability or frequency of occurrence (weight) for each possible value of the source symbol. As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols. Huffman's method can be implemented efficiently, finding a code in time linear in the number of input weights if these weights are sorted.
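The table construction described above can be sketched in Python. This is a minimal illustration, not Huffman's original presentation: it uses a binary heap rather than the linear-time approach for pre-sorted weights, and the names huffman_code, table, and encoded are made up for this example. Copying the partial codeword dictionaries at every merge keeps the code short at the cost of some efficiency.

```python
import heapq
from collections import Counter


def huffman_code(weights):
    """Build a prefix code from a {symbol: weight} mapping (illustrative sketch)."""
    if not weights:
        return {}
    if len(weights) == 1:
        # A lone symbol still needs one bit so that it can be emitted.
        return {symbol: "0" for symbol in weights}

    # Heap entries: (total weight, tie-breaker, {symbol: partial codeword}).
    # The integer tie-breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees.
        w1, _, low = heapq.heappop(heap)
        w2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1

    return heap[0][2]


text = "abracadabra"
table = huffman_code(Counter(text))           # weights are character counts
encoded = "".join(table[ch] for ch in text)   # bit string for the whole text
```

In this sketch the most frequent character, "a", receives a one-bit codeword, and no codeword is a prefix of another, so the bit string can be decoded unambiguously by reading it from left to right.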


In information theory, Shannon's source coding theorem (or noiseless coding theorem) establishes the limits of possible data compression and the operational meaning of the Shannon entropy.

The source coding theorem shows that (in the limit, as the length of a stream of independent and identically distributed (i.i.d.) random variable data tends to infinity) it is impossible to compress the data such that the code rate (average number of bits per symbol) is smaller than the Shannon entropy of the source, without it being virtually certain that information will be lost. However, it is possible to get the code rate arbitrarily close to the Shannon entropy, with negligible probability of loss.
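A small numeric check can make the bound concrete. The sketch below compares the average codeword length of two hand-written prefix codes with the entropy of their sources. The distributions, the codewords, and the helper names entropy_bits and average_length are assumptions chosen for illustration. It only demonstrates the bound for codes that encode one symbol at a time, not the limiting statement of the theorem itself.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def average_length(probs, code):
    """Expected number of bits per symbol under the given code table."""
    return sum(p * len(code[sym]) for sym, p in probs.items())


# Probabilities that are powers of 1/2: the code rate can equal the entropy.
dyadic = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
dyadic_code = {"a": "0", "b": "10", "c": "110", "d": "111"}
h_dyadic = entropy_bits(dyadic)
rate_dyadic = average_length(dyadic, dyadic_code)

# A skewed two-symbol source: one bit per symbol is the best a symbol-by-symbol
# code can do, yet the entropy is well below one bit.
skewed = {"x": 0.9, "y": 0.1}
skewed_code = {"x": "0", "y": "1"}
h_skewed = entropy_bits(skewed)
rate_skewed = average_length(skewed, skewed_code)
```

For the first source the code rate matches the entropy at 1.75 bits per symbol. For the second the rate stays at 1 bit while the entropy is lower, which is the gap that coding longer blocks of symbols can narrow.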

Information entropy is defined as the average rate at which information is produced by a stochastic source of data.

Calculate the Entropy for a Random Variable

We can also calculate how much information there is in a random variable.

For a random variable X with probability distribution p, this quantity can be written as a function H(), for example: H(X)

In effect, computing the information for a random variable is the same as computing the information for the probability distribution of the events for the random variable.

The information for a random variable is called “information entropy,” “Shannon entropy,” or simply “entropy.”

It is related to the idea of entropy from physics by analogy, in that both are concerned with uncertainty.

The intuition for entropy is that it is the average number of bits required to represent or transmit an event drawn from the probability distribution for the random variable.

The Shannon entropy of a distribution is defined as the expected amount of information in an event drawn from that distribution.

It provides a lower bound on the number of bits required on average to encode symbols drawn from a distribution P.

Entropy can be computed for a random variable X with k in K discrete states as follows:

H(X) = -sum(each k in K p(k) * log(p(k)))

That means the negative of the sum of the probability of each event multiplied by the log of the probability of each event.

Like information, the log() function uses base 2 and the units are bits. A natural logarithm can be used instead, in which case the units are nats.
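The formula can be written directly in Python. The function name entropy and its base parameter are choices made for this sketch. States with zero probability are skipped, following the usual convention that they contribute nothing to the sum.

```python
import math


def entropy(probs, base=2.0):
    """H(X) = -sum(p(k) * log(p(k))) over all states k with p(k) > 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


fair_coin = [0.5, 0.5]
fair_die = [1 / 6] * 6

coin_bits = entropy(fair_coin)                # base 2, measured in bits
coin_nats = entropy(fair_coin, base=math.e)   # natural log, measured in nats
die_bits = entropy(fair_die)                  # equals log2(6)
```

A fair coin carries 1 bit of entropy, which is about 0.693 nats, and a fair six-sided die carries about 2.585 bits.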

The lowest entropy is obtained for a random variable that has a single event with a probability of 1.0, a certainty. The largest entropy for a random variable is obtained when all events are equally likely.
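Both extremes can be seen with a two-outcome variable. The sketch below evaluates the entropy on a coarse grid of probabilities. The helper name binary_entropy and the grid spacing are assumptions made for illustration.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a variable with two outcomes of probability p and 1 - p."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


grid = [i / 10 for i in range(11)]            # p = 0.0, 0.1, ..., 1.0
values = [binary_entropy(p) for p in grid]
peak = grid[values.index(max(values))]        # probability with the highest entropy
```

On this grid the entropy is zero at p = 0.0 and p = 1.0, where the outcome is certain, and reaches its maximum of 1 bit at p = 0.5, where both outcomes are equally likely.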