# Huffman Coding

Huffman coding is a lossless data compression algorithm. It assigns a variable-length code to each distinct input character, and the code length depends on how often the character occurs: the most frequent characters get the shortest codes and the least frequent characters get the longest. The codes are prefix-free (no code is a prefix of another), so an encoded bit stream can be decoded unambiguously.

The algorithm has two main parts: building the Huffman tree, and traversing the tree to read off the codes.

For example, in the string “YYYZXXYYX” the character Y occurs more often than X, and Z occurs least often. So the code for Y is no longer than the code for X, and the code for X is no longer than the code for Z.
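The frequencies behind this example can be checked with a few lines of C++. This is only a minimal sketch; `countFrequencies` is a hypothetical helper name, not part of the program shown later.

```cpp
#include <map>
#include <string>

// Count how many times each character occurs in the text.
// countFrequencies is a hypothetical helper used for illustration only.
std::map<char, int> countFrequencies(const std::string& text) {
    std::map<char, int> freq;
    for (char c : text) {
        ++freq[c];  // operator[] creates a missing entry with the value 0
    }
    return freq;
}
```

For “YYYZXXYYX” this gives Y = 5, X = 3 and Z = 1, which is the ordering the paragraph above relies on.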

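• Assigning a code to each character according to its frequency takes O(n log n) time, where n is the number of distinct characters: the priority queue performs n - 1 merges, and each merge costs O(log n).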
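To see where the log factor comes from, the merge loop can be run on the frequencies alone, without building a tree. The sketch below assumes a min-heap from the standard library; `huffmanCost` is a hypothetical name, and it returns the total number of bits the encoded message would need.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Total bits in a Huffman-encoded message, computed from frequencies only.
// huffmanCost is a hypothetical helper used for illustration.
long long huffmanCost(const std::vector<long long>& freqs) {
    assert(!freqs.empty());
    // Min-heap: the smallest weight is always on top.
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>> heap(freqs.begin(), freqs.end());
    long long totalBits = 0;
    // n - 1 merges, each doing O(log n) heap work.
    while (heap.size() > 1) {
        long long a = heap.top(); heap.pop();
        long long b = heap.top(); heap.pop();
        totalBits += a + b;  // every character under the merged node gains one bit
        heap.push(a + b);
    }
    return totalBits;  // 0 when there is only one distinct character
}
```

With the frequencies 5, 3 and 1 from “YYYZXXYYX”, the two merges add 4 and 9, so the function returns 13.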

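Input − A string of characters, say “ACCEBFFFFAAXXBLKE”
Output − The code for each distinct character. When several nodes have the same frequency, the exact codes depend on how ties are broken, so another implementation may print different codes with the same total encoded length: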

Data: K, Frequency: 1, Code: 0000
Data: L, Frequency: 1, Code: 0001
Data: E, Frequency: 2, Code: 001
Data: F, Frequency: 4, Code: 01
Data: B, Frequency: 2, Code: 100
Data: C, Frequency: 2, Code: 101
Data: X, Frequency: 2, Code: 110
Data: A, Frequency: 3, Code: 111
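As a rough check of what these codes buy, the table can be applied to the input string. The sketch below hard-codes the table listed above; `kCodes` and `encode` are hypothetical names introduced here for illustration.

```cpp
#include <map>
#include <string>

// Code table copied from the output listed above.
const std::map<char, std::string> kCodes = {
    {'K', "0000"}, {'L', "0001"}, {'E', "001"}, {'F', "01"},
    {'B', "100"},  {'C', "101"},  {'X', "110"}, {'A', "111"},
};

// Concatenate the code of every character in the text.
// std::map::at throws std::out_of_range for a character with no code.
std::string encode(const std::string& text) {
    std::string bits;
    for (char c : text) {
        bits += kCodes.at(c);
    }
    return bits;
}
```

Encoding “ACCEBFFFFAAXXBLKE” this way yields 49 bits, compared with 17 × 8 = 136 bits as 8-bit characters, or 17 × 3 = 51 bits with a fixed 3-bit code for the eight distinct characters.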

## Algorithm

### huffmanCoding(string)

Input − A string of characters.

Output − The code for each distinct character.

Begin
   define a node with character, frequency, and left and right children for the Huffman tree.
   create a list ‘freq’ to store the frequency of each character, initially all 0
   for each character ch in the string do
      increase the frequency of ch in the freq list.
   done
   for each possible character ch do
      if the frequency of ch is non-zero then add ch and its frequency as a node to priority queue Q.
   done
   while Q holds more than one node do
      remove the lowest-frequency node from Q and make it the left child of a new node
      remove the next lowest-frequency node from Q and make it the right child of the new node
      set the frequency of the new node to the sum of its children's frequencies and add it to Q
   done
   traverse the tree from the one remaining node (the root) to find the assigned codes
End

### traverseNode(n: node, code)

Input − The node n of the Huffman tree, and the code accumulated by the previous calls

Output − The code assigned to each character

if left child of node n ≠ φ then
   traverseNode(leftChild(n), code+'0') //traverse through the left child
   traverseNode(rightChild(n), code+'1') //traverse through the right child
else
   display the character of the current node and its code.
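The same traversal can collect the codes into a table instead of displaying them. The following is a sketch under stated assumptions: `TreeNode`, `collectCodes` and `makeExampleTree` are hypothetical names, the tree is built by hand for the string “YYYZXXYYX” rather than by the algorithm above, and `std::make_unique` requires C++14.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type: a leaf has no children, an internal node has two.
struct TreeNode {
    char symbol = 0;
    std::unique_ptr<TreeNode> left, right;
};

// Walk the tree, appending '0' for a left edge and '1' for a right edge.
void collectCodes(const TreeNode* n, const std::string& prefix,
                  std::map<char, std::string>& out) {
    assert(n != nullptr);
    if (n->left) {  // internal node: recurse into both children
        collectCodes(n->left.get(), prefix + '0', out);
        collectCodes(n->right.get(), prefix + '1', out);
    } else {        // leaf: the path taken so far is the code
        out[n->symbol] = prefix;
    }
}

// Hand-built tree: Y on the left of the root, Z and X under the right child.
std::unique_ptr<TreeNode> makeExampleTree() {
    auto leaf = [](char c) {
        auto n = std::make_unique<TreeNode>();
        n->symbol = c;
        return n;
    };
    auto inner = std::make_unique<TreeNode>();
    inner->left = leaf('Z');
    inner->right = leaf('X');
    auto root = std::make_unique<TreeNode>();
    root->left = leaf('Y');
    root->right = std::move(inner);
    return root;
}
```

Calling `collectCodes(root.get(), "", codes)` on this tree fills `codes` with Y = 0, Z = 10 and X = 11. Returning a table rather than printing makes the result reusable for encoding.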

## Example

#include <iostream>
#include <queue>
#include <string>
using namespace std;

struct node {
    int freq;
    char data;
    const node *child0, *child1;

    node(char d, int f = -1) { // leaf node: a character and its frequency
        data = d;
        freq = f;
        child0 = NULL;
        child1 = NULL;
    }
    node(const node *c0, const node *c1) { // internal node: frequency is the sum of both children
        data = 0;
        freq = c0->freq + c1->freq;
        child0 = c0;
        child1 = c1;
    }
    bool operator<(const node &a) const { // reversed comparison: lowest frequency gets highest priority
        return freq > a.freq;
    }
    void traverse(string code = "") const {
        if (child0 != NULL) {
            child0->traverse(code + '0'); // append 0 when going to the left child
            child1->traverse(code + '1'); // append 1 when going to the right child
        } else {
            cout << "Data: " << data << ", Frequency: " << freq << ", Code: " << code << endl;
        }
    }
};

void huffmanCoding(string str) {
    if (str.empty()) // nothing to encode; calling top() on an empty queue is undefined
        return;
    priority_queue<node> qu;
    int frequency[256] = {0}; // one counter per possible byte value, all cleared
    for (size_t i = 0; i < str.size(); i++) {
        frequency[(unsigned char)str[i]]++; // cast avoids a negative index for chars >= 128
    }
    for (int i = 0; i < 256; i++) {
        if (frequency[i]) {
            qu.push(node(char(i), frequency[i]));
        }
    }
    // With a single distinct character the tree is one leaf and its code is empty.
    while (qu.size() > 1) {
        node *c0 = new node(qu.top()); // lowest frequency becomes the left child
        qu.pop();
        node *c1 = new node(qu.top()); // next lowest becomes the right child
        qu.pop();
        qu.push(node(c0, c1)); // parent holds the sum of both frequencies
    }
    // Nodes allocated with new are never freed; tolerable only because the program exits right after.
    cout << "The Huffman Code: " << endl;
    qu.top().traverse(); // traverse the tree to print the codes
}

int main() {
    string str = "ACCEBFFFFAAXXBLKE"; // arbitrary string to get frequencies
    huffmanCoding(str);
}

## Output

The Huffman Code:
Data: K, Frequency: 1, Code: 0000
Data: L, Frequency: 1, Code: 0001
Data: E, Frequency: 2, Code: 001
Data: F, Frequency: 4, Code: 01
Data: B, Frequency: 2, Code: 100
Data: C, Frequency: 2, Code: 101
Data: X, Frequency: 2, Code: 110
Data: A, Frequency: 3, Code: 111

Updated on: 05-Aug-2019
