CSCI2110-Huffman-Coding

Requirement

The objective of this request is to implement the Huffman coding algorithm using the binary tree data structure.

Download BinaryTree.java, Frequency.txt and Pokemon.txt files given next to the request link.

Problem Summary

You are given a table of letters of the English alphabet and their frequencies. Build a Huffman tree with the alphabet symbols and their probabilities. Derive the Huffman codes. Using the codes, encode a given text file with the codes. Decode the encoded text file and show that it is the same as the input text file.

Problem in Detail

In order to help you with the request, here’s the Huffman algorithm step-by-step procedure (as discussed in the lectures).

Step 1

Read the text file frequency.txt. Its link is given next to this lab document. It contains the frequency of letters in the English alphabet based on a sample of 40,000 words as shown below. (The file actually contains each letter and its frequency on two separate lines).

To do this step, you will find it useful to create a class called Pair.java that defines the letter and its probability as an object.

1
2
3
4
5
6
7
8
public class Pair {
private char letter;
private double prob;

//constructor
//get and set methods
//toString method
}

Step 2

Using this set of letters and frequencies, build the Huffman tree.

Step 3.1

Create a queue of Binary Tree nodes. Each Binary Tree node is of type Pair. The queue can be implemented as another simple Arraylist, where enqueue means adding an item to the end of the Arraylist and dequeue means removing the item at index 0. That is, the queue is an Arraylist of type <BinaryTree>. The queue contains these sorted according to the increasing order of their frequencies. This is your Queue S. This is done by checking the Arraylist freqs for values in increasing order, creating the binary tree nodes and enqueueing them in the queue.

If you enumerate the Queue S, it should have the Pair objects in increasing order of their frequencies

Step 3.2

Now initialize another queue T (another Arraylist) of type <BinaryTree>.

Step 3.3

Build the Huffman tree according to the algorithm discussed in the lectures.

For instance, in the above example, first (‘Z’, 0.07) and (‘J’, 0.10), will be dequeued from S. Create a node with the combined frequency. What do you put as the character for the combined node? You can put a dummy character, say ‘&’. So (‘&’,0.17) will be the parent node, and (‘Z’, 0.07) and (‘J’, 0.10), will be the left and right children. This tree will be enqueued to Queue T.

Submit a zip file containing all the source codes (.java files), Frequency.txt, Pokemon.txt, Encoded.txt and Decoded.txt.