Exercise 7 Completing the Huffman Codec ADT
To complete the C The other ADT in Huffman.cc is called the Huffman Codec. The word “codec” is a contraction of two words: “COde” and “DECode”. This ADT does the work of coding and decoding using a Huffman tree. The Huffman Codec is a record that stores two records: the Huffman Tree thatwas created in the previous exercise, and a codebook, which was created by analysing the Huffman tree. The codebook has a place to store a C-string for every ASCII character, though most of the codebook will be filled with NULL references; only the characters in the original message will have a code. The function find_codes() builds the codebook using the Huffman Tree.
In this exercise, you’ll complete the encode() operations for the Huffman Codec. Here’s what it looks like currently:
1 | // encode(hcdc, message) |
Your job is to complete this function.
Hints
The operation has to encode each character in the given message. The output of this function is a reference to a C-string, so we have to allocate a C-string big enough to fill with the encoded message. Use a large size, just to be sure.
To encode the message, you have to look at each character in the message, and copy its code from the Codec’s codebook to the encoded C-string. It’s possible to use strcat() for this, but it’s just as easy to write your own loop to copy the code for a character from the codebook to the coded C-string. Remember that you’re looking at the message one character at a time, but you are copying several characters into the coded C-string.
The encoded C-string is probably twice as long as the original message. That’s because we’re using ‘0’ and ‘1’ in the codebook; a real compression utility would use bits 0 and 1, not characters. That’s okay for us; seeing the code is much easier to debug.
Exercise Summary
Complete encode() as indicated.
Build the testADT app, and run it.
If your code does the right things, tests 45-48 should all report “Passed.”
Exercise 8 Completing the Huffman Codec ADT
To complete the Huffman Codec, we need an algorithm to decode a coded string. The operation to decode a coded string is called decode(). We’ve given it to you in the file Huffman.cc. It calls a function called decode_char(), whose job it is to step through the Huffman tree, starting at the root, and winding up at a leaf; decode_char() returns the character stored in the Frequency record stored at the leaf. (We cannot use the codebook for decoding! We have to use the Huffman Tree. To decode a character using the Huffman tree requires us to step down from tbhe root to a leaf, which should be fairly short. The complexity of stepping through the tree is O(h) where h is the height of the Huffman Tree. If we used the codebook, we’d have to look at each possible code, and try to figure out if the code we’re looking at is a code in the codebook. This would require at least O(nh) time, where n is the number of characters in the message. So use the tree!
Here’s what decode_char() looks like currently:
1 | // decode_char(tnode, message, d) { |
Your job is to complete this function.