Exercise 4 Constructor for FrequencyList
The Huffman tree algorithm uses a list of trees, each tree storing a number of Frequency records. The way we’ll manage this list is to build a “wrapper” ADT around the List ADT.
The file FrequencyList.cc contains the implementation of this wrapper ADT. Notice that most of the List operations are simply delegations to the contained List.
We want to give the wrapper a new constructor operation, and a new operation called remove_smallest(), which is the next exercise. Because these are very specific operations for the Huffman tree algorithm, it is better to build a wrapper ADT than to put these special purpose operations in the List ADT.
The FrequencyList constructor currently looks like this:
1 | // createFrequencyList(message) |
The constructor should do the following:
- Walk through the message, character by character, counting how many times each character is used. Use an integer array for this.
- Once the whole message has been seen, use the array to build a Frequency record for each character seen at least once.
- Build a TreeNode for each Frequency record, and then store the TreeNode in the List. Each Tree will be a leaf node for now. We’ll use the initialized list in a later exercise.
- Return the FrequencyList full of TreeNodes.
In the lecture slides on Huffman trees, the character and frequency data were attributes of the Tree record itself.
In this request, we are using Frequency records to store the character and frequency values. This adds another ADT to think about, but it allows us to reuse the ordinary tree that simply stores a single Element. This is good software reuse (of Trees) and adaptability: if we want to store other data in the Tree, we can change the Frequency ADT, and most if not all of the rest of the application would not need to change.
Hints on counting characters
Each character you can type or use on a computer has an integer value; for letters and digits and normal keyboard symbols, this is called the ASCII value. For example, the letter ‘c’ has an ASCII value of 99. Because of this, we can use a character as an index into a characeter count array like this: counts[‘c’]. We can do this for any character, whether it’s ‘c’ or ‘&’.
To calculate the frequency of each letter in a message, we use an array of counts (initialized to 0). Each time we observe a character in a message, we can increase its count. For example to increase the count for the letter ‘c’, we would do the following:1
counts['c'] += 1;
This will increase the count of the integer at index 99 (the one corresponding to the letter ‘c’). The letter ‘c’ is the 99th character in the ASCII character set, and its counter is the 100th in our counts array. To initialize the counts array, we have the following:
1 | int counts[ASCII_SIZE]; // ASCII_SIZE is defined in LOCALE.h |
Here we’re using integers, not characters, to index into the array. This is okay, because in a sense, characters are integers. Characters have special status when we want to think of them as characters, but they also have integer values which we use when that’s handy.
One problem remains: with compiler flags -Wall and -pedantic, the compiler may complain when you use a character as an array index:
warning: array subscript has type ‘char’ [-Wchar-subscripts]
To get rid of these warnings, we can use a technique called typecasting. Typecasting tells the compiler to treat a variable of one type as though it has a different type. For example, we can typecast a character to an integer as follows:1
counts[(int) 'c'] += 1;
The use of (int) in brackets in front of a value tells the compiler that it’s not a mistake to try to use the value ‘c’ as an integer (99). Typecasting can also be done when you want to force a float to be truncated into a integer, or something like that. We’ll only use it for characters in this asignment.
Exercise Summary
- Complete createFrequencyList() as indicated.
- Build the testADT app, and run it.
- If your code does the right things, tests 33-34 should all report “Passed.”