Read the instructions carefully before writing any code!!
In this phase of the request, you will:
- Implement your proposed solution to the problem
- Implement a data structure designed specifically for this application (called a “union find” or “disjoint set”)
- Show that both solutions solve this problem
- Compare the running times of the two data structures on a pathological case
Part 1
Create an interface for the ADT described in phase 1. We’re not going to include listMembers, so don’t include it in your interface!
Part 2
Write a class that implements that interface using the strategy you came up with in phase 1. It should be generic because we’ll be testing with Strings (the stated problem in phase 1), and Integers (for testing timings in the pathological case)
Part 3
Write a “union find” data structure that implements the same interface as your solution. The Union find works as follows:
The union find data structure is a “forest” data structure (a bunch of trees). A node stores a value, its “rank,” and its parent. The rank is essentially the height of the subtree rooted at that node.
Each element in your data structure (in our example problem, an element is a student’s name), will have an associated node. You’ll need to be able to efficiently retrieve an element’s node; think about how you might achieve this.
The ADT operations can be implemented via the following pseudocode:
MakeSet(T data): create a node containing data. It will be its own parent and will have rank 0.
CombineGroup(T a, T b): Compute repA = find(a), repB = find(b). If repA and repB are the same, then a and b are already in the same set and you don’t need to do anything. Otherwise, assume repA lower rank than repB. set repA’s parent to be repB (if repB has lower rank, do the opposite). If the ranks are equal, increase the rank of whichever node is the new parent. For example, if repA and repB both have rank 2, you could set repA’s parent to be repB, and repB’s new rank would be 3 (repA’s rank doesn’t change).
Find(T data): You probably want to implement this using a helper method that takes and returns a node:1
21
Node<T> findNode(Node<T> n):
if n.parent == n, then n is the representative of its group, so return it
otherwise: set n.parent to findNode(n.parent) and then return n’s new parent.
This method “flattens” all the nodes in the tree on the path from n to the root (its representative), and make any subsequent find operations super fast. After calling find, n and all of its ancestors will be connected directly to the root!
GetGroup(T data): this method can be implemented using a traversal
Part 3
Write a function called groupStudents that takes the interface you defined as a parameter. This will allow you to call the function with either YOUR implementation OR the UnionFind data structure. This function should load the input file, use the data structure to compute and then print the list of students in each group. Note: there is no operation in the ADT that gives you the list of groups, so you’ll have to figure out how you can use the supported operations to do so.
Here is a sample input file: sampleInput.txt
Part 4
Compare the two implementations for the pathological case described below. In this example, all of the elements will be combined into a single group as slowly as possible.
The elements will be the integers from 0 to n.
Combine the groups as follows until there is a single group
First combine 0 and 1, 2 and 3, 4 and 5, etc, so you have n/2 groups with 2 elements each
Then combine 0 and 2, 4 and 6, 8 and 10, etc. You’ll have n/4 groups with 4 elements each repeat this process until everything is in a single group.
This test case will require lots and lots of unions with groups of equal size, which will be the worst case for either implementation for the ADT.
Time this testcase for various values of n, for both implementations and report your results.