Computer Programming for Data Scientists
- 1 Activity 1 (10 points) Contents
- 1.1 The domain classes (3 points)
- 1.2 The shopping system (3 points)
- 1.3 Doing some shopping (4 points)
- 2 Activity 2 (10 points)
- 2.1 The binary search tree class (1 point)
- 2.2 Operations on BSTs (2 points)
- 2.3 Random trees’ simulation (2 points)
- 2.4 Complexity analysis for BSTs (2 points)
- 2.5 The linked list class (1 point)
- 2.6 Operations on linked lists (1 point)
- 2.7 Comparing the two data types (1 point)
1 Activity 1 (10 points) Contents
This activity requires you to use many of the concepts learned so far on the module. You will develop a simple eCommerce system. The program has to allow the user to:
- add products;
- remove products;
- show a summary of their shopping session;
- export the cart inJSONformat.
The activity is divided in three parts that add up to 10 points. You need to submit one file called
shopping.py.
1.1 The domain classes (3 points)
First create a class Product with the following attributes:
- name;
- price;
- quantity;
- EAN identifier(this is the 13-digit number below the bar code of a product, see alsohttps:
//en.wikipedia.org/wiki/International_Article_Number); - brand.
The class Product also offers atojson()method that returns theJSONrepresentation of the
product.
Then, create three subclasses of the class Product :
- Clothing , which has the following attributes:size, andmaterial;
- Food , which has the following attributes:expirydate,glutenfree, andsuitableforvegans;
and - one additional subclass of your choice. Think about the domain of your system and pick a product you are familiar with. Define at least 2 relevant attributes of that product.
Each subclass of Product must override thetojson()method of the superclass.
1.2 The shopping system (3 points)
Next, create the class ShoppingCart , which is a container of products in a shopping session. The ShoppingCart class offers the following methods:
- addProduct(p), to add a productpto the cart;
- removeProduct(p), to remove the productpfrom the cart;
- showSummary(), to display the content of the cart;
- changeProductQuantity(p, q), to change the quantity of productpto the quantityq.
1.3 Doing some shopping (4 points)
Now that the classes are ready, you need some code to let the user do some shopping. The script is prompting the user to type in commands in a while loop; see Listing 1.
1 | print('The program has started.') |
Listing 1:Structure of the main while loop. The loop can terminate only when the users types the
command T
The script should support the following commands:
- A - allows the user to add a product to the cart (see the example in Listing 2);
- R - allows the user to remove an existing product from the shopping chart (see Listing 3);
- S - the script prints out a summary of the cart, along the lines of the example in Listing 4;
- Q - the user can change the quantity of a product already present in the cart;
- E - the script generates a summary of the cart asJSON, printed to the console. TheJSON
output is an array ofJSONproducts (see the example in Listing 5); - C - at any moment, allows the user to interrupt the shopping session. The program will print
out the following message:“The current operation is cancelled”. Then, the user should be
able to continue with a new command. - T - the script terminates (exiting the while loop);
- H - a request for help from the user. The commands that the program recognises are printed
out to the console (see in Listing 6);
Any other command should print out the following message:“Command not recognised. Please try
again”.
Insert the next command: A
Adding a new product:
Insert its type: Clothing
Insert its name: Gloves
Insert its price (£): 23.
Insert its quantity: 1
Insert its brand: Moret and Spark
Insert its EAN code: 1234567891234
Insert its size: XL
Insert its material: Sink
The product Gloves has been added to the cart.
The cart contains 4 products.
1 | Listing 2:Adding a new product to the cart |
Insert the next command: R
What is the name of the product you want to remove? Shoes
‘Shoes’ is not in the cart.
Insert the next command: R
What is the name of the product you want to remove? Slippers
‘Slippers’ successfully removed from the cart.
The cart contains 3 products.
1 | Listing 3:Removing a product from the cart |
Insert the next command: S
This is the total of the expenses:
1 - Hat =£ 15
2 - 6 Eggs =£2.
3 - 2 Orange juice =£ 5
4 - “Gloves” =£23.
Total =£45.
1 | Listing 4:The summary of an ongoing shopping session |
Insert the next command: E
[{ “type”: \clothing”, \name”: \Hat”,…
1 | Listing 5:JSONexport of the cart content |
Insert the next command: H
The program supports the following commands:
[A] - Add a new product to the cart
[R] - Remove a product from the cart
[S] - Print a summary of the cart
[Q] - Change the quantity of a product
[E] - Export a JSON version of the cart
[C] - Cancel the current operation
[T] - Terminate the program
[H] - List the supported commands
1 | Listing 6:User help request |
2 Activity 2 (10 points)
In Week 4 , we talked about a series of advanced data types such as trees. In particular, we
mentioned binary search trees (or BST), which are trees with the following property: for any
parent node, its data value must be higher than the data from any child node to its left, and lower
than the data of any child node to its right. Creating and maintaining such a tree comes with a cost,
as every time new nodes are added, removed or changed, the developer needs to ensure that this
key property still holds. In return, the tree is fast to search through - finding a value in it takesO(log
n). For largenthis should be faster than a regular linked list.
In this activity, you will implement a binary search tree and study its algorithmic complexity. You will
also create a linked list and compare the complexities of both structures.
2.1 The binary search tree class (1 point)
To begin with, create a Python filebinarysearchtree.pyand implement a TreeNode class. The
class should include attributes and methods to store and manipulate:
- A data value (or cargo) of the node;
- A link to its left child;
- A link to its right child;
- A way to print its value;
Now, create a classBinarySearchTree. It should have:
- The root of the tree as an attribute;
- An optional limit in size;
- A methodisempty()to check if the tree is empty;
- A methodisfull()to check if the tree is full;
2.2 Operations on BSTs (2 points)
Expand your tree with operations. You should implement the following methods:
- A methodsearch(), which receives a data value and searches the tree for it, returning a
Boolean indicating whether or not the item is there. - A methodinsert(), which receives a data value and inserts it as a new node in the tree. The
tree may include duplicate values. - A methoddelete(), which deletes a value from the tree, and mends the tree back together
in case it breaks, keeping the tree’s binary search property intact. - A methodtraverse(), which returns the values stored across the nodes of the tree in as-
cending order. - A methodprinttree()to print a visual representation of the tree. Think about how you
would prefer to visualise a tree if you could only useprint()calls; it doesn’t need to be thestr method.
2.3 Random trees’ simulation (2 points)
You should now have a functioning binary search tree. It is time to study the time complexity of its
search. To achieve this, do the following:
- Create a new script, calledcomplexity.pyand import your BinarySearchTree class;
- Incomplexity.py, create a functionrandomtree(n), which takes an inputnand generates
a tree of sizenby populating it withnrandom integers from 1 to 1000 ; - Then create a listXof equally spaced numbers from 5 to 100 on step size 5 (so, 5 , 10 , 15 etc.)
- For each valuesinX, generate 1000 random trees of sizes, and search them for the value
42 , storing the average time spent by thesearchcall into a listY;
Notes :
- Thetimemodule might be useful here.
- In case your laptop does not have the capacity to run a simulation of this size, reduce ei-
ther the size of the trees or the size ofXor both. In this case, declare two constants in
complexity.py, one calledT REESIZEand the other one calledN U M BEROFTREES
where you specify the numbers that work for you.
2.4 Complexity analysis for BSTs (2 points)
We can now see how the time spent to search in a BST varies according to its sizen. Incomplex-
ity.py, add more code to:
- PlotXagainstY. Xstores links to the trees of different sizes. Y stores the times spent
searching. Use the code in Listing 7 for that. - Add a comment incomplexity.pythat describes in your own words the complexity of BST
search based on the shape of the graph you plotted. Hint : relationships could be e.g. linear,
sub-linear, exponential, quadratic, logarithmic etc as explained in the lectures in Week 4. Start
your comment with the wordsComplexity analysis X vs Y”so that we can find it easily in your
code. - Create a new listY2with estimates of average search timetunder an ideal linear relationship
to the size of the trees inX. Hint : In a linear relationship,t=c∗n+b, wherecandbare
constants. Using the search timetforn= 5andn= 10fromY, you can findcandband
estimate the other values oftinY 2. - In the same way, create a third listY 3 with estimates of average search time under an ideal
logarithmic relationship to the size of the trees inX. Hint : In a logarithmic relationship,
t=c∗log(n) +b, wherecandbare constants. Using the search timetforn= 5andn= 10
fromY, you can findcandband estimate the other values oftinY 3. Uselog(n) =log 2 (n). - Plot the three curves (XagainstY,Y 2 andY 3 ) using the code found in Listing 8.
- Add another comment incomplexity.pythat describes in your words how the initial graph
compares to an ideal linear or logarithmic complexity. What could be the reason wny yourY
line does not follow exactly the same line as e.g.Y 3? Is there any issue with our class Binary
Search Tree or with the way we created tree objects that we’d need to fix so thatYgets closer
1 | to an ideal complexity? Start your comment with the wordsComplexity analysis X vs Y, Y |
1 | Note : Thetimemodule’stime()function records time as seen by the processor. Try not to run |
1 import matplotlib.pyplot as plt
2 plt.plot(X, Y)
3 plt.xlabel(‘Size of trees’)
4 plt.ylabel(‘Search time’)
5 plt.ticklabel_format(axis=’both’, style=’sci’, scilimits=(0,0))
6 plt.show()
1 | Listing 7:How to plot a simpleXvsYgraph, whereXandYare number lists of the same length |
1 import matplotlib.pyplot as plt
2 plt.plot(X, Y)
3 plt.plot(X, Y2)
4 plt.plot(X, Y3)
5 plt.legend([‘BST’,’Linear’,’Logarithmic’])
6 plt.xlabel(‘Size of trees’)
7 plt.ylabel(‘Search time’)
8 plt.ticklabel_format(axis=’both’, style=’sci’, scilimits=(0,0))
9 plt.show()
1 | Listing 8:How to plot a multi-line graph with line labels |
2.5 The linked list class (1 point)
1 | For a better understanding of the importance of complexity, let us now compare BSTs to standard |
- A data value (or cargo) stored in the node;
- A link to its next node;
- A way to print the data value in the node;
Now, create a class LinkedList. It should have:
- The first node as an attribute;
- An optional limit in size;
- A methodisempty()to verify if the list is empty;
- A methodisfull()to verify if the list is full;
- A method str to print a visual representation of the list.
2.6 Operations on linked lists (1 point)
Expand your LinkedList with proper operations. You should implement the following methods:
- A methodsearch(), which receives a data value and searches the list for it, returning a
Boolean indicating whether or not the item is there. - A methodinsert(), which receives a data value as a parameter and inserts it as a new node
in the list. Duplicate values are allowed. - A methoddelete(), which receives a data value as a parameter and deletes its first occur-
rence from the list. - A methodtraverse(), which returns the values of the nodes in the linked order.
2.7 Comparing the two data types (1 point)
Repeat the steps from Section 2.3 using linked lists. Use the same listXto generate the lists and
save the average search times in a new listY 4.
Then analyse the time complexity. To do so:
- PlotXvsY 4 by executing the code found in Listing 9.
- Add a new comment in your code to describe how linked lists compare to BSTs. Start your
comment with the wordsComplexity analysis X vs Y, Y2, Y3 and Y4”so that we can find it
easily in your code.
1 import matplotlib.pyplot as plt
2 plt.plot(X, Y)
3 plt.plot(X, Y2)
4 plt.plot(X, Y3)
5 plt.plot(X, Y4)
6 plt.legend([‘BST’,’Linear’,’Logarithmic’,’LL’])
7 plt.xlabel(‘Size of trees’)
8 plt.ylabel(‘Search time’)
9 plt.ticklabel_format(axis=’both’, style=’sci’, scilimits=(0,0))
10 plt.show()
1 | Listing 9:Adding a new plot to the mix |