TMA02 – Python

**Question 1 (30 marks)**

*The **TMA questions and guidance** page has all the files mentioned in this question.*

Imagine you are part of a team developing a crossword app that helps the user, e.g. by listing all 3-letter English words ending in ‘t’. For such a feature, it is better to store dictionary words sorted by length instead of alphabetically.

The team lead asks you to write a function that takes a non-empty list of non-empty strings (e.g. [‘hi’, ‘this’, ‘is’, ‘Jane’]) and returns a new list with the same strings in increasing length. Words of the same length must appear in the original order (that’s called a **stable** sorting). For this example, the result is [‘hi’, ‘is’, ‘this’, ‘Jane’]. You are asked to use this **bucket sort** algorithm:

- Compute the maximum length
*L*of the words in the list. For the example,*L*is 4. - Create a list with
*L*buckets, each bucket being initially the empty list. For the example, this step creates [[], [], [], []]. - Put each word in the right bucket: words of length 1 (like ‘a’ and ‘I’) go in the first bucket, words of length 2 go in the second bucket, etc. For the example, the result of this step is [[], [‘hi’, ‘is’], [], [‘this’, ‘Jane’]]. The first and third buckets remain empty.
- Compute the sorted list by first adding the words in bucket 1, then those in bucket 2, etc. For the example, the result of this step is [‘hi’, ‘is’, ‘this’, ‘Jane’].

*This part is about Python’s list operations (**Python activity 1.5**).*

Step 2 has been implemented for you in file TMA02_Q1.py. Implement the rest of the algorithm by replacing the pass statements by your code (about 12 lines). You can only use the Python list operations given in the Companion’s appendix. Copy the completed bucket_sorted function into your Solution Document.

We advise you add one step at a time and test it with file TMA02_Q1_tests.py. You should add your own tests. Your tutor will use a **different** test file, so make sure your code solves the general problem, not just the problem instances (tests) we provide. **(12 marks)**

- Explain how your implementation guarantees the sort to be stable, i.e. keeps the original order of words of the same length.
**(2 marks)** *This part is about algorithmic complexity (**Unit 2 Section 4.1**and the**Companion**).*

State and explain the worst-case Big-O complexity of your bucket sort implementation in terms of the number of words *n*, i.e. the length of the input list. The unit of computation is the assignment. Use the complexities in the Companion’s appendix.

Explain also why the problem size is just the number of words and not dependent on the number of buckets *L* too. **(5 marks)**

*This part is about heap sort (**Unit 3 Section 4.2**) and the professional skill of comparing alternative approaches.*

A team member suggests it would be easier to just take heap sort and in the percolation process replace every comparison of strings in the heap by the comparison of their lengths, i.e. x < y becomes len(x) <len(y). What is the worst-case complexity of the modified algorithm? Explain whether you would follow the team member’s suggestion. **(3 marks)**

*This part is about recursion (**Unit 3 Section 3.2**and**Companion**section ‘Recursion’).*

Rewrite function empty_buckets in file TMA02_Q1.py so that it is recursive. Add comments to indicate each of the base case and the reduction, recursive and inductive steps, as done in the Companion. Copy your completed function into your Solution Document. Use again file TMA02_Q1_tests.py to check your code. Don’t forget to add your own tests. **(8 marks)**

Python Question 2

Python Question 2

**(20 marks)**

*The **TMA questions and guidance** page has all the files mentioned in this question.*

A **bag **is an unordered collection that may contain duplicate items. File TMA02_Q2a.pyimplements a Bag ADT with the following operations:

- Bag()creates a new empty bag
- add(self, item)adds one copy of item to the bag self
- count(self, item)returns how often item occurs in bag self
- size(self) counts the total number of copies in bag self
- clear(self, item)removes all copies of item from bag self
- ordered(self)returns the content of bag self as a sequence of pairs (count, item), in decreasing order of count

*This part is about Python’s list operations (**Python activity 1.5** and **Companion**) and relates to the professional skill of reengineering code to improve its performance. *

*This part is about Python’s list operations (**Python activity 1.5**and**Companion**) and relates to the professional skill of reengineering code to improve its performance.*

File TMA02_Q2a.py represents a bag as an unsorted list, with as many copies of each item as there are in the bag. Completely rewrite the clear method, so that it has O(n)worst-case complexity. Only use the list operations in the Companion’s appendix. Check your code with TMA02_Q2_tests.py. We recommend you add your own tests. Your tutor will use a **different** test file. Copy the rewritten method to your Solution Document. Briefly justify why it has worst-case O(n) complexity. **(6 marks)**

*This part is about Python dictionaries (**Companion**section ‘Hash tables’).*

Reimplement the Bag ADT using Python dictionaries, by completing file TMA02_Q2b.py. You can only use the operations listed in the Companion’s appendix. Change the import statement in TMA02_Q2_tests.py to test your code. Copy the methods into your Solution Document. **(10 marks)**

*This part relates to the professional skill of choosing appropriate data structures.*

File TMA02_Q2c.py creates a bag with the words in Shakespeare’s *Hamlet* and prints the 50 most frequent words and their count. (Doing this for several texts can indicate the most common English words, e.g. to teach beginners.)

Running the code, you will note that the most common words are mainly stop words(articles, prepositions, etc.) and the play’s characters (Hamlet, Horatio, etc.). After reading the code, describe briefly how you would change it, and what data structure you would use, to prevent stop words and character names from being added to the bag. Justify your approach. Assume there are 18 character names in the play (lines 70-77 of hamlet.txt) and about 120 stop words (taken from this list). You can use any data structure from Units 1–5 of M269 and its operations. Assume the complexities listed in the Companion’s appendix. Note you are only asked to describe your changes,** not** to write or submit any code. **(4 marks)**

**Question 3 (25 marks)**

*This question is about graphs and their algorithms (**Unit 5 Section 2** and the **Companion**). *

- Consider an ADT for undirected graphs, named UGraph, that includes these operations:
- nodes, which returns a sequence of all nodes in the graph, in no particular order
- has_edge, which takes two nodes and returns true only if there is an edge between those nodes
- edges, which returns a sequence of node-node pairs (tuples), in no particular order. Each edge only appears once in the returned sequence, i.e. if the pair (node1, node2) is in the sequence, the pair (node2, node1) is not.

How each node is represented is irrelevant. Because the graph is undirected, has_edge(node1, node2) and has_edge(node2, node1) return the same. You can assume the graph is connected and has no edge between a node and itself.

*This part is about algorithmic complexity (**Unit 2 Section 4.1**and**Companion**)*

The following stand-alone Python function checks if an undirected graph is complete, i.e. if each node is connected to every other node. It assumes the ADT is implemented as a Python class.

- defis_complete(graph):
- nodes = graph.nodes()
- for node1 in nodes:
- for node2 in nodes:
- edge_exists = graph.has_edge(node1, node2)
- if node1 != node2 and not edge_exists:
- return False
- return True
- Assume that nodeshas complexity O(n), where
*n*is the number of nodes, and graph.has_edge has complexity O(1). State and justify a best-case scenario and a worst-case scenario for the above function. State and explain the Big-O complexity for each scenario. Assume the basic computational step is the assignment. State explicitly any other assumptions you make.**(7 marks)** *This part is about ADT specifications (**Unit 2 Section 3.6**and**Companion**).*

In graph theory, the number of nodes in a graph is called the *order *of the graph. The term ‘order’ is unrelated to sorting. Specify the problem of computing the order, as a UGraph operation.

**Creator / Inspector / Modifier**(delete as appropriate): order**Inputs:****Preconditions:****Outputs:****Postconditions:**- Justify the kind of operation.
**(5 marks)** *This part is about initial insights (**Unit 2 Section 2.3**and**Companion**).*

Give your initial insight for an algorithm that solves the order problem. You can use any of the M269 data structures listed in the Companion’s appendix but of the ADT operations given above you may only use edges. **(4 marks)**

- A city council is planning the city’s bus routes. It has decided which places will have a bus stop (schools, cinemas, hospital, etc.). Each bus route will start from the train station, visit a number of bus stops, and then return to the station, visiting the same bus stops in reverse order. Each bus stop has to be served by at least one bus route. The council wants to minimise the total amount of time that
*all*buses are on the road when following their routes.

This problem can be solved by first defining a graph and then applying to it a graph algorithm to obtain the answer.

*This part is about modelling with graphs (**Companion**).*

Define a graph that is suitable for this problem, by addressing the following questions. Justify your answers. Explicitly state any assumptions you make.

- What do the nodes and edges represent?
- Are edges directed? If so, what does the direction represent?
- Are edges weighted? If so, what do the weights represent?
- Is the graph acyclic or cyclic?
- Is the graph dense or sparse?
**(6 marks)** *This question is about graph algorithms (**Unit 5 Section 2**and the**Companion**).*

What algorithm(s) would you adopt or adapt to efficiently compute the bus routes according to the council’s criterion? You do not need to explain any algorithm given in M269, only how it could be used or modified to solve the problem. State explicitly any assumptions you make. **(3 marks)**