# Assignment 1: Auto Correct

Welcome to the first assignment of Course 2. This assignment will give you a chance to brush up on your python and probability skills. In doing so, you will implement an auto-correct system that is very effective and useful.

## 0. Overview

You use autocorrect every day on your cell phone and computer. In this assignment, you will explore what really goes on behind the scenes. Of course, the model you are about to implement is not identical to the one used in your phone, but it is still quite good.

By completing this assignment you will learn how to:

• Get a word count given a corpus
• Get a word probability in the corpus
• Manipulate strings
• Filter strings
• Implement Minimum edit distance to compare strings and to help find the optimal path for the edits.
• Understand how dynamic programming works

Similar systems are used everywhere.

• For example, if you type in the word “I am lerningg”, chances are very high that you meant to write “learning”, as shown in Figure 1. Figure 1

#### 0.1 Edit Distance

In this assignment, you will implement models that correct words that are 1 and 2 edit distances away.

• We say two words are n edit distance away from each other when we need n edits to change one word into another.

An edit could consist of one of the following options:

• Delete (remove a letter): ‘hat’ => ‘at, ha, ht’
• Switch (swap 2 adjacent letters): ‘eta’ => ‘eat, tea,…’
• Replace (change 1 letter to another): ‘jat’ => ‘hat, rat, cat, mat, …’
• Insert (add a letter): ‘te’ => ‘the, ten, ate, …’

You will be using the four methods above to implement an Auto-correct.

• To do so, you will need to compute probabilities that a certain word is correct given an input.

This auto-correct you are about to implement was first created by Peter Norvig in 2007.

The goal of our spell check model is to compute the following probability:

The equation above is Bayes Rule.

• Equation 1 says that the probability of a word being correct $P(c|w)$is equal to the probability of having a certain word $w$, given that it is correct $P(w|c)$, multiplied by the probability of being correct in general $P(C)$ divided by the probability of that word $w$ appearing $P(w)$ in general.
• To compute equation 1, you will first import a data set and then create all the probabilities that you need using that data set.

# Part 1: Data Preprocessing

As in any other machine learning task, the first thing you have to do is process your data set.

• Many courses load in pre-processed data for you.
• However, in the real world, when you build these NLP systems, you load the datasets and process them.
• So let’s get some real world practice in pre-processing the data!

Your first task is to read in a file called ‘shakespeare.txt’ which is found in your file directory. To look at this file you can go to File ==> Open.

### Exercise 1

Implement the function process_data which

1) Reads in a corpus (text file)

2) Changes everything to lowercase

3) Returns a list of words.

#### Options and Hints

• If you would like more of a real-life practice, don’t open the ‘Hints’ below (yet) and try searching the web to derive your answer.
• If you want a little help, click on the green “General Hints” section by clicking on it with your mouse.
• If you get stuck or are not getting the expected results, click on the green ‘Detailed Hints’ section to get hints for each step that you’ll take to complete this function.

General Hints

General Hints to get started

Detailed Hints

Detailed hints if you're stuck

• Use 'with' syntax to read a file
• Choose whether to use either str.lower() or str.lowercase(). What is the difference?
• Use re.findall(pattern, string)
• Look for the "Raw String Notation" section in the Python 're' documentation to understand the difference between r'\W', r'\W' and '\\W'.
• For the pattern, decide between using '\s', '\w', '\s+' or '\w+'. What do you think are the differences?

Note, in the following cell, ‘words’ is converted to a python set. This eliminates any duplicate entries.

The first ten words in the text are:
['o', 'for', 'a', 'muse', 'of', 'fire', 'that', 'would', 'ascend', 'the']
There are 6116 unique words in the vocabulary.


### Exercise 2

Implement a get_count function that returns a dictionary

• The dictionary’s keys are words
• The value for each word is the number of times that word appears in the corpus.

For example, given the following sentence: “I am happy because I am learning”, your dictionary should return the following:

 Key Value I 2 am 2 happy 1 because 1 learning 1

Instructions:
Implement a get_count which returns a dictionary where the key is a word and the value is the number of times the word appears in the list.

Hints

• Try implementing this using a for loop and a regular dictionary. This may be good practice for similar coding interview questions
• You can also use defaultdict instead of a regualr dictionary, along with the for loop
• Otherwise, to skip using a for loop, you can use Python's Counter class

There are 6116 key values pairs
The count for the word 'thee' is 240


### Exercise 3

Given the dictionary of word counts, compute the probability that each word will appear if randomly selected from the corpus of words.

where

$C(w_i)$ is the total number of times $w_i$ appears in the corpus.

$M$ is the total number of words in the corpus.

For example, the probability of the word ‘am’ in the sentence ‘I am happy because I am learning’ is:

Instructions: Implement get_probs function which gives you the probability
that a word occurs in a sample. This returns a dictionary where the keys are words, and the value for each word is its probability in the corpus of words.

Hints

• Use dictionary.values()
• Use sum()
• The cardinality (number of words in the corpus should be equal to len(word_l). You will calculate this same number, but using the word count dictionary.
If you're using a for loop:
• Use dictionary.keys()
If you're using a dictionary comprehension:
• Use dictionary.items()

Length of probs is 6116
P('thee') is 0.0045


# Part 2: String Manipulations

Now, that you have computed $P(w_i)$ for all the words in the corpus, you will write a few functions to manipulate strings so that you can edit the erroneous strings and return the right spellings of the words. In this section, you will implement four functions:

• delete_letter: given a word, it returns all the possible strings that have one character removed.
• switch_letter: given a word, it returns all the possible strings that have two adjacent letters switched.
• replace_letter: given a word, it returns all the possible strings that have one character replaced by another different letter.
• insert_letter: given a word, it returns all the possible strings that have an additional character inserted.

#### List comprehensions

String and list manipulation in python will often make use of a python feature called list comprehensions. The routines below will be described as using list comprehensions, but if you would rather implement them in another way, you are free to do so as long as the result is the same. Further, the following section will provide detailed instructions on how to use list comprehensions and how to implement the desired functions. If you are a python expert, feel free to skip the python hints and move to implementing the routines directly.

Python List Comprehensions embed a looping structure inside of a list declaration, collapsing many lines of code into a single line. If you are not familiar with them, they seem slightly out of order relative to for loops. Figure 2

The diagram above shows that the components of a list comprehension are the same components you would find in a typical for loop that appends to a list, but in a different order. With that in mind, we’ll continue the specifics of this assignment. We will be very descriptive for the first function, deletes(), and less so in later functions as you become familiar with list comprehensions.

### Exercise 4

Instructions for delete_letter(): Implement a delete_letter() function that, given a word, returns a list of strings with one character deleted.

For example, given the word nice, it would return the set: {‘ice’, ‘nce’, ‘nic’, ‘nie’}.

Step 1: Create a list of ‘splits’. This is all the ways you can split a word into Left and Right: For example,
‘nice is split into : [('', 'nice'), ('n', 'ice'), ('ni', 'ce'), ('nic', 'e'), ('nice', '')]
This is common to all four functions (delete, replace, switch, insert). Figure 3

Step 2: This is specific to delete_letter. Here, we are generating all words that result from deleting one character.
This can be done in a single line with a list comprehension. You can makes use of this type of syntax:
[f(a,b) for a, b in splits if condition]

For our ‘nice’ example you get:
[‘ice’, ‘nce’, ‘nie’, ‘nic’] Figure 4

#### Levels of assistance

Try this exercise with these levels of assistance.

• We hope that this will make it both a meaningful experience but also not a frustrating experience.
• Start with level 1, then move onto level 2, and 3 as needed.

• Level 1. Try to think this through and implement this yourself.
• Level 2. Click on the “Level 2 Hints” section for some hints to get started.
• Level 3. If you would prefer more guidance, please click on the “Level 3 Hints” cell for step by step instructions.
• If you are still stuck, look at the images in the “list comprehensions” section above.

Level 2 Hints

Level 3 Hints

• splits: Use array slicing, like my_str[0:2], to separate a string into two pieces.
• Do this in a loop or list comprehension, so that you have a list of tuples.
• For example, "cake" can get split into "ca" and "ke". They're stored in a tuple ("ca","ke"), and the tuple is appended to a list. We'll refer to these as L and R, so the tuple is (L,R)
• When choosing the range for your loop, if you input the word "cans" and generate the tuple ('cans',''), make sure to include an if statement to check the length of that right-side string (R) in the tuple (L,R)
• deletes: Go through the list of tuples and combine the two strings together. You can use the + operator to combine two strings
• When combining the tuples, make sure that you leave out a middle character.
• Use array slicing to leave out the first character of the right substring.

input word cans,
split_l = [('', 'cans'), ('c', 'ans'), ('ca', 'ns'), ('can', 's'), ('cans', '')],
delete_l = ['ans', 'cns', 'cas', 'can']


#### Note 1

You might get a slightly different result with split_l.

• Notice how it has the extra tuple ('cans', '').
• This will be fine as long as you have checked the size of the right-side substring in tuple (L,R).
• Can you explain why this will give you the same result for the list of deletion strings (delete_l)?

#### Note 2

If you end up getting the same word as your input word, like this:

• Check how you set the range.
• See if you check the length of the string on the right-side of the split.
Number of outputs of delete_letter('at') is 2


### Exercise 5

Instructions for switch_letter(): Now implement a function that switches two letters in a word. It takes in a word and returns a list of all the possible switches of two letters that are adjacent to each other.

• For example, given the word ‘eta’, it returns {‘eat’, ‘tea’}, but does not return ‘ate’.

Step 1: is the same as in delete_letter()
Step 2: A list comprehension or for loop which forms strings by swapping adjacent letters. This is of the form:
[f(L,R) for L, R in splits if condition] where ‘condition’ will test the length of R in a given iteration. See below. Figure 5

#### Levels of difficulty

Try this exercise with these levels of difficulty.

• Level 1. Try to think this through and implement this yourself.
• Level 2. Click on the “Level 2 Hints” section for some hints to get started.
• Level 3. If you would prefer more guidance, please click on the “Level 3 Hints” cell for step by step instructions.

Level 2 Hints

Level 3 Hints

• splits: Use array slicing, like my_str[0:2], to separate a string into two pieces.
• Splitting is the same as for delete_letter
• To perform the switch, go through the list of tuples and combine four strings together. You can use the + operator to combine strings
• The four strings will be the left substring from the split tuple, followed by the first (index 1) character of the right substring, then the zero-th character (index 0) of the right substring, and then the remaining part of the right substring.
• Unlike delete_letter, you will want to check that your right substring is at least a minimum length. To see why, review the previous hint bullet point (directly before this one).

Input word = eta
split_l = [('', 'eta'), ('e', 'ta'), ('et', 'a'), ('eta', '')]
switch_l = ['tea', 'eat']


#### Note 1

You may get this:

• Notice how it has the extra tuple ('eta', '').
• This is also correct.
• Can you think of why this is the case?

#### Note 2

If you get an error

• Please see if you have checked the length of the strings when switching characters.
Number of outputs of switch_letter('at') is 1


### Exercise 6

Instructions for replace_letter(): Now implement a function that takes in a word and returns a list of strings with one replaced letter from the original word.

Step 1: is the same as in delete_letter()

Step 2: A list comprehension or for loop which form strings by replacing letters. This can be of the form:
[f(a,b,c) for a, b in splits if condition for c in string] Note the use of the second for loop.
It is expected in this routine that one or more of the replacements will include the original word. For example, replacing the first letter of ‘ear’ with ‘e’ will return ‘ear’.

Step 3: Remove the original input letter from the output.

Hints

• To remove a word from a list, first store its contents inside a set()
• Use set.discard('the_word') to remove a word in a set (if the word does not exist in the set, then it will not throw a KeyError. Using set.remove('the_word') throws a KeyError if the word does not exist in the set.

Input word = can
split_l = [('', 'can'), ('c', 'an'), ('ca', 'n'), ('can', '')]
replace_l ['aan', 'ban', 'caa', 'cab', 'cac', 'cad', 'cae', 'caf', 'cag', 'cah', 'cai', 'caj', 'cak', 'cal', 'cam', 'cao', 'cap', 'caq', 'car', 'cas', 'cat', 'cau', 'cav', 'caw', 'cax', 'cay', 'caz', 'cbn', 'ccn', 'cdn', 'cen', 'cfn', 'cgn', 'chn', 'cin', 'cjn', 'ckn', 'cln', 'cmn', 'cnn', 'con', 'cpn', 'cqn', 'crn', 'csn', 'ctn', 'cun', 'cvn', 'cwn', 'cxn', 'cyn', 'czn', 'dan', 'ean', 'fan', 'gan', 'han', 'ian', 'jan', 'kan', 'lan', 'man', 'nan', 'oan', 'pan', 'qan', 'ran', 'san', 'tan', 'uan', 'van', 'wan', 'xan', 'yan', 'zan']


#### Expected Output**:

• Note how the input word ‘can’ should not be one of the output words.

#### Note 1

If you get something like this:

• Notice how split_l has an extra tuple ('can', ''), but the output is still the same, so this is okay.

#### Note 2

If you get something like this:

• Notice how there are strings that are 1 letter longer than the original word, such as cana.
• Please check for the case when there is an empty string '', and if so, do not use that empty string when setting replace_l.
Number of outputs of switch_letter('at') is 1


### Exercise 7

Instructions for insert_letter(): Now implement a function that takes in a word and returns a list with a letter inserted at every offset.

Step 1: is the same as in delete_letter()

Step 2: This can be a list comprehension of the form:
[f(a,b,c) for a, b in splits if condition for c in string]

Input word at
split_l = [('', 'at'), ('a', 't'), ('at', '')]
insert_l = ['aat', 'bat', 'cat', 'dat', 'eat', 'fat', 'gat', 'hat', 'iat', 'jat', 'kat', 'lat', 'mat', 'nat', 'oat', 'pat', 'qat', 'rat', 'sat', 'tat', 'uat', 'vat', 'wat', 'xat', 'yat', 'zat', 'aat', 'abt', 'act', 'adt', 'aet', 'aft', 'agt', 'aht', 'ait', 'ajt', 'akt', 'alt', 'amt', 'ant', 'aot', 'apt', 'aqt', 'art', 'ast', 'att', 'aut', 'avt', 'awt', 'axt', 'ayt', 'azt', 'ata', 'atb', 'atc', 'atd', 'ate', 'atf', 'atg', 'ath', 'ati', 'atj', 'atk', 'atl', 'atm', 'atn', 'ato', 'atp', 'atq', 'atr', 'ats', 'att', 'atu', 'atv', 'atw', 'atx', 'aty', 'atz']
Number of strings output by insert_letter('at') is 78


#### Note 1

If you get a split_l like this:

• Notice that split_l is missing the extra tuple (‘at’, ‘’). For insertion, we actually WANT this tuple.
• The function is not creating all the desired output strings.
• Check the range that you use for the for loop.

#### Note 2

If you see this:

• Even though you may have fixed the split_l so that it contains the tuple ('at', ''), notice that you’re still missing some output strings.
• Notice that it’s missing strings such as ‘ata’, ‘atb’, ‘atc’ all the way to ‘atz’.
• To fix this, make sure that when you set insert_l, you allow the use of the empty string ''.
Number of outputs of insert_letter('at') is 78


# Part 3: Combining the edits

Now that you have implemented the string manipulations, you will create two functions that, given a string, will return all the possible single and double edits on that string. These will be edit_one_letter() and edit_two_letters().

## 3.1 Edit one letter

### Exercise 8

Instructions: Implement the edit_one_letter function to get all the possible edits that are one edit away from a word. The edits consist of the replace, insert, delete, and optionally the switch operation. You should use the previous functions you have already implemented to complete this function. The ‘switch’ function is a less common edit function, so its use will be selected by an “allow_switches” input argument.

Note that those functions return lists while this function should return a python set. Utilizing a set eliminates any duplicate entries.

Hints

• Each of the functions returns a list. You can combine lists using the + operator.
• To get unique strings (avoid duplicates), you can use the set() function.

input word at
edit_one_l
['a', 'aa', 'aat', 'ab', 'abt', 'ac', 'act', 'ad', 'adt', 'ae', 'aet', 'af', 'aft', 'ag', 'agt', 'ah', 'aht', 'ai', 'ait', 'aj', 'ajt', 'ak', 'akt', 'al', 'alt', 'am', 'amt', 'an', 'ant', 'ao', 'aot', 'ap', 'apt', 'aq', 'aqt', 'ar', 'art', 'as', 'ast', 'ata', 'atb', 'atc', 'atd', 'ate', 'atf', 'atg', 'ath', 'ati', 'atj', 'atk', 'atl', 'atm', 'atn', 'ato', 'atp', 'atq', 'atr', 'ats', 'att', 'atu', 'atv', 'atw', 'atx', 'aty', 'atz', 'au', 'aut', 'av', 'avt', 'aw', 'awt', 'ax', 'axt', 'ay', 'ayt', 'az', 'azt', 'bat', 'bt', 'cat', 'ct', 'dat', 'dt', 'eat', 'et', 'fat', 'ft', 'gat', 'gt', 'hat', 'ht', 'iat', 'it', 'jat', 'jt', 'kat', 'kt', 'lat', 'lt', 'mat', 'mt', 'nat', 'nt', 'oat', 'ot', 'pat', 'pt', 'qat', 'qt', 'rat', 'rt', 'sat', 'st', 't', 'ta', 'tat', 'tt', 'uat', 'ut', 'vat', 'vt', 'wat', 'wt', 'xat', 'xt', 'yat', 'yt', 'zat', 'zt']

The type of the returned object should be a set <class 'set'>
Number of outputs from edit_one_letter('at') is 129


## Part 3.2 Edit two letters

### Exercise 9

Now you can generalize this to implement to get two edits on a word. To do so, you would have to get all the possible edits on a single word and then for each modified word, you would have to modify it again.

Instructions: Implement the edit_two_letters function that returns a set of words that are two edits away. Note that creating additional edits based on the edit_one_letter function may ‘restore’ some one_edits to zero or one edits. That is allowed here. This accounted for in get_corrections.

Hints

• You will likely want to take the union of two sets.
• You can either use set.union() or use the '|' (or operator) to union two sets
• See the documentation Python sets for examples of using operators or functions of the Python set.

Number of strings with edit distance of two: 2654
First 10 strings ['', 'a', 'aa', 'aaa', 'aab', 'aac', 'aad', 'aae', 'aaf', 'aag']
Last 10 strings ['zv', 'zva', 'zw', 'zwa', 'zx', 'zxa', 'zy', 'zya', 'zz', 'zza']
The data type of the returned object should be a set <class 'set'>
Number of strings that are 2 edit distances from 'at' is 7154


## Part 3-3: suggest spelling suggestions

Now you will use your edit_two_letters function to get a set of all the possible 2 edits on your word. You will then use those strings to get the most probable word you meant to type aka your typing suggestion.

### Exercise 10

Instructions: Implement get_corrections, which returns a list of zero to n possible suggestion tuples of the form (word, probability_of_word).

Step 1: Generate suggestions for a supplied word: You’ll use the edit functions you have developed. The ‘suggestion algorithm’ should follow this logic:

• If the word is in the vocabulary, suggest the word.
• Otherwise, if there are suggestions from edit_one_letter that are in the vocabulary, use those.
• Otherwise, if there are suggestions from edit_two_letters that are in the vocabulary, use those.
• Otherwise, suggest the input word.*
• The idea is that words generated from fewer edits are more likely than words with more edits.

Note:

• Edits of one or two letters may ‘restore’ strings to either zero or one edit. This algorithm accounts for this by preferentially selecting lower distance edits first.

#### Short circuit

In Python, logical operations such as and and or have two useful properties. They can operate on lists and they have ‘short-circuit’ behavior. Try these:

[]
['a', 'b']
['Most', 'Likely']
['least', 'of', 'all']


The logical or could be used to implement the suggestion algorithm very compactly. Alternately, if/then constructs could be used.

Step 2: Create a ‘best_words’ dictionary where the ‘key’ is a suggestion and the ‘value’ is the probability of that word in your vocabulary. If the word is not in the vocabulary, assign it a probability of 0.

Step 3: Select the n best suggestions. There may be fewer than n.

Hints

• edit_one_letter and edit_two_letters return *python sets*.
• Sets have a handy set.intersection feature
• To find the keys that have the highest values in a dictionary, you can use the Counter dictionary to create a Counter object from a regular dictionary. Then you can use Counter.most_common(n) to get the n most common keys.
• To find the intersection of two sets, you can use set.intersection or the & operator.
• If you are not as familiar with short circuit syntax (as shown above), feel free to use if else statements instead.
• To use an if statement to check of a set is empty, use 'if not x:' syntax

suggestions =  [('dye', 1.865184466743761e-05), ('days', 0.0004103405826836274)]
word 0: days, probability 0.000410
word 1: dye, probability 0.000019
data type of corrections <class 'list'>


# Part 4: Minimum Edit distance

Now that you have implemented your auto-correct, how do you evaluate the similarity between two strings? For example: ‘waht’ and ‘what’

Also how do you efficiently find the shortest path to go from the word, ‘waht’ to the word ‘what’?

You will implement a dynamic programming system that will tell you the minimum number of edits required to convert a string into another string.

### Part 4.1 Dynamic Programming

Dynamic Programming breaks a problem down into subproblems which can be combined to form the final solution. Here, given a string source[0..i] and a string target[0..j], we will compute all the combinations of substrings[i, j] and calculate their edit distance. To do this efficiently, we will use a table to maintain the previously computed substrings and use those to calculate larger substrings.

You have to create a matrix and update each element in the matrix as follows:

\begin{align}
D[0,0] &= 0 \\
D[i,0] &= D[i-1,0] + del_cost(source[i]) \tag{4}\\
D[0,j] &= D[0,j-1] + ins_cost(target[j]) \\
\end{align}

\begin{align}
\\
D[i,j] =min
\begin{cases}
D[i-1,j] + del_cost\\
D[i,j-1] + ins_cost\\
D[i-1,j-1] + \left\{\begin{matrix}
rep_cost; & if src[i]\neq tar[j]\\
0 ; & if src[i]=tar[j]
\end{matrix}\right.
\end{cases}
\tag{5}
\end{align}

So converting the source word play to the target word stay, using an input cost of one, a delete cost of 1, and replace cost of 2 would give you the following table:

 # s t a y # 0 1 2 3 4 p 1 2 3 4 5 l 2 3 4 5 6 a 3 4 5 4 5 y 4 5 6 5 4

The operations used in this algorithm are ‘insert’, ‘delete’, and ‘replace’. These correspond to the functions that you defined earlier: insert_letter(), delete_letter() and replace_letter(). switch_letter() is not used here.

The diagram below describes how to initialize the table. Each entry in D[i,j] represents the minimum cost of converting string source[0:i] to string target[0:j]. The first column is initialized to represent the cumulative cost of deleting the source characters to convert string “EER” to “”. The first row is initialized to represent the cumulative cost of inserting the target characters to convert from “” to “NEAR”. Figure 6 Initializing Distance Matrix

Filling in the remainder of the table utilizes the ‘Per Cell Operations’ in the equation (5) above. Note, the diagram below includes in the table some of the 3 sub-calculations shown in light grey. Only ‘min’ of those operations is stored in the table in the min_edit_distance() function. Figure 7 Filling Distance Matrix

Note that the formula for $D[i,j]$ shown in the image is equivalent to:

\begin{align}
\\
D[i,j] =min
\begin{cases}
D[i-1,j] + del_cost\\
D[i,j-1] + ins_cost\\
D[i-1,j-1] + \left\{\begin{matrix}
rep_cost; & if src[i]\neq tar[j]\\
0 ; & if src[i]=tar[j]
\end{matrix}\right.
\end{cases}
\tag{5}
\end{align}

The variable sub_cost (for substitution cost) is the same as rep_cost; replacement cost. We will stick with the term “replace” whenever possible.

Below are some examples of cells where replacement is used. This also shows the minimum path from the lower right final position where “EER” has been replaced by “NEAR” back to the start. This provides a starting point for the optional ‘backtrace’ algorithm below. Figure 8 Examples Distance Matrix

### Exercise 11

Again, the word “substitution” appears in the figure, but think of this as “replacement”.

Instructions: Implement the function below to get the minimum amount of edits required given a source string and a target string.

Hints

• The range(start, stop, step) function excludes 'stop' from its output
• words

minimum edits:  4

#  s  t  a  y
#  0  1  2  3  4
p  1  2  3  4  5
l  2  3  4  5  6
a  3  4  5  4  5
y  4  5  6  5  4


Expected Results:

minimum edits:  3

#  n  e  a  r
#  0  1  2  3  4
e  1  2  1  2  3
e  2  3  2  3  4
r  3  4  3  4  3


Expected Results

We can now test several of our routines at once:

Expected Results: (empty)

The ‘replace()’ routine utilizes all letters a-z one of which returns the original word.

eer eer 0


Expected Results: eer eer 0
We have to allow single edits here because some two_edits will restore a single edit.

# Submission

Make sure you submit your assignment before you modify anything below

# Part 5: Optional - Backtrace

Once you have computed your matrix using minimum edit distance, how would find the shortest path from the top left corner to the bottom right corner?

Note that you could use backtrace algorithm. Try to find the shortest path given the matrix that your min_edit_distance function returned.

You can use these lecture slides on minimum edit distance by Dan Jurafsky to learn about the algorithm for backtrace.

• Dan Jurafsky - Speech and Language Processing - Textbook
• This auto-correct explanation was first done by Peter Norvig in 2007