# Assignment 4: Chatbot

Welcome to the last assignment of Course 4. Before you get started, we want to congratulate you on getting here. It is your 16th programming assignment in this Specialization and we are very proud of you! In this assignment, you are going to use the Reformer, also known as the efficient Transformer, to generate a dialogue between two bots. You will feed conversations to your model and it will learn how to understand the context of each one. Not only will it learn how to answer questions, but it will also know how to ask questions if it needs more info. For example, after a customer asks for a train ticket, the chatbot can ask what time the customer wants to leave. You can use this concept to automate call centers, hotel receptions, personal trainers, or any other type of customer service. By completing this assignment, you will:

• Understand how the Reformer works
• Explore the MultiWoz dataset
• Process the data to feed it into the model
• Generate a dialogue by feeding a question to the model

# Part 1: Exploring the MultiWoz dataset

You will start by exploring the MultiWoz dataset. The dataset you are about to use has more than 10,000 human-annotated dialogues and spans multiple domains and topics. Some dialogues cover multiple domains while others cover a single one. In this section, you will load and explore this dataset, as well as develop a function to extract the dialogues.

Let’s first import the modules we will be using:

Let’s also declare some constants we will be using in the exercises.

Let’s now load the MultiWOZ 2.1 dataset. We have already provided it for you in your workspace. It is in JSON format so we should load it as such:

Let’s see how many dialogues we have in the dictionary. Each key-value pair is one dialogue, so we can just get the dictionary’s length.

The number of dialogues is: 10438
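The loading and counting steps can be sketched as below. Since the workspace file isn’t available here, we parse a tiny inline JSON string with the same structure; with the real file you would call `json.load` on the provided path instead.

```python
import json

# Tiny inline stand-in with the same structure as the real data file;
# with the real file you would use: data = json.load(open(path_to_file))
raw = '''{"SNG01856.json": {"goal": {}, "log": []},
          "MUL2168.json":  {"goal": {}, "log": []}}'''
data = json.loads(raw)

print('The number of dialogues is:', len(data))  # each key-value pair is one dialogue
print(list(data.keys()))                         # filenames serve as the keys
```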


The dialogues are composed of multiple files and the filenames are used as keys in our dictionary. Those with multi-domain dialogues have “MUL” in their filenames while single domain dialogues have either “SNG” or “WOZ”.

['SNG01856.json', 'SNG0129.json', 'PMUL1635.json', 'MUL2168.json', 'SNG0073.json', 'SNG01445.json', 'MUL2105.json']


As you can see from the cells above, there are 10,438 conversations, each in its own file. You will train your model on all of those conversations. Each file is also loaded into a dictionary, and each one has the following two keys:

dict_keys(['goal', 'log'])


The goal key points to a dictionary containing several keys pertaining to the objectives of the conversation. In the example below, we can see that the conversation will be about booking a taxi.

{'taxi': {'info': {'leaveAt': '17:15',
'destination': 'pizza hut fen ditton',
'departure': "saint john's college"},
'reqt': ['car type', 'phone'],
'fail_info': {}},
'police': {},
'hospital': {},
'hotel': {},
'attraction': {},
'train': {},
'message': ["You want to book a <span class='emphasis'>taxi</span>. The taxi should go to <span class='emphasis'>pizza hut fen ditton</span> and should depart from <span class='emphasis'>saint john's college</span>",
"The taxi should <span class='emphasis'>leave after 17:15</span>",
"Make sure you get <span class='emphasis'>car type</span> and <span class='emphasis'>contact number</span>"],
'restaurant': {}}


The log, on the other hand, contains the dialogue. It is a list of dictionaries, and each element of this list contains several descriptions as well. Let’s look at an example:

{'text': "I would like a taxi from Saint John's college to Pizza Hut Fen Ditton.",
'dialog_act': {'Taxi-Inform': [['Dest', 'pizza hut fen ditton'],
['Depart', "saint john 's college"]]},
'span_info': [['Taxi-Inform', 'Dest', 'pizza hut fen ditton', 11, 14],
['Taxi-Inform', 'Depart', "saint john 's college", 6, 9]]}


For this assignment, we are only interested in the conversation, which is in the text field.
The conversation goes back and forth between two persons. Let’s call them ‘Person 1’ and ‘Person 2’. This implies that
data['SNG0073.json']['log'][0]['text'] is ‘Person 1’ and
data['SNG0073.json']['log'][1]['text'] is ‘Person 2’, and so on. The even offsets are ‘Person 1’ and the odd offsets are ‘Person 2’.

 Person 1:  I would like a taxi from Saint John's college to Pizza Hut Fen Ditton.
Person 2:  What time do you want to leave and what time do you want to arrive by?


### Exercise 01

You will now implement the get_conversation() function that will extract the conversation from a dataset file.

Instructions: Implement a function to extract conversations from the input file.
As described above, the conversation is in the text field of each element in the file’s log list. Your function should loop over those elements and build up the conversation string, prepending each text entry with ‘ Person 1: ‘ if its index is even or ‘ Person 2: ‘ if its index is odd. You can use the Python modulus operator ‘%’ to select the even/odd entries. Important note: Do not print a newline character (i.e. \n) when generating the string. For example, in the code cell above, your function should output something like:

and not:

 Person 1: am looking for a place to to stay that has cheap price range it should be in a type of hotel Person 2: Okay, do you have a specific area you want to stay in? Person 1: no, i just need to make sure it's cheap. oh, and i need parking Person 2: I found 1 cheap hotel for you that includes parking. Do you like me to book it? Person 1: Yes, please. 6 people 3 nights starting on tuesday. Person 2: I am sorry but I wasn't able to book that for you for Tuesday. Is there another day you would like to stay or perhaps a shorter stay? Person 1: how about only 2 nights. Person 2: Booking was successful.
Reference number is : 7GAWK763. Anything else I can do for you? Person 1: No, that will be all. Good bye. Person 2: Thank you for using our services.


Expected Result:

We can have a utility pretty print function just so we can visually follow the conversation more easily.
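As a rough illustration of the even/odd indexing described in the instructions (not the graded solution), the extraction could look like the sketch below, run here against a hypothetical two-turn mini-dataset:

```python
def get_conversation(file, data):
    """Illustrative sketch of the extraction described above."""
    result = ''
    for i, entry in enumerate(data[file]['log']):
        # even offsets are Person 1, odd offsets are Person 2
        person = ' Person 1: ' if i % 2 == 0 else ' Person 2: '
        result += person + entry['text']
    return result

# tiny mock mimicking the dataset structure
mock = {'SNG0073.json': {'log': [{'text': 'I would like a taxi.'},
                                 {'text': 'What time do you want to leave?'}]}}
print(get_conversation('SNG0073.json', mock))
# ->  Person 1: I would like a taxi. Person 2: What time do you want to leave?
```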

For this assignment, we will just use the outputs of the calls to get_conversation to train the model. But just to expound, there is also other information in the MultiWoz dataset that can be useful in other contexts: each element of the log list contains more than just the text. For example, if you were to look at the other fields for the utterance above, “am looking for a place to stay that has cheap price range it should be in a type of hotel”, you would get the following.

{'text': 'am looking for a place to to stay that has cheap price range it should be in a type of hotel',
'dialog_act': {'Hotel-Inform': [['Type', 'hotel'], ['Price', 'cheap']]},
'span_info': [['Hotel-Inform', 'Type', 'hotel', 20, 20],
['Hotel-Inform', 'Price', 'cheap', 10, 10]]}


The dataset also comes with hotel, hospital, taxi, train, police, and restaurant databases. In case you need to call a doctor, a hotel, or a taxi, for example, these will allow you to automate the entire conversation. Take a look at the files accompanying the dataset.

{'address': 'pool way, whitehill road, off newmarket road', 'area': 'east', 'entrance fee': '?', 'id': '1', 'location': [52.208789, 0.154883], 'name': 'abbey pool and astroturf pitch', 'openhours': '?', 'phone': '01223902088', 'postcode': 'cb58nt', 'pricerange': '?', 'type': 'swimmingpool'}

{'department': 'neurosciences critical care unit', 'id': 0, 'phone': '01223216297'}

{'address': '124 tenison road', 'area': 'east', 'internet': 'yes', 'parking': 'no', 'id': '0', 'location': [52.1963733, 0.1987426], 'name': 'a and b guest house', 'phone': '01223315702', 'postcode': 'cb12dp', 'price': {'double': '70', 'family': '90', 'single': '50'}, 'pricerange': 'moderate', 'stars': '4', 'takesbookings': 'yes', 'type': 'guesthouse'}

{'name': 'Parkside Police Station', 'address': 'Parkside, Cambridge', 'id': 0, 'phone': '01223358966'}

{'address': 'Regent Street City Centre', 'area': 'centre', 'food': 'italian', 'id': '19210', 'introduction': 'Pizza hut is a large chain with restaurants nationwide offering convenience pizzas pasta and salads to eat in or take away', 'location': [52.20103, 0.126023], 'name': 'pizza hut city centre', 'phone': '01223323737', 'postcode': 'cb21ab', 'pricerange': 'cheap', 'type': 'restaurant'}


For more information about the MultiWOZ 2.1 dataset, please run the cell below to read the ReadMe.txt file. Feel free to open any other file to explore it.

#####################################################
#####################################################
#  Copyright Cambridge Dialogue Systems Group, 2018 #
#####################################################
#####################################################

Dataset contains the following files:
1. data.json: the woz dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn. This file contains both system and user dialogue acts annotated at the turn level. Files with multi-domain dialogues have "MUL" in their names. Single domain dialogues have either "SNG" or "WOZ" in their names.
2. restaurant_db.json: the Cambridge restaurant database file, containing restaurants in the Cambridge UK area and a set of attributes.
3. attraction_db.json: the Cambridge attraction database file, containing attractions in the Cambridge UK area and a set of attributes.
4. hotel_db.json: the Cambridge hotel database file, containing hotels in the Cambridge UK area and a set of attributes.
5. train_db.json: the Cambridge train (with artificial connections) database file, containing trains in the Cambridge UK area and a set of attributes.
6. hospital_db.json: the Cambridge hospital database file, containing information about departments.
7. police_db.json: the Cambridge police station information.
8. taxi_db.json: slot-value list for taxi domain.
9. valListFile.txt: list of dialogues for validation.
10. testListFile.txt: list of dialogues for testing.
11. system_acts.json:
There are 6 domains ('Booking', 'Restaurant', 'Hotel', 'Attraction', 'Taxi', 'Train') and 1 dummy domain ('general').
A domain-dependent dialogue act is defined as a domain token followed by a domain-independent dialogue act, e.g. 'Hotel-inform' means it is an 'inform' act in the Hotel domain.
Dialogue acts which cannot take slots, e.g., 'good bye', are defined under the 'general' domain.
A slot-value pair is defined as a list with two elements. The first element is the slot token and the second one is its value.
If a dialogue act takes no slots, e.g., dialogue act 'offer booking' for an utterance 'would you like to take a reservation?', its slot-value pair is ['none', 'none']
There are four types of values:
1) If a slot takes a binary value, e.g., 'has Internet' or 'has park', the value is either 'yes' or 'no'.
2) If a slot is under the act 'request', e.g., 'request' about 'area', the value is expressed as '?'.
3) The value that appears in the utterance e.g., the name of a restaurant.
4) If for some reason the turn does not have an annotation then it is labeled as "No Annotation."
12. ontology.json: Data-based ontology containing all the values for the different slots in the domains.
13. slot_descriptions.json: A collection of human-written slot descriptions for each slot in the dataset. Each slot has at least two descriptions.
14. tokenization.md: A description of the tokenization preprocessing we had to perform to maintain consistency between the dialogue act annotations of DSTC 8 Track 1 and the existing MultiWOZ 2.0 data.


As you can see, there are many other aspects of the MultiWoz dataset. Nonetheless, you’ll see that even with just the conversations, your model will still be able to generate useful responses. This concludes our exploration of the dataset. In the next section, we will do some preprocessing before we feed the data into our model for training.

# Part 2: Processing the data for Reformer inputs

You will now use the get_conversation() function to process the data. The Reformer expects inputs of this form:

Person 1: Why am I so happy? Person 2: Because you are learning NLP Person 1: … Person 2: …

And the conversation keeps going with more text. As you can see, ‘Person 1’ and ‘Person 2’ act as delimiters, so the model automatically recognizes who is talking. It can then come up with the corresponding text responses for each person. Let’s proceed to process the text in this fashion for the Reformer. First, let’s grab all the conversation strings from all dialogue files and put them in a list.

 Person 1: am looking for a place to to stay that has cheap price range it should be in a type of hotel Person 2: Okay, do you have a specific area you want to stay in? Person 1: no, i just need to make sure it's cheap. oh, and i need parking Person 2: I found 1 cheap hotel for you that includes parking. Do you like me to book it? Person 1: Yes, please. 6 people 3 nights starting on tuesday. Person 2: I am sorry but I wasn't able to book that for you for Tuesday. Is there another day you would like to stay or perhaps a shorter stay? Person 1: how about only 2 nights. Person 2: Booking was successful.
Reference number is : 7GAWK763. Anything else I can do for you? Person 1: No, that will be all. Good bye. Person 2: Thank you for using our services.


Now let us split the list into train and eval datasets.

number of conversations in the data set: 10438
number of conversations in train set: 9917
number of conversations in eval set: 521
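A minimal sketch of the split, assuming a roughly 95/5 ratio (the notebook’s exact cutoff may differ by an element or two depending on how it is computed; the placeholder list stands in for the get_conversation() outputs):

```python
# placeholder conversations standing in for the real get_conversation() outputs
untokenized_data = ['conversation %d' % i for i in range(10438)]

cutoff = int(len(untokenized_data) * 0.95)  # assumed split ratio
train_data = untokenized_data[:cutoff]
eval_data = untokenized_data[cutoff:]
```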


## 2.1 Tokenizing, batching with bucketing

We can now proceed to generating tokenized batches of our data. Let’s first define a utility generator function to yield elements from our datasets:

Now let’s define our data pipeline for tokenizing and batching our data. As in the previous assignments, we will bucket by length and also have an upper bound on the token length.
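The actual pipeline uses Trax’s bucketing combinators; the plain-Python sketch below (hypothetical helper name, padding omitted) just illustrates the idea of grouping sequences of similar length and enforcing an upper bound on token length:

```python
def bucket_batches(gen, boundaries, batch_sizes, max_len):
    """Group token sequences of similar length into batches.
    Hypothetical simplification of the Trax pipeline (no padding shown)."""
    buckets = [[] for _ in boundaries]
    for seq in gen:
        if len(seq) > max_len:
            continue  # drop sequences over the upper bound on token length
        for i, bound in enumerate(boundaries):
            if len(seq) <= bound:
                buckets[i].append(seq)
                if len(buckets[i]) == batch_sizes[i]:
                    yield buckets[i]  # bucket is full: emit it as a batch
                    buckets[i] = []
                break

batches = bucket_batches(iter([[1, 2], [3, 4], [5, 6, 7, 8], [9, 10]]),
                         boundaries=[2, 8], batch_sizes=[2, 1], max_len=8)
print(next(batches))  # -> [[1, 2], [3, 4]]
```

Note that shorter sequences are batched in larger groups, which is the point of bucketing: similar lengths mean less padding waste per batch.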

Peek into the train stream.

input shape:  (4, 512)
Person 1: I need a place to stay that has free wifi.  Person 2: There are 32 options in Cambridge, what price range are you looking for? Person 1: I'm looking for something in the cheap price range, but I need it to have a 4 star rating. I don't need any parking though. Person 2: Again I have many to choose from that meet those criteria. Would you like a suggestion? Person 1: Ok, yes, if you could suggest one that comes with free parking that would be great! Person 2: I will book it for you,is there anything else I can do for you ? Person 1: I also need a Vietnamese restaurant. Person 2: My apologies it appears that I forgot to book your lodging. I recommend Alexander Bed and Breakfast, would you like me to book it for you? Person 1: Oh yes, please do. I need it for 8 people and 5 nights, beginning friday Person 2: You are booked with the reference number E9100B48. I can help you with the Vietnamese restaurant now. Do you have an area in mind? Person 1: I just want the restaurant to be in the same price range as my hotel Person 2: There is one cheap vietnamese restaurant in town. It is thanh binh. Do you want to book? Person 1: No, just provide me with the address and area for that restaurant if you could Person 2: The restaurant is located at 17 Magdalene Street City Centre in the West.  Can I help you with anything else? Person 1: Yes, will you book me a taxi to the restaurant from the hotel, please Person 2: And what time would you like that taxi? Person 1: I would like to leave the hotel by 22:15. Person 2: Your taxi service was book with a red volkswagen. The contact number is 07797935179 in case you need to contact them. Person 1: Thank you, that will be all. Person 2: You are welcome enjoy your meal. Have a good evenening


# Part 3: Reversible layers

When running large deep models, you will often run out of memory, as each layer allocates memory to store activations for use in backpropagation. To save memory, you need to be able to recompute these activations during the backward pass without storing them during the forward pass. Take a look first at the leftmost diagram below.

This is how the residual networks are implemented in the standard Transformer, where F() is Attention and G() is the Feed-forward (FF) layer:

\begin{align}
\mathrm{y}_\mathrm{a} &= \mathrm{x} + \mathrm{F}\left(\mathrm{x}\right)\tag{1} \\
\mathrm{y}_{b}&=\mathrm{y}_{a}+\mathrm{G}\left(\mathrm{y}_{a}\right)\tag{2}\\
\end{align}

As you can see, it requires that $\mathrm{x}$ and $\mathrm{y}_{a}$ be saved so they can be used during backpropagation. We want to avoid this to conserve memory, and this is where reversible residual connections come in. They are shown in the middle and rightmost diagrams above. The key idea is that we will start with two copies of the input to the model, and at each layer we will only update one of them. The activations that we don’t update are the ones that will be used to compute the residuals.

Now, in this reversible setup, you get the following instead:

\begin{align}
\mathrm{y}_{1}&=\mathrm{x}_{1}+\mathrm{F}\left(\mathrm{x}_{2}\right)\tag{3}\\
\mathrm{y}_{2}&=\mathrm{x}_{2}+\mathrm{G}\left(\mathrm{y}_{1}\right)\tag{4}\\
\end{align}
To recover $\mathrm{(x_1,x_2)}$ from $\mathrm{(y_1, y_2)}$, we compute:

\begin{align}
\mathrm{x}_{2}&=\mathrm{y}_{2}-\mathrm{G}\left(\mathrm{y}_{1}\right)\tag{5}\\
\mathrm{x}_{1}&=\mathrm{y}_{1}-\mathrm{F}\left(\mathrm{x}_{2}\right)\tag{6}\\
\end{align}

With this configuration, we’re now able to run the network fully in reverse. You’ll notice that during the backward pass, $x_2$ and $x_1$ can be recomputed based solely on the values of $y_2$ and $y_1$. There is no need to save them during the forward pass.
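Equations 3–6 can be checked numerically. In the sketch below, the toy functions standing in for F (attention) and G (feed-forward) are arbitrary; any deterministic functions will do, since reversibility relies only on the additive structure:

```python
import numpy as np

# Toy stand-ins for F (attention) and G (feed-forward)
F = lambda x: np.tanh(x)
G = lambda x: x * 0.5

x1, x2 = np.random.rand(4), np.random.rand(4)

# forward pass (equations 3 and 4)
y1 = x1 + F(x2)
y2 = x2 + G(y1)

# backward recomputation (equations 5 and 6):
# recover the inputs from the outputs alone
x2_rec = y2 - G(y1)
x1_rec = y1 - F(x2_rec)

assert np.allclose(x1, x1_rec) and np.allclose(x2, x2_rec)
```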

### Exercise 02

Instructions: You will implement the reversible_layer_forward function using equations 3 and 4 above. This function takes in the input vector x and the functions f and g, and returns the concatenation of $y_1$ and $y_2$. For this exercise, we will be splitting x before going through the reversible residual steps$^1$. We can then use those two vectors for the reversible_layer_reverse function. Utilize np.concatenate() to form the output, being careful to match the axis of the np.split().

$^1$Take note that this is just for demonstrating the concept in this exercise and there are other ways of processing the input. As you’ll see in the Reformer architecture later, the initial input (i.e. x) can be duplicated instead of split.

### Exercise 03

You will now implement the reversible_layer_reverse function. This is possible because at every layer you have $y_1$ and $y_2$, along with the functions f and g, where f is the attention and g is the feed-forward. This allows you to compute equations 5 and 6 and recover $x_1$ and $x_2$.

Instructions: Implement the reversible_layer_reverse. Your function takes in the output vector from reversible_layer_forward and functions f and g. Using equations 5 and 6 above, it computes the inputs to the layer, $x_1$ and $x_2$. The output, x, is the concatenation of $x_1, x_2$. Utilize np.concatenate() to form the output being careful to match the axis of the np.split().

## 3.1 Reversible layers and randomness

This is why we were learning about fastmath’s random functions and keys in Course 3 Week 1. Given the same key, trax.fastmath.random.uniform() will return the same values. This is required for the backward pass to return the correct layer inputs when random noise is introduced in the layer.
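The same-key-same-values behavior can be illustrated with NumPy’s seeded generators, used here as a stand-in for trax.fastmath.random keys: reproducing the exact noise (e.g. a dropout mask) from the forward pass is what makes the reverse computation exact.

```python
import numpy as np

# NumPy stand-in for explicit trax.fastmath.random keys:
# the same seed/key always reproduces the same noise, so the
# backward pass can regenerate the exact mask used going forward.
key = 42
noise_forward = np.random.default_rng(key).uniform(size=3)
noise_backward = np.random.default_rng(key).uniform(size=3)
assert (noise_forward == noise_backward).all()
```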

# Part 4: ReformerLM Training

You will now proceed to training your model. Since you already know the two main components that differentiate it from the standard Transformer, LSH in Course 1 and reversible layers above, you can just use the pre-built model already implemented in Trax. It will have this architecture:

Similar to the Transformer you learned earlier, you want to apply an attention and feed forward layer to your inputs. For the Reformer, we improve the memory efficiency by using reversible decoder blocks and you can picture its implementation in Trax like below:

You can see that it takes the initial inputs x1 and x2 and computes the first equation of the reversible networks you learned in Part 3. As you’ve also learned, the reversible residual has two equations for the forward pass, so doing just one of them constitutes only half of the reversible decoder block. Before doing the second equation (i.e. the second half of the reversible residual), it first needs to swap the elements to take into account the stack semantics in Trax. It simply puts x2 on top of the stack so it can be fed to the add block of the half-residual layer. It then swaps the two outputs again so they can be fed to the next layer of the network. All of this arrives at the two equations in Part 3, which can be used to recompute the activations during the backward pass.

These are already implemented for you in Trax and in the following exercise, you’ll get to practice how to call them to build your network.

### Exercise 04

Instructions: Implement a wrapper function that returns a Reformer Language Model. You can use Trax’s ReformerLM to do this quickly. It will have the same architecture as shown above.

Serial[
ShiftRight(1)
Embedding_train_512
Dropout
PositionalEncoding
Dup_out2
ReversibleSerial_in2_out2[
ReversibleHalfResidualV2_in2_out2[
Serial[
LayerNorm
]
SelfAttention
]
ReversibleSwap_in2_out2
ReversibleHalfResidualV2_in2_out2[
Serial[
LayerNorm
Dense_2048
Dropout
FastGelu
Dense_512
Dropout
]
]
ReversibleSwap_in2_out2
ReversibleHalfResidualV2_in2_out2[
Serial[
LayerNorm
]
SelfAttention
]
ReversibleSwap_in2_out2
ReversibleHalfResidualV2_in2_out2[
Serial[
LayerNorm
Dense_2048
Dropout
FastGelu
Dense_512
Dropout
]
]
ReversibleSwap_in2_out2
]
Concatenate_in2
LayerNorm
Dropout
Dense_train
LogSoftmax
]


### Exercise 05

You will now write a function that takes in your model and trains it.

Instructions: Implement the training_loop below to train the neural network above. Here is a list of things you should do:

• Create TrainTask and EvalTask
• Create the training loop trax.supervised.training.Loop
• Pass in the following to train_task :
• labeled_data=train_gen
• loss_layer=tl.CrossEntropyLoss()
• optimizer=trax.optimizers.Adam(0.01)
• lr_schedule=lr_schedule
• n_steps_per_checkpoint=10

• Pass in the following to eval_task:
• labeled_data=eval_gen
• metrics=[tl.CrossEntropyLoss(), tl.Accuracy()]

This function should return a training.Loop object. To read more about this check the docs.

<trax.supervised.training.TrainTask object at 0x7fd4ddf95dd0>

Step      1: Ran 1 train steps in 58.71 secs
Step      1: train CrossEntropyLoss |  10.41530514
Step      1: eval  CrossEntropyLoss |  10.41272354
Step      1: eval          Accuracy |  0.00000000

Step     10: Ran 9 train steps in 163.46 secs
Step     10: train CrossEntropyLoss |  10.25675583
Step     10: eval  CrossEntropyLoss |  9.94296360
Step     10: eval          Accuracy |  0.11201393


Approximate Expected output:

We can now initialize our model from a file containing the pretrained weights. We will save this starting state so we can reset the model state when we generate a new conversation. This will become clearer in the generate_dialogue() function later.

Let’s define a few utility functions as well to help us tokenize and detokenize. We can use the tokenize() and detokenize() from trax.data.tf_inputs to do this.

We are now ready to define our decoding function. This will return a generator that yields the next symbol output by the model. It will be able to predict the next words by just feeding it a starting sentence.
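The generator idea can be sketched in isolation. Below, a hypothetical `next_logits` function stands in for the trained ReformerLM (the real decoder feeds the tokenized sentence through the model); the loop shows both greedy decoding and temperature sampling:

```python
import numpy as np

def sampling_decode(seed_ids, next_logits, temperature=0.0):
    """Sketch of a next-symbol generator: yields one predicted token at a time.
    `next_logits` is a stand-in for the trained model: any function mapping
    the current token list to a vector of logits."""
    cur = list(seed_ids)
    while True:
        logits = next_logits(cur)
        if temperature == 0.0:
            symbol = int(np.argmax(logits))  # greedy decoding
        else:
            probs = np.exp(logits / temperature)
            probs /= probs.sum()
            symbol = int(np.random.choice(len(probs), p=probs))
        cur.append(symbol)
        yield symbol

# toy "model": always scores token (last token + 1) mod 5 highest
toy = lambda cur: np.eye(5)[(cur[-1] + 1) % 5]
gen = sampling_decode([1], toy)
print([next(gen) for _ in range(4)])  # -> [2, 3, 4, 0]
```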

### Exercise 06

Instructions: Implement the function below to return a generator that predicts the next word of the conversation.

[1, 0, 4, 3, 0, 4]


Expected value:

[1, 0, 4, 3, 0, 4]

Great! Now you will be able to see the model in action. The utility function below will call the generator you just implemented and will just format the output to be easier to read.

We can now feed in different starting sentences and see how the model generates the dialogue. You can even input your own starting sentence. Just remember to ask a question that covers the topics in the MultiWOZ dataset so you can generate a meaningful conversation.

Person 1: Are there theatres in town?
Person 2: : There are 4 theatres in town. Do you have a preference on area?
Person 1: No, I don't care. Which one do you recommend?
Person 2: I would recommend the Mumford Theatre. Would you like more information on it?
Person 1: Yes, could I get the postcode and phone number please?
Person 2: The phone number is 08451962320 and the postcode is cb11pt. The phone number is 084519/ 15/15 - would you like to book a table?

Person 1: Is there a hospital nearby?
Person 2: : Addensbrookes Hospital is located at Hills Rd, Cambridge, postcode CB20QQ. Do you need anything else?
Person 1: No, that's all I need. Thanks.
Person 2: You're welcome. Have a good day.Good bye.
Person 1: Thanks again. Goodbye.
Person 2: You're welcome. Have a good day.Good bye.

Person 1: Can you book a taxi?
Person 2: : I sure can. Where are you going?
Person 1: I'm going to be picked up from the city centre north b and b.
Person 2: I have booked you a grey volkswagen. The contact number is 0783212843.
Person 1: Thank you. That's all I need.
Person 2: Thank you for using our services. Have a great day!k you.Good bye.
Person 1: Actually, I'ry about there.


Congratulations! You just wrapped up the final assignment of this course and the entire specialization!