Assignment 1: Sentiment with Deep Neural Networks
Welcome to the first assignment of course 3. In this assignment, you will explore sentiment analysis using deep neural networks.
Outline
- Part 1: Import libraries and try out Trax
- Part 2: Importing the data
- Part 3: Defining classes
- Part 4: Training
- Part 5: Evaluation
- Part 6: Testing with your own input
In course 1, you implemented logistic regression and Naive Bayes for sentiment analysis. However, if you were to give your old models an example like:
Your model would have predicted a positive sentiment for that review. However, that sentence has a negative sentiment and indicates that the movie was not good. To solve those kinds of misclassifications, you will write a program that uses deep neural networks to identify sentiment in text. By completing this assignment, you will:
- Understand how you can build/design a model using layers
- Train a model using a training loop
- Use a binary cross-entropy loss function
- Compute the accuracy of your model
- Predict using your own input
As you can tell, this model follows a similar structure to the one you previously implemented in the second course of this specialization.
- Indeed most of the deep nets you will be implementing will have a similar structure. The only thing that changes is the model architecture, the inputs, and the outputs. Before starting the assignment, we will introduce you to the Google library trax that we use for building and training models.
Now we will show you how to compute the gradient of a certain function f by just using .grad(f).
Part 1: Import libraries and try out Trax
- Let’s import libraries and look at an example of using the Trax library.
import os
INFO:tensorflow:tokens_length=568 inputs_length=512 targets_length=114 noise_density=0.15 mean_noise_span_length=3.0
[nltk_data] Downloading package twitter_samples to
[nltk_data] /home/jovyan/nltk_data...
[nltk_data] Package twitter_samples is already up-to-date!
[nltk_data] Downloading package stopwords to /home/jovyan/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
# Create an array using trax.fastmath.numpy
DeviceArray(5., dtype=float32)
<class 'jax.interpreters.xla.DeviceArray'>
Notice that trax.fastmath.numpy returns a DeviceArray from the jax library.
# Define a function that will use the trax.fastmath.numpy array
# Call the function
f(a) for a=5.0 is 25.0
The gradient (derivative) of function f with respect to its input x is the derivative of $x^2$.
- The derivative of $x^2$ is $2x$.
- When x is 5, then $2x=10$.
You can calculate the gradient of a function by using trax.fastmath.grad(fun=) and passing in the name of the function.
- In this case the function you want to take the gradient of is f.
- The object returned (saved in grad_f in this example) is a function that can calculate the gradient of f for a given trax.fastmath.numpy array.
# Directly use trax.fastmath.grad to calculate the gradient (derivative) of the function
function
# Call the newly created function and pass in a value for x (the DeviceArray stored in 'a')
DeviceArray(10., dtype=float32)
The function returned by trax.fastmath.grad takes in x=5 and calculates the gradient of f, which is 2*x, which is 10. The value is also stored as a DeviceArray from the jax library.
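As a sanity check on the value 10, here is a minimal sketch that does not use trax at all. It approximates the derivative with a central finite difference instead of automatic differentiation, so it illustrates only the idea of "a function that returns the gradient of f", not how trax.fastmath.grad works internally. The helper name numerical_grad and the step size eps are assumptions made for this illustration.

```python
def f(x):
    """The same function as in the example above: f(x) = x^2."""
    return x ** 2


def numerical_grad(fun, eps=1e-6):
    """Return a function that approximates the derivative of `fun`.

    Hypothetical helper: uses a central difference, not automatic
    differentiation, so the result is only approximate.
    """
    def grad_fun(x):
        return (fun(x + eps) - fun(x - eps)) / (2 * eps)
    return grad_fun


# Like trax.fastmath.grad, the helper returns a new function.
grad_f = numerical_grad(f)

# For x = 5 the analytic derivative 2x is 10; the estimate should be very close.
print(grad_f(5.0))
```

Automatic differentiation, as used by trax and jax, computes the derivative exactly (up to floating-point precision) and scales to functions with many inputs, which is why the assignment uses it rather than finite differences.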
Part 2: Importing the data
2.1 Loading in the data
Import the data set.
- You may recognize this from earlier assignments in the specialization.
- Details of the process_tweet function are available in the utils.py file.
## DO NOT EDIT THIS CELL
The number of positive tweets: 5000
The number of negative tweets: 5000
length of train_x 8000
length of val_x 2000
Now import a function that processes tweets (we’ve provided this in the utils.py file).
- process_tweet removes unwanted characters, e.g. hashtags, hyperlinks, and stock tickers, from the tweet.
- It also returns a list of words (it tokenizes the original string).
# Import a function that processes the tweets
original tweet at training position 0
#FollowFriday @France_Inte @PKuchly57 @Milipol_Paris for being top engaged members in my community this week :)
Tweet at training position 0 after processing:
['followfriday', 'top', 'engag', 'member', 'commun', 'week', ':)']
Notice that the function process_tweet keeps key words, removes the hash # symbol, and ignores usernames (words that begin with ‘@’). It also returns a list of the words.
2.2 Building the vocabulary
Now build the vocabulary.
- Map each word in each tweet to an integer (an “index”).
- The following code does this for you, but please read it and understand what it’s doing.
- Note that you will build the vocabulary based on the training data.
- To do so, you will assign an index to every word by iterating over your training set.
The vocabulary will also include some special tokens:
- __PAD__: padding
- __</e>__: end of line
- __UNK__: a token representing any word that is not in the vocabulary.
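The vocabulary-building steps described above can be sketched as follows. This is a minimal illustration, assuming the tweets have already been tokenized into lists of words; the function name build_vocab_sketch is hypothetical and is not the notebook's own code.

```python
def build_vocab_sketch(tokenized_tweets):
    """Map every word in the training tweets to a unique integer index.

    Assumes each tweet is already a list of processed tokens.
    """
    # Reserve the first three indices for the special tokens.
    vocab = {'__PAD__': 0, '__</e>__': 1, '__UNK__': 2}
    for tokens in tokenized_tweets:
        for word in tokens:
            if word not in vocab:
                # The next free index is the current vocabulary size.
                vocab[word] = len(vocab)
    return vocab


train_tokens = [
    ['followfriday', 'top', 'engag', 'member', 'commun', 'week', ':)'],
    ['hey', 'jame', ':)'],
]
vocab = build_vocab_sketch(train_tokens)
print(vocab['followfriday'], vocab[':)'], vocab['hey'])
```

Because indices are handed out in order of first appearance, a word that occurs again later (such as ':)' in the second tweet) keeps the index it was given the first time.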
# Build the vocabulary
Total words in vocab are 9088
{'__PAD__': 0,
'__</e>__': 1,
'__UNK__': 2,
'followfriday': 3,
'top': 4,
'engag': 5,
'member': 6,
'commun': 7,
'week': 8,
':)': 9,
'hey': 10,
'jame': 11,
...}
The dictionary Vocab will look like this:
{'__PAD__': 0,
'__</e>__': 1,
'__UNK__': 2,
'followfriday': 3,
'top': 4,
'engag': 5,
...
- Each unique word has a unique integer associated with it.
- The total number of words in Vocab: 9088
2.3 Converting a tweet to a tensor
Write a function that will convert each tweet to a tensor (a list of unique integer IDs representing the processed tweet).
- Note, the returned data type will be a regular Python list()
- You won’t use TensorFlow in this function
- You also won’t use a numpy array
- You also won’t use trax.fastmath.numpy array
- For words in the tweet that are not in the vocabulary, set them to the unique ID for the token __UNK__.
Example
Input a tweet:
'@happypuppy, is Maria happy?'
The tweet_to_tensor function will first convert the tweet into a list of tokens (including only relevant words):
['maria', 'happi']
Then it will convert each word into its unique integer:
[2, 56]
- Notice that the word “maria” is not in the vocabulary, so it is assigned the unique integer associated with the __UNK__ token, because it is considered “unknown.”
Exercise 01
Instructions: Write a program tweet_to_tensor that takes in a tweet and converts it to an array of numbers. You can use the Vocab dictionary you just found to help create the tensor.
- Use the vocab_dict parameter and not a global variable.
- Do not hard code the integer value for the __UNK__ token.
- Map each word in tweet to corresponding token in 'Vocab'
- Use Python's Dictionary.get(key,value) so that the function returns a default value if the key is not found in the dictionary.
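The hints above can be illustrated with a small sketch of the lookup step. It assumes the tweet has already been turned into a list of tokens (the real function also calls process_tweet first), and it uses a tiny toy vocabulary; the name tokens_to_ids_sketch is hypothetical and this is not the graded solution.

```python
def tokens_to_ids_sketch(tokens, vocab_dict, unk_token='__UNK__'):
    """Convert a list of tokens to a list of integer IDs.

    Words missing from vocab_dict are mapped to the ID of unk_token.
    """
    # Look up the unknown-token ID instead of hard coding it.
    unk_id = vocab_dict[unk_token]
    # dict.get(key, default) returns the default when the key is absent.
    return [vocab_dict.get(word, unk_id) for word in tokens]


# Toy vocabulary for illustration only.
toy_vocab = {'__PAD__': 0, '__</e>__': 1, '__UNK__': 2, 'happi': 56}
print(tokens_to_ids_sketch(['maria', 'happi'], toy_vocab))
```

With this toy vocabulary, 'maria' is unknown and receives the __UNK__ ID, which matches the worked example above.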
# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
print("Actual tweet is\n", val_pos[0])
Actual tweet is
Bro:U wan cut hair anot,ur hair long Liao bo
Me:since ord liao,take it easy lor treat as save $ leave it longer :)
Bro:LOL Sibei xialan
Tensor of tweet:
[1065, 136, 479, 2351, 745, 8148, 1123, 745, 53, 2, 2672, 791, 2, 2, 349, 601, 2, 3489, 1017, 597, 4559, 9, 1065, 157, 2, 2]
Expected output
Actual tweet is
# test tweet_to_tensor
2.4 Creating a batch generator
Most of the time in Natural Language Processing, and in AI in general, we train models on batches of examples.
- If instead of training with batches of examples, you were to train a model with one example at a time, it would take a very long time to train the model.
- You will now build a data generator that takes in the positive/negative tweets and returns a batch of training examples. It returns the model inputs, the targets (positive or negative labels), and the weight for each target (for example, this allows us to treat some examples as more important to get right than others, but commonly these will all be 1.0).
Once you create the generator, you could include it in a for loop:
for batch_inputs, batch_targets, batch_example_weights in data_generator:
You can also get a single batch like this:
batch_inputs, batch_targets, batch_example_weights = next(data_generator)
The generator returns the next batch each time it’s called.
- This generator returns the data in a format (tensors) that you could directly use in your model.
- It returns a triple: the inputs, targets, and loss weights:
— Inputs is a tensor that contains the batch of tweets we put into the model.
— Targets is the corresponding batch of labels that we train to generate.
— Loss weights here are just 1s with same shape as targets. Next week, you will use it to mask input padding.
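To make the (inputs, targets, weights) triple concrete, here is a simplified sketch of such a generator in plain Python. It assumes the tweets have already been converted to lists of integer IDs, that each batch is half positive and half negative, and that padding uses ID 0; the name batch_generator_sketch and its parameters are hypothetical and differ from the data_generator you will implement.

```python
def batch_generator_sketch(pos, neg, batch_size, pad_id=0, loop=True):
    """Yield (inputs, targets, example_weights) batches.

    pos / neg: lists of tweets, each tweet a list of integer IDs (assumed).
    Each batch holds batch_size // 2 positive and batch_size // 2 negative tweets.
    """
    if batch_size % 2 != 0:
        raise ValueError("batch_size must be even")
    half = batch_size // 2
    n_batches = min(len(pos), len(neg)) // half
    if n_batches == 0:
        raise ValueError("not enough examples for one batch")
    while True:
        for b in range(n_batches):
            batch = pos[b * half:(b + 1) * half] + neg[b * half:(b + 1) * half]
            # Pad every tweet to the length of the longest tweet in the batch.
            max_len = max(len(t) for t in batch)
            inputs = [t + [pad_id] * (max_len - len(t)) for t in batch]
            targets = [1] * half + [0] * half
            example_weights = [1] * batch_size
            yield inputs, targets, example_weights
        if not loop:
            return


gen = batch_generator_sketch([[3, 4, 5], [10]], [[7], [8, 9]], batch_size=4, loop=False)
inputs, targets, example_weights = next(gen)
print(inputs)
print(targets, example_weights)
```

Setting loop=False makes the generator stop after one pass over the data, which is the behaviour you want when measuring final accuracy on a test set.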
Exercise 02
Implement data_generator.
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
Now you can use your data generator to create a data generator for the training data, and another data generator for the validation data.
We will create a third data generator that does not loop, for testing the final accuracy of the model.
# Set the random number generator for the shuffle procedure
Inputs: [[2005 4451 3201 9 0 0 0 0 0 0 0]
[4954 567 2000 1454 5174 3499 141 3499 130 459 9]
[3761 109 136 583 2930 3969 0 0 0 0 0]
[ 250 3761 0 0 0 0 0 0 0 0 0]]
Targets: [1 1 0 0]
Example Weights: [1 1 1 1]
# Test the train_generator
The inputs shape is (4, 14)
The targets shape is (4,)
The example weights shape is (4,)
input tensor: [3 4 5 6 7 8 9 0 0 0 0 0 0 0]; target 1; example weights 1
input tensor: [10 11 12 13 14 15 16 17 18 19 20 9 21 22]; target 1; example weights 1
input tensor: [5738 2901 3761 0 0 0 0 0 0 0 0 0 0 0]; target 0; example weights 1
input tensor: [ 858 256 3652 5739 307 4458 567 1230 2767 328 1202 3761 0 0]; target 0; example weights 1
Expected output
The inputs shape is (4, 14)
Now that you have your train/val generators, you can just call them and they will return tensors which correspond to your tweets in the first column and their corresponding labels in the second column. Now you can go ahead and start building your neural network.
Part 3: Defining classes
In this part, you will write your own library of layers. It will be very similar
to the one used in Trax and also in Keras and PyTorch. Writing your own small
framework will help you understand how they all work and use them effectively
in the future.
Your framework will be based on the following Layer class from utils.py.
class Layer(object):
3.1 ReLU class
You will now implement the ReLU activation function in a class below. The ReLU function looks as follows:
Exercise 03
Instructions: Implement the ReLU activation function below. Your function should take in a matrix or vector and it should transform all the negative numbers into 0 while keeping all the positive numbers intact.
- Please use numpy.maximum(A,k) to find the maximum between each element in A and a scalar k
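As a standalone illustration of the hint above, the following sketch applies numpy.maximum to the same test matrix used in the check further down. It is a plain function rather than the Layer subclass the exercise asks for, and the name relu_sketch is hypothetical.

```python
import numpy as np


def relu_sketch(x):
    """Element-wise ReLU: negative entries become 0, others are unchanged."""
    # np.maximum compares each element of x with the scalar 0.
    return np.maximum(x, 0)


x = np.array([[-2.0, -1.0, 0.0],
              [0.0, 1.0, 2.0]])
print(relu_sketch(x))
```

In the exercise itself the same computation goes inside the forward method of your ReLU class.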
# UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# Test your relu function
Test data is:
[[-2. -1. 0.]
[ 0. 1. 2.]]
Output of Relu is:
[[0. 0. 0.]
[0. 1. 2.]]
Expected Output
Test data is:
3.2 Dense class
Exercise
Implement the forward function of the Dense class.
- The forward function multiplies the input to the layer (x) by the weight matrix (W)
- You can use numpy.dot to perform the matrix multiplication.
Note that for more efficient code execution, you will use the trax version of math, which includes a trax version of numpy and also random.
Implement the weight initializer new_weights function
- Weights are initialized with a random key.
- The second parameter is a tuple for the desired shape of the weights (num_rows, num_cols)
- The number of rows for weights should equal the number of columns in x, because for forward propagation, you will multiply x by the weights.
Please use trax.fastmath.random.normal(key, shape, dtype=tf.float32) to generate random values for the weight matrix. The key difference between this function and the standard numpy randomness is the explicit use of random keys, which need to be passed. While it can look tedious at first sight to pass the random key everywhere, you will learn in Course 4 why this is very helpful when implementing some advanced models.
- key can be generated by calling random.get_prng(seed=) and passing in a number for the seed.
- shape is a tuple with the desired shape of the weight matrix.
  - The number of rows in the weight matrix should equal the number of columns in the variable x. Since x may have 2 dimensions if it represents a single training example (row, col), or three dimensions (batch_size, row, col), get the last dimension from the tuple that holds the dimensions of x.
  - The number of columns in the weight matrix is the number of units chosen for that dense layer. Look at the __init__ function to see which variable stores the number of units.
- dtype is the data type of the values in the generated matrix; keep the default of tf.float32. In this case, don’t explicitly set the dtype (just let it use the default value).
Set the standard deviation of the random values to 0.1
- The values generated have a mean of 0 and standard deviation of 1.
- Set the default standard deviation stdev to be 0.1 by multiplying each of the values in the weight matrix by the standard deviation.
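The shape and scaling rules above can be sketched in plain NumPy. This is only an approximation of the exercise: it uses a seeded NumPy generator as a stand-in for Trax random keys, it does not subclass the Layer class from utils.py, and the names DenseSketch, init_weights and init_stdev are assumptions made for illustration.

```python
import numpy as np


class DenseSketch:
    """A fully connected layer without bias: forward(x) = x . W."""

    def __init__(self, n_units, init_stdev=0.1):
        self.n_units = n_units          # number of columns in W
        self.init_stdev = init_stdev    # scale applied to standard-normal samples
        self.weights = None

    def init_weights(self, input_shape, seed=0):
        # Stand-in for trax random keys: a seeded NumPy generator.
        rng = np.random.default_rng(seed)
        # Rows of W match the last dimension of the input.
        n_rows = input_shape[-1]
        samples = rng.standard_normal((n_rows, self.n_units))
        # Multiplying by init_stdev turns stdev 1 samples into stdev 0.1 samples.
        self.weights = self.init_stdev * samples
        return self.weights

    def forward(self, x):
        return np.dot(x, self.weights)


layer = DenseSketch(n_units=10)
w = layer.init_weights(input_shape=(1, 3))
out = layer.forward(np.array([[2.0, 7.0, 25.0]]))
print(w.shape, out.shape)
```

An input with 3 columns and a layer with 10 units gives a 3 by 10 weight matrix and a 1 by 10 output, the same shapes as in the test output of the exercise.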
# use the fastmath module within trax
# See how the trax.fastmath.random.normal function works
The random seed generated by random.get_prng
DeviceArray([0, 1], dtype=uint32)
choose a matrix with 2 rows and 3 columns
(2, 3)
Weight matrix generated with a normal distribution with mean 0 and stdev of 1
DeviceArray([[ 0.95730704, -0.96992904, 1.0070664 ],
[ 0.36619025, 0.17294823, 0.29092228]], dtype=float32)
Exercise 04
Implement the Dense class.
# UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# Testing your Dense layer
Weights are
[[-0.02837108 0.09368162 -0.10050076 0.14165013 0.10543301 0.09108126
-0.04265672 0.0986188 -0.05575325 0.00153249]
[-0.20785688 0.0554837 0.09142365 0.05744595 0.07227863 0.01210617
-0.03237354 0.16234995 0.02450038 -0.13809784]
[-0.06111237 0.01403724 0.08410042 -0.1094358 -0.10775021 -0.11396459
-0.05933381 -0.01557652 -0.03832145 -0.11144515]]
Foward function output is [[-3.0395496 0.9266802 2.5414743 -2.050473 -1.9769388 -2.582209
-1.7952735 0.94427425 -0.8980402 -3.7497487 ]]
Expected Output
Weights are
3.3 Model
Now you will implement a classifier using neural networks. Here is the model architecture you will be implementing.
For the model implementation, you will use the Trax layers library tl. Note that the second character of tl is the lowercase letter L, not the number 1. Trax layers are very similar to the ones you implemented above, but in addition to trainable weights they also have a non-trainable state. State is used in layers like batch normalization and for inference; you will learn more about it in course 4.
First, look at the code of the Trax Dense layer and compare to your implementation above.
- tl.Dense: Trax Dense layer implementation
One other important layer that you will use a lot is one that lets you execute one layer after another in sequence.
- tl.Serial: Combinator that applies layers serially.
  - You can pass in the layers as arguments to Serial, separated by commas.
  - For example: tl.Serial(tl.Embedding(...), tl.Mean(...), tl.Dense(...), tl.LogSoftmax(...))
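The idea of applying layers one after another can be sketched with ordinary function composition. This is a conceptual illustration only: it is not how Trax implements Serial (which also manages weights, state and a data stack), and the name serial_sketch and the toy "layers" are hypothetical.

```python
from functools import reduce


def serial_sketch(*layers):
    """Return a function that applies each layer in order to its input."""
    def apply(x):
        # Feed the output of each layer into the next one.
        return reduce(lambda value, layer: layer(value), layers, x)
    return apply


# Two toy "layers" that are just plain functions.
def double(x):
    return 2 * x


def add_one(x):
    return x + 1


model = serial_sketch(double, add_one)
print(model(5))  # double first, then add one
```

With no layers at all, serial_sketch() returns its input unchanged, which mirrors the no-op behaviour described for an empty Serial in the Trax documentation.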
Please use the help function to view documentation for each layer.
# View documentation on tl.Dense
Help on class Dense in module trax.layers.core:
class Dense(trax.layers.base.Layer)
| Dense(n_units, kernel_initializer=<function ScaledInitializer.<locals>.Init at 0x7fb32d622620>, bias_initializer=<function RandomNormalInitializer.<locals>.<lambda> at 0x7fb32d6226a8>, use_bias=True)
|
| A dense (a.k.a. fully-connected, affine) layer.
|
| Dense layers are the prototypical example of a trainable layer, i.e., a layer
| with trainable weights. Each node in a dense layer computes a weighted sum of
| all node values from the preceding layer and adds to that sum a node-specific
| bias term. The full layer computation is expressed compactly in linear
| algebra as an affine map `y = Wx + b`, where `W` is a matrix and `y`, `x`,
| and `b` are vectors. The layer is trained, or "learns", by updating the
| values in `W` and `b`.
|
| Less commonly, a dense layer can omit the bias term and be a pure linear map:
| `y = Wx`.
|
| Method resolution order:
| Dense
| trax.layers.base.Layer
| builtins.object
|
| Methods defined here:
|
| __init__(self, n_units, kernel_initializer=<function ScaledInitializer.<locals>.Init at 0x7fb32d622620>, bias_initializer=<function RandomNormalInitializer.<locals>.<lambda> at 0x7fb32d6226a8>, use_bias=True)
| Returns a dense (fully connected) layer of width `n_units`.
|
| A dense layer maps collections of `R^m` vectors to `R^n`, where `n`
| (`= n_units`) is fixed at layer creation time, and `m` is set at layer
| initialization time.
|
| Args:
| n_units: Number of nodes in the layer, also known as the width of the
| layer.
| kernel_initializer: Function that creates a matrix of (random) initial
| connection weights `W` for the layer.
| bias_initializer: Function that creates a vector of (random) initial
| bias weights `b` for the layer.
| use_bias: If `True`, compute an affine map `y = Wx + b`; else compute
| a linear map `y = Wx`.
|
| forward(self, x)
| Executes this layer as part of a forward pass through the model.
|
| Args:
| x: Tensor of same shape and dtype as the input signature used to
| initialize this layer.
|
| Returns:
| Tensor of same shape and dtype as the input, except the final dimension
| is the layer's `n_units` value.
|
| init_weights_and_state(self, input_signature)
| Returns newly initialized weights for this layer.
|
| Weights are a `(w, b)` tuple for layers created with `use_bias=True` (the
| default case), or a `w` tensor for layers created with `use_bias=False`.
|
| Args:
| input_signature: `ShapeDtype` instance characterizing the input this layer
| should compute on.
|
| ----------------------------------------------------------------------
| Methods inherited from trax.layers.base.Layer:
|
| __call__(self, x, weights=None, state=None, rng=None)
| Makes layers callable; for use in tests or interactive settings.
|
| This convenience method helps library users play with, test, or otherwise
| probe the behavior of layers outside of a full training environment. It
| presents the layer as callable function from inputs to outputs, with the
| option of manually specifying weights and non-parameter state per individual
| call. For convenience, weights and non-parameter state are cached per layer
| instance, starting from default values of `EMPTY_WEIGHTS` and `EMPTY_STATE`,
| and acquiring non-empty values either by initialization or from values
| explicitly provided via the weights and state keyword arguments.
|
| Args:
| x: Zero or more input tensors, packaged as described in the `Layer` class
| docstring.
| weights: Weights or `None`; if `None`, use self's cached weights value.
| state: State or `None`; if `None`, use self's cached state value.
| rng: Single-use random number generator (JAX PRNG key), or `None`;
| if `None`, use a default computed from an integer 0 seed.
|
| Returns:
| Zero or more output tensors, packaged as described in the `Layer` class
| docstring.
|
| __repr__(self)
| Return repr(self).
|
| backward(self, inputs, output, grad, weights, state, new_state, rng)
| Custom backward pass to propagate gradients in a custom way.
|
| Args:
| inputs: Input tensors; can be a (possibly nested) tuple.
| output: The result of running this layer on inputs.
| grad: Gradient signal computed based on subsequent layers; its structure
| and shape must match output.
| weights: This layer's weights.
| state: This layer's state prior to the current forward pass.
| new_state: This layer's state after the current forward pass.
| rng: Single-use random number generator (JAX PRNG key).
|
| Returns:
| The custom gradient signal for the input. Note that we need to return
| a gradient for each argument of forward, so it will usually be a tuple
| of signals: the gradient for inputs and weights.
|
| init(self, input_signature, rng=None, use_cache=False)
| Initializes weights/state of this layer and its sublayers recursively.
|
| Initialization creates layer weights and state, for layers that use them.
| It derives the necessary array shapes and data types from the layer's input
| signature, which is itself just shape and data type information.
|
| For layers without weights or state, this method safely does nothing.
|
| This method is designed to create weights/state only once for each layer
| instance, even if the same layer instance occurs in multiple places in the
| network. This enables weight sharing to be implemented as layer sharing.
|
| Args:
| input_signature: `ShapeDtype` instance (if this layer takes one input)
| or list/tuple of `ShapeDtype` instances.
| rng: Single-use random number generator (JAX PRNG key), or `None`;
| if `None`, use a default computed from an integer 0 seed.
| use_cache: If `True`, and if this layer instance has already been
| initialized elsewhere in the network, then return special marker
| values -- tuple `(GET_WEIGHTS_FROM_CACHE, GET_STATE_FROM_CACHE)`.
| Else return this layer's newly initialized weights and state.
|
| Returns:
| A `(weights, state)` tuple.
|
| init_from_file(self, file_name, weights_only=False, input_signature=None)
| Initializes this layer and its sublayers from a pickled checkpoint.
|
| In the common case (`weights_only=False`), the file must be a gziped pickled
| dictionary containing items with keys `'flat_weights', `'flat_state'` and
| `'input_signature'`, which are used to initialize this layer.
| If `input_signature` is specified, it's used instead of the one in the file.
| If `weights_only` is `True`, the dictionary does not need to have the
| `'flat_state'` item and the state it not restored either.
|
| Args:
| file_name: Name/path of the pickeled weights/state file.
| weights_only: If `True`, initialize only the layer's weights. Else
| initialize both weights and state.
| input_signature: Input signature to be used instead of the one from file.
|
| output_signature(self, input_signature)
| Returns output signature this layer would give for `input_signature`.
|
| pure_fn(self, x, weights, state, rng, use_cache=False)
| Applies this layer as a pure function with no optional args.
|
| This method exposes the layer's computation as a pure function. This is
| especially useful for JIT compilation. Do not override, use `forward`
| instead.
|
| Args:
| x: Zero or more input tensors, packaged as described in the `Layer` class
| docstring.
| weights: A tuple or list of trainable weights, with one element for this
| layer if this layer has no sublayers, or one for each sublayer if
| this layer has sublayers. If a layer (or sublayer) has no trainable
| weights, the corresponding weights element is an empty tuple.
| state: Layer-specific non-parameter state that can update between batches.
| rng: Single-use random number generator (JAX PRNG key).
| use_cache: if `True`, cache weights and state in the layer object; used
| to implement layer sharing in combinators.
|
| Returns:
| A tuple of `(tensors, state)`. The tensors match the number (`n_out`)
| promised by this layer, and are packaged as described in the `Layer`
| class docstring.
|
| weights_and_state_signature(self, input_signature)
| Return a pair containing the signatures of weights and state.
|
| ----------------------------------------------------------------------
| Data descriptors inherited from trax.layers.base.Layer:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| has_backward
| Returns `True` if this layer provides its own custom backward pass code.
|
| A layer subclass that provides custom backward pass code (for custom
| gradients) must override this method to return `True`.
|
| n_in
| Returns how many tensors this layer expects as input.
|
| n_out
| Returns how many tensors this layer promises as output.
|
| name
| Returns the name of this layer.
|
| rng
| Returns a single-use random number generator without advancing it.
|
| state
| Returns a tuple containing this layer's state; may be empty.
|
| sublayers
| Returns a tuple containing this layer's sublayers; may be empty.
|
| weights
| Returns this layer's weights.
|
| Depending on the layer, the weights can be in the form of:
|
| - an empty tuple
| - a tensor (ndarray)
| - a nested structure of tuples and tensors
# View documentation on tl.Serial
Help on class Serial in module trax.layers.combinators:
class Serial(trax.layers.base.Layer)
| Serial(*sublayers, name=None, sublayers_to_print=None)
|
| Combinator that applies layers serially (by function composition).
|
| This combinator is commonly used to construct deep networks, e.g., like this::
|
| mlp = tl.Serial(
| tl.Dense(128),
| tl.Relu(),
| tl.Dense(10),
| tl.LogSoftmax()
| )
|
| A Serial combinator uses stack semantics to manage data for its sublayers.
| Each sublayer sees only the inputs it needs and returns only the outputs it
| has generated. The sublayers interact via the data stack. For instance, a
| sublayer k, following sublayer j, gets called with the data stack in the
| state left after layer j has applied. The Serial combinator then:
|
| - takes n_in items off the top of the stack (n_in = k.n_in) and calls
| layer k, passing those items as arguments; and
|
| - takes layer k's n_out return values (n_out = k.n_out) and pushes
| them onto the data stack.
|
| A Serial instance with no sublayers acts as a special-case (but useful)
| 1-input 1-output no-op.
|
| Method resolution order:
| Serial
| trax.layers.base.Layer
| builtins.object
|
| Methods defined here:
|
| __init__(self, *sublayers, name=None, sublayers_to_print=None)
| Creates a partially initialized, unconnected layer instance.
|
| Args:
| n_in: Number of inputs expected by this layer.
| n_out: Number of outputs promised by this layer.
| name: Class-like name for this layer; for use when printing this layer.
| sublayers_to_print: Sublayers to display when printing out this layer;
| By default (when None) we display all sublayers.
|
| forward(self, xs)
| Computes this layer's output as part of a forward pass through the model.
|
| Authors of new layer subclasses should override this method to define the
| forward computation that their layer performs. Use `self.weights` to access
| trainable weights of this layer. If you need to use local non-trainable
| state or randomness, use `self.rng` for the random seed (no need to set it)
| and use `self.state` for non-trainable state (and set it to the new value).
|
| Args:
| inputs: Zero or more input tensors, packaged as described in the `Layer`
| class docstring.
|
| Returns:
| Zero or more output tensors, packaged as described in the `Layer` class
| docstring.
|
| init_weights_and_state(self, input_signature)
| Initializes weights and state for inputs with the given signature.
|
| Authors of new layer subclasses should override this method if their layer
| uses trainable weights or non-trainable state. To initialize trainable
| weights, set `self.weights` and to initialize non-trainable state,
| set `self.state` to the intended value.
|
| Args:
| input_signature: A `ShapeDtype` instance (if this layer takes one input)
| or a list/tuple of `ShapeDtype` instances; signatures of inputs.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| state
| Returns a tuple containing this layer's state; may be empty.
|
| weights
| Returns this layer's weights.
|
| Depending on the layer, the weights can be in the form of:
|
| - an empty tuple
| - a tensor (ndarray)
| - a nested structure of tuples and tensors
|
| ----------------------------------------------------------------------
| Methods inherited from trax.layers.base.Layer:
|
| __call__(self, x, weights=None, state=None, rng=None)
| Makes layers callable; for use in tests or interactive settings.
|
| This convenience method helps library users play with, test, or otherwise
| probe the behavior of layers outside of a full training environment. It
| presents the layer as callable function from inputs to outputs, with the
| option of manually specifying weights and non-parameter state per individual
| call. For convenience, weights and non-parameter state are cached per layer
| instance, starting from default values of `EMPTY_WEIGHTS` and `EMPTY_STATE`,
| and acquiring non-empty values either by initialization or from values
| explicitly provided via the weights and state keyword arguments.
|
| Args:
| x: Zero or more input tensors, packaged as described in the `Layer` class
| docstring.
| weights: Weights or `None`; if `None`, use self's cached weights value.
| state: State or `None`; if `None`, use self's cached state value.
| rng: Single-use random number generator (JAX PRNG key), or `None`;
| if `None`, use a default computed from an integer 0 seed.
|
| Returns:
| Zero or more output tensors, packaged as described in the `Layer` class
| docstring.
|
| __repr__(self)
| Return repr(self).
|
| backward(self, inputs, output, grad, weights, state, new_state, rng)
| Custom backward pass to propagate gradients in a custom way.
|
| Args:
| inputs: Input tensors; can be a (possibly nested) tuple.
| output: The result of running this layer on inputs.
| grad: Gradient signal computed based on subsequent layers; its structure
| and shape must match output.
| weights: This layer's weights.
| state: This layer's state prior to the current forward pass.
| new_state: This layer's state after the current forward pass.
| rng: Single-use random number generator (JAX PRNG key).
|
| Returns:
| The custom gradient signal for the input. Note that we need to return
| a gradient for each argument of forward, so it will usually be a tuple
| of signals: the gradient for inputs and weights.
|
| init(self, input_signature, rng=None, use_cache=False)
| Initializes weights/state of this layer and its sublayers recursively.
|
| Initialization creates layer weights and state, for layers that use them.
| It derives the necessary array shapes and data types from the layer's input
| signature, which is itself just shape and data type information.
|
| For layers without weights or state, this method safely does nothing.
|
| This method is designed to create weights/state only once for each layer
| instance, even if the same layer instance occurs in multiple places in the
| network. This enables weight sharing to be implemented as layer sharing.
|
| Args:
| input_signature: `ShapeDtype` instance (if this layer takes one input)
| or list/tuple of `ShapeDtype` instances.
| rng: Single-use random number generator (JAX PRNG key), or `None`;
| if `None`, use a default computed from an integer 0 seed.
| use_cache: If `True`, and if this layer instance has already been
| initialized elsewhere in the network, then return special marker
| values -- tuple `(GET_WEIGHTS_FROM_CACHE, GET_STATE_FROM_CACHE)`.
| Else return this layer's newly initialized weights and state.
|
| Returns:
| A `(weights, state)` tuple.
|
| init_from_file(self, file_name, weights_only=False, input_signature=None)
| Initializes this layer and its sublayers from a pickled checkpoint.
|
| In the common case (`weights_only=False`), the file must be a gziped pickled
| dictionary containing items with keys `'flat_weights', `'flat_state'` and
| `'input_signature'`, which are used to initialize this layer.
| If `input_signature` is specified, it's used instead of the one in the file.
| If `weights_only` is `True`, the dictionary does not need to have the
| `'flat_state'` item and the state it not restored either.
|
| Args:
| file_name: Name/path of the pickeled weights/state file.
| weights_only: If `True`, initialize only the layer's weights. Else
| initialize both weights and state.
| input_signature: Input signature to be used instead of the one from file.
|
| output_signature(self, input_signature)
| Returns output signature this layer would give for `input_signature`.
|
| pure_fn(self, x, weights, state, rng, use_cache=False)
| Applies this layer as a pure function with no optional args.
|
| This method exposes the layer's computation as a pure function. This is
| especially useful for JIT compilation. Do not override, use `forward`
| instead.
|
| Args:
| x: Zero or more input tensors, packaged as described in the `Layer` class
| docstring.
| weights: A tuple or list of trainable weights, with one element for this
| layer if this layer has no sublayers, or one for each sublayer if
| this layer has sublayers. If a layer (or sublayer) has no trainable
| weights, the corresponding weights element is an empty tuple.
| state: Layer-specific non-parameter state that can update between batches.
| rng: Single-use random number generator (JAX PRNG key).
| use_cache: if `True`, cache weights and state in the layer object; used
| to implement layer sharing in combinators.
|
| Returns:
| A tuple of `(tensors, state)`. The tensors match the number (`n_out`)
| promised by this layer, and are packaged as described in the `Layer`
| class docstring.
|
| weights_and_state_signature(self, input_signature)
| Return a pair containing the signatures of weights and state.
|
| ----------------------------------------------------------------------
| Data descriptors inherited from trax.layers.base.Layer:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| has_backward
| Returns `True` if this layer provides its own custom backward pass code.
|
| A layer subclass that provides custom backward pass code (for custom
| gradients) must override this method to return `True`.
|
| n_in
| Returns how many tensors this layer expects as input.
|
| n_out
| Returns how many tensors this layer promises as output.
|
| name
| Returns the name of this layer.
|
| rng
| Returns a single-use random number generator without advancing it.
|
| sublayers
| Returns a tuple containing this layer's sublayers; may be empty.
- tl.Embedding: Layer constructor function for an embedding layer.
- Usage: tl.Embedding(vocab_size, d_feature).
- vocab_size is the number of unique words in the given vocabulary.
- d_feature is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).
1 | # View documentation for tl.Embedding |
Help on class Embedding in module trax.layers.core:
class Embedding(trax.layers.base.Layer)
| Embedding(vocab_size, d_feature, kernel_initializer=<function RandomNormalInitializer.<locals>.<lambda> at 0x7fb32d6228c8>)
|
| Trainable layer that maps discrete tokens/ids to vectors.
|
| Method resolution order:
| Embedding
| trax.layers.base.Layer
| builtins.object
|
| Methods defined here:
|
| __init__(self, vocab_size, d_feature, kernel_initializer=<function RandomNormalInitializer.<locals>.<lambda> at 0x7fb32d6228c8>)
| Returns an embedding layer with given vocabulary size and vector size.
|
| The layer clips input values (token ids) to the range `[0, vocab_size)`.
| That is, negative token ids all clip to `0` before being mapped to a
| vector, and token ids with value `vocab_size` or greater all clip to
| `vocab_size - 1` before being mapped to a vector.
|
| Args:
| vocab_size: Size of the input vocabulary. The layer will assign a unique
| vector to each id in `range(vocab_size)`.
| d_feature: Dimensionality/depth of the output vectors.
| kernel_initializer: Function that creates (random) initial vectors for
| the embedding.
|
| forward(self, x)
| Returns embedding vectors corresponding to input token id's.
|
| Args:
| x: Tensor of token id's.
|
| Returns:
| Tensor of embedding vectors.
|
| init_weights_and_state(self, input_signature)
| Returns tensor of newly initialized embedding vectors.
|
| ----------------------------------------------------------------------
| Methods inherited from trax.layers.base.Layer:
|
| __call__(self, x, weights=None, state=None, rng=None)
| Makes layers callable; for use in tests or interactive settings.
|
| This convenience method helps library users play with, test, or otherwise
| probe the behavior of layers outside of a full training environment. It
| presents the layer as callable function from inputs to outputs, with the
| option of manually specifying weights and non-parameter state per individual
| call. For convenience, weights and non-parameter state are cached per layer
| instance, starting from default values of `EMPTY_WEIGHTS` and `EMPTY_STATE`,
| and acquiring non-empty values either by initialization or from values
| explicitly provided via the weights and state keyword arguments.
|
| Args:
| x: Zero or more input tensors, packaged as described in the `Layer` class
| docstring.
| weights: Weights or `None`; if `None`, use self's cached weights value.
| state: State or `None`; if `None`, use self's cached state value.
| rng: Single-use random number generator (JAX PRNG key), or `None`;
| if `None`, use a default computed from an integer 0 seed.
|
| Returns:
| Zero or more output tensors, packaged as described in the `Layer` class
| docstring.
|
| __repr__(self)
| Return repr(self).
|
| backward(self, inputs, output, grad, weights, state, new_state, rng)
| Custom backward pass to propagate gradients in a custom way.
|
| Args:
| inputs: Input tensors; can be a (possibly nested) tuple.
| output: The result of running this layer on inputs.
| grad: Gradient signal computed based on subsequent layers; its structure
| and shape must match output.
| weights: This layer's weights.
| state: This layer's state prior to the current forward pass.
| new_state: This layer's state after the current forward pass.
| rng: Single-use random number generator (JAX PRNG key).
|
| Returns:
| The custom gradient signal for the input. Note that we need to return
| a gradient for each argument of forward, so it will usually be a tuple
| of signals: the gradient for inputs and weights.
|
| init(self, input_signature, rng=None, use_cache=False)
| Initializes weights/state of this layer and its sublayers recursively.
|
| Initialization creates layer weights and state, for layers that use them.
| It derives the necessary array shapes and data types from the layer's input
| signature, which is itself just shape and data type information.
|
| For layers without weights or state, this method safely does nothing.
|
| This method is designed to create weights/state only once for each layer
| instance, even if the same layer instance occurs in multiple places in the
| network. This enables weight sharing to be implemented as layer sharing.
|
| Args:
| input_signature: `ShapeDtype` instance (if this layer takes one input)
| or list/tuple of `ShapeDtype` instances.
| rng: Single-use random number generator (JAX PRNG key), or `None`;
| if `None`, use a default computed from an integer 0 seed.
| use_cache: If `True`, and if this layer instance has already been
| initialized elsewhere in the network, then return special marker
| values -- tuple `(GET_WEIGHTS_FROM_CACHE, GET_STATE_FROM_CACHE)`.
| Else return this layer's newly initialized weights and state.
|
| Returns:
| A `(weights, state)` tuple.
|
| init_from_file(self, file_name, weights_only=False, input_signature=None)
| Initializes this layer and its sublayers from a pickled checkpoint.
|
| In the common case (`weights_only=False`), the file must be a gziped pickled
| dictionary containing items with keys `'flat_weights', `'flat_state'` and
| `'input_signature'`, which are used to initialize this layer.
| If `input_signature` is specified, it's used instead of the one in the file.
| If `weights_only` is `True`, the dictionary does not need to have the
| `'flat_state'` item and the state it not restored either.
|
| Args:
| file_name: Name/path of the pickeled weights/state file.
| weights_only: If `True`, initialize only the layer's weights. Else
| initialize both weights and state.
| input_signature: Input signature to be used instead of the one from file.
|
| output_signature(self, input_signature)
| Returns output signature this layer would give for `input_signature`.
|
| pure_fn(self, x, weights, state, rng, use_cache=False)
| Applies this layer as a pure function with no optional args.
|
| This method exposes the layer's computation as a pure function. This is
| especially useful for JIT compilation. Do not override, use `forward`
| instead.
|
| Args:
| x: Zero or more input tensors, packaged as described in the `Layer` class
| docstring.
| weights: A tuple or list of trainable weights, with one element for this
| layer if this layer has no sublayers, or one for each sublayer if
| this layer has sublayers. If a layer (or sublayer) has no trainable
| weights, the corresponding weights element is an empty tuple.
| state: Layer-specific non-parameter state that can update between batches.
| rng: Single-use random number generator (JAX PRNG key).
| use_cache: if `True`, cache weights and state in the layer object; used
| to implement layer sharing in combinators.
|
| Returns:
| A tuple of `(tensors, state)`. The tensors match the number (`n_out`)
| promised by this layer, and are packaged as described in the `Layer`
| class docstring.
|
| weights_and_state_signature(self, input_signature)
| Return a pair containing the signatures of weights and state.
|
| ----------------------------------------------------------------------
| Data descriptors inherited from trax.layers.base.Layer:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| has_backward
| Returns `True` if this layer provides its own custom backward pass code.
|
| A layer subclass that provides custom backward pass code (for custom
| gradients) must override this method to return `True`.
|
| n_in
| Returns how many tensors this layer expects as input.
|
| n_out
| Returns how many tensors this layer promises as output.
|
| name
| Returns the name of this layer.
|
| rng
| Returns a single-use random number generator without advancing it.
|
| state
| Returns a tuple containing this layer's state; may be empty.
|
| sublayers
| Returns a tuple containing this layer's sublayers; may be empty.
|
| weights
| Returns this layer's weights.
|
| Depending on the layer, the weights can be in the form of:
|
| - an empty tuple
| - a tensor (ndarray)
| - a nested structure of tuples and tensors
1 | tmp_embed = tl.Embedding(vocab_size=3, d_feature=2) |
Embedding_3_2
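To make the lookup and the clipping rule from the docstring concrete, here is a minimal plain-Python sketch of what an embedding layer does conceptually. It is not Trax's implementation: the table values and the helper name embed are made up for illustration, and a real tl.Embedding layer initializes its vectors randomly and then learns them.

```python
# Hypothetical embedding table: vocab_size=3 rows, d_feature=2 columns.
# A real tl.Embedding layer would initialize these randomly and train them.
EMBEDDING_TABLE = [
    [0.1, 0.2],  # vector for token id 0
    [0.3, 0.4],  # vector for token id 1
    [0.5, 0.6],  # vector for token id 2
]


def embed(token_ids, table=EMBEDDING_TABLE):
    """Map each token id to its row, clipping ids to [0, vocab_size)."""
    vocab_size = len(table)
    vectors = []
    for token_id in token_ids:
        clipped = min(max(token_id, 0), vocab_size - 1)
        vectors.append(table[clipped])
    return vectors


# Ids -1 and 7 fall outside the vocabulary and are clipped to 0 and 2.
print(embed([1, -1, 7]))
```

Each input id becomes one row of the table, so a tweet of n token ids turns into n vectors of d_feature elements.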
- tl.Mean: Calculates means across an axis. In this case, please choose axis = 1 to get an average embedding vector (in the classifier, the input to this layer has shape (batch size, number of words, embedding size), so the result is the average of the embedding vectors of all the words in each tweet).
- For example, if a matrix stores a 300-element embedding for each of 10,000 words with shape (300, 10000), taking the mean along axis=1 will yield a vector of 300 elements.
1 | # view the documentation for tl.mean |
Help on function Mean in module trax.layers.core:
Mean(axis=-1, keepdims=False)
Returns a layer that computes mean values using one tensor axis.
`Mean` uses one tensor axis to form groups of values and replaces each group
with the mean value of that group. The resulting values can either remain
in their own size 1 axis (`keepdims=True`), or that axis can be removed from
the overall tensor (default `keepdims=False`), lowering the rank of the
tensor by one.
Args:
axis: Axis along which values are grouped for computing a mean.
keepdims: If `True`, keep the resulting size 1 axis as a separate tensor
axis; else, remove that axis.
1 | # Pretend the embedding matrix uses |
The mean along axis 0 creates a vector whose length equals the vocabulary size
DeviceArray([2.5, 3.5, 4.5], dtype=float32)
The mean along axis 1 creates a vector whose length equals the number of elements in a word embedding
DeviceArray([2., 5.], dtype=float32)
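Inside the classifier, the mean is taken over the word axis of a (batch, words, features) tensor. The following plain-Python sketch shows that reduction on nested lists; the helper name mean_over_words is hypothetical and the numbers are the same toy values as above, arranged with one row per word.

```python
def mean_over_words(batch):
    """Average word vectors per example: (batch, n_words, d_feature) -> (batch, d_feature)."""
    averaged = []
    for example in batch:
        n_words = len(example)
        # zip(*example) groups the values feature by feature across all words.
        averaged.append([sum(feature) / n_words for feature in zip(*example)])
    return averaged


# One example with three words, each embedded in two dimensions.
batch = [[[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]]
print(mean_over_words(batch))  # [[2.0, 5.0]]
```

The output has one vector per example, whatever the number of words, which is what lets the following dense layer accept tweets of different lengths.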
- tl.LogSoftmax: Implements the log softmax function.
- Here, you don’t need to set any parameters for LogSoftmax().
1 | help(tl.LogSoftmax) |
Help on function LogSoftmax in module trax.layers.core:
LogSoftmax(axis=-1)
Returns a layer that applies log softmax along one tensor axis.
`LogSoftmax` acts on a group of values and normalizes them to look like a set
of log probability values. (Probability values must be non-negative, and as
a set must sum to 1. A group of log probability values can be seen as the
natural logarithm function applied to a set of probability values.)
Args:
axis: Axis along which values are grouped for computing log softmax.
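As a rough illustration of what the docstring describes, the sketch below computes log softmax for a single group of values in plain Python. The function name log_softmax and the input values are assumptions for this example; Trax applies the same idea along a tensor axis.

```python
import math


def log_softmax(values):
    """Return log(softmax(values)) for a flat list of numbers."""
    # Subtracting the maximum first keeps exp() from overflowing.
    shift = max(values)
    log_total = shift + math.log(sum(math.exp(v - shift) for v in values))
    return [v - log_total for v in values]


log_probs = log_softmax([2.0, -1.0])
probs = [math.exp(lp) for lp in log_probs]
print(log_probs)   # both entries are <= 0
print(sum(probs))  # approximately 1.0
```

Exponentiating the outputs recovers ordinary probabilities, so the largest log probability always belongs to the most likely class.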
Online documentation
Exercise 05
Implement the classifier function.
1 | # UNQ_C5 (UNIQUE CELL IDENTIFIER, DO NOT EDIT) |
1 | tmp_model = classifier() |
1 | print(type(tmp_model)) |
<class 'trax.layers.combinators.Serial'>
Serial[
Embedding_9088_256
Mean
Dense_2
LogSoftmax
]
Expected Output
1 | <class 'trax.layers.combinators.Serial'> |
Part 4: Training
To train a model on a task, Trax defines an abstraction trax.supervised.training.TrainTask, which packages the train data, loss and optimizer (among other things) together into an object.
Similarly, to evaluate a model, Trax defines an abstraction trax.supervised.training.EvalTask, which packages the eval data and metrics (among other things) into another object.
The final piece tying things together is the trax.supervised.training.Loop abstraction, a simple and flexible way to put everything together and train the model, all the while evaluating it and saving checkpoints.
Using Loop will save you a lot of code compared to always writing the training loop by hand, like you did in courses 1 and 2. More importantly, you are less likely to have a bug in that code that would ruin your training.
Help on class TrainTask in module trax.supervised.training:
class TrainTask(builtins.object)
| TrainTask(labeled_data, loss_layer, optimizer, lr_schedule=None, n_steps_per_checkpoint=100)
|
| A supervised task (labeled data + feedback mechanism) for training.
|
| Methods defined here:
|
| __init__(self, labeled_data, loss_layer, optimizer, lr_schedule=None, n_steps_per_checkpoint=100)
| Configures a training task.
|
| Args:
| labeled_data: Iterator of batches of labeled data tuples. Each tuple has
| 1+ data (input value) tensors followed by 1 label (target value)
| tensor. All tensors are NumPy ndarrays or their JAX counterparts.
| loss_layer: Layer that computes a scalar value (the "loss") by comparing
| model output :math:`\hat{y}=f(x)` to the target :math:`y`.
| optimizer: Optimizer object that computes model weight updates from
| loss-function gradients.
| lr_schedule: Learning rate schedule, a function step -> learning_rate.
| n_steps_per_checkpoint: How many steps to run between checkpoints.
|
| learning_rate(self, step)
| Return the learning rate for the given step.
|
| next_batch(self)
| Returns one batch of labeled data: a tuple of input(s) plus label.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| labeled_data
|
| loss_layer
|
| n_steps_per_checkpoint
|
| optimizer
|
| sample_batch
1 | # View documentation for trax.supervised.training.EvalTask |
Help on class EvalTask in module trax.supervised.training:
class EvalTask(builtins.object)
| EvalTask(labeled_data, metrics, metric_names=None, n_eval_batches=1)
|
| Labeled data plus scalar functions for (periodically) measuring a model.
|
| An eval task specifies how (`labeled_data` + `metrics`) and with what
| precision (`n_eval_batches`) to measure a model as it is training.
| The variance of each scalar output is reduced by measuring over multiple
| (`n_eval_batches`) batches and reporting the average from those measurements.
|
| Methods defined here:
|
| __init__(self, labeled_data, metrics, metric_names=None, n_eval_batches=1)
| Configures an eval task: named metrics run with a given data source.
|
| Args:
| labeled_data: Iterator of batches of labeled data tuples. Each tuple has
| 1+ data tensors (NumPy ndarrays) followed by 1 label (target value)
| tensor.
| metrics: List of layers; each computes a scalar value per batch by
| comparing model output :math:`\hat{y}=f(x)` to the target :math:`y`.
| metric_names: List of names, one for each item in `metrics`, in matching
| order, to be used when recording/reporting eval output. If None,
| generate default names using layer names from metrics.
| n_eval_batches: Integer N that specifies how many eval batches to run;
| the output is then the average of the outputs from the N batches.
|
| next_batch(self)
| Returns one batch of labeled data: a tuple of input(s) plus label.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| labeled_data
|
| metric_names
|
| metrics
|
| n_eval_batches
|
| sample_batch
1 | # View documentation for trax.supervised.training.Loop |
Help on class Loop in module trax.supervised.training:
class Loop(builtins.object)
| Loop(model, task, eval_model=None, eval_task=None, output_dir=None, checkpoint_at=None, eval_at=None)
|
| Loop that can run for a given number of steps to train a supervised model.
|
| The typical supervised training process randomly initializes a model and
| updates its weights via feedback (loss-derived gradients) from a training
| task, by looping through batches of labeled data. A training loop can also
| be configured to run periodic evals and save intermediate checkpoints.
|
| For speed, the implementation takes advantage of JAX's composable function
| transformations (specifically, `jit` and `grad`). It creates JIT-compiled
| pure functions derived from variants of the core model; schematically:
|
| - training variant: jit(grad(pure_function(model+loss)))
| - evals variant: jit(pure_function(model+evals))
|
| In training or during evals, these variants are called with explicit
| arguments for all relevant input data, model weights/state, optimizer slots,
| and random number seeds:
|
| - batch: labeled data
| - model weights/state: trainable weights and input-related state (e.g., as
| used by batch norm)
| - optimizer slots: weights in the optimizer that evolve during the training
| process
| - random number seeds: JAX PRNG keys that enable high-quality, distributed,
| repeatable generation of pseudo-random numbers
|
| Methods defined here:
|
| __init__(self, model, task, eval_model=None, eval_task=None, output_dir=None, checkpoint_at=None, eval_at=None)
| Configures a training `Loop`, including a random initialization.
|
| Args:
| model: Trax layer, representing the core model to be trained. Loss
| functions and eval functions (a.k.a. metrics) are considered to be
| outside the core model, taking core model output and data labels as
| their two inputs.
| task: TrainTask instance, which defines the training data, loss function,
| and optimizer to be used in this training loop.
| eval_model: Optional Trax layer, representing model used for evaluation,
| e.g., with dropout turned off. If None, the training model (model)
| will be used.
| eval_task: EvalTask instance or None. If None, don't do any evals.
| output_dir: Path telling where to save outputs (evals and checkpoints).
| Can be None if both `eval_task` and `checkpoint_at` are None.
| checkpoint_at: Function (integer --> boolean) telling, for step n, whether
| that step should have its checkpoint saved. If None, the default is
| periodic checkpointing at `task.n_steps_per_checkpoint`.
| eval_at: Function (integer --> boolean) that says, for training step n,
| whether that step should run evals. If None, run when checkpointing.
|
| new_rng(self)
| Returns a new single-use random number generator (JAX PRNG key).
|
| run(self, n_steps=1)
| Runs this training loop for n steps.
|
| Optionally runs evals and saves checkpoints at specified points.
|
| Args:
| n_steps: Stop training after completing n steps.
|
| run_evals(self, weights=None, state=None)
| Runs and records evals for this training session.
|
| Args:
| weights: Current weights from model in training.
| state: Current state from model in training.
|
| save_checkpoint(self, weights=None, state=None, slots=None)
| Saves checkpoint to disk for the current training step.
|
| Args:
| weights: Weights from model being trained.
| state: State (non-weight parameters) from model being trained.
| slots: Updatable weights for the optimizer in this training loop.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| current_step
| Returns current step number in this training session.
|
| eval_model
| Returns the model used for evaluation.
|
| model
| Returns the model that is training.
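The checkpoint_at and eval_at arguments described above are plain functions from a step number to a boolean. As a hedged sketch, the hypothetical predicate below (the name eval_every_10 and the schedule are assumptions, not part of the assignment) fires on step 1 and on every tenth step, which is the pattern visible in the training log in section 4.1. It is only exercised on its own here and is not passed to a real Loop.

```python
def eval_every_10(step):
    """Return True on step 1 and on every multiple of 10."""
    return step == 1 or step % 10 == 0


# Steps in 1..30 at which this predicate would trigger an eval.
print([step for step in range(1, 31) if eval_every_10(step)])  # [1, 10, 20, 30]
```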
1 | # View optimizers that you could choose from |
Help on package trax.optimizers in trax:
NAME
trax.optimizers - Optimizers for use with Trax layers.
PACKAGE CONTENTS
adafactor
adam
base
momentum
optimizers_test
rms_prop
sm3
FUNCTIONS
opt_configure(*args, **kwargs)
FILE
/opt/conda/lib/python3.7/site-packages/trax/optimizers/__init__.py
Notice some available optimizers include:
- adafactor
- adam
- momentum
- rms_prop
- sm3
4.1 Training the model
Now you are going to train your model.
Let’s define the TrainTask, EvalTask and Loop in preparation to train the model.
1 | from trax.supervised import training |
This defines a model trained using tl.CrossEntropyLoss and optimized with the trax.optimizers.Adam optimizer, all the while tracking the accuracy using the tl.Accuracy metric. We also track tl.CrossEntropyLoss on the validation set.
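To give a feel for the loss being tracked, here is a minimal sketch of cross-entropy computed from log probabilities, assuming the loss is the mean negative log probability of the correct class. The helper name is hypothetical, and tl.CrossEntropyLoss may differ in details such as example weighting. The two rows are taken from the prediction array shown in section 4.2.

```python
def cross_entropy_from_log_probs(log_probs, targets):
    """Average of -log p(correct class) over the batch."""
    losses = [-row[target] for row, target in zip(log_probs, targets)]
    return sum(losses) / len(losses)


# Each row holds [negative, positive] log probabilities for one tweet.
log_probs = [[-4.9417334, -0.0071678162], [-0.0018327236, -6.3028107]]
targets = [1, 0]
loss = cross_entropy_from_log_probs(log_probs, targets)
print(loss)  # small, because both tweets are classified confidently and correctly
```

A confident correct prediction contributes a value near 0, while a confident wrong one contributes a large value, which is why the loss falls as training progresses.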
Now let’s make an output directory and train the model.
1 | output_dir = '~/model/' |
/home/jovyan/model/
Exercise 06
Instructions: Implement train_model to train the model (the classifier that you wrote earlier) for the given number of training steps (n_steps) using TrainTask, EvalTask and Loop.
1 | # UNQ_C6 (UNIQUE CELL IDENTIFIER, DO NOT EDIT) |
1 | training_loop = train_model(model, train_task, eval_task, 100, output_dir_expand) |
Step 1: train CrossEntropyLoss | 0.88939196
Step 1: eval CrossEntropyLoss | 0.68833977
Step 1: eval Accuracy | 0.50000000
Step 10: train CrossEntropyLoss | 0.61036736
Step 10: eval CrossEntropyLoss | 0.52182281
Step 10: eval Accuracy | 0.68750000
Step 20: train CrossEntropyLoss | 0.34137666
Step 20: eval CrossEntropyLoss | 0.20654774
Step 20: eval Accuracy | 1.00000000
Step 30: train CrossEntropyLoss | 0.20208922
Step 30: eval CrossEntropyLoss | 0.21594886
Step 30: eval Accuracy | 0.93750000
Step 40: train CrossEntropyLoss | 0.19611198
Step 40: eval CrossEntropyLoss | 0.17582777
Step 40: eval Accuracy | 1.00000000
Step 50: train CrossEntropyLoss | 0.11203773
Step 50: eval CrossEntropyLoss | 0.07589275
Step 50: eval Accuracy | 1.00000000
Step 60: train CrossEntropyLoss | 0.09375446
Step 60: eval CrossEntropyLoss | 0.09290724
Step 60: eval Accuracy | 1.00000000
Step 70: train CrossEntropyLoss | 0.08785903
Step 70: eval CrossEntropyLoss | 0.09610598
Step 70: eval Accuracy | 1.00000000
Step 80: train CrossEntropyLoss | 0.08858261
Step 80: eval CrossEntropyLoss | 0.02319432
Step 80: eval Accuracy | 1.00000000
Step 90: train CrossEntropyLoss | 0.05699894
Step 90: eval CrossEntropyLoss | 0.01778970
Step 90: eval Accuracy | 1.00000000
Step 100: train CrossEntropyLoss | 0.03663783
Step 100: eval CrossEntropyLoss | 0.00210550
Step 100: eval Accuracy | 1.00000000
Expected output (Approximately)
1 | Step 1: train CrossEntropyLoss | 0.88939196 |
4.2 Practice Making a prediction
Now that you have trained a model, you can access it as the training_loop.model object. We will actually use training_loop.eval_model, and in the next weeks you will learn why we sometimes use a different model for evaluation, e.g., one without dropout. For now, make predictions with your model.
Use the training data just to see how the prediction process works.
- Later, you will use validation data to evaluate your model’s performance.
1 | # Create a generator object |
The batch is a tuple of length 3 because position 0 contains the tweets, and position 1 contains the targets.
The shape of the tweet tensors is (16, 15) (num of examples, length of tweet tensors)
The shape of the labels is (16,), which is the batch size.
The shape of the example_weights is (16,), which is the same as inputs/targets size.
1 | # feed the tweet tensors into the model to get a prediction |
The prediction shape is (16, 2), num of tensor_tweets as rows
Column 0 is the probability of a negative sentiment (class 0)
Column 1 is the probability of a positive sentiment (class 1)
View the prediction array
DeviceArray([[-4.9417334e+00, -7.1678162e-03],
[-6.5846415e+00, -1.3823509e-03],
[-5.4463043e+00, -4.3215752e-03],
[-4.3487482e+00, -1.3007164e-02],
[-4.9131694e+00, -7.3764324e-03],
[-4.7097692e+00, -9.0477467e-03],
[-5.2801600e+00, -5.1045418e-03],
[-4.1103225e+00, -1.6538620e-02],
[-1.8327236e-03, -6.3028107e+00],
[-4.7376156e-03, -5.3545618e+00],
[-3.4697056e-03, -5.6654320e+00],
[-1.1444092e-05, -1.1379558e+01],
[-1.0051131e-02, -4.6050973e+00],
[-1.0130405e-03, -6.8951964e+00],
[-6.1047077e-03, -5.1017356e+00],
[-7.4422359e-03, -4.9043016e+00]], dtype=float32)
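Because the model ends in LogSoftmax, the values in this array are log probabilities rather than probabilities, which is why they are all negative. A quick plain-Python check on the first row (values copied from the array above) shows that exponentiating them gives two probabilities that sum to about 1.

```python
import math

# First row of the prediction array above: [negative, positive] log probabilities.
row = [-4.9417334, -0.0071678162]
probs = [math.exp(value) for value in row]
print(probs)       # roughly [0.007, 0.993]
print(sum(probs))  # approximately 1.0
```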
To turn these log probabilities into categories (negative or positive sentiment prediction), for each row:
- Compare the log probabilities in each column. The logarithm preserves order, so this gives the same answer as comparing the probabilities themselves.
- If column 1 has a value greater than column 0, classify that as a positive tweet.
- Otherwise, if column 1 is less than or equal to column 0, classify that example as a negative tweet.
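The comparison rule can be sketched in a few lines of plain Python. The helper name is_positive mirrors the variable used in this section, but this version is an illustration on lists, not the notebook's array code; the two rows are copied from the prediction array above.

```python
def is_positive(log_prob_row):
    """True when column 1 (positive) is strictly greater than column 0 (negative)."""
    return log_prob_row[1] > log_prob_row[0]


# Two rows taken from the prediction array above.
rows = [[-4.9417334, -0.0071678162], [-0.0018327236, -6.3028107]]
predictions = [int(is_positive(row)) for row in rows]
print(predictions)  # [1, 0]
```

Using a strict comparison means that a tie between the two columns is classified as negative, matching the rule in the list above.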
1 | # turn probabilites into category predictions |
Neg log prob -4.9417 Pos log prob -0.0072 is positive? True actual 1
Neg log prob -6.5846 Pos log prob -0.0014 is positive? True actual 1
Neg log prob -5.4463 Pos log prob -0.0043 is positive? True actual 1
Neg log prob -4.3487 Pos log prob -0.0130 is positive? True actual 1
Neg log prob -4.9132 Pos log prob -0.0074 is positive? True actual 1
Neg log prob -4.7098 Pos log prob -0.0090 is positive? True actual 1
Neg log prob -5.2802 Pos log prob -0.0051 is positive? True actual 1
Neg log prob -4.1103 Pos log prob -0.0165 is positive? True actual 1
Neg log prob -0.0018 Pos log prob -6.3028 is positive? False actual 0
Neg log prob -0.0047 Pos log prob -5.3546 is positive? False actual 0
Neg log prob -0.0035 Pos log prob -5.6654 is positive? False actual 0
Neg log prob -0.0000 Pos log prob -11.3796 is positive? False actual 0
Neg log prob -0.0101 Pos log prob -4.6051 is positive? False actual 0
Neg log prob -0.0010 Pos log prob -6.8952 is positive? False actual 0
Neg log prob -0.0061 Pos log prob -5.1017 is positive? False actual 0
Neg log prob -0.0074 Pos log prob -4.9043 is positive? False actual 0
Notice that since you are making a prediction using a training batch, it’s more likely that the model’s predictions match the actual targets (labels).
- Every tweet predicted as positive has an actual target of 1 (positive sentiment).
- Similarly, every tweet predicted as not positive has an actual target of 0 (negative sentiment).
One more useful thing to know is how to check whether the prediction matches the actual target (label).
- The result of calculating is_positive is a boolean.
- The target is of type trax.fastmath.numpy.int32.
- If you expect to be doing division, you may prefer to work with decimal numbers with the data type trax.fastmath.numpy.float32.
1 | # View the array of booleans |
Array of booleans
DeviceArray([ True, True, True, True, True, True, True, True,
False, False, False, False, False, False, False, False], dtype=bool)
Array of integers
DeviceArray([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
Array of floats
DeviceArray([1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.,
0.], dtype=float32)
1 | tmp_pred.shape |
(16, 2)
Note that Python usually does type conversion for you when you compare a boolean to an integer.
- True compared to 1 is True; True compared to any other integer is False.
- False compared to 0 is True; False compared to any other integer is False.
1 | print(f"True == 1: {True == 1}") |
True == 1: True
True == 2: False
False == 0: True
False == 2: False
However, we recommend that you keep track of the data type of your variables to avoid unexpected outcomes. So it helps to convert the booleans into integers:
- Compare 1 to 1 rather than comparing True to 1.
Hopefully you are now familiar with what kinds of inputs and outputs the model uses when making a prediction.
- This will help you implement a function that estimates the accuracy of the model’s predictions.
Part 5: Evaluation
5.1 Computing the accuracy on a batch
You will now write a function that evaluates your model on the validation set and returns the accuracy.
- preds contains the predictions.
  - Its dimensions are (batch_size, output_dim). output_dim is two in this case. Column 0 contains the probability that the tweet belongs to class 0 (negative sentiment). Column 1 contains the probability that it belongs to class 1 (positive sentiment). Because the model ends in LogSoftmax, these values are log probabilities, which compare in the same order as the probabilities themselves.
  - If the probability in column 1 is greater than the probability in column 0, then interpret this as the model’s prediction that the example has label 1 (positive sentiment).
  - Otherwise, if the probabilities are equal or the probability in column 0 is higher, the model’s prediction is 0 (negative sentiment).
- y contains the actual labels.
- y_weights contains the weights to give to predictions.
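The description above amounts to a weighted fraction of correct predictions. Here is one way it could look in plain Python, as a sketch only: the name weighted_accuracy, the list-based inputs and the toy values are assumptions, and the assignment itself works on Trax arrays.

```python
def weighted_accuracy(preds, y, y_weights):
    """Return (accuracy, weighted correct count, total weight) for one batch."""
    # Predict class 1 only when column 1 is strictly greater than column 0.
    predicted = [1 if row[1] > row[0] else 0 for row in preds]
    correct = [1.0 if p == label else 0.0 for p, label in zip(predicted, y)]
    weighted_correct = sum(c * w for c, w in zip(correct, y_weights))
    total_weight = sum(y_weights)
    return weighted_correct / total_weight, weighted_correct, total_weight


# Three toy examples; the third is predicted positive but labelled negative.
preds = [[-4.9, -0.007], [-0.002, -6.3], [-0.7, -0.69]]
y = [1, 0, 0]
y_weights = [1, 1, 1]
accuracy, n_correct, n_total = weighted_accuracy(preds, y, y_weights)
print(accuracy, n_correct, n_total)
```

Returning the weighted counts alongside the ratio makes it possible to combine several batches later without averaging averages.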
Exercise 07
Implement compute_accuracy.
1 | # UNQ_C7 (UNIQUE CELL IDENTIFIER, DO NOT EDIT) |
1 | # test your function |
Model's prediction accuracy on a single training batch is: 100.0%
Weighted number of correct predictions 64.0; weighted number of total observations predicted 64
Expected output (Approximately)
1 | Model's prediction accuracy on a single training batch is: 100.0% |
5.2 Testing your model on Validation Data
Now you will test your model’s prediction accuracy on validation data.
This program will take in a data generator and your model.
- The generator allows you to get batches of data. You can use it with a for loop:
1 | for batch in iterator: |
batch is a tuple (X, Y, weights).
- Element 0 corresponds to the tweet as a tensor (input).
- Element 1 corresponds to its target (actual label, positive or negative sentiment).
- Element 2 corresponds to the associated weights (example weights).
- You can feed the tweet into the model and it will return the predictions for the batch.
Exercise 08
Instructions:
- Compute the accuracy over all the batches in the validation iterator.
- Make use of compute_accuracy, which you recently implemented, and return the overall accuracy.
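A rough sketch of the aggregation over batches follows, in plain Python. Everything named here (accuracy_over_batches, toy_model, the toy batches) is hypothetical: toy_model stands in for the trained classifier, and the per-example check is inlined rather than calling the assignment's compute_accuracy.

```python
def accuracy_over_batches(batches, model):
    """Accumulate weighted correct counts and weights over all batches, then divide."""
    total_correct = 0.0
    total_weight = 0.0
    for inputs, targets, weights in batches:
        preds = model(inputs)
        for row, target, weight in zip(preds, targets, weights):
            predicted = 1 if row[1] > row[0] else 0
            total_correct += weight * (predicted == target)
            total_weight += weight
    return total_correct / total_weight


def toy_model(inputs):
    """Stand-in for the trained model: positive when the single feature is > 0."""
    return [[-2.0, -0.1] if x > 0 else [-0.1, -2.0] for x in inputs]


batches = [
    ([1, -1], [1, 0], [1, 1]),  # both examples are classified correctly
    ([2, 3], [1, 0], [1, 1]),   # the second example is misclassified
]
print(accuracy_over_batches(batches, toy_model))  # 0.75
```

Summing the counts across batches before dividing keeps the result correct even when the last batch is smaller than the others, which averaging per-batch accuracies would not.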
1 | # UNQ_C8 (UNIQUE CELL IDENTIFIER, DO NOT EDIT) |
1 | # DO NOT EDIT THIS CELL |
The accuracy of your model on the validation set is 0.9931
Expected Output (Approximately)
1 | The accuracy of your model on the validation set is 0.9931 |
Part 6: Testing with your own input
Finally, you will test with your own input. You will see that deep nets are more powerful than the older methods you have used before. Although you got close to 100% accuracy on the first two assignments, the task was much easier.
1 | # this is used to predict on your own sentence |
1 | # try a positive sentence |
The sentiment of the sentence
***
"It's such a nice day, think i'll be taking Sid to Ramsgate fish and chips for lunch at Peter's fish factory and then the beach maybe"
***
is positive.
The sentiment of the sentence
***
"I hated my day, it was the worst, I'm so sad."
***
is negative.
Notice that the model works well even for complex sentences.
On Deep Nets
Deep nets allow you to understand and capture dependencies that you would not have been able to capture with a simple linear regression or logistic regression.
- They also allow you to make better use of pre-trained embeddings for classification, and they tend to generalize better.