In this notebook, you’re going to implement a U-Net for a biomedical imaging segmentation task. Specifically, you’re going to be labeling neurons, so one might call this a neural neural network! ;)
Note that this is not a GAN, generative model, or unsupervised learning task. This is a supervised learning task, so there's only one correct answer (like a classifier!). You will see how this component underlies the Generator component of Pix2Pix in the next notebook this week.
- Implement your own U-Net.
- Observe your U-Net’s performance on a challenging segmentation task.
You will start by importing libraries, defining a visualization function, and getting the neural dataset that you will be using.
For this notebook, you will be using a dataset of electron microscopy images and segmentation data. The information about the dataset you'll be using can be found here:

Arganda-Carreras et al. "Crowdsourcing the creation of image segmentation algorithms for connectomics". Front. Neuroanat. 2015. https://www.frontiersin.org/articles/10.3389/fnana.2015.00142/full
Now you can build your U-Net from its components. The figure below is from the paper, U-Net: Convolutional Networks for Biomedical Image Segmentation, by Ronneberger et al. 2015. It shows the U-Net architecture and how it contracts and then expands.
In other words, images are first fed through many convolutional layers which reduce height and width while increasing the number of channels, which the authors refer to as the "contracting path." For example, a 2 x 2 convolution with a stride of 2 and 2 output channels will take a 1 x 28 x 28 (channels, height, width) grayscale image and produce a 2 x 14 x 14 representation. The "expanding path" does the opposite, gradually growing the image back up with fewer and fewer channels.
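The shape change above can be verified directly in PyTorch (the tensor sizes here are just the illustrative ones from the example):

```python
import torch
from torch import nn

# A 2 x 2 convolution with stride 2 and 2 output channels halves the
# spatial dimensions while doubling the channel count.
conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=2, stride=2)
x = torch.randn(1, 1, 28, 28)  # (batch, channels, height, width)
print(conv(x).shape)  # torch.Size([1, 2, 14, 14])
```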
You will first implement the contracting blocks for the contracting path. This path is the encoder section of the U-Net, which includes several downsampling steps. The authors give more detail of the remaining parts in the following paragraph from the paper (Ronneberger, 2015):
The contracting path follows the typical architecture of a convolutional network. It consists of the repeated application of two 3 x 3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2 x 2 max pooling operation with stride 2 for downsampling. At each downsampling step we double the number of feature channels.
Optional hints for the contracting block:
1. Both convolutions should use 3 x 3 kernels.
2. The max pool should use a 2 x 2 kernel with a stride 2.
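One way to sketch such a block, following the paragraph quoted above (two unpadded 3 x 3 convolutions with ReLU, then a 2 x 2 max pool; the class and attribute names here are illustrative, not the graded solution):

```python
import torch
from torch import nn

class ContractingBlock(nn.Module):
    """One step of the contracting path: two unpadded 3 x 3 convolutions,
    each followed by a ReLU, then a 2 x 2 max pool with stride 2.
    The number of feature channels is doubled. (Sketch only.)"""
    def __init__(self, input_channels):
        super().__init__()
        # Unpadded 3 x 3 convolutions; the first doubles the channel count
        self.conv1 = nn.Conv2d(input_channels, input_channels * 2, kernel_size=3)
        self.conv2 = nn.Conv2d(input_channels * 2, input_channels * 2, kernel_size=3)
        self.activation = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        x = self.activation(self.conv1(x))
        x = self.activation(self.conv2(x))
        return self.maxpool(x)

# A 1 x 28 x 28 image loses 2 pixels per unpadded convolution
# (28 -> 26 -> 24), then the pool halves it (24 -> 12).
out = ContractingBlock(1)(torch.randn(1, 1, 28, 28))
print(out.shape)  # torch.Size([1, 2, 12, 12])
```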
# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
Next, you will implement the expanding blocks for the expanding path. This is the decoder section of the U-Net, which includes several upsampling steps. In order to do this, you'll also need to write a crop function. This is so you can crop the image from the contracting path and concatenate it to the current image on the expanding path—this forms a skip connection. Again, the details are from the paper (Ronneberger, 2015):
Every step in the expanding path consists of an upsampling of the feature map followed by a 2 x 2 convolution (“up-convolution”) that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3 x 3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution.
Fun fact: later models based on this architecture often use padding in the convolutions to prevent the size of the image from changing outside of the upsampling / downsampling steps!
Optional hint for the expanding block:
1. The concatenation means the number of channels goes back to being input_channels, so you need to halve it again for the next convolution.
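A sketch of a center-crop helper and an expanding block along these lines (the names, the use of bilinear `nn.Upsample` followed by a 2 x 2 convolution as the "up-convolution", and the argument conventions are all illustrative assumptions, not the graded solution):

```python
import torch
from torch import nn

def crop(image, new_shape):
    """Center-crop `image` (N, C, H, W) to the spatial size in `new_shape`.
    (Sketch; the argument convention is an assumption.)"""
    new_h, new_w = new_shape[2], new_shape[3]
    top = (image.shape[2] - new_h) // 2
    left = (image.shape[3] - new_w) // 2
    return image[:, :, top:top + new_h, left:left + new_w]

class ExpandingBlock(nn.Module):
    """One step of the expanding path: upsample, a 2 x 2 'up-convolution'
    that halves the channels, concatenation with the cropped skip
    connection, then two 3 x 3 convolutions with ReLU. (Sketch only.)"""
    def __init__(self, input_channels):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        self.conv1 = nn.Conv2d(input_channels, input_channels // 2, kernel_size=2)
        # After concatenation the channel count is back to input_channels,
        # so the next convolution halves it again (the hint above).
        self.conv2 = nn.Conv2d(input_channels, input_channels // 2, kernel_size=3)
        self.conv3 = nn.Conv2d(input_channels // 2, input_channels // 2, kernel_size=3)
        self.activation = nn.ReLU()

    def forward(self, x, skip_con_x):
        x = self.conv1(self.upsample(x))
        skip_con_x = crop(skip_con_x, x.shape)
        x = torch.cat([x, skip_con_x], dim=1)
        x = self.activation(self.conv2(x))
        x = self.activation(self.conv3(x))
        return x

# Shape check: (1, 4, 12, 12) upsampled to 24, conv1 -> 23 with 2 channels,
# skip cropped to match, then two unpadded 3 x 3 convs: 23 -> 21 -> 19.
out = ExpandingBlock(4)(torch.randn(1, 4, 12, 12), torch.randn(1, 2, 30, 30))
print(out.shape)  # torch.Size([1, 2, 19, 19])
```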
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
Now you will write the final feature mapping block, which takes in a tensor with arbitrarily many channels and produces a tensor with the same number of pixels but with the correct number of output channels. From the paper (Ronneberger, 2015):
At the final layer a 1x1 convolution is used to map each 64-component feature vector to the desired number of classes. In total the network has 23 convolutional layers.
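Since a 1 x 1 convolution acts on each pixel's feature vector independently, this block can be sketched as a single layer (names are illustrative):

```python
import torch
from torch import nn

class FeatureMapBlock(nn.Module):
    """Final 1 x 1 convolution: maps each pixel's feature vector to the
    desired number of output channels without changing height or width.
    (Sketch only.)"""
    def __init__(self, input_channels, output_channels):
        super().__init__()
        self.conv = nn.Conv2d(input_channels, output_channels, kernel_size=1)

    def forward(self, x):
        return self.conv(x)

# Spatial size is preserved; only the channel count changes.
out = FeatureMapBlock(64, 2)(torch.randn(1, 64, 10, 10))
print(out.shape)  # torch.Size([1, 2, 10, 10])
```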
# UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# UNIT TEST
Now you can put it all together! Here, you'll write a UNet class which will combine a series of the three kinds of blocks you've implemented.
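The overall assembly might look like the following sketch. For brevity it uses a depth of 2 rather than the paper's 4, and it inlines compact stand-ins for the three blocks; all names and the reduced depth are illustrative assumptions, not the graded solution:

```python
import torch
from torch import nn

# --- Compact stand-ins for the blocks from the earlier exercises ---
class ContractingBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c, c * 2, 3), nn.ReLU(),
            nn.Conv2d(c * 2, c * 2, 3), nn.ReLU(),
            nn.MaxPool2d(2, stride=2))
    def forward(self, x):
        return self.net(x)

def crop(image, new_shape):
    new_h, new_w = new_shape[2], new_shape[3]
    top = (image.shape[2] - new_h) // 2
    left = (image.shape[3] - new_w) // 2
    return image[:, :, top:top + new_h, left:left + new_w]

class ExpandingBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        self.conv1 = nn.Conv2d(c, c // 2, 2)
        self.conv2 = nn.Conv2d(c, c // 2, 3)
        self.conv3 = nn.Conv2d(c // 2, c // 2, 3)
        self.act = nn.ReLU()
    def forward(self, x, skip):
        x = self.conv1(self.upsample(x))
        x = torch.cat([x, crop(skip, x.shape)], dim=1)
        return self.act(self.conv3(self.act(self.conv2(x))))

# --- The U-Net itself: map up to hidden channels, contract, expand with
# skip connections, then map down to the output channels ---
class UNet(nn.Module):
    def __init__(self, input_channels, output_channels, hidden_channels=64):
        super().__init__()
        self.upfeature = nn.Conv2d(input_channels, hidden_channels, 1)
        self.contract1 = ContractingBlock(hidden_channels)
        self.contract2 = ContractingBlock(hidden_channels * 2)
        self.expand1 = ExpandingBlock(hidden_channels * 4)
        self.expand2 = ExpandingBlock(hidden_channels * 2)
        self.downfeature = nn.Conv2d(hidden_channels, output_channels, 1)

    def forward(self, x):
        x0 = self.upfeature(x)
        x1 = self.contract1(x0)
        x2 = self.contract2(x1)
        x3 = self.expand1(x2, x1)   # skip connection from contract1's input
        x4 = self.expand2(x3, x0)   # skip connection from upfeature's output
        return self.downfeature(x4)

# The unpadded convolutions shrink the output relative to the input.
net = UNet(input_channels=1, output_channels=2, hidden_channels=8)
out = net(torch.randn(1, 1, 100, 100))
print(out.shape)  # torch.Size([1, 2, 73, 73])
```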
# UNQ_C5 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
Finally, you will put this into action!
Remember that these are your parameters:
- criterion: the loss function
- n_epochs: the number of times you iterate through the entire dataset when training
- input_dim: the number of channels of the input image
- label_dim: the number of channels of the output image
- display_step: how often to display/visualize the images
- batch_size: the number of images per forward/backward pass
- lr: the learning rate
- initial_shape: the size of the input image (in pixels)
- target_shape: the size of the output image (in pixels)
- device: the device type
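One plausible way these parameters could be set (the specific values here are illustrative guesses, not necessarily the ones the notebook's training cell uses):

```python
import torch
from torch import nn

criterion = nn.BCEWithLogitsLoss()  # loss function (assumed choice)
n_epochs = 200        # passes through the entire dataset
input_dim = 1         # channels of the input image (grayscale EM slices)
label_dim = 1         # channels of the output segmentation map
display_step = 20     # visualize every 20 training steps
batch_size = 4        # images per forward/backward pass
lr = 0.0002           # learning rate
initial_shape = 512   # input image size in pixels
target_shape = 373    # output image size in pixels (smaller: unpadded convs)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
```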
This should take only a few minutes to train!
import torch
from torch import nn
import torch.nn.functional as F
from skimage import io