Evaluating GANs

Goals

In this notebook, you’re going to gain a better understanding of some of the challenges that come with evaluating GANs and a response you can take to alleviate some of them called Fréchet Inception Distance (FID).

Learning Objectives

1. Understand the challenges associated with evaluating GANs.
2. Write code to evaluate the Fréchet Inception Distance.

Challenges With Evaluating GANs

Loss is Uninformative of Performance

One aspect that makes evaluating GANs challenging is that the loss tells us little about their performance. Unlike with classifiers, where a low loss on a test set indicates superior performance, a low loss for the generator or discriminator suggests that learning has stopped.

No Clear Non-human Metric

If you define the goal of a GAN as “generating images which look real to people” then it’s technically possible to measure this directly: you can ask people to act as a discriminator. However, this takes significant time and money so ideally you can use a proxy for this. There is also no “perfect” discriminator that can differentiate reals from fakes - if there were, a lot of machine learning tasks would be solved ;)

In this notebook, you will implement Fréchet Inception Distance, one method which aims to solve these issues.

Getting Started

For this notebook, you will again be using CelebA. You will start by loading a pre-trained generator which has been trained on CelebA.

Here, you will import some useful libraries and packages. You will also be provided with the generator and noise code from earlier assignments.

Loading the Pre-trained Model

Now, you can set the arguments for the model and load the dataset:

• z_dim: the dimension of the noise vector
• image_size: the image size of the input to Inception (more details in the following section)
• device: the device type

Then, you can load and initialize the model with weights from a pre-trained model. This allows you to use the pre-trained model as if you trained it yourself.

Inception-v3 Network

Inception-V3 is a neural network trained on ImageNet to classify objects. You may recall from the lectures that ImageNet has over 1 million images to train on. As a result, Inception-V3 does a good job detecting features and classifying images. Here, you will load Inception-V3 as inception_model.

Fréchet Inception Distance

Fréchet Inception Distance (FID) was proposed as an improvement over Inception Score and still uses the Inception-v3 network as part of its calculation. However, instead of using the classification labels of the Inception-v3 network, it uses the output from an earlier layer—the layer right before the labels. This is often called the feature layer. Research has shown that deep convolutional neural networks trained on difficult tasks, like classifying many classes, build increasingly sophisticated representations of features going deeper into the network. For example, the first few layers may learn to detect different kinds of edges and curves, while the later layers may have neurons that fire in response to human faces.

To get the feature layer of a convolutional neural network, you can replace the final fully connected layer with an identity layer that simply returns whatever input it received, unchanged. This essentially removes the final classification layer and leaves you with the intermediate outputs from the layer before.

Optional hint for inception_model.fc 1. You may find [torch.nn.Identity()](https://pytorch.org/docs/master/generated/torch.nn.Identity.html) helpful.
Success!


Fréchet Distance

Fréchet distance uses the values from the feature layer for two sets of images, say reals and fakes, and compares different statistical properties between them to see how different they are. Specifically, Fréchet distance finds the shortest distance needed to walk along two lines, or two curves, simultaneously. The most intuitive explanation of Fréchet distance is as the “minimum leash distance” between two points. Imagine yourself and your dog, both moving along two curves. If you walked on one curve and your dog, attached to a leash, walked on the other at the same pace, what is the least amount of leash that you can give your dog so that you never need to give them more slack during your walk? Using this, the Fréchet distance measures the similarity between these two curves.

The basic idea is similar for calculating the Fréchet distance between two probability distributions. You’ll start by seeing what this looks like in one-dimensional, also called univariate, space.

Univariate Fréchet Distance

You can calculate the distance between two normal distributions $X$ and $Y$ with means $\mu_X$ and $\mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$, as:

Pretty simple, right? Now you can see how it can be converted to be used in multi-dimensional, which is also called multivariate, space.

Multivariate Fréchet Distance

Covariance

To find the Fréchet distance between two multivariate normal distributions, you first need to find the covariance instead of the standard deviation. The covariance, which is the multivariate version of variance (the square of standard deviation), is represented using a square matrix where the side length is equal to the number of dimensions. Since the feature vectors you will be using have 2048 values/weights, the covariance matrix will be 2048 x 2048. But for the sake of an example, this is a covariance matrix in a two-dimensional space:

$\Sigma = \left(\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right)$

The value at location $(i, j)$ corresponds to the covariance of vector $i$ with vector $j$. Since the covariance of $i$ with $j$ and $j$ with $i$ are equivalent, the matrix will always be symmetric with respect to the diagonal. The diagonal is the covariance of that element with itself. In this example, there are zeros everywhere except the diagonal. That means that the two dimensions are independent of one another, they are completely unrelated.

The following code cell will visualize this matrix.

Now, here’s an example of a multivariate normal distribution that has covariance:

$\Sigma = \left(\begin{array}{cc} 2 & -1\\ -1 & 2 \end{array}\right)$

And see how it looks:

Formula

Based on the paper, “The Fréchet distance between multivariate normal distributions“ by Dowson and Landau (1982), the Fréchet distance between two multivariate normal distributions $X$ and $Y$ is:

$d(X, Y) = \Vert\mu_X-\mu_Y\Vert^2 + \mathrm{Tr}\left(\Sigma_X+\Sigma_Y - 2 \sqrt{\Sigma_X \Sigma_Y}\right)$

Similar to the formula for univariate Fréchet distance, you can calculate the distance between the means and the distance between the standard deviations. However, calculating the distance between the standard deviations changes slightly here, as it includes the matrix product and matrix square root. $\mathrm{Tr}$ refers to the trace, the sum of the diagonal elements of a matrix.

Now you can implement this!

Optional hints for frechet_distance 1. You want to implement the above equation in code. 2. You might find the functions torch.norm and torch.trace helpful here. 3. A matrix_sqrt function is defined for you above -- you need to use it instead of torch.sqrt() which only gets the elementwise square root instead of the matrix square root. 4. You can also use the @ symbol for matrix multiplication.
Success!


Putting it all together!

Now, you can apply FID to your generator from earlier.

You will start by defining a bit of helper code to preprocess the image for the Inception-v3 network:

Then, you’ll define a function to calculate the covariance of the features that returns a covariance matrix given a list of values:

Finally, you can use the pre-trained Inception-v3 model to compute features of the real and fake images. With these features, you can then get the covariance and means of these features across many samples.

First, you get the features of the real and fake images using the Inception-v3 model:

HBox(children=(FloatProgress(value=0.0, max=128.0), HTML(value='')))


Then, you can combine all of the values that you collected for the reals and fakes into large tensors:

And calculate the covariance and means of these real and fake features:

Success!


At this point, you can also visualize what the pairwise multivariate distributions of the inception features look like!

<seaborn.axisgrid.PairGrid at 0x7fa847b2ab38>


Lastly, you can use your earlier frechet_distance function to calculate the FID and evaluate your GAN. You can see how similar/different the features of the generated images are to the features of the real images. The next cell might take five minutes or so to run in Coursera.

86.48429107666016


You’ll notice this model gets a pretty high FID, likely over 30. Since lower is better, and the best models on CelebA get scores in the single-digits, there’s clearly a ways to go with this model. You can use FID to compare different models, as well as different stages of training of the same model.

Donate article here