Generative Adversarial Network Project

Generating human faces with Adversarial Networks

© research.nvidia.com

This time we'll train a neural net to generate plausible human faces in all their subtlety: appearance, expression, accessories, etc. 'Cuz when we machines take over Earth, there won't be any faces left. We want to preserve this data for future iterations. Yikes...

Based on https://github.com/Lasagne/Recipes/pull/94 .

import sys
sys.path.append("..")
import grading
import download_utils
import tqdm_utils
download_utils.link_week_4_resources()
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
plt.rcParams.update({'axes.titlesize': 'small'})

from sklearn.datasets import load_digits
# The following line fetches two datasets: images (usable for training) and attributes.
# The attributes will be required for the final part of the assignment (applying smiles), so please keep them in mind.
from lfw_dataset import load_lfw_dataset
data,attrs = load_lfw_dataset(dimx=36,dimy=36)

#preprocess faces
data = np.float32(data)/255.

IMG_SHAPE = data.shape[1:]
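
A quick sanity check on what was loaded (a sketch; the expected shapes follow from the dimx/dimy arguments above, and the image count comes from the LFW dataset):

print(data.shape)              # e.g. (13233, 36, 36, 3) for 36x36 LFW crops
print(data.min(), data.max())  # should lie within [0, 1] after the /255. scaling
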
# plot a random image from the dataset
plt.imshow(data[np.random.randint(data.shape[0])], cmap="gray", interpolation="none")
[output: a random 36×36 face from the dataset]

Generative adversarial nets 101

© torch.github.io

Deep learning is simple, isn't it?

  • build some network that generates the face (small image)
  • make up a measure of how good that face is
  • optimize with gradient descent :)

The only problem is: how can we engineers tell well-generated faces from bad ones? And I bet you we won't ask a designer for help.

If we can't tell good faces from bad, we delegate it to yet another neural network!

That makes two of them:

  • __G__enerator takes random noise for inspiration and tries to generate a face sample.
      • Let's call him G(z), where z is Gaussian noise.
  • __D__iscriminator takes a face sample and tries to tell if it's real or fake.
      • It predicts the probability of the input image being a real face.
      • Let's call him D(x), x being an image.
      • D(x) is the prediction for a real image and D(G(z)) is the prediction for a face made by the generator.
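
For reference, this adversarial game is the classic minimax objective of Goodfellow et al. (2014), where D is trained to maximize it and G to minimize it:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim \mathcal{N}(0,I)}\left[\log\left(1 - D(G(z))\right)\right]$$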

Before we dive into training them, let's construct the two networks.

import tensorflow as tf
from keras_utils import reset_tf_session
s = reset_tf_session()

import keras
from keras.models import Sequential
from keras import layers as L
CODE_SIZE = 256

generator = Sequential()
generator.add(L.InputLayer([CODE_SIZE],name='noise'))
generator.add(L.Dense(10*8*8, activation='elu'))

# reshape the dense output into a small 8x8 "image" with 10 channels
generator.add(L.Reshape((8,8,10)))
# transposed convolutions grow the spatial size: 8x8 -> 12x12 -> 16x16
generator.add(L.Deconv2D(64,kernel_size=(5,5),activation='elu'))
generator.add(L.Deconv2D(64,kernel_size=(5,5),activation='elu'))
# nearest-neighbor upsampling: 16x16 -> 32x32
generator.add(L.UpSampling2D(size=(2,2)))
# three more transposed convolutions: 32x32 -> 34x34 -> 36x36 -> 38x38
generator.add(L.Deconv2D(32,kernel_size=3,activation='elu'))
generator.add(L.Deconv2D(32,kernel_size=3,activation='elu'))
generator.add(L.Deconv2D(32,kernel_size=3,activation='elu'))

# final convolution maps to 3 color channels: 38x38 -> 36x36
generator.add(L.Conv2D(3,kernel_size=3,activation=None))

generator.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
noise (InputLayer)           (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 640)               164480    
_________________________________________________________________
reshape_1 (Reshape)          (None, 8, 8, 10)          0         
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 12, 12, 64)        16064     
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 16, 16, 64)        102464    
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 32, 32, 64)        0         
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 34, 34, 32)        18464     
_________________________________________________________________
conv2d_transpose_4 (Conv2DTr (None, 36, 36, 32)        9248      
_________________________________________________________________
conv2d_transpose_5 (Conv2DTr (None, 38, 38, 32)        9248      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 36, 36, 3)         867       
=================================================================
Total params: 320,835
Trainable params: 320,835
Non-trainable params: 0
_________________________________________________________________
assert generator.output_shape[1:] == IMG_SHAPE, "generator must output an image of shape %s, but instead it produces %s"%(IMG_SHAPE,generator.output_shape[1:])
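
To see why this stack lands exactly on 36×36, here is the spatial-size arithmetic spelled out (a quick sketch using the standard formulas for stride-1 'valid' layers; not part of the assignment):

# Deconv2D (stride 1, 'valid'): out = in + kernel - 1
# UpSampling2D:                 out = in * 2
# Conv2D   (stride 1, 'valid'): out = in - kernel + 1
size = 8
size += 5 - 1    # Deconv2D 5x5:  8 -> 12
size += 5 - 1    # Deconv2D 5x5: 12 -> 16
size *= 2        # UpSampling2D: 16 -> 32
size += 3 - 1    # Deconv2D 3x3: 32 -> 34
size += 3 - 1    # Deconv2D 3x3: 34 -> 36
size += 3 - 1    # Deconv2D 3x3: 36 -> 38
size -= 3 - 1    # Conv2D 3x3:   38 -> 36
assert (size, size, 3) == IMG_SHAPE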

Discriminator

  • The discriminator is your usual convolutional network with interleaved convolution and pooling layers.
  • The network does not include dropout/batchnorm to avoid learning complications.
  • We also regularize the pre-output layer to prevent the discriminator from being too certain.
discriminator = Sequential()

discriminator.add(L.InputLayer(IMG_SHAPE))

# discriminator body: interleaved convolution and pooling layers
discriminator.add(L.Conv2D(32, kernel_size=3, activation='elu'))
discriminator.add(L.MaxPooling2D(pool_size=(2, 2)))
discriminator.add(L.Conv2D(64, kernel_size=3, activation='elu'))
discriminator.add(L.MaxPooling2D(pool_size=(2, 2)))
discriminator.add(L.Conv2D(128, kernel_size=3, activation='elu'))
discriminator.add(L.MaxPooling2D(pool_size=(2, 2)))

discriminator.add(L.Flatten())
discriminator.add(L.Dense(256,activation='tanh'))
# log_softmax makes the two outputs log-probabilities: column 0 = fake, column 1 = real
discriminator.add(L.Dense(2,activation=tf.nn.log_softmax))
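
Since the last layer applies log_softmax, exponentiating the discriminator's outputs should give probabilities that sum to one. A quick optional check (a sketch, assuming the model's variables are initialized, e.g. after the tf.global_variables_initializer() call below):

probs = np.exp(discriminator.predict(data[:4]))
assert np.allclose(probs.sum(axis=-1), 1.0)  # exp(log_softmax) sums to 1 per row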

Training

We train the two networks concurrently:

  • Train the discriminator to better distinguish real data from the current generator's samples.
  • Train the generator to make the discriminator believe its samples are real.
  • Since the discriminator is a differentiable neural network, we train both with gradient descent.

© deeplearning4j.org

Training is done iteratively until discriminator is no longer able to find the difference (or until you run out of patience).
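
Concretely, the code below minimizes the following two losses, where the generator uses the standard non-saturating variant (maximize log D(G(z)) rather than minimize log(1 − D(G(z)))):

$$\mathcal{L}_D = -\,\mathbb{E}_{x}\left[\log D(x)\right] - \mathbb{E}_{z}\left[\log\left(1 - D(G(z))\right)\right], \qquad \mathcal{L}_G = -\,\mathbb{E}_{z}\left[\log D(G(z))\right]$$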

Tricks:

  • Regularize discriminator output weights to prevent explosion
  • Train the generator with Adam to speed up training. The discriminator trains with SGD to avoid problems with momentum.
  • More: https://github.com/soumith/ganhacks
noise = tf.placeholder('float32',[None,CODE_SIZE])
real_data = tf.placeholder('float32',[None,]+list(IMG_SHAPE))

logp_real = discriminator(real_data)       # log-probabilities for real images

generated_data = generator(noise)          # G(z)

logp_gen = discriminator(generated_data)   # log-probabilities for generated images
########################
#discriminator training#
########################

# maximize log P(real | x) on real data and log P(fake | G(z)) on generated data
d_loss = -tf.reduce_mean(logp_real[:,1] + logp_gen[:,0])

# regularize the output-layer weights to keep the discriminator from getting too confident
d_loss += tf.reduce_mean(discriminator.layers[-1].kernel**2)

# optimize with plain SGD (see the tricks above)
disc_optimizer = tf.train.GradientDescentOptimizer(1e-3).minimize(d_loss,var_list=discriminator.trainable_weights)
########################
###generator training###
########################

# non-saturating generator loss: maximize log P(real | G(z))
g_loss = -tf.reduce_mean(logp_gen[:,1])

# Adam for the generator (see the tricks above)
gen_optimizer = tf.train.AdamOptimizer(1e-4).minimize(g_loss,var_list=generator.trainable_weights)
s.run(tf.global_variables_initializer())

Auxiliary functions

Here we define a few helper functions that draw current data distributions and sample training batches.

def sample_noise_batch(bsize):
    return np.random.normal(size=(bsize, CODE_SIZE)).astype('float32')

def sample_data_batch(bsize):
    idxs = np.random.choice(np.arange(data.shape[0]), size=bsize)
    return data[idxs]

def sample_images(nrow, ncol, sharp=False):
    images = generator.predict(sample_noise_batch(bsize=nrow*ncol))
    if np.var(images) != 0:
        images = images.clip(np.min(data), np.max(data))
    for i in range(nrow*ncol):
        plt.subplot(nrow, ncol, i+1)
        if sharp:
            plt.imshow(images[i].reshape(IMG_SHAPE), cmap="gray", interpolation="none")
        else:
            plt.imshow(images[i].reshape(IMG_SHAPE), cmap="gray")
    plt.show()

def sample_probas(bsize):
    plt.title('Generated vs real data')
    plt.hist(np.exp(discriminator.predict(sample_data_batch(bsize)))[:,1],
             label='D(x)', alpha=0.5, range=[0,1])
    plt.hist(np.exp(discriminator.predict(generator.predict(sample_noise_batch(bsize))))[:,1],
             label='D(G(z))', alpha=0.5, range=[0,1])
    plt.legend(loc='best')
    plt.show()

Training

Main loop. We just train the generator and discriminator in a loop and plot the results once every N iterations.

from IPython import display

for epoch in tqdm_utils.tqdm_notebook_failsafe(range(50000)):

    feed_dict = {
        real_data: sample_data_batch(100),
        noise: sample_noise_batch(100)
    }

    # 5 discriminator updates per generator update
    for i in range(5):
        s.run(disc_optimizer, feed_dict)

    s.run(gen_optimizer, feed_dict)

    if epoch % 100 == 0:
        display.clear_output(wait=True)
        sample_images(2, 3, True)
        sample_probas(1000)
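
If you also want to watch the raw loss values, one optional tweak (not part of the original assignment) is to fetch each loss inside the loop together with its update op:

# optional monitoring: fetch each loss alongside its update op
d_loss_val, _ = s.run([d_loss, disc_optimizer], feed_dict)
g_loss_val, _ = s.run([g_loss, gen_optimizer], feed_dict)
if epoch % 100 == 0:
    print("epoch %d: d_loss=%.3f  g_loss=%.3f" % (epoch, d_loss_val, g_loss_val))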

[output: a 2×3 grid of generated faces and the D(x) vs D(G(z)) histograms]
from submit_honor import submit_honor
submit_honor((generator, discriminator), <YOUR_EMAIL>, <YOUR_TOKEN>)
# The network was trained for about 15k iterations.
# Training for longer yields MUCH better results.
plt.figure(figsize=[16,24])
sample_images(16,8)
[output: a 16×8 grid of generated faces]