Generative Adversarial Network Project

Generating human faces with Adversarial Networks

© research.nvidia.com

This time we'll train a neural net to generate plausible human faces in all their subtlety: appearance, expression, accessories, etc. 'Cuz when we machines take over Earth, there won't be any faces left. We want to preserve this data for future iterations. Yikes...

Based on https://github.com/Lasagne/Recipes/pull/94 .

import sys
sys.path.append("..")
import grading
import download_utils
import tqdm_utils
download_utils.link_week_4_resources()
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
plt.rcParams.update({'axes.titlesize': 'small'})

from sklearn.datasets import load_digits
# The following line fetches two datasets: images (usable for training) and attributes.
# The attributes will be required for the final part of the assignment (applying smiles), so please keep them in mind.
from lfw_dataset import load_lfw_dataset
data,attrs = load_lfw_dataset(dimx=36,dimy=36)

#preprocess faces
data = np.float32(data)/255.

IMG_SHAPE = data.shape[1:]
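
A quick sanity check on what was loaded (a sketch; the expected shapes follow from the dimx/dimy arguments above, and the image count comes from the LFW dataset):

print(data.shape)              # e.g. (13233, 36, 36, 3) for 36x36 LFW crops
print(data.min(), data.max())  # should lie within [0, 1] after the /255. scaling
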
# plot a random image from the dataset
plt.imshow(data[np.random.randint(data.shape[0])], cmap="gray", interpolation="none")
[output: a random 36×36 face from the dataset]

Generative adversarial nets 101

© torch.github.io

Deep learning is simple, isn't it?

  • build some network that generates the face (small image)
  • make up a measure of how good that face is
  • optimize with gradient descent :)

The only problem is: how can we engineers tell well-generated faces from bad ones? And I bet you we won't ask a designer for help.

If we can't tell good faces from bad, we delegate it to yet another neural network!

That makes two of them:

  • __G__enerator takes random noise for inspiration and tries to generate a face sample.
      • Let's call him G(z), where z is Gaussian noise.
  • __D__iscriminator takes a face sample and tries to tell if it's real or fake.
      • It predicts the probability of the input image being a real face.
      • Let's call him D(x), x being an image.
      • D(x) is the prediction for a real image and D(G(z)) is the prediction for a face made by the generator.
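
For reference, this adversarial game is the classic minimax objective of Goodfellow et al. (2014), where D is trained to maximize it and G to minimize it:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim \mathcal{N}(0,I)}\left[\log\left(1 - D(G(z))\right)\right]$$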

Before we dive into training them, let's construct the two networks.

import tensorflow as tf
from keras_utils import reset_tf_session
s = reset_tf_session()

import keras
from keras.models import Sequential
from keras import layers as L
CODE_SIZE = 256

generator = Sequential()
generator.add(L.InputLayer([CODE_SIZE],name='noise'))
generator.add(L.Dense(10*8*8, activation='elu'))

# reshape the dense output into a small 8x8 "image" with 10 channels
generator.add(L.Reshape((8,8,10)))
# transposed convolutions grow the spatial size: 8x8 -> 12x12 -> 16x16
generator.add(L.Deconv2D(64,kernel_size=(5,5),activation='elu'))
generator.add(L.Deconv2D(64,kernel_size=(5,5),activation='elu'))
# nearest-neighbor upsampling: 16x16 -> 32x32
generator.add(L.UpSampling2D(size=(2,2)))
# three more transposed convolutions: 32x32 -> 34x34 -> 36x36 -> 38x38
generator.add(L.Deconv2D(32,kernel_size=3,activation='elu'))
generator.add(L.Deconv2D(32,kernel_size=3,activation='elu'))
generator.add(L.Deconv2D(32,kernel_size=3,activation='elu'))

# final convolution maps to 3 color channels: 38x38 -> 36x36
generator.add(L.Conv2D(3,kernel_size=3,activation=None))

generator.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
noise (InputLayer)           (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 640)               164480    
_________________________________________________________________
reshape_1 (Reshape)          (None, 8, 8, 10)          0         
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 12, 12, 64)        16064     
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 16, 16, 64)        102464    
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 32, 32, 64)        0         
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 34, 34, 32)        18464     
_________________________________________________________________
conv2d_transpose_4 (Conv2DTr (None, 36, 36, 32)        9248      
_________________________________________________________________
conv2d_transpose_5 (Conv2DTr (None, 38, 38, 32)        9248      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 36, 36, 3)         867       
=================================================================
Total params: 320,835
Trainable params: 320,835
Non-trainable params: 0
_________________________________________________________________
assert generator.output_shape[1:] == IMG_SHAPE, "generator must output an image of shape %s, but instead it produces %s"%(IMG_SHAPE,generator.output_shape[1:])
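
To see why this stack lands exactly on 36×36, here is the spatial-size arithmetic spelled out (a quick sketch using the standard formulas for stride-1 'valid' layers; not part of the assignment):

# Deconv2D (stride 1, 'valid'): out = in + kernel - 1
# UpSampling2D:                 out = in * 2
# Conv2D   (stride 1, 'valid'): out = in - kernel + 1
size = 8
size += 5 - 1    # Deconv2D 5x5:  8 -> 12
size += 5 - 1    # Deconv2D 5x5: 12 -> 16
size *= 2        # UpSampling2D: 16 -> 32
size += 3 - 1    # Deconv2D 3x3: 32 -> 34
size += 3 - 1    # Deconv2D 3x3: 34 -> 36
size += 3 - 1    # Deconv2D 3x3: 36 -> 38
size -= 3 - 1    # Conv2D 3x3:   38 -> 36
assert (size, size, 3) == IMG_SHAPE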

Discriminator

  • The discriminator is your usual convolutional network with interleaved convolution and pooling layers.
  • The network does not include dropout/batchnorm to avoid learning complications.
  • We also regularize the pre-output layer to prevent the discriminator from being too certain.
discriminator = Sequential()

discriminator.add(L.InputLayer(IMG_SHAPE))

# discriminator body: interleaved convolution and pooling layers
discriminator.add(L.Conv2D(32, kernel_size=3, activation='elu'))
discriminator.add(L.MaxPooling2D(pool_size=(2, 2)))
discriminator.add(L.Conv2D(64, kernel_size=3, activation='elu'))
discriminator.add(L.MaxPooling2D(pool_size=(2, 2)))
discriminator.add(L.Conv2D(128, kernel_size=3, activation='elu'))
discriminator.add(L.MaxPooling2D(pool_size=(2, 2)))

discriminator.add(L.Flatten())
discriminator.add(L.Dense(256,activation='tanh'))
# log_softmax makes the two outputs log-probabilities: column 0 = fake, column 1 = real
discriminator.add(L.Dense(2,activation=tf.nn.log_softmax))
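
Since the last layer applies log_softmax, exponentiating the discriminator's outputs should give probabilities that sum to one. A quick optional check (a sketch, assuming the model's variables are initialized, e.g. after the tf.global_variables_initializer() call below):

probs = np.exp(discriminator.predict(data[:4]))
assert np.allclose(probs.sum(axis=-1), 1.0)  # exp(log_softmax) sums to 1 per row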

Training

We train the two networks concurrently:

  • Train the discriminator to better distinguish real data from the current generator's samples.
  • Train the generator to make the discriminator believe its samples are real.
  • Since the discriminator is a differentiable neural network, we train both with gradient descent.

© deeplearning4j.org

Training is done iteratively until discriminator is no longer able to find the difference (or until you run out of patience).
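
Concretely, the code below minimizes the following two losses, where the generator uses the standard non-saturating variant (maximize log D(G(z)) rather than minimize log(1 − D(G(z)))):

$$\mathcal{L}_D = -\,\mathbb{E}_{x}\left[\log D(x)\right] - \mathbb{E}_{z}\left[\log\left(1 - D(G(z))\right)\right], \qquad \mathcal{L}_G = -\,\mathbb{E}_{z}\left[\log D(G(z))\right]$$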

Tricks:

  • Regularize discriminator output weights to prevent explosion
  • Train the generator with Adam to speed up training. The discriminator trains with SGD to avoid problems with momentum.
  • More: https://github.com/soumith/ganhacks
noise = tf.placeholder('float32',[None,CODE_SIZE])
real_data = tf.placeholder('float32',[None,]+list(IMG_SHAPE))

logp_real = discriminator(real_data)       # log-probabilities for real images

generated_data = generator(noise)          # G(z)

logp_gen = discriminator(generated_data)   # log-probabilities for generated images
########################
#discriminator training#
########################

# maximize log P(real | x) on real data and log P(fake | G(z)) on generated data
d_loss = -tf.reduce_mean(logp_real[:,1] + logp_gen[:,0])

# regularize the output-layer weights to keep the discriminator from getting too confident
d_loss += tf.reduce_mean(discriminator.layers[-1].kernel**2)

# optimize with plain SGD (see the tricks above)
disc_optimizer = tf.train.GradientDescentOptimizer(1e-3).minimize(d_loss,var_list=discriminator.trainable_weights)
########################
###generator training###
########################

# non-saturating generator loss: maximize log P(real | G(z))
g_loss = -tf.reduce_mean(logp_gen[:,1])

# Adam for the generator (see the tricks above)
gen_optimizer = tf.train.AdamOptimizer(1e-4).minimize(g_loss,var_list=generator.trainable_weights)
s.run(tf.global_variables_initializer())

Auxiliary functions

Here we define a few helper functions that draw current data distributions and sample training batches.

def sample_noise_batch(bsize):
    return np.random.normal(size=(bsize, CODE_SIZE)).astype('float32')

def sample_data_batch(bsize):
    idxs = np.random.choice(np.arange(data.shape[0]), size=bsize)
    return data[idxs]

def sample_images(nrow, ncol, sharp=False):
    images = generator.predict(sample_noise_batch(bsize=nrow*ncol))
    if np.var(images) != 0:
        images = images.clip(np.min(data), np.max(data))
    for i in range(nrow*ncol):
        plt.subplot(nrow, ncol, i+1)
        if sharp:
            plt.imshow(images[i].reshape(IMG_SHAPE), cmap="gray", interpolation="none")
        else:
            plt.imshow(images[i].reshape(IMG_SHAPE), cmap="gray")
    plt.show()

def sample_probas(bsize):
    plt.title('Generated vs real data')
    plt.hist(np.exp(discriminator.predict(sample_data_batch(bsize)))[:,1],
             label='D(x)', alpha=0.5, range=[0,1])
    plt.hist(np.exp(discriminator.predict(generator.predict(sample_noise_batch(bsize))))[:,1],
             label='D(G(z))', alpha=0.5, range=[0,1])
    plt.legend(loc='best')
    plt.show()

Training

Main loop. We just train the generator and discriminator in a loop and plot the results once every N iterations.

from IPython import display

for epoch in tqdm_utils.tqdm_notebook_failsafe(range(50000)):

    feed_dict = {
        real_data: sample_data_batch(100),
        noise: sample_noise_batch(100)
    }

    # 5 discriminator updates per generator update
    for i in range(5):
        s.run(disc_optimizer, feed_dict)

    s.run(gen_optimizer, feed_dict)

    if epoch % 100 == 0:
        display.clear_output(wait=True)
        sample_images(2, 3, True)
        sample_probas(1000)
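
If you also want to watch the raw loss values, one optional tweak (not part of the original assignment) is to fetch each loss inside the loop together with its update op:

# optional monitoring: fetch each loss alongside its update op
d_loss_val, _ = s.run([d_loss, disc_optimizer], feed_dict)
g_loss_val, _ = s.run([g_loss, gen_optimizer], feed_dict)
if epoch % 100 == 0:
    print("epoch %d: d_loss=%.3f  g_loss=%.3f" % (epoch, d_loss_val, g_loss_val))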

[output: a 2×3 grid of generated faces and the D(x) vs D(G(z)) histograms]
from submit_honor import submit_honor
submit_honor((generator, discriminator), <YOUR_EMAIL>, <YOUR_TOKEN>)
# The network was trained for about 15k iterations.
# Training for longer yields MUCH better results.
plt.figure(figsize=[16,24])
sample_images(16,8)
[output: a 16×8 grid of generated faces]