Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
- Category: Article
- Created: January 24, 2022 6:36 PM
- Status: Open
- URL: https://arxiv.org/pdf/1506.05751.pdf
- Updated: January 25, 2022 10:21 AM
Background
The goal is to build a model capable of producing high-quality samples of natural images.
Highlights
- Our approach uses a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion.
- At each level of the pyramid, a separate generative convnet model is trained using the Generative Adversarial Nets (GAN) approach.
Methods
Generative Adversarial Networks
\[ \min _{G} \max _{D} V(D, G)=\mathbb{E}_{\boldsymbol{x} \sim p_{\text {data }}(\boldsymbol{x})}[\log D(\boldsymbol{x})]+\mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}(\boldsymbol{z})}[\log (1-D(G(\boldsymbol{z})))] \]
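As a minimal sketch of how this objective is estimated in practice (not the paper's code; `D`, `G`, and the batches are hypothetical callables and inputs), the two expectations can be approximated with sample means over a batch:

```python
import numpy as np

def gan_value(D, G, x_real, z):
    """Monte Carlo estimate of the GAN value function V(D, G).

    D maps samples to probabilities in (0, 1); G maps noise to samples.
    x_real is a batch drawn from p_data, z a batch drawn from p_z.
    """
    d_real = D(x_real)                 # D(x), x ~ p_data
    d_fake = D(G(z))                   # D(G(z)), z ~ p_z
    # D ascends this value; G descends it (the minimax game above).
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
```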
Conditional generative adversarial net (CGAN)
\[ \min _{G} \max _{D} \mathbb{E}_{h, l \sim p_{\text {Data }}(\mathbf{h}, \mathbf{l})}[\log D(h, l)]+\mathbb{E}_{z \sim p_{\text {Noise }}(\mathbf{z}), l \sim p_{l}(\mathbf{l})}[\log (1-D(G(z, l), l))] \]
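The conditional variant only changes the signatures: both networks also receive the conditioning variable \(l\) (in LAPGAN, the low-pass image). A sketch under the same assumptions as above:

```python
import numpy as np

def cgan_value(D, G, h_real, l, z):
    """Conditional GAN value: both networks also see the conditioning l."""
    d_real = D(h_real, l)              # real pair (h, l) ~ p_Data
    d_fake = D(G(z, l), l)             # generated h conditioned on the same l
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
```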
Laplacian Pyramid
- The Laplacian pyramid is built from a Gaussian pyramid using upsampling \(u(.)\) and downsampling \(d(.)\) functions.
- Let \(\mathcal{G}(I) = [I_0, I_1, \ldots, I_K]\) be the Gaussian pyramid, where \(I_0 = I\) and \(I_k\) is \(k\) repeated applications of \(d(.)\) to \(I\). The coefficient \(h_k\) at level \(k\) of the Laplacian pyramid is then given by the difference between adjacent levels of the Gaussian pyramid, upsampling the smaller one with \(u(.)\):
\[ h_{k}=\mathcal{L}_{k}(I)=\mathcal{G}_{k}(I)-u\left(\mathcal{G}_{k+1}(I)\right)=I_{k}-u\left(I_{k+1}\right) \]
- The final level of the pyramid is not a difference image but the low-frequency residual, \(h_K = I_K\). Reconstruction from the Laplacian pyramid coefficients \([h_0, h_1, \ldots, h_K]\) is then performed through the backwards recurrence, starting with \(I_K = h_K\):
\[ I_{k} = u\left(I_{k+1}\right) + h_{k} \]
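A minimal NumPy sketch of this build/reconstruct round trip, assuming 2x2 average pooling for \(d(.)\) and pixel replication for \(u(.)\) (the paper's actual filters may differ; the helper names are illustrative):

```python
import numpy as np

def d(img):
    """Downsample: blur via 2x2 average pooling, halving each dimension."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def u(img):
    """Upsample: double each dimension by pixel replication."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(I, K):
    """Return [h_0, ..., h_{K-1}, h_K]; the last entry is the low-pass residual I_K."""
    pyramid = []
    for _ in range(K):
        I_next = d(I)
        pyramid.append(I - u(I_next))   # h_k = I_k - u(I_{k+1})
        I = I_next
    pyramid.append(I)                   # h_K = I_K (low-frequency residual)
    return pyramid

def reconstruct(pyramid):
    """Invert the pyramid: I_k = u(I_{k+1}) + h_k, starting from I_K = h_K."""
    I = pyramid[-1]
    for h in reversed(pyramid[:-1]):
        I = u(I) + h
    return I

# The round trip is exact up to floating point:
I = np.random.rand(64, 64)
assert np.allclose(reconstruct(laplacian_pyramid(I, 3)), I)
```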
Laplacian Generative Adversarial Networks
The sampling procedure for LAPGAN
Following training (explained below), we have a set of generative convnet models \(\{G_0, \ldots, G_K\}\), each of which captures the distribution of coefficients \(h_k\) for natural images at a different level of the Laplacian pyramid.
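Sampling starts from the coarsest level, where \(G_K\) generates \(\tilde{I}_K = \tilde{h}_K\) from noise alone, and proceeds coarse-to-fine via the recurrence (setting \(\tilde{I}_{K+1} = 0\)):
\[ \tilde{I}_{k}=u\left(\tilde{I}_{k+1}\right)+\tilde{h}_{k}=u\left(\tilde{I}_{k+1}\right)+G_{k}\left(z_{k}, u\left(\tilde{I}_{k+1}\right)\right) \]
A sketch of this loop, with the generators as hypothetical callables and \(u\) as in the pyramid code above:

```python
def lapgan_sample(generators, noises, u):
    """Coarse-to-fine LAPGAN sampling.

    generators: [G_0, ..., G_K] (finest first); G_K maps noise to a full
    low-resolution image, the others map (z_k, low-pass) to a band h~_k.
    noises: matching noise inputs [z_0, ..., z_K].
    """
    I = generators[-1](noises[-1])                # I~_K = h~_K = G_K(z_K)
    for G_k, z_k in zip(reversed(generators[:-1]), reversed(noises[:-1])):
        low = u(I)                                # l_k = u(I~_{k+1})
        I = low + G_k(z_k, low)                   # I~_k = u(I~_{k+1}) + h~_k
    return I
```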
The training procedure for LAPGAN
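At each level \(k\), a real image is decomposed into its low-pass \(l_k = u(I_{k+1}) = u(d(I_k))\) and high-pass band \(h_k = I_k - l_k\); \(G_k\) produces \(\tilde{h}_k = G_k(z_k, l_k)\) and the discriminator \(D_k\) must distinguish \((h_k, l_k)\) from \((\tilde{h}_k, l_k)\), while at the coarsest level a standard unconditional GAN is trained on \(I_K\). A per-level sketch, with the gradient updates omitted and the network callables hypothetical:

```python
def lapgan_train_pairs(G_k, I_k, z_k, d, u):
    """Build one level's real/generated discriminator inputs (sketch).

    I_k: batch of real images at level k; d/u as in the pyramid code.
    The actual gradient updates on G_k and D_k are omitted.
    """
    l_k = u(d(I_k))                    # low-pass conditioning, l_k = u(I_{k+1})
    h_k = I_k - l_k                    # real high-frequency band
    h_fake = G_k(z_k, l_k)             # generated band h~_k = G_k(z_k, l_k)
    # D_k is trained to separate (h_k, l_k) from (h_fake, l_k);
    # G_k is trained to make D_k label (h_fake, l_k) as real.
    return (h_k, l_k), (h_fake, l_k)
```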
Conclusion
Breaking the generation into successive refinements is the key idea in this work. Note that we give up any "global" notion of fidelity; we never attempt to train a network to discriminate between the output of the full cascade and a real image, and instead focus on making each step plausible. Furthermore, the independent training of each pyramid level has the advantage that it is far more difficult for the model to memorize training examples, a hazard when high-capacity deep networks are used.