A Style-Based Generator Architecture for Generative Adversarial Networks
- Category: Article
- Created: February 17, 2022 3:46 PM
- Status: Open
- URL: https://arxiv.org/pdf/1812.04948.pdf
- Updated: February 17, 2022 6:07 PM
Reference: https://towardsdatascience.com/explained-a-style-based-generator-architecture-for-gans-generating-and-tuning-realistic-6cb2be0f431
Highlights
- The new generator architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis.
- The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation.
Methods
Mapping Network
Given a latent code \(\mathbf{z}\) in the input latent space \(\mathcal{Z}\), a non-linear mapping network \(f: \mathcal{Z} \rightarrow \mathcal{W}\) first produces \(\mathbf{w} \in \mathcal{W}\). For simplicity, we set the dimensionality of both spaces to 512, and the mapping \(f\) is implemented using an 8-layer MLP.
```python
# The mapping network in StyleGAN is composed of 8 layers, but for your
# implementation, you will use a neural network with 3 layers. This is to
# save time training later.
```
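A minimal sketch of such a 3-layer mapping network, assuming PyTorch; the class name `MappingLayers` and the dimension defaults are illustrative, not fixed by the paper (which uses 512 for both spaces):

```python
import torch
import torch.nn as nn

class MappingLayers(nn.Module):
    '''Maps a latent vector z to an intermediate latent vector w via a
    small MLP (3 layers here instead of the paper's 8).'''
    def __init__(self, z_dim=512, hidden_dim=512, w_dim=512):
        super().__init__()
        self.mapping = nn.Sequential(
            nn.Linear(z_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, w_dim),
        )

    def forward(self, z):
        return self.mapping(z)
```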
Style Modules
The AdaIN (Adaptive Instance Normalization) module transfers the encoded information \(\mathbf{w}\), created by the Mapping Network, into the generated image.
Learned affine transformations then specialize \(\mathbf{w}\) to styles \(\mathbf{y} = (\mathbf{y}_s, \mathbf{y}_b)\) that control adaptive instance normalization (AdaIN) operations after each convolution layer of the synthesis network \(g\). The AdaIN operation is defined as
\[ \operatorname{AdaIN}\left(\mathbf{x}_{i}, \mathbf{y}\right)=\mathbf{y}_{s, i} \frac{\mathbf{x}_{i}-\mu\left(\mathbf{x}_{i}\right)}{\sigma\left(\mathbf{x}_{i}\right)}+\mathbf{y}_{b, i} \]
where each feature map \(x_i\) is normalized separately, and then scaled and biased using the corresponding scalar components from style \(y\).
```python
class AdaIN(nn.Module):
```
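Expanding the stub above into a runnable sketch, assuming PyTorch: the two `nn.Linear` layers play the role of the learned affine transformation that specializes \(\mathbf{w}\) into the style \((\mathbf{y}_s, \mathbf{y}_b)\); parameter names are illustrative.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    '''Normalizes each feature map, then scales and biases it with the
    style (y_s, y_b) produced from w by learned affine transformations.'''
    def __init__(self, channels, w_dim):
        super().__init__()
        self.instance_norm = nn.InstanceNorm2d(channels)
        # Learned affine transformations that specialize w to the style y
        self.style_scale = nn.Linear(w_dim, channels)
        self.style_shift = nn.Linear(w_dim, channels)

    def forward(self, image, w):
        normalized = self.instance_norm(image)
        y_s = self.style_scale(w)[:, :, None, None]  # per-channel scale
        y_b = self.style_shift(w)[:, :, None, None]  # per-channel bias
        return y_s * normalized + y_b
```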
Removing traditional input
The synthesis network starts from a learned constant instead of taking the latent code as input: it is easier for the network to learn using only \(\mathbf{w}\), without relying on the entangled input vector.
Stochastic variation
Scaled noise is added to each channel before the AdaIN module, slightly changing the visual expression of the features at the resolution level it operates on.
```python
class InjectNoise(nn.Module):
```
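A sketch of the noise injection, again assuming PyTorch; one learned scaling factor per channel is an assumption consistent with the per-channel scaled noise described above:

```python
import torch
import torch.nn as nn

class InjectNoise(nn.Module):
    '''Adds scaled Gaussian noise to each channel of a feature map.'''
    def __init__(self, channels):
        super().__init__()
        # One learned scaling factor per channel (initialization is a choice)
        self.weight = nn.Parameter(torch.randn(1, channels, 1, 1))

    def forward(self, image):
        n, _, h, w = image.shape
        # Single-channel noise, broadcast across channels, scaled per channel
        noise = torch.randn(n, 1, h, w, device=image.device)
        return image + self.weight * noise
```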
Style mixing
When generating an image, we simply switch from one latent code to another (an operation referred to as style mixing) at a randomly selected point in the synthesis network. Specifically, we run two latent codes \(\mathbf{z}_1, \mathbf{z}_2\) through the mapping network, and have the corresponding \(\mathbf{w}_1, \mathbf{w}_2\) control the styles so that \(\mathbf{w}_1\) applies before the crossover point and \(\mathbf{w}_2\) after it.
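Illustratively, style mixing amounts to choosing which of the two intermediate latents drives each layer. In this sketch the mapping network is a hypothetical stand-in (a single `nn.Linear`), since only the crossover logic matters here:

```python
import random
import torch
import torch.nn as nn

# Hypothetical stand-ins: in a real model, 'mapping' is the trained mapping
# network and n_layers is the number of style inputs in the synthesis network.
mapping = nn.Linear(512, 512)
n_layers = 8

z1, z2 = torch.randn(1, 512), torch.randn(1, 512)
w1, w2 = mapping(z1), mapping(z2)

crossover = random.randrange(1, n_layers)  # randomly selected crossover point
# w1 controls the styles before the crossover point, w2 after it
styles = [w1 if i < crossover else w2 for i in range(n_layers)]
```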
Truncation Trick
If we consider the distribution of training data, it is clear that areas of low density are poorly represented and thus likely to be difficult for the generator to learn. Sampling can be pulled toward well-represented regions by a truncation trick applied in \(\mathcal{W}\): first compute the center of mass of \(\mathcal{W}\), then scale the deviation of a given \(\mathbf{w}\) from that center by a factor \(\psi < 1\).
\[ \overline{\mathbf{w}}=\mathbb{E}_{\mathbf{z} \sim P(\mathbf{z})}[f(\mathbf{z})] \]
\[ \mathbf{w}^{\prime}=\overline{\mathbf{w}}+\psi(\mathbf{w}-\overline{\mathbf{w}}) \]
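A sketch of the truncation step, assuming PyTorch; the function name and the \(\psi\) default are illustrative:

```python
import torch

def truncate_w(w, w_avg, psi=0.7):
    '''Scales the deviation of w from the center of mass w_avg;
    psi < 1 trades variation for average image quality.'''
    return w_avg + psi * (w - w_avg)

# The center of mass is estimated by averaging f(z) over many samples, e.g.
# w_avg = mapping(torch.randn(10000, 512)).mean(dim=0)
```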
Full Generator Architecture
```python
class MicroStyleGANGeneratorBlock(nn.Module):
```
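Expanding the stub above into a runnable sketch that reuses the `AdaIN` and `InjectNoise` sketches from earlier; the argument names and the bilinear upsampling are assumptions, and `padding=1` assumes a 3x3 kernel:

```python
import torch
import torch.nn as nn

class MicroStyleGANGeneratorBlock(nn.Module):
    '''One synthesis block: upsample, convolve, inject noise, activate,
    then apply AdaIN driven by w (AdaIN/InjectNoise sketches above).'''
    def __init__(self, in_chan, out_chan, w_dim, kernel_size, starting_size,
                 use_upsample=True):
        super().__init__()
        self.use_upsample = use_upsample
        if use_upsample:
            self.upsample = nn.Upsample((starting_size, starting_size),
                                        mode='bilinear', align_corners=False)
        self.conv = nn.Conv2d(in_chan, out_chan, kernel_size, padding=1)
        self.inject_noise = InjectNoise(out_chan)
        self.activation = nn.LeakyReLU(0.2)
        self.adain = AdaIN(out_chan, w_dim)

    def forward(self, x, w):
        if self.use_upsample:
            x = self.upsample(x)
        x = self.conv(x)
        x = self.inject_noise(x)
        x = self.activation(x)
        return self.adain(x, w)
```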
Progressive growing
Training starts at a low resolution, and higher-resolution blocks are faded in gradually: while a new block fades in, its output image is alpha-blended with an upsampled image produced from the previous block.
```python
class MicroStyleGANGenerator(nn.Module):
```
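A sketch of the full micro generator, combining the `MappingLayers` and `MicroStyleGANGeneratorBlock` sketches above; the channel sizes and the fixed `alpha` are illustrative, and the final line implements the progressive-growing blend just described:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MicroStyleGANGenerator(nn.Module):
    '''Constant input -> mapping network -> three synthesis blocks,
    with an alpha-blended skip for progressive growing.'''
    def __init__(self, z_dim=128, map_hidden_dim=256, w_dim=256,
                 in_chan=256, out_chan=3, kernel_size=3, hidden_chan=256):
        super().__init__()
        self.map = MappingLayers(z_dim, map_hidden_dim, w_dim)
        # Learned constant that replaces the traditional latent input
        self.starting_constant = nn.Parameter(torch.randn(1, in_chan, 4, 4))
        self.block0 = MicroStyleGANGeneratorBlock(
            in_chan, hidden_chan, w_dim, kernel_size, 4, use_upsample=False)
        self.block1 = MicroStyleGANGeneratorBlock(
            hidden_chan, hidden_chan, w_dim, kernel_size, 8)
        self.block2 = MicroStyleGANGeneratorBlock(
            hidden_chan, hidden_chan, w_dim, kernel_size, 16)
        self.block1_to_image = nn.Conv2d(hidden_chan, out_chan, kernel_size=1)
        self.block2_to_image = nn.Conv2d(hidden_chan, out_chan, kernel_size=1)
        self.alpha = 0.2  # fade-in coefficient for the newest block

    def forward(self, noise):
        w = self.map(noise)
        x = self.block0(self.starting_constant, w)
        x_small = self.block1(x, w)                    # 8x8 features
        x_small_image = self.block1_to_image(x_small)  # old output head
        x_big = self.block2(x_small, w)                # 16x16 features
        x_big_image = self.block2_to_image(x_big)      # new output head
        x_small_upsampled = F.interpolate(
            x_small_image, size=x_big_image.shape[-2:], mode='bilinear',
            align_corners=False)
        # Progressive growing: blend old (upsampled) and new outputs
        return self.alpha * x_big_image + (1 - self.alpha) * x_small_upsampled
```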