# A Style-Based Generator Architecture for Generative Adversarial Networks

• Category: Article
• Created: February 17, 2022 3:46 PM
• Status: Open
• URL: https://arxiv.org/pdf/1812.04948.pdf
• Updated: February 17, 2022 6:07 PM

# Highlights

1. The new generator architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis.
2. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation.

# Methods

### Mapping Network

Given a latent code $$\mathbf{z}$$ in the input latent space $$\mathcal{Z}$$, a non-linear mapping network $$f: \mathcal{Z} \rightarrow \mathcal{W}$$ first produces $$\mathbf{w} \in \mathcal{W}$$. For simplicity, the dimensionality of both spaces is set to 512, and the mapping $$f$$ is implemented as an 8-layer MLP.
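A minimal sketch of this mapping network, assuming leaky-ReLU activations between the fully connected layers; the weights below are random stand-ins for learned parameters, not the trained model:

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def mapping_network(z, params):
    """f: Z -> W as an 8-layer MLP.
    params is a list of (weight, bias) pairs; here they are random
    placeholders standing in for learned parameters."""
    h = z
    for W, b in params:
        h = leaky_relu(h @ W + b)
    return h

rng = np.random.default_rng(0)
dim = 512  # dimensionality of both Z and W, as in the paper
params = [(rng.standard_normal((dim, dim)) * 0.02, np.zeros(dim))
          for _ in range(8)]

z = rng.standard_normal(dim)   # sample a latent code z
w = mapping_network(z, params) # intermediate latent w, also 512-dim
```

The key point is only the shape of the computation: $$\mathbf{z}$$ and $$\mathbf{w}$$ live in spaces of the same dimensionality, but $$\mathbf{w}$$ is free of the sampling constraints imposed on $$\mathbf{z}$$.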

### Style Modules

The AdaIN (Adaptive Instance Normalization) module transfers the encoded information $$\mathbf{w}$$, produced by the mapping network, into the generated image.

Learned affine transformations then specialize $$\mathbf{w}$$ to styles $$y = (y_s, y_b)$$ that control adaptive instance normalization (AdaIN) operations after each convolution layer of the synthesis network $$g$$. The AdaIN operation is defined as

$\operatorname{AdaIN}\left(\mathbf{x}_{i}, \mathbf{y}\right)=\mathbf{y}_{s, i} \frac{\mathbf{x}_{i}-\mu\left(\mathbf{x}_{i}\right)}{\sigma\left(\mathbf{x}_{i}\right)}+\mathbf{y}_{b, i}$

where each feature map $$x_i$$ is normalized separately, and then scaled and biased using the corresponding scalar components from style $$y$$.
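The definition above translates almost directly into code. A sketch for a single image, assuming feature maps of shape `(C, H, W)` (the epsilon term is an assumption added for numerical stability):

```python
import numpy as np

def adain(x, y_s, y_b, eps=1e-8):
    """Adaptive Instance Normalization.

    x:   feature maps, shape (C, H, W)
    y_s: per-channel style scales, shape (C,)
    y_b: per-channel style biases, shape (C,)

    Each feature map x_i is normalized separately over its spatial
    dimensions, then scaled and biased by the corresponding scalar
    style components, matching the AdaIN equation above.
    """
    mu = x.mean(axis=(1, 2), keepdims=True)          # per-channel mean
    sigma = x.std(axis=(1, 2), keepdims=True) + eps  # per-channel std
    return y_s[:, None, None] * (x - mu) / sigma + y_b[:, None, None]

# usage: style a 512-channel 8x8 activation tensor
rng = np.random.default_rng(0)
x = rng.standard_normal((512, 8, 8))
y_s = rng.standard_normal(512)
y_b = rng.standard_normal(512)
out = adain(x, y_s, y_b)
```

Because the normalization destroys the previous per-channel statistics before the new style is applied, each AdaIN controls only the scales it precedes.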

It is easier for the network to learn using only $$\mathbf{w}$$, without relying on the entangled input vector $$\mathbf{z}$$.

### Stochastic variation

Scaled noise is added to each channel before the AdaIN module, slightly changing the visual expression of the features at the resolution level it operates on.
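A sketch of this noise injection, assuming a single-channel noise image broadcast to all feature maps and scaled by a learned per-channel factor (the factors below are random placeholders for learned parameters):

```python
import numpy as np

def add_noise(x, per_channel_scale, rng):
    """x: feature maps of shape (C, H, W).
    One noise image is shared across all channels and scaled by a
    per-channel factor before being added to the activations."""
    noise = rng.standard_normal((1, x.shape[1], x.shape[2]))
    return x + per_channel_scale[:, None, None] * noise

rng = np.random.default_rng(0)
x = rng.standard_normal((512, 8, 8))
scales = rng.standard_normal(512) * 0.1  # stand-in for learned scales
x_noisy = add_noise(x, scales, rng)
```

Since the injection happens before AdaIN at every resolution, the noise can only affect details at that scale: coarse layers perturb coarse structure, fine layers perturb details like freckles and hair placement.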

## Style mixing

When generating such an image, we simply switch from one latent code to another — an operation we refer to as style mixing — at a randomly selected point in the synthesis network. To be specific, we run two latent codes $$z_1,z_2$$ through the mapping network, and have the corresponding $$w_1,w_2$$ control the styles so that $$w_1$$ applies before the crossover point and $$w_2$$ after it.
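The per-layer schedule of latents during style mixing can be sketched as follows (layer count and crossover index are illustrative, not the paper's fixed values):

```python
def mix_styles(w1, w2, crossover, num_layers):
    """Return the latent used at each layer of the synthesis network:
    w1 controls the styles before the crossover point, w2 after it."""
    return [w1 if i < crossover else w2 for i in range(num_layers)]

# usage: w1 drives the 3 coarsest layers, w2 the rest
schedule = mix_styles("w1", "w2", crossover=3, num_layers=5)
```

Because early layers control coarse attributes (pose, face shape) and later layers control fine ones (color scheme, microstructure), the crossover point determines which attributes each latent contributes.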

## Truncation Trick

If we consider the distribution of training data, it is clear that areas of low density are poorly represented and thus likely to be difficult for the generator to learn. We can follow a similar strategy to the truncation trick used in earlier GANs, but applied in $$\mathcal{W}$$ instead of $$\mathcal{Z}$$. To begin, we compute the center of mass of $$\mathcal{W}$$. We can then scale the deviation of a given $$\mathbf{w}$$ from the center.

$\overline{\mathbf{w}}=\mathbb{E}_{\mathbf{z} \sim P(\mathbf{z})}[f(\mathbf{z})]$

$\mathbf{w}^{\prime}=\overline{\mathbf{w}}+\psi(\mathbf{w}-\overline{\mathbf{w}})$
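These two equations can be sketched directly; the mapping network is replaced by random samples here, so the estimated center of mass is a placeholder, not the trained model's:

```python
import numpy as np

def truncate(w, w_bar, psi=0.7):
    """w' = w_bar + psi * (w - w_bar).
    psi < 1 pulls w toward the center of mass, trading variation
    for average sample quality; psi = 1 leaves w unchanged."""
    return w_bar + psi * (w - w_bar)

rng = np.random.default_rng(0)

# estimate w_bar = E[f(z)] by averaging mapped latents;
# random vectors stand in for outputs of the mapping network f
samples = rng.standard_normal((10_000, 512))
w_bar = samples.mean(axis=0)

w = rng.standard_normal(512)
w_trunc = truncate(w, w_bar, psi=0.7)
```

Setting $$\psi = 0$$ collapses every latent to the average face, while $$\psi = 1$$ recovers the untruncated distribution.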