A Style-Based Generator Architecture for Generative Adversarial Networks

  • Category: Article
  • Created: February 17, 2022 3:46 PM
  • Status: Open
  • URL: https://arxiv.org/pdf/1812.04948.pdf
  • Updated: February 17, 2022 6:07 PM

Reference: https://towardsdatascience.com/explained-a-style-based-generator-architecture-for-gans-generating-and-tuning-realistic-6cb2be0f431

Highlights

  1. The new generator architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis.
  2. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation.

Methods

Mapping Network

Given a latent code \(\mathbf{z}\) in the input latent space \(\mathcal{Z}\), a non-linear mapping network \(f: \mathcal{Z} \rightarrow \mathcal{W}\) first produces \(\mathbf{w} \in \mathcal{W}\). For simplicity, the paper sets the dimensionality of both spaces to 512 and implements the mapping \(f\) as an 8-layer MLP.

# Imports needed by the code in these notes
import torch
import torch.nn as nn
import torch.nn.functional as F

# The mapping network in StyleGAN is composed of 8 layers, but for your implementation, you will use a neural network with 3 layers. This is to save time training later.

class MappingLayers(nn.Module):
    '''
    Mapping Layers Class
    Values:
        z_dim: the dimension of the noise vector, a scalar
        hidden_dim: the inner dimension, a scalar
        w_dim: the dimension of the intermediate noise vector, a scalar
    '''

    def __init__(self, z_dim, hidden_dim, w_dim):
        super().__init__()
        self.mapping = nn.Sequential(
            # Please write a neural network which takes in tensors of
            # shape (n_samples, z_dim) and outputs (n_samples, w_dim)
            # with a hidden layer with hidden_dim neurons
            #### START CODE HERE ####
            nn.Linear(z_dim, hidden_dim, bias=True),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, hidden_dim, bias=True),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, w_dim, bias=True)
            #### END CODE HERE ####
        )

    def forward(self, noise):
        '''
        Function for completing a forward pass of MappingLayers:
        Given an initial noise tensor, returns the intermediate noise tensor.
        Parameters:
            noise: a noise tensor with dimensions (n_samples, z_dim)
        '''
        return self.mapping(noise)

    #UNIT TEST COMMENT: Required for grading
    def get_mapping(self):
        return self.mapping

Style Modules

The AdaIN (Adaptive Instance Normalization) module transfers the encoded information \(\mathbf{w}\), created by the mapping network, into the generated image.

Learned affine transformations then specialize \(\mathbf{w}\) to styles \(\mathbf{y} = (\mathbf{y}_s, \mathbf{y}_b)\) that control adaptive instance normalization (AdaIN) operations after each convolution layer of the synthesis network \(g\). The AdaIN operation is defined as

\[ \operatorname{AdaIN}\left(\mathbf{x}_{i}, \mathbf{y}\right)=\mathbf{y}_{s, i} \frac{\mathbf{x}_{i}-\mu\left(\mathbf{x}_{i}\right)}{\sigma\left(\mathbf{x}_{i}\right)}+\mathbf{y}_{b, i} \]

where each feature map \(x_i\) is normalized separately, and then scaled and biased using the corresponding scalar components from style \(y\).
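As a small worked example of the definition above, the sketch below applies AdaIN to a single flattened feature map in plain Python (no PyTorch), so the arithmetic is visible. The function name `adain_single_map`, the sample values, and the `eps` stabilizer are assumptions made for illustration; they are not taken from the paper.

```python
import statistics

def adain_single_map(x, y_s, y_b, eps=1e-5):
    """Apply AdaIN to one flattened feature map x (a list of floats).

    eps is an assumed small constant that avoids division by zero.
    """
    mu = statistics.fmean(x)                          # mean of this feature map
    sigma = (statistics.pvariance(x, mu) + eps) ** 0.5  # population std of this map
    # Normalize, then scale by y_s and shift by y_b.
    return [y_s * (v - mu) / sigma + y_b for v in x]

feature_map = [1.0, 2.0, 3.0, 4.0]   # made-up activations
styled = adain_single_map(feature_map, y_s=2.0, y_b=10.0)

styled_mean = statistics.fmean(styled)   # close to y_b
styled_std = statistics.pstdev(styled)   # close to y_s
```

After the operation the feature map's mean is approximately \(\mathbf{y}_{b,i}\) and its standard deviation approximately \(\mathbf{y}_{s,i}\), which is why the style fully overrides the per-channel statistics of the incoming features.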

class AdaIN(nn.Module):
    '''
    AdaIN Class
    Values:
        channels: the number of channels the image has, a scalar
        w_dim: the dimension of the intermediate noise vector, a scalar
    '''

    def __init__(self, channels, w_dim):
        super().__init__()

        # Normalize the input per-dimension
        self.instance_norm = nn.InstanceNorm2d(channels)

        # You want to map w to a set of style weights per channel.
        # Replace the Nones with the correct dimensions - keep in mind that
        # both linear maps transform a w vector into style weights
        # corresponding to the number of image channels.
        #### START CODE HERE ####
        self.style_scale_transform = nn.Linear(w_dim, channels)
        self.style_shift_transform = nn.Linear(w_dim, channels)
        #### END CODE HERE ####

    def forward(self, image, w):
        '''
        Function for completing a forward pass of AdaIN: Given an image and intermediate noise vector w,
        returns the normalized image that has been scaled and shifted by the style.
        Parameters:
            image: the feature map of shape (n_samples, channels, width, height)
            w: the intermediate noise vector
        '''
        normalized_image = self.instance_norm(image)
        style_scale = self.style_scale_transform(w)[:, :, None, None]
        style_shift = self.style_shift_transform(w)[:, :, None, None]

        # Calculate the transformed image
        #### START CODE HERE ####
        transformed_image = style_scale * normalized_image + style_shift
        #### END CODE HERE ####
        return transformed_image

    #UNIT TEST COMMENT: Required for grading
    def get_style_scale_transform(self):
        return self.style_scale_transform

    #UNIT TEST COMMENT: Required for grading
    def get_style_shift_transform(self):
        return self.style_shift_transform

    #UNIT TEST COMMENT: Required for grading
    def get_self(self):
        return self

Removing traditional input

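The synthesis network no longer takes the latent code as its first-layer input; it starts from a learned constant tensor instead. The motivation is that it is easier for the network to learn using only \(\mathbf{w}\), without relying on the entangled input vector.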

Stochastic variation

Noise, scaled by a learned per-channel weight, is added to each feature map before the AdaIN module. It slightly changes the visual expression of the features at the resolution level it operates on.

class InjectNoise(nn.Module):
    '''
    Inject Noise Class
    Values:
        channels: the number of channels the image has, a scalar
    '''
    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Parameter( # You use nn.Parameter so that these weights can be optimized
            # Initiate the weights for the channels from a random normal distribution
            #### START CODE HERE ####
            torch.randn(size=(1, channels, 1, 1))
            #### END CODE HERE ####
        )

    def forward(self, image):
        '''
        Function for completing a forward pass of InjectNoise: Given an image,
        returns the image with random noise added.
        Parameters:
            image: the feature map of shape (n_samples, channels, width, height)
        '''
        # Set the appropriate shape for the noise!

        #### START CODE HERE ####
        n_samples, channels, width, height = image.shape
        noise_shape = (n_samples, 1, width, height)
        #### END CODE HERE ####

        noise = torch.randn(noise_shape, device=image.device) # Creates the random noise
        return image + self.weight * noise # Applies to image after multiplying by the weight for each channel

    #UNIT TEST COMMENT: Required for grading
    def get_weight(self):
        return self.weight

    #UNIT TEST COMMENT: Required for grading
    def get_self(self):
        return self

Style mixing

During training, a given percentage of images is generated from two random latent codes instead of one. When generating such an image, we simply switch from one latent code to another (an operation we refer to as style mixing) at a randomly selected point in the synthesis network. To be specific, we run two latent codes \(\mathbf{z}_1, \mathbf{z}_2\) through the mapping network, and have the corresponding \(\mathbf{w}_1, \mathbf{w}_2\) control the styles so that \(\mathbf{w}_1\) applies before the crossover point and \(\mathbf{w}_2\) after it.
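The crossover described above can be sketched in plain Python as choosing, for each synthesis layer, which of the two intermediate codes supplies the style. The helper `mix_styles`, its parameters, and the toy two-dimensional vectors are hypothetical names and values chosen for illustration, not part of the paper or of any library.

```python
import random

def mix_styles(w1, w2, num_layers, crossover=None, rng=random):
    """Return one style vector per synthesis layer.

    Layers before `crossover` use w1; layers from `crossover` onward use w2.
    If no crossover is given, one is drawn at random so that both codes
    are used at least once (requires num_layers >= 2).
    """
    if crossover is None:
        crossover = rng.randint(1, num_layers - 1)  # inclusive on both ends
    return [w1 if layer < crossover else w2 for layer in range(num_layers)]

w1 = [0.1, 0.2]  # stand-in for f(z1)
w2 = [0.9, 0.8]  # stand-in for f(z2)

# Fixed crossover: the first two layers take w1, the remaining four take w2.
per_layer = mix_styles(w1, w2, num_layers=6, crossover=2)

# Random crossover, as used when mixing during training.
random_mix = mix_styles(w1, w2, num_layers=6)
```

The generator implemented later in these notes passes the same `w` to every block; wiring in mixed styles would presumably mean passing `per_layer[i]` to block `i`, which is an assumption about one possible design rather than something the notes implement.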

Truncation Trick

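If we consider the distribution of training data, it is clear that areas of low density are poorly represented and thus likely to be difficult for the generator to learn. Drawing latent vectors from a truncated or otherwise shrunk sampling space tends to improve average image quality, at the cost of some variation, and a similar strategy can be applied in \(\mathcal{W}\). To begin, we compute the center of mass of \(\mathcal{W}\). We can then scale the deviation of a given \(\mathbf{w}\) from the center by a factor \(\psi < 1\):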

\[ \overline{\mathbf{w}}=\mathbb{E}_{\mathbf{z} \sim P(\mathbf{z})}[f(\mathbf{z})] \]

\[ \mathbf{w}^{\prime}=\overline{\mathbf{w}}+\psi(\mathbf{w}-\overline{\mathbf{w}}) \]
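The two formulas above can be traced with a minimal plain-Python sketch. The function names `mean_latent` and `truncate` and the sample vectors are made up for illustration; in practice the samples would be mapping-network outputs \(f(\mathbf{z})\) for many random \(\mathbf{z}\).

```python
def mean_latent(ws):
    """Estimate w_bar as the element-wise mean of sampled w vectors."""
    n = len(ws)
    return [sum(column) / n for column in zip(*ws)]

def truncate(w, w_bar, psi):
    """Pull w toward w_bar: w' = w_bar + psi * (w - w_bar)."""
    return [m + psi * (v - m) for v, m in zip(w, w_bar)]

# Toy stand-ins for f(z) evaluated on three random latent codes.
samples = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]
w_bar = mean_latent(samples)

w = [4.0, 6.0]
w_half = truncate(w, w_bar, psi=0.5)   # halfway between w_bar and w
w_full = truncate(w, w_bar, psi=1.0)   # psi = 1 leaves w unchanged
w_zero = truncate(w, w_bar, psi=0.0)   # psi = 0 collapses to w_bar
```

Smaller values of \(\psi\) move every sample closer to the average, trading variation for samples that stay in the well-represented region of \(\mathcal{W}\).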

Full Generator Architecture

class MicroStyleGANGeneratorBlock(nn.Module):
    '''
    Micro StyleGAN Generator Block Class
    Values:
        in_chan: the number of channels in the input, a scalar
        out_chan: the number of channels wanted in the output, a scalar
        w_dim: the dimension of the intermediate noise vector, a scalar
        kernel_size: the size of the convolving kernel
        starting_size: the size of the starting image
    '''

    def __init__(self, in_chan, out_chan, w_dim, kernel_size, starting_size, use_upsample=True):
        super().__init__()
        self.use_upsample = use_upsample
        # Replace the Nones in order to:
        # 1. Upsample to the starting_size, bilinearly (https://pytorch.org/docs/master/generated/torch.nn.Upsample.html)
        # 2. Create a kernel_size convolution which takes in
        #    an image with in_chan and outputs one with out_chan (https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)
        # 3. Create an object to inject noise
        # 4. Create an AdaIN object
        # 5. Create a LeakyReLU activation with slope 0.2

        #### START CODE HERE ####
        if self.use_upsample:
            self.upsample = nn.Upsample((starting_size, starting_size), mode="bilinear")
        # Use the kernel_size argument rather than a hard-coded 3.
        # padding=1 maintains the image size when kernel_size is 3.
        self.conv = nn.Conv2d(in_chan, out_chan, kernel_size, padding=1)
        self.inject_noise = InjectNoise(out_chan)
        self.adain = AdaIN(out_chan, w_dim)
        self.activation = nn.LeakyReLU(negative_slope=0.2, inplace=True)
        #### END CODE HERE ####

    def forward(self, x, w):
        '''
        Function for completing a forward pass of MicroStyleGANGeneratorBlock: Given an x and w,
        computes a StyleGAN generator block.
        Parameters:
            x: the input into the generator, feature map of shape (n_samples, channels, width, height)
            w: the intermediate noise vector
        '''
        if self.use_upsample:
            x = self.upsample(x)
        x = self.conv(x)
        x = self.inject_noise(x)
        x = self.activation(x)
        x = self.adain(x, w)
        return x

    #UNIT TEST COMMENT: Required for grading
    def get_self(self):
        return self

Progressive growing

class MicroStyleGANGenerator(nn.Module):
    '''
    Micro StyleGAN Generator Class
    Values:
        z_dim: the dimension of the noise vector, a scalar
        map_hidden_dim: the mapping inner dimension, a scalar
        w_dim: the dimension of the intermediate noise vector, a scalar
        in_chan: the dimension of the constant input, usually w_dim, a scalar
        out_chan: the number of channels wanted in the output, a scalar
        kernel_size: the size of the convolving kernel
        hidden_chan: the inner dimension, a scalar
    '''

    def __init__(self,
                 z_dim,
                 map_hidden_dim,
                 w_dim,
                 in_chan,
                 out_chan,
                 kernel_size,
                 hidden_chan):
        super().__init__()
        self.map = MappingLayers(z_dim, map_hidden_dim, w_dim)
        # Typically this constant is initiated to all ones, but you will initiate to a
        # Gaussian to better visualize the network's effect
        self.starting_constant = nn.Parameter(torch.randn(1, in_chan, 4, 4))
        self.block0 = MicroStyleGANGeneratorBlock(in_chan, hidden_chan, w_dim, kernel_size, 4, use_upsample=False)
        self.block1 = MicroStyleGANGeneratorBlock(hidden_chan, hidden_chan, w_dim, kernel_size, 8)
        self.block2 = MicroStyleGANGeneratorBlock(hidden_chan, hidden_chan, w_dim, kernel_size, 16)
        # You need to have a way of mapping from the output noise to an image,
        # so you learn a 1x1 convolution to transform the e.g. 512 channels into 3 channels
        # (Note that this is simplified, with clipping used in the real StyleGAN)
        self.block1_to_image = nn.Conv2d(hidden_chan, out_chan, kernel_size=1)
        self.block2_to_image = nn.Conv2d(hidden_chan, out_chan, kernel_size=1)
        self.alpha = 0.2

    def upsample_to_match_size(self, smaller_image, bigger_image):
        '''
        Function for upsampling an image to the size of another: Given two images (smaller and bigger),
        upsamples the first to have the same dimensions as the second.
        Parameters:
            smaller_image: the smaller image to upsample
            bigger_image: the bigger image whose dimensions will be upsampled to
        '''
        return F.interpolate(smaller_image, size=bigger_image.shape[-2:], mode='bilinear')

    def forward(self, noise, return_intermediate=False):
        '''
        Function for completing a forward pass of MicroStyleGANGenerator: Given noise,
        computes a StyleGAN iteration.
        Parameters:
            noise: a noise tensor with dimensions (n_samples, z_dim)
            return_intermediate: a boolean, true to return the images as well (for testing) and false otherwise
        '''
        x = self.starting_constant
        w = self.map(noise)
        x = self.block0(x, w)
        x_small = self.block1(x, w) # First generator run output
        x_small_image = self.block1_to_image(x_small)
        x_big = self.block2(x_small, w) # Second generator run output
        x_big_image = self.block2_to_image(x_big)
        x_small_upsample = self.upsample_to_match_size(x_small_image, x_big_image) # Upsample first generator run output to be same size as second generator run output
        # Interpolate between the upsampled image and the image from the generator using alpha

        #### START CODE HERE ####
        interpolation = torch.lerp(x_small_upsample, x_big_image, self.alpha)
        #### END CODE HERE ####

        if return_intermediate:
            return interpolation, x_small_upsample, x_big_image
        return interpolation

    #UNIT TEST COMMENT: Required for grading
    def get_self(self):
        return self
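The generator above blends the upsampled low-resolution image with the high-resolution image using `torch.lerp`, which computes `start + weight * (end - start)` element-wise. The plain-Python sketch below spells out that arithmetic on toy pixel values. The `fade_in_alpha` schedule is a hypothetical illustration of ramping the blend weight during progressive growing; the notes themselves fix `alpha` at 0.2.

```python
def lerp(start, end, weight):
    """Element-wise linear interpolation: start + weight * (end - start)."""
    return [s + weight * (e - s) for s, e in zip(start, end)]

def fade_in_alpha(step, fade_steps):
    """Hypothetical linear ramp of the blend weight from 0 to 1."""
    return min(1.0, step / fade_steps)

small_upsampled = [0.0, 1.0, 2.0]  # toy pixels from the upsampled small image
big = [1.0, 1.0, 4.0]              # toy pixels from the new high-resolution block

# With alpha = 0.2, the result stays close to the small image.
blended = lerp(small_upsampled, big, 0.2)

# As the ramp progresses, the high-resolution block takes over.
alphas = [fade_in_alpha(step, fade_steps=4) for step in range(6)]
final = lerp(small_upsampled, big, alphas[-1])
```

With a weight of 0 the output is exactly the upsampled small image, and with a weight of 1 it is exactly the high-resolution image, so a gradually increasing weight lets the new block fade in smoothly.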