InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
- Category: Article
- Created: February 8, 2022 7:26 PM
- Status: Open
- URL: https://arxiv.org/pdf/1606.03657.pdf
- Updated: February 15, 2022 6:12 PM
Highlights
- InfoGAN is an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner.
- In this paper, we present a simple modification to the generative adversarial network objective that encourages it to learn interpretable and meaningful representations.
Methods
Mutual Information
- In information theory, the mutual information between \(X\) and \(Y\), \(I(X;Y)\), measures the “amount of information” learned from knowledge of random variable \(Y\) about the other random variable \(X\).
- The mutual information can be expressed as the difference of two entropy terms:
\[ I(X ; Y)=H(X)-H(X \mid Y)=H(Y)-H(Y \mid X) \]
- This definition has an intuitive interpretation: \(I(X;Y)\) is the reduction of uncertainty in \(X\) when \(Y\) is observed. If \(X\) and \(Y\) are independent, then \(I(X;Y) = 0\), because knowing one variable reveals nothing about the other.
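To make the identity concrete, here is a small numerical example (not from the paper) that checks \(I(X;Y) = H(X) - H(X \mid Y)\) on a toy joint distribution:

```python
import numpy as np

# Toy joint distribution over (X, Y); rows index X, columns index Y.
# Purely illustrative values, not from the paper.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)   # marginal P(X)
p_y = p_xy.sum(axis=0)   # marginal P(Y)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_x = entropy(p_x)                                   # H(X) = 1 bit
H_x_given_y = entropy(p_xy.ravel()) - entropy(p_y)   # H(X|Y) = H(X,Y) - H(Y)

# I(X;Y) = H(X) - H(X|Y) > 0 because X and Y are correlated here.
print(H_x - H_x_given_y)                             # ~0.278 bits
```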
Mutual Information for Inducing Latent Codes
- The generator's input noise vector is decomposed into two parts:
- \(z\), which is treated as a source of incompressible noise;
- \(c\), which we will call the latent code and which targets the salient structured semantic features of the data distribution.
- We provide the generator network with both the incompressible noise \(z\) and the latent code \(c\), so the form of the generator becomes \(G(z,c)\).
- We propose an information-theoretic regularization: there should be high mutual information between the latent codes \(c\) and the generator distribution \(G(z,c)\). Thus, \(I(c; G(z,c))\) should be high.
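As a concrete illustration of this input decomposition, here is a minimal PyTorch sketch of how \(z\) and \(c\) can be sampled and combined into the generator input. The dimensions (62-dim noise, a 10-way categorical code, 2 continuous codes) are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

batch_size = 64
z_dim = 62        # incompressible noise size (assumed)
n_classes = 10    # one categorical latent code (assumed)
n_cont = 2        # two continuous latent codes (assumed)

# Incompressible noise z.
z = torch.randn(batch_size, z_dim)

# Categorical latent code, one-hot encoded.
c_cat = F.one_hot(torch.randint(0, n_classes, (batch_size,)), n_classes).float()

# Continuous latent codes, sampled uniformly in [-1, 1].
c_cont = torch.rand(batch_size, n_cont) * 2 - 1

# The generator receives both parts, i.e. G(z, c).
gen_input = torch.cat([z, c_cat, c_cont], dim=1)   # shape: (64, 74)
```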
Variational Mutual Information Maximization
Hence, InfoGAN is defined as the following minimax game with a variational regularization of mutual information, where \(L_I\) is a variational lower bound on the mutual information (derived below) and \(\lambda\) is a hyperparameter:
\[ \min _{G, Q} \max _{D} V_{\text {InfoGAN }}(D, G, Q)=V(D, G)-\lambda L_{I}(G, Q) \]
In practice, the mutual information term \(I(c; G(z,c))\) is hard to maximize directly as it requires access to the posterior \(P(c|x)\). Fortunately, we can obtain a lower bound on it by defining an auxiliary distribution \(Q(c|x)\) to approximate \(P(c|x)\).
\[ \begin{aligned}I(c ; G(z, c)) &=H(c)-H(c \mid G(z, c)) \\&=\mathbb{E}_{x \sim G(z, c)}\left[\mathbb{E}_{c^{\prime} \sim P(c \mid x)}\left[\log P\left(c^{\prime} \mid x\right)\right]\right]+H(c) \\&=\mathbb{E}_{x \sim G(z, c)}[\underbrace{D_{\mathrm{KL}}(P(\cdot \mid x) \| Q(\cdot \mid x))}_{\geq 0}+\mathbb{E}_{c^{\prime} \sim P(c \mid x)}\left[\log Q\left(c^{\prime} \mid x\right)\right]]+H(c) \\& \geq \mathbb{E}_{x \sim G(z, c)}\left[\mathbb{E}_{c^{\prime} \sim P(c \mid x)}\left[\log Q\left(c^{\prime} \mid x\right)\right]\right]+H(c)\end{aligned} \]
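A simple lemma in the paper lets the inner expectation over the unknown posterior \(P(c|x)\) be replaced by sampling \(c\) directly from its prior, which turns this lower bound into the term \(L_I(G,Q)\) used in the objective above:

\[ L_{I}(G, Q)=\mathbb{E}_{c \sim P(c),\, x \sim G(z, c)}[\log Q(c \mid x)]+H(c) \leq I(c ; G(z, c)) \]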
In practice, we parametrize the auxiliary distribution \(Q\) as a neural network. In most experiments, \(Q\) and \(D\) share all convolutional layers and there is one final fully connected layer to output parameters for the conditional distribution \(Q(c|x)\), which means InfoGAN only adds a negligible computation cost to GAN.
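For a categorical code, the Monte Carlo estimate of \(\mathbb{E}[\log Q(c \mid x)]\) is simply the negated cross-entropy between the sampled code and the output of the \(Q\) head, since \(H(c)\) is a constant of the fixed prior. A minimal sketch, assuming a \(Q\) head that outputs categorical logits (the names here are hypothetical):

```python
import torch
import torch.nn.functional as F

def info_loss(q_logits, c_labels, lam=1.0):
    # F.cross_entropy(q_logits, c_labels) is a Monte Carlo estimate of
    # -E[log Q(c|x)]; minimizing it maximizes the bound L_I (up to H(c)).
    # `lam` plays the role of lambda in the InfoGAN objective.
    return lam * F.cross_entropy(q_logits, c_labels)

# Hypothetical usage: q_logits come from the Q head applied to G(z, c),
# c_labels are the category indices sampled when building the generator input.
q_logits = torch.randn(64, 10)
c_labels = torch.randint(0, 10, (64,))
loss = info_loss(q_logits, c_labels, lam=1.0)
```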
Code
- Reference implementation: https://github.com/eriklindernoren/PyTorch-GAN/blob/master/implementations/infogan/infogan.py
Generator
- Excerpt: `class Generator(nn.Module): ...` (full class in the linked reference implementation; a sketch follows below)
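Expanding the stub above, a minimal sketch of a DCGAN-style InfoGAN generator, loosely following the linked reference implementation; the layer widths and the 32×32 single-channel output are assumptions for illustration, not guaranteed to match the repo exactly.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=62, n_classes=10, code_dim=2, img_size=32, channels=1):
        super().__init__()
        input_dim = latent_dim + n_classes + code_dim  # z + categorical c + continuous c
        self.init_size = img_size // 4                 # upsampled twice: init_size -> img_size
        self.l1 = nn.Linear(input_dim, 128 * self.init_size ** 2)
        self.conv_blocks = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, 3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, channels, 3, stride=1, padding=1),
            nn.Tanh(),
        )

    def forward(self, noise, c_cat, c_cont):
        # G(z, c): concatenate noise and latent codes, then upsample to an image.
        x = torch.cat([noise, c_cat, c_cont], dim=-1)
        out = self.l1(x).view(x.size(0), 128, self.init_size, self.init_size)
        return self.conv_blocks(out)
```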
Discriminator
- Excerpt: `class Discriminator(nn.Module): ...` (full class in the linked reference implementation; a sketch follows below)
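Expanding the stub above, a sketch of the discriminator with the shared convolutional trunk and the extra \(Q\) heads (a real/fake logit, categorical-code logits, and continuous-code predictions), as described in the Methods section; the layer sizes are assumptions for illustration.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, n_classes=10, code_dim=2, img_size=32, channels=1):
        super().__init__()

        def block(in_ch, out_ch):
            return [nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                    nn.LeakyReLU(0.2, inplace=True),
                    nn.Dropout2d(0.25)]

        # Convolutional trunk shared by the discriminator head and the Q head.
        self.conv = nn.Sequential(*block(channels, 16), *block(16, 32),
                                  *block(32, 64), *block(64, 128))
        ds_size = img_size // 2 ** 4            # each block halves the spatial size
        flat = 128 * ds_size ** 2

        self.adv_head = nn.Linear(flat, 1)              # real/fake logit
        self.q_cat_head = nn.Linear(flat, n_classes)    # logits of Q(c_cat | x)
        self.q_cont_head = nn.Linear(flat, code_dim)    # mean of Q(c_cont | x)

    def forward(self, img):
        features = self.conv(img).flatten(1)
        return self.adv_head(features), self.q_cat_head(features), self.q_cont_head(features)
```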
Training
- Excerpt: `for epoch in range(opt.n_epochs): ...` (full training loop in the linked reference implementation; a sketch follows below)
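Expanding the stub above, a condensed sketch of the alternating updates: a generator (adversarial) step, a discriminator step, and an information-maximization step that updates both \(G\) and \(Q\). It assumes the `Generator` and `Discriminator` sketched above and an existing `dataloader` of real 32×32 images; the optimizer settings, \(\lambda\) weights, and the MSE loss for the continuous codes (equivalent, up to constants, to a fixed-variance Gaussian \(Q\)) are assumptions in the spirit of the linked implementation, not the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, n_classes, code_dim = 62, 10, 2   # assumed latent sizes
lambda_cat, lambda_cont = 1.0, 0.1            # assumed weights for the L_I terms
n_epochs = 50                                 # assumed

generator, discriminator = Generator(), Discriminator()
adv_loss = nn.BCEWithLogitsLoss()

opt_G = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
# The info term is optimized w.r.t. both G and Q (Q lives inside the discriminator).
opt_info = torch.optim.Adam(
    list(generator.parameters()) + list(discriminator.parameters()),
    lr=2e-4, betas=(0.5, 0.999),
)

def sample_inputs(batch_size):
    z = torch.randn(batch_size, latent_dim)
    labels = torch.randint(0, n_classes, (batch_size,))
    c_cat = F.one_hot(labels, n_classes).float()
    c_cont = torch.rand(batch_size, code_dim) * 2 - 1
    return z, labels, c_cat, c_cont

for epoch in range(n_epochs):
    for real_imgs, _ in dataloader:              # `dataloader` is assumed to exist
        bs = real_imgs.size(0)
        valid, fake = torch.ones(bs, 1), torch.zeros(bs, 1)

        # Generator step: make D classify generated images as real.
        z, labels, c_cat, c_cont = sample_inputs(bs)
        gen_imgs = generator(z, c_cat, c_cont)
        g_loss = adv_loss(discriminator(gen_imgs)[0], valid)
        opt_G.zero_grad()
        g_loss.backward()
        opt_G.step()

        # Discriminator step: real vs. fake.
        d_loss = 0.5 * (adv_loss(discriminator(real_imgs)[0], valid)
                        + adv_loss(discriminator(gen_imgs.detach())[0], fake))
        opt_D.zero_grad()
        d_loss.backward()
        opt_D.step()

        # Information-maximization step: minimize -lambda * L_I (up to H(c)).
        z, labels, c_cat, c_cont = sample_inputs(bs)
        _, q_cat_logits, q_cont = discriminator(generator(z, c_cat, c_cont))
        info_loss = (lambda_cat * F.cross_entropy(q_cat_logits, labels)
                     + lambda_cont * F.mse_loss(q_cont, c_cont))
        opt_info.zero_grad()
        info_loss.backward()
        opt_info.step()
```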