# InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

• Category: Article
• Created: February 8, 2022 7:26 PM
• Status: Open
• URL: https://arxiv.org/pdf/1606.03657.pdf
• Updated: February 15, 2022 6:12 PM

# Highlights

1. InfoGAN is a generative adversarial network that is able to learn disentangled representations in a completely unsupervised manner.
2. In this paper, we present a simple modification to the generative adversarial network objective that encourages it to learn interpretable and meaningful representations.

# Methods

## Mutual Information

1. In information theory, the mutual information between $$X$$ and $$Y$$, written $$I(X;Y)$$, measures the “amount of information” learned about the random variable $$X$$ from knowledge of the other random variable $$Y$$.
2. The mutual information can be expressed as the difference of two entropy terms:

$I(X ; Y)=H(X)-H(X \mid Y)=H(Y)-H(Y \mid X)$

3. This definition has an intuitive interpretation: $$I(X;Y)$$ is the reduction of uncertainty in $$X$$ when $$Y$$ is observed. If $$X$$ and $$Y$$ are independent, then $$I(X;Y) = 0$$, because knowing one variable reveals nothing about the other.
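For intuition, the two-entropy form above can be checked on small discrete joint distributions. The helper functions below are an illustration of the definition, not code from the paper:

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y) for a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]                 # marginal P(X)
    py = [sum(col) for col in zip(*joint)]           # marginal P(Y)
    h_x_given_y = 0.0
    for y, p_y in enumerate(py):
        if p_y > 0:
            cond = [joint[x][y] / p_y for x in range(len(joint))]  # P(X | Y=y)
            h_x_given_y += p_y * entropy(cond)
    return entropy(px) - h_x_given_y

# Independent variables: knowing Y reveals nothing, so I(X;Y) = 0
independent = [[0.25, 0.25], [0.25, 0.25]]
# Y determines X, so the uncertainty reduction is all of H(X) = log 2
dependent = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent))  # ≈ 0.0
print(mutual_information(dependent))    # ≈ 0.693 (= log 2)
```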

## Mutual Information for Inducing Latent Codes

• $$z$$ : treated as a source of incompressible noise;
• $$c$$ : the latent code, which targets the salient structured semantic features of the data distribution.
1. We provide the generator network with both the incompressible noise $$z$$ and the latent code $$c$$, so the form of the generator becomes $$G(z,c)$$.
2. We propose an information-theoretic regularization: there should be high mutual information between the latent codes $$c$$ and the generator distribution $$G(z,c)$$. Thus $$I(c; G(z,c))$$ should be high.
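As a concrete illustration, the paper's MNIST experiment uses 62 noise dimensions, one 10-way categorical code, and two continuous codes sampled uniformly. A minimal sampling sketch (function and variable names are ours, purely illustrative):

```python
import random

def sample_generator_input(noise_dim=62, n_categories=10, n_continuous=2):
    """Sample the input to G(z, c): incompressible noise z plus latent code c.
    Dimensions follow the paper's MNIST setup; treat this as an illustration."""
    z = [random.gauss(0.0, 1.0) for _ in range(noise_dim)]           # noise z
    cat = [0.0] * n_categories
    cat[random.randrange(n_categories)] = 1.0                        # one-hot categorical code
    cont = [random.uniform(-1.0, 1.0) for _ in range(n_continuous)]  # continuous codes
    return z + cat + cont  # concatenated vector fed to the generator

x = sample_generator_input()
print(len(x))  # 74 = 62 noise + 10 categorical + 2 continuous
```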

## Variational Mutual Information Maximization

Hence, InfoGAN is defined as the following minimax game with a variational regularization of mutual information:

$\min _{G, Q} \max _{D} V_{\text {InfoGAN }}(D, G, Q)=V(D, G)-\lambda L_{I}(G, Q)$

In practice, the mutual information term $$I(c; G(z,c))$$ is hard to maximize directly, as it requires access to the posterior $$P(c|x)$$. Fortunately, we can obtain a lower bound on it by defining an auxiliary distribution $$Q(c|x)$$ to approximate $$P(c|x)$$.

$\begin{aligned}I(c ; G(z, c)) &=H(c)-H(c \mid G(z, c)) \\&=\mathbb{E}_{x \sim G(z, c)}\left[\mathbb{E}_{c^{\prime} \sim P(c \mid x)}\left[\log P\left(c^{\prime} \mid x\right)\right]\right]+H(c) \\&=\mathbb{E}_{x \sim G(z, c)}[\underbrace{D_{\mathrm{KL}}(P(\cdot \mid x) \| Q(\cdot \mid x))}_{\geq 0}+\mathbb{E}_{c^{\prime} \sim P(c \mid x)}\left[\log Q\left(c^{\prime} \mid x\right)\right]]+H(c) \\& \geq \mathbb{E}_{x \sim G(z, c)}\left[\mathbb{E}_{c^{\prime} \sim P(c \mid x)}\left[\log Q\left(c^{\prime} \mid x\right)\right]\right]+H(c)\end{aligned}$
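The bound is tight exactly when $$Q$$ matches the true posterior $$P(c|x)$$, since the dropped KL term is then zero. For a categorical code with a uniform prior, the per-sample term $$\mathbb{E}_{c' \sim P(c|x)}[\log Q(c'|x)] + H(c)$$ can be checked numerically; the function below is a toy sketch, not the paper's estimator:

```python
import math

def variational_lower_bound(p_true, q_pred):
    """Per-sample bound E_{c'~P(c|x)}[log Q(c'|x)] + H(c) for a categorical
    code c with a uniform prior (toy illustration)."""
    h_c = math.log(len(p_true))  # entropy of the uniform prior over the code
    expected_log_q = sum(p * math.log(q) for p, q in zip(p_true, q_pred) if p > 0)
    return expected_log_q + h_c

# When Q equals the true posterior, the dropped KL term vanishes and the
# bound is tight; any other Q gives a smaller (looser) value.
p = [0.7, 0.2, 0.1]
print(variational_lower_bound(p, p))            # tight bound
print(variational_lower_bound(p, [1/3] * 3))    # looser value (here ≈ 0)
```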

In practice, we parametrize the auxiliary distribution $$Q$$ as a neural network. In most experiments, $$Q$$ and $$D$$ share all convolutional layers, and there is one final fully connected layer to output the parameters of the conditional distribution $$Q(c|x)$$, which means InfoGAN adds only a negligible computational cost on top of GAN.
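Schematically, the weight sharing can be sketched as follows; the functions are toy stand-ins for the real convolutional layers (names and shapes are ours), showing only that $$D$$ and $$Q$$ reuse one trunk and $$Q$$ costs a single extra head:

```python
import math

def shared_trunk(x):
    """Stand-in for the convolutional layers shared by D and Q."""
    return [2.0 * xi for xi in x]

def d_head(features):
    """Discriminator's final layer: a real/fake probability."""
    return 1.0 / (1.0 + math.exp(-sum(features)))

def q_head(features, n_categories=10):
    """Q's single extra fully connected layer: softmax parameters of Q(c|x)."""
    logits = [0.1 * sum(features) * (k - n_categories / 2) for k in range(n_categories)]
    m = max(logits)                           # stabilize the softmax
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

features = shared_trunk([0.1, -0.2, 0.3])  # one shared forward pass
real_fake = d_head(features)               # D's output
code_dist = q_head(features)               # Q(c|x) from the same features
```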