How does it work?

Encoder & Latent (Hidden) Feature $z$

For every input image $x$, we try to guess “which latent feature $z$ would generate this image $x$?” We make this guess with an encoder, which maps the input image to a latent feature $z$

However, since we are only guessing the latent feature, we don’t know the exact $z$. Hence, we express $z$ as a probability distribution rather than a single value

In summary, the encoder in a variational autoencoder outputs a mean and a variance, which together describe the distribution of $z$ (normally a Gaussian distribution)

Decoder

Step 1: Sample $z$

The decoder takes $z$ as input and outputs $x$. As the previous section explained, the encoder gives a probability distribution over $z$ rather than a single value, so to feed the decoder we sample a specific $z$ from that distribution
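As a minimal sketch of this sampling step, assume the encoder has produced a mean and a log-variance for each latent coordinate of a diagonal Gaussian. The names `sample_latent`, `mu`, and `log_var`, and the numeric values, are illustrative assumptions and do not come from the lecture. A sample is then the mean plus the standard deviation times standard normal noise:

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Draw one z ~ N(mu, diag(exp(log_var))), coordinate by coordinate."""
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)      # standard deviation from log-variance
        eps = rng.gauss(0.0, 1.0)       # standard normal noise
        z.append(m + sigma * eps)       # shift and scale the noise
    return z

rng = random.Random(0)
mu = [1.0, -2.0]          # hypothetical encoder output: mean
log_var = [0.0, -1.0]     # hypothetical encoder output: log of the variance
z = sample_latent(mu, log_var, rng)
print(z)
```

Working with the log-variance instead of the variance is a common convention because any real number maps to a valid positive variance.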

Step 2: Decode

Now we pass the sampled $z$ into the decoder network. The decoder’s output is also a distribution, which means the decoder also outputs a mean and a variance

Step 3: Get generated image

To get the final output image, we have two choices:

sample from the distribution
use the mean
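The two choices can be sketched as follows, assuming the decoder returns an independent Gaussian per pixel. The function `final_image` and the 4-pixel values are made up for illustration:

```python
import math
import random

def final_image(pixel_mean, pixel_var, rng=None):
    """Return the mean image, or a sample from N(mean, var) if rng is given."""
    if rng is None:
        return list(pixel_mean)                       # choice 2: use the mean
    return [rng.gauss(m, math.sqrt(v))                # choice 1: sample each pixel
            for m, v in zip(pixel_mean, pixel_var)]

# Hypothetical decoder output for a 4-pixel "image"
pixel_mean = [0.1, 0.5, 0.9, 0.3]
pixel_var = [0.01, 0.01, 0.01, 0.01]

mean_image = final_image(pixel_mean, pixel_var)
sampled_image = final_image(pixel_mean, pixel_var, rng=random.Random(0))
print(mean_image)
print(sampled_image)
```

Under this assumption, using the mean gives a deterministic image, while sampling adds independent noise to every pixel.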

$θ$ denotes the learnable parameters of the network

Hence, $p_{θ} (z)$ means “the probability of $z$, given learnable parameters $θ$”

How do we train the model?

Basic Idea

Given an input $x$, we want to maximize the probability that the model’s output distribution assigns to that exact input $x$

Idea 1: Integration

Mathematical Expression

$$p_{θ}(x) = \int p_{θ}(x, z) \, dz = \int p_{θ}(x ∣ z) \, p_{θ}(z) \, dz$$

We want to find the $θ$ that maximizes $p_{θ} (x)$, the probability of generating $x$ given learnable parameters $θ$

Since $z$ is a random variable rather than a fixed value, we need to marginalize over it

Problem

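However, we can’t evaluate this integral in practice: $z$ is continuous, so there are infinitely many values to integrate over, and with a neural-network decoder the integral in general has no closed form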
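To make the integral concrete, here is a sketch on a toy model chosen only because its answer is known in closed form. It is an assumption for illustration, not the lecture’s model: $p(z) = N(0, 1)$ and $p(x ∣ z) = N(z, 1)$, which gives $p(x) = N(0, 2)$. Sampling $z$ from the prior and averaging $p(x ∣ z)$ approximates the integral:

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def marginal_mc(x, num_samples, rng):
    """Estimate p(x), the integral of p(x|z) p(z) dz, by averaging p(x|z) over z ~ p(z)."""
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)            # z ~ p(z) = N(0, 1)
        total += normal_pdf(x, z, 1.0)     # p(x|z) = N(x; z, 1)
    return total / num_samples

x = 1.0
estimate = marginal_mc(x, 100_000, random.Random(0))
exact = normal_pdf(x, 0.0, 2.0)            # closed form exists for this toy model only
print(estimate, exact)
```

This works for a one-dimensional toy. With an image decoder and a high-dimensional $z$ there is no closed form to check against, and naive sampling of this kind typically needs far too many samples to be useful.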

Idea 2: Bayes’ Formula

Mathematical Expression

$$p_{θ}(x) = \frac{p_{θ}(x ∣ z) \, p(z)}{p_{θ}(z ∣ x)}$$

Problem: Unable to Compute $p_{θ} (z ∣ x)$

$p_{θ} (z ∣ x)$ means “given input image $x$, what is the probability of latent feature $z$?”

This is exactly what we want the encoder to learn. By Bayes’ formula, computing it exactly would require $p_{θ} (x)$, the very quantity we are trying to find, so we can’t evaluate $p_{θ} (z ∣ x)$ directly

Solution: Approximate $p_{θ} (z ∣ x)$ by $q_{ϕ} (z ∣ x)$

We use a new network $q_{ϕ} (z ∣ x)$ to approximate $p_{θ} (z ∣ x)$

$p (z)$ is fixed. It is our prior belief about what the distribution of $z$ should look like before we see any input data $x$. Normally we choose a Gaussian distribution with mean 0 and variance 1

Mathematical Detail: Finding a Lower Bound for $p_{θ} (x)$

What is $q_{ϕ}$ ?

$q_{ϕ}$ is the encoder, with its own learnable parameters $ϕ$. We train the encoder $q_{ϕ}$ and the decoder $p_{θ}$ jointly during training

Steps

Step 1: Change Bayes’ Formula Representation

$$
\begin{aligned}
\log p_{θ}(x) &= \log \frac{p_{θ}(x ∣ z) \, p(z)}{p_{θ}(z ∣ x)} \\
&= \log \frac{p_{θ}(x ∣ z) \, p(z) \, q_{ϕ}(z ∣ x)}{p_{θ}(z ∣ x) \, q_{ϕ}(z ∣ x)} \\
&= \log p_{θ}(x ∣ z) - \log \frac{q_{ϕ}(z ∣ x)}{p(z)} + \log \frac{q_{ϕ}(z ∣ x)}{p_{θ}(z ∣ x)}
\end{aligned}
$$

Step 2: Wrap

Since $\log p_{θ} (x)$ doesn’t depend on $z$, we can wrap it in an expectation over $z$ without changing its value:

$$E_{z}[c] = c \implies E_{z \sim q_{ϕ}(z ∣ x)}[\log p_{θ}(x)] = \log p_{θ}(x)$$

Taking the expectation of both sides under $z \sim q_{ϕ} (z ∣ x)$, Bayes’ formula can be reorganized as

$$
\begin{aligned}
\log p_{θ}(x) &= E_{z}[\log p_{θ}(x ∣ z)] - E_{z}\left[\log \frac{q_{ϕ}(z ∣ x)}{p(z)}\right] + E_{z}\left[\log \frac{q_{ϕ}(z ∣ x)}{p_{θ}(z ∣ x)}\right] \\
&= E_{z \sim q_{ϕ}(z ∣ x)}[\log p_{θ}(x ∣ z)] - D_{KL}(q_{ϕ}(z ∣ x), p(z)) + D_{KL}(q_{ϕ}(z ∣ x), p_{θ}(z ∣ x))
\end{aligned}
$$

Step 3: Observe Lower Bound

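KL divergence is always non-negative, so dropping the last term (which we can’t compute, because it contains $p_{θ} (z ∣ x)$) gives a lower bound on $\log p_{θ} (x)$

$$\log p_{θ}(x) \geq E_{z \sim q_{ϕ}(z ∣ x)}[\log p_{θ}(x ∣ z)] - D_{KL}(q_{ϕ}(z ∣ x), p(z))$$

This is known as the evidence lower bound (ELBO). Training maximizes it with respect to both $θ$ and $ϕ$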
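The bound can be checked numerically on a toy model where every term has a closed form. The model is an assumption for illustration and is not from the lecture: $p(z) = N(0, 1)$, $p(x ∣ z) = N(z, 1)$, and $q(z ∣ x) = N(m, s^2)$. For this model $p(x) = N(0, 2)$ and the true posterior is $N(x/2, 1/2)$, so the bound should become an equality exactly when $q$ matches that posterior:

```python
import math

def log_marginal(x):
    """Exact log p(x) for the toy model, where p(x) = N(0, 2)."""
    return -0.5 * math.log(2.0 * math.pi * 2.0) - x * x / 4.0

def lower_bound(x, m, s2):
    """E_q[log p(x|z)] - KL(q, p(z)) with q = N(m, s2), both in closed form."""
    recon = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    kl = 0.5 * (s2 + m * m - 1.0 - math.log(s2))
    return recon - kl

x = 1.5
print(log_marginal(x))               # exact log-likelihood
print(lower_bound(x, 0.0, 1.0))      # q equal to the prior: a loose bound
print(lower_bound(x, x / 2, 0.5))    # q equal to the true posterior: bound is tight
```

The gap between the two quantities is the dropped term, the KL divergence from $q$ to the true posterior.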

Training Process

Encoder

The corresponding term for the encoder in the lower bound formula is:

$$- D_{KL}(q_{ϕ}(z ∣ x), p(z))$$

This term tells us that, to raise the lower bound, we want $q_{ϕ} (z ∣ x)$ to stay close to $p (z)$, which drives the divergence toward zero

KL divergence is zero when the two distributions are identical and grows as they become more different

$p (z)$ is predefined, normally $N (0, 1)$
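When $q_{ϕ} (z ∣ x)$ is a diagonal Gaussian with mean $\mu$ and variance $\sigma^2$ and the prior is a standard normal, this divergence has a well-known closed form, $\frac{1}{2} \sum_j (\sigma_j^2 + \mu_j^2 - 1 - \log \sigma_j^2)$. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL from N(mu, diag(exp(log_var))) to N(0, I), summed over coordinates."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv)
               for m, lv in zip(mu, log_var))

# q identical to the prior: the divergence vanishes
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))

# q shifted and narrowed relative to the prior: the divergence is positive
print(kl_to_standard_normal([1.0, -2.0], [0.0, -1.0]))
```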

Decoder

The corresponding term for the decoder in the lower bound formula is (the encoder also influences it, through the distribution $z$ is drawn from):

$$E_{z \sim q_{ϕ}(z ∣ x)}[\log p_{θ}(x ∣ z)]$$

This term means

Sample $z$ from $q_{ϕ} (z ∣ x)$
Compute the weighted average of $\log p_{θ} (x ∣ z)$ across all possible $z$ values, where the weight of each $z$ is given by $q_{ϕ} (z ∣ x)$
This measures: “If we draw $z$ from the encoder’s distribution, how likely is the decoder to reconstruct the input image $x$?” We want to maximize this expectation
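In practice the weighted average is approximated by drawing samples and averaging. The sketch below assumes a scalar toy decoder $p(x ∣ z) = N(z, 1)$ and $q = N(m, s^2)$, both illustrative choices, so that the sample average can be compared with the exact expectation $-\frac{1}{2} \log(2\pi) - \frac{1}{2}((x - m)^2 + s^2)$:

```python
import math
import random

def log_likelihood(x, z):
    """log p(x|z) for the toy decoder p(x|z) = N(x; z, 1)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - z) ** 2

def reconstruction_term(x, m, s2, num_samples, rng):
    """Monte Carlo estimate of E_{z ~ N(m, s2)}[log p(x|z)]."""
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(m, math.sqrt(s2))    # z ~ q(z|x)
        total += log_likelihood(x, z)
    return total / num_samples

x, m, s2 = 1.5, 0.75, 0.5                  # made-up input and encoder output
estimate = reconstruction_term(x, m, s2, 50_000, random.Random(0))
exact = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
print(estimate, exact)
```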

Generating Data

Sample $z$ from the prior $p (z)$
Pass $z$ into the decoder
We should get $\hat{x}$, which resembles the training input images
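These steps can be sketched as below. `toy_decoder` is a hypothetical stand-in with made-up, untrained weights, so its output only shows the data flow and will not resemble any real image:

```python
import random

def toy_decoder(z):
    """Hypothetical stand-in for a trained decoder: returns per-pixel means."""
    weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]   # made-up, untrained values
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]

def generate(latent_dim, rng):
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]   # z ~ p(z) = N(0, I)
    return toy_decoder(z)                                   # x_hat = decoder mean

x_hat = generate(2, random.Random(0))
print(x_hat)
```

Note that the encoder is not used at generation time: $z$ comes from the prior, not from $q_{ϕ} (z ∣ x)$.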

Pros & Cons

Pros

The model’s mathematical foundation makes VAEs interpretable and theoretically well understood
The encoder $q_{ϕ} (z ∣ x)$ learns meaningful latent features, which can be used for downstream tasks

Cons

VAEs optimize a lower bound on the likelihood, not the exact likelihood
Samples tend to be blurrier and lower in quality than those from GANs

Chilfox

D-DL4CV-Lec19e-Variational_Autoencoders
