What is Vanilla RNNs?

h_{t} y_{t} = tanh (W_{hh} h_{t - 1} + W_{x h} x_{t} + b_{h}) = W_{h y} h_{t} + b_{y}

Vanilla RNNs Gradient Flow

For computation efficiency, we simplify $h_{t}$ calculation with:

h_{t} = tanh (W_{hh} h_{t - 1} + W_{x h} x_{t} + b_{h}) = tanh ((W_{hh} W_{h x}) (h_{t - 1} x_{t}) + b_{h}) = tanh (W (h_{t - 1} x_{t}) + b_{h})

Problem with Vanilla RNNs Gradient Flow

Repeated $tanh$ and $W$ in backpropagation

When doing backpropagation, we can observe from the computational graph that we’ll need to repeatedly multiply the gradient by $d tanh (x) / d x$ and $d y / d W$

Problem with $d tanh (x) / d x$

Saturated neurons “kill” the gradients

Problem with $d y / d W$

Largest singular value > 1 → Exploding Gradients Largest singular value < 1 → Vanishing Gradients

Solution

We give up Vanilla RNNs and use LSTM instead

Chilfox

目錄

D-DL4CV-Lec12c-VanillaRNNs

What is Vanilla RNNs?

Vanilla RNNs Gradient Flow

Problem with Vanilla RNNs Gradient Flow

Repeated $tanh$ and $W$ in backpropagation

Problem with $d tanh (x) / d x$

Problem with $d y / d W$

Solution

關係圖譜

反向連結

Chilfox

目錄

D-DL4CV-Lec12c-VanillaRNNs

What is Vanilla RNNs?

Vanilla RNNs Gradient Flow

Problem with Vanilla RNNs Gradient Flow

Repeated tanh and W in backpropagation

Problem with dtanh(x)/dx

Problem with dy/dW

Solution

關係圖譜

反向連結

Repeated $tanh$ and $W$ in backpropagation

Problem with $d tanh (x) / d x$

Problem with $d y / d W$