What is Vanilla RNNs?
Vanilla RNNs Gradient Flow
For computation efficiency, we simplify calculation with:
Problem with Vanilla RNNs Gradient Flow
Repeated and in backpropagation
When doing backpropagation, we can observe from the computational graph that we’ll need to repeatedly multiply the gradient by and
Problem with
Problem with
Largest singular value > 1 → Exploding Gradients Largest singular value < 1 → Vanishing Gradients
Solution
We give up Vanilla RNNs and use LSTM instead