Intuition

Instead of using the same context vector at every time step, we let the decoder decide, at each time step, which combination of encoder hidden states it thinks is most relevant to that step


Implementation

Use the previous decoder state s_{t-1} to assist in finding the encoder hidden states that are relevant right now.

For time step t, s_{t-1} memorizes all the information from the previous time steps, so we can use it as a reference to decide which encoder hidden states h_i are more important in step t

Steps

Step 1: Calculate Alignment Score

The alignment score e_{t,i} is a scalar which represents how important the encoder hidden state h_i is to the decoder at time step t

Computation of e_{t,i} relies on s_{t-1} and h_i: e_{t,i} = f_att(s_{t-1}, h_i), where f_att is typically a small neural network

Step 2: Normalize Alignment Scores to get Attention Weights

Attention weights a_{t,i} are the normalized probabilities derived from the alignment scores via a softmax; they determine how much each encoder hidden state contributes to the current context vector: a_{t,i} = exp(e_{t,i}) / Σ_j exp(e_{t,j})
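Step 2 can be sketched in a few lines of NumPy. The scores below are hypothetical stand-ins for the e_{t,i} produced by Step 1; subtracting the max before exponentiating is a standard numerical-stability trick that does not change the result.

```python
import numpy as np

# Hypothetical alignment scores e_{t,i} for 4 encoder hidden states
e = np.array([2.0, 0.5, -1.0, 0.3])

# Softmax turns scores into attention weights that sum to 1
a = np.exp(e - e.max()) / np.exp(e - e.max()).sum()

print(a.sum())   # ~1.0: the weights form a probability distribution
```

Note that the highest score gets the largest weight, but every hidden state still receives some nonzero weight.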

Step 3: Compute Context Vector

The context vector is the attention-weighted sum of the encoder hidden states: c_t = Σ_i a_{t,i} h_i

Step 4: Compute Decoder State

The new decoder state is computed from the previous output, the previous state, and the context vector: s_t = g(y_{t-1}, s_{t-1}, c_t), where g is the decoder's recurrent update
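The four steps above can be sketched end-to-end in NumPy. All shapes, the tanh-MLP form of f_att, and the concatenation-based decoder update g are illustrative assumptions, not a fixed specification.

```python
import numpy as np

rng = np.random.default_rng(0)
T_enc, H = 5, 8                      # 5 encoder time steps, hidden size 8
h = rng.normal(size=(T_enc, H))      # encoder hidden states h_i
s_prev = rng.normal(size=H)          # previous decoder state s_{t-1}
y_prev = rng.normal(size=H)          # embedding of previous output y_{t-1}

# Step 1: alignment scores e_{t,i} = f_att(s_{t-1}, h_i)
# (f_att here is a one-hidden-layer MLP with tanh, a common choice)
W_h = rng.normal(size=(H, H))
W_s = rng.normal(size=(H, H))
v = rng.normal(size=H)
e = np.tanh(h @ W_h.T + s_prev @ W_s.T) @ v   # shape (T_enc,)

# Step 2: normalize scores into attention weights with a softmax
a = np.exp(e - e.max())
a /= a.sum()

# Step 3: context vector = attention-weighted sum of encoder states
c = a @ h                                      # shape (H,)

# Step 4: new decoder state s_t = g(y_{t-1}, s_{t-1}, c_t)
W_g = rng.normal(size=(H, 3 * H))
s_t = np.tanh(W_g @ np.concatenate([y_prev, s_prev, c]))
print(s_t.shape)  # (8,)
```

Repeating these four steps at every decoder time step gives each step its own context vector, which is exactly the point of the intuition above.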

Example: English to French Translation

When the decoder emits a French word, the attention weights typically concentrate on the English word (or words) being translated at that moment


Image Captioning with RNNs and Attention

Implementation

Very similar to the basic attention implementation, except that the encoder hidden states are replaced by a spatial grid of CNN features, and attention is computed over grid positions instead of encoder time steps

Intuition

The attention weights concentrate on the spatial positions that the current output word refers to

For example, when we generate the word "bird", the weights will concentrate on the image region containing the bird
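The spatial version can be sketched by flattening the CNN feature grid into a list of "positions" and attending over them exactly as before. The 4x4 grid size, feature depth, and the simple dot-product scoring function are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
Hf, Wf, D = 4, 4, 8                    # feature-grid height, width, depth
feats = rng.normal(size=(Hf * Wf, D))  # flattened grid: 16 positions
s_prev = rng.normal(size=D)            # current decoder state

# Alignment scores: a simple dot product between the decoder state
# and each grid position's feature vector (an assumed choice of f_att)
e = feats @ s_prev

# Softmax -> attention weights over the 16 grid positions
a = np.exp(e - e.max())
a /= a.sum()

# Context vector summarizing the attended image region
c = a @ feats                          # shape (D,)

# Reshape the weights back to the grid to visualize where the
# model "looks" while generating the current word
attn_map = a.reshape(Hf, Wf)
print(attn_map.shape)  # (4, 4)
```

When generating a word like "bird", this attention map would put most of its mass on the grid cells covering the bird.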