Related Notes
Three Ways of Processing Sequences
RNN
Pros:
- Good at processing long sequences: after one RNN layer, the final hidden state depends on the entire input
Cons:
- Not parallelizable: hidden states must be computed sequentially, because each one depends on the previous one
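
To make the sequential dependency concrete, here is a minimal sketch of a vanilla RNN with a scalar hidden state. The weights w_x and w_h and the tanh update are illustrative assumptions, not values from these notes.

```python
import math

def rnn_forward(xs, w_x=0.5, w_h=0.9, h0=0.0):
    """Scalar vanilla RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1}).

    w_x, w_h and h0 are hypothetical values chosen for illustration.
    """
    h = h0
    hidden_states = []
    for x in xs:
        # Step t needs h from step t-1, so this loop cannot be
        # parallelized across time steps.
        h = math.tanh(w_x * x + w_h * h)
        hidden_states.append(h)
    return hidden_states

# Only the first input is non-zero, yet it still influences the last state.
states = rnn_forward([1.0, 0.0, 0.0, 0.0])
```

The last hidden state is a function of every earlier input, which is the "long sequences" advantage, but it only becomes available after all earlier steps have finished.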
1D Convolution
Pros:
- Highly parallelizable
Cons:
- Bad at dealing with long sequences: many layers must be stacked before an output can see the whole sequence
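
The long-sequence weakness can be quantified with a little arithmetic. The sketch below assumes stride 1 and no dilation or pooling, in which case each extra layer widens the receptive field by kernel_size - 1 positions. The function names are made up for illustration.

```python
import math

def receptive_field(num_layers, kernel_size):
    """Inputs seen by one output after stacking stride-1, undilated 1D convs."""
    return 1 + num_layers * (kernel_size - 1)

def layers_to_cover(seq_len, kernel_size):
    """Smallest number of stacked layers whose receptive field spans seq_len."""
    return math.ceil((seq_len - 1) / (kernel_size - 1))

# With kernel size 3, covering a 1024-step sequence takes hundreds of layers.
needed = layers_to_cover(1024, 3)  # 512
```

Under these assumptions the required depth grows linearly with sequence length. Strides or dilation would change the count.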
Self-Attention
Pros:
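- Good at processing long sequences: after one self-attention layer, each output sees all inputs
- Highly parallelizable: each output can be computed independently of the others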
Cons:
- Very memory intensive: the attention matrix grows quadratically with sequence length
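
A rough pure-Python sketch of single-head self-attention shows both properties at once. As a simplifying assumption it uses the inputs directly as queries, keys, and values (no learned projections), and the 4-byte float size in the memory estimate is also an assumption.

```python
import math

def self_attention(xs):
    """Single-head self-attention with Q = K = V = xs (illustrative simplification)."""
    d = len(xs[0])
    weights, outputs = [], []
    for q in xs:
        # Each row depends only on the inputs, not on other rows,
        # so all rows could be computed in parallel.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append([sum(w * v[j] for w, v in zip(row, xs)) for j in range(d)])
    return outputs, weights

def attention_matrix_bytes(seq_len, bytes_per_value=4):
    """Memory for one seq_len x seq_len attention matrix (assumes 4-byte floats)."""
    return seq_len * seq_len * bytes_per_value

outputs, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mib = attention_matrix_bytes(4096) / 2**20  # 64.0 MiB for one head
```

Every row of weights has a non-zero entry for every input, so each output sees the full sequence after one layer. The cost is the T x T weight matrix: doubling the sequence length quadruples its size.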