Attention timeline (2014–2017)

2014

Seq2Seq

Encoder squeezes the whole source into one fixed-length vector.

2014

Bahdanau attention

Learned soft alignment per decoded token.

2015

Show, Attend & Tell

Attention over CNN feature grids for image captioning.

2015

Luong attention

Dot, general & concat scoring; global vs. local windows.

2015

Listen, Attend & Spell

End-to-end attention-based speech recognition.

2015

End-to-End MemNets

Multi-hop soft attention over a memory bank.

2016

GNMT

Attention + deep LSTMs in production translation.

2017

Transformer

Drop recurrence; scaled dot-product, multi-head, positional encoding.
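
As a rough illustration of the scaled dot-product step named in the Transformer entry, here is a minimal pure-Python sketch of softmax(QK^T / sqrt(d_k)) V. The function names and the list-of-lists layout are assumptions made for this example; it leaves out masking, the learned projections used by multi-head attention, and positional encoding.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Hypothetical helper: single-head attention over plain lists.

    queries: n_q rows of length d_k
    keys:    n_k rows of length d_k
    values:  n_k rows of length d_v
    Returns (outputs, weights), one row per query.
    """
    d_k = len(keys[0])
    scale = 1.0 / math.sqrt(d_k)
    d_v = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Dot product of the query with every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Output is the attention-weighted average of the value rows.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(d_v)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights
```

Calling `scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])` puts more weight on the first key, since it matches the query, so the output leans toward the first value row. The earlier entries in the timeline differ mainly in how `scores` is computed (for example, an additive scorer in Bahdanau attention) rather than in the weighted-average step.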