What is an Attention Mechanism?
Attention Mechanism — A technique in neural networks that allows models to dynamically focus on specific parts of the input sequence.
Attention allows a model to focus on relevant parts of the input when generating each output word. When translating ‘the cat sat on the mat,’ attention helps the model focus on ‘cat’ when generating the subject in the target language. It is the key innovation that makes Transformers work.
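The core computation can be sketched in a few lines: compare a query against the keys of every input position, turn the comparison scores into weights that sum to 1, and return a weighted sum of the values. This is a minimal NumPy sketch of scaled dot-product attention with random toy vectors; the function and variable names are illustrative, not from any particular library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, keys, values):
    # Score each input position's relevance to the query,
    # scaled by sqrt(d) to keep scores in a reasonable range
    scores = query @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores)           # weights are non-negative and sum to 1
    return weights @ values, weights    # output is a weighted sum of values

# Toy example: 4 input positions, 8-dimensional representations
rng = np.random.default_rng(0)
keys = values = rng.normal(size=(4, 8))
query = rng.normal(size=(8,))
output, weights = attention(query, keys, values)
```

The weights are the "focus": a high weight on the position holding 'cat' means that position contributes most to the output.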
Frequently Asked Questions
What is self-attention?
Self-attention computes relationships between all words in the same sequence. Each word attends to every other word, capturing dependencies regardless of distance in the text.
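In self-attention, the queries, keys, and values all come from the same sequence, so the score matrix is n-by-n: every word scored against every other word. A minimal sketch, assuming random projection matrices in place of learned weights:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Each position produces its own query, key, and value vector
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n, n): every word vs. every word
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

# 5 "words" with 8-dim embeddings; projections are random stand-ins
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)   # output has the same shape as the input
```

Because the score matrix covers all pairs at once, the first and last word interact in a single step, no matter how long the sequence is.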
Why was attention a breakthrough?
Before attention, models struggled with long sequences because information degraded over distance. Attention lets models directly connect any two positions, solving the long-range dependency problem.
What is multi-head attention?
Multi-head attention runs several attention computations in parallel, each focusing on a different type of relationship. One head might track grammatical structure while another tracks semantic meaning.
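The idea can be sketched by splitting the model dimension across heads, letting each slice attend independently, and concatenating the results. This is a simplified illustration: real implementations use separate learned projection matrices per head rather than plain slicing.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(X, num_heads=2):
    # Split the feature dimension across heads (simplification:
    # slicing here stands in for learned per-head projections)
    n, d = X.shape
    d_head = d // num_heads
    heads = []
    for h in range(num_heads):
        Xh = X[:, h * d_head:(h + 1) * d_head]    # this head's slice
        scores = Xh @ Xh.T / np.sqrt(d_head)      # each head attends independently
        heads.append(softmax(scores) @ Xh)
    return np.concatenate(heads, axis=-1)         # recombine heads: (n, d)

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 8))
out = multi_head_self_attention(X, num_heads=2)
```

Each head sees only its own slice of the representation, which is what lets different heads specialize in different relationship types.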