This is the fourth and final video on attention mechanisms. In the previous video we introduced multi-headed keys, queries and values, and in this video we introduce the final pieces you need to get to a transformer.
While making these videos, we found these sources very useful to have around, not only because they help with conceptual understanding but also because some of them offer code examples.
Try to answer the following questions to test your knowledge.
- What is the purpose of the positional encoding in the transformer architecture?
- Why are transformers easier to parallelize than recurrent neural networks?