Positional Embeddings

Published: 2/25/2022
Categories: machinelearning, transformer, neuralnetwork, deeplearning
Author: rohitgupta24

Positional embeddings always seemed like a strange concept to me, so this post is all about explaining them in plain English.
We hear and read this term ("positional embeddings") wherever the Transformer neural network comes up, and now that Transformers are everywhere, from natural language processing to image classification (after ViT), it has become even more important to understand them.

What are Positional Embeddings or Positional Encodings?
Let's take an example: consider the input "King and Queen". Now if we change the order of the input to "Queen and King", the meaning of the input might change. The same happens if the input is a sequence of 16×16 image patches (as in ViT): if the order of the patches changes, everything changes.
[Figure: input in the right sequence vs. input in a distorted sequence]

Also, the Transformer doesn't process the input sequentially; the input is processed in parallel.
For each element, the model combines information from the other elements through self-attention, but each element does this aggregation on its own, independently of what the other elements do.
The Transformer doesn't model the order of the input anywhere explicitly. This is where positional embeddings come into the picture: they act as hints, telling the model about the order of the inputs.

These embeddings are added to the initial vector representations of the input.
Also, every position has the same identifier irrespective of what exactly the input is.
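To make that concrete, here is a minimal NumPy sketch (my own illustration, not code from any particular library) of a learned positional-embedding table being added to token embeddings. The sizes, the token ids, and the names token_emb / pos_emb are made up for this example.

```python
import numpy as np

# Hypothetical sizes and ids, chosen only for illustration.
vocab_size, max_len, d_model = 1000, 16, 8

rng = np.random.default_rng(0)
token_emb = rng.normal(size=(vocab_size, d_model))  # one vector per token id
pos_emb = rng.normal(size=(max_len, d_model))       # one vector per position

def embed(token_ids):
    """Look up each token's vector and add the vector for the slot it sits in."""
    positions = np.arange(len(token_ids))
    return token_emb[token_ids] + pos_emb[positions]

# "King and Queen" vs "Queen and King": the same token vectors appear in both,
# but positions 0, 1, 2 always receive pos_emb[0], pos_emb[1], pos_emb[2],
# so the two summed representations differ.
king_and_queen = embed(np.array([5, 7, 9]))
queen_and_king = embed(np.array([9, 7, 5]))
```

Note how pos_emb[i] is reused at position i no matter which token lands there, which is exactly the "same identifier for the same position" idea above.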

There is no notion of word order (1st word, 2nd word, ...) in the original architecture. All words of the input sequence are fed to the network with no special order or position; in contrast, in an RNN architecture the n-th word is fed at step n, and in a ConvNet it is fed to specific input indices. The model therefore has no idea how the words are ordered. Consequently, a position-dependent signal is added to each word embedding to help the model incorporate the order of words. Based on experiments, this addition not only avoids destroying the embedding information but also adds the vital position information.

The specific choice of the (sin, cos) pair helps the model learn patterns that rely on relative positions.
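For reference, the original "Attention Is All You Need" paper defines PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). Below is a rough NumPy sketch of that formula; the function name and the chosen shapes are just illustrative.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(max_len)[:, None]      # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]     # even dimension indices 0, 2, 4, ...
    angles = positions / np.power(10000.0, dims / d_model)

    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # sin on the even indices
    pe[:, 1::2] = np.cos(angles)   # cos on the odd indices
    return pe

# These vectors are simply added to the input embeddings, like the learned
# table in the earlier sketch, but they need no training.
pe = sinusoidal_positional_encoding(max_len=50, d_model=128)
```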

Further reading: Jay Alammar's article explains the paper with excellent visualizations.
The positional-encoding example there calculates PE(·) the same way, with the only difference that it puts sin in the first half of the embedding dimensions (as opposed to the even indices) and cos in the second half (as opposed to the odd indices). This difference does not matter, since vector operations are invariant to a permutation of the dimensions.
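As a small illustration of why this doesn't matter, the sketch below (again my own, not Jay Alammar's code) builds both layouts from the same sin/cos values: one interleaves them on even/odd indices as in the paper, the other concatenates all sines in the first half and all cosines in the second half. Position by position they contain exactly the same numbers, only permuted.

```python
import numpy as np

def both_layouts(max_len, d_model):
    """Build the same sinusoidal values in two dimension orderings."""
    positions = np.arange(max_len)[:, None]
    dims = np.arange(0, d_model, 2)[None, :]
    angles = positions / np.power(10000.0, dims / d_model)

    interleaved = np.zeros((max_len, d_model))
    interleaved[:, 0::2] = np.sin(angles)   # paper: sin on even indices
    interleaved[:, 1::2] = np.cos(angles)   # paper: cos on odd indices

    # Alternative: all sines in the first half, all cosines in the second half.
    concatenated = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
    return interleaved, concatenated

interleaved, concatenated = both_layouts(max_len=10, d_model=16)
# Same values at every position, just a permutation of the dimensions.
assert sorted(interleaved[3].tolist()) == sorted(concatenated[3].tolist())
```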

This article is inspired by this YouTube video from AI Coffee Break with Letitia.

That's all folks.

If you have any doubts, ask me in the comments section and I'll try to answer as soon as possible.
If you love the article, follow me on Twitter: https://twitter.com/guptarohit_kota
If you are the LinkedIn type, let's connect: www.linkedin.com/in/rohitgupta24

Have an awesome day ahead 😀!
