dev-resources.site

🧠Generative AI - 2

Published at
12/24/2024
Categories
genai
gpt3
gemini
learning
Author
Abheeshta P

Transformer Architecture in Generative AI 🤖

The transformer architecture is the foundation of many generative AI models and of language models like GPT and BERT. In its original form, it consists of two main components: the encoder 📂 and the decoder.

[Figure: Basic transformer architecture]

Key Components:

1. Encoder 🔄:

  • The encoder processes input data and generates context-rich representations.
  • It consists of:
    • Self-Attention Mechanism 🧐: Allows the encoder to evaluate relationships between different parts of the input. Each token can attend to every other token, capturing dependencies regardless of distance.
    • Feed Forward Layer ➡️: Applies the same fully connected transformation to each token's attended representation and passes the result to the next encoder layer.

2. Decoder 🔄:

  • The decoder generates outputs by attending to both encoder outputs and previously generated tokens.
  • It consists of:
    • Self-Attention Mechanism 🧐: The decoder looks only at the tokens it has already generated to predict the next one; a mask hides all later positions. During training, the decoder is given the target sequence shifted right by one position, so it cannot simply copy the token it is supposed to predict. At inference time, it generates one token at a time and feeds each new token back in as input.
    • Encoder-Decoder Attention 📈: Aligns decoder outputs with encoded representations to refine predictions.
    • Feed Forward Layer ➡️: Further processes the data and forwards it to the next decoder layer.
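
The "shifted by one position" idea and the rule that the decoder only sees earlier tokens can be sketched in a few lines of NumPy. This is a minimal illustration, not the code of any particular model: the token ids, the start-of-sequence id `0`, and the helper names `shift_right` and `causal_mask` are all assumptions made for the example.

```python
import numpy as np

def shift_right(target_ids, start_id):
    """Build decoder inputs: prepend a start token and drop the last target token."""
    return [start_id] + list(target_ids[:-1])

def causal_mask(seq_len):
    """Lower-triangular boolean mask: position i may attend to positions 0..i only."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Hypothetical token ids for a 4-token target sentence; 0 is an assumed start id.
target = [11, 12, 13, 14]
decoder_input = shift_right(target, start_id=0)
mask = causal_mask(len(decoder_input))

print(decoder_input)     # [0, 11, 12, 13]
print(mask.astype(int))  # row i has ones in columns 0..i and zeros afterwards
```

At step i the decoder receives `decoder_input[i]` and is trained to predict `target[i]`, so the token it must predict is never part of what it can see.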

[Figure: Encoder and decoder parts]

Important Concepts:

1. Self-Attention 🧐:

  • A key mechanism where each input token attends to all other tokens in the sequence.
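  • Attention scores are computed as scaled dot products between query and key vectors (learned projections of the token embeddings) and then normalized with softmax.
  • Challenge: Self-attention by itself ignores token order, so the token's original position has to be supplied separately.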

2. Feed Forwarding ➡️:

  • After attention, the data is passed through a fully connected layer for further processing.
  • In encoders, this forwards data to the next encoder layer.
  • In decoders, it contributes to generating the final output.

3. Encoder-Decoder Attention 📈:

  • A layer in the decoder that allows it to attend to the encoder's output.
  • This helps the decoder extract insights from the encoded input for better output generation.
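
The three concepts above can be sketched together in NumPy. This is a minimal sketch under stated assumptions: the weights are random stand-ins for learned parameters, the sizes (`d_model = 8`, `d_ff = 16`, 5 source tokens, 3 target tokens) are arbitrary, and the same projection matrices are reused for self-attention and encoder-decoder attention only to keep the example short (real models learn separate ones). Multiple heads, residual connections, and layer normalization are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention; returns the outputs and the attention weights."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)  # one score per (query, key) pair
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ values, weights

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise feed-forward layer: linear, ReLU, linear, applied to every token."""
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16                           # assumed sizes for illustration
encoder_states = rng.normal(size=(5, d_model))  # 5 source tokens
decoder_states = rng.normal(size=(3, d_model))  # 3 target tokens

# Random stand-ins for the learned query/key/value projections.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

# Self-attention: queries, keys and values all come from the same sequence.
self_out, self_weights = attention(
    encoder_states @ w_q, encoder_states @ w_k, encoder_states @ w_v
)

# Encoder-decoder attention: queries come from the decoder,
# keys and values come from the encoder output.
cross_out, cross_weights = attention(
    decoder_states @ w_q, encoder_states @ w_k, encoder_states @ w_v
)

w1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
ffn_out = feed_forward(self_out, w1, b1, w2, b2)

print(self_weights.shape, cross_weights.shape, ffn_out.shape)  # (5, 5) (3, 5) (5, 8)
```

The only difference between the two attention calls is where the queries come from, which is why the encoder-decoder weights have one row per target token and one column per source token.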

Positional Encoding 📊:

  • To address the issue of lost positional information in self-attention, transformers use positional encoding.
  • Positional encodings are added to input embeddings, providing context about token positions.
  • This lets the model tell word orders apart (for example, "dog bites man" versus "man bites dog"), which it needs in order to produce coherent output.
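
One common scheme is the fixed sinusoidal encoding used in the original Transformer; many models learn their position embeddings instead. The sketch below assumes that sinusoidal variant, an even `d_model`, and dummy embeddings of all ones, all chosen only for illustration.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix of sine/cosine position encodings.

    Assumes d_model is even: even columns hold sines, odd columns hold cosines.
    """
    positions = np.arange(seq_len)[:, None]   # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]  # shape (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Dummy embeddings: the same vector at every position, so any difference
# after the addition comes purely from the positional encoding.
embeddings = np.ones((4, 8))
pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
encoded = embeddings + pe

print(pe.shape)                               # (4, 8)
print(np.allclose(encoded[0], encoded[1]))    # False: positions are now distinguishable
```

Because the encoding is simply added to the embeddings, identical tokens at different positions end up with different input vectors, which is the information self-attention is otherwise missing.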

[Figure: Detailed architecture 1]

Do You Need Both Encoder and Decoder? 🤔

No, not always!

  • Encoder-Only Architecture:

    • Used when you don't need to generate new data but instead analyze or classify input.
    • Examples: Sentiment analysis (like BERT), image classification (like the Vision Transformer).
  • Decoder-Only Architecture:

    • Used primarily for generative tasks where new data needs to be created.
    • Examples: Chatbots, text generation (like GPT and Gemini).
  • Both Encoder and Decoder:

    • Required when the task involves transforming an input sequence into a different output sequence, such as translating between languages.
    • Examples: Machine translation (like T5 and the original Transformer model).

[Figure: Detailed architecture 2]

Summary 📊:

The transformer architecture's ability to capture long-range dependencies, align encoder and decoder outputs, and maintain positional context is what makes it powerful for generative AI tasks. These mechanisms together allow models to generate human-like text, translate languages, and perform various NLP tasks with high accuracy.

📝 Stay tuned on this learning journey to learn about GenAI training! I'd love to discuss this topic further. Special thanks to Guvi for the course!