
The Technology behind GPT that defined today’s world

Published: 1/14/2025
Categories: gpt3, nlp, deeplearning, machinelearning
Author: kushalharsora

What if we were able to teach machines how to think? Or better, what if they could dream? A few years back this notion would have been considered an absurd idea. But here we are looking at the idea taking shape in our reality. We live in a world of miracles where technology changes everyday lives. And now in an era of artificial intelligence, generative pre-trained transformers are a stepping stone, a foundation that paves the way for such upcoming advancements.

So what is a generative pre-trained transformer, for those of us who are not geeky nerds? The short answer is that generative pre-trained transformers, aka GPTs, are machines capable of understanding and generating human-like language. Simply put, it means teaching machines to understand and respond like humans.

The technology behind the generative pre-trained transformer is a phenomenal work of art. Imagine teaching a machine that only understands 0s and 1s what you are saying, or better, a machine that generates responses based on your questions. It sounds like something straight out of a science-fiction movie. But as fascinating as the technology sounds, the engineering behind it is even more fascinating.

Generative pre-trained models are divided into two main parts: the attention block and the multi-layer perceptron. We will discuss each in detail, but first we need to understand some fundamental terms.

Generative pre-trained transformer structure

Natural Language Processing is the part that prepares our human language in machine-readable terms. The general overview is that sentences are broken into small chunks called tokens, which retain the maximum information while discarding the unnecessary parts. These tokens are then converted into embeddings and passed on to the transformer. Embeddings are essentially a matrix representation, with a vector of numbers for each token. The goal of using embeddings is to retain contextual information and also resolve any ambiguity that may arise.
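The token-to-embedding step above can be sketched in a few lines. This is a toy illustration only: the whitespace tokeniser, the three-word vocabulary, and the random embedding matrix are all my own assumptions (real GPT models use learned subword tokenisers such as BPE and learned embedding matrices).

```python
# Toy sketch: tokenise a sentence, then look up one embedding vector per token.
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2}           # assumed toy vocabulary
embed_dim = 4
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), embed_dim))  # stand-in for learned embeddings

def tokenize(sentence):
    # split on whitespace and map each word to its vocabulary id
    return [vocab[w] for w in sentence.lower().split()]

tokens = tokenize("the cat sat")
embeddings = embedding_matrix[tokens]            # one row of numbers per token
print(embeddings.shape)                          # (3, 4): 3 tokens, 4 dimensions each
```

The resulting matrix, one row per token, is exactly the "matrix representation" the paragraph describes, and it is what gets handed to the transformer.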

Ambiguity in word

Once the transformer receives the input embeddings, they are sent to the attention block, also called the self-attention block, where the model weighs the importance of each word in the sentence given the other words. This helps the model understand the context of the input query, which in turn helps it generate the next word.

The way this is done is that each token's query vector is compared with every token's key vector via a dot product, and the scores are then scaled and normalised so that the resulting attention weights stay in a fixed range. These weights are used to combine the value vectors, and the resulting matrix is passed on to the multi-layer perceptron block for further evaluation.
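The dot-product, scale, and normalise steps above can be sketched as scaled dot-product self-attention. The random weight matrices `W_q`, `W_k`, `W_v` are assumptions for illustration; in a real transformer they are learned during training.

```python
# Toy sketch of scaled dot-product self-attention over 3 tokens.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model = 3, 4
X = rng.normal(size=(seq_len, d_model))          # token embeddings, one row per token

W_q = rng.normal(size=(d_model, d_model))        # assumed random, normally learned
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_model)              # dot products, scaled to a safe range
weights = softmax(scores)                        # each row sums to 1: importance of each token
output = weights @ V                             # context-aware representation per token
print(output.shape)                              # (3, 4)
```

Note how each row of `weights` sums to 1, which is the "normalised" part of the description: every token ends up with a budget of attention it spreads across the sentence.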

Neural Network

The multi-layer perceptron consists of a fully connected neural network with one or more hidden layers. The input matrix is multiplied by the weights of the perceptron in each layer and finally, using an activation function, the output is obtained. The goal of using a multi-layer perceptron is to introduce non-linearity into our model. Why non-linearity? Because in real-world scenarios, we won’t always get data that follows a linear pattern but rather a non-linear one. This block also provides feature extraction by transforming the input matrix into higher dimensions, which helps the model learn new kinds of patterns that are not available in lower dimensions.
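A minimal sketch of that feed-forward block, under the same toy assumptions as before (random weights, a 4-dimensional model size, and a 16-dimensional hidden layer standing in for the "higher dimensions" the paragraph mentions):

```python
# Toy sketch of the multi-layer perceptron block: expand, apply non-linearity, project back.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden = 4, 16                        # expand 4 -> 16 -> 4

W1 = rng.normal(size=(d_model, d_hidden))        # assumed random, normally learned
b1 = np.zeros(d_hidden)
W2 = rng.normal(size=(d_hidden, d_model))
b2 = np.zeros(d_model)

def mlp(x):
    h = np.maximum(0, x @ W1 + b1)               # ReLU activation introduces non-linearity
    return h @ W2 + b2                           # project back down to the model dimension

x = rng.normal(size=(3, d_model))                # 3 tokens coming from the attention block
out = mlp(x)
print(out.shape)                                 # (3, 4): same shape in, same shape out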

Now, these attention and multi-layer perceptron blocks are repeated over and over again and finally, we get the output embeddings of the transformer. The output is a probability distribution over all the words that might come next in the given input query, and the most likely word is chosen as the next word.

This whole input query, together with the new word, is sent back through the transformer to generate the word after that. This process is repeated over and over again till the desired output is obtained. Thus this type of transformer is called a generative transformer, as it quite literally generates one word per iteration.
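The generate-append-repeat loop above can be sketched as follows. The `next_token_distribution` function here is a hypothetical stand-in for a full transformer forward pass; in reality that step is the attention and perceptron stack described earlier, and everything about it below (the toy vocabulary, the fake logits) is assumed for illustration.

```python
# Toy sketch of autoregressive generation: predict, append, repeat.
import numpy as np

vocab = ["<end>", "the", "cat", "sat", "down"]   # assumed toy vocabulary

def next_token_distribution(tokens):
    # stand-in for a transformer forward pass: fake logits turned into probabilities
    rng = np.random.default_rng(len(tokens))
    logits = rng.normal(size=len(vocab))
    e = np.exp(logits - logits.max())
    return e / e.sum()                           # softmax: a probability per vocabulary word

tokens = [1]                                     # start the sequence with "the"
for _ in range(10):                              # cap the number of generated words
    probs = next_token_distribution(tokens)
    next_token = int(np.argmax(probs))           # greedy choice: pick the most likely word
    if next_token == 0:                          # stop if the model emits <end>
        break
    tokens.append(next_token)                    # feed the extended sequence back in

print(" ".join(vocab[t] for t in tokens))
```

Real systems often sample from the distribution instead of always taking the argmax, which is why the same prompt can produce different outputs from run to run.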

Now, the word pre-trained is attached to the transformer because the model is first trained on a large set of data to tune its weights before it is used for general purposes. Such models are also called Large Language Models, aka LLMs, as they are trained on vast datasets. But that’s a topic for another time.

Now, the future of generative pre-trained transformers looks quite convincing, as we see exponential growth in successful implementations by companies like OpenAI, Google and Meta, just to name a few. OpenAI's introduction of the GPT-3 model took the world by storm. GPT-3 proved how these generative models can impact our day-to-day lives. We saw it write poetry, code and articles, help us with our academic assignments, and much more. The future that comes with generative pre-trained transformers is quite promising, but at the same time quite frightening.

Just like a coin has two sides, GPT also has both good and bad sides. And to judge it on either of its sides is a crime in itself.

As we stand on the brink of a new era in AI, it’s up to us to shape the future of GPT technology for the benefit of all.

You can also follow me on Medium for more such blogs.

Thank you for reading. I hope you have a great day!
