
The Technology behind GPT that defined today’s world

Published at: 1/14/2025
Categories: gpt3, nlp, deeplearning, machinelearning
Author: kushalharsora

What if we were able to teach machines how to think? Or better, what if they could dream? A few years back this notion would have been considered an absurd idea. But here we are, watching the idea take shape in reality. We live in a world of miracles where technology changes everyday life. And now, in the era of artificial intelligence, generative pre-trained transformers are a stepping stone, a foundation that paves the way for the advancements to come.

So what is a generative pre-trained transformer, if you are not a geeky nerd? The short answer is that generative pre-trained transformers, aka GPTs, are machines capable of understanding and generating human-like language. Simply put, it means teaching machines to understand and behave like humans.

The technology behind the generative pre-trained transformer is a phenomenal work of art. Imagine teaching a machine that only understands 0s and 1s what you are saying, or better, a machine that generates responses to your questions. It sounds like something straight out of a science-fiction movie. But as fascinating as the technology sounds, the work behind its development is even more fascinating.

Generative pre-trained models are divided into two parts: the attention block and the multilayer perceptron. We will discuss each in detail, but first we need to understand some fundamental terms.

Generative pre-trained transformer structure

Natural language processing is the step of preparing our human language in machine-readable form. The general idea is that sentences are broken into small chunks called tokens, which retain the maximum information while discarding the unnecessary parts. These tokens are then converted into embeddings and passed on to the transformer. Embeddings are essentially a numerical representation: a vector of numbers for each token, which together form a matrix for the whole input. The goal of using embeddings is to retain contextual information and also resolve any ambiguity that may arise.
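To make the tokens-and-embeddings idea concrete, here is a minimal sketch in Python. The tiny vocabulary, the random embedding matrix and the split-on-spaces tokenizer are all made-up stand-ins for what a real GPT pipeline (with a learned tokenizer and learned embeddings) would use:

```python
import numpy as np

# Hypothetical tiny vocabulary and embedding size, purely for illustration.
vocab = {"the": 0, "bank": 1, "of": 2, "river": 3, "money": 4}
embedding_dim = 8

# One embedding vector (a row of numbers) per token in the vocabulary.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))

def tokenize(sentence):
    # Break the sentence into small chunks; real tokenizers are far smarter.
    return [vocab[word] for word in sentence.lower().split()]

def embed(token_ids):
    # Convert token ids into their embedding vectors.
    return embedding_matrix[token_ids]

token_ids = tokenize("the bank of the river")
embeddings = embed(token_ids)
print(embeddings.shape)  # (5, 8): one vector per token, a matrix for the sentence
```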

Ambiguity in a word

Once the transformer receives the input embeddings, they are sent to the attention block, also called the self-attention block, where the model weighs the importance of each word in the sentence given the other words. This helps the model understand the context of the input query, which in turn helps it generate the next word.

The way this is done is essentially that the query vectors derived from the input token embeddings are multiplied with the key vectors via a dot product, then scaled and normalised so that the resulting attention weights stay in a certain range. These weights are used to combine the value vectors, and the resulting matrix is then passed on to the multi-layer perceptron block for further evaluation.
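Here is a minimal sketch of that scaled dot-product self-attention step in NumPy. The single attention head, the matrix sizes and the random weight matrices are assumptions made purely for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Normalise scores so each row sums to 1 and stays in a fixed range.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (tokens, d_model) token embeddings from the previous step.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project into query / key / value
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # dot product, then scale
    weights = softmax(scores)                  # normalise into attention weights
    return weights @ V                         # weighted mix of the value vectors

rng = np.random.default_rng(0)
d_model, d_head, n_tokens = 8, 4, 5            # made-up sizes
X = rng.normal(size=(n_tokens, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4): one context-aware vector per token
```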

Neural Network

The multi-layer perceptron consists of a fully connected neural network with one or more hidden layers. The input matrix is multiplied by the weights of the perceptron in each layer, and finally, using an activation function, the output is obtained. The goal of using a multi-layer perceptron is to introduce non-linearity into our model. Why non-linearity? Because in real-world scenarios we rarely get data that follows a linear pattern; it is usually non-linear. This block also provides feature extraction by transforming the input matrix into higher dimensions, which helps the model learn kinds of patterns that are not visible in lower dimensions.
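A minimal sketch of such an MLP block, assuming a single hidden layer roughly four times wider than the model dimension (a common GPT-style choice) and random, untrained weights:

```python
import numpy as np

def gelu(x):
    # A common smooth non-linearity used in transformer MLP blocks (tanh approximation).
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mlp_block(X, W1, b1, W2, b2):
    # Project up to a higher dimension, apply the non-linearity, project back down.
    hidden = gelu(X @ W1 + b1)   # feature extraction in the wider hidden layer
    return hidden @ W2 + b2      # back to the model dimension

rng = np.random.default_rng(0)
d_model, d_hidden, n_tokens = 8, 32, 5    # hidden layer ~4x wider, as in GPT-style models
X = rng.normal(size=(n_tokens, d_model))
W1, b1 = rng.normal(size=(d_model, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_hidden, d_model)), np.zeros(d_model)
print(mlp_block(X, W1, b1, W2, b2).shape)  # (5, 8)
```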

Now, these attention-block and multi-layer-perceptron steps are repeated over and over again, and finally we get the output embeddings of the transformer. The output is a probability distribution over all the words that might come next in the given input query, and the most likely word is chosen as the next word.

This whole input query, with the new word appended, is sent back to the transformer to generate the next word. The process is repeated until the desired output is obtained. This is why it is called a generative transformer: it quite literally generates the output one word per iteration.
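Below is a minimal sketch of that generation loop. The tiny vocabulary and the stand-in next_word_distribution function (which returns random probabilities instead of running a real transformer) are hypothetical, meant only to show how the output is fed back in:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "bank", "of", "river", "money", "<end>"]

def next_word_distribution(tokens):
    # Stand-in for the full transformer: a real model would condition on `tokens`
    # and return a probability distribution over the vocabulary for the next word.
    logits = rng.normal(size=len(vocab))
    e = np.exp(logits - logits.max())
    return e / e.sum()

tokens = ["the", "bank"]
for _ in range(5):                                # generate up to 5 more words
    probs = next_word_distribution(tokens)
    next_word = vocab[int(np.argmax(probs))]      # pick the most likely word
    if next_word == "<end>":
        break
    tokens.append(next_word)                      # feed the extended query back in
print(" ".join(tokens))
```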

Now, the word pre-trained is attached to the transformer because the model is first trained on a large set of data to tune its weights before it is used for general purposes. Such models are also called Large Language Models, aka LLMs, as they are trained on vast datasets. That's a topic for another time.

Now, the future of generative pre-trained transformers looks convincing, as we see exponential growth in successful implementations by companies like OpenAI, Google and Meta, just to name a few. OpenAI's introduction of the GPT-3 model took the world by storm. GPT-3 proved how these generative models can impact our day-to-day lives: we saw it write poetry, code and articles, help us with academic assignments, and much more. The future that comes with generative pre-trained transformers is quite promising, but at the same time quite frightening.

Just like a coin has two sides, GPT has both a good and a bad side, and to judge it by either side alone would be a mistake.

As we stand on the brink of a new era in AI, it’s up to us to shape the future of GPT technology for the benefit of all.

You can also follow me on Medium for more such blogs.

Thank you for reading. I hope you have a great day!
