dev-resources.site

Testing LLM Speed Across Cloud Providers: Groq, Cerebras, AWS & More

Published: 12/8/2024
Categories: llm, genai, benchmarks, bedrock
Author: fr4ncis

After my previous exploration of local vs cloud GPU performance for LLMs, I wanted to dive deeper into comparing inference speeds across different cloud API providers. With all the buzz around Groq and Cerebras's blazing-fast inference claims, I was curious to see how they stack up in real-world usage.

The Testing Framework

I developed a simple Node.js-based framework to benchmark different LLM providers consistently. The framework:

  • Runs a series of standardised prompts across different providers
  • Measures inference time and response generation
  • Writes results to structured output files
  • Supports multiple providers including OpenAI, Anthropic, AWS Bedrock, Groq, and Cerebras

The test prompts were designed to cover different scenarios:

  • Mathematical computations (typically challenging for LLMs)
  • Long-form text summarisation (high input tokens, lower output)
  • Structured output generation (JSON, XML, CSV formats)
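An illustrative prompt set mirroring the three categories above might look like this (the actual prompts in the repository may differ; these examples are invented):

```javascript
// One standardised prompt per scenario, tagged for the results output.
const prompts = [
  { id: "math", text: "Compute the 20th Fibonacci number, showing your steps." },
  { id: "summarise", text: "Summarise the following article in three sentences: <article text>" },
  { id: "structured", text: "Return the capitals of France, Japan and Brazil as a JSON object." },
];

module.exports = prompts;
```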

Test Results

The complete benchmark results are available in this spreadsheet. While the GitHub repository contains the output from each LLM, we'll focus purely on performance metrics here.
[Figure: Benchmark results]

One of the most interesting findings was the significant speed variation for identical models across different providers. This suggests that infrastructure and optimization play a crucial role in inference speed.
[Figure: Llama 3.2 3B results]

The most dramatic differences emerged when testing larger models like Llama 70B. Providers optimized for fast inference showed remarkable capabilities, demonstrating that even models with 70B parameters can achieve impressive speeds with the right infrastructure.
[Figure: Llama 70B results]

Groq's performance across different model sizes reveals an intriguing pattern: whether running small or large models, inference speeds remain remarkably consistent, suggesting their infrastructure is specifically optimised to keep larger models fast.
[Figure: Groq running different models]

Key Findings

  • Groq and Cerebras: The hype is real. Both providers demonstrated exceptional performance, particularly with larger models like Llama 3 70B
  • Ollama: With a decent GPU (e.g., RTX 4090), smaller models (Llama 3.2 1B/3B) were comparable in speed to the quickest API-hosted models, such as Anthropic's Claude Haiku 3 and Amazon's Nova Micro
  • Speed rankings were fairly consistent across different prompts (math, summarisation, structured output)
  • API throttling became an issue with larger models on AWS Bedrock (Claude Sonnet 3.5, Opus 3, Nova Pro)
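The throttling issue in the last finding is typically worked around with retries and exponential backoff. A generic sketch, assuming the benchmark wraps each provider call (the function name and defaults here are illustrative, not from the author's framework):

```javascript
// Retry an async call with exponential backoff, the usual workaround
// when a provider (e.g. Bedrock) rejects requests with a throttling error.
async function withRetry(fn, { retries = 5, baseMs = 500 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === retries) throw err; // out of attempts: surface the error
      const delayMs = baseMs * 2 ** attempt; // 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Note that retry delays must be excluded from the measured inference time, or throttled providers would look artificially slow.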