Google Axion: A New Leader in ARM Server Performance

Published at: 12/8/2024
Categories: googlecloud, arm, aws, axion
Author: dkechag

Although our current cloud deployment at SpareRoom is x86, I’ve had the opportunity to test Google’s first in-house ARM server CPU "Axion" for a few months before its recent public release. Without giving too much away upfront - let's just say I was not left unimpressed. I’ll let the numbers, presented as charts, show how Axion compares in both performance and value with the best offerings in Google and Amazon clouds.

The Contenders

Here are the contenders for this comparison, the best/most relevant drawn from my recent Cloud VM Comparison test, with prices updated for the 3rd week of November 2024:

| Instance Type | CPU type | HT / SMT | Price* $/Month | 1Y Res.* $/Month | 3Y Res.* $/Month | Spot* $/Month |
|---|---|---|---|---|---|---|
| Amazon C7a | AMD EPYC Genoa | - | 77.36 | 51.97 | 35.39 | 26.20 |
| Google c4a | Google Axion | - | 57.69 | 38.89 | 27.29 | 24.50 |
| Amazon C8g | AWS Graviton4 | - | 66.45 | 44.60 | 28.75 | 10.80 |
| Google c4 | Intel Emerald Rapids | Y | 64.49 | 41.52 | 30.34 | 27.23 |
| Google t2d | AMD EPYC Milan | - | 64.68 | 41.86 | 30.76 | 11.77 |
| Google c3d | AMD EPYC Genoa | Y | 57.12 | 36.87 | 27.02 | 24.28 |

* Monthly price for a 2x vCPU / 4GB RAM / 30GB disk instance, except t2d, which has 8GB RAM. For c3d, 4x vCPU is the minimum, so the price is extrapolated to 2x vCPU.
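
To put the reserved and spot columns in perspective, the saving relative to the on-demand price can be worked out directly from the table. The helper below is a minimal sketch; the function name `discount_pct` is my own and not part of any tool used in this comparison.

```shell
# Percentage saved relative to the on-demand price.
# Usage: discount_pct ON_DEMAND_PRICE DISCOUNTED_PRICE
discount_pct() {
    awk -v full="$1" -v disc="$2" 'BEGIN { printf "%.1f\n", (1 - disc / full) * 100 }'
}

# Google c4a (Axion), monthly prices taken from the table above
discount_pct 57.69 38.89   # 1-year reservation, prints 32.6
discount_pct 57.69 27.29   # 3-year reservation, prints 52.7
```

The same call works for any row, for example `discount_pct 66.45 10.80` for the Amazon C8g spot price.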

As we saw in that comparison, Amazon had the fastest ARM server VMs, featuring the Graviton4, so I am including their new compute-optimized type, C8g. On the x86 front, I selected their C7a, featuring non-SMT AMD Genoa, which remains the fastest x86 VM in terms of per-vCPU performance.
From Google, I compared against three x86 types: the SMT/HT-enabled AMD Genoa (c3d) and Intel Emerald Rapids (c4), and the older AMD Milan (t2d). The t2d, while older, remains competitive in per-vCPU metrics due to its lack of SMT.

Test setup

I used the same methodology detailed in my recent cloud comparison test, with one addition: a real-world FFmpeg video compression benchmark.

Here’s how I set it up:

# For ARM instances - replace 'arm64' with 'amd64' for x86:

> wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-arm64-static.tar.xz
> tar -xJf ffmpeg-release-arm64-static.tar.xz --wildcards --no-anchored 'ffmpeg' -O > /usr/bin/ffmpeg
> chmod +x /usr/bin/ffmpeg
> wget https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_h264.mov
> time ffmpeg -i big_buck_bunny_720p_h264.mov -c:v libx264 -threads 1 out264a.mp4
> time ffmpeg -i big_buck_bunny_720p_h264.mov -c:v libx264 -threads 2 out264b.mp4
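
If you run the same steps on a mix of ARM and x86 instances, the 'arm64' / 'amd64' swap can be derived from `uname -m` instead of edited by hand. This is only a sketch: the function names `ffmpeg_arch` and `ffmpeg_url` are hypothetical helpers of mine, and the URL pattern is simply the one used in the commands above.

```shell
# Map the kernel's machine name to the architecture label used in the
# static build file names ('arm64' or 'amd64').
ffmpeg_arch() {
    case "$1" in
        aarch64|arm64) echo arm64 ;;
        x86_64|amd64)  echo amd64 ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Build the download URL for a machine name (defaults to the current host).
ffmpeg_url() {
    arch=$(ffmpeg_arch "${1:-$(uname -m)}") || return 1
    echo "https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-${arch}-static.tar.xz"
}

# Show the URL an ARM instance would use
ffmpeg_url aarch64
```

On an instance you would then call `wget "$(ffmpeg_url)"` and continue with the tar and ffmpeg steps as before.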

Performance Results

Benchmark suite results

DKbench is probably the most telling benchmark for the general workloads we run on our servers, and Geekbench 5 is included because of the wide availability of comparison results:

[Chart: DKbench and Geekbench 5 results]

Immediately we see that Axion not only edges out Graviton4, it’s surprisingly close to the C7a, the fastest x86 instance overall.

DKbench is a good indicator of general performance, but we'll get on with some more specialised testing.

Compilation

For a developer-specific workload, I compiled Perl on two threads:

[Chart: Perl compilation on two threads]

Axion came second only to Amazon's Genoa, and the difference was marginal.

7zip performance

[Chart: 7zip compression and decompression results]

This is the most impressive showing for Axion, leading in both compression and decompression.

Video compression

As mentioned above, I am transcoding using FFmpeg/libx264. libx264 is a very mature library that should be well-optimized for Intel/AMD, so I was interested to see how well the new Google CPU could do:

[Chart: FFmpeg/libx264 video compression results]

And the answer is: not bad at all. It falls behind Emerald Rapids and Genoa in single-threaded runs, but not by a significant margin. And given that for video compression we don't really care about single-threaded runs, only the C7a is actually faster per vCPU.

OpenSSL (AVX-512)

Moving on to a benchmark that is even more heavily optimized for Intel/AMD CPUs: OpenSSL can use the latest AVX-512 instructions for increased performance. This is basically the worst-case scenario for ARM, as the architecture has much more limited SIMD extensions (NEON):

[Chart: OpenSSL results]

Here, Axion improves over Graviton4, as in all tests, but cannot keep up even with the older x86 CPUs.

Summary: performance delta vs Genoa & Graviton4

Let's have a look at the performance delta of the Axion vs Genoa (in purple) and Graviton4 (in yellow) for all the benchmarks we ran (skipping the special case of OpenSSL):

[Chart: Axion performance delta vs Genoa and Graviton4]

There are consistent gains over the Graviton4 (from 3 to 15%). On the other hand, Genoa maintains the lead for most tests, with the maximum difference at 15%, but Axion keeps much closer than that in general and even bests Genoa in some cases. I would say that, for most uses, Axion will be closer to Genoa than it is to Graviton4.

Performance / Price

The main reason Amazon & Google developed their own ARM solutions is to provide the best possible value (for themselves and their customers). Hence, a look at performance / price is possibly even more useful than raw performance. I will be looking at multicore performance with DKbench, as its varied benchmarks gave reasonably balanced performance results.
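
The performance / price metric itself is just a benchmark score divided by the monthly price. As a rough sketch of that arithmetic (the function name `perf_per_dollar` is hypothetical, and the score of 1000 is a made-up placeholder, not a measured DKbench result), you could compute it like this:

```shell
# Performance per dollar: benchmark score divided by monthly price.
# Usage: perf_per_dollar SCORE MONTHLY_PRICE
perf_per_dollar() {
    awk -v score="$1" -v price="$2" 'BEGIN { printf "%.2f\n", score / price }'
}

# Placeholder score of 1000 for both types (NOT a real benchmark result),
# combined with the on-demand prices from the table in "The Contenders".
perf_per_dollar 1000 57.69   # Google c4a, prints 17.33
perf_per_dollar 1000 77.36   # Amazon C7a, prints 12.93
```

With identical placeholder scores the output only reflects the price gap; plug in your own benchmark scores to get a meaningful comparison.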

On Demand & Reserved

[Chart: DKbench multicore performance / price, on-demand and reserved pricing]

Looks like it's mission accomplished for Google. Axion is by far the best value amongst the tested VMs, both for on-demand and 1y/3y reserved pricing.

Spot Instances

Spot prices vary wildly, both with time and location. Based on Eastern-US pricing in the 3rd week of November, when I was compiling the results, this is what we see:

[Chart: DKbench multicore performance / price, spot pricing]

I don't know if Amazon is doing this on purpose, but they are offering Graviton4 at an unbeatable spot price in US-east, where Axion has availability, while it is priced almost 2x in US-west, where Axion is not yet available! In any case, for good value on Axion spot instances you'll have to wait for wider availability. Right now on Google, you'd have to go with the Milan Tau (t2d) instances if you wanted the best value.

This chart is mostly to make you research spot prices, as there are always great deals to be found, especially if you are not limited to a specific region. The deals change often, so try to keep track of pricing.
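
As a starting point for that research on the Amazon side, here is a minimal sketch using the AWS CLI's `describe-spot-price-history` command. The wrapper names `aws_spot_prices` and `hourly_to_monthly` are my own, the 730 hours-per-month figure is an assumption, and `c8g.large` is assumed to be the 2x vCPU / 4GB size; the result covers compute only, so it will not match the table's monthly figures, which include disk.

```shell
# Approximate monthly cost from an hourly price (assumes 730 hours/month).
hourly_to_monthly() {
    awk -v hourly="$1" 'BEGIN { printf "%.2f\n", hourly * 730 }'
}

# Print current Linux spot prices per availability zone for one instance type.
# Needs the AWS CLI with configured credentials, so it is only defined here.
# Example: aws_spot_prices us-east-1 c8g.large
aws_spot_prices() {
    aws ec2 describe-spot-price-history \
        --region "$1" \
        --instance-types "$2" \
        --product-descriptions "Linux/UNIX" \
        --start-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        --query 'SpotPriceHistory[].[AvailabilityZone,SpotPrice]' \
        --output text
}

# A hypothetical hourly price of $0.01 is about $7.30 per month
hourly_to_monthly 0.01
```

Running `aws_spot_prices` against a few regions and feeding the hourly figures to `hourly_to_monthly` gives a quick picture of where the deals currently are.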

Conclusion

Google’s Axion CPU proves to be an exceptional contender in the ARM server space, offering stellar performance and value. Expect an almost 10% performance improvement over Graviton4. In addition, while it trails behind x86 CPUs in some specialized workloads (e.g. AVX-512), it is not far behind the best x86 CPUs in the majority of tasks, making it a viable alternative for those seeking to switch to ARM while keeping top-tier performance levels.
