Logo

dev-resources.site

for different kinds of informations.

Optimizing Memory Usage in Go: Mastering Data Structure Alignment

Published at
11/10/2024
Categories
go
programming
computerscience
Author
Ivelin Yanev
Categories
3 categories in total
go
open
programming
open
computerscience
open
Optimizing Memory Usage in Go: Mastering Data Structure Alignment

Memory optimization is crucial for writing performant software systems. When a software has a finite amount of memory to work with, many issues can arise when that memory isn't used efficiently. That's why memory optimization is critical for better overall performance.

Go inherits many of C's advantageous features, but what I notice is that a large part of people who use it do not know the full power of this language. One of the reasons may be a lack of knowledge about how it works at a low level, or a lack of experience with languages like C or C++. I mention C and C++ because the foundations of Go are pretty much built on the wonderful features of C/C++. It's no coincidence that I will quote an interview of Ken Thompson at Google I/O 2012:

For me, the reason I was enthusiastic about Go is because just about the same time that we were starting on Go, I read (or tried to read) the C++0x proposed standard, and that was a convincer for me.

Today, we're going to talk about how we can optimize our Go program, and more specifically, how it would be good to use structs in Go. Let's first say what a structure is:

A struct is a user-defined data type that groups related variables of different types under a single name.

To fully understand where the problem lies, we will mention that modern processors do not read 1 byte at a time from the memory. How the CPU fetches the data or instructions which are stored in the memory?

In computer architecture, a word is a unit of data that a processor can handle in a single operation - generally the smallest addressable unit of memory. It's a fixed-size group of bits (binary digits). The word size of a processor determines its ability to handle data efficiently. Common word sizes include 8, 16, 32, and 64 bits. Some computer processor architectures support a half word, which is half the number of bits in a word, and a double word, which is two contiguous words.

In nowday the most common architectures is 32 bit and 64 bit. If you have 32 bit processor then it means it can access 4 bytes at a time which means word size is 4 bytes. If you have 64 bit processor the it can access 8 bytes at time which means word size is 8 bytes.

When we store the data in memory, each 32-bit data word has a unique address, as shown below.

Word-Addressable Memory

Figure. 1 ‑ Word-Addressable Memory

We can read the data in memory and load it to one register using the load word (lw) instruction.

After know the above theoria let's see what is the practise. For the descripte the cases with structure data structure, I will demonstrate with C languge. A struct in C is a composite data type that allows you to group multiple variables together and store them in the same block of memory. As we said early the CPU access the data depend of the givemn architecture. Every data type in C will have alignment requirements.

So let's we have the following as simple structure:

// structure 1
typedef struct example_1 {
    char c;
    short int s;
} struct1_t;


// structure 2
typedef struct example_2 {
    double d;
    int s;
    char c;
} struct2_t;

And now try to calculate the size of the following structures:

Size of Structure 1 = Size of (char + short int) = 1 + 2 = 3.

Size of Structure 2 = Size of (double + int + char) = 8 + 4 + 1= 13.

The real sizes using a C program might surprise you.

#include <stdio.h>


// structure 1
typedef struct example_1 {
    char c;
    short int s;
} struct1_t;

// structure 2
typedef struct example_2 {
    double d;
    int s;
    char c;
} struct2_t;

int main()
{
    printf("sizeof(struct1_t) = %lu\n", sizeof(struct1_t));
    printf("sizeof(struct2_t) = %lu\n", sizeof(struct2_t));

    return 0;
}

Output

sizeof(struct1_t) = 4
sizeof(struct2_t) = 16

As we can see, the size of the structures is different from those we calculated.

What is the reason for that?

C and Go employ a technique known as "struct padding" to ensure data is appropriately aligned in memory, which can significantly affect performance due to hardware and architectural constraints. Data padding and alignment conform to the requirements of the system's architecture, mainly to optimize CPU access times by ensuring data boundaries align with word sizes.

Let's go through an example to illustrate how Go handles padding and alignment, consider the following struct:

type Employee struct {
  IsAdmin  bool
  Id       int64
  Age      int32
  Salary   float32
}

A bool is 1 byte, int64 is 8 bytes, int32 is 4 bytes and float32 is 4 bytes = 17 bytes(total).

Let's validate the struct size by examining the compiled Go program:

package main

import (
    "fmt"
    "unsafe"
)

type Employee struct {
    IsAdmin bool
    Id      int64
    Age     int32
    Salary  float32
}

func main() {

    var emp Employee

    fmt.Printf("Size of Employee: %d\n", unsafe.Sizeof(emp))
}

Output

Size of Employee: 24

The reported size is 24 bytes, not 17. This discrepancy is due to memory alignment. To understand how alignment works, we need to inspect the structure and visualize the memory which it ocupate.

Unoptimized Memory Layout

Figure 2 - Unoptimized Memory Layout

The struct Employee will consume 8*3 = 24 bytes. Yo see the problem now, there are a lot of empty holes in the layout of Employee (those gaps created by the alignment rules are called “padding”).

Padding Optimization and Performance Impact

Understanding how memory alignment and padding can affect the performance of an application is crucial. Specifically, data alignment impacts the number of CPU cycles required to access fields within a struct. This influence arises mostly from CPU cache effects, rather than raw clock cycles themselves, as cache behavior depends heavily on data locality and alignment within memory blocks.

Modern CPUs fetch data from memory into a faster intermediary called cache, organized in fixed-size blocks (commonly 64 bytes). When data is well-aligned and localized within the same or fewer cache lines, the CPU can access it more quickly due to reduced cache loading operations.

Consider the following Go structures to illustrate poor versus optimal alignment:

// Poorly aligned struct
type Misaligned struct {
    Age        uint8  // Uses 1 byte, followed by 7 bytes of padding to align the next field
    PassportId uint64 // 8-byte aligned uint64 for the passport ID
    Children   uint16 //2-byte aligned uint16

// Well-aligned struct
type Aligned struct {
    Age        uint8  // Starting with 1 byte
    Children   uint16 // Next, 2 bytes; all these combine into a 3-byte sequence
    PassportId uint64 // Finally, an 8-byte aligned uint64 without needing additional padding
}

How Alignment Affects Performance

CPU reads data in word-size instead of byte-size. As I described in the begining a word in a 64-bit system is 8 bytes, while a word in a 32-bit system is 4 bytes. In short, CPU reads address in the multiple of its word size. To fetch the variable passportId, our CPU takes two cycles to access the data instead of one. The first cycle will fetch memory 0 to 7 and the subsequent cycle will fetch the rest. And this is inefficient- we need of data structure alignment.By simply aligning the data, computers ensure that the var passportId can be retrieved in ONE CPU cycle.

Comparing Memory Access Efficiency

Figure 3 - Comparing Memory Access Efficiency

Padding is the key to achieving data alignment. Padding occurs because modern CPUs are optimized to read data from memory at aligned addresses. This alignment allows the CPU to read the data in a single operation.

Simply aligning the data

Figure 4 - Simply aligning the data

Without padding, data may be misaligned, leading to multiple memory accesses and slower performance. Therefore, while padding might waste some memory, it ensures that your program runs efficiently.

Padding Optimization Strategies

Aligned struct consumes lesser memory simply because it possesses a better struct fields order compared to Misaligned. Because of padding, two 13 bytes data structures turn out to be 16 bytes and 24 bytes respectively. Hence, you save up extra memory by simply reordering your struct fields.

Optimizing Field Order

Figure 5 - Optimizing Field Order

Improperly aligned data can slow performance as the CPU might need multiple cycles to access misaligned fields. Conversely, correctly aligned data minimizes cache line loads, which is crucial for performance, especially in systems where memory speed is a bottleneck.

Let’s do a simple benchmark to prove it:

var AlignedArr []Aligned
var MisalignedArr []Misaligned

func init() {
    const sampleSize = 1000
    AlignedArr = make([]Aligned, sampleSize)
    MisalignedArr = make([]Misaligned, sampleSize)
    for i := 0; i < sampleSize; i++ {
        AlignedArr[i] = Aligned{Age: uint8(i % 256), Siblings: uint16(i), Children: uint64(i)}
        MisalignedArr[i] = Misaligned{Age: uint8(i % 256), PassportId: uint64(i), Children: uint16(i)}
    }
}

func traverseAligned() uint16 {
    var arbitraryNum uint16
    for _, item := range AlignedArr {
        arbitraryNum += item.Siblings
    }
    return arbitraryNum
}

func traverseMisaligned() uint16 {
    var arbitraryNum uint16
    for _, item := range MisalignedArr {
        arbitraryNum += item.Children
    }
    return arbitraryNum
}

func BenchmarkTraverseAligned(b *testing.B) {
    for n := 0; n < b.N; n++ {
        traverseAligned()
    }
}

func BenchmarkTraverseMisaligned(b *testing.B) {
    for n := 0; n < b.N; n++ {
        traverseMisaligned()
    }
}

Output

go test -bench=.
goos: linux
goarch: amd64
pkg: test-project
cpu: 11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz
BenchmarkTraverseAligned-16              3022234               403.7 ns/op
BenchmarkTraverseMisaligned-16           4300167               299.1 ns/op
PASS
ok      test-project    3.195s

As you can see the traversing the Aligned indeed takes lesser time than its counterpart.

Padding’s added to make sure each struct field lines up properly in memory based on its needs, like we saw earlier. But while it enables efficient access, padding can also waste space if the fields ain’t ordered well.

Understanding how to properly align struct fields to minimize memory waste due to padding is important for efficient memory usage, especially in performance-critical applications. Below, I will provide an example with a poorly aligned structure and then show an optimized version of the same structure.

In a poorly aligned struct, fields are ordered without considering their sizes and alignment requirements, which can lead to added padding and increased memory usage:

// Badly aligned structure
type Person struct {
    Active   bool      // 1 byte + 7 bytes padding
    Salary   float64   // 8 bytes
    Age      int32     // 4 bytes + 4 bytes padding
    Nickname string    // 16 bytes (string is typically 16 bytes on a 64-bit system)
}

The total memory could hence be 1 (bool) + 7 (padding) + 8 (float64) + 4 (int32) + 4 (padding) + 16 (string) = 40 bytes.

An optimized structure arranges fields from largest to smallest size, significantly reducing or eliminating the need for additional padding:

// Well-aligned structure
type Person struct {
    Salary   float64   // 8 bytes
    Nickname string    // 16 bytes
    Age      int32     // 4 bytes
    Active   bool      // 1 byte + 3 bytes padding
}

The total memory would then neatly comprise 8 (float64) + 16 (string) + 4 (int32) + 1 (bool) + 3 (padding) = 32 bytes.

Let's proof the above:

package main

import (
    "fmt"
    "unsafe"
)

type PoorlyAlignedPerson struct {
    Active   bool
    Salary   float64
    Age      int32
    Nickname string
}

type WellAlignedPerson struct {
    Salary   float64
    Nickname string
    Age      int32
    Active   bool
}

func main() {
    poorlyAligned := PoorlyAlignedPerson{}
    wellAligned := WellAlignedPerson{}

    fmt.Printf("Size of PoorlyAlignedPerson: %d bytes\n", unsafe.Sizeof(poorlyAligned))
    fmt.Printf("Size of WellAlignedPerson: %d bytes\n", unsafe.Sizeof(wellAligned))
}

Output

Size of PoorlyAlignedPerson: 40 bytes
Size of WellAlignedPerson: 32 bytes

Reducing the structure size from 40 bytes to 32 bytes means a 20% reduction in memory usage per instance of Person. This can lead to considerable savings in applications where many such instances are created or stored, improving cache efficiency and potentially reducing the number of cache misses.

Conclusion

Data alignment is a pivotal factor in optimizing memory utilization and enhancing system performance. By arranging struct data correctly, memory usage becomes not only more efficient but also faster in terms of CPU read times, contributing significantly to overall system efficiency.

Featured ones: