Logo

dev-resources.site

for different kinds of informations.

$ugar's GC and Ownership Model

Published at
10/11/2023
Categories
newlanguage
memorymanagement
ownership
garbagecollector
Author
corarivermaxwell
Author
16 person written this
corarivermaxwell
open
$ugar's GC and Ownership Model

Let's talk about the feature I am most excited about in $ugar, its multi-memory management scheme. In my first blog post, I discussed that $ugar would allow you to choose to use the Ownership Model or a Garbage Collector to handle a variable's memory.

This will work by having two heaps: a rust heap and trash heap. The rust heap will be for memory that uses the Ownership Model and the trash heap will be for memory that will be picked up by the garbage collector.

Stack Size Prediction

Like most programs, you will be able to define the amount of memory allocated to the program, however you will also be able to define the size of the rust and trash heaps. If a program does not have any recursive functions, the $ugar compiler should be able to measure the maximum size of the stack, and if a program does include recursive functions, the $ugar compiler will have a default recursive padding number (that can be modified) that will be used to guess the maximum size of the stack.

For example, for these sets of code

pub mut fn Main {
    let x = Factorial 6;
    Std::WriteLn x;
}

pub rec fn Factorial $ i32 x : i32 {
    x < 0 ? => {
        throw ArgumentOutOfRangeError(x ++ " is negative.");
    } else x == 0 ? => {
        return 1;
    } else {
        return x * Factorial $ x - 1;
    }
}
Enter fullscreen mode Exit fullscreen mode
pub mut fn Main {
    let x = Factorial 6;
    Std::WriteLn x;
}

pub fn Factorial $ i32 x : i32 {
    x < 0 ? throw ArgumentOutOfRangeError(x ++ " is negative.");
    let mut output = 1;
    loop i in 1..x {
        output *= i;
    }
    return output;
}
Enter fullscreen mode Exit fullscreen mode

In the scope of Main, a word (2 bytes) is allocated for the variable x, which will call Factorial. For the non recursive version, Factorial will allocate 2 words for the input x and the variable output, then in the loop scope allocate another word for i. In total, for the non recursive version, the maximum stack size is 4 words. For the recursive version, we can't know at compile time the maximum size of the stack (this is if we don't use tail-recursion optimization) so we assume the function will have a maximum depth of the default padding number (let's say 10,000 for now). So Main will allocate a word for x, then Factorial will allocate two words: one for the input x and another for the function call to Factorial. Again we're assuming Factorial will call itself 10,000 times, so the compiler will say a maximum stack size of 20,001 words or approximately 40 KB.

You can attach a padding attribute to the recursive function to change the recursive padding number for that function.

#Mem::recursive_padding=100
pub rec fn Factorial $ i32 a : i32 {
    x < 0 ? => {
        throw ArgumentOutOfRangeError(x ++ " is negative.");
    } else x == 0 ? => {
        return 1;
    } else {
        return x * Factorial $ x - 1;
    }
}
Enter fullscreen mode Exit fullscreen mode

This will bring the maximum stack size for the recursive version to 201 words or 402 bytes.

The Trash and The Rust Heaps

This is important to give enough memory to prevent stack overflows, but not steal memory from the rust and trash heaps. By default, the trash heap will take up 50% of the remaining reserved space, and the trash heap will be divvied up between a few separate sections for tracking the lifetimes of trash allocated data and minimizing the performance cost of the GC. The rust heap will take up the remaining space and will be handled by the Ownership model of memory management.

The compiler will effectively treat data on the stack, the rust heap, and trash heap differently. Let's take a look at some examples.

pub mut fn Main {
    let mut data = MakeVec;
    data.Push 3;
}

pub fn MakeVec : &mut ArrayList<i32> {
    ArrayList<i32> mut values = Vector::new;
    loop i in 1..=10 {
        values.Push i;
    }
    return &mut values;
}
Enter fullscreen mode Exit fullscreen mode

This code should ring some alarm bells for Rustaceans, however this will be perfectly valid code in $ugar, since by default, if data needs to be allocated to a heap, data will be allocated to the trash heap. So the lifetime of values is defined by if there's a reference to it. Let's look at what happens if we allocate values onto the rust heap instead.

pub mut fn Main {
    let mut data = MakeVec;
    data.Push 3;
}

pub fn MakeVec : &mut ArrayList<i32> {
    oxy ArrayList<i32> mut values = Vector::new;
    loop i in 1..=10 {
        values.Push i;
    }
    return &mut values;
}
Enter fullscreen mode Exit fullscreen mode

Now this code will throw a compiler warning saying that values does not live long enough to return a reference to it. You could rewrite the code like this:

pub mut fn Main {
    let mut data = MakeVec;
    data.Push 3;
}

pub fn MakeVec : ArrayList<i32> {
    oxy ArrayList<i32> mut values = Vector::new;
    loop i in 1..=10 {
        values.Push i;
    }
    return values;
}
Enter fullscreen mode Exit fullscreen mode

This code will compile, however the internal workings is different than the version using the trash heap. For the trash heap, values is allocated onto the trash heap which will be checked periodically by the GC to see if it needs to be cleaned, and the reference to values will be returned from the function, and as long as someone owns a reference to values, values will survive in the trash heap. Alternatively for the oxidized version, values is allocated onto the rust heap, and since the data will be dropped at the end of the end of the function's scope, you cannot return a reference to it, so instead the function will transfer ownership of values to another variable (in this case data in Main), and now values will only be dropped when its new owner goes out of scope.

The trash heap allows you to write code without having to think about lifetimes, whereas if you use the oxy keyword, you're opting into having to deal with lifetimes.

The Borrow Checker

The two different heaps do not however allow you to bypass the borrow checker, since the borrow checker is not about memory management, but memory safety. The borrow checker is built to keep your memory safe (meaning no seg faults, not necessarily no memory leaks). For example, one big thing that the borrow checker prevents is moving an object via reference while there are multiple references to it. When you move an object, you are updating the reference you are using for said object, but every other reference is not updated (it is very difficult and impractical or downright impossible to track every reference to an object, and thus update every reference).

public static void Main(string[] args) {
    string value1 = "Hello";
    string value2 = value1;
    value1 += " Mom!";
    Console.WriteLine(value1);
    Console.WriteLine(value2);
}
Enter fullscreen mode Exit fullscreen mode

This C# program has unexpected behavior. Assigning "Hello" to value1, allocates memory for the string and makes value1 point to that memory so value1 is really just a pointer. Assigning value1 to value2, sets value2 to the same pointer as value1. However when we mutate value1, we have to move the data to another location. The reason why is say we allocate memory for the string "Hello" and then we allocate memory for something else, that something else is preventing "Hello" from growing, and we want the data to be contiguous, so we have to allocate another region of memory which is increased in size, then we copy "Hello" to this new region, and append the new characters onto that string. So what happens in this program is that the first WriteLine call outputs "Hello Mom!" while the second WriteLine call outputs "Hello". This is because when value1 moves, value2 is still pointing to its original spot, and since there is still a reference to "Hello", the Garbage Collector does not dispose of it. So if we have multiple references to an object and a possible mutation of that object will change its location. There is a better way to write this program, without using an unupdated reference.

public static void Main(string[] args) {
    string value1 = "Hello";
    string value2 = value1 + " Mom!";
    Console.WriteLine(value1);
    Console.WriteLine(value2);
}
Enter fullscreen mode Exit fullscreen mode

This code makes more sense. In reality, the example before this one is unsafe and should not be written in any context. The borrow checker maintains safe code. When coding in rust however, you have to deal with lifetimes at the same time as dealing with references and transferring ownership, which can slow the initial prototyping stage of developing a project. So $ugar allows you to use the borrow checker alongside a GC for safety and speed (of producing code), and then refactor your code to use the Ownership model for performance.

Featured ones: