Logo

dev-resources.site

for different kinds of informations.

Typed integers in Rust for safer Python bytecode compilation

Published at
1/13/2025
Categories
rust
python
programming
Author
jonesbeach
Categories
3 categories in total
rust
open
python
open
programming
open
Author
10 person written this
jonesbeach
open
Typed integers in Rust for safer Python bytecode compilation

Shortly after I shared my previous post, a helpful redditor pointed out that the typed integers I alluded to is known as the newtype pattern in Rust, a design that allows you to define types purely for compile-time safety and with no runtime hit.

And here I thought I invented it! Alas. 😂

I wanted to share a few details about what challenge I encountered and how I solved it.

The problem

Like I mentioned in the previous post, bytecode is a lower-level representation of your code created for efficient evaluation. One of the optimizations is that variable names are converted into indices during bytecode compilation, which supports faster lookup at runtime when the VM pumps through your bytecode.

At one point while implementing this myself, I was debugging a sequence that felt like this. (I’m using the wishy-washy “felt like” here because I have no idea what the actual bytecode looked like.)

LOAD_GLOBAL 0
LOAD_FAST 0
LOAD_CONST 0
Enter fullscreen mode Exit fullscreen mode

Okay, so we’re loading the first global, the first local, and the first constant. Given the ongoing evolution of my VM design and implementation, the data structures I’m using to communicate the mappings of these indices to their symbol names and/or values between the compiler and VM have been in a steady flux.

I’d sit down to implement the execution of a given opcode in the VM, see that I was handed the value 0, and I’d think, “What do they want me to do with this?!” (“they” in the case being myself from 5 minutes prior hacking on the compiler stage.)

After interpreting a 0 wrong for the forth or fifth time, I decided THERE HAS TO BE A BETTER WAY. I asked myself, “Is there a way to get the compiler to tell me when I’m using a 0 incorrectly?” In addition to compile-time checking, with rust-analyzer configured in my Neovim, I should get a type error as soon as I made the mistake.

Could I add types to my integers?!

The CPython bytecode docs even hint at these distinctions, but incorporating them into my own implementation wasn’t immediately clear until I experienced the pain firsthand. For my three opcodes above, the respective documentation is:

LOAD_GLOBAL(namei)
Loads the global named co_names[namei>>1] onto the stack.

LOAD_FAST(var_num)
Pushes a reference to the local co_varnames[var_num] onto the stack.

LOAD_CONST(consti)
Pushes co_consts[consti] onto the stack.
Enter fullscreen mode Exit fullscreen mode

co_names, co_varnames, and co_consts are three fields on a CPython code object, which is an immutable piece of compiled bytecode. In my interpreter, I’m using names and varnames to show my originality.

(Note to self: I treat constants separately at the moments, but I should probably unify it as I solidify my understanding of code objects.)

(Another note to self: how curious that they are shifting namei to the right during LOAD_GLOBAL. I hope to one day know why.)

The solution

I added typed integers to keep myself sane. They offered an additional benefit of providing me an opportunity to gain more experience with generics in Rust!

Here is a subset of my Opcode implementation.

pub enum Opcode {
    /// Push the value found at the specified index in the constant pool onto the stack.
    LoadConst(ConstantIndex),
    /// Read the local variable indicated by the specified index and push the value onto the stack.
    LoadFast(LocalIndex),
    /// Read the global variable indicated by the specified index and push the value onto the stack.
    LoadGlobal(NonlocalIndex),
}
Enter fullscreen mode Exit fullscreen mode

My hope here is these new types, ConstantIndex, LocalIndex, and NonlocalIndex, would semantically illustrate the behavior of each opcode while providing type safety.

Generics entered the scene next as I was hoping to implement this with minimal code reuse.

Here is the next layer of the onion.

pub type ConstantIndex = Index<ConstantMarker>;
pub type LocalIndex = Index<LocalMarker>;
pub type NonlocalIndex = Index<NonlocalMarker>;
Enter fullscreen mode Exit fullscreen mode

We’re beginning to see some code reuse–awesome!

What are these marker types? They are truly that: a type which is literally just a marker with no other data. These empty types are enforced at compile-time and do nothing at runtime.

pub struct ConstantMarker;
pub struct LocalMarker;
pub struct NonlocalMarker;
Enter fullscreen mode Exit fullscreen mode

In Rust, PhantomData<T> from std::marker allows you to include a type in a struct purely for compile-time checks without affecting runtime performance—exactly the type safety for which we’re seeking! I’m using usize for the value because my use-case was indices, which should never be negative.

/// An unsigned integer wrapper which provides type safety. This is particularly useful when
/// dealing with indices used across the bytecode compiler and the VM as common integer values such
/// as 0, 1, etc, can be interpreted many different ways.
#[derive(Copy, Clone, PartialEq, Hash, Eq)]
pub struct Index<T> {
    value: usize,
    _marker: PhantomData<T>,
}

impl<T> Index<T> {
    pub fn new(value: usize) -> Self {
        Self {
            value,
            _marker: PhantomData,
        }
    }
}

impl<T> Deref for Index<T> {
    type Target = usize;

    fn deref(&self) -> &Self::Target {
        &self.value
    }
}

impl<T> Display for Index<T> {
    fn fmt(&self, f: &mut Formatter) -> Result<(), Error> {
        write!(f, "{}", self.value)
    }
}
Enter fullscreen mode Exit fullscreen mode

The last piece of magic is impl<T> Deref for Index<T>, which allows us to treat instances of our new type as integers when dereferenced.

As a result, a piece of code like this would pass with flying colors.

#[test]
fn test_dereference() {
    let index: LocalIndex = Index::new(4);
    assert_eq!(*index, 4)
}
Enter fullscreen mode Exit fullscreen mode

By this point, we are free to use these in the bytecode compiler! Most places in the code don’t require specifying the generic because it will be inferred by the function signature, like in this example.

fn get_or_set_local_index(&mut self, name: &str) -> LocalIndex {
    if let Some(index) = self.get_local_index(name) {
        index
    } else {
        let code = self.ensure_code_object_mut();
        let new_index = code.varnames.len();
        code.varnames.push(name.to_string());
        Index::new(new_index)
    }
}
Enter fullscreen mode Exit fullscreen mode

We initialize a new index with Index::new(new_index), which the compiler knows to treat as an Index<LocalMarker>, which we have aliased to LocalIndex to allow us to be blissfully ignorant VM developers.

The end

This is a small but powerful example of how I’m falling head-over-heels for Rust’s type system. I love writing expressive code to implement core features from scratch in a way which manages complexity and makes people smile.

My mentor at my first big-boy job was a wizard at this and I credit him for showing me what is possible: spending more time thinking about what you’re trying to build rather than being bogged down by how you are trying to build it.

Have you used Rust’s type system in any fun and creative ways? Or done something similar in another language? I’d love to hear your thoughts in the comments. Be well!


Subscribe & Save [on nothing]

If you’d like to get more posts like this directly to your inbox, you can subscribe here!

Work With Me

I mentor software engineers to navigate technical challenges and career growth in a supportive sometimes silly environment. If you’re interested, you can book a session.

Elsewhere

In addition to mentoring, I also write about my experience navigating self-employment and late-diagnosed autism. Less code and the same number of jokes.

rust Article's
30 articles in total
Favicon
Mastering Rust's 'unsafe' Code: Balancing Safety and Performance in Systems Programming
Favicon
Meme Tuesday 🚱
Favicon
Pulumi WASM/Rust devlog #3
Favicon
Typed integers in Rust for safer Python bytecode compilation
Favicon
### **Exploring Embedded Systems Development with Rust**
Favicon
Scan Your Linux Disk and Visualize It on Mac with GrandPerspective
Favicon
Diesel vs SQLx in Raw and ORM Modes
Favicon
C++ or Rust? I'd stick to my good old C++
Favicon
Rust
Favicon
Building a Developer-Focused Search Engine in Rust: Lessons Learned and Challenges Overcome 🚀
Favicon
A Gentle Introduction to WebAssembly in Rust (2025 Edition)
Favicon
Stable Memory In Internet Computer
Favicon
Solana Account Model Simplified
Favicon
Rust Frameworks
Favicon
Mastering Rust's Type System: A Comprehensive Guide for Robust and Efficient Code
Favicon
🚀 Intrepid AI 0.10: Ready for Liftoff!
Favicon
Swiftide 0.16 brings AI agents to Rust
Favicon
Introducing Yamaswap
Favicon
Rust and Generative AI: Creating High-Performance Applications
Favicon
Rust: Matchy Matchy
Favicon
Rust registry error "candidate versions found which didn't match"
Favicon
Mastering Rust Lifetimes: Advanced Techniques for Safe and Efficient Code
Favicon
Rust
Favicon
让安卓手机不再吃灰:在安卓手机上搭建 Rust 开发环境
Favicon
How to test Asynchronous Rust Programs with Tokio [TUTORIAL]
Favicon
SSH port forwarding from within code
Favicon
Reto de Rust 365 días, 2025!!
Favicon
Rust-Powered Password Decrypter: Find the String Behind the Hash! 🦀🔒
Favicon
🚀 Rust Basics 5: Structs and Enums in Rust
Favicon
Rust mega-tutotial

Featured ones: