dev-resources.site
for different kinds of informations.
Typed integers in Rust for safer Python bytecode compilation
Shortly after I shared my previous post, a helpful redditor pointed out that the typed integers I alluded to is known as the newtype pattern in Rust, a design that allows you to define types purely for compile-time safety and with no runtime hit.
And here I thought I invented it! Alas. đ
I wanted to share a few details about what challenge I encountered and how I solved it.
The problem
Like I mentioned in the previous post, bytecode is a lower-level representation of your code created for efficient evaluation. One of the optimizations is that variable names are converted into indices during bytecode compilation, which supports faster lookup at runtime when the VM pumps through your bytecode.
At one point while implementing this myself, I was debugging a sequence that felt like this. (Iâm using the wishy-washy âfelt likeâ here because I have no idea what the actual bytecode looked like.)
LOAD_GLOBAL 0
LOAD_FAST 0
LOAD_CONST 0
Okay, so weâre loading the first global, the first local, and the first constant. Given the ongoing evolution of my VM design and implementation, the data structures Iâm using to communicate the mappings of these indices to their symbol names and/or values between the compiler and VM have been in a steady flux.
Iâd sit down to implement the execution of a given opcode in the VM, see that I was handed the value 0
, and Iâd think, âWhat do they want me to do with this?!â (âtheyâ in the case being myself from 5 minutes prior hacking on the compiler stage.)
After interpreting a 0
wrong for the forth or fifth time, I decided THERE HAS TO BE A BETTER WAY. I asked myself, âIs there a way to get the compiler to tell me when Iâm using a 0
incorrectly?â In addition to compile-time checking, with rust-analyzer
configured in my Neovim, I should get a type error as soon as I made the mistake.
Could I add types to my integers?!
The CPython bytecode docs even hint at these distinctions, but incorporating them into my own implementation wasnât immediately clear until I experienced the pain firsthand. For my three opcodes above, the respective documentation is:
LOAD_GLOBAL(namei)
Loads the global named co_names[namei>>1] onto the stack.
LOAD_FAST(var_num)
Pushes a reference to the local co_varnames[var_num] onto the stack.
LOAD_CONST(consti)
Pushes co_consts[consti] onto the stack.
co_names
, co_varnames
, and co_consts
are three fields on a CPython code object, which is an immutable piece of compiled bytecode. In my interpreter, Iâm using names
and varnames
to show my originality.
(Note to self: I treat constants separately at the moments, but I should probably unify it as I solidify my understanding of code objects.)
(Another note to self: how curious that they are shifting namei
to the right during LOAD_GLOBAL
. I hope to one day know why.)
The solution
I added typed integers to keep myself sane. They offered an additional benefit of providing me an opportunity to gain more experience with generics in Rust!
Here is a subset of my Opcode
implementation.
pub enum Opcode {
/// Push the value found at the specified index in the constant pool onto the stack.
LoadConst(ConstantIndex),
/// Read the local variable indicated by the specified index and push the value onto the stack.
LoadFast(LocalIndex),
/// Read the global variable indicated by the specified index and push the value onto the stack.
LoadGlobal(NonlocalIndex),
}
My hope here is these new types, ConstantIndex
, LocalIndex
, and NonlocalIndex
, would semantically illustrate the behavior of each opcode while providing type safety.
Generics entered the scene next as I was hoping to implement this with minimal code reuse.
Here is the next layer of the onion.
pub type ConstantIndex = Index<ConstantMarker>;
pub type LocalIndex = Index<LocalMarker>;
pub type NonlocalIndex = Index<NonlocalMarker>;
Weâre beginning to see some code reuseâawesome!
What are these marker types? They are truly that: a type which is literally just a marker with no other data. These empty types are enforced at compile-time and do nothing at runtime.
pub struct ConstantMarker;
pub struct LocalMarker;
pub struct NonlocalMarker;
In Rust, PhantomData<T>
from std::marker
allows you to include a type in a struct purely for compile-time checks without affecting runtime performanceâexactly the type safety for which weâre seeking! Iâm using usize
for the value because my use-case was indices, which should never be negative.
/// An unsigned integer wrapper which provides type safety. This is particularly useful when
/// dealing with indices used across the bytecode compiler and the VM as common integer values such
/// as 0, 1, etc, can be interpreted many different ways.
#[derive(Copy, Clone, PartialEq, Hash, Eq)]
pub struct Index<T> {
value: usize,
_marker: PhantomData<T>,
}
impl<T> Index<T> {
pub fn new(value: usize) -> Self {
Self {
value,
_marker: PhantomData,
}
}
}
impl<T> Deref for Index<T> {
type Target = usize;
fn deref(&self) -> &Self::Target {
&self.value
}
}
impl<T> Display for Index<T> {
fn fmt(&self, f: &mut Formatter) -> Result<(), Error> {
write!(f, "{}", self.value)
}
}
The last piece of magic is impl<T> Deref for Index<T>
, which allows us to treat instances of our new type as integers when dereferenced.
As a result, a piece of code like this would pass with flying colors.
#[test]
fn test_dereference() {
let index: LocalIndex = Index::new(4);
assert_eq!(*index, 4)
}
By this point, we are free to use these in the bytecode compiler! Most places in the code donât require specifying the generic because it will be inferred by the function signature, like in this example.
fn get_or_set_local_index(&mut self, name: &str) -> LocalIndex {
if let Some(index) = self.get_local_index(name) {
index
} else {
let code = self.ensure_code_object_mut();
let new_index = code.varnames.len();
code.varnames.push(name.to_string());
Index::new(new_index)
}
}
We initialize a new index with Index::new(new_index)
, which the compiler knows to treat as an Index<LocalMarker>
, which we have aliased to LocalIndex
to allow us to be blissfully ignorant VM developers.
The end
This is a small but powerful example of how Iâm falling head-over-heels for Rustâs type system. I love writing expressive code to implement core features from scratch in a way which manages complexity and makes people smile.
My mentor at my first big-boy job was a wizard at this and I credit him for showing me what is possible: spending more time thinking about what youâre trying to build rather than being bogged down by how you are trying to build it.
Have you used Rustâs type system in any fun and creative ways? Or done something similar in another language? Iâd love to hear your thoughts in the comments. Be well!
Subscribe & Save [on nothing]
If youâd like to get more posts like this directly to your inbox, you can subscribe here!
Work With Me
I mentor software engineers to navigate technical challenges and career growth in a supportive sometimes silly environment. If youâre interested, you can book a session.
Elsewhere
In addition to mentoring, I also write about my experience navigating self-employment and late-diagnosed autism. Less code and the same number of jokes.
- Lake-Effect Coffee, Chapter 3 - From Scratch dot org
Featured ones: