How I added support for nested functions in Python bytecode
I wanted to share some pretty cool stuff I've been learning about Python bytecode, including how I added support for nested functions, but my guy at the printing press said I needed to keep it under 500 words.
It's a holiday week, he shrugged. What do you expect me to do?
Excluding code snippets, I bargained.
Fine, he conceded.
Do you know why we use bytecode in the first place?
I just operate the printing press; I trust you, though.
Fair enough. Let's begin.
Why we use bytecode in the first place
Memphis, my Python interpreter written in Rust, has two execution engines. Neither can run all code but both can run some code.
My treewalk interpreter is what you would build if you didn't know what you were doing. 🤷‍♂️ You tokenize the input Python code, generate an abstract syntax tree (AST), and then walk the tree and evaluate each node. Expressions return values and statements modify the symbol table, which is implemented as a series of scopes that respect Python scoping rules. Just remember the easy mnemonic LEGB: local, enclosing, global, builtin.
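As a quick illustration in CPython itself (with throwaway names of my choosing), each scope in the LEGB chain can shadow the one outside it:

```python
x = "global"              # G: module scope

def outer():
    x = "enclosing"       # E: enclosing function scope
    def inner():
        x = "local"       # L: innermost scope wins
        return x
    return inner()

print(outer())            # local
print(x)                  # global, untouched by the inner assignments
print(len("builtin"))     # len itself resolves in the builtin scope (B)
```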
My bytecode VM is what you would build if you didn't know what you were doing but wanted to act like you did. Also 🤷‍♂️. For this engine, the tokens and AST work the same, but rather than walking we take off sprinting. We compile the AST into an intermediate representation (IR), hereafter known as bytecode. We then create a stack-based virtual machine (VM), which conceptually acts like a CPU, executing bytecode instructions in sequence, but is implemented entirely in software.
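Memphis defines its own bytecode, but CPython's dis module gives a feel for what a compiled IR looks like (the exact opcode names vary across CPython versions):

```python
import dis

def add_one(n):
    return n + 1

# Each instruction operates on an implicit value stack:
# load n, load the constant 1, add, return.
for instr in dis.Bytecode(add_one):
    print(instr.opname, instr.argrepr)
```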
(For a complete guide of both approaches without the ramblings, Crafting Interpreters is excellent.)
Why do we do this in the first place? Just remember the two Ps: portability and performance. Remember how in the early 2000s nobody would shut up about how Java bytecode was portable? All you need is a JVM and you can run a Java program compiled on any machine! Python chose not to go with this approach for both technical and marketing reasons, but in theory the same principles apply. (In practice, the compilation steps are different and I regret opening this can of worms.)
Performance is the biggie though. Rather than traversing an AST multiple times during the lifetime of a program, the compiled IR is a more efficient representation: we avoid the overhead of repeated tree walks, and the IR's flat structure often results in better branch prediction and cache locality at runtime.
(I don't blame you for not thinking about caching if you don't have a background in computer architecture; heck, I began my career in that industry and I think about caching far less than I think about how to avoid writing the same line of code twice. So just trust me on the performance piece. That's my leadership style: blind trust.)
Hey buddy, that's 500 words. We need to load up the frame and let 'er rip.
Already?! You excluded code snippets?
There are no code snippets, my man.
Okay okay. Just 500 more. I promise.
Context matters for Python variables
I got kinda far before tabling my bytecode VM implementation about a year ago: I could define Python functions and classes and call those functions and instantiate those classes. I clamped down this behavior with some tests. But I knew my implementation was messy and that I'd need to revisit the fundamentals before adding more fun stuff. Now it's Christmas week and I want to add fun stuff.
Consider this snippet for calling a function, keeping an eye on the TODO.
fn compile_function_call(
    &mut self,
    name: &str,
    args: &ParsedArguments,
) -> Result<Bytecode, CompileError> {
    let mut opcodes = vec![];
    // We push the args onto the stack in reverse call order so that we will pop
    // them off in call order.
    for arg in args.args.iter().rev() {
        opcodes.extend(self.compile_expr(arg)?);
    }
    let (_, index) = self.get_local_index(name);
    // TODO how does this know if it is a global or local index? this may not be the right
    // approach for calling a function
    opcodes.push(Opcode::Call(index));
    Ok(opcodes)
}
Are you done considering? We load the function arguments onto the stack and "call the function". In bytecode, all names are converted into indices (because index access is faster during the VM runtime), but we don't really have a way to know whether we are dealing with a local index or a global index here.
Now consider the improved version.
fn compile_function_call(
    &mut self,
    name: &str,
    args: &ParsedArguments,
) -> Result<Bytecode, CompileError> {
    let mut opcodes = vec![self.compile_load(name)];
    // We push the args onto the stack in reverse call order so that we will pop
    // them off in call order.
    for arg in args.args.iter().rev() {
        opcodes.extend(self.compile_expr(arg)?);
    }
    // Count the arguments themselves, not the opcodes: a single argument
    // expression may compile to multiple opcodes.
    let argc = args.args.len();
    opcodes.push(Opcode::Call(argc));
    Ok(opcodes)
}
Thank you for considering that code.
We now support nested function calls! What changed?
- The Call opcode now takes the number of positional arguments, rather than an index to the function. This instructs the VM how many arguments to pop off the stack before calling the function.
- After popping the arguments off the stack, the function itself will be left on the stack, and compile_load has already handled local versus global scope for us.
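For comparison, CPython's call opcodes made the same design choice: the callee sits on the stack beneath its arguments, and the call instruction carries an argument count (CALL_FUNCTION before 3.11, CALL after). A quick sketch with dis:

```python
import dis

def call_it(f, x):
    return f(x)

# Some CALL* opcode appears after the loads of f and x; its argument
# is the count of positional arguments to pop.
opnames = [instr.opname for instr in dis.Bytecode(call_it)]
print(opnames)
```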
LOAD_GLOBAL versus LOAD_FAST
Let's take a look at what compile_load is doing.
fn compile_load(&mut self, name: &str) -> Opcode {
    match self.ensure_context() {
        Context::Global => Opcode::LoadGlobal(self.get_or_set_nonlocal_index(name)),
        Context::Local => {
            // Check locals first
            if let Some(index) = self.get_local_index(name) {
                return Opcode::LoadFast(index);
            }
            // If not found locally, fall back to globals
            Opcode::LoadGlobal(self.get_or_set_nonlocal_index(name))
        }
    }
}
There are several key principles in action here:
- We match based on the current context. Adhering to Python semantics, we can consider Context::Global to be the top level of any module (not just your script's entrypoint), and Context::Local to be inside any block (i.e. a function definition or class definition).
- We now differentiate between a local index and a non-local index. (Because I was going crazy trying to decipher what the index 0 referred to in different places, I introduced typed integers. LocalIndex and NonlocalIndex provide type safety for otherwise untyped unsigned integers. I may write about this in the future!)
- We can tell at bytecode-compilation time whether a local variable exists with a given name, and if it does not, at runtime we will search for a global variable. This speaks to the dynamism built into Python: as long as a variable is present in the global scope of that module by the time a function executes, its value can be resolved at runtime. However, this dynamic resolution comes with a performance hit. While local variable lookups are optimized to use stack indices, global lookups require searching the global namespace dictionary, which is slower. This dictionary is a mapping of names to objects, which themselves may live on the heap. Who knew that the saying "Think globally, act locally" was actually referring to Python scopes?
What's in a varname?
The last thing I'll leave you with today is a peek into how these variable names are mapped. In the code snippet below, you'll notice that local indices are found in code.varnames and nonlocal indices are found in code.names. Both live on a CodeObject, which contains the metadata for a block of Python bytecode, including its variable and name mappings.
fn get_or_set_local_index(&mut self, name: &str) -> LocalIndex {
    if let Some(index) = self.get_local_index(name) {
        index
    } else {
        let code = self.ensure_code_object_mut();
        let new_index = code.varnames.len();
        code.varnames.push(name.to_string());
        Index::new(new_index)
    }
}

fn get_local_index(&self, name: &str) -> Option<LocalIndex> {
    let code = self.ensure_code_object();
    find_index(&code.varnames, name).map(Index::new)
}

fn get_or_set_nonlocal_index(&mut self, name: &str) -> NonlocalIndex {
    let code = self.ensure_code_object_mut();
    if let Some(index) = find_index(&code.names, name) {
        Index::new(index)
    } else {
        let new_index = code.names.len();
        code.names.push(name.to_string());
        Index::new(new_index)
    }
}
The difference between varnames and names tormented me for weeks (CPython calls these co_varnames and co_names), but it's actually fairly straightforward: varnames holds the names of all local variables in a given scope, and names does the same for all nonlocal names.
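You can poke at the CPython equivalents directly: on any code object, locals land in co_varnames and anything resolved non-locally lands in co_names (f below is just a demo function):

```python
def f(a):
    b = a + 1            # a and b are locals -> co_varnames
    return len(str(b))   # len and str are nonlocal names -> co_names

print(f.__code__.co_varnames)  # ('a', 'b')
print(f.__code__.co_names)     # ('len', 'str')
```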
Once we properly track this, everything else just works. At runtime, the VM sees a LOAD_GLOBAL or a LOAD_FAST and knows to look in the global namespace dictionary or the local stack, respectively.
Buddy! Mr Gutenberg is on the phone and says we can hold the presses no longer.
Okay! Fine! I get it! Let's ship it. 🚢
What's next for Memphis?
Shh! The printing press man doesn't know I'm writing a conclusion, so I will be brief.
With variable scoping and function calls in a solid place, I'm gradually turning my attention to features like stack traces and async support. If you've enjoyed this dive into bytecode or have questions about building your own interpreter, I'd love to hear from you, so drop a comment!
Subscribe & Save [on nothing]
If you'd like to get more posts like this directly to your inbox, you can subscribe here!
Work With Me
I mentor software engineers to navigate technical challenges and career growth in a supportive, sometimes silly environment. If you're interested, you can book a session here.
Elsewhere
In addition to mentoring, I also write about my experience navigating self-employment and late-diagnosed autism. Less code and the same number of jokes.
- Lake-Effect Coffee, Chapter 2 - From Scratch dot org