Logo

dev-resources.site

for different kinds of informations.

Pratt Parsing in MiniScript

Published at
2/14/2024
Categories
miniscript
minimicro
parsing
interpreters
Author
joestrout
Author
9 person written this
joestrout
open
Pratt Parsing in MiniScript

I recently came across a delightful blog post called Pratt Parsers: Expression Parsing Made Easy. It describes a technique for parsing known as Pratt parsing, or top-down operator precedence parsing.

There's no point in me repeating the explanation here, as the author (Bob Nystrom) does an excellent job already. But, he illustrates by building a parser in Java. Naturally I thought: why not do the same in MiniScript?

So, I've done just that. You can find the code on GitHub here: https://github.com/JoeStrout/bantam-miniscript

Code Overview

Since MiniScript is not Java and does not require one file per class, our code is organized a little differently. So let me just give you a quick orientation to what's what:

  • lexer.ms: a very simple lexer that breaks an input string into tokens. Note that Bantam does not support numbers; only named identifiers like "abc".
  • tokens.ms: defines the token types and a little token class (which contains the type and the actual token text).
  • precedence.ms: defines the nine different precedence levels needed by the Bantam language.
  • expressions.ms: defines the abstract syntax tree, i.e., the internal representation of expressions in our language. The goal of the parser is to convert text input into a little tree of objects defined in this file.
  • parselets.ms: defines the nine different little "parselet" classes needed (all deriving from one of two base classes, PrefixParselet or InfixParselet). There is one of these for all prefix operators (like the "-" in "-a"), one for all binary operators (like "+" in "a+b"), and the rest for handling special cases like assignment, grouping (parentheses), a simple identifier name, etc.
  • parser.ms: the base class for any language parser. The guts of the Pratt algorithm are in the parseExpression method, which we'll discuss a little more below.
  • bantamParser.ms: customizes Parser for the Bantam language. You can easily see here what parselet is used (and with what precedence) based on the type of the next token to be parsed.
  • main.ms: finally, the main program just runs a bunch of tests; my version also provides an "interact" method that shows the parsing for any expression you type in.

Heart of the Pratt parser

The core of the algorithm is this method in parser.ms:

Parser.parseExpression = function(precedence = 0)
    token = self.consume
    prefix = self.prefixParselets.get(token.type)
    if not prefix then
        self.throwError "Could not parse """ + token.text + """."
        return
    end if

    left = prefix.parse(self, token)

    while precedence < self.getPrecedence and not self.error
        token = self.consume

        infix = self.infixParselets.get(token.type)
        left = infix.parse(self, left, token)
    end while
    return left
end function
Enter fullscreen mode Exit fullscreen mode

Not too bad, is it? The first line grabs the next token to be parsed, and tries to find a matching prefix parselet. (The prefix parselets handle anything that can start an expression, like a prefix operator or a simple identifier.) We then call on that prefix to actually parse, and store the result in local variable left.

Then, we have a little loop that continues as long as the precedence of the next parser we would apply (which is found by self.getPrecedence) is greater than our current precedence. The loop gets the appropriate infix (i.e. not starting an expression) parselet, tells it to parse, and makes the result the new left.

And that's it. This simple loop, along with an appropriate set of parselets, does all the magic.

Could it be any simpler?

...Actually, I suspect it could. The algorithm itself is simple, but as written (a fairly direct port of the Java code), it still feels like a lot of overhead and bureaucracy to do a relatively simple job.

So, maybe next week I'll follow up with a clean-sheet rewrite of the same algorithm, but taking a simpler, more MiniScript-y approach.

What do you think? Let me know in the comments below!

interpreters Article's
30 articles in total
Favicon
Matanuska ADR 010 - Architecture, Revisited
Favicon
Matanuska ADR 009 - Type Awareness in The Compiler and Runtime
Favicon
Matanuska ADR 007 - Type Semantics for Primary Types
Favicon
Matanuska ADR 006 - Runtime Exit
Favicon
Matanuska ADR 002 - Architecture
Favicon
Matanuska ADR 003 - Recursive Descent Parser
Favicon
I'm Publishing Matanuska BASIC's ADRs
Favicon
Matanuska ADR 008 - Sigils
Favicon
Matanuska ADR 005 - Editor Operations
Favicon
Matanuska ADR 001 - Encoding Language
Favicon
Pratt Parsing in MiniScript
Favicon
Understanding Interpreters and Compilers in Programming
Favicon
Create Your Own Programming Language 9: Iteration
Favicon
Crafting Interpreters
Favicon
Create Your Own Programming Language 8: Conditionals
Favicon
Create Your Own Programming Language 7: More Types
Favicon
Create Your Own Programming Language 6: Functions
Favicon
Create Your Own Programming Language 4: Variables and Types
Favicon
Create Your Own Programming Language 3: Call Expressions
Favicon
How To Create Your Own Programming Language
Favicon
Create Your Own Programming Language 1: Numbers
Favicon
On What Lexers Do
Favicon
Starting the journey about creating a new programming language
Favicon
Designing and scoping my programming language Lithia
Favicon
Introduction to Programming - Compiler and Interpreter
Favicon
The Feral Programming Language
Favicon
The First Two Weeks: A Compiler Writing Journey
Favicon
Learning Compilers & Interpreters
Favicon
A Most Perfect Union: Just-In-Time Compilers
Favicon
A Deeper Inspection Into Compilation And Interpretation

Featured ones: