Logo

dev-resources.site

for different kinds of informations.

On What Lexers Do

Published at
2/22/2023
Categories
compilers
interpreters
Author
caramelomartins
Categories
2 categories in total
compilers
open
interpreters
open
Author
15 person written this
caramelomartins
open
On What Lexers Do

A couple of weeks ago, I’ve started reading Writing An Interpreter In Go. Reading this book means I’ll be building an interpreter for a language called monkey (you can follow my implementation). It is a topic that I’ve always been curious about but never actually took the plunge and decided to study it.

I’ve dabbled a little bit with it in university. But we never actually wrote any of the internal tools. We always used generators for the code and focused more on the concepts. Rest assured, I can’t remember most of it by now.

While reading this book, I reconnected with the basic concepts of how interpreters and compilers work. I’ve been taking notes and trying to expand my knowledge with other sources. I’ve always heard that to know something, you need to be able to explain it. I’m trying to do that here.

Lexers transform source code into symbols (or tokens). They execute what is called “lexical analysis”. From Writing an Interpreter in Go:

(…) from source code to tokens, is called “lexical analysis”, or “lexing” for short. It’s done by a lexer (…)

Wikipedia defines “lexical analysis” as “the process of converting a sequence of characters (…) into a sequence of lexical tokens (…)”. This aligns well with what we read in the book.

So, when we execute a lexer, we transform the source code (text) into a series of data structures (tokens). It is an integral component of writing interpreters (or compilers) as we need to transform source code into a representation that we understand and can process in later stages (e.g. parsing).

As an example, in Go, if we have the statement str := "string", we are in the presence of three tokens: an identifier token (str), an assignment token (:=) and a string token (string). Tokens are meaningful data structures , allowing us to approach source code as data structures, rather than dealing with text inside of interpreters.

Lexers also help us ignore elements of source code that aren’t relevant (e.g. whitespaces in some languages) and focus just on the elements that actually matter to us. After lexing, all irrelevant whitespace has been stripped off. For example, in C or Go, I wouldn’t expect whitespace to appear as tokens after lexing. In Python, where whitespace is relevant, I expect it to be taken into account in lexing.

One concept I was particularly interested in, as I read, was the fact that lexers need to look-ahead. This isn’t always done the same way, it depends on the approach used, but it is integral to lexing, to be able to resolve conflicts.

As an example, reading a “=“ sign might represent different things. It can be an equal sign (e.g. assignment) or it can be the start of a comparison (e.g “==“). Without looking ahead, it is impossible to resolve conflicts and accurately know what a given character represents in source code.

Lexical analysis is only but a small step, in the process of interpreting source code and its results, that will need to be fed into a parser, which will perform additional steps with the tokens.

It has been a good experience reading these initial sections of Writing An Interpreter In Go and finally feel I’m starting to understand the basics of these concepts.

interpreters Article's
30 articles in total
Favicon
Matanuska ADR 010 - Architecture, Revisited
Favicon
Matanuska ADR 009 - Type Awareness in The Compiler and Runtime
Favicon
Matanuska ADR 007 - Type Semantics for Primary Types
Favicon
Matanuska ADR 006 - Runtime Exit
Favicon
Matanuska ADR 002 - Architecture
Favicon
Matanuska ADR 003 - Recursive Descent Parser
Favicon
I'm Publishing Matanuska BASIC's ADRs
Favicon
Matanuska ADR 008 - Sigils
Favicon
Matanuska ADR 005 - Editor Operations
Favicon
Matanuska ADR 001 - Encoding Language
Favicon
Pratt Parsing in MiniScript
Favicon
Understanding Interpreters and Compilers in Programming
Favicon
Create Your Own Programming Language 9: Iteration
Favicon
Crafting Interpreters
Favicon
Create Your Own Programming Language 8: Conditionals
Favicon
Create Your Own Programming Language 7: More Types
Favicon
Create Your Own Programming Language 6: Functions
Favicon
Create Your Own Programming Language 4: Variables and Types
Favicon
Create Your Own Programming Language 3: Call Expressions
Favicon
How To Create Your Own Programming Language
Favicon
Create Your Own Programming Language 1: Numbers
Favicon
On What Lexers Do
Favicon
Starting the journey about creating a new programming language
Favicon
Designing and scoping my programming language Lithia
Favicon
Introduction to Programming - Compiler and Interpreter
Favicon
The Feral Programming Language
Favicon
The First Two Weeks: A Compiler Writing Journey
Favicon
Learning Compilers & Interpreters
Favicon
A Most Perfect Union: Just-In-Time Compilers
Favicon
A Deeper Inspection Into Compilation And Interpretation

Featured ones: