dev-resources.site

On What Lexers Do

Published at 2/22/2023
Categories: compilers, interpreters
Author: caramelomartins

A couple of weeks ago, I started reading Writing An Interpreter In Go. Reading this book means I’ll be building an interpreter for a language called Monkey (you can follow my implementation). It’s a topic I’ve always been curious about but never actually took the plunge to study.

I dabbled with it a little in university, but we never actually wrote any of the internal tools: we always used generators for the code and focused more on the concepts. Rest assured, I can’t remember most of it by now.

While reading this book, I reconnected with the basic concepts of how interpreters and compilers work. I’ve been taking notes and trying to expand my knowledge with other sources. I’ve always heard that to know something, you need to be able to explain it. I’m trying to do that here.

Lexers transform source code into symbols (or tokens). They execute what is called “lexical analysis”. From Writing an Interpreter in Go:

(…) from source code to tokens, is called “lexical analysis”, or “lexing” for short. It’s done by a lexer (…)

Wikipedia defines “lexical analysis” as “the process of converting a sequence of characters (…) into a sequence of lexical tokens (…)”. This aligns well with what we read in the book.

So, when we execute a lexer, we transform the source code (text) into a series of data structures (tokens). It is an integral component of writing interpreters (or compilers) as we need to transform source code into a representation that we understand and can process in later stages (e.g. parsing).

As an example, in Go, the statement str := "string" contains three tokens: an identifier token (str), an assignment token (:=) and a string token (string). Tokens are meaningful data structures, allowing us to approach source code as data rather than dealing with raw text inside of interpreters.
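To make that concrete, here is a minimal sketch of a lexer that produces exactly those three tokens. This is my own toy example, not the Monkey lexer from the book; the token type names (IDENT, ASSIGN, STRING) are arbitrary labels I chose.

```go
package main

import (
	"fmt"
	"unicode"
)

// Token pairs a token type with the literal text it was read from.
type Token struct {
	Type    string
	Literal string
}

// lex walks the input once and emits tokens, skipping spaces.
// A simplified sketch: it only knows identifiers, ":=" and string literals.
func lex(input string) []Token {
	var tokens []Token
	runes := []rune(input)
	for i := 0; i < len(runes); {
		switch c := runes[i]; {
		case unicode.IsSpace(c):
			i++ // whitespace carries no meaning here, so skip it
		case c == ':' && i+1 < len(runes) && runes[i+1] == '=':
			tokens = append(tokens, Token{"ASSIGN", ":="})
			i += 2
		case c == '"':
			j := i + 1
			for j < len(runes) && runes[j] != '"' {
				j++
			}
			tokens = append(tokens, Token{"STRING", string(runes[i+1 : j])})
			i = j + 1
		case unicode.IsLetter(c):
			j := i
			for j < len(runes) && unicode.IsLetter(runes[j]) {
				j++
			}
			tokens = append(tokens, Token{"IDENT", string(runes[i:j])})
			i = j
		default:
			tokens = append(tokens, Token{"ILLEGAL", string(c)})
			i++
		}
	}
	return tokens
}

func main() {
	for _, tok := range lex(`str := "string"`) {
		fmt.Printf("%-7s %q\n", tok.Type, tok.Literal)
	}
	// IDENT   "str"
	// ASSIGN  ":="
	// STRING  "string"
}
```

The parser never sees the raw text again; from here on it works only with the Token values.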

Lexers also help us ignore elements of source code that aren’t relevant (e.g. whitespace in some languages) and focus only on the elements that actually matter to us. After lexing, all irrelevant whitespace has been stripped out. For example, in C or Go, I wouldn’t expect whitespace to appear as tokens after lexing. In Python, where whitespace is significant, I’d expect it to be taken into account during lexing.
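A quick way to see this is to lex two versions of the same statement with different spacing and confirm that the token streams come out identical. This is a toy grammar of my own (single-letter identifiers and "=" signs), just to illustrate the point.

```go
package main

import "fmt"

// Token is a minimal (type, literal) pair.
type Token struct{ Type, Literal string }

// lexIdentsAndEquals tokenizes a toy grammar of single-character
// identifiers and '=' signs, discarding spaces and tabs entirely,
// to show that spacing never reaches the token stream.
func lexIdentsAndEquals(input string) []Token {
	var toks []Token
	for _, c := range input {
		switch {
		case c == ' ' || c == '\t':
			// ignored: whitespace is not significant in this toy language
		case c == '=':
			toks = append(toks, Token{"EQUALS", "="})
		default:
			toks = append(toks, Token{"IDENT", string(c)})
		}
	}
	return toks
}

func main() {
	a := lexIdentsAndEquals("a=b")
	b := lexIdentsAndEquals("a  =\t b")
	fmt.Println(a)
	fmt.Println(b) // same three tokens: the spacing has been stripped away
}
```

A Python-style lexer would instead have to emit indentation tokens (INDENT/DEDENT) at this stage, because there the whitespace carries meaning.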

One concept I was particularly interested in, as I read, was the fact that lexers need to look ahead. The mechanics vary with the approach used, but looking ahead is integral to lexing, because it is how a lexer resolves conflicts.

As an example, reading a "=" sign might represent different things: it can be an assignment, or it can be the start of a comparison (e.g. "=="). Without looking ahead, it is impossible to resolve these conflicts and know accurately what a given character represents in the source code.
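In code, one-character look-ahead is just peeking at the next position before committing to a token. Here is a minimal sketch of that idea, with my own made-up token labels rather than anything from the book:

```go
package main

import "fmt"

// nextToken reads one token starting at position i and returns the
// token plus the position just past it. Peeking one rune ahead is
// what lets us tell '=' apart from '=='.
func nextToken(input []rune, i int) (string, int) {
	switch {
	case input[i] == '=' && i+1 < len(input) && input[i+1] == '=':
		return "EQ(==)", i + 2 // look-ahead resolved the conflict
	case input[i] == '=':
		return "ASSIGN(=)", i + 1
	default:
		return "CHAR(" + string(input[i]) + ")", i + 1
	}
}

func main() {
	src := []rune("a==b")
	for i := 0; i < len(src); {
		var tok string
		tok, i = nextToken(src, i)
		fmt.Println(tok)
	}
	// CHAR(a)
	// EQ(==)
	// CHAR(b)
}
```

Note that when it reads the first "=", the lexer cannot decide anything yet; only the peek at the following character settles which token it is, and the read position then advances past everything the token consumed.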

Lexical analysis is but a small step in the process of interpreting source code, and its results need to be fed into a parser, which performs additional steps with the tokens.

It has been a good experience reading these initial sections of Writing An Interpreter In Go, and I finally feel I’m starting to understand the basics of these concepts.
