dev-resources.site
for different kinds of informations.
Create Your Own Programming Language 4: Variables and Types
Wanda has come a long way since we started. We have the ability to call functions and work with basic data types. The reader also works with linked lists like a real Lisp.
Now it's time to add variables to the language.
In this article we'll add 3 new syntactic forms: variable declarations, set expressions, and do blocks.
Variable declarations are exactly what they sound like. Set expressions allow you to change the value of a variable. Do blocks are a list of expressions you can use when you want to fit multiple expressions into a space made for just a single expression.
The Perils of Mutable State
Note that by introducing both variable declarations and a means to modify the value of variables we're opening the door to having mutable state in our programs. Mutable state means values your program relies on to calculate its result that can change throughout the course of the program.
I've decided to allow mutable state in Wanda as a kind of "escape hatch" for when it's really necessary, but the reality is it's actually rare that you really need mutable state. It's even more rare that using mutable state is actually a good idea.
By avoiding mutable state in your programs you go a long way towards making your programs easier to reason about and understand. Just because I'm giving us set expressions in Wanda does not mean it's a good idea to actually use them unless it's absolutely necessary.
Ok, end rant.
Semantic Analysis
We'll also dip our toes into the water of semantic analysis and start working on a proper type checker for Wanda.1
Wanda is a gradually typed language, which means you can use it either like a statically typed language or a dynamically typed one. With type annotations, you turn on the type checker for an extra layer of safety. If you use the language without annotations, you can work with it just like any other dynamically typed language; the type checker won't get in your way.
When we're done, Wanda will have a very expressive type system including records, objects and classes, algebraic data types, and generics.
As always, if you haven't read the previous post on call expressions you should go ahead and do that first. Really, it's ok. I'll wait right here for you.
Ok, ready? Awesome, let's go!
Fixing A Mistake from Last Time
First, we need to undo something we did in the last article. I didn't think through things well enough, and I neglected to consider the impact it would have to wrap the compiled output of a Program
node in an immediately-invoked function expression (IIFE).
The problem is when we try to declare variables: declaring a variable inside an IIFE makes the variable local to that function, which means we can never use it again. That's obviously not what we want.
So open src/emitter/Emitter.js
and modify the emitProgram
method like this:
emitProgram(node, ns) {
let code = "";
for (let n of node.body) {
code += `${this.emit(n, ns)};\n`;
}
return code;
}
That takes care of the problem.
What We Don't Need To Change Today
The lexer and reader have occupied a large amount of our time in the first few articles in this series, but for today's article we can leave them just as they are.
What We Do Need To Change Today
We'll make some changes to the parser to accommodate the new syntactic forms and to parse type annotations. We'll need to update the emitter to handle the new forms, and we'll also need to implement the guts of the type checker.
The type checker will take the most time and effort to implement, so we'll save it for last.
We'll also update the CLI so the type checker is part of our compilation pipeline and allow multiline input in the REPL. A common convention with do blocks is to put each expression on its own line, which is hard to do when all your input has to be on the same line in the REPL.
Updates To The Parser
The first big update to the parser we're going to make is to change a property on all the AST nodes.
Of Types and Kinds
I've decided that since we're doing type checking in the compiler it makes more sense to rename the type
property on AST nodes to kind
and save things related to the word "type" for type checking.
In src/parser/ast.js
update the node creators like so:
export const AST = {
Program(exprs) {
return {
kind: ASTTypes.Program,
body: exprs,
srcloc: exprs[0]?.srcloc ?? SrcLoc.new(0, 0, 0, "none"),
};
},
NumberLiteral(token) {
return {
kind: ASTTypes.NumberLiteral,
value: token.value,
srcloc: token.srcloc,
};
},
StringLiteral(token) {
return {
kind: ASTTypes.StringLiteral,
value: token.value,
srcloc: token.srcloc,
};
},
BooleanLiteral(token) {
return {
kind: ASTTypes.BooleanLiteral,
value: token.value,
srcloc: token.srcloc,
};
},
KeywordLiteral(token) {
return {
kind: ASTTypes.KeywordLiteral,
value: token.value,
srcloc: token.srcloc,
};
},
NilLiteral(token) {
return {
kind: ASTTypes.NilLiteral,
value: token.value,
srcloc: token.srcloc,
};
},
Symbol(token) {
return {
kind: ASTTypes.Symbol,
name: token.value,
srcloc: token.srcloc,
};
},
CallExpression(func, args, srcloc) {
return {
kind: ASTTypes.CallExpression,
func,
args,
srcloc,
};
},
};
In src/emitter/Emitter.js
switch on node.kind
instead of node.type
in the emit
method:
emit(node = this.program, ns = this.ns) {
switch (node.kind) {
// and so on...
Then, also in the emit
method, in the default block change the first argument to SyntaxException
from node.type
to node.kind
.
Finally, in src/printer/printAST.js
change all references to node.type
to node.kind
including in the print
and printPrimitive
methods.
Parsing The New Forms
Variable declarations, set expressions, and do blocks have the following syntax:
(var x 10)
(set! x 15)
(do
(var a 10)
(println "About to increment a")
(+ a 1))
Set expressions use the set!
keyword, with the exclamation point as an indicator that this form mutates a value.
You can also use a type annotation with a variable declaration to turn on the type checker:
(var (x : number) 10)
Note that there must be a space both between the variable name and the colon and the colon and the type. That's to prevent the lexer from reading the colon as either part of the variable name or the beginning of a keyword.
First, let's add the AST node types to the node constructors object. In src/parser/ast.js
:
export const AST = {
// existing constructors...
VariableDeclaration(lhv, expression, srcloc, typeAnnotation = null) {
return {
kind: ASTTypes.VariableDeclaration,
lhv,
expression,
srcloc,
typeAnnotation,
};
},
SetExpression(lhv, expression, srcloc) {
return {
kind: ASTTypes.SetExpression,
lhv,
expression,
srcloc,
};
},
DoExpression(body, srcloc) {
return {
kind: ASTTypes.DoExpression,
body,
srcloc,
};
},
};
We're just going to stub a function for parsing type annotations for right now; we'll come back to it when it's time to work on the type checker. In src/parser/parseTypeAnnotation.js
add this stub:
export const parseTypeAnnotation = (annotation) => {};
Now we get to add our first non-call expression forms to the parseList
function. First, let's write the parse functions for variable declarations, set expressions, and do blocks.
For variable declarations and set expressions, we'll need left hand values (LHVs) and initializer expressions. For do blocks, we'll just need a list of expressions.
I've gone ahead and included everything you'll need in this file to parse type annotations. Note that we're skipping over the colon and converting the tail of the Cons
list to a JavaScript array.
In src/parser/parse.js
:
const parseVariableDeclaration = (decl) => {
let [_, lhv, expression] = decl;
let parsedLhv,
typeAnnotation = null;
if (lhv instanceof Cons) {
// has type annotation
const realLhv = lhv.get(0);
// convert to array and get rid of ":" when passing into parseTypeAnnotation
typeAnnotation = parseTypeAnnotation([...lhv.cdr].slice(1));
parsedLhv = parseExpr(realLhv);
} else {
parsedLhv = parseExpr(lhv);
}
const parsedExpression = parseExpr(expression);
return AST.VariableDeclaration(
parsedLhv,
parsedExpression,
decl.srcloc,
typeAnnotation
);
};
const parseSetExpression = (expr) => {
const [_, lhv, expression] = expr;
const parsedLhv = parseExpr(lhv);
const parsedExpression = parseExpr(expression);
return AST.SetExpression(parsedLhv, parsedExpression, expr.srcloc);
};
const parseDoExpression = (expr) => {
const [_, ...exprs] = expr;
let body = [];
for (let ex of exprs) {
body.push(parseExpr(ex));
}
return AST.DoExpression(body, expr.srcloc);
};
Now, in the parseList
function, add the following 3 cases to the switch
statement:
switch (first.value) {
case "var":
return parseVariableDeclaration(form);
case "set!":
return parseSetExpression(form);
case "do":
return parseDoExpression(form);
// default case...
That's it for the parser until we come back to parse type annotations. We'll also add parsing type aliases, which is super simple.
Changes To The Emitter
Obviously we need to add methods for each of the 3 new AST nodes to the emitter, as well as adding clauses to the switch
statement in the emit
method.
We're also going to make it so it does a little bit of semantic checking in the emitter.
We're going to make the emitter check to see if a variable has been accessed prior to declaration in its scope, instead of adding a separate pass to the semantic analysis phase.
This isn't necessarily the best way to do it. Ideally, we would create a separate visitor to check that or do it in the type checker. However, it's the simplest way to do it in our case. Doing it like this prevents us from having to implement 2 passes during semantic analysis (and thus from having to maintain 2 environments in the REPL) or make the type checker more complicated.
First, let's update the imports for the emitter in src/emitter/Emitter.js
:
import { makeSymbol } from "../runtime/makeSymbol.js";
Now let's write the 3 emitter methods emitDoExpression
, emitSetExpression
, and emitVariableDeclaration
:
emitDoExpression(node, ns) {
const childNs = ns.extend("doExpression");
let code = "(() => {";
let i = 0;
for (let expr of node.body) {
if (i === node.body.length - 1) {
code += "return " + this.emit(expr, childNs) + "\n";
} else {
code += this.emit(expr, childNs) + "\n";
}
i++;
}
code += "})();";
return code;
}
// more methods...
emitSetExpression(node, ns) {
return `${this.emit(node.lhv, ns)} = ${this.emit(node.expression, ns)};`;
}
// more methods...
emitVariableDeclaration(node, ns) {
const name = node.lhv.name;
if (ns.has(name)) {
throw new Exception(
`Name ${name} defined at ${node.srcloc.file} ${node.srcloc.line}:${node.srcloc.col} has already been accessed in the current namespace; cannot access identifier before defining it`
);
}
const translatedName = makeSymbol(name);
ns.set(name, translatedName);
return `var ${translatedName} = ${this.emit(node.expression, ns)};`;
}
Then in the switch
statement in the emit
method add the following cases just above the default
case:
case ASTTypes.VariableDeclaration:
return this.emitVariableDeclaration(node, ns);
case ASTTypes.SetExpression:
return this.emitSetExpression(node, ns);
case ASTTypes.DoExpression:
return this.emitDoExpression(node, ns);
Finally in the emitSymbol
method, just before the return
statement, add:
// To make sure when compiling a variable definition the variable name hasn't already been accessed in the same scope
if (!ns.has(name)) {
ns.set(name, emittedName);
}
Changes to The AST Printer
Finally, let's wrap this up by making it so the AST Printer can print do blocks, variable declarations, and set expressions.
In src/printer/printAST.js
, add these cases to the switch
statement in the print
method:
// additional cases
case ASTTypes.VariableDeclaration:
return this.printVariableDeclaration(node, indent);
case ASTTypes.SetExpression:
return this.printSetExpression(node, indent);
case ASTTypes.DoExpression:
return this.printDoExpression(node, indent);
// default case
Then add the methods printDoExpression
, printSetExpression
, and printVariableDeclaration
:
printDoExpression(node, indent) {
let prStr = `${prIndent(indent)}DoExpression:\n`;
let i = 0;
for (let expr of node.body) {
prStr += `${this.print(expr, indent + 2)}`;
prStr += i === node.body.length - 1 ? "" : "\n";
i++;
}
return prStr;
}
printSetExpression(node, indent) {
let prStr = `${prIndent(indent)}SetExpression:\n`;
prStr += `${this.print(node.lhv, indent + 2)}\n`;
prStr += `${this.print(node.expression, indent + 2)}`;
return prStr;
}
printVariableDeclaration(node, indent) {
let prStr = `${prIndent(indent)}VariableDeclaration:\n`;
prStr += `${this.print(node.lhv, indent + 2)}\n`;
prStr += `${this.print(node.expression, indent + 2)}`;
return prStr;
}
Believe it or not, that's all it takes to add variable declarations, set expressions, and do blocks to Wanda! We'll make some changes to the CLI later to include multiline input in the REPL, but first let's turn our attention to the type checker.
Parsing and The Type Checker
The first thing we need to do is parse type annotations. We stubbed out the function earlier, but now we're going to fill it in.
We're also going to add type aliases to Wanda. This allows us to give names to types. This can make things more expressive for programmers. Let's say, for instance, that you have a function where age is represented by a number. If you want to refer to a value as age instead of number, just create a type alias:
(type age number)
This is a simple example, but when types get more complex this feature will be more useful than it currently appears.
Let's look at src/parser/parseTypeAnnotation.js
.
First, we'll construct an enum of Type Annotation Types:
export const TATypes = {
NumberLiteral: "NumberLiteral",
StringLiteral: "StringLiteral",
BooleanLiteral: "BooleanLiteral",
KeywordLiteral: "KeywordLiteral",
NilLiteral: "NilLiteral",
Symbol: "Symbol",
List: "List",
};
If an annotation is coming into the function as a Cons
list we're just going to convert it to an array for simplicity's sake.
Then, if an annotation is an array, we'll pluck the 1st item for testing. Otherwise, the annotation is a simple form and we'll just use it as is.
Currently, the only values we have for testing are Token
s of type "Symbol." So we'll switch
on the token's value
property to see what kind of annotation it is.
The annotation will be either a literal type or a named type. Literal types include primitives like number, as well as lists and type aliases. A list has a list type, for example it could be list (string)
with string as the list type.
Here's the parseTypeAnnotation
function:
export const parseTypeAnnotation = (annotation) => {
let annot;
if (annotation instanceof Cons) {
annotation = [...annotation];
}
if (Array.isArray(annotation)) {
annot = annotation[0];
} else {
annot = annotation;
}
if (annot.type === TokenTypes.Symbol) {
switch (annot.value) {
case "number":
return { kind: TATypes.NumberLiteral };
case "string":
return { kind: TATypes.StringLiteral };
case "boolean":
return { kind: TATypes.BooleanLiteral };
case "keyword":
return { kind: TATypes.KeywordLiteral };
case "nil":
return { kind: TATypes.NilLiteral };
case "list":
// annotation is array with listType as 2nd member
// i.e. list (string) annotation == [list, string]
return parseListAnnotation(annotation[1]);
default:
// must be a named type
return { kind: TATypes.Symbol, name: annot.value };
}
}
};
The parseListAnnotation
function is simple:
const parseListAnnotation = (type) => {
const listType = parseTypeAnnotation(type);
return { kind: TATypes.List, listType };
};
We'll add one more node to the parser for type aliases. In src/parser/ast.js
add a member to the ASTTypes
enum:
export const ASTTypes = {
// members...
TypeAlias: "TypeAlias",
};
Then add the TypeAlias
node constructor:
export const AST = {
// additional constructors...
TypeAlias(name, type, srcloc) {
return {
kind: ASTTypes.TypeAlias,
name,
type,
srcloc,
};
},
};
Add a parseTypeAlias
function to src/parser/parse.js
:
const parseTypeAlias = (form) => {
let [_, name, type] = form;
name = name.value;
const parsedType = parseTypeAnnotation(type);
return AST.TypeAlias(name, parsedType, form.srcloc);
};
And add the following case to the switch
statement in parseList
:
// additional cases...
case "type":
return parseTypeAlias(form);
// default case
Now the parser is ready to handle type annotations and type aliases!
Emitting and Printing Type Aliases
The emitter and AST printer methods for type aliases both just return empty strings. JavaScript has no equivalent for a type alias, so there's no code to emit.
In src/emitter/Emitter.js
add the type alias case to the switch
in the emit
method:
// other cases
case ASTTypes.TypeAlias:
return this.emitTypeAlias(node, ns);
// default case
Then the emitTypeAlias
method:
emitTypeAlias(node, ns) {
return "";
}
Then do the same in the switch
statement in the print
method in src/printer/printAST.js
:
// other cases...
case ASTTypes.TypeAlias:
return this.printTypeAlias(node, indent);
// default case
And the printTypeAlias
method:
printTypeAlias(node, indent) {
return "";
}
Implementing The Type Checker
Here's a high-level overview of how the type checker works.
When the checker encounters an expression, it first tries to infer a type from the expression. If the expression uses a type annotation, it checks the inferred type against the annotated type to make sure the inferred type is a proper subtype of the annotated type. If it is, then we're all good. Otherwise, it throws an Exception
.
The presence of a single type annotation will turn on checking for the entire program. A more sophisticated gradual type checker might only check variables and functions that were specifically annotated and leave the rest alone, but that would be much more complicated.
Types, Constructors, and Validators
The first thing we need to do is define our basic types. Then we'll create constructors and validators for the types so we can do the work of checking and returning types from the inference and checking functions.
Here's the src/typechecker/types.js
file, which contains an enum of TypeTypes
as well as JSDoc definitions for different type objects:
/**
* @enum {string}
*/
export const TypeTypes = {
Any: "Any",
Number: "Number",
String: "String",
Boolean: "Boolean",
Keyword: "Keyword",
Nil: "Nil",
FunctionType: "FunctionType",
TypeAlias: "TypeAlias",
List: "List",
};
/**
* @typedef Any
* @prop {TypeTypes.Any} kind
*/
/**
* @typedef Number
* @prop {TypeTypes.Number} kind
*/
/**
* @typedef String
* @prop {TypeTypes.String} kind
*/
/**
* @typedef Boolean
* @prop {TypeTypes.Boolean} kind
*/
/**
* @typedef Keyword
* @prop {TypeTypes.Keyword} kind
*/
/**
* @typedef Nil
* @prop {TypeTypes.Nil} kind
*/
/**
* @typedef TypeAlias
* @prop {TypeTypes.TypeAlias} kind
* @prop {string} name
* @prop {Type} base
*/
/**
* @typedef FunctionType
* @prop {TypeTypes.FunctionType} kind
* @prop {Type[]} params
* @prop {Type} ret
* @prop {boolean} variadic
*/
/**
* @typedef List
* @prop {TypeTypes.List} kind
* @prop {Type} listType
*/
/**
* @typedef {Number|String|Boolean|Keyword|Nil} PrimitiveTypes
*/
/**
* @typedef {Any|PrimitiveTypes|FunctionType|TypeAlias|List} Type
*/
Note the existence of an Any
type. Since Wanda is gradually typed, we need a case to handle types when type checking is turned off. The Any
type is compatible with every other type and every other type is compatible with it, meaning any time you check against an Any
type the check will pass. This has the effect of basically turning off the type checker.
Obviously as we add new kinds of types (like object types) this file will grow. Notice that we have a type for FunctionType
even though we haven't added functions to the language yet. That's because we need it for checking call expressions.
Next we have type constructors. These return objects shaped like the type definitions in types.js
. In src/typechecker/constructors.js
:
import { TypeTypes } from "./types.js";
export const any = { kind: TypeTypes.Any };
export const number = { kind: TypeTypes.Number };
export const string = { kind: TypeTypes.String };
export const boolean = { kind: TypeTypes.Boolean };
export const keyword = { kind: TypeTypes.Keyword };
export const nil = { kind: TypeTypes.Nil };
export const functionType = (params, ret, variadic = false) => {
return { kind: TypeTypes.FunctionType, params, ret, variadic };
};
export const typeAlias = (name, base) => ({
kind: TypeTypes.TypeAlias,
name,
base,
});
export const listType = (listType) => ({ kind: TypeTypes.List, listType });
Now let's combine our TypeType
enum, constructors, and validators together into a single Type
module. We'll add more to this module later, but this will do for now. In src/typechecker/Type.js
:
import { TypeTypes } from "./types.js";
import * as Validators from "./validators.js";
import * as Constructors from "./constructors.js";
export const Type = {
Type: TypeTypes,
...Constructors,
...Validators,
};
From Annotations to Types
Next we need to be able to convert type annotations into concrete types. In src/typechecker/fromTypeAnnotation.js
:
import { TATypes } from "../parser/parseTypeAnnotation.js";
import { Type } from "./Type.js";
import { Exception } from "../shared/exceptions.js";
export const fromTypeAnnotation = (typeAnnotation, typeEnv) => {
switch (typeAnnotation.kind) {
case TATypes.NumberLiteral:
return Type.number;
case TATypes.StringLiteral:
return Type.string;
case TATypes.BooleanLiteral:
return Type.boolean;
case TATypes.KeywordLiteral:
return Type.keyword;
case TATypes.NilLiteral:
return Type.nil;
case TATypes.Symbol: {
const name = typeAnnotation.name;
const type = typeEnv.getType(name);
if (!type) {
throw new Exception(
`Type ${name} not found in current type environment`
);
}
return type;
}
case TATypes.List: {
const listType = fromTypeAnnotation(typeAnnotation.listType, typeEnv);
return Type.listType(listType);
}
default:
throw new Exception(
`Type not found for type annotation ${JSON.parse(
typeAnnotation,
null,
2
)}`
);
}
};
Converting Types to Strings
We'll also need a function to convert types to strings for things like error messages. In src/typechecker/typeToString.js
:
import { Exception } from "../shared/exceptions.js";
import { TypeTypes } from "./types.js";
export const typeToString = (type) => {
switch (type.kind) {
case TypeTypes.Any:
return "any";
case TypeTypes.Number:
return "number";
case TypeTypes.String:
return "string";
case TypeTypes.Boolean:
return "boolean";
case TypeTypes.Keyword:
return "keyword";
case TypeTypes.Nil:
return "nil";
case TypeTypes.FunctionType:
return `(${type.params.map(typeToString).join(", ")}) -> ${typeToString(
type.ret
)}`;
case TypeTypes.TypeAlias:
return `TypeAlias: ${type.name}, base: ${typeToString(type.base)}`;
case TypeTypes.List:
return `list (${typeToString(type.listType)})`;
default:
throw new Exception(`String conversion not implemented for ${type.kind}`);
}
};
Now add typeToString
to the Type
module in src/typechecker/Type.js
:
import { typeToString } from "./typeToString.js";
export const Type = {
// existing members...
toString: typeToString,
};
Subtypes
The main purpose of the check
function we're going to write in a minute is to compare a type to a node and see if the node is a subtype of the provided type. To that end, we need a function to check if one type is a subtype of another.
Add this function in src/typechecker/isSubtype.js
to check if type1
is a subtype of type2
:
import { Type } from "./Type.js";
export const isSubtype = (type1, type2) => {
if (Type.isAny(type1) || Type.isAny(type2)) return true;
if (Type.isNumber(type1) && Type.isNumber(type2)) return true;
if (Type.isString(type1) && Type.isString(type2)) return true;
if (Type.isBoolean(type1) && Type.isBoolean(type2)) return true;
if (Type.isKeyword(type1) && Type.isKeyword(type2)) return true;
if (Type.isNil(type1) && Type.isNil(type2)) return true;
if (Type.isTypeAlias(type1) && Type.isTypeAlias(type2)) {
return isSubtype(type1.base, type2.base);
}
if (Type.isTypeAlias(type1) && !Type.isTypeAlias(type2)) {
return isSubtype(type1.base, type2);
}
if (Type.isTypeAlias(type2) && !Type.isTypeAlias(type1)) {
return isSubtype(type1, type2.base);
}
if (Type.isList(type1) && Type.isList(type2)) {
return isSubtype(type1.listType, type2.listType);
}
return false;
};
The Type Environment
Similar to the Namespace
we use in the emitter, we also need a compile-time environment for the type checker. We'll call this class TypeEnvironment
and have it inherit from Namespace
.
Note that I've moved the Namespace
class from src/runtime
to src/shared
. If you do the same, you'll need to update your imports accordingly.
In src/typechecker/TypeEnvironment.js
:
import { Namespace } from "../shared/Namespace.js";
export class TypeEnvironment extends Namespace {
constructor(parent = null, { name = "global" } = {}) {
super(parent, { name });
this.types = new Map();
this.checkingOn = false;
}
static new(parent = null, { name = "global" } = {}) {
return new TypeEnvironment(parent, { name });
}
existsType(name) {
return this.lookupType(name) !== null;
}
extend(name) {
return new TypeEnvironment(this, { name });
}
getType(name) {
const scope = this.lookupType(name);
if (scope) {
return scope.types.get(name);
}
return null;
}
hasType(name) {
return this.types.has(name);
}
lookupType(name) {
let scope = this;
while (scope) {
if (scope.types.has(name)) {
return scope;
}
scope = scope.parent;
}
return null;
}
setType(name, type) {
this.types.set(name, type);
}
}
Note the checkingOn
property set in the constructor. We'll set that property to true
if we encounter a type annotation so the checker knows to check the types associated with variables and, when we have functions, function parameter and return types against the types it infers from expressions.
Getting The Base Type of A Type Alias
We also need to be able to get the base type of a type alias so we can properly type check annotations that name an alias. In src/typechecker/utils.js
create the getAliasBase
function:
import { Type } from "./Type.js";
export const getAliasBase = (name, env) => {
let baseType = env.getType(name);
while (Type.isTypeAlias(baseType)) {
baseType = env.getType(baseType.name);
}
return baseType;
};
Inferring Types from Expressions
Now we're getting into the real meat of the type checking implementation.
Our type checker is going to be bidirectional, which means it can get types either from annotations or from the expressions themselves, which allows us to drastically reduce the number of type annotations required to properly type check a program.
We don't have functions in the language yet, but when we do we'll be able to type check most programs based solely on the annotations on functions and their parameters.
This is how the TypeScript type checker works, and our type system is heavily influenced by TypeScript's.
To that end, we need an infer
function that infers types from expressions.
In src/typechecker/infer.js
you can see that the main infer
function dispatches to sub-functions:
import { ASTTypes } from "../parser/ast.js";
import { Exception } from "../shared/exceptions.js";
import { Type } from "./Type.js";
import { isSubtype } from "./isSubtype.js";
import { getAliasBase } from "./utils.js";
import { fromTypeAnnotation } from "./fromTypeAnnotation.js";
export const infer = (ast, env) => {
switch (ast.kind) {
case ASTTypes.NumberLiteral:
return inferNumber();
case ASTTypes.StringLiteral:
return inferString();
case ASTTypes.BooleanLiteral:
return inferBoolean();
case ASTTypes.KeywordLiteral:
return inferKeyword();
case ASTTypes.NilLiteral:
return inferNil();
case ASTTypes.Symbol:
return inferSymbol(ast, env);
case ASTTypes.CallExpression:
return inferCallExpression(ast, env);
case ASTTypes.VariableDeclaration:
return inferVariableDeclaration(ast, env);
case ASTTypes.SetExpression:
return inferSetExpression(ast, env);
case ASTTypes.DoExpression:
return inferDoExpression(ast, env);
case ASTTypes.TypeAlias:
return inferTypeAlias(ast, env);
default:
throw new Exception(`No type inferred for AST node type ${ast.kind}`);
}
};
Inferring for literal expressions is simple. We just return the type constructor for the literal node:
// Infer types from literal nodes
const inferNumber = () => Type.number;
const inferString = () => Type.string;
const inferBoolean = () => Type.boolean;
const inferKeyword = () => Type.keyword;
const inferNil = () => Type.nil;
For inferSymbol
we need to look up the symbol in the type environment, then get the base type if the symbol resolves to a type alias:
const inferSymbol = (node, env) => {
const name = node.name;
const namedType = env.get(name);
const baseType = Type.isTypeAlias(namedType)
? getAliasBase(namedType.name)
: namedType;
if (!namedType) {
throw new Exception(
`Type not found in current environment for ${name} at ${node.srcloc.file} ${node.srcloc.col}:${node.srcloc.col}`
);
}
return baseType;
};
For inferCallExpression
we infer the type for the function. If the function is of type Any
we just return the Any
type.
Otherwise we check to make sure the number of arguments is correct. For variadic functions this means a number of arguments equal to or greater than the function's arity (number of arguments it takes). For all other functions, this means the same number of arguments as the function's arity.
If type checking is on, we check the types of the arguments against the types of the formal parameters. This includes checking variadic arguments against the type of the variadic parameter.
Finally, if everything checks out, we return the function's return type.
const inferCallExpression = (node, env) => {
const func = infer(node.func, env);
if (Type.isAny(func)) {
return Type.any;
}
if (
node.args.length !== func.params.length ||
(func.variadic && node.args.length >= func.params.length - 1)
) {
throw new Exception(
`Expected${func.variadic ? " at least " : " "}arguments; ${
node.args.length
} given at ${node.srcloc.file} ${node.srcloc.line}:${node.srcloc.col}`
);
}
if (env.checkingOn) {
func.params.forEach((p, i, a) => {
const argType = infer(node.args[i], env);
if (!isSubtype(argType, p)) {
const node = node.args[i];
throw new Exception(
`${Type.toString(argType)} is not a subtype of ${Type.toString(
p
)} at ${node.srcloc.file} ${node.srcloc.line}:${node.srcloc.col}`
);
}
if (i === a.length - 1) {
for (let arg of node.args.slice(i)) {
const argType = infer(arg, env);
if (!isSubtype(argType, p)) {
const node = node.args[i];
throw new Exception(
`${Type.toString(argType)} is not a subtype of ${Type.toString(
p
)} at ${node.srcloc.file} ${node.srcloc.line}:${node.srcloc.col}`
);
}
}
}
});
}
return func.ret;
};
We need a function inferVariableDeclaration
because we need to be able to infer a type for every expression that could be in a program or do block, but primary checking of variable declarations will happen in the TypeChecker
class so we'll just infer the type of node.expression
and call it good.
const inferVariableDeclaration = (node, env) => {
return infer(node.expression, env);
};
inferSetExpression
is the same:
const inferSetExpression = (node, env) => {
return infer(node.expression, env);
};
inferDoExpression
extends the type environment and uses that to infer each expression it contains, then returns the type of the last expression.
const inferDoExpression = (node, env) => {
let doType = Type.any;
const doEnv = env.extend("doExpression");
for (let expr of node.body) {
doType = infer(expr, doEnv);
}
return doType;
};
Lastly, inferTypeAlias
returns the base type of a type alias declaration:
const inferTypeAlias = (node, env) => {
return fromTypeAnnotation(node.type, env);
};
We'll revisit this function when we add generics to Wanda, but this will work for now.
Checking Types
The last bit of machinery we need before building the type checker proper is a check
function that takes a node, a type, infers the type from the node, and checks that type against the provided type. In src/typechecker/check.js
:
export const check = (ast, type, env) => {
const inferredType = infer(ast, env);
if (!isSubtype(inferredType, type)) {
throw new Exception(
`Type ${Type.toString(inferredType)} is not a subtype of ${Type.toString(
type
)} at ${ast.srcloc.file} ${ast.srcloc.line}:${ast.srcloc.col}`
);
}
};
The TypeChecker
Class
Now we'll create a class that implements the visitor pattern, visits each node of the AST, and type checks it. Each method will return a modified AST node that includes type information.
For most of the nodes we have so far, this is as simple as calling infer
on the provided node. One of the nice things about bidirectional type checking is that you can get your inference to go a long way, which saves the programmer the need to provide copious type annotations.
Let's construct the TypeChecker
class in src/typechecker/TypeChecker.js
:
import { ASTTypes } from "../parser/ast.js";
import { Exception } from "../shared/exceptions.js";
import { check } from "./check.js";
import { infer } from "./infer.js";
import { Type } from "./Type.js";
import { fromTypeAnnotation } from "./fromTypeAnnotation.js";
export class TypeChecker {
constructor(program, env) {
this.program = program;
this.env = env;
}
static new(program, env = TypeEnvironment.new()) {
return new TypeChecker(program, env);
}
}
Next we'll write the check
method, which dispatches to other methods based on the AST node's kind
property:
check(node = this.program, env = this.env) {
switch (node.kind) {
case ASTTypes.Program:
return this.checkProgram(node, env);
case ASTTypes.NumberLiteral:
return this.checkNumber(node, env);
case ASTTypes.StringLiteral:
return this.checkString(node, env);
case ASTTypes.BooleanLiteral:
return this.checkBoolean(node, env);
case ASTTypes.KeywordLiteral:
return this.checkKeyword(node, env);
case ASTTypes.NilLiteral:
return this.checkNil(node, env);
case ASTTypes.Symbol:
return this.checkSymbol(node, env);
case ASTTypes.CallExpression:
return this.checkCallExpression(node, env);
case ASTTypes.VariableDeclaration:
return this.checkVariableDeclaration(node, env);
case ASTTypes.SetExpression:
return this.checkSetExpression(node, env);
case ASTTypes.DoExpression:
return this.checkDoExpression(node, env);
case ASTTypes.TypeAlias:
return this.checkTypeAlias(node, env);
default:
throw new Exception(`Type checking not implemented for ${node.kind}`);
}
}
The checkProgram
method loops over node.body
and checks each expression:
checkProgram(node, env) {
/** @type {TypedAST[]} */
let body = [];
let type;
let i = 0;
for (let expr of node.body) {
if (i === node.body.length - 1) {
const node = this.check(expr, env);
type = node.type;
body.push(node);
} else {
const node = this.check(expr, env);
body.push(node);
}
}
return { kind: node.kind, body, srcloc: node.srcloc, type };
}
The primitive check methods simply infer the type from the node:
checkBoolean(node, env) {
return { ...node, type: infer(node, env) };
}
checkKeyword(node, env) {
return { ...node, type: infer(node, env) };
}
checkNil(node, env) {
return { ...node, type: infer(node, env) };
}
checkNumber(node, env) {
return { ...node, type: infer(node, env) };
}
checkString(node, env) {
return { ...node, type: infer(node, env) };
}
The checkSymbol
method does the same, since infer
looks up the type for the symbol from the type environment already:
checkSymbol(node, env) {
return { ...node, type: infer(node, env) };
}
The checkCallExpression
does the same since infer
handles all the heavy lifting:
checkCallExpression(node, env) {
// infer handles checking for argument types
return { ...node, type: infer(node, env) };
}
The checkDoExpression
method loops over each expression in the body and checks them one at a time:
checkDoExpression(node, env) {
/** @type {TypedAST[]} */
let body = [];
for (let expr of node.body) {
const node = this.check(expr, env);
body.push(node);
}
return {
kind: node.kind,
body,
srcloc: node.srcloc,
type: infer(node, env),
};
}
The checkVariableDeclaration
method checks if the node has a type annotation. If so, it derives the type from the annotation and uses that to check the type of the expression. Then it sets the annotation type in the environment and turns on type checking for the whole program.
If there's no annotation, it infers the type of the expression and sets that in the environment.
checkVariableDeclaration(node, env) {
if (node.typeAnnotation) {
const annotType = fromTypeAnnotation(node.typeAnnotation);
check(node.expression, annotType, env);
env.checkingOn = true;
env.set(node.lhv.name, annotType);
return { ...node, type: annotType };
}
const type = infer(node, env);
env.set(node.lhv.name, type);
return {
kind: node.kind,
lhv: node.lhv,
expression: this.check(node.expression, env),
srcloc: node.srcloc,
type,
};
}
checkSetExpression
checks to see if checking is turned on. If so, it checks the type of its expression against the type set in the environment for the symbol. Otherwise it infers the type from the expression.
checkSetExpression(node, env) {
if (env.checkingOn) {
const nameType = env.getType(node.lhv.name);
check(node.expression, nameType, env);
return { ...node, type: nameType };
}
return {
kind: node.kind,
lhv: node.lhv,
expression: this.check(node.expression, env),
srcloc: node.srcloc,
type: infer(node, env),
};;
}
Finally, checkTypeAlias
gets the type from the annotation on the TypeAlias
node and sets it in the type environment:
checkTypeAlias(node, env) {
const type = fromTypeAnnotation(node.type, env);
env.setType(node.name, type);
return { ...node, type };
}
Constructing The Global Type Environment
Now we need to make a way to construct the global type environment, including types for core library functions.
We're going to loop over the items of the core library and set the name to the value of the contract
property on the core library function, defaulting to the Any
type if there is no contract.
We're not going to go back and add contracts to all the core library functions because we don't actually have the ability to parse function type annotations yet, so for now that means everything gets the Any
type. Obviously we'll change that in a future article.
Write the makeGlobalTypeEnv
function in src/typechecker/makeGlobalTypeEnv.js
:
import { theModule as Core } from "../../lib/js/core.js";
import { makeRuntime } from "../runtime/makeRuntime.js";
import { Type } from "./Type.js";
import { TypeEnvironment } from "./TypeEnvironment.js";
export const makeGlobalTypeEnv = () => {
const mod = Core.module(makeRuntime());
const typeEnv = TypeEnvironment.new(null, { name: "global" });
for (let [k, v] of Object.entries(mod)) {
typeEnv.set(k, v.contract ?? Type.any);
}
return typeEnv;
};
Now when we do add contracts for core library functions they'll be automatically added to the global type environment.
A Functional Wrapper
The last thing we're going to do with the type checker is create a functional wrapper like we have with the other stages of the compilation pipeline. In src/typechecker/typecheck.js
:
import { TypeChecker } from "./TypeChecker.js";
export const typecheck = (ast, env = undefined) =>
TypeChecker.new(ast, env).check();
Changes To The CLI
Now that we have everything set up for our type checker, let's make it so we can use it in our CLI.
First, we need to update our compile
function to add type checking to the pipeline. In src/cli/compile.js
:
import { tokenize } from "../lexer/tokenize.js";
import { read } from "../reader/read.js";
import { expand } from "../expander/expand.js";
import { parse } from "../parser/parse.js";
import { desugar } from "../desugarer/desugar.js";
import { emit } from "../emitter/emit.js";
import { typecheck } from "../typechecker/typecheck.js";
export const compile = (
input,
file = "stdin",
ns = undefined,
typeEnv = undefined
) =>
emit(
desugar(typecheck(parse(expand(read(tokenize(input, file)))), typeEnv)),
ns
);
We also need to add a type environment creator param for our compileAndBuild
function, in src/cli/compileAndBuild.js
:
import { makeGlobalNameMap } from "../runtime/makeGlobals.js";
import { emitGlobalEnv } from "../emitter/emitGlobalEnv.js";
import { build } from "./build.js";
import { compile } from "./compile.js";
import { makeGlobalTypeEnv } from "../typechecker/makeGlobalTypeEnv.js";
export const compileAndBuild = (
wandaCode,
{
fileName = "stdin",
contextCreator = emitGlobalEnv,
nsCreator = makeGlobalNameMap,
outName = "main.js",
moduleName = "main",
typeEnvCreator = makeGlobalTypeEnv,
} = {}
) => {
const contextCode = contextCreator();
const compiledCode = `${contextCode}
${moduleName}.result = ${compile(
wandaCode,
fileName,
nsCreator(),
typeEnvCreator()
)}`;
return build(compiledCode, outName, moduleName);
};
We're also going to make some changes so the REPL can take multiline input to make it easier to write do blocks. First, add the file src/cli/utils.js
and add these 2 functions:
export const countIndent = (str) => {
let indentCount = 0;
for (let char of str) {
if (char === "(" || char === "[" || char === "{") {
indentCount++;
} else if (char === ")" || char === "]" || char === "}") {
indentCount--;
}
}
return indentCount;
};
export const inputFinished = (input) => countIndent(input) === 0;
We don't have tokens in the language yet for square brackets and curly braces, but we will when we add Vectors and Records in the next article, so I'm just going ahead and putting them in this function now. That way we don't have to remember to edit it later.
Now we'll move the input variable outside of the while
loop, add a variable for indentation, and handle the case where the programmer has more open parens than closing ones.
We also need to create a type environment to store type information in the REPL.
Here's the new src/cli/repl.js
:
import os from "os";
import vm from "vm";
import readlineSync from "readline-sync";
import { pprintAST, pprintDesugaredAST } from "./pprint.js";
import { println } from "../printer/println.js";
import { makeGlobalNameMap } from "../runtime/makeGlobals.js";
import { emitGlobalEnv } from "../emitter/emitGlobalEnv.js";
import { build } from "./build.js";
import { compile } from "./compile.js";
import { makeGlobalTypeEnv } from "../typechecker/makeGlobalTypeEnv.js";
import { countIndent, inputFinished } from "./utils.js";
const read = (prompt) => readlineSync.question(prompt);
export const repl = (mode = "repl") => {
// Create global compile environment
const compileEnv = makeGlobalNameMap();
const typeEnv = makeGlobalTypeEnv();
// Build global module and instantiate in REPL context
// This should make all compiled global symbols available
const globalCode = build(emitGlobalEnv());
vm.runInThisContext(globalCode);
let prompt = "wanda> ";
let input = "";
let indent = 0;
while (true) {
try {
input += read(prompt + " ".repeat(indent));
switch (input) {
// If it's a command, execute it
case ":quit":
process.exit(0);
case ":print-ast":
mode = "printAST";
break;
case ":print-desugared":
mode = "printDesugared";
break;
// If it's code, compile and run it
default:
if (inputFinished(input)) {
let compiled = compile(input, "stdin", compileEnv, typeEnv);
let result = vm.runInThisContext(compiled);
if (mode === "printAST") {
console.log(pprintAST(input));
} else if (mode === "printDesugared") {
console.log(pprintDesugaredAST(input));
}
println(result);
input = "";
indent = 0;
} else {
input += os.EOL;
indent = countIndent(input);
}
}
} catch (e) {
console.error(e.message);
}
}
};
If you have a bin
directive in your package.json
file that points to bin/wanda.js
and you haven't already used npm link
in the directory, you can do so now. This will make wanda
available as a global system command.
Now let's see the type checker in action. Run wanda
in your terminal, then try to create a variable with a type annotation. Something like this:
(var (x : number) 10)
It should work, and if you input x
and hit enter in the CLI now you should get back the value 10.
If you try with a mismatched annotation and expression type, it should throw an error. Try it with something like:
(var (s : string) 7)
You should get an error. Neat, eh?
Conclusion
Now we've got variables, do blocks, and type checking. This language is really coming along! Next time we'll add Vectors and Records to the language, as well as extending the type checker to handle union types.
As always, you can see the current state of the code on the current tag in the Wanda repo. There are also tests you can run with npm run test
as well as JSDoc comments for nearly all functions and types (I can't promise I've remembered them all).
I'll see you in the next one!
-
The type checker implementation owes much to Jake Donham's Reconstructing TypeScript series ↩
Featured ones: