dev-resources.site
for different kinds of informations.
Create Your Own Programming Language 6: Functions
Welcome to the next installment in the Create Your Own Programming Language series! In this article we're going to add functions to the Wanda language.
As always, if you haven't read the previous article in the series on vectors and records go ahead and do that first.
Ok, let's go!
What We're Going To Do
The most important thing we're going to do today is add functions to our language. Functions are extremely important. In fact, when you have functions you can literally do anything with your language. You might not believe it, but literally everything in your programming language can be functions if you're dedicated enough.
That's one of my all-time favorite computer science talks and I strongly recommend you take the time to watch it (it's split into 2 parts in the playlist).
Adding functions is the most important thing we're doing today, but we'll also do a few other things to support the addition of functions to our language.
First, we'll improve our type annotations and how we parse them. We hadn't really thought through how type annotations were supposed to work, so we'll solidify that and implement a more uniform method of parsing them.
We'll also add desugaring for the first time.
As mentioned before, "desugaring" means converting more complex language forms into simpler core forms that make things easier for the compiler to handle.
To do that, we'll create a base Visitor
from which the desugarer will inherit so we only have to implement the methods we need on the Desugarer
class. We're going to make it so that function declarations, which use the keyword def
, desugar to the var
variable declaration form with a Lambda Expression as the value being assigned.
We're also going to make it so that all functions created in the language are curried. If you're not familiar with curried functions, a curried function is one that can take in one argument at a time. If you partially apply a curried function, i.e. give it fewer arguments than it needs to execute, you get back another function that only needs the remaining arguments before it will execute. That means we'll need to make some changes to how call expressions are checked in the type checker.
We're also going to allow functions to be variadic, which is just a fancy way of saying you can allow a function to take any number of arguments.
Finally, we'll add contracts for the functions in our core library. A contract is basically a type signature. It's a promise that this function will receive these types of arguments and produce that type of return value.
There are no changes to the lexer and reader for this article, so we'll start with the parser.
Changes To The Parser
We need to do 2 things with the parser: enable it to parse functions and fix how we handle type annotations. Let's do the latter first.
Fixing Type Annotations
Before now, it's been possible for type annotations to be 2 things: either a primitive or named (Symbol) value type (i.e. number) or what we could call a "generic" type (even though we don't have generics yet) where you have a main type and that type has one or more types it contains.
The two kinds of "generic" types we've used so far are list and vector types. We've been writing them like list (number)
.
This actually complicates parsing type annotations quite a bit, because we could have to parse 2 things from the list of forms or only 1 thing depending on which kind of annotation it is. I don't want to have that kind of complexity in parsing type annotations, so I'm going to change how we handle generic type annotations.
From now on, instead of writing list (number)
we're going to use (list number)
. That way regardless of whether it's a named type or a generic type the parser only has to handle a single form to parse the type annotation.
When we get generic types in full, we'll continue the same scheme so, i.e., a Result
type that can be either a string or an Exception
will get the annotation (Result string Exception)
. If you're not familiar with Result
types, don't worry about it for now; it's just an example of a generic type that has more than one type parameter.
We're also going to add type annotations for functions, which will be written like (argtype1 argtype2 ... -> returnType)
. Again, the type annotation parser only has to handle one form from the reader because the entire annotation will be inside a cons list.
For variadic functions with type annotations, we'll use the &
operator and make the rest parameter a vector.
A variadic function annotation could look something like this:
(number &(vector number) -> number)
This will greatly streamline parsing when it comes to handling type annotations, which will simplify things like parsing function parameters.
Parsing Type Annotations
Let's take a look at src/parser/parseTypeAnnotation.js
. We need to check if the annotation is a Cons
or a Symbol
. If it's a Cons
we'll flatten it into an array.
Next we need to see if it's a function or generic annotation. If it's a function annotation it will have an arrow. If it has an arrow, we'll handle it as a function annotation.
We'll filter the arrow out because once we know it's a function annotation the arrow has done its job. We'll parse the last item in the array as the return type and all the rest as parameter types. Then construct the annotation and it's ready to go.
We've also changed the way we handle a nil
type annotation. I found during further testing that the reader passes nil
back as a Nil
token, not as a Symbol
token.
Everything else stays the same as it was: if the annotation was generic we get the first item in the array as the container type then handle the 2nd item as the contained type. Otherwise we handle the annotation as a named type like number
or a type alias.
export const parseTypeAnnotation = (annotation) => {
if (annotation instanceof Cons) {
// is function or generic annotation
// flatten Cons to array
annotation = [...annotation];
// if it has an arrow, it's a function annotation
const hasArrow = annotation.reduce((hasArrow, item) => {
if (item.type === TokenTypes.Symbol && item.value === "->") {
return true;
}
return hasArrow;
}, false);
if (hasArrow) {
// is function annotation
// filter out arrow
annotation = annotation.filter((item) => item.value !== "->");
// get return type
const retType = parseTypeAnnotation(annotation.pop());
// get param types and if it's variadic
let params = [];
let variadic = false;
for (let item of annotation) {
if (item.type === TokenTypes.Amp) {
variadic = true;
continue;
} else {
params.push(parseTypeAnnotation(item));
}
}
return { kind: TATypes.Function, params, retType, variadic };
}
}
let annot;
if (Array.isArray(annotation)) {
// is generic annotation
// get container type
annot = annotation[0];
} else {
// is simple annotation
annot = annotation;
}
if (annot.type === "RecordLiteral") {
return parseObjectAnnotation(annot);
}
if (annot.type === TokenTypes.Nil) {
return { kind: TATypes.NilLiteral };
}
if (annot.type === TokenTypes.Symbol) {
switch (annot.value) {
case "any":
return { kind: TATypes.AnyLiteral };
case "number":
return { kind: TATypes.NumberLiteral };
case "string":
return { kind: TATypes.StringLiteral };
case "boolean":
return { kind: TATypes.BooleanLiteral };
case "keyword":
return { kind: TATypes.KeywordLiteral };
case "list":
// annotation is array with listType as 2nd member
return parseListAnnotation(annotation[1]);
case "vector":
return parseVectorAnnotation(annotation[1]);
default:
// must be a named type
return { kind: TATypes.Symbol, name: annot.value };
}
}
throw new Exception(
`Unknown type annotation kind ${JSON.stringify(annot, null, 2)}`
);
};
Parsing Functions
First, let's add AST nodes for the new forms: FunctionDeclaration and LambdaExpression.
We'll add 3 members (including one for function parameters) to the ASTTypes
enum in src/parser/ast.js
:
export const ASTTypes = {
// other nodes
Param: "Param",
FunctionDeclaration: "FunctionDeclaration",
LambdaExpression: "LambdaExpression",
};
And we'll add constructors for the nodes:
export const AST = {
// other constructors...
FunctionDeclaration(name, params, body, variadic, retType, srcloc) {
return {
kind: ASTTypes.FunctionDeclaration,
name,
params,
body,
variadic,
retType,
srcloc,
};
},
LambdaExpression(params, body, variadic, retType, srcloc) {
return {
kind: ASTTypes.LambdaExpression,
params,
body,
variadic,
retType,
srcloc,
};
},
};
There are 2 new nodes, so 2 different function forms we need to parse: function declarations and lambda expressions. Function declarations start with the def
symbol and lambdas start with fn
.
Let's add those cases to the switch
statement in parseList
in src/parser/parse.js
:
// other cases...
case "def":
return parseFunctionDeclaration(form);
case "fn":
return parseLambdaExpression(form);
// default case
We just delegate to the parser functions for these nodes, which are very similar. The biggest difference is that parseFunctionDeclaration
needs to handle the name of the function. We can delegate most of the work to a single parseFunction
function. We'll also handle parsing parameters with a separate parseParams
function since that will be a little bit of work.
Both function declarations and lambda expressions can have annotations both for parameter types and return types, so we'll need to handle both the cases where these are included and when they are omitted.
We'll use "maybe" variable names for the relevant parts of a function form.
Here are parseFunctionDeclaration
and parseLambdaExpression
:
const parseFunctionDeclaration = (form) => {
const [_, name, params, maybeArrow, maybeRetType, ...maybeBody] = form;
const srcloc = form.srcloc;
const parsedName = parseExpr(name);
const { parsedParams, parsedBody, variadic, retType } = parseFunction(
params,
maybeArrow,
maybeRetType,
maybeBody
);
return AST.FunctionDeclaration(
parsedName,
parsedParams,
parsedBody,
variadic,
retType,
srcloc
);
};
const parseLambdaExpression = (form) => {
const [_, params, maybeArrow, maybeRetType, ...maybeBody] = form;
const srcloc = form.srcloc;
const { parsedParams, parsedBody, variadic, retType } = parseFunction(
params,
maybeArrow,
maybeRetType,
maybeBody
);
return AST.LambdaExpression(
parsedParams,
parsedBody,
variadic,
retType,
srcloc
);
};
Then parseFunction
handles most of the real work:
const parseFunction = (params, maybeArrow, maybeRetType, maybeBody) => {
let retType, body;
if (maybeArrow.type === TokenTypes.Symbol && maybeArrow.value === "->") {
// has return type annotation
retType = parseTypeAnnotation(maybeRetType);
body = maybeBody;
} else {
retType = null;
body = maybeRetType
? [maybeArrow, maybeRetType, ...maybeBody]
: [maybeArrow, ...maybeBody];
}
const variadic = [...params].reduce((isVar, param) => {
if (param.type === TokenTypes.Amp) {
return true;
}
return isVar;
}, false);
const parsedParams = parseParams(params);
/** @type {AST[]} */
const parsedBody = body.map(parseExpr);
return {
parsedParams,
parsedBody,
variadic,
retType,
};
};
And parseParams
uses a for
loop over the parameter forms and handles the cases where they do and do not have type annotations:
const parseParams = (forms) => {
forms = [...forms];
/** @type {import("./ast.js").Param[]} */
let params = [];
for (let i = 0; i < forms.length; i++) {
const form = forms[i];
if (form.type === TokenTypes.Symbol) {
const name = parseExpr(form);
let typeAnnotation = null;
if (
forms[i + 1]?.type === TokenTypes.Keyword &&
forms[i + 1].value === ":"
) {
// has type annotation
typeAnnotation = parseTypeAnnotation(forms[i + 2]);
i += 2;
}
params.push({ kind: ASTTypes.Param, name, typeAnnotation });
} else if (form.type === TokenTypes.Amp) {
continue;
}
}
return params;
};
See how simple it is now that we only have to worry about a single form for a type annotation? This would be much more complicated to parse if we'd kept the old list (number)
type of annotation.
We also need to make a slight edit to parseVariableDeclaration
to handle the new type annotation paradigm:
const parseVariableDeclaration = (decl) => {
let [_, lhv, expression] = decl;
let parsedLhv,
typeAnnotation = null;
if (lhv instanceof Cons) {
// has type annotation
const realLhv = lhv.get(0);
// skip over : and get type annotation
typeAnnotation = lhv.cdr.cdr;
if (typeAnnotation.cdr === null) {
// is a simple annotation, otherwise it's a Cons type annotation
typeAnnotation = typeAnnotation.car;
}
// parse type annotation
typeAnnotation = parseTypeAnnotation(typeAnnotation);
parsedLhv = parseExpr(realLhv);
} else {
parsedLhv = parseExpr(lhv);
}
if (parsedLhv.kind === ASTTypes.VectorLiteral) {
parsedLhv = convertVectorLiteralToVectorPattern(parsedLhv);
}
const parsedExpression = parseExpr(expression);
return AST.VariableDeclaration(
parsedLhv,
parsedExpression,
decl.srcloc,
typeAnnotation
);
};
Changes to The Type Checker
Adding functions to our language presents a particular problem for the type checker: recursive functions won't have a type inferred for the return type. Even with type annotations, the function's name won't be set yet in the type environment because its body hasn't finished being processed yet.
That also means you can't define mutually recursive functions. Trying to call the 2nd function in the body of the 1st function will throw an error because the type of the 2nd function hasn't been defined yet.
To handle this issue, we're going to implement 2 passes in the type checker.
You'll see that when we get to the changes to the TypeChecker
class, but first let's implement the machinery we need to check function types.
As a nod towards handling the issue with the function name not being set in the environment, we're going to add a special internal type to the type checker called undefined
.
You'll remember that we actually already have function types because we needed them to implement call expressions a few articles back.
Add the undefined case to the TypeTypes
enum in src/typechecker/types.js
:
export const TypeTypes = {
// other members
Undefined: "Undefined",
};
Also add a constructor in src/typechecker/constructors.js
:
export const undefinedType = { kind: TypeTypes.Undefined };
And a validator in src/typechecker/validators.js
:
export const isUndefined = (type) => {
return type.kind === TypeTypes.Undefined;
};
A type should only ever be undefined
during the first pass of the type checker, and we need undefined
to always pass a subtype check so we'll treat it similarly to any
in src/typechecker/isSubtype.js
:
// other cases
if (Type.isUndefined(type1) || Type.isUndefined(type2)) return true;
// rest of function
We'll also need to handle subtyping functions. A function is a subtype of another function if all the parameters of one are subtypes of all the parameters of the other and the return type of the first is a subtype of the return type of the second.
// undefined case
if (Type.isFunctionType(type1) && Type.isFunctionType(type2)) {
return (
type1.params.length === type2.params.length &&
type1.params.every((a, i) => isSubtype(type2.params[i], a)) &&
isSubtype(type1.ret, type2.ret)
);
}
// rest of function
We also need to be able to parse a function type from a type annotation in fromTypeAnnotation
, in src/typechecker/fromTypeAnnotation.js
:
// other cases
case TATypes.Function: {
const paramTypes = typeAnnotation.params.map((p) =>
fromTypeAnnotation(p, typeEnv)
);
const retType = fromTypeAnnotation(typeAnnotation.retType, typeEnv);
return Type.functionType(paramTypes, retType, typeAnnotation.variadic);
}
// default case
We need to be able to infer types for functions. We can use the same inferFunction
function to handle both function declarations and lambda expressions, since the type is what matters here and not the function name.
In src/typechecker/infer.js
we add cases for LambdaExpression
and FunctionDeclaration
to the switch
statement in the infer
function:
// other cases
case ASTTypes.LambdaExpression:
return inferFunction(ast, env);
case ASTTypes.FunctionDeclaration:
return inferFunction(ast, env);
// default case
Note that in inferFunction
the environment that will be passed into infer
will be the extended function environment, not the global environment.
We map
over the function node's parameters. If the parameters have type annotations, we turn checking on.
If the parameters have type annotations we construct the type from the annotation; otherwise we set them to any
.
We set each parameter name with its type in the environment and return the type from the map
callback to get an array of types for the parameters.
If the node has an annotated return type we make sure type checking is on.
If there's a return type annotation we construct that type, otherwise we assign any
. Then we loop over the function body and infer the return type from the function's body using the extended function environment.
If the inferred body type is a subtype of the annotated return type, everything is good and we construct and return the function type.
If the function wasn't annotated (in which case its return annotation type is any
) or has been inferred as undefined
, we use the inferred return type. Otherwise, if there was a function annotation, we use the return annotation type.
Here's inferFunction
:
const inferFunction = (node, env) => {
const params = node.params.map((p) => {
if (p.typeAnnotation) {
env.checkingOn = true;
}
const type = p.typeAnnotation
? fromTypeAnnotation(p.typeAnnotation, env)
: Type.any;
env.set(p.name.name, type);
return type;
});
if (node.retType) {
env.checkingOn = true;
}
const retType = node.retType
? fromTypeAnnotation(node.retType, env)
: Type.any;
let inferredRetType;
for (let expr of node.body) {
// type of last expression "wins"
inferredRetType = infer(expr, env);
}
if (env.checkingOn && !isSubtype(inferredRetType, retType)) {
throw new TypeException(
`Inferred return type ${Type.toString(
inferredRetType
)} is not a subtype of annotated return type ${Type.toString(retType)}`,
node.srcloc
);
}
return Type.functionType(
params,
Type.isAny(retType) || Type.isUndefined(retType)
? inferredRetType
: retType,
node.variadic
);
};
We also need to make some changes to inferCallExpression
.
Since we now will have curried functions that can be partially applied, we don't need to throw an error if the call expression has fewer arguments than the type of the function. Instead, we'll construct a new function type using the remaining parameter types and the function's return type and return that.
Let's also do some refactoring and delegate the actual argument checking to a function called checkArgTypes
.
Here's the new version of inferCallExpression
with checkArgTypes
:
const inferCallExpression = (node, env) => {
const func = infer(node.func, env);
if (Type.isAny(func)) {
return Type.any;
} else if (Type.isUndefined(func) || Type.isUndefined(func.ret)) {
// this should only happen during first typechecker pass
return Type.undefinedType;
}
if (!func.variadic && node.args.length > func.params.length) {
throw new TypeException(
`Too many arguments for function: ${node.args.length} given; ${func.params.length} expected`,
node.srcloc
);
}
// handle partially applied functions
if (
node.args.length <
(func.variadic ? func.params.length - 1 : func.params.length)
) {
// is partially applied
const params = func.params.slice(0, node.args.length);
if (env.checkingOn) {
checkArgTypes(node, params, env, func);
}
const newParams = func.params.slice(node.args.length);
return Type.functionType(newParams, func.ret, func.variadic);
}
if (env.checkingOn) {
checkArgTypes(node, func.params, env, func);
}
return func.ret;
};
const checkArgTypes = (node, params, env, func) => {
node.args.forEach((arg, i) => {
const argType = infer(arg, env);
if (func.variadic && i >= params.length - 1) {
// is part of rest args
let p = params[params.length - 1];
if (!p.vectorType) {
throw new TypeException(
`Rest parameter type must be vector; ${Type.toString(p)} given`,
arg.srcloc
);
}
p = p.vectorType;
if (!isSubtype(argType, p)) {
throw new TypeException(
`${Type.toString(argType)} is not a subtype of ${Type.toString(p)}`,
arg.srcloc
);
}
} else {
const p = params[i];
if (!isSubtype(argType, p)) {
throw new TypeException(
`${Type.toString(argType)} is not a subtype of ${Type.toString(p)}`,
arg.srcloc
);
}
}
});
};
Note that checkArgTypes
has to handle the case of a variadic function where the call expression can have more arguments than the number of parameters in the function type.
In src/typechecker/check.js
we need a case for function types. Add this check to check
above const inferredType = infer(ast, env);
:
if (
(ast.kind === ASTTypes.FunctionDeclaration ||
ast.kind === ASTTypes.LambdaExpression) &&
Type.isFunctionType(type)
) {
return checkFunction(ast, type, env);
}
checkFunction
checks the type of each parameter in the function type against each parameter in the AST node, then checks the return types. Here's checkFunction
:
const checkFunction = (ast, type, env) => {
if (type.params.length !== ast.params.length) {
throw new TypeException(
`Expected ${type.params.length} args; got ${ast.params.length}`,
ast.srcloc
);
}
const maybeType = env.get(ast.name?.name);
const funcType = maybeType ? maybeType : infer(ast, env);
type.params.forEach((p, i) => {
const pType = funcType.params[i];
if (!isSubtype(pType, p)) {
throw new TypeException(
`${Type.toString(pType)} is not a valid subtype of ${Type.toString(p)}`,
ast.srcloc
);
}
});
if (!isSubtype(funcType.ret, type.ret)) {
throw new TypeException(
`${Type.toString(funcType.ret)} is not a valid subtype of ${Type.toString(
type.ret
)}`,
ast.srcloc
);
}
};
We already have a string conversion for function types, but we also need one for undefined
.
In the switch
statement in src/typechecker/typeToString.js
:
// other cases
case TypeTypes.Undefined:
return "undefined";
// default case
The biggest changes to the type checker will be in the TypeChecker
class as we implement a two-pass typechecker.
The first pass is just to populate the environment, including getting all function names added to the environment (even if the function types return undefined
).
Then the 2nd pass actually infers and checks all types now that the environment has been set up.
First, at the top of the src/typechecker/TypeChecker.js
file (above the TypeChecker
class declaration), add this:
let isSecondPass = false;
Next, rename the check
method on the TypeChecker
class to checkNode
.
If you're using VS Code you can do this easily by highlighting the name of the check
method and pressing F2
. Other IDEs should have similar refactoring capabilities, so you shouldn't have to search and replace the text.
Now add a new check
method that implements the 2-pass checking:
check(env = this.env) {
// first pass is to populate environments so valid forward references will resolve
const firstPassProgram = this.checkNode(this.program, env);
isSecondPass = true;
return this.checkNode(firstPassProgram, env);
}
To keep track of all the environments, we're actually going to add the extended environments to the nodes they're there for. That means with DoExpression
, FunctionDeclaration
, and LambdaExpression
nodes we attach the extended environment to the node itself.
Here's the edited version of checkDoExpression
:
checkDoExpression(node, env) {
if (!node.env) {
// during the first pass this will create the child env
node.env = env.extend("DoExpression");
}
const doEnv = node.env;
/** @type {TypedAST[]} */
let body = [];
for (let expr of node.body) {
const node = this.checkNode(expr, doEnv);
body.push(node);
}
return {
kind: node.kind,
body,
srcloc: node.srcloc,
type: infer(node, doEnv),
};
}
Next let's add the checkFunctionDeclaration
method. Similarly to checkDoExpression
, we check the node for the existence of an env
property and, if we don't find it, we add it.
Next we infer the type of the function using the new function environment.
We have a nested environment, so if checking has been turned on in the nested environment we need to be sure to propagate that change to the parent environment. Also, if checking is on, we need to check
the node.
Finally, we set the function type to its name in the parent environment and return the type-annotated AST node.
checkFunctionDeclaration(node, env) {
if (!node.env) {
node.env = env.extend(node.name.name);
}
const funcEnv = node.env;
const type = infer(node, funcEnv);
if (funcEnv.checkingOn && isSecondPass) {
env.checkingOn = funcEnv.checkingOn;
check(node, type, funcEnv);
}
env.set(node.name.name, type);
return { ...node, type };
}
checkLambdaExpression
is almost exactly the same except without setting the function type to its name in the environment:
checkLambdaExpression(node, env) {
if (!node.env) {
node.env = env.extend("Lambda");
}
const funcEnv = node.env;
const type = infer(node, funcEnv);
if (funcEnv.checkingOn) {
env.checkingOn = funcEnv.checkingOn;
check(node, type, funcEnv);
}
return { ...node, type };
}
We also need to make sure in checkCallExpression
that we don't return undefined
as the type from a function on the 2nd checker pass:
checkCallExpression(node, env) {
// infer handles checking for argument types
let type = infer(node, env);
if (Type.isUndefined(type) && isSecondPass) {
type = Type.any;
}
return { ...node, type };
}
That shouldn't ever happen, but just in case it does we'll handle it there.
Finally, we need to handle the new two-pass checking in checkSymbol
:
checkSymbol(node, env) {
try {
let type = infer(node, env);
if (Type.isUndefined(type)) {
// this should never happen on the 2nd pass
type = Type.any;
env.set(node.name, type);
}
return { ...node, type };
} catch (e) {
if (!isSecondPass) {
env.set(node.name, Type.undefinedType);
}
}
}
Adding a Generic Visitor
You know by now that for compiler passes over the AST we've been using the Visitor Pattern to execute a postorder traversal over all the nodes of the syntax tree.
We haven't been able to do this until now, because we've needed compile-time environments in both the traversal passes we've implemented thus far, but now we can actually implement a generic Visitor
with a method for each node that we use to transform and return new versions of the syntax tree.
We're going to use it for desugaring the typechecked AST into some core forms used by the emitter, which will allow us to simplify the emitter and not have to implement emitter methods for every kind of AST node.
The generic Visitor
simply takes in a node with each method, visits any child nodes, and then returns a node exactly like the original.
Then in the Desugarer
class we only have to implement the methods we need.
Here's the Visitor
class in src/visitor/Visitor.js
:
import { AST, ASTTypes } from "../parser/ast.js";
import { SyntaxException } from "../shared/exceptions.js";
export class Visitor {
constructor(program) {
this.program = program;
}
static new(program) {
return new Visitor(program);
}
visit(node = this.program) {
switch (node.kind) {
case ASTTypes.Program:
return this.visitProgram(node);
case ASTTypes.NumberLiteral:
return this.visitNumberLiteral(node);
case ASTTypes.StringLiteral:
return this.visitStringLiteral(node);
case ASTTypes.BooleanLiteral:
return this.visitBooleanLiteral(node);
case ASTTypes.KeywordLiteral:
return this.visitKeywordLiteral(node);
case ASTTypes.NilLiteral:
return this.visitNilLiteral(node);
case ASTTypes.Symbol:
return this.visitSymbol(node);
case ASTTypes.CallExpression:
return this.visitCallExpression(node);
case ASTTypes.VariableDeclaration:
return this.visitVariableDeclaration(node);
case ASTTypes.SetExpression:
return this.visitSetExpression(node);
case ASTTypes.DoExpression:
return this.visitDoExpression(node);
case ASTTypes.TypeAlias:
return this.visitTypeAlias(node);
case ASTTypes.VectorLiteral:
return this.visitVectorLiteral(node);
case ASTTypes.VectorPattern:
return this.visitVectorPattern(node);
case ASTTypes.RecordLiteral:
return this.visitRecordLiteral(node);
case ASTTypes.RecordPattern:
return this.visitRecordPattern(node);
case ASTTypes.MemberExpression:
return this.visitMemberExpression(node);
case ASTTypes.Param:
return this.visitParam(node);
case ASTTypes.FunctionDeclaration:
return this.visitFunctionDeclaration(node);
case ASTTypes.LambdaExpression:
return this.visitLambdaExpression(node);
default:
throw new SyntaxException(node.kind, node.srcloc);
}
}
visitBooleanLiteral(node) {
return node;
}
visitCallExpression(node) {
const func = this.visit(node.func);
const args = node.args.map(this.visit.bind(this));
return { ...node, func, args };
}
visitDoExpression(node) {
let body = [];
for (let n of node.body) {
body.push(this.visit(n));
}
return { ...node, body };
}
visitFunctionDeclaration(node) {
const params = node.params.map(this.visit.bind(this));
let body = [];
for (let expr of node.body) {
body.push(this.visit(expr));
}
return { ...node, params, body };
}
visitKeywordLiteral(node) {
return node;
}
visitLambdaExpression(node) {
const params = node.params.map(this.visit.bind(this));
let body = [];
for (let expr of node.body) {
body.push(this.visit(expr));
}
return { ...node, params, body };
}
visitMemberExpression(node) {
const object = this.visit(node.object);
const property = this.visit(node.property);
return { ...node, object, property };
}
visitNilLiteral(node) {
return node;
}
visitNumberLiteral(node) {
return node;
}
visitParam(node) {
return node;
}
visitProgram(node) {
let body = [];
for (let n of node.body) {
body.push(this.visit(n));
}
return { ...node, body };
}
visitRecordLiteral(node) {
return node;
}
visitRecordPattern(node) {
return node;
}
visitSetExpression(node) {
const lhv = this.visit(node.lhv);
const expression = this.visit(node.expression);
return { ...node, lhv, expression };
}
visitStringLiteral(node) {
return node;
}
visitSymbol(node) {
return node;
}
visitTypeAlias(node) {
return node;
}
visitVariableDeclaration(node) {
const lhv = this.visit(node.lhv);
const expression = this.visit(node.expression);
return { ...node, lhv, expression };
}
visitVectorLiteral(node) {
return node;
}
visitVectorPattern(node) {
return node;
}
}
Note that some methods map
over arrays of nodes, which means they need to use the bind
method on the method they pass into map
.
Now it's time to desugar FunctionDeclaration
nodes into VariableDeclaration
nodes with LambdaExpression
values.
We'll create a new file src/desugarer/Desugarer.js
and stub out the Desugarer
class:
import { AST } from "../parser/ast.js";
import { TATypes } from "../parser/parseTypeAnnotation.js";
import { Visitor } from "../visitor/Visitor.js";
export class Desugarer extends Visitor {
constructor(program) {
super(program);
}
static new(program) {
return new Desugarer(program);
}
}
Since the base class handles all other nodes, the only method we need to implement is visitFunctionDeclaration
. Remember that this is after type checking, so the AST we're getting in this case is already annotated with type information.
The emitter expects to have that type information, so we need to make sure our visitFunctionDeclaration
node handles it.
We construct a LambdaExpression
node from the FunctionDeclaration
node, use the node's name
property to create the variable name, and handle type information accordingly.
visitFunctionDeclaration(node) {
const variadic = node.variadic;
// since it's TypedAST it has a type property
const type = node.type;
const lambda = AST.LambdaExpression(
node.params,
node.body,
variadic,
node.retType,
node.srcloc
);
lambda.type = type;
const paramTypes = node.params.map(
(p) => p.typeAnnotation ?? { kind: TATypes.AnyLiteral }
);
const retType = node.retType ?? { kind: TATypes.AnyLiteral };
const funcType = {
kind: TATypes.Function,
params: paramTypes,
retType,
variadic,
};
const varDecl = AST.VariableDeclaration(
node.name,
lambda,
node.srcloc,
funcType
);
varDecl.type = type;
return varDecl;
}
I bet that's way simpler than you thought it would be at first. I was amazed myself at how straightforward it is the first time I did this.
Now we need to use the Desugarer
class. In src/desugarer/desugar.js
we rewrite the desugar
function:
import { Desugarer } from "./Desugarer.js";
export const desugar = (ast) => Desugarer.new().visit(ast);
Changes to The Runtime
First, we're going to use a package to handle currying functions. Ramda is a good choice, and it also has a lot of other goodies if we decide we want to use them.
npm install ramda
We also need a function to parse contracts so we can implement them on library functions defined in JavaScript.
Let's add the parseContract
function to the new file src/runtime/parseContract.js
:
import { tokenize } from "../lexer/tokenize.js";
import { read } from "../reader/read.js";
import { parseTypeAnnotation } from "../parser/parseTypeAnnotation.js";
import { fromTypeAnnotation } from "../typechecker/fromTypeAnnotation.js";
export const parseContract = (code) =>
fromTypeAnnotation(parseTypeAnnotation(read(tokenize(code)).car));
Since we're only ever going to read in one contract at a time, we can take the car
property off the output from read
and use it as the input to parseTypeAnnotation
.
Now we need to rewrite makeFunction
in src/runtime/makeFunction.js
to handle currying everything, as well as creating contracts for library functions.
We'll pass an optional 2nd parameter into makeFunction
that's an options object.
Right now the only option is contract
, but that could change in the future.
We'll also add a hash of the function object as a name property since the compiler loses the name.
Here's the new version of makeFunction
:
import objectHash from "object-hash";
import { curryN } from "ramda";
import { makeWandaValue } from "./conversion.js";
import { addMetaField } from "./object.js";
import { parseContract } from "./parseContract.js";
export const makeFunction = (func, { contract = "" } = {}) => {
let fn = curryN(func.length, (...args) => {
const val = makeWandaValue(func(...args));
if (typeof val === "function") {
return makeFunction(val);
}
return val;
});
const hash = objectHash(fn);
addMetaField(fn, "wanda", true);
addMetaField(fn, "arity", func.length);
addMetaField(fn, "name", hash);
if (contract !== "") {
Object.defineProperty(fn, "contract", {
enumerable: false,
configurable: false,
writable: false,
value: parseContract(contract),
});
}
Object.defineProperty(fn, "name", {
enumerable: false,
configurable: false,
writable: false,
value: hash,
});
return fn;
};
Now we can add contracts to functions from the core library, which we'll do in a minute. First, though, I want to handle emitting code for functions.
Since we're desugaring away FunctionDeclaration
nodes, we only need to handle the case for LambdaExpression
in the emitter.
Add it to the switch
statement in the emit
method in src/emitter/Emitter.js
:
// other cases
case ASTTypes.LambdaExpression:
return this.emitLambdaExpression(node, ns);
// default case
We're going to emit the parameters into an array of strings so we can easily join them with ", "
in the final output.
We need to handle the case of a variadic function by outputting ...
before the rest parameter.
For the function body we loop over the body and emit code for each node.
Finally, we wrap the whole thing in rt.makeFunction
.
Here's emitLambdaExpression
in its entirety:
emitLambdaExpression(node, ns) {
const funcNs = ns.extend("Lambda");
/** @type {string[]} */
let params = [];
let i = 0;
for (let p of node.params) {
funcNs.set(p.name.name, makeSymbol(p.name.name));
if (node.variadic && i === node.params.length - 1) {
params.push(`...${this.emit(p.name, funcNs)}`);
} else {
params.push(this.emit(p.name, funcNs));
}
i++;
}
let code = `rt.makeFunction((${params.join(", ")}) => {\n`;
let j = 0;
for (let expr of node.body) {
if (j === node.body.length - 1) {
code += `return ${this.emit(expr, funcNs)};`;
} else {
code += this.emit(expr, funcNs) + ";\n";
}
j++;
}
code += "\n})";
return code;
}
That's it for changes to the emitter!
Changes to The Printer
We need to add a case for functions to the switch
statement in the printString
function in src/printer/printString.js
:
// other cases...
case "function":
return `Function: ${value.name || "lambda"}`;
// case "object"...
We need to add cases for both nodes to the print
method in src/printer/printAST.js
:
// other cases...
case ASTTypes.FunctionDeclaration:
return this.printFunctionDeclaration(node, indent);
case ASTTypes.LambdaExpression:
return this.printLambdaExpression(node, indent);
// default case
And a printFunctionDeclaration
method:
printFunctionDeclaration(node, indent) {
let prStr = `${prIndent(indent)}FunctionDeclaration:\n`;
prStr += `${prIndent(indent + 2)}Name: ${node.name.name}\n`;
prStr += `${prIndent(indent + 2)}Params:\n`;
for (let param of node.params) {
prStr += this.print(param.name, indent + 4) + "\n";
}
prStr += `${prIndent(indent + 2)}Body:\n`;
for (let expr of node.body) {
prStr += this.print(expr, indent + 4) + "\n";
}
return prStr;
}
And a printLambdaExpression
method:
printLambdaExpression(node, indent) {
let prStr = `${prIndent(indent)}LambdaExpression:\n`;
prStr += `${prIndent(indent + 2)}Params:\n`;
for (let param of node.params) {
prStr += this.print(param.name, indent + 4) + "\n";
}
prStr += `${prIndent(indent + 2)}Body:\n`;
for (let expr of node.body) {
prStr += this.print(expr, indent + 4) + "\n";
}
return prStr;
}
Changes to The Core Library
You'll see with all these function contracts that there are still a lot of any
annotations.
That will start to change as we add new and better types to our type system, but for now everything is pretty basic. We have to use any
in a lot of cases because right now there isn't a better way to do it.
Adding generics will go a long way towards improving this, as will union and intersection types.
Also note that I've added 4 higher-order functions that work on lists to the core library. Eventually I'd like them to work on any iterable type, like vectors, but it will take some time to be able to make that work with the type system.
I'm just going to include the whole core library here because as you'll notice I've made some small edits to a few functions here and there.
If you read closely, you'll note that I've used a from
method on the Cons
class that you've not seen before. Here's the definition for that in src/shared/cons.js
:
static from(iter) {
return Cons.of(...iter);
}
Now for the core library in lib/js/core.js
:
import equal from "fast-deep-equal/es6/index.js";
import { makeModule } from "../../src/runtime/Module.js";
import { Cons, cons } from "../../src/shared/cons.js";
import { print } from "../../src/printer/print.js";
import { println } from "../../src/printer/println.js";
import { printString } from "../../src/printer/printString.js";
import { isTruthy } from "../../src/runtime/utils.js";
import { Exception } from "../../src/shared/exceptions.js";
export const theModule = makeModule("Core", (rt, ns) => {
const isList = (obj) => {
if (!rt.isNil(obj) && !(obj instanceof Cons)) {
return false;
} else if (rt.isNil(obj)) {
return true;
}
// only option left is cons
return isList(obj.cdr);
};
return {
print: rt.makeFunction(print, { contract: "(any -> string)" }),
println: rt.makeFunction(println, { contract: "(any -> string)" }),
cons: rt.makeFunction(cons, { contract: "(any, any -> (list any))" }),
car: rt.makeFunction((pair) => pair.car, {
contract: "((list any) -> any)",
}),
cdr: rt.makeFunction((pair) => pair.cdr, {
contract: "((list any) -> any)",
}),
string: rt.makeFunction(printString, { contract: "(any -> string)" }),
number: rt.makeFunction(Number, { contract: "(string -> number)" }),
boolean: rt.makeFunction((val) => isTruthy(val), {
contract: "(any -> boolean)",
}),
keyword: rt.makeFunction((str) => Symbol.for(":" + str), {
contract: "(string -> keyword)",
}),
"+": rt.makeFunction(
(a, b, ...nums) => nums.reduce((sum, n) => sum + n, a + b),
{ contract: "(number, number, &(vector number) -> number)" }
),
"-": rt.makeFunction(
(a, b, ...nums) => nums.reduce((diff, n) => diff - n, a - b),
{ contract: "(number, number, &(vector number) -> number)" }
),
"*": rt.makeFunction(
(a, b, ...nums) => nums.reduce((prod, n) => prod * n, a * b),
{ contract: "(number, number, &(vector number) -> number)" }
),
"/": rt.makeFunction(
(a, b, ...nums) => nums.reduce((quot, n) => quot / n, a / b),
{ contract: "(number, number, &(vector number) -> number)" }
),
"%": rt.makeFunction(
(a, b, ...nums) => nums.reduce((quot, n) => quot % n, a % b),
{ contract: "(number, number, &(vector number) -> number)" }
),
"=": rt.makeFunction((a, b) => a === b, {
contract: "(number, number -> boolean)",
}),
">": rt.makeFunction((a, b) => a > b, {
contract: "(number, number -> boolean)",
}),
">=": rt.makeFunction((a, b) => a >= b, {
contract: "(number, number -> boolean)",
}),
"<": rt.makeFunction((a, b) => a < b, {
contract: "(number, number -> boolean)",
}),
"<=": rt.makeFunction((a, b) => a <= b, {
contract: "(number, number -> boolean)",
}),
not: rt.makeFunction((x) => !x, { contract: "(any -> boolean)" }),
list: rt.makeFunction((...args) => Cons.from(args), {
contract: "(&(vector any) -> (list any))",
}),
length: rt.makeFunction(
(obj) => {
if (obj instanceof Cons) {
let i = 0;
for (let _ of obj) {
i++;
}
return i;
}
return obj.length;
},
{ contract: "((list any) -> number)" }
),
get: rt.makeFunction(
(n, obj) => {
const value = obj.get(n);
if (value === undefined) {
throw new Exception(`Value for index ${n} not found on object`);
}
return value;
},
{ contract: "((list any) -> any)" }
),
"list?": rt.makeFunction(isList, { contract: "((list any) -> boolean)" }),
"pair?": rt.makeFunction((obj) => obj instanceof Cons, {
contract: "((list any) -> boolean)",
}),
"number?": rt.makeFunction((obj) => typeof obj === "number", {
contract: "(any -> boolean)",
}),
"string?": rt.makeFunction((obj) => typeof obj === "string", {
contract: "(any -> boolean)",
}),
"boolean?": rt.makeFunction((obj) => typeof obj === "boolean", {
contract: "(any -> boolean)",
}),
"nil?": rt.makeFunction((obj) => obj == null, {
contract: "(any -> boolean)",
}),
"keyword?": rt.makeFunction(
(obj) => typeof obj === "symbol" && obj.description.startsWith(":"),
{ contract: "(any -> boolean)" }
),
"equal?": rt.makeFunction(
(a, b) => {
if (rt.hasDict(a) && rt.hasDict(b)) {
return equal(rt.getMetaField(a, "dict"), rt.getMetaField(b, "dict"));
}
return equal(a, b);
},
{ contract: "(any, any) -> boolean" }
),
"is?": rt.makeFunction((a, b) => Object.is(a, b), {
contract: "(any, any -> boolean)",
}),
append: rt.makeFunction(
(obj1, obj2) => {
if (typeof obj1 === "string" && typeof obj2 === "string") {
return obj1 + obj2;
} else if (obj1 instanceof Cons) {
return Cons.from(obj1).append(obj2);
} else if (Array.isArray(obj1)) {
return [...obj1].push(obj2);
} else {
rt.failRuntime(`Value of type ${typeof obj1} cannot be appended`);
}
},
{ contract: "(any, any -> any)" }
),
with: rt.makeFunction(
(rec1, rec2) => {
const newDict = Object.assign(
{},
rt.getMetaField(rec1, "dict"),
rt.getMetaField(rec2, "dict")
);
return rt.makeObject(newDict);
},
{ contract: "(any, any -> any)" }
),
prop: rt.makeFunction((prop, obj) => rt.getField(obj, prop), {
contract: "(string, any -> any)",
}),
each: rt.makeFunction(
(fn, lst) => {
for (let item of lst) {
fn(item);
}
},
{
contract: "((any -> nil), (list any) -> nil)",
}
),
map: rt.makeFunction(
(fn, lst) => {
let mapped = [];
for (let item of lst) {
mapped.push(fn(item));
}
return Cons.from(mapped);
},
{
contract: "((any -> any), (list any) -> (list any))",
}
),
filter: rt.makeFunction(
(fn, lst) => {
let filtered = [];
for (let item of lst) {
if (rt.isTruthy(fn(item))) {
filtered.push(item);
}
}
return Cons.from(filtered);
},
{
contract: "((any -> boolean), (list any) -> (list any))",
}
),
fold: rt.makeFunction(
(fn, init, lst) => {
let acc = init;
for (let item of lst) {
acc = fn(acc, item);
}
return acc;
},
{
contract: "((any, any -> any), any, (list any) -> any)",
}
),
"fold-r": rt.makeFunction(
(fn, init, lst) => {
let acc = init;
const reversed = [...lst].reverse();
for (let item of reversed) {
acc = fn(acc, item);
}
return acc;
},
{
contract: "((any, any -> any), any, (list any) -> any)",
}
),
};
});
Conclusion
That's it for adding functions to the language! Fire up the REPL with the wanda
command and take them for a spin.
(def add-3 (a: number, b: number, c: number) -> number
(+ a b c))
(add-3 1 2 3) ;-> 6
As always you can see the state of the code as of the end of this article on the relevant tag at the Wanda repo.
Next time we're going to do a lot of work with type checking. We'll add tuple, union, and intersection types to Wanda as well as :as
expressions that will tell the type checker to treat a value as a specific type for times when it needs a little help.
I'll see you then!
Featured ones: