Getting the basics down: compiler from scratch pt. 1

Published 2/1/2024, 18 minute read

compiler from scratch, java, parsing

I like compilers, so I want to write one. Now, production-quality compilers for name-brand languages, like GCC, run well into the millions of lines¹. Even QuickJS and TCC (Tiny C Compiler), which aim to be small implementations of JavaScript and C respectively, come in at around 60k LOC each. Modern programming languages are fundamentally sophisticated beasts, and compilers are just inherently intricate, so it's safe to say writing a real-world compiler is a significant undertaking.

Despite my best efforts, I haven't dissuaded myself, but I still don't have the patience (or probably skill) to write something like QuickJS or TCC. Instead, I want to write a compiler whose main purpose is fun, learning and exploration, and I want to document the process here. Hopefully anyone else writing a compiler — or really anyone else interested in compilers — will get something out of this series.

In this post, I'm going to go over the very basics of building a compiler in a pretty naive way, and I'll add some sophistication later. Actually, the "compiler" will be an AST interpreter for now, since I feel like that's a good stepping stone anyway.

What we'll implement

This project is contrived, so I'll be implementing a fairly contrived language. The language will be a very stripped down version of C, and since this is a toy project, I'll readily simplify the language to make the compiler work easier. I'm going to write the compiler in Java, because it's widely known, pretty easy to debug, and the garbage collector will come in handy. If I was trying to write a fast compiler, I definitely wouldn't use Java, but I'm not, so I will.

By the end of this post I'll have the following functionality up and running:

Lexing and parsing source code into an AST.
Some truly primitive validation of said AST.
A pretty trivial interpreter that just walks the AST.
Functions, parameters, variables, assignments, returns, and integers, and nothing else: no calls or arithmetic yet.

We should end up with a project structure a bit like this:

Defining the AST classes

In compilers, there is the idea of an intermediate representation, which is whatever data structure the compiler uses to model the program. Source code is a representation, and is modelled by the "data structure" java.lang.String. This serves us poorly though. Imagine writing a complex algorithm operating solely on the characters of the source code. A good intermediate representation represents different things differently, represents similar things similarly, and doesn't represent anything we don't care about (like whitespace). We seek structure, uniformity, and simplicity. An abstract syntax tree, or AST, represents a program as a tree of nodes, where each node corresponds to a piece of syntax, such as a function, statement, or expression.

For example, this program:

int main(int a) {
    int b;
    b = a;
    return b;
}

Has this abstract syntax tree:

You can see how each piece of information in the program is structured based on "what belongs to what". This intermediate representation will serve us quite well to begin with. Note that there is some structure missing from the AST: maybe the return b node should have links to the declaration of b, or maybe a link to the type int. There is also no link to the next thing to be executed, for example. We can define "analysis passes" which will navigate the AST to fill in these missing details, but in more advanced cases, this strategy will be superseded by better IRs, such as instruction-based static single assignment (SSA) forms², or a sea of nodes representation³.

We will model our AST as an abstract class whose subclasses are the different types of nodes. In the compiler, we will make extensive use of Java's new sealed classes, where a superclass lists out all possible subclasses in its declaration. This will let us use pattern matching in switch statements, which will simplify our code slightly. This is essentially a glorified, safer way to do lots of instanceof checks, and we will use it in lieu of discriminated unions (like Rust's enum).
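To make the sealed-class idea concrete, here is a minimal sketch of an exhaustive pattern-matching switch. The `Expr`, `IntLit`, `Var` and `Assign` types are hypothetical stand-ins invented for this example, not the AST classes defined below, and the switch syntax assumes Java 21 or later.

```java
class SealedDemo {
    // Hypothetical mini-hierarchy; the sealed interface lists every subtype
    sealed interface Expr permits IntLit, Var, Assign {}
    record IntLit(int value) implements Expr {}
    record Var(String name) implements Expr {}
    record Assign(String target, Expr value) implements Expr {}

    // No `default` branch is needed: the compiler knows the permitted
    // subtypes, so adding a fourth one makes this switch a compile error
    // until a matching case is written.
    static String describe(Expr expr) {
        return switch (expr) {
            case IntLit lit -> "literal " + lit.value();
            case Var v -> "variable " + v.name();
            case Assign a -> a.target() + " = (" + describe(a.value()) + ")";
        };
    }

    public static void main(String[] args) {
        // prints "b = (variable a)"
        System.out.println(describe(new Assign("b", new Var("a"))));
    }
}
```

The same check written with instanceof would compile happily even if a case were forgotten, which is the safety being bought here.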

java
The base class of all AST nodes

public abstract sealed class Ast permits BlockAst, ExprAst, FunctionAst,
        IdentAst, ProgramAst, StmtAst, TyAst, VarDeclAst {

    // An AST has a parent node, unless it is the root `ProgramAst`
    private Ast parent;

    // An AST has zero or more children, all of which are also AST nodes
    public abstract List<? extends Ast> getChildren();

    // Utility to search down the AST
    public <T extends Ast> T findDescendant(Class<T> cls) {
        // ...
    }

    // Utility to search up the AST
    public <T extends Ast> T findAncestor(Class<T> cls) {
        // ...
    }
}

I've also defined ExprAst (expressions), StmtAst (statements), and TyAst (types) as abstract subclasses of Ast. Since so many pieces of syntax fall into these categories, it's nice to specify them upfront.

This is also a good point to mention that I will leave out a lot of code for brevity, such as getters and setters. As such, I won't list out all the code for every AST node, but FunctionAst is fairly illustrative:

java
The class for function ASTs

public final class FunctionAst extends Ast {

    private IdentAst name;

    // VarDecl means variable declaration
    private List<VarDeclAst> params;

    private TyAst returnTy;

    private BlockAst body;

    public FunctionAst(IdentAst name, List<VarDeclAst> params,
            TyAst returnTy, BlockAst body) {
        this.name = name;
        this.params = new ArrayList<>(params);
        this.returnTy = returnTy;
        this.body = body;
    }

    @Override
    public List<? extends Ast> getChildren() {
        var results = new ArrayList<Ast>();
        results.add(name);
        results.addAll(params);
        results.add(returnTy);
        results.add(body);
        return results;
    }
}

Note how we can access the children of a function in a heterogeneous way, e.g. through the params field, but can also access its children in a homogeneous way, through the getChildren method. These are both convenient at different times.

One issue with this design is that we can't, and don't, set parent in the constructor. This would be a chicken and egg problem, since we pass children into most AST constructors. Instead, we can use the assignParents utility function to make sure the parent field is set correctly for the whole tree. We will call this once after we finish building an AST.

java
This fixes up the parent pointers for all nodes

public abstract sealed class Ast
        permits /* ... */ {

    // ...

    public void assignParents() {
        for (var child : getChildren()) {
            child.setParent(this);
            child.assignParents();
        }
    }
}
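As a rough illustration of how the parent pointers and the upward search fit together, here is a self-contained sketch. `Node`, `ProgramNode`, `FunctionNode` and `ReturnNode` are hypothetical stand-ins for the real AST classes, and this `findAncestor` is only one plausible way to fill in the body elided above.

```java
import java.util.ArrayList;
import java.util.List;

class ParentDemo {
    // Hypothetical stand-in for Ast: just children plus a parent link
    static class Node {
        final List<Node> children = new ArrayList<>();
        Node parent; // stays null for the root

        Node(Node... children) {
            for (Node child : children) {
                this.children.add(child);
            }
        }

        // Walk down once, pointing every child back at its parent
        void assignParents() {
            for (Node child : children) {
                child.parent = this;
                child.assignParents();
            }
        }

        // Follow parent links until a node of the requested class turns up
        <T extends Node> T findAncestor(Class<T> cls) {
            for (Node n = parent; n != null; n = n.parent) {
                if (cls.isInstance(n)) return cls.cast(n);
            }
            return null;
        }
    }

    static class ProgramNode extends Node { ProgramNode(Node... c) { super(c); } }
    static class FunctionNode extends Node { FunctionNode(Node... c) { super(c); } }
    static class ReturnNode extends Node { ReturnNode(Node... c) { super(c); } }

    public static void main(String[] args) {
        ReturnNode ret = new ReturnNode();
        FunctionNode function = new FunctionNode(ret);
        ProgramNode program = new ProgramNode(function);

        // Children exist before their parents, so the links start out empty
        System.out.println(ret.findAncestor(FunctionNode.class) == null);     // true
        program.assignParents();
        System.out.println(ret.findAncestor(FunctionNode.class) == function); // true
        System.out.println(ret.findAncestor(ProgramNode.class) == program);   // true
    }
}
```

The first lookup shows the cost of the design: until the single fix-up call has run, any code that searches upwards silently finds nothing.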

We will eventually have quite a lot of AST classes, but I've put the ones we have now below. Black represents concrete classes and composition, blue represents abstract classes and inheritance:

Lexing and parsing

Parsing is the process of turning our source code (as a string), into an AST. Since hand-written parsers tend to be very verbose, there are parser generators which provide a terse, domain specific language from which parsers can be generated. I've actually written a (fairly bad) parser generator, but I'm going to go firmly with rolling my own, for a few reasons. First and foremost, I've called this article "compiler from scratch" so I've sort of already boxed myself into that decision… In seriousness, I think parsers can be interesting, even if tedious. Parser generators also tend to suck — it's easy to hit a wall with them, and often the error reporting and recovery is subpar and not very customizable. Finally, despite parser generators purporting to simplify things, they also add a lot of complexity in learning the tool and integrating it into the build.

Parsing is generally regarded as the "easy" part when writing a compiler, but it is actually quite a rabbit hole with a lot of literature. Parsers can usually be classified as either top-down or bottom-up. I won't go into depth, but anything written by hand will usually be top-down, and most popular programming languages use a hand-written top-down parser since that enables the best error messages⁴.

The phrase "hand-written top-down parser" is essentially synonymous with recursive descent parsing, and indeed I'll be writing a recursive descent parser. Despite sounding scary, recursive descent just means each piece of syntax has its own function, and these functions call each other (maybe recursively) to recognise more complex syntax.

Recursive descent parsing usually refers to hand-written implementations, but there is also a parallel for parser generators: parsing expression grammars, or PEGs for short. The original paper is a good read if you're curious. PEGs correspond pretty directly to recursive descent parsing, so it's useful background to have. You can even apply optimizations and such to parsing expression grammars; maybe one day I'll try to write another optimizing compiler for PEGs… But, I digress. We're going the manual route.

To parse, we must lex

A lexer, or tokenizer, takes the source code and breaks it up into pieces, such as keywords, numbers and symbols. These pieces are called tokens. A parser could operate directly on characters, so this distinction is not strictly necessary, but, in practice, it makes some things easier (e.g. keywords), and can significantly improve performance.

For the moment, we will define the following tokens: CommaToken, SemicolonToken, EqualsToken, IdentToken, IntLiteralToken, KwIntToken, KwReturnToken, OpenBraceToken, CloseBraceToken, OpenParenToken, CloseParenToken, and the special EofToken. Whenever we reach the end of the source code, we return EofToken, which saves us from having to return null or Optional and so keeps the code simpler.

Our lexer has a fairly simple interface and structure:

java
An outline of the lexer

public class Lexer {

    // The characters that are ignored by the lexer
    private static final Set<Character> WHITESPACE =
        Set.of(' ', '\t', '\n', '\r');

    // Tokens that always match up to the same string
    private static final Map<String, Token> SYMBOLS =
        Map.of(
            "(", new OpenParenToken(),
            ")", new CloseParenToken(),
            "{", new OpenBraceToken(),
            "}", new CloseBraceToken(),
            ";", new SemicolonToken(),
            ",", new CommaToken(),
            "=", new EqualsToken());

    // Keywords that should be used instead of an identifier
    // where applicable
    private static final Map<String, Token> KEYWORDS =
        Map.of(
            "int", new KwIntToken(),
            "return", new KwReturnToken());

    // The entire input source code
    private final String input;

    // The index into the source code we are currently up to
    private int position = 0;

    public Lexer(String input) {
        this.input = input;
    }

    // Creates a copy of the lexer so we can rewind
    public Lexer(Lexer other) {
        this.input = other.input;
        this.position = other.position;
    }

    public Token peek() throws LexException {
        return new Lexer(this).next();
    }

    public Token next() throws LexException {
        // ...
    }

    private Token lexSymbol() {
        // ...
    }

    private Token lexWord() {
        // ...
    }

    private Token lexIntLiteral() {
        // ...
    }

    private void skipWhitespace() {
        // ...
    }
}

The next function does the bulk of the work by delegating to the lex* functions. The lex* functions either return a token and move position forward, or they return null and leave position as is. Otherwise this isn't too tricky: you just have to make sure you lex int as a keyword, and that you don't lex integer as int followed by teger.
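The keyword-versus-identifier point is easiest to see in running code. The following is a compressed sketch rather than the lexer outlined above: it represents tokens as a hypothetical `Token(kind, text)` record instead of one class per token, returns the whole token list at once, and uses `Character.isWhitespace`, which accepts more characters than the four listed in WHITESPACE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MiniLexer {
    // Hypothetical token shape, chosen to keep the sketch short
    record Token(String kind, String text) {}

    private static final Set<String> KEYWORDS = Set.of("int", "return");
    private static final String SYMBOLS = "(){};,=";

    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (SYMBOLS.indexOf(c) >= 0) {
                tokens.add(new Token("symbol", String.valueOf(c)));
                pos++;
            } else if (Character.isDigit(c)) {
                int start = pos;
                while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
                tokens.add(new Token("int-literal", input.substring(start, pos)));
            } else if (Character.isLetter(c) || c == '_') {
                // Consume the whole word first, and only then decide whether
                // it is a keyword. Matching "int" as a prefix would wrongly
                // split "integer" into "int" and "teger".
                int start = pos;
                while (pos < input.length()
                        && (Character.isLetterOrDigit(input.charAt(pos))
                            || input.charAt(pos) == '_')) pos++;
                String word = input.substring(start, pos);
                tokens.add(new Token(KEYWORDS.contains(word) ? "keyword" : "ident", word));
            } else {
                throw new IllegalArgumentException(
                    "unexpected character '" + c + "' at index " + pos);
            }
        }
        // A sentinel token marks the end, so callers never see null
        tokens.add(new Token("eof", ""));
        return tokens;
    }

    public static void main(String[] args) {
        for (Token token : lex("int integer = 42;")) {
            System.out.println(token);
        }
    }
}
```

Running it on `int integer = 42;` yields a keyword followed by an identifier, then the symbol, literal, semicolon and end-of-file tokens.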

Writing the parser

Back to writing the parser. As I mentioned before, recursive descent parsers can often be expressed as parsing expression grammars, which are like an abstract representation of the parsing code, and also have a lot in common with regular expressions. Here is the parsing expression grammar we will be using:

\begin{align*}
\text{Program} &\leftarrow \text{Function}^* \\
\text{Function} &\leftarrow \text{Type}\ \text{Ident}\ ``("\ (\text{VarDecl}\ (``,"\ \text{VarDecl})^*)^?\ ``)"\ \text{Block} \\
\text{Block} &\leftarrow ``\{"\ \text{Stmt}^*\ ``\}" \\
\text{VarDecl} &\leftarrow \text{Type}\ \text{Ident} \\
\text{Type} &\leftarrow \text{IntType} \\
\text{IntType} &\leftarrow ``int" \\
\text{Stmt} &\leftarrow \text{ExprStmt}\ /\ \text{VarDeclStmt}\ /\ \text{ReturnStmt} \\
\text{ExprStmt} &\leftarrow \text{Expr}\ ``;" \\
\text{VarDeclStmt} &\leftarrow \text{VarDecl}\ ``;" \\
\text{ReturnStmt} &\leftarrow ``return"\ \text{Expr}\ ``;" \\
\text{Expr} &\leftarrow \text{AssignmentExpr}\ /\ \text{VarExpr}\ /\ \text{IntLiteralExpr} \\
\text{AssignmentExpr} &\leftarrow \text{Ident}\ ``="\ \text{Expr} \\
\text{IntLiteralExpr} &\leftarrow int\text{-}literal \\
\text{VarExpr} &\leftarrow \text{Ident} \\
\text{Ident} &\leftarrow ident
\end{align*}

The left-hand side of each $\leftarrow$ names a piece of syntax, or a rule, and the right-hand side breaks it down into smaller pieces of syntax. The $^*$ means repeated zero or more times, the $^?$ means optional (so a function with no parameters also parses), and the $/$ means an ordered choice: the left alternative is tried first, and the right one only if the left fails. One important quirk is that we have to list $\text{AssignmentExpr}$ before $\text{VarExpr}$. Imagine our input was $``x = 42"$, and we were deciding which expression to match. If we naively tried $\text{VarExpr}$ first, we would see that it does indeed match $``x"$, and then whatever rule is next would fail to parse the remaining $``= 42"$. PEGs and recursive descent parsers are not smart enough to handle this automatically, so we must keep track of this ourselves. This might be a bit surprising if you are used to the $|$ operator from regular expressions.
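The ordering quirk can be checked with a toy recogniser. This is only a sketch under simplifying assumptions: tokens are plain strings, each rule returns the position after its match (or -1 on failure) instead of building an AST, and the `assignmentFirst` flag exists purely to compare the two orderings.

```java
import java.util.List;

class OrderedChoice {
    // VarExpr <- ident (here: any run of lowercase letters)
    static int varExpr(List<String> toks, int pos) {
        return pos < toks.size() && toks.get(pos).matches("[a-z]+") ? pos + 1 : -1;
    }

    // IntLiteralExpr <- int-literal
    static int intLiteral(List<String> toks, int pos) {
        return pos < toks.size() && toks.get(pos).matches("[0-9]+") ? pos + 1 : -1;
    }

    // AssignmentExpr <- Ident "=" Expr
    static int assignment(List<String> toks, int pos, boolean assignmentFirst) {
        int afterIdent = varExpr(toks, pos);
        if (afterIdent < 0 || afterIdent >= toks.size()
                || !toks.get(afterIdent).equals("=")) {
            return -1;
        }
        return expr(toks, afterIdent + 1, assignmentFirst);
    }

    // Expr as an ordered choice: the first alternative that succeeds wins,
    // and the later alternatives are never consulted
    static int expr(List<String> toks, int pos, boolean assignmentFirst) {
        if (assignmentFirst) {
            int r = assignment(toks, pos, true);
            if (r >= 0) return r;
            r = varExpr(toks, pos);
            if (r >= 0) return r;
            return intLiteral(toks, pos);
        }
        int r = varExpr(toks, pos);
        if (r >= 0) return r;
        r = intLiteral(toks, pos);
        if (r >= 0) return r;
        return assignment(toks, pos, false);
    }

    // ExprStmt <- Expr ";"
    static boolean exprStmt(List<String> toks, boolean assignmentFirst) {
        int afterExpr = expr(toks, 0, assignmentFirst);
        return afterExpr >= 0 && afterExpr < toks.size()
            && toks.get(afterExpr).equals(";");
    }

    public static void main(String[] args) {
        List<String> toks = List.of("x", "=", "42", ";");
        System.out.println(exprStmt(toks, true));  // true
        System.out.println(exprStmt(toks, false)); // false
    }
}
```

With VarExpr listed first, Expr succeeds after consuming only `x`, so the statement rule then sees `=` where it wanted `;` and fails, even though the input is a perfectly good assignment.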

Our parser class looks a bit like this. I've omitted most of the code, since it essentially just rehashes the same few patterns, but I've included the tricky bits and a few illustrative examples:

java
Outline of the parser class

public class Parser {

    private Lexer lexer;

    public Parser(Lexer lexer) {
        this.lexer = lexer;
    }

    // Each piece of syntax has its own function, and most of
    // those functions conform to this interface
    private interface ParseFunction<T extends Ast> {

        // This attempts to parse out a piece of syntax
        // by advancing the lexer. If there is an error
        // (e.g. missing semicolon), then we throw an
        // exception. Given an error, we might like to
        // try to reset/rewind the lexer and try parsing
        // something else, but this method is not
        // responsible for this resetting/rewinding
        T parse(Parser parser) throws ParseException;
    }

    public static ProgramAst parse(String syntax)
            throws ParseException {
        return new Parser(new Lexer(syntax)).parse();
    }

    public ProgramAst parse() throws ParseException {
        var ast = parseProgram();
        // The `parent` field doesn't get assigned in the
        // constructor, so we must fix it up here
        ast.assignParents();
        return ast;
    }

    // This is what we will use for parsing individual tokens
    private <T extends Token> T expect(Class<T> cls)
            throws ParseException {
        var next = lexer.peek();

        if (!cls.isInstance(next)) {
            throw new ParseException(
                "expected " + cls + " but found " + next);
        }

        // `cls.cast` avoids an unchecked `(T)` cast
        return cls.cast(lexer.next());
    }

    // Sometimes we want to catch a parsing exception and
    // try something else. For example, if we can't parse
    // an assignment, maybe we want to parse an integer
    // literal instead? This handles catching the exception
    // and putting the lexer back in its original state. We
    // use this to implement "A / B" rules from the PEG grammar
    private <T extends Ast> T attempt(ParseFunction<T> f) {
        var copy = new Parser(new Lexer(lexer));

        try {
            var result = f.parse(copy);
            lexer = copy.lexer;
            return result;
        } catch (ParseException e) {
            return null;
        }
    }

    public ProgramAst parseProgram() throws ParseException {
        // ...
    }

    private FunctionAst parseFunction() throws ParseException {
        var returnTy = parseTy();
        var name = parseIdent();
        var params = new ArrayList<VarDeclAst>();

        expect(OpenParenToken.class);

        // This is how we parse "VarDecl (`,` VarDecl)*"
        while (!(lexer.peek() instanceof CloseParenToken)) {
            if (!params.isEmpty()) {
                expect(CommaToken.class);
            }

            params.add(parseVarDecl());
        }

        expect(CloseParenToken.class);

        var body = parseBlock();

        return new FunctionAst(name, params, returnTy, body);
    }

    private VarDeclAst parseVarDecl() throws ParseException {
        // ...
    }

    private TyAst parseTy() throws ParseException {
        // ...
    }

    private TyAst parseKwTy() throws ParseException {
        // ...
    }

    private BlockAst parseBlock() throws ParseException {
        // ...
    }

    private StmtAst parseStmt() throws ParseException {
        // ...
    }

    private VarDeclStmtAst parseVarDeclStmt()
            throws ParseException {
        // ...
    }

    private ReturnStmtAst parseReturnStmt()
            throws ParseException {
        // ...
    }

    private ExprStmtAst parseExprStmt() throws ParseException {
        // ...
    }

    private ExprAst parseExpr() throws ParseException {
        // This is how we parse
        // "AssignmentExpr / VarExpr / IntLiteralExpr"

        AssignmentExprAst assignment =
            attempt(Parser::parseAssignmentExpr);
        if (assignment != null) return assignment;

        VarExprAst variable =
            attempt(Parser::parseVariableExpr);
        if (variable != null) return variable;

        IntLiteralExprAst intLiteral =
            attempt(Parser::parseIntLiteralExpr);
        if (intLiteral != null) return intLiteral;

        throw new ParseException("no valid expression");
    }

    private VarExprAst parseVariableExpr()
            throws ParseException {
        // ...
    }

    private AssignmentExprAst parseAssignmentExpr()
            throws ParseException {
        // ...
    }

    private IntLiteralExprAst parseIntLiteralExpr()
            throws ParseException {
        // ...
    }

    private IdentAst parseIdent() throws ParseException {
        var token = expect(IdentToken.class);
        return new IdentAst(token.getContent());
    }
}

Neat! This works well for parsing correct programs into ASTs. One downside is that this does very poorly in the presence of errors. For example, trying to parse the statement x = ; will unhelpfully report "no valid statement", and won't tell us where the error occurred. Also, if we have multiple errors, only the first will be reported. The ability to soldier on in the presence of errors is known as error recovery. Implementing good error recovery also tends to give us the tools to implement good error reporting, too.

The key to this is having a way to quickly determine which path to go down when choosing between multiple possible branches (e.g. is this an assignment or a variable reference?). There are several solutions. One is to make a final decision at each branch by looking at what tokens are ahead. We can also have each rule "commit" after it knows it is definitely the correct branch. For example, after seeing the = in an assignment, we know we don't need to backtrack. This paper gives a good overview of error-handling techniques for PEGs. The idea of cut-points corresponds loosely to what I'm talking about. Anyway, implementing this will be left to a later date.
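To give a feel for the lookahead approach, here is a small sketch, not the design this series will necessarily adopt. It assumes tokens are plain strings (so keywords are not distinguished from identifiers), and `LookaheadChoice`, `peek(offset)` and the bracketed output format are all invented for this example.

```java
import java.util.List;

class LookaheadChoice {
    static boolean isIdent(String t) { return t.matches("[a-zA-Z_]\\w*"); }
    static boolean isInt(String t) { return t.matches("[0-9]+"); }

    private final List<String> toks;
    private int pos = 0;

    LookaheadChoice(List<String> toks) {
        this.toks = toks;
    }

    // Look at a token without consuming it
    private String peek(int offset) {
        int i = pos + offset;
        return i < toks.size() ? toks.get(i) : "<eof>";
    }

    // Returns a bracketed rendering of the parsed expression
    String parseExpr() {
        String first = peek(0);
        if (isIdent(first) && peek(1).equals("=")) {
            // Committed: after `ident =` this can only be an assignment, so
            // a failure further down is a real error at a known position,
            // not a cue to rewind and try another branch.
            pos += 2;
            return "(assign " + first + " " + parseExpr() + ")";
        }
        if (isIdent(first)) {
            pos++;
            return "(var " + first + ")";
        }
        if (isInt(first)) {
            pos++;
            return "(int " + first + ")";
        }
        throw new IllegalStateException(
            "expected an expression at token " + pos
                + " but found '" + first + "'");
    }

    public static void main(String[] args) {
        // prints "(assign x (int 42))"
        System.out.println(
            new LookaheadChoice(List.of("x", "=", "42", ";")).parseExpr());
        try {
            new LookaheadChoice(List.of("x", "=", ";")).parseExpr();
        } catch (IllegalStateException e) {
            // prints "expected an expression at token 2 but found ';'"
            System.out.println(e.getMessage());
        }
    }
}
```

Because no branch is ever abandoned silently, the failure for `x = ;` points at the offending token instead of collapsing into a generic message.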

Implementing analysis

We can now happily produce abstract syntax trees from source code. So, for example:

int main(int a, int a) {
    return b;
}

Gets correctly turned into this tree:

Although this parsed successfully, there is still an error: int a is defined twice, and b is defined nowhere! We have validated the syntax of the program, but we still need to validate the semantics. This is best implemented as several passes that run over the AST, each detecting all errors of a certain class.

The visitor API

Walking over ASTs can be painful since we usually only care about a few nodes. For example, when we are reporting duplicate identifiers, we don't care about integer literals. To simplify this logic, we use the visitor pattern. Our visitor interface will look like this:

java
Analysis passes will implement this interface

public interface AstVisitor {

    private void visitChildren(Ast node)
            throws AnalysisException {
        for (var child : node.getChildren()) {
            child.accept(this);
        }
    }

    default void visitProgram(ProgramAst ast)
            throws AnalysisException {
        visitChildren(ast);
    }

    default void visitFunction(FunctionAst ast)
            throws AnalysisException {
        visitChildren(ast);
    }

    default void visitVarDecl(VarDeclAst ast)
            throws AnalysisException {
        visitChildren(ast);
    }

    // Etc...
}

To supplement this, we will have a new method on Ast:

java

public abstract sealed class Ast
        permits /* ... */ {

    // ...

    public abstract void accept(AstVisitor visitor)
        throws AnalysisException;
}

Implementations of accept just delegate to the appropriate visit* method, so ProgramAst::accept calls AstVisitor::visitProgram, and so on. The default implementation of all the visit* methods just visits all the children in order. Generally we will override some of these methods, but still call the super-implementation to continue visiting the rest of the AST below the current node.
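The accept/visit handshake is easier to follow in a complete, if tiny, program. The sketch below uses hypothetical `Block`, `IntLit` and `Var` node types rather than the real AST classes, and a `CountVars` pass invented for the example; only the shape of the dispatch mirrors the interface above.

```java
import java.util.List;

class VisitorDemo {
    interface Visitor {
        // Default behaviour: keep walking into the children
        default void visitBlock(Block block) {
            for (Node child : block.children()) {
                child.accept(this);
            }
        }
        default void visitIntLit(IntLit lit) {}
        default void visitVar(Var variable) {}
    }

    interface Node {
        void accept(Visitor visitor);
    }

    // Each node type forwards to "its" visit method; this is the only
    // place where the concrete type of a node is decided
    record Block(List<Node> children) implements Node {
        public void accept(Visitor visitor) { visitor.visitBlock(this); }
    }
    record IntLit(int value) implements Node {
        public void accept(Visitor visitor) { visitor.visitIntLit(this); }
    }
    record Var(String name) implements Node {
        public void accept(Visitor visitor) { visitor.visitVar(this); }
    }

    // A pass that cares only about variables and inherits everything else
    static class CountVars implements Visitor {
        int count = 0;

        @Override
        public void visitVar(Var variable) {
            count++;
        }
    }

    static int countVars(Node root) {
        CountVars pass = new CountVars();
        root.accept(pass);
        return pass.count;
    }

    public static void main(String[] args) {
        Node tree = new Block(List.of(
            new Var("a"),
            new IntLit(1),
            new Block(List.of(new Var("b")))));
        System.out.println(countVars(tree)); // prints 2
    }
}
```

The pass never mentions blocks or literals, yet still reaches the variable nested inside the inner block, which is the convenience the pattern is meant to buy.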

Collecting names

Our first analysis pass will scan through all function and variable declarations, and make sure there are no duplicates. We will also take the opportunity to store away all these declarations for future passes. This pass does not modify the AST in any way; it just validates the AST and stores information for later. The code looks a bit like this:

java

public class CollectNames implements AstVisitor {

    // Note: using IdentAst as a map key relies on IdentAst having
    // value-based equals/hashCode (two nodes with the same name
    // must compare equal)
    private Map<IdentAst, FunctionAst> functions
        = new HashMap<>();

    private Map<NameInFunction, VarDeclAst> variables
        = new HashMap<>();

    private record NameInFunction(IdentAst function,
        IdentAst variable) {}

    public FunctionAst getFunction(IdentAst ident) {
        return functions.get(ident);
    }

    public VarDeclAst getVariable(IdentAst function,
            IdentAst ident) {
        return variables.get(new NameInFunction(function, ident));
    }

    @Override
    public void visitFunction(FunctionAst ast)
            throws AnalysisException {
        var functionName = ast.getName();

        // `put` returns the previous mapping, or null if there was none
        if (functions.put(functionName, ast) != null) {
            throw new AnalysisException(
                "function " + functionName + " already exists");
        }

        AstVisitor.super.visitFunction(ast);
    }

    @Override
    public void visitVarDecl(VarDeclAst ast)
            throws AnalysisException {
        var function = ast.findAncestor(FunctionAst.class);
        var nameInFunction = new NameInFunction(
            function.getName(), ast.getName());

        if (variables.put(nameInFunction, ast) != null) {
            throw new AnalysisException(
                "variable " + ast.getName() + " already exists");
        }

        AstVisitor.super.visitVarDecl(ast);
    }
}
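The duplicate check leans on two small library behaviours: records compare by value, and a map insert reports whether the key was already there. Here is a stripped-down sketch of just that mechanism. It uses plain strings and line numbers in place of AST nodes, and swaps `put` for `putIfAbsent` so the first declaration is kept; `DuplicateNames`, `declare` and `lookup` are names made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class DuplicateNames {
    // Records get value-based equals/hashCode for free, which is what
    // makes them safe to use as composite map keys
    record NameInFunction(String function, String variable) {}

    private final Map<NameInFunction, Integer> declaredAtLine = new HashMap<>();

    // Returns true if the declaration is new, false if it is a duplicate
    boolean declare(String function, String variable, int line) {
        // putIfAbsent returns the existing value, or null if the key was new
        return declaredAtLine.putIfAbsent(
            new NameInFunction(function, variable), line) == null;
    }

    // Returns the line of the declaration, or null if there is none
    Integer lookup(String function, String variable) {
        return declaredAtLine.get(new NameInFunction(function, variable));
    }

    public static void main(String[] args) {
        DuplicateNames names = new DuplicateNames();
        System.out.println(names.declare("main", "a", 1));  // true
        System.out.println(names.declare("main", "a", 1));  // false: duplicate
        System.out.println(names.declare("other", "a", 5)); // true: other function
        System.out.println(names.lookup("main", "b"));      // null: never declared
    }
}
```

The last line is the lookup a later name-resolution pass would perform: a null result is exactly the "use of undeclared variable" case.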

Resolving names

We will implement one more pass, that makes sure all referenced names are defined. This will depend on the results from the CollectNames pass, so will take it as a constructor parameter. Since it is helpful for the usage of a variable or function to link directly to the corresponding declaration, we'll use this pass to insert those links into the AST. Most of our analysis passes won't significantly edit the AST, but some will fill in missing information, like these declaration links. Here is the code for name resolution:

java

public class ResolveNames implements AstVisitor {

    private final CollectNames collected;

    public ResolveNames(CollectNames collected) {
        this.collected = collected;
    }

    @Override
    public void visitAssignmentExpr(AssignmentExprAst ast)
            throws AnalysisException {
        var function = ast.findAncestor(FunctionAst.class);
        var decl = collected.getVariable(
            function.getName(), ast.getVariable());

        if (decl == null) {
            throw new AnalysisException(
                "use of undeclared variable " + ast.getVariable());
        }

        ast.setDecl(decl);
        AstVisitor.super.visitAssignmentExpr(ast);
    }

    @Override
    public void visitVariableExpr(VarExprAst ast)
            throws AnalysisException {
        var function = ast.findAncestor(FunctionAst.class);
        var decl = collected.getVariable(
            function.getName(), ast.getName());

        if (decl == null) {
            throw new AnalysisException(
                "use of undeclared variable " + ast.getName());
        }

        ast.setDecl(decl);
        AstVisitor.super.visitVariableExpr(ast);
    }
}

A trivial interpreter

I promised to write a compiler, and I will, but I'm going to mostly leave it here for now. Before I go, though, I'm going to write a very basic interpreter for our ASTs. I'm throwing this in mostly because it's just so easy, but it's also super helpful for debugging. There are of course fancy techniques for interpreters, but I'm just going to write a very simple AST-walking interpreter, which works exactly how it sounds. The code is pretty simple, although I've omitted some error-handling:

java

public class Interpreter {

    private final ProgramAst ast;

    // When a function is called, a stack-frame holds all the local
    // variables. We store the stack frames as maps in a stack
    private Deque<Map<IdentAst, Value>> stack = new ArrayDeque<>();

    public Interpreter(ProgramAst ast) {
        this.ast = ast;
    }

    private Value runFunction(FunctionAst ast, List<Value> args)
            throws InterpretException {
        var variables = new HashMap<IdentAst, Value>();
        stack.push(variables);

        // Copy over any parameters
        for (var i = 0; i < args.size(); i++) {
            var param = ast.getParams().get(i);
            var value = args.get(i);
            variables.put(param.getName(), value);
        }

        try {
            return runBlock(ast.getBody());
        } finally {
            stack.pop();
        }
    }

    private Value runBlock(BlockAst ast)
            throws InterpretException {
        for (var stmt : ast.getStmts()) {
            switch (stmt) {
                // Declarations need no work: a variable gets its
                // entry in the frame when it is first assigned
                case VarDeclStmtAst ignored -> {}
                case ReturnStmtAst returnStmt -> {
                    return evalExpr(returnStmt.getValue());
                }
                case ExprStmtAst exprStmt ->
                    evalExpr(exprStmt.getExpr());
            }
        }

        throw new InterpretException("function did not return");
    }

    private Value evalExpr(ExprAst ast) throws InterpretException {
        return switch (ast) {
            case VarExprAst variable ->
                stack.peek().get(variable.getName());
            case IntLiteralExprAst intLiteral ->
                new IntValue(intLiteral.getValue());
            case AssignmentExprAst assignment -> {
                var value = evalExpr(assignment.getValue());
                stack.peek().put(assignment.getVariable(), value);
                yield value;
            }
        };
    }
}

This works pretty well. I won't show the code here, but I've also written some tests and a very basic CLI to make the compiler nicer to work with. So for example we can do:

Command line output from interpreting program

The output was printed in the bottom left
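For anyone who wants to experiment with the stack-frame idea without building the lexer and parser first, here is a self-contained miniature. It is a sketch under several assumptions: the program is constructed by hand from hypothetical record types (`Func`, `ExprStmt`, `Return`, and so on), values are plain ints, names are strings, variable declarations are left out, and the switch syntax needs Java 21 or later.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MiniInterpreter {
    sealed interface Expr permits IntLit, Var, Assign {}
    record IntLit(int value) implements Expr {}
    record Var(String name) implements Expr {}
    record Assign(String name, Expr value) implements Expr {}

    sealed interface Stmt permits ExprStmt, Return {}
    record ExprStmt(Expr expr) implements Stmt {}
    record Return(Expr value) implements Stmt {}

    record Func(List<String> params, List<Stmt> body) {}

    // One map of local variables per active call
    private final Deque<Map<String, Integer>> stack = new ArrayDeque<>();

    int call(Func fn, List<Integer> args) {
        if (args.size() != fn.params().size()) {
            throw new IllegalArgumentException("wrong number of arguments");
        }
        Map<String, Integer> frame = new HashMap<>();
        for (int i = 0; i < args.size(); i++) {
            frame.put(fn.params().get(i), args.get(i));
        }
        stack.push(frame);
        try {
            for (Stmt stmt : fn.body()) {
                switch (stmt) {
                    case ExprStmt s -> eval(s.expr());
                    case Return r -> { return eval(r.value()); }
                }
            }
            throw new IllegalStateException("function did not return");
        } finally {
            stack.pop(); // the frame goes away even if evaluation throws
        }
    }

    private int eval(Expr expr) {
        return switch (expr) {
            case IntLit lit -> lit.value();
            case Var v -> {
                Integer value = stack.peek().get(v.name());
                if (value == null) {
                    throw new IllegalStateException(
                        "variable " + v.name() + " read before assignment");
                }
                yield value;
            }
            case Assign a -> {
                int value = eval(a.value());
                stack.peek().put(a.name(), value);
                yield value;
            }
        };
    }

    public static void main(String[] args) {
        // Hand-built equivalent of: int main(int a) { b = a; return b; }
        Func main = new Func(List.of("a"), List.of(
            new ExprStmt(new Assign("b", new Var("a"))),
            new Return(new Var("b"))));
        System.out.println(new MiniInterpreter().call(main, List.of(7))); // prints 7
    }
}
```

Unlike the outline above, this version reports a read of an unassigned variable instead of returning null, which is one of the pieces of error handling that was left out for brevity.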

Check out the code

If you want to play around, or peek into the details, you can check out the code for this chapter here, or the master branch here.

¹ Very cool slides from Graydon Hoare, creator of Rust. Well worth a look.
² LLVM (and Clang) use a representation based on instructions, basic blocks, and static single assignment form. This is a terrific resource introducing LLVM IR.
³ The V8 JavaScript engine is based on a sea of nodes representation. There is a good blog post explaining it more.
⁴ Despite this, the pervasive Tree-sitter project, which appears in many IDEs and powers GitHub's syntax highlighting, is a bottom-up parser. Tree-sitter is incremental, so small edits to the source can be handled efficiently, but there are also simple incremental methods for PEG parsing.