draft · 25 Jun 2026

Azul: a Lox interpreter that pairs with itself

Building a programming language interpreter in Go that, when your code breaks, hands the error to an LLM agent and asks it to fix the source — then reruns. A study in feedback loops, interpreter design, and self-pairing programs.

The premise

Most interpreters do three things: read your code, run your code, tell you what went wrong. Then they stop. You read the error, fix the file, run it again. The loop between machine and programmer is held together by a human staring at a terminal.

Azul is a Lox interpreter written in Go that closes this loop. When it encounters an error — a bad token, a missing semicolon, a type mismatch at runtime — it packages the error into a structured payload, hands the full program context to an LLM agent, and asks it to patch the source. Then it reruns the patched file. If errors remain, it tries again. The system pairs with itself: one half runs the code, the other half reads the diagnosis and revises.

The name comes from the Portuguese word for blue. The project follows Robert Nystrom’s Crafting Interpreters, porting both the tree-walk interpreter and the bytecode VM into a single Go codebase. But the interpreter is only half the experiment. The other half is the feedback channel between the interpreter’s error machinery and an LLM that reasons about what went wrong. The lab uses notes like this as working records, so project status stays visible instead of being polished away.

How an interpreter reads your code

Before Azul can fix anything, it needs to understand the program. This happens in stages, each one transforming the source into a representation closer to meaning.

Scanning

The interpreter begins with a flat stream of characters. The scanner walks this stream and chunks it into tokens — the smallest meaningful units of the language. Some are single characters: (, ), +, ;. Others span several characters: string literals like "hello", numbers like 3.14, identifiers like count. Keywords like var, print, if look like identifiers but carry special meaning.

The scanner doesn’t care about what the program means. It only cares about what the characters are. Its job is to turn a river of bytes into a sequence of labeled pieces that the next stage can reason about.
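
A minimal sketch of that chunking step, not Azul's actual scanner. The names Token, TokenType, and Scan are assumptions made for illustration, and string literals, comments, and error reporting are left out to keep it short.

```go
package main

import (
	"fmt"
	"unicode"
)

// TokenType labels what kind of piece the scanner found.
type TokenType string

const (
	Number     TokenType = "NUMBER"
	Identifier TokenType = "IDENTIFIER"
	Keyword    TokenType = "KEYWORD"
	Symbol     TokenType = "SYMBOL"
)

// Token is a labeled slice of the source text.
type Token struct {
	Type   TokenType
	Lexeme string
}

// keywords look like identifiers but are tagged differently.
var keywords = map[string]bool{"var": true, "print": true, "if": true}

// Scan chunks source into tokens. It only handles numbers, names, and
// single-character symbols (hypothetical simplification).
func Scan(source string) []Token {
	var tokens []Token
	runes := []rune(source)
	for i := 0; i < len(runes); {
		c := runes[i]
		switch {
		case unicode.IsSpace(c):
			i++
		case unicode.IsDigit(c):
			start := i
			for i < len(runes) && (unicode.IsDigit(runes[i]) || runes[i] == '.') {
				i++
			}
			tokens = append(tokens, Token{Number, string(runes[start:i])})
		case unicode.IsLetter(c) || c == '_':
			start := i
			for i < len(runes) && (unicode.IsLetter(runes[i]) || unicode.IsDigit(runes[i]) || runes[i] == '_') {
				i++
			}
			word := string(runes[start:i])
			typ := Identifier
			if keywords[word] {
				typ = Keyword
			}
			tokens = append(tokens, Token{typ, word})
		default:
			// Anything else is treated as a one-character symbol.
			tokens = append(tokens, Token{Symbol, string(c)})
			i++
		}
	}
	return tokens
}

func main() {
	for _, t := range Scan("var count = 3.14;") {
		fmt.Printf("%s %q\n", t.Type, t.Lexeme)
	}
}
```

Note that nothing here knows what var means. The scanner only attaches labels.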

Parsing

The parser takes that flat sequence of tokens and builds a tree that mirrors the nested structure of the grammar. The expression 1 + 2 * 3 isn’t just five tokens in a row — it has structure. Multiplication binds tighter than addition, so the tree must reflect that 2 * 3 is a single subtree, and addition sits above it.

This tree is the abstract syntax tree, or AST. Each node represents an operation — a binary expression, a variable declaration, a function call — and its children are the operands. The parser’s recursive descent follows the grammar rules: it tries to match expression, then term, then factor, each rule calling the next. What comes out is a tree that a machine can walk to evaluate the program.
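
The precedence example can be sketched in a few lines of recursive descent. This is an illustration under stated assumptions, not Azul's parser: tokens are plain strings, only + and * exist, and the type names (Expr, Literal, Binary, Parser) are hypothetical.

```go
package main

import "fmt"

// Expr is a node in the syntax tree.
type Expr interface{ String() string }

// Literal is a leaf holding a number's source text.
type Literal struct{ Text string }

// Binary is an operator with two operand subtrees.
type Binary struct {
	Left  Expr
	Op    string
	Right Expr
}

func (l Literal) String() string { return l.Text }

// String prints the tree in prefix form so the nesting is visible.
func (b Binary) String() string {
	return "(" + b.Op + " " + b.Left.String() + " " + b.Right.String() + ")"
}

// Parser walks a token slice with a single cursor.
type Parser struct {
	tokens []string
	pos    int
}

// match consumes the next token if it equals op.
func (p *Parser) match(op string) bool {
	if p.pos < len(p.tokens) && p.tokens[p.pos] == op {
		p.pos++
		return true
	}
	return false
}

// term is the lower-precedence rule: factor ( "+" factor )*
func (p *Parser) term() Expr {
	left := p.factor()
	for p.match("+") {
		left = Binary{left, "+", p.factor()}
	}
	return left
}

// factor binds tighter: primary ( "*" primary )*
func (p *Parser) factor() Expr {
	left := p.primary()
	for p.match("*") {
		left = Binary{left, "*", p.primary()}
	}
	return left
}

// primary consumes one literal token. Error handling is omitted.
func (p *Parser) primary() Expr {
	tok := p.tokens[p.pos]
	p.pos++
	return Literal{tok}
}

func main() {
	p := &Parser{tokens: []string{"1", "+", "2", "*", "3"}}
	fmt.Println(p.term()) // (+ 1 (* 2 3))
}
```

Because term calls factor for each operand, the multiplication is fully parsed into a subtree before the addition node is built around it.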

Static analysis

Before running anything, the interpreter does one more pass, called binding or resolution. For every identifier in the program, it finds where that name was declared and wires the two together. This is where scope comes into play: the region of source code where a name refers to a particular declaration.

If Lox were statically typed, this is also where type checking would happen. Once you know where a and b are declared, you know their types. If they can’t be added together, you report the error now, before the program ever runs.

The results of analysis get stored. Sometimes as extra fields on the AST nodes themselves. Sometimes in a symbol table — a lookup structure keyed by identifier names, whose values describe what each name refers to. The tree-walk interpreter uses both.
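
One way to picture resolution is a stack of scopes searched from the inside out. The sketch below assumes a hypothetical Resolver type and records only whether a name is declared; a real pass would also store the result on the AST or in a side table.

```go
package main

import "fmt"

// Resolver tracks a stack of block scopes, innermost last.
type Resolver struct {
	scopes []map[string]bool
}

// BeginScope pushes a fresh scope, as on entering a block or function.
func (r *Resolver) BeginScope() { r.scopes = append(r.scopes, map[string]bool{}) }

// EndScope pops the innermost scope.
func (r *Resolver) EndScope() { r.scopes = r.scopes[:len(r.scopes)-1] }

// Declare records a name in the innermost scope.
func (r *Resolver) Declare(name string) {
	r.scopes[len(r.scopes)-1][name] = true
}

// Resolve reports how many scopes outward the declaration lives.
// ok is false when no enclosing scope declares the name.
func (r *Resolver) Resolve(name string) (depth int, ok bool) {
	for i := len(r.scopes) - 1; i >= 0; i-- {
		if r.scopes[i][name] {
			return len(r.scopes) - 1 - i, true
		}
	}
	return 0, false
}

func main() {
	r := &Resolver{}
	r.BeginScope()
	r.Declare("a")
	r.BeginScope()
	r.Declare("b")
	fmt.Println(r.Resolve("b")) // 0 true
	fmt.Println(r.Resolve("a")) // 1 true
	fmt.Println(r.Resolve("c")) // 0 false
}
```

The depth is the useful output: it tells the interpreter how many environments to hop at runtime without searching by name again.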

Two interpreters, one codebase

Azul is meant to cover both halves of Crafting Interpreters in Go. The tree-walk interpreter is implemented; the bytecode VM is the next milestone.

Part I: the tree-walk interpreter. This is the direct path. The interpreter walks the AST and evaluates each node on the spot. A binary addition node evaluates its left child, evaluates its right child, adds the results. A variable declaration stores a value in an environment table. A function call creates a new scope, binds the arguments, and evaluates the body.

The tree-walk interpreter covers: scanning, recursive-descent parsing, AST representation, evaluation, scoping, functions, closures, and classes.
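
A rough sketch of evaluating nodes on the spot, with a chained environment standing in for scopes. All names here (Environment, Num, Var, Add) are hypothetical, values are restricted to float64, and errors are returned as values for brevity, which is not necessarily how Azul propagates them.

```go
package main

import (
	"errors"
	"fmt"
)

// Environment maps names to values and chains to an enclosing scope.
type Environment struct {
	values    map[string]float64
	enclosing *Environment
}

func NewEnvironment(enclosing *Environment) *Environment {
	return &Environment{values: map[string]float64{}, enclosing: enclosing}
}

// Define stores a value in this scope, as a variable declaration would.
func (e *Environment) Define(name string, v float64) { e.values[name] = v }

// Get looks the name up here first, then in each enclosing scope.
func (e *Environment) Get(name string) (float64, error) {
	for env := e; env != nil; env = env.enclosing {
		if v, ok := env.values[name]; ok {
			return v, nil
		}
	}
	return 0, errors.New("undefined variable '" + name + "'")
}

// Expr is any node that can evaluate itself against an environment.
type Expr interface {
	Eval(env *Environment) (float64, error)
}

type Num float64
type Var string
type Add struct{ Left, Right Expr }

func (n Num) Eval(*Environment) (float64, error)     { return float64(n), nil }
func (v Var) Eval(env *Environment) (float64, error) { return env.Get(string(v)) }

// Eval evaluates the left child, then the right child, then adds.
func (a Add) Eval(env *Environment) (float64, error) {
	l, err := a.Left.Eval(env)
	if err != nil {
		return 0, err
	}
	r, err := a.Right.Eval(env)
	if err != nil {
		return 0, err
	}
	return l + r, nil
}

func main() {
	globals := NewEnvironment(nil)
	globals.Define("x", 2)
	call := NewEnvironment(globals) // new scope, as for a function call
	call.Define("arg", 40)
	v, err := Add{Var("x"), Var("arg")}.Eval(call)
	fmt.Println(v, err) // 42 <nil>
}
```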

Part II: the bytecode VM. Instead of walking the tree, a compiler translates the AST into a flat sequence of bytecode instructions — OP_ADD, OP_LOAD, OP_RETURN — stored in a chunk alongside a constant pool. A stack-based virtual machine then dispatches those instructions in a tight loop. No tree traversal, no recursive function calls for evaluation. Just a stack, an instruction pointer, and a dispatch table.

The bytecode path covers: chunks, the stack-based VM, value representation, a compiler from AST to bytecode, hash tables for globals and methods, closures with captured upvalues, and garbage collection.
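
The dispatch loop is easier to see in code. The sketch below is not Azul's VM, which is still to be built: the opcode names are Go-style stand-ins, OpLoad is assumed to push a value from the constant pool, and values are plain float64.

```go
package main

import "fmt"

// OpCode is a single bytecode instruction.
type OpCode byte

const (
	OpLoad     OpCode = iota // push Constants[operand]; one operand byte follows
	OpAdd                    // pop two values, push their sum
	OpMultiply               // pop two values, push their product
	OpReturn                 // pop the result and stop
)

// Chunk holds the instruction stream alongside its constant pool.
type Chunk struct {
	Code      []byte
	Constants []float64
}

// Run dispatches instructions in a loop using an operand stack.
func Run(c Chunk) (float64, error) {
	var stack []float64
	pop := func() float64 {
		v := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return v
	}
	for ip := 0; ip < len(c.Code); {
		op := OpCode(c.Code[ip])
		ip++
		switch op {
		case OpLoad:
			stack = append(stack, c.Constants[c.Code[ip]])
			ip++
		case OpAdd:
			b, a := pop(), pop()
			stack = append(stack, a+b)
		case OpMultiply:
			b, a := pop(), pop()
			stack = append(stack, a*b)
		case OpReturn:
			return pop(), nil
		default:
			return 0, fmt.Errorf("unknown opcode %d", op)
		}
	}
	return 0, fmt.Errorf("chunk ended without return")
}

func main() {
	// Bytecode for 1 + 2 * 3: operands first, operators after.
	chunk := Chunk{
		Code: []byte{
			byte(OpLoad), 0, byte(OpLoad), 1, byte(OpLoad), 2,
			byte(OpMultiply), byte(OpAdd), byte(OpReturn),
		},
		Constants: []float64{1, 2, 3},
	}
	fmt.Println(Run(chunk)) // 7 <nil>
}
```

The tree's precedence is now encoded in instruction order: the multiply runs first simply because it appears first.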

Go, not Java, not C

Crafting Interpreters implements the tree-walk interpreter in Java and the bytecode VM in C. Azul uses neither. Java leans on inheritance hierarchies and exception-based error propagation. C uses manual memory management, pointer arithmetic, and unions. Neither maps cleanly to idiomatic Go.

The translation choices matter because they determine whether the code teaches Go or merely compiles in it. A few examples:

Generics for type safety. The VM stack is a Stack[Value], not a []any. The call-frame stack is Stack[CallFrame]. The scoping environment is a generic Table[V] — the tree-walk interpreter uses Table[any], the VM uses Table[Value], and both share the same scoping logic. The visitor pattern is parameterized: Visitor[R], so the interpreter implements Visitor[Value] and the AST printer implements Visitor[string] with no type assertions at call sites.
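
A sketch of what those two generic shapes might look like. The method sets are assumptions, not Azul's real API. One detail worth noting: Go as of 1.23 does not let a method declare its own type parameters, so the dispatch function is written here as a free function, Accept, rather than a method on Expr.

```go
package main

import "fmt"

// Stack is a generic LIFO container.
type Stack[T any] struct{ items []T }

func (s *Stack[T]) Push(v T) { s.items = append(s.items, v) }

// Pop returns the top element; ok is false if the stack is empty.
func (s *Stack[T]) Pop() (v T, ok bool) {
	if len(s.items) == 0 {
		return v, false
	}
	v = s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return v, true
}

// Expr is a tiny AST with two node kinds (hypothetical).
type Expr interface{ isExpr() }

type Literal struct{ Value float64 }
type Binary struct {
	Left  Expr
	Op    string
	Right Expr
}

func (Literal) isExpr() {}
func (Binary) isExpr()  {}

// Visitor is parameterized by the type each visit returns.
type Visitor[R any] interface {
	VisitLiteral(Literal) R
	VisitBinary(Binary) R
}

// Accept dispatches on the node kind and returns the visitor's result type.
func Accept[R any](e Expr, v Visitor[R]) R {
	switch n := e.(type) {
	case Literal:
		return v.VisitLiteral(n)
	case Binary:
		return v.VisitBinary(n)
	}
	panic(fmt.Sprintf("unknown expression type %T", e))
}

// Printer implements Visitor[string].
type Printer struct{}

func (p Printer) VisitLiteral(l Literal) string { return fmt.Sprint(l.Value) }
func (p Printer) VisitBinary(b Binary) string {
	return "(" + b.Op + " " + Accept[string](b.Left, p) + " " + Accept[string](b.Right, p) + ")"
}

func main() {
	s := &Stack[float64]{}
	s.Push(1.5)
	fmt.Println(s.Pop()) // 1.5 true

	tree := Binary{Literal{1}, "+", Literal{2}}
	fmt.Println(Accept[string](tree, Printer{})) // (+ 1 2)
}
```

The caller of Accept[string] gets a string back directly, with the one type switch confined to Accept itself.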

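A sealed value interface. Lox values are represented as a sealed interface with a private marker method. The concrete types (Number, Boolean, Nil, Obj) are the only implementations. Type switches over Value are checked for exhaustiveness by a linter, so a switch that misses a newly added value kind is flagged before the code ships. This is a lint-time guarantee rather than a compile-time one: the Go compiler itself does not check type switches for exhaustiveness.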
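
The sealing trick is small enough to show whole. The field layout of each type below is an assumption (Obj in particular is reduced to a name); only the pattern is the point.

```go
package main

import "fmt"

// Value can only be implemented inside this package, because the
// marker method isValue is unexported.
type Value interface{ isValue() }

type Number float64
type Boolean bool
type Nil struct{}
type Obj struct{ Name string } // hypothetical stand-in for heap objects

func (Number) isValue()  {}
func (Boolean) isValue() {}
func (Nil) isValue()     {}
func (Obj) isValue()     {}

// Describe switches over every value kind. The default branch only
// fires if a new kind is added without updating this switch.
func Describe(v Value) string {
	switch v := v.(type) {
	case Number:
		return fmt.Sprintf("number %g", float64(v))
	case Boolean:
		return fmt.Sprintf("boolean %t", bool(v))
	case Nil:
		return "nil"
	case Obj:
		return "object " + v.Name
	default:
		panic(fmt.Sprintf("unhandled value kind %T", v))
	}
}

func main() {
	for _, v := range []Value{Number(3.14), Boolean(true), Nil{}, Obj{"Point"}} {
		fmt.Println(Describe(v))
	}
}
```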

Iterators. The scanner exposes tokens as an iter.Seq[Token] using Go 1.23’s range-over-func feature. The parser consumes tokens with for tok := range scanner.Tokens() — no intermediate slice, no manual index management.

Panic/recover at the boundary. The tree-walk interpreter uses panic with a named RuntimeError struct to propagate errors, caught at the top-level interpret() call. This keeps the code close to the book’s exception-based structure while staying inside Go’s idioms. The parser, by contrast, uses mo.Result[Expr] with FlatMap chaining — functional error propagation that keeps recursive descent readable.
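
The boundary pattern can be sketched as follows. RuntimeError matches the name in the text, but its fields, the evaluate helper, and the exported Interpret signature are assumptions. The mo.Result side is not shown.

```go
package main

import "fmt"

// RuntimeError is the only panic value the boundary converts to an error.
type RuntimeError struct {
	Line    int
	Message string
}

func (e RuntimeError) Error() string {
	return fmt.Sprintf("[line %d] %s", e.Line, e.Message)
}

// evaluate stands in for deep recursive evaluation. It panics rather
// than threading an error through every return.
func evaluate(a, b any, line int) float64 {
	x, okX := a.(float64)
	y, okY := b.(float64)
	if !okX || !okY {
		panic(RuntimeError{line, "operands must be numbers"})
	}
	return x + y
}

// Interpret is the boundary. It turns RuntimeError panics into error
// values and re-panics on anything else, so genuine bugs still crash.
func Interpret(a, b any) (result float64, err error) {
	defer func() {
		if r := recover(); r != nil {
			re, ok := r.(RuntimeError)
			if !ok {
				panic(r)
			}
			err = re
		}
	}()
	return evaluate(a, b, 1), nil
}

func main() {
	fmt.Println(Interpret(1.0, 2.0)) // 3 <nil>
	fmt.Println(Interpret(1.0, "x")) // 0 [line 1] operands must be numbers
}
```

Checking the recovered value's type matters: recovering everything would also swallow nil dereferences and index errors in the interpreter itself.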

The error payload

This is where Azul diverges from the book. Every error the interpreter produces carries a phase tag:

scan — a bad character, an unterminated string, something the scanner couldn’t chunk into a token
parse — a missing semicolon, an unmatched brace, a token where the grammar didn’t expect one
runtime — a type mismatch, an undefined variable, a division by zero that only surfaces during execution

The phase matters because different classes of error require different reasoning. A scan error means the source contains characters the language doesn’t recognize. A parse error means the tokens are valid but arranged in a way the grammar can’t accept. A runtime error means the program is syntactically correct but semantically broken. The agent needs this distinction to apply the right kind of fix.

The full payload sent to the agent includes the source path, the complete source text, and every error with its line number, column, message, offending token, and phase. Phase tags keep the repair agent from treating every failure as one undifferentiated compiler error. This is enough context for the agent to read the program, understand where it broke, and reason about what the author likely intended.
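
As a sketch, the payload might be shaped like the structs below. The field list follows the description above, but the type names, JSON keys, and the ByPhase and Encode helpers are assumptions, not Azul's actual wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Phase tags which pipeline stage produced an error.
type Phase string

const (
	PhaseScan    Phase = "scan"
	PhaseParse   Phase = "parse"
	PhaseRuntime Phase = "runtime"
)

// Diagnostic is one reported error.
type Diagnostic struct {
	Line    int    `json:"line"`
	Column  int    `json:"column"`
	Message string `json:"message"`
	Token   string `json:"token"`
	Phase   Phase  `json:"phase"`
}

// ErrorPayload is everything handed to the agent for one run.
type ErrorPayload struct {
	SourcePath string       `json:"source_path"`
	Source     string       `json:"source"`
	Errors     []Diagnostic `json:"errors"`
}

// ByPhase returns only the diagnostics from one phase.
func (p ErrorPayload) ByPhase(phase Phase) []Diagnostic {
	var out []Diagnostic
	for _, d := range p.Errors {
		if d.Phase == phase {
			out = append(out, d)
		}
	}
	return out
}

// Encode serializes the payload as indented JSON.
func Encode(p ErrorPayload) (string, error) {
	b, err := json.MarshalIndent(p, "", "  ")
	return string(b), err
}

func main() {
	p := ErrorPayload{
		SourcePath: "file.lox",
		Source:     "print 1 + 2",
		Errors: []Diagnostic{{
			Line: 1, Column: 12, Message: "expected ';' after value",
			Token: "EOF", Phase: PhaseParse,
		}},
	}
	out, err := Encode(p)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```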

The fix loop

When you run azul run file.lox --fix, the interpreter doesn’t stop at reporting errors. It enters a loop:

1. Read the file and run it through the full pipeline: scan, parse, resolve, interpret
2. If no errors, print the output and exit
3. If errors, print them to stderr and build the error payload
4. Hand the payload to the LLM agent
5. The agent calls read_source to inspect the code around each error
6. The agent calls apply_patch with the corrected source and an explanation
7. Write the patched file to disk, rerun from step 1
8. If errors persist after three attempts, give up
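
The control flow of the loop, reduced to a sketch. The interpreter and the agent are passed in as functions (RunFunc and FixFunc, both hypothetical), the source stays in memory instead of on disk, and errors are plain strings.

```go
package main

import (
	"fmt"
	"strings"
)

// maxAttempts is the number of fixes tried before giving up.
const maxAttempts = 3

// RunFunc executes source and returns its errors, empty if it ran clean.
type RunFunc func(source string) []string

// FixFunc stands in for the agent round trip: errors in, patched source out.
type FixFunc func(source string, errs []string) string

// FixLoop reruns source after each fix until it runs clean or the
// attempt budget is spent. attempts counts how many fixes were applied.
func FixLoop(source string, run RunFunc, fix FixFunc) (final string, attempts int, ok bool) {
	for attempts = 0; ; attempts++ {
		errs := run(source)
		if len(errs) == 0 {
			return source, attempts, true
		}
		if attempts == maxAttempts {
			return source, attempts, false
		}
		source = fix(source, errs)
	}
}

func main() {
	// Toy interpreter: the only rule is that source must end in ';'.
	run := func(src string) []string {
		if !strings.HasSuffix(src, ";") {
			return []string{"parse: expected ';' after expression"}
		}
		return nil
	}
	// Toy agent: always appends a semicolon.
	fix := func(src string, _ []string) string { return src + ";" }

	final, attempts, ok := FixLoop("print 1", run, fix)
	fmt.Println(final, attempts, ok) // print 1; 1 true
}
```

Returning the attempt count alongside the result keeps the loop measurable, which matters later when comparing how quickly different error phases converge.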

The agent has exactly two tools. read_source reads a line range from the file with line numbers prepended, so the agent can inspect context around an error before deciding on a fix. apply_patch receives the complete corrected source as a string and writes it to disk. Azul uses full replacement rather than diffs because Lox programs are small, and a diff forces the agent to reason about line offsets, which adds a way to fail that has nothing to do with the fix itself.
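
A sketch of what read_source could return. This version works on an in-memory string rather than a file, and the exact number formatting is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// ReadSource returns lines start..end (1-indexed, inclusive) with each
// line's number prepended. Out-of-range bounds are clamped.
func ReadSource(source string, start, end int) string {
	lines := strings.Split(source, "\n")
	if start < 1 {
		start = 1
	}
	if end > len(lines) {
		end = len(lines)
	}
	var b strings.Builder
	for i := start; i <= end; i++ {
		fmt.Fprintf(&b, "%4d | %s\n", i, lines[i-1])
	}
	return b.String()
}

func main() {
	src := "var a = 1;\nvar b = 2\nprint a + b;"
	fmt.Print(ReadSource(src, 2, 3))
}
```

Prepending the numbers means the line references in the error payload and the text the agent reads agree without any arithmetic on the agent's side.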

The system prompt instructs the agent to read before fixing, find root causes rather than symptoms, and never add features or refactor — only fix what is broken. If no safe fix is determinable, it returns the original source unchanged with an explanation of why.

Self-pairing, concretely

The original premise of Azul was about feedback loops and self-pairing programs. The interpreter makes this concrete.

A self-pairing program is not one model pretending to be two people. It is a loop where separate passes maintain different obligations. The interpreter’s obligation is accuracy: scan correctly, parse correctly, evaluate correctly, report errors with enough structure that they’re actionable. The agent’s obligation is repair: read the diagnosis, understand the intent, produce a fix that eliminates the error without introducing new ones.

The feedback channel between them is the error payload — a structured object with phase tags, line numbers, and the full source. Not a log message. Not a stack trace pasted into a chat window. A first-class object designed to carry exactly the information the next pass needs to act.

This is what makes the loop a research object rather than a product feature. You can measure it:

Does the agent’s fix eliminate the reported error?
Does the fix introduce new errors?
Does the explanation match the actual diff?
How many attempts does it take to converge?
Which error phases are hardest to fix — scan, parse, or runtime?
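
The first two questions can be answered mechanically by comparing error lists before and after a patch. The sketch below is deliberately crude: Attempt and Score are hypothetical names, and errors are matched by exact message string, which would miss an error that merely moved to a different line.

```go
package main

import "fmt"

// Attempt records the errors seen on either side of one patch.
type Attempt struct {
	Before []string // errors reported before the patch
	After  []string // errors reported after rerunning
}

// Score reports whether every original error is gone, and lists any
// errors that appear only after the patch.
func Score(a Attempt) (eliminated bool, introduced []string) {
	before := map[string]bool{}
	for _, e := range a.Before {
		before[e] = true
	}
	eliminated = true
	for _, e := range a.After {
		if before[e] {
			eliminated = false // an original error survived
		} else {
			introduced = append(introduced, e)
		}
	}
	return eliminated, introduced
}

func main() {
	a := Attempt{
		Before: []string{"parse: expected ';'"},
		After:  []string{"runtime: undefined variable 'x'"},
	}
	eliminated, introduced := Score(a)
	fmt.Println(eliminated, introduced)
}
```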

What remains

The tree-walk interpreter is implemented. The bytecode compiler and VM are next — a tighter, faster execution path that introduces its own class of errors for the agent to reason about. Beyond that, the natural evolution is a RunContext that accumulates everything the interpreter knows during a run: the token stream, the AST, partial output, and the ordered error list. This gives the agent richer context than a bare error payload — not just what went wrong, but how far the program got before it broke.

The longer arc is instrumentation. Instead of waiting for the interpreter to crash, emit structured events during execution — variable assigned, function entered, return value produced, error hit. The agent receives these events and builds a live model of program state. This is closer to a debugger with LLM reasoning than a post-mortem fix loop, and it’s the direction Azul is heading.

For now, the loop is simple and the feedback is legible. That’s the point. An interpreter that knows how to describe its own failures. An agent that knows how to read those descriptions. A file that gets rewritten and rerun until it works, or until the system admits it doesn’t know how to help.