Building a JSON Parser with Multi-LLM Orchestration (Part 1)
I've been building lok, a multi-LLM orchestration tool, and I wanted to put it through its paces on a real project. What better than a JSON parser? It's a classic learning project with enough nuance to surface interesting design decisions.
Here's the premise: instead of just diving into code, what if I let multiple LLMs debate the design first? Then synthesize their consensus into specs. Then have them collaboratively implement it.
The Setup
I started with a simple question:
lok debate "We want to write a JSON parser from scratch as a learning project.
Debate: What language should we use? What features should it support?
What should the architecture look like?"
Four models participated: Claude, Codex (GPT-5.2), Gemini, and Qwen 3 Coder (running locally via Ollama).
Round 1: Surprising Agreement
All four converged on Rust. The reasoning varied:
Claude focused on the learning angle: "Forces you to think about ownership and memory layout. You'll learn more writing a parser in Rust than in Python/JS where you can be sloppy."
Codex got practical: "Ownership semantics force you to think about buffer management. Result<T, E> makes error handling explicit."
Gemini went for performance: "Zero-cost abstractions mean clean code compiles to efficient machine code."
Qwen hit safety: "Enums with exhaustive matching prevent invalid parser states."
On architecture, unanimous: lexer + recursive descent parser. No surprises there.
Round 2: The Interesting Bits
Numbers sparked actual debate. The naive approach stores JSON numbers as f64:
Number(f64) // Simple but WRONG
Codex caught it: "Treating numbers as f64 is not spec-correct. JSON allows arbitrary precision. The safest approach is storing the original string."
Number(&'a str) // Preserves original, validates grammar
This is why multi-model debate works. I might have defaulted to f64 and hit precision bugs later.
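Those bugs are easy to reproduce. Here's a standalone demonstration (mine, not from the debate): 9007199254740993, i.e. 2^53 + 1, is valid JSON, but the nearest f64 is 9007199254740992, so an eager conversion silently changes the value.
// Why Number(f64) loses information: 2^53 + 1 has no exact f64 representation.
fn main() {
    let lossy: f64 = "9007199254740993".parse().unwrap();
    assert_eq!(lossy, 9007199254740992.0); // rounded to the nearest representable value
    // Keeping the original slice defers interpretation to the caller.
    let preserved: &str = "9007199254740993";
    assert_eq!(preserved, "9007199254740993");
}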
Another good catch from Gemini: recursive descent needs depth limits. Without them, an adversarial input like [[[[[[[[... blows the stack. Simple fix, easy to forget.
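A minimal sketch of the guard (my own illustration, not lok's output): track nesting depth as you recurse and bail out past a configurable limit.
const MAX_DEPTH: usize = 128; // configurable in the real parser

#[derive(Debug, PartialEq)]
enum ParseError {
    TooDeep,
}

// Toy recursive helper that only understands '[' and ']'; returns the position
// just past the matching ']'. Every nested array bumps `depth`.
fn skip_array(bytes: &[u8], mut pos: usize, depth: usize) -> Result<usize, ParseError> {
    if depth > MAX_DEPTH {
        return Err(ParseError::TooDeep); // reject early instead of overflowing the stack
    }
    while pos < bytes.len() {
        match bytes[pos] {
            b'[' => pos = skip_array(bytes, pos + 1, depth + 1)?,
            b']' => return Ok(pos + 1),
            _ => pos += 1,
        }
    }
    Ok(pos)
}

fn main() {
    let hostile = vec![b'['; 100_000];
    assert_eq!(skip_array(&hostile, 0, 0), Err(ParseError::TooDeep));
}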
The Consensus
After three rounds, the models settled on:
- Language: Rust
- Architecture: Lexer (iterator-based) + Recursive Descent Parser
- Number handling: Store as string, not f64
- Zero-copy: Use Cow<'a, str> where possible (see the sketch after this list)
- Safety: Configurable depth limits
- Features: RFC 8259 strict first, extensions later
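Put together, the consensus implies a value type roughly like this. The names are my guesses at the shape, not lok's generated code:
use std::borrow::Cow;
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Value<'a> {
    Null,
    Bool(bool),
    // Raw slice from the input: validated against the RFC 8259 number grammar,
    // never converted, so no precision is lost.
    Number(&'a str),
    // Borrowed when the input string has no escapes, owned after unescaping.
    String(Cow<'a, str>),
    Array(Vec<Value<'a>>),
    Object(BTreeMap<Cow<'a, str>, Value<'a>>),
}

fn main() {
    // "1e400" overflows f64 entirely, but survives intact as a string.
    let v = Value::Array(vec![Value::Number("1e400"), Value::String(Cow::Borrowed("ok"))]);
    println!("{v:?}");
}
The Cow split is the zero-copy part: most strings borrow straight from the input, and only escaped strings pay for an allocation.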
Generating Specs
With design decisions locked, I fed the debate conclusions into lok spec:
lok spec "Build a JSON parser in Rust with these design decisions:
- Lexer + Recursive Descent Parser
- Numbers stored as string, not f64
- Zero-copy with Cow<'a, str>
- Depth limits for safety
- RFC 8259 strict compliance"
This queries multiple backends, synthesizes a consensus roadmap, then breaks each step into subtasks. The output:
.arf/specs/
roadmap.arf
01-core_types/ (5 subtasks) - Span, Error, Token, Value
02-lexer/ (5 subtasks) - Iterator-based tokenizer
03-parser/ (4 subtasks) - Recursive descent
04-number_validation/ (2 subtasks) - RFC 8259 number format
05-error_reporting/ (4 subtasks) - Line/column errors
06-test_suite/ (3 subtasks) - JSONTestSuite compliance
07-extension_hooks/ (4 subtasks) - Future comments/trailing commas
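To give a feel for a couple of those steps, 01-core_types and 05-error_reporting presumably boil down to something like a byte-offset Span plus a helper that maps offsets to line/column. This is my guess at the shape, not the specs' actual contents:
// Hypothetical core types for positioned errors; the generated specs may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize, // byte offset into the input
    end: usize,
}

// Translate a byte offset into 1-based line/column for human-readable errors.
fn line_col(input: &str, offset: usize) -> (usize, usize) {
    let (mut line, mut col) = (1, 1);
    for (i, ch) in input.char_indices() {
        if i >= offset {
            break;
        }
        if ch == '\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    (line, col)
}

fn main() {
    let input = "{\n  \"a\": tru\n}";
    let span = Span { start: 9, end: 12 }; // the bogus `tru`
    assert_eq!(line_col(input, span.start), (2, 8));
}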
Each subtask is an .arf file (Agent Reasoning Format) with structured fields:
order = 1
what = "Core Lexer struct implementing Iterator over tokens"
file = "src/lexer/lexer.rs"
why = "Main lexing logic that transforms input bytes into token stream"
how = """
Struct Lexer<'a> with input: &'a str, pos: usize. Implement
Iterator<Item = Result<Token<'a>, LexError>>. Dispatch on current
byte: punctuation returns immediately, keywords verify literals,
strings handle escapes, numbers capture as slice.
"""
[context]
inputs = "Raw JSON string"
outputs = "Stream of Token results"
These specs become the contract for implementation.
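Read literally, that how field maps to a skeleton along these lines. This is my sketch of the described shape, not the code lok will generate; the keyword, string, and number branches are elided:
struct Lexer<'a> {
    input: &'a str,
    pos: usize,
}

enum Token<'a> {
    LBrace, RBrace, LBracket, RBracket, Colon, Comma,
    Null, True, False,
    String(&'a str), // raw contents between the quotes
    Number(&'a str), // raw slice, validated separately
}

struct LexError {
    pos: usize,
    message: &'static str,
}

impl<'a> Iterator for Lexer<'a> {
    type Item = Result<Token<'a>, LexError>;

    fn next(&mut self) -> Option<Self::Item> {
        let bytes = self.input.as_bytes();
        // Skip whitespace, then dispatch on the current byte.
        while self.pos < bytes.len() && bytes[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        let b = *bytes.get(self.pos)?;
        self.pos += 1;
        Some(match b {
            b'{' => Ok(Token::LBrace),
            b'}' => Ok(Token::RBrace),
            b'[' => Ok(Token::LBracket),
            b']' => Ok(Token::RBracket),
            b':' => Ok(Token::Colon),
            b',' => Ok(Token::Comma),
            // Keyword, string, and number branches omitted in this sketch.
            _ => Err(LexError { pos: self.pos - 1, message: "not handled in this sketch" }),
        })
    }
}

fn main() {
    let tokens: Vec<_> = Lexer { input: "[ , ]", pos: 0 }.collect();
    assert_eq!(tokens.len(), 3);
}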
What Multi-Model Debate Surfaces
Edge cases no single model catches. The f64 precision issue came from Codex. The depth limit vulnerability came from Gemini. Each model has blind spots. Claude focused on educational value, Codex on spec correctness, Gemini on performance, Qwen on safety.
Consensus beats any single model. Not because the average is smarter, but because different models catch different things. Three rounds of debate with four models surfaced issues I'd have hit weeks into implementation.
What's Next
Part 2 will cover lok implement, which takes these specs and:
- Queries multiple backends in parallel for each subtask
- Synthesizes consensus code from the proposals
- Writes the file and verifies it compiles
- Commits each file with an atomic git commit
- Records structured reasoning traces (ARF) alongside the code
The implementation phase is where things get interesting. Backends disagree on details, synthesis has to resolve conflicts, and verification catches when the generated code doesn't actually compile.
Stay tuned.