finna: Multi-Model Debate, Spec, and Implement

Feb 10, 2026

Lok's spec command was useful but felt too coupled to the rest of the tool. I wanted something standalone: give it an idea, get specs and code. No configuration files, no backend setup, just a single binary that orchestrates the models I already have installed.

That's finna.

The Problem with "Just Build It"

When you ask an LLM to build something, it starts coding immediately. Maybe it picks a good architecture, maybe it doesn't. By the time you see the output, you're committed to whatever approach it chose. If the foundation is wrong, you're either refactoring generated code or starting over.

The fix is to separate planning from execution. But not just any planning. A plan that multiple models have debated and agreed on, written down in files you can review before any code gets generated.

The Pipeline

finna runs five stages:

idea → debate → roadmap → spec → implement

Debate: Claude, Codex, and Gemini independently analyze the idea
Consensus: Claude synthesizes the proposals into a unified approach
Roadmap: Break the consensus into ordered, dependency-aware steps
Spec: Write detailed implementation specs for each step
Implement: Models propose edits, Claude synthesizes, changes applied

Each stage reads from the previous stage's output. Everything lands in .finna/ so you can review, edit, or re-run individual stages.
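One way to picture the chain is as a small state machine. This is a rough sketch, not finna's actual code: the Stage enum and its methods are hypothetical names, and the artifact paths are taken from the .finna/ layout shown in the TOML example in this post.

```rust
/// Hypothetical model of the pipeline; these names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Debate,
    Consensus,
    Roadmap,
    Spec,
    Implement,
}

impl Stage {
    /// The stage that consumes this stage's output, if any.
    fn next(self) -> Option<Stage> {
        match self {
            Stage::Debate => Some(Stage::Consensus),
            Stage::Consensus => Some(Stage::Roadmap),
            Stage::Roadmap => Some(Stage::Spec),
            Stage::Spec => Some(Stage::Implement),
            Stage::Implement => None,
        }
    }

    /// Where the reviewable artifact lands (assumed from the .finna/ layout in the post).
    fn artifact(self) -> Option<&'static str> {
        match self {
            Stage::Consensus => Some(".finna/consensus.json"),
            Stage::Roadmap => Some(".finna/roadmap.arf"),
            Stage::Spec => Some(".finna/specs/"),
            // The post does not show files for raw proposals or applied edits.
            Stage::Debate | Stage::Implement => None,
        }
    }
}

/// Stages that still have to run when resuming from `start`, in order.
fn remaining(start: Stage) -> Vec<Stage> {
    let mut stages = vec![start];
    let mut current = start;
    while let Some(next) = current.next() {
        stages.push(next);
        current = next;
    }
    stages
}
```

Resuming from Spec would leave only Spec and Implement to run, which is the behavior you want after hand-editing the roadmap.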

Usage

# Run all stages
finna "Add JWT authentication to the API"

# Or run stages separately
finna debate "Add JWT authentication"    # debate → roadmap
finna spec                                # roadmap → specs
finna implement                           # specs → code

# Target specific steps
finna spec --step auth-middleware
finna implement --step auth-middleware

A Real Example: TOML Parser

I ran finna on building a TOML parser from scratch:

finna debate "write a toml parser in rust from scratch, no dependencies"

Three models debated the architecture. They converged on a lexer-first approach with recursive descent parsing. The roadmap came out as 30 steps with proper dependency ordering:

.finna/
├── consensus.json
├── roadmap.arf
└── specs/
    ├── 01-project-scaffold/spec.arf
    ├── 02-error-types/spec.arf
    ├── 03-token-types/spec.arf
    ...
    ├── 16-lexer-integration/spec.arf
    ├── 17-parser-core/spec.arf
    ...
    └── 30-edge-cases/spec.arf

The roadmap groups work into phases: scaffold and types (1-5), lexer (6-16), parser (17-25), public API and tests (26-30). Each step declares its dependencies so they can't run out of order.
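Dependency-aware ordering is a topological sort. Here is a minimal sketch of that idea, assuming a hypothetical Step record and execution_order function; I'm not claiming this is how finna schedules steps internally.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical step record: a name plus the names of the steps it depends on.
struct Step {
    name: &'static str,
    deps: Vec<&'static str>,
}

/// Returns step names in an order where every dependency comes first.
/// Fails if steps form a cycle or depend on a step that does not exist.
fn execution_order(steps: &[Step]) -> Result<Vec<&'static str>, String> {
    // Map each step to the set of dependencies it is still waiting on.
    let mut pending: BTreeMap<&str, BTreeSet<&str>> = steps
        .iter()
        .map(|s| (s.name, s.deps.iter().copied().collect()))
        .collect();
    let mut order = Vec::new();

    while !pending.is_empty() {
        // Steps with nothing left to wait on can run now.
        let ready: Vec<&str> = pending
            .iter()
            .filter(|(_, deps)| deps.is_empty())
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            let stuck: Vec<&str> = pending.keys().copied().collect();
            return Err(format!("unresolvable dependencies among: {}", stuck.join(", ")));
        }
        for name in ready {
            pending.remove(name);
            // Everyone waiting on this step has one less thing to wait for.
            for deps in pending.values_mut() {
                deps.remove(name);
            }
            order.push(name);
        }
    }
    Ok(order)
}
```

The BTreeMap keeps ties in name order, so numbered step names like 02-error-types sort the way you would expect.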

Running finna spec generates detailed ARF specs for each step. Here's the spec for lexing basic strings:

order = 10
what = "Implement lexing for double-quoted basic strings with full escape support"
why = "Basic strings are TOML's primary string format. The lexer must process escapes at lex time and detect unterminated strings with precise error locations."
how = '''
1. Add lex_basic_string() method
2. Main loop: consume until closing quote or error
3. Implement lex_escape_sequence() for \t, \n, \r, \\, \", \uXXXX, \UXXXXXXXX
4. Implement lex_unicode_escape() with validation
5. Wire into lex_token() dispatch
6. Add unit tests for all escape sequences and error cases
'''
backup = "Defer escape processing to post-lex pass if implementation is error-prone"

[context]
files = ["src/lexer.rs", "src/error.rs"]
dependencies = []

The specs include test cases. The basic strings spec lists 25+ scenarios: simple strings, escape sequences, unicode handling, error conditions. The models know what to test because they debated it during planning.
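To make step 10 concrete, here is roughly what the plan in that spec could turn into. This is my own minimal sketch, not code finna generated. The function names come from the spec; the signatures and the LexError type are assumptions. It only handles the escapes the spec lists.

```rust
/// Error as (byte offset into the input, message). Hypothetical type for this sketch.
type LexError = (usize, &'static str);

/// Lexes a basic string that starts at the opening quote.
/// Returns the unescaped value and the number of bytes consumed.
fn lex_basic_string(input: &str) -> Result<(String, usize), LexError> {
    let mut chars = input.char_indices();
    if !matches!(chars.next(), Some((_, '"'))) {
        return Err((0, "expected opening quote"));
    }
    let mut value = String::new();
    while let Some((pos, ch)) = chars.next() {
        match ch {
            '"' => return Ok((value, pos + 1)),
            '\n' => return Err((pos, "newline in basic string")),
            '\\' => value.push(lex_escape_sequence(&mut chars, pos)?),
            other => value.push(other),
        }
    }
    Err((input.len(), "unterminated string"))
}

/// Handles the character after a backslash; `start` is the backslash's offset.
fn lex_escape_sequence(
    chars: &mut std::str::CharIndices<'_>,
    start: usize,
) -> Result<char, LexError> {
    let (_, esc) = chars.next().ok_or((start, "unterminated escape"))?;
    match esc {
        't' => Ok('\t'),
        'n' => Ok('\n'),
        'r' => Ok('\r'),
        '\\' => Ok('\\'),
        '"' => Ok('"'),
        'u' => lex_unicode_escape(chars, start, 4),
        'U' => lex_unicode_escape(chars, start, 8),
        _ => Err((start, "unknown escape")),
    }
}

/// Reads exactly `digits` hex digits and validates the resulting code point.
fn lex_unicode_escape(
    chars: &mut std::str::CharIndices<'_>,
    start: usize,
    digits: usize,
) -> Result<char, LexError> {
    let hex: String = chars.by_ref().take(digits).map(|(_, c)| c).collect();
    if hex.chars().count() != digits || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err((start, "invalid unicode escape"));
    }
    let code = u32::from_str_radix(&hex, 16).map_err(|_| (start, "invalid unicode escape"))?;
    // Surrogates and out-of-range values are not Unicode scalar values.
    char::from_u32(code).ok_or((start, "escape is not a Unicode scalar value"))
}
```

Errors carry the offset of the backslash or the end of input, which is the "precise error locations" requirement from the why field.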

The 30 specs totaled 2.1k lines. That's not wasted tokens. It's a contract you can review before any code exists.

The ARF Format

Specs use TOML with a standard structure:

order = 1
what = "one sentence description"
why = "context and motivation"
how = """
Step-by-step implementation plan
"""
backup = "fallback approach if primary fails"

[context]
files = ["paths/to/files"]
dependencies = ["step names this depends on"]

The format is simple on purpose. No special tooling needed to read or edit. Any text editor works. The structure enforces that every step has a rationale, a plan, and a fallback.
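In code, that enforcement could be as small as a struct and one check. This is a sketch under assumptions: ArfSpec and missing_fields are hypothetical names, and it skips TOML parsing entirely so it compiles with only the standard library.

```rust
/// Hypothetical in-memory form of an ARF spec; fields mirror the keys above.
#[derive(Debug, Default)]
struct ArfSpec {
    order: u32,
    what: String,
    why: String,
    how: String,
    backup: String,
    files: Vec<String>,
    dependencies: Vec<String>,
}

impl ArfSpec {
    /// Names of the required text fields that are empty or whitespace-only.
    fn missing_fields(&self) -> Vec<&'static str> {
        let required = [
            ("what", &self.what),
            ("why", &self.why),
            ("how", &self.how),
            ("backup", &self.backup),
        ];
        let mut missing = Vec::new();
        for (name, value) in required {
            if value.trim().is_empty() {
                missing.push(name);
            }
        }
        missing
    }
}
```

A spec with a blank how or backup would fail this check before it ever reached the implement stage.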

Why Separate Stages

The stages are separate because you need intervention points.

After debate, review the roadmap. Does the architecture make sense? Are the steps in the right order? Edit .finna/roadmap.arf if not.

After spec, review the specs. Is the implementation plan correct? Are the test cases comprehensive? Edit the spec files or re-run finna spec --step X.

After implement, review the changes. Did the edits apply cleanly? Is the code what you expected? The specs told you what would happen; now verify it did.

If you run finna "idea" without the subcommands, it runs all stages in sequence. That's fine for exploration. But for real work, you probably want the review gates.

Multi-Model Consensus

The debate phase isn't just asking three models and picking one. All three responses get synthesized:

Claude, Codex, and Gemini each propose an approach in parallel
Claude sees all three proposals and synthesizes consensus
Disagreements become explicit tradeoffs in the final plan

One model might over-engineer authentication. Another might skip edge cases. The synthesis catches both failure modes. You get architecture that multiple models have pressure-tested.
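The shape of that flow, sketched with plain function pointers standing in for the model CLIs. ModelFn, gather_proposals, and synthesis_prompt are hypothetical names, and the prompt wording is made up for illustration; finna's real prompts are not shown in this post.

```rust
use std::thread;

/// Stand-in for one model CLI call: takes the idea, returns a proposal or an error.
type ModelFn = fn(&str) -> Result<String, String>;

/// Runs every model on its own thread and keeps the proposals that succeeded,
/// in the same order the models were listed.
fn gather_proposals(
    idea: &str,
    models: &[(&'static str, ModelFn)],
) -> Vec<(&'static str, String)> {
    thread::scope(|scope| {
        // Spawn all calls first so they run in parallel.
        let handles: Vec<_> = models
            .iter()
            .map(|&(name, call)| (name, scope.spawn(move || call(idea))))
            .collect();
        handles
            .into_iter()
            .filter_map(|(name, handle)| match handle.join() {
                Ok(Ok(text)) => Some((name, text)),
                Ok(Err(err)) => {
                    // A failed model is skipped so the others still count.
                    eprintln!("warning: {name} failed: {err}");
                    None
                }
                Err(_) => {
                    eprintln!("warning: {name} panicked");
                    None
                }
            })
            .collect()
    })
}

/// Builds the text the synthesizing model would see: every proposal, labeled by author.
fn synthesis_prompt(idea: &str, proposals: &[(&'static str, String)]) -> String {
    let mut prompt = format!(
        "Idea: {idea}\n\nSynthesize one plan. List disagreements as explicit tradeoffs.\n"
    );
    for (name, text) in proposals {
        prompt.push_str(&format!("\n## Proposal from {name}\n{text}\n"));
    }
    prompt
}
```

Labeling each proposal by model is a design choice in this sketch: it lets the synthesizer attribute a tradeoff to whoever raised it.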

Implementation

finna is ~500 lines of Rust. It shells out to claude, codex, and npx @google/gemini-cli for the actual model calls. No API keys in the binary, no config files. If you have the CLIs installed, finna works.

The implementation phase runs models in parallel for each step, synthesizes their edit proposals, and applies the changes. Edits are JSON with path, old, and new fields. Simple find-and-replace, no AST manipulation.

#[derive(Debug, serde::Deserialize)]
struct Edit {
    path: String,
    old: String,
    new: String,
}

If an edit can't find the target text, it warns and continues. If the file doesn't exist, it creates it. The implementation is deliberately simple because the specs already contain the complexity.
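Those rules fit in one function. A minimal sketch, assuming hypothetical Outcome and apply_edit names. The Edit struct here drops the serde derive so the example needs only the standard library, and replacing just the first match is my assumption, not something the post specifies.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Same fields as the Edit struct above, without the serde derive.
struct Edit {
    path: String,
    old: String,
    new: String,
}

/// What happened to one edit.
#[derive(Debug, PartialEq)]
enum Outcome {
    Created,
    Replaced,
    TargetNotFound,
}

/// Applies one edit to the file at `root`/`edit.path`.
fn apply_edit(root: &Path, edit: &Edit) -> io::Result<Outcome> {
    let target = root.join(&edit.path);
    if !target.exists() {
        // Missing file: create it, and any parent directories, with the new text.
        if let Some(parent) = target.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(&target, &edit.new)?;
        return Ok(Outcome::Created);
    }
    let contents = fs::read_to_string(&target)?;
    if !contents.contains(&edit.old) {
        // Warn and let the caller continue with the remaining edits.
        eprintln!("warning: target text not found in {}", target.display());
        return Ok(Outcome::TargetNotFound);
    }
    // Assumption: only the first occurrence is replaced.
    fs::write(&target, contents.replacen(&edit.old, &edit.new, 1))?;
    Ok(Outcome::Replaced)
}
```

Returning TargetNotFound as a normal outcome rather than an error is what makes "warn and continue" straightforward for the caller.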

What finna Is Not

finna is not a replacement for writing code. It's a planning tool that happens to also generate code. The value is in the specs, not the implementation.

If the generated code is wrong, you fix the spec and re-run. If the architecture is wrong, you fix the roadmap and re-spec. The code is a side effect of getting the plan right.

finna is also not trying to be general-purpose. It solves one problem: turn an idea into a structured plan with implementation. No plugins, no extensibility, no configuration. One tool, one job.

Getting Started

# Clone and build
git clone https://github.com/ducks/finna
cd finna
nix-shell
cargo build --release

# Run on your idea
./target/release/finna "your idea here"

Requires claude, codex, and npx @google/gemini-cli to be installed and authenticated. If a model fails, finna continues with the others.

The source is at github.com/ducks/finna.

The test project (TOML parser specs) is at github.com/ducks/finna-toml-parser.
