llm-mux: Why I Rebuilt Lok
Lok hit 317 cargo installs. People were using it. So naturally I rewrote it from scratch.
That's not as chaotic as it sounds. Lok grew organically from "query multiple LLMs" to "run workflows" to "apply edits" to "create GitHub issues." Each feature was bolted onto the side. The codebase worked, but the abstractions were wrong.
llm-mux is what lok should have been from the start.
What Was Wrong
Lok conflates backends with tasks. When you write backend = "claude" in a
workflow step, you're coupling your workflow to a specific model. Want to swap
Claude for Gemini? Edit every step.
Lok also has no concept of project context. A Rust project needs cargo test
for verification. A Node project needs npm test. In lok, you hardcode these
per-workflow. Switch projects, rewrite workflows.
The apply_edits feature was bolted on late. It works, but there's no retry loop, no structured verification, and no rollback without git-agent.
Roles
llm-mux introduces roles. Instead of hardcoding backends:
# lok style - backend hardcoded
[[steps]]
name = "analyze"
backend = "claude"
prompt = "Find bugs"
You declare what kind of task it is:
# llm-mux style - role-based
[[steps]]
name = "analyze"
type = "query"
role = "analyzer"
prompt = "Find bugs"
Then configure which backends handle which roles:
[roles.analyzer]
description = "Code analysis tasks"
backends = ["claude", "codex"]
execution = "parallel"
[roles.quick]
description = "Fast local checks"
backends = ["qwen"]
execution = "first"
Swapping backends is a config change, not a workflow rewrite. The workflow says "I need analysis." The config decides who does analysis.
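So to answer the earlier question, swapping Claude for Gemini touches only the role mapping (assuming you've configured a gemini backend; the workflow step stays exactly as written):
[roles.analyzer]
description = "Code analysis tasks"
backends = ["gemini"]
execution = "first"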
Teams
Teams add project context:
[teams.rust]
description = "Rust projects"
detect = ["Cargo.toml"]
verify = "cargo clippy && cargo test"
[teams.rust.roles.analyzer]
backends = ["claude", "codex"]
When llm-mux detects Cargo.toml, it activates the rust team. Verification
commands come from the team. Role mappings can be overridden per-team.
Same workflow, different projects, correct tooling.
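A Node team is the same shape with different detection and verification, something like:
[teams.node]
description = "Node projects"
detect = ["package.json"]
verify = "npm test"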
HTTP Backends
Lok only does CLI. You shell out to claude, codex, ollama run. Each query
spawns a process.
llm-mux supports both CLI and HTTP:
[backends.claude]
command = "claude"
args = ["-p", "--"]
[backends.openai]
command = "https://api.openai.com/v1"
model = "gpt-4"
api_key = "${OPENAI_API_KEY}"
[backends.local]
command = "http://localhost:11434/v1"
model = "llama3"
If the command starts with http, it's HTTP. Otherwise CLI. HTTP is faster for
high-volume workflows. No process overhead. Proper streaming. Rate limit handling.
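The dispatch rule is simple enough to sketch. This is illustrative Rust with made-up names, not the actual llm-mux internals:
enum BackendKind {
    Http,
    Cli,
}

fn classify(command: &str) -> BackendKind {
    // The rule above, literally: a command starting with http is an
    // HTTP endpoint; anything else is an executable to spawn.
    if command.starts_with("http") {
        BackendKind::Http
    } else {
        BackendKind::Cli
    }
}

fn main() {
    assert!(matches!(classify("https://api.openai.com/v1"), BackendKind::Http));
    assert!(matches!(classify("claude"), BackendKind::Cli));
}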
Apply and Verify
Lok's apply_edits was a boolean flag. llm-mux has a real system:
[[steps]]
name = "fix"
type = "apply"
source = "steps.analyze"
verify = "cargo test"
verify_retries = 3
verify_retry_prompt = "Fix failed: {{ error }}. Try again."
rollback_on_failure = true
The flow:
- Parse edits from source step
- Apply edits
- Run verification
- If it fails and retries remain, show error to LLM, try again
- If all retries fail, rollback
The retry loop is the difference. Instead of failing on the first bad edit, llm-mux shows the error and asks for a fix. Most failures are small mistakes a second attempt catches.
Rollback uses git stash. No external tooling.
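The loop is easy to sketch in Rust. Every name below is illustrative; this is the shape of the flow, not the llm-mux API:
// Helpers are passed as closures so the sketch stays self-contained.
// `retries` plays the part of verify_retries: up to that many retries
// after the initial verification fails.
fn apply_with_verify(
    apply: impl FnOnce() -> Result<(), String>,         // apply parsed edits
    verify: impl Fn() -> Result<(), String>,            // e.g. run `cargo test`
    mut retry: impl FnMut(&str) -> Result<(), String>,  // show error to LLM, apply fix
    retries: u32,
    rollback: impl Fn(),                                // e.g. git stash pop
) -> Result<(), String> {
    apply()?;
    for attempt in 0..=retries {
        match verify() {
            Ok(()) => return Ok(()),
            // Retries remain: feed the error back and try again.
            Err(e) if attempt < retries => {
                if retry(&e).is_err() {
                    rollback();
                    return Err(e);
                }
            }
            // Out of retries: undo everything.
            Err(e) => {
                rollback();
                return Err(e);
            }
        }
    }
    unreachable!("the final iteration always returns")
}

fn main() {
    use std::cell::Cell;
    // Dummy run: first verification fails, the "LLM" fixes it, second passes.
    let fixed = Cell::new(false);
    let result = apply_with_verify(
        || Ok(()),
        || if fixed.get() { Ok(()) } else { Err("test failed".to_string()) },
        |_error| { fixed.set(true); Ok(()) },
        3,
        || println!("rolled back"),
    );
    assert!(result.is_ok());
}
The design point is that verification failures are data: the error text gets rendered into verify_retry_prompt and handed back to the model.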
An Example
The rust-audit workflow runs four parallel audits and writes structured docs:
llm-mux run rust-audit
llm-mux run rust-audit outdir=reports/feb-audit
Output:
docs/audit/
├── README.md # Summary table
├── 01-safety.md # Memory safety
├── 02-performance.md # Perf issues
├── 03-errors.md # Error handling
└── 04-idioms.md # Patterns
Each audit is its own query step. Each saves to a file. The final step synthesizes
a summary. The outdir argument makes it reusable.
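Each audit step is roughly this shape (a sketch, not the shipped workflow verbatim; the save-to-file key here is illustrative):
[[steps]]
name = "safety"
type = "query"
role = "analyzer"
prompt = "Audit this code for memory safety issues"
output = "{{ outdir }}/01-safety.md"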
What llm-mux Is Not
llm-mux is not a replacement for lok's CLI commands. There's no llm-mux ask or
llm-mux hunt. It's purely a workflow runner.
If you want quick one-off queries, use lok. If you want structured multi-step pipelines with proper abstractions, use llm-mux.
I'm keeping both. They solve different problems.
What Doesn't Work Yet
The template system is powerful, but its error messages are cryptic. A typo in a Jinja variable gives you a wall of minijinja internals.
HTTP backend streaming works but the progress output is ugly. You see chunks arrive but it's not as clean as the CLI backend output.
Team auto-detection is basic. It looks for files but doesn't understand monorepos or nested projects yet.
Getting Started
cargo install llm-mux
llm-mux doctor
llm-mux run rust-audit
Config goes in ~/.config/llm-mux/config.toml. Workflows go in
.llm-mux/workflows/ or ~/.config/llm-mux/workflows/.
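A minimal config.toml to start from, using only the keys shown above (swap in whatever tools you actually have installed):
[backends.claude]
command = "claude"
args = ["-p", "--"]

[roles.analyzer]
description = "Code analysis tasks"
backends = ["claude"]
execution = "first"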
Source at github.com/ducks/llm-mux.
Lok was the prototype. llm-mux is the product. The 317 people using lok helped me figure out what the abstractions should be.