llm-mux: Why I Rebuilt Lok
Lok hit 317 cargo installs. People were using it. So naturally I rewrote it from scratch.
That's not as chaotic as it sounds. Lok grew organically from "query multiple LLMs" to "run workflows" to "apply edits" to "create GitHub issues." Each feature was bolted onto the side. The codebase worked, but the abstractions were wrong.
llm-mux is what lok should have been from the start.
What Was Wrong
Lok conflates backends with tasks. When you write backend = "claude" in a
workflow step, you're coupling your workflow to a specific model. Want to swap
Claude for Gemini? Edit every step.
Lok also has no concept of project context. A Rust project needs cargo test
for verification. A Node project needs npm test. In lok, you hardcode these
per-workflow. Switch projects, rewrite workflows.
The apply_edits feature was bolted on late. It works, but there's no retry loop, no structured verification, and no rollback without git-agent.
Roles
llm-mux introduces roles. Instead of hardcoding backends:
# lok style - backend hardcoded
[[steps]]
name = "analyze"
backend = "claude"
prompt = "Find bugs"
You declare what kind of task it is:
# llm-mux style - role-based
[[steps]]
name = "analyze"
type = "query"
role = "analyzer"
prompt = "Find bugs"
Then configure which backends handle which roles:
[roles.analyzer]
description = "Code analysis tasks"
backends = ["claude", "codex"]
execution = "parallel"
[roles.quick]
description = "Fast local checks"
backends = ["qwen"]
execution = "first"
Swapping backends is a config change, not a workflow rewrite. The workflow says "I need analysis." The config decides who does analysis.
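So to answer the earlier question, swapping Claude for Gemini touches only the role mapping (assuming you've configured a gemini backend; the workflow step stays exactly as written):
[roles.analyzer]
description = "Code analysis tasks"
backends = ["gemini"]
execution = "first"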
Teams
Teams add project context:
[teams.rust]
description = "Rust projects"
detect = ["Cargo.toml"]
verify = "cargo clippy && cargo test"
[teams.rust.roles.analyzer]
backends = ["claude", "codex"]
When llm-mux detects Cargo.toml, it activates the rust team. Verification
commands come from the team. Role mappings can be overridden per-team.
Same workflow, different projects, correct tooling.
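A Node team is the same shape with different detection and verification, something like:
[teams.node]
description = "Node projects"
detect = ["package.json"]
verify = "npm test"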
HTTP Backends
Lok only does CLI. You shell out to claude, codex, ollama run. Each query
spawns a process.
llm-mux supports both CLI and HTTP:
[backends.claude]
command = "claude"
args = ["-p", "--"]
[backends.openai]
command = "https://api.openai.com/v1"
model = "gpt-4"
api_key = "${OPENAI_API_KEY}"
[backends.local]
command = "http://localhost:11434/v1"
model = "llama3"
If the command starts with http, it's HTTP. Otherwise CLI. HTTP is faster for
high-volume workflows. No process overhead. Proper streaming. Rate limit handling.
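The dispatch rule is simple enough to sketch. This is illustrative Rust with made-up names, not the actual llm-mux internals:
enum BackendKind {
    Http,
    Cli,
}

fn classify(command: &str) -> BackendKind {
    // The rule above, literally: a command starting with http is an
    // HTTP endpoint; anything else is an executable to spawn.
    if command.starts_with("http") {
        BackendKind::Http
    } else {
        BackendKind::Cli
    }
}

fn main() {
    assert!(matches!(classify("https://api.openai.com/v1"), BackendKind::Http));
    assert!(matches!(classify("claude"), BackendKind::Cli));
}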
Apply and Verify
Lok's apply_edits was a boolean flag. llm-mux has a real system:
[[steps]]
name = "fix"
type = "apply"
source = "steps.analyze"
verify = "cargo test"
verify_retries = 3
verify_retry_prompt = "Fix failed: {{ error }}. Try again."
rollback_on_failure = true
The flow:
- Parse edits from source step
- Apply edits
- Run verification
- If it fails and retries remain, show error to LLM, try again
- If all retries fail, rollback
The retry loop is the difference. Instead of failing on the first bad edit, llm-mux shows the error and asks for a fix. Most failures are small mistakes a second attempt catches.
Rollback uses git stash. No external tooling.
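The loop is easy to sketch in Rust. Every name below is illustrative; this is the shape of the flow, not the llm-mux API:
// Helpers are passed as closures so the sketch stays self-contained.
// `retries` plays the part of verify_retries: up to that many retries
// after the initial verification fails.
fn apply_with_verify(
    apply: impl FnOnce() -> Result<(), String>,         // apply parsed edits
    verify: impl Fn() -> Result<(), String>,            // e.g. run `cargo test`
    mut retry: impl FnMut(&str) -> Result<(), String>,  // show error to LLM, apply fix
    retries: u32,
    rollback: impl Fn(),                                // e.g. git stash pop
) -> Result<(), String> {
    apply()?;
    for attempt in 0..=retries {
        match verify() {
            Ok(()) => return Ok(()),
            // Retries remain: feed the error back and try again.
            Err(e) if attempt < retries => {
                if retry(&e).is_err() {
                    rollback();
                    return Err(e);
                }
            }
            // Out of retries: undo everything.
            Err(e) => {
                rollback();
                return Err(e);
            }
        }
    }
    unreachable!("the final iteration always returns")
}

fn main() {
    use std::cell::Cell;
    // Dummy run: first verification fails, the "LLM" fixes it, second passes.
    let fixed = Cell::new(false);
    let result = apply_with_verify(
        || Ok(()),
        || if fixed.get() { Ok(()) } else { Err("test failed".to_string()) },
        |_error| { fixed.set(true); Ok(()) },
        3,
        || println!("rolled back"),
    );
    assert!(result.is_ok());
}
The design point is that verification failures are data: the error text gets rendered into verify_retry_prompt and handed back to the model.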
An Example
The rust-audit workflow runs four parallel audits and writes structured docs:
llm-mux run rust-audit
llm-mux run rust-audit outdir=reports/feb-audit
Output:
docs/audit/
├── README.md # Summary table
├── 01-safety.md # Memory safety
├── 02-performance.md # Perf issues
├── 03-errors.md # Error handling
└── 04-idioms.md # Patterns
Each audit is its own query step. Each saves to a file. The final step synthesizes
a summary. The outdir argument makes it reusable.
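Each audit step is roughly this shape (a sketch, not the shipped workflow verbatim; the save-to-file key here is illustrative):
[[steps]]
name = "safety"
type = "query"
role = "analyzer"
prompt = "Audit this code for memory safety issues"
output = "{{ outdir }}/01-safety.md"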
What llm-mux Is Not
llm-mux is not a replacement for lok's CLI commands. There's no llm-mux ask or
llm-mux hunt. It's purely a workflow runner.
If you want quick one-off queries, use lok. If you want structured multi-step pipelines with proper abstractions, use llm-mux.
I'm keeping both. They solve different problems.
What Doesn't Work Yet
The template system is powerful, but its error messages are cryptic. A typo in a Jinja variable gives you a wall of minijinja internals.
HTTP backend streaming works but the progress output is ugly. You see chunks arrive but it's not as clean as the CLI backend output.
Team auto-detection is basic. It looks for files but doesn't understand monorepos or nested projects yet.
Getting Started
cargo install llm-mux
llm-mux doctor
llm-mux run rust-audit
Config goes in ~/.config/llm-mux/config.toml. Workflows go in
.llm-mux/workflows/ or ~/.config/llm-mux/workflows/.
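A minimal config.toml to start from, using only the keys shown above (swap in whatever tools you actually have installed):
[backends.claude]
command = "claude"
args = ["-p", "--"]

[roles.analyzer]
description = "Code analysis tasks"
backends = ["claude"]
execution = "first"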
Source at github.com/ducks/llm-mux.
Lok was the prototype. llm-mux is the product. The 317 people using lok helped me figure out what the abstractions should be.