How to cut token costs when running AI coding agents
Autonomous coding agents waste tokens four ways: redoing work, working on the wrong thing, re-fetching lost context, and spinning on an empty queue. Cut all four with one ranked queue, task leasing, full per-task context, and a hard stop, then track value-per-token to see which runs are worth it.
Where the tokens actually go
When people see a big bill from running agents on autopilot, the instinct is "the model is expensive." Usually the model is fine — the waste around it is the bill. Four sources, roughly in order of cost:
- Wrong-thing work: the agent picks a task that doesn't matter, or works through tasks in a costly order, because there's no single source of truth for "what's next." This is the most expensive waste: full token spend, zero value.
- Redone work: a re-run, or a second session, repeats a task that was already done because nothing marked it as taken. You pay twice.
- Re-fetched context: the agent re-reads files, re-greps the repo, and re-asks what it already knew, because context was lost between steps or tasks.
- Idle spinning: with no hard stop, the agent invents low-value busywork instead of finishing, burning tokens to look busy.
Five ways to cut it
- One ranked queue → no wrong-thing work. A single global priority order means "what's next?" has one deterministic answer. The agent always pulls the highest-value task, never an expensive detour.
- Lease tasks → no redo, no collisions. When an agent takes a task it's marked in progress, so a re-run or a second session skips it. You never pay for the same task twice — which also makes running several agents in parallel cheap instead of duplicative.
- Full context per task → fewer round-trips. Hand the agent the relevant context with the task (get_next_task returns the task plus merged project context). Fewer "let me re-read everything" loops = fewer tokens.
- Stop on an empty queue → no spinning. A loop that ends cleanly when there's nothing left beats one that invents work. (TaskPrio's autopilot prompt stops on empty by design.)
- Track value-per-token → cut the bad runs. Measure completed work ÷ tokens, per run, so a run that spends a lot and ships little is visible immediately — not a surprise on the invoice.
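The first four ideas in the list above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not TaskPrio's implementation: the names Task, RankedQueue, lease_next, and run_agent are hypothetical, and it assumes a lower priority number means a more valuable task. A real setup would keep the queue in shared storage so that separate sessions see the same leases.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Task:
    task_id: str
    priority: int          # assumption: lower number = higher value
    context: str = ""      # context handed over together with the task
    status: str = "todo"   # "todo" -> "in_progress" -> "done"


class RankedQueue:
    """Hypothetical queue: one global order, and taken tasks are skipped."""

    def __init__(self, tasks: Iterable[Task]):
        # A single sorted order gives "what's next?" one deterministic answer.
        self._tasks = sorted(tasks, key=lambda t: t.priority)

    def lease_next(self) -> Optional[Task]:
        # Hand out the best task nobody has taken, and mark it as taken.
        for task in self._tasks:
            if task.status == "todo":
                task.status = "in_progress"
                return task
        return None  # empty queue: the caller should stop, not invent work

    def complete(self, task: Task) -> None:
        task.status = "done"


def run_agent(queue: RankedQueue, do_task: Callable[[Task], None]) -> list[str]:
    """Work until the queue is empty, then stop cleanly."""
    finished = []
    while (task := queue.lease_next()) is not None:
        do_task(task)  # the task carries its own context
        queue.complete(task)
        finished.append(task.task_id)
    return finished


queue = RankedQueue([
    Task("fix-login", 1, "notes on the auth module"),
    Task("rename-var", 3),
    Task("add-tests", 2),
])
first = queue.lease_next()                 # another session takes the top task
order = run_agent(queue, lambda t: None)   # this session skips the leased task
```

In this example the second session never touches fix-login, because the first session's lease marks it as in progress, and the loop ends as soon as lease_next returns None.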
Measuring it: value-per-token
Value-per-token is the honest metric for autonomous work: tasks completed (or outcomes shipped) divided by the tokens a run consumed. It reframes the question from "how many tokens did we use?" to "what did we get for them?" TaskPrio's live Sessions cockpit shows this per connected agent — current task, throughput, and real value-per-token — so an expensive-but-unproductive run is something you see and stop, not something you discover later.
The point isn't to minimize tokens. It's to maximize work per token — and the only way to manage that is to measure it.
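As a rough illustration of the metric, the sketch below computes completed tasks per 1,000 tokens for each run and flags the runs that fall under a threshold. The Run record, the function names, the per-1,000 scaling, the threshold, and the sample numbers are all assumptions made up for the example; they are not TaskPrio's data model or real measurements.

```python
from dataclasses import dataclass


@dataclass
class Run:
    run_id: str
    tasks_completed: int
    tokens_used: int


def value_per_1k_tokens(run: Run) -> float:
    """Completed tasks per 1,000 tokens (scaled only for readability)."""
    if run.tokens_used == 0:
        return 0.0  # nothing spent, nothing to rate
    return run.tasks_completed / run.tokens_used * 1000


def flag_low_value(runs: list[Run], threshold: float) -> list[str]:
    """Return the ids of runs whose value-per-token is below the threshold."""
    return [r.run_id for r in runs if value_per_1k_tokens(r) < threshold]


# Made-up sample data: run "a" ships 8 tasks, run "b" spends more and ships 1.
runs = [
    Run("a", tasks_completed=8, tokens_used=400_000),
    Run("b", tasks_completed=1, tokens_used=900_000),
]
low = flag_low_value(runs, threshold=0.01)
```

Counting tasks treats every task as equal in value. If tasks differ a lot in size, weighting each completed task by its priority or an estimate of its value would make the ratio more meaningful.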
Frequently asked questions
Why do autonomous agents burn so many tokens?
Mostly waste, not capability: redoing work, working on the wrong thing, re-fetching lost context, and spinning when there's no stop. Remove those and the same model gets much cheaper.
Does a task queue actually lower the cost?
Yes — a ranked queue removes wrong-thing work (one deterministic next task) and leasing removes redone work (a taken task is skipped). Those are the two biggest line items.
How do I know which runs were worth it?
Value-per-token: completed work ÷ tokens spent, per run. A cockpit that shows it live lets you stop the low-ROI runs instead of paying for them.