[ECONOMICS]

Agent Cost Economics

Fix vulnerabilities for $2.64–$52 with agents. 100x cheaper than incident response. Real cost data.

Agent Cost Tiers

Standard agents: $2.64–$15 per fix
Advanced agents: $15–$45 per fix
Frontier agents: $30–$52 per fix

Hidden Costs of Incident Response

When a CVE hits production, costs multiply: engineer time, customer notifications, reputation damage, regulatory fines. Fixing in pre-production saves orders of magnitude.

Cost Optimization

Use cheaper agents for easy bugs (syntax errors, refactors). Reserve frontier agents for hard architectural problems. XOR tracks which agent solves which classes of bugs best.

Cheapest per pass: $2.64

Test runs completed: 1,920

Prefer interactive charts? Open the Benchmark Explorer →

What it costs to fix a bug with AI

We ran 15 agents across 128 real bugs. The cheapest agent fixes bugs for $2.64 each. The most expensive costs $52 per fix. The benchmark is growing to 6,138+ vulnerabilities across 250+ projects.

Cost matters because you will run this repeatedly. If a bug costs $5 to patch and you run this across 500 vulnerabilities, your spend is $2,500. The same bugs with a $0.50 agent cost $250. These numbers drive real procurement decisions. We measure actual token consumption from API logs, not estimates.
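The arithmetic above can be sketched in a few lines. This is a minimal illustration that assumes every vulnerability is patched at the same flat per-fix cost; the function name `projected_spend` is hypothetical and not part of any XOR tooling.

```python
def projected_spend(cost_per_fix: float, num_vulnerabilities: int) -> float:
    """Total spend if every vulnerability is patched at the same per-fix cost."""
    return cost_per_fix * num_vulnerabilities


# The two scenarios from the paragraph above.
print(projected_spend(5.00, 500))  # 2500.0
print(projected_spend(0.50, 500))  # 250.0
```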

Budget with real data

Security ROI = (risk reduced - cost) / cost. These tested cost-per-fix numbers replace guesswork in your budget.
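As a rough sketch, the ROI formula translates directly into code. The input figures below are hypothetical placeholders chosen for illustration, not benchmark results; how you estimate `risk_reduced` in dollars is left to your own risk model.

```python
def security_roi(risk_reduced: float, cost: float) -> float:
    """Security ROI = (risk reduced - cost) / cost, both in the same currency."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (risk_reduced - cost) / cost


# Hypothetical inputs: $10,000 of avoided risk for $2,500 of agent spend.
print(security_roi(10_000, 2_500))  # 3.0
```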

See Agentic SecEcon →

Cost vs Performance

Each dot is an agent. X-axis: cost per successful patch (log scale). Y-axis: pass rate. The dashed line marks the best trade-off: for each agent on it, no other agent is both cheaper and more accurate.

The scatter reveals agent clusters. Some agents cluster together at 40-50% pass rate with similar costs, which makes them functionally equivalent. But outliers exist: agents that are cheap but miss easy bugs, or expensive but uniquely capable on hard ones. Your choice depends on your budget and your bug distribution.
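One way to compute a best-trade-off line like this is a sort-and-sweep over (cost, pass rate) pairs. The sketch below assumes the two axes are the $/Pass and Pass Rate columns of the Cost Efficiency Rankings table on this page and uses six of its rows; the chart itself may be built from different inputs, so treat the result as illustrative.

```python
# (agent, cost per pass in USD, pass rate in %), copied from the rankings table.
AGENTS = [
    ("claude-claude-opus-4-5", 2.64, 45.7),
    ("claude-claude-opus-4-6", 2.93, 61.6),
    ("gemini31-gemini-3.1-pro-preview", 3.92, 58.7),
    ("codex-gpt-5.2", 5.30, 62.7),
    ("cursor-opus-4.6", 35.40, 62.5),
    ("opencode-claude-opus-4-6", 51.88, 47.5),
]


def pareto_frontier(agents):
    """Return agents for which no other agent is both cheaper and more accurate."""
    frontier = []
    best_rate = float("-inf")
    # Walk from cheapest to most expensive; on cost ties, higher pass rate first.
    for name, cost, rate in sorted(agents, key=lambda a: (a[1], -a[2])):
        if rate > best_rate:  # beats every cheaper agent on accuracy
            frontier.append(name)
            best_rate = rate
    return frontier


print(pareto_frontier(AGENTS))
# ['claude-claude-opus-4-5', 'claude-claude-opus-4-6', 'codex-gpt-5.2']
```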

Pareto Frontier with Confidence Intervals

Cost efficiency frontier with 95% Wald confidence intervals on pass rates. The Pareto frontier identifies agents where no alternative is both cheaper AND more accurate. Every agent on this line represents a genuine trade-off decision: lower cost or higher accuracy, but not both.

Confidence intervals show the statistical range around each agent's pass rate. Wider intervals indicate greater uncertainty, typically from lower sample counts or from pass rates near 50%, where a Wald interval is widest. Agents on the frontier with tight confidence intervals are more reliable choices than those with wide bands.
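For reference, a 95% Wald interval is the observed pass rate plus or minus 1.96 standard errors. The sketch below uses the standard textbook formula; the example input of 80 passes out of 128 attempts is an assumption inferred from the cursor-opus-4.6 row (80 passes, 62.5% pass rate), and the page's own intervals may be computed with different sample counts.

```python
import math


def wald_interval(passes: int, attempts: int, z: float = 1.96):
    """Wald confidence interval for a pass rate, clamped to [0, 1]."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = passes / attempts
    half_width = z * math.sqrt(p * (1 - p) / attempts)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Assumed sample: 80 passes out of 128 attempts.
low, high = wald_interval(80, 128)
print(f"{low:.3f} to {high:.3f}")
```

The half-width shrinks with the square root of the number of attempts, so quadrupling the sample size roughly halves the band.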

Oracle Set Cover

Greedy set cover showing marginal value of adding each agent to the ensemble. This analysis answers a practical question: if you can run multiple agents on the same bug, which ones should you add to maximize coverage? Start with the best agent (highest pass rate) and add agents that fix bugs the leader misses.

The first agent covers maybe 60% of bugs. Adding the second agent might bring you to 72%. Adding a third might hit 78%. But at some point, the marginal gain from each new agent drops below the cost. This visualization helps you find the optimal ensemble size for your budget.
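A greedy set cover along these lines can be sketched as follows. The per-bug results are not published on this page, so the `SOLVED` mapping below is a hypothetical toy example (agent names and bug IDs are made up); only the selection logic is the point.

```python
def greedy_set_cover(solved: dict[str, set[int]]) -> list[tuple[str, int]]:
    """Pick agents one at a time, each time adding the most newly covered bugs.

    Returns (agent, cumulative bugs covered) in selection order and stops
    once no remaining agent fixes a bug that is still uncovered.
    """
    covered: set[int] = set()
    remaining = dict(solved)
    order = []
    while remaining:
        best = max(remaining, key=lambda a: len(remaining[a] - covered))
        if not remaining[best] - covered:
            break  # no marginal gain left
        covered |= remaining.pop(best)
        order.append((best, len(covered)))
    return order


# Hypothetical data: which bug IDs each agent fixed.
SOLVED = {
    "agent_a": {1, 2, 3, 4, 5, 6},
    "agent_b": {5, 6, 7, 8},
    "agent_c": {1, 2, 9},
}
print(greedy_set_cover(SOLVED))
# [('agent_a', 6), ('agent_b', 8), ('agent_c', 9)]
```

Comparing each step's gain against that agent's cost gives the stopping point for a fixed budget.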

Cost Efficiency Rankings

Rank	Agent	$/Pass	API Cost	Pass Rate	Passes
1	claude-claude-opus-4-5	$2.64	$153	45.7%	58
2	claude-claude-opus-4-6	$2.93	$225	61.6%	77
3	gemini31-gemini-3.1-pro-preview	$3.92	$251	58.7%	64
4	cursor-composer-1.5	$3.93	$224	45.2%	57
5	gemini-gemini-3-pro-preview	$4.85	$267	43.0%	55
6	codex-gpt-5.2	$5.30	$419	62.7%	79
7	opencode-gemini-gemini-3.1-pro-preview	$5.81	$389	54.9%	67
8	cursor-gpt-5.3-codex	$6.16	$394	50.4%	64
9	cursor-gpt-5.2	$6.26	$394	51.6%	63
10	codex-gpt-5.2-codex	$6.65	$419	49.2%	63
11	opencode-gpt-5.2	$6.65	$419	51.6%	63
12	opencode-gpt-5.2-codex	$8.73	$419	37.8%	48
13	cursor-opus-4.6	$35.40	$2832	62.5%	80
14	opencode-claude-opus-4-5	$40.13	$1846	36.8%	46
15	opencode-claude-opus-4-6	$51.88	$3009	47.5%	58
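The $/Pass column appears to be the API cost divided by the number of passes. The check below reproduces three rows under that assumption. Because the API Cost column is rounded to whole dollars, a recomputed value can differ from the table by a cent on some rows.

```python
def cost_per_pass(api_cost: float, passes: int) -> float:
    """Total API spend divided by the number of successful fixes."""
    if passes <= 0:
        raise ValueError("passes must be positive")
    return api_cost / passes


# (agent, API cost in USD, passes), copied from the table above.
ROWS = [
    ("claude-claude-opus-4-5", 153, 58),
    ("codex-gpt-5.2", 419, 79),
    ("cursor-opus-4.6", 2832, 80),
]
for agent, api_cost, passes in ROWS:
    print(f"{agent}: ${cost_per_pass(api_cost, passes):.2f}")
# claude-claude-opus-4-5: $2.64
# codex-gpt-5.2: $5.30
# cursor-opus-4.6: $35.40
```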

Running multiple agents

Running multiple agents on the same bug increases coverage but costs more. Is the extra fix worth the extra cost?

Ensemble strategies matter for operational reality. You could run the cheap agent first and escalate to an expensive one on failure. This waterfall approach achieves 74% coverage at lower average cost. Or you could run a pair of agents in parallel and keep any fix. The choice changes your overall project timeline and budget.
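The waterfall idea can be modeled with a small expected-value calculation. This is a sketch under stated assumptions: costs are per attempt rather than per pass, the escalation agent runs only when the cheap agent fails, and every number in the example is a hypothetical placeholder rather than benchmark data.

```python
def waterfall(cheap_cost, cheap_pass_rate, escalation_cost, escalation_pass_rate_on_failures):
    """Expected cost per bug and overall coverage for a two-step waterfall.

    escalation_pass_rate_on_failures is the second agent's pass rate on the
    bugs the first agent failed, which can differ from its overall pass rate.
    """
    p_escalate = 1 - cheap_pass_rate
    expected_cost = cheap_cost + p_escalate * escalation_cost
    coverage = cheap_pass_rate + p_escalate * escalation_pass_rate_on_failures
    return expected_cost, coverage


# Hypothetical inputs: $1 cheap attempt at 50% pass rate, $20 escalation
# that fixes 40% of the bugs the cheap agent missed.
cost, coverage = waterfall(1.0, 0.5, 20.0, 0.4)
print(f"expected cost per bug: ${cost:.2f}, coverage: {coverage:.0%}")
```

Running both agents in parallel would reach the same coverage in this model but pay the escalation cost on every bug, not only on failures.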

Best single agent

$2832

Best pair

$3226

102

Oracle (all 15)

$10,598

Marginal analysis

Best single agent: cursor-opus-4.6. Adding codex-gpt-5.2 covers 16 more samples at $25/pass, a 9x premium over the cheapest single-agent approach.
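The marginal figure can be reproduced from the numbers on this page, assuming the $2832 and $3226 cards are the total costs of the best single agent and the best pair. The function name is hypothetical.

```python
def marginal_cost_per_pass(base_cost: float, ensemble_cost: float, extra_passes: int) -> float:
    """Extra spend divided by the extra bugs fixed when adding an agent."""
    if extra_passes <= 0:
        raise ValueError("extra_passes must be positive")
    return (ensemble_cost - base_cost) / extra_passes


marginal = marginal_cost_per_pass(2832, 3226, 16)  # best single agent vs best pair
premium = marginal / 2.64                          # relative to the cheapest $/pass
print(f"marginal cost: ${marginal:.1f}/pass, premium: {premium:.1f}x")
```

This comes out just under $25 per additional pass, roughly nine times the cheapest per-pass cost.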

[NEXT STEPS]

Optimize your agent spend

The cheapest path: claude-claude-opus-4-5 at $2.64/fix. The most expensive: opencode-claude-opus-4-6 at $52/fix. For most teams, the best pair covers 96/128 bugs.

Run this on your codebase →

Install GitHub App →

Explore more

Agent leaderboard: pass rates and rankings
Agent profiles: how agents differ, where they agree
How verification works: isolation, safety checks, bug reproduction

FAQ

How much does an agent fix cost?

$2.64 to $52 depending on agent and model. Calculated from real API costs across 1,920 evaluations.

Why such a wide range?

Different agents have different API costs (Claude vs Codex vs Gemini). Different bugs require different reasoning depth. Some agents solve in one attempt; others need multiple tries.

How does this compare to incident response?

Incident response for a critical CVE typically costs $10K–$50K in engineer time and downtime. Agent-based pre-production fixing costs dollars, which makes it 100x–1000x cheaper.

What if the agent fails?

Failed fixes still provide learning signals. You see which agents struggled, which tools they tried, and which approaches didn't work. No wasted money, just data.

[RELATED TOPICS]

Benchmark Results

62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.

Agent Configurations

15 agent-model configurations benchmarked on real vulnerabilities. Compare pass rates and costs.

Benchmark Methodology

How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.

Validation Process

25 questions we ran against our own data before publishing. Challenges assumptions, explores implications, extends findings.

Cost Analysis

10 findings on what AI patching costs and whether it is worth buying. 1,920 evaluations analyzed.

See which agents produce fixes that work

128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.