[FOR AI TEAMS]

Deploy agents faster. Verify every output.

Your agents produce patches. XOR verifies them in seconds and feeds the results back so your agents get smarter. 13 agent configurations benchmarked. Costs go down with every run.

See benchmark results Book a demo

128 vulnerability test cases (current, verified)
1,664 verified evaluations (current, verified)
6,138+ target vulnerabilities (target)
250+ projects (target)

[STORY]

Agents ship patches fast. Nobody checks if they work.

Outcome: Even the best agent produces a passing patch only 62.7% of the time. The rest of the time, you are debugging its mistakes. XOR eliminates that debugging loop.

Mechanism: XOR tests 13 agent configurations on 128 real-world bugs across 40 open-source projects. Every result feeds back into the agent harness so that the next run is faster and cheaper.

Proof: 1,664 test runs completed (128 bugs × 13 configurations). 370 broken patches caught before they reached a PR.

See benchmark results →

Eliminate the agent debugging loop.

How verification works

Bad patches waste your team's time. XOR verifies every agent output automatically. So far, 370 broken patches have been caught before code review. Your team reviews verified fixes, not raw agent output.

Agents learn from every run. Costs go down.

How agents learn

Every pass and every failure feeds back into the agent harness. XOR upgrades the system prompt, injects memory from previous runs, and routes the next task to the best agent. 1,664 results already in the learning dataset.

Pick the best agent per task, not per vendor.

Benchmark results

Pass rates range from 24.5% to 62.7% depending on the agent, the model, and the repo. XOR benchmarks 13 configurations so you can route each task to the agent that handles it best. No vendor lock-in.

"This will slow agents down."

Verification runs in parallel with your CI. One plugin install, one GitHub App click. Agents keep running at full speed. Only the output is verified.

"Our agent works fine."

On which repos? With which model? Pass rates range from 24.5% to 62.7% depending on configuration. XOR shows you the numbers so you can route tasks to the right agent.

"We do not want to manage another tool."

One plugin install. Agents learn automatically. Results feed back with no manual intervention. 1,664 runs completed without a human in the loop.

FAQ

How does this speed up agent deployment?

XOR verifies agent output automatically, so bad patches never reach review. Agents learn from every run, so pass rates increase and costs decrease over time. Your team spends time on features, not debugging agent mistakes.

Which agents does XOR support?

Any agent that produces a git patch. We have benchmarked 13 configurations so far, spanning Claude Code, Codex, Gemini CLI, and OpenCode. The plugin and GitHub App work with any coding agent.

How do agents learn from XOR?

Every pass and every failure feeds back into the agent harness. XOR upgrades the system prompt, injects memory from previous runs, and routes the next task based on agent performance data. 1,664 results already in the learning dataset.

[RELATED TOPICS]

Agent learning data

How verified outcomes feed back into agents.

Agent benchmark results

Pass rates for 13 agent configurations on real bugs.

How verification works

Automated output verification for every agent run.

Benchmark methodology

How we test agents on real-world bugs.

Deploy agents with confidence.

128 real bugs. 13 agent configurations. 1,664 runs. Agents learn from every one.

See benchmark results

READY TO START

$ xor patch --verify --learn

Book a demo

See benchmark results →