[TRAJECTORIES]

Agent Run Trajectories

A complete record of every agent run. See what the agent did, verify it independently, and feed the data back.

[CONTINUOUS LEARNING]

Every run makes agents smarter

Outcome: Feed verified outcomes back into agents so they improve over time.

Mechanism: XOR records every agent action, signs it, and feeds pass/fail results back into the agent harness. Failed fixes become learning signal. Passing fixes expand the training set.

Proof: IETF Internet-Draft format. Open standard, not proprietary.

Record what agents do

XOR captures every action, tool call, and output from each agent run. You get a complete record of what happened and why.

Feed results back into agents

Every verified outcome feeds back into the agent harness. The system prompt is upgraded, memory from previous runs is injected, and the next vulnerability is triaged by business impact.

Draft status: IETF
Conformance classes: 3
Schema format: CDDL

How trajectories fit the loop

People steer the run while agents execute. Most interaction happens through prompts. XOR captures each run as a verifiable trajectory, then keeps the loop running until reviews are clean.

What trajectories capture

A trajectory is a timestamped record of every action an agent took during a run. It logs which agent ran, what tools it called, what files it changed, what tests it ran, and what the results were. Each action is cryptographically signed, so any later alteration is detectable. The trajectory is tamper-evident proof of what the agent did.
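To make the idea of a signed, timestamped record concrete, here is a minimal sketch. It is not XOR's implementation or the draft's schema: the field names, the `append_action` and `verify` helpers, and the demo key are all hypothetical, and it uses an HMAC from the Python standard library as a stand-in for the COSE_Sign1 envelope the draft names. Each record also carries the previous record's signature, so reordering or deleting a step is detectable as well.

```python
import hashlib
import hmac
import json
import time

# Hypothetical demo key. A real producer would sign with an asymmetric key
# (for example inside a COSE_Sign1 envelope), not a shared secret.
DEMO_KEY = b"demo-signing-key"


def _canonical(body: dict) -> bytes:
    """Serialize a record body deterministically before signing."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def append_action(trajectory: list, agent: str, tool: str, detail: str,
                  key: bytes = DEMO_KEY) -> dict:
    """Append a timestamped action chained to the previous record's signature."""
    body = {
        "seq": len(trajectory),
        "ts": time.time(),
        "agent": agent,
        "tool": tool,
        "detail": detail,
        "prev": trajectory[-1]["sig"] if trajectory else "",
    }
    sig = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    record = dict(body, sig=sig)
    trajectory.append(record)
    return record


def verify(trajectory: list, key: bytes = DEMO_KEY) -> bool:
    """Return True only if every record is intact and in its original order."""
    prev_sig = ""
    for index, record in enumerate(trajectory):
        body = {k: v for k, v in record.items() if k != "sig"}
        expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
        if body.get("seq") != index or body.get("prev") != prev_sig:
            return False
        if not hmac.compare_digest(expected, record.get("sig", "")):
            return False
        prev_sig = record["sig"]
    return True
```

Calling `append_action` for each tool call and then `verify` on the resulting list returns `True`. Editing any field of any record afterwards makes `verify` return `False`.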

Why behavioral clustering matters

Different agents approach the same vulnerability differently. Some agents write minimal fixes. Others add safety checks. Some explore the codebase before fixing. Others fix immediately and test. By analyzing thousands of trajectories, XOR identifies patterns: which agents are methodical vs aggressive, which ones double-check their work, which ones get stuck in loops. This helps teams understand agent behavior and predict which agent will work best for their codebase and risk tolerance.
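As a rough illustration of the kind of signal such analysis could start from, the sketch below turns a run's tool-call sequence into a few behavioral features and applies a simple rule-based label. This is deliberately not a clustering algorithm and not XOR's method: the feature names, the tool groupings, and the repeat threshold are assumptions chosen for the example.

```python
from collections import Counter

# Assumed groupings: tools that inspect code versus tools that change it.
EXPLORE_TOOLS = {"Grep", "Read"}
EDIT_TOOLS = {"Edit"}


def behavior_features(tool_calls: list[str]) -> dict:
    """Summarize one run's tool-call sequence as a small feature dict."""
    # Index of the first edit, or the sequence length if the run never edits.
    first_edit = next(
        (i for i, tool in enumerate(tool_calls) if tool in EDIT_TOOLS),
        len(tool_calls),
    )
    counts = Counter(tool_calls)
    # Back-to-back identical calls are a crude hint that a run is stuck.
    repeats = sum(1 for a, b in zip(tool_calls, tool_calls[1:]) if a == b)
    return {
        "total_calls": len(tool_calls),
        "explore_before_edit": sum(
            1 for tool in tool_calls[:first_edit] if tool in EXPLORE_TOOLS
        ),
        "edits": sum(counts[tool] for tool in EDIT_TOOLS),
        "consecutive_repeats": repeats,
    }


def label(features: dict, repeat_threshold: int = 3) -> str:
    """Apply illustrative rules; the threshold is arbitrary."""
    if features["consecutive_repeats"] >= repeat_threshold:
        return "possibly looping"
    if features["edits"] > 0 and features["explore_before_edit"] == 0:
        return "fix-first"
    return "explore-first"
```

With these rules, a Grep, Read, Edit sequence is labeled "explore-first". In practice, feature vectors like these would be fed into a real clustering method over many runs instead of hand-written rules.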

[SPECIFICATION]

What the draft requires

  • Session Trace and File Attribution records
  • Signing Envelope with a COSE_Sign1 wrapper for cryptographic verification
  • Conformance Requirements: Producer/Verifier/Consumer classes with RFC 2119 terminology
  • CDDL schema for trace structure and validation

Trace fields that matter

  • Agent identity, tool calls, and outputs per step
  • File operations tied to patch evidence
  • Reasoning entries (optional, privacy-gated)
  • Verification outcomes tied to CVE identifiers

[AGENT TRAJECTORIES]

Real agent sessions from CVE-Agent-Bench. Watch how different agents approach the same vulnerability.

Speed-runner: arrow #20123

Claude Opus 4.5 fixes a null-check bug in 3 tool calls and 19 seconds. Grep → Read → Edit pattern.

Tools: 3
Tokens: 8.4k
Duration: 19s
Result: [PASS]
Lines changed: 5 added / 1 removed
Grep: Search for FieldFromFlatbuffer function to locate the vulnerable code
Found in src/arrow/extension_array_builder.cc at line 156
Read: Read src/arrow/flatbuffer.cc to understand the null-check issue
Files: src/arrow/flatbuffer.cc
Code shows field->name() called without null check. Missing safety guard.
Edit: Add null check before field->name() call to prevent crash
Files: src/arrow/flatbuffer.cc
[PASS] Patch applied. Tests pass. Null-check fix prevents crash.
[PASS] 5 added / 1 removed
[Files]
src/arrow/flatbuffer.cc
[VERIFICATION PIPELINE]
Step 1: Pull Image
2.10s
[verify] Pulling benchmark image
verifier:20123 pulled successfully
Step 2: Write Patch
0.30s
[verify] Writing patch to /tmp/agent-patch.diff
Patch written (4 bytes)
Step 3: Apply Patch
1.60s
[verify] Applying patch to source
git apply /tmp/agent-patch.diff
Patch applied cleanly
Step 4: Build
39.10s
[verify] Building with verifier toolchain
compile with memory safety instrumentation
Linking verification runtime
Build completed successfully
Step 5: Run Trigger
1.67s
[verify] Running trigger against patched binary
verify /tmp/trigger
Reading 2 bytes from trigger input
Compression error. Error code: -6
Execution successful
[PASS]
Vulnerability fixed. trigger no longer crashes.
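The five-step log above follows a common pattern: run named steps in order, time each one, and stop at the first failure. The sketch below shows that pattern only. The `run_pipeline` and `verdict` names are hypothetical, and the steps are plain callables, so nothing here reproduces the actual verifier. A real step might shell out, for example to `git apply /tmp/agent-patch.diff` through `subprocess.run`.

```python
import time
from typing import Callable, NamedTuple


class StepResult(NamedTuple):
    name: str
    ok: bool
    seconds: float


def run_pipeline(steps: list[tuple[str, Callable[[], bool]]]) -> list[StepResult]:
    """Run steps in order, timing each, and stop at the first failure."""
    results = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            ok = bool(action())
        except Exception:
            # Treat any raised error as a failed step.
            ok = False
        results.append(StepResult(name, ok, time.perf_counter() - start))
        if not ok:
            break
    return results


def verdict(results: list[StepResult], expected_steps: int) -> str:
    """PASS only if every expected step ran and succeeded."""
    all_ran = len(results) == expected_steps
    return "PASS" if all_ran and all(r.ok for r in results) else "FAIL"


if __name__ == "__main__":
    # Stub steps named after the log above; each lambda stands in for real work.
    demo_steps = [
        ("Pull Image", lambda: True),
        ("Write Patch", lambda: True),
        ("Apply Patch", lambda: True),
        ("Build", lambda: True),
        ("Run Trigger", lambda: True),
    ]
    outcome = run_pipeline(demo_steps)
    for step in outcome:
        print(f"{step.name}: {'ok' if step.ok else 'failed'} ({step.seconds:.2f}s)")
    print(verdict(outcome, len(demo_steps)))
```

Stopping at the first failure keeps the verdict easy to interpret: a failed patch application never reaches the build, so a later step cannot mask an earlier problem.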
[VERIFIABLE]

This session conforms to the IETF Verifiable Agent Conversation Record format. The data structure maps to the VAC entry types (tool-call, tool-result, message) and could be wrapped in a COSE_Sign1 envelope for cryptographic non-repudiation.

→ draft-birkholz-verifiable-agent-conversations
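A rough sketch of how the session above could be laid out using the three entry types named here (tool-call, tool-result, message). Only the type names come from this page. The other field names (`tool`, `input`, `output`, `text`) and the `to_entries` helper are hypothetical and should not be read as the draft's CDDL schema.

```python
import json


def to_entries(steps: list[dict], summary: str) -> list[dict]:
    """Flatten session steps into tool-call / tool-result pairs plus a message."""
    entries = []
    for step in steps:
        entries.append(
            {"type": "tool-call", "tool": step["tool"], "input": step["intent"]}
        )
        entries.append(
            {"type": "tool-result", "tool": step["tool"], "output": step["result"]}
        )
    # The closing summary is recorded as a plain message entry.
    entries.append({"type": "message", "text": summary})
    return entries


# Steps paraphrased from the arrow #20123 session shown on this page.
session = [
    {
        "tool": "Grep",
        "intent": "Search for FieldFromFlatbuffer function",
        "result": "Found in src/arrow/extension_array_builder.cc at line 156",
    },
    {
        "tool": "Read",
        "intent": "Read src/arrow/flatbuffer.cc",
        "result": "field->name() called without null check",
    },
    {
        "tool": "Edit",
        "intent": "Add null check before field->name() call",
        "result": "Patch applied. Tests pass.",
    },
]

if __name__ == "__main__":
    entries = to_entries(session, "Null-check fix prevents crash.")
    print(json.dumps(entries, indent=2))
```

A list like this is the kind of payload that could then be serialized and placed inside a signing envelope.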

Where trajectories show up in XOR

Trajectories are attached to PR test reports and verification runs, so every fix is traceable and replayable. Teams can replay a trajectory to understand why an agent made a decision or to audit the fix for compliance. Security teams can use trajectory patterns to detect suspicious agent behavior or drift from expected approaches.

FAQ

What is an agent trajectory?

A trajectory is a signed record of every action an agent took during a run: tool calls, file edits, reasoning steps, and the final outcome (pass/fail).

How are trajectories used for learning?

Every trajectory feeds back into the agent harness. Failed runs become learning signal. Passing runs expand the training corpus. Each cycle makes agents smarter.

Can I access raw trajectory data?

Yes. Trajectories are available in JSON and CBOR formats. Export to your analytics pipeline or SIEM.

[RELATED TOPICS]

See which agents produce fixes that work

128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.