Trajectories¶
A trajectory is a complete, immutable record of an agent execution. It captures every input, output, tool call, and policy decision. Trajectories are how you debug agent failures, improve reliability over time, and prove compliance when you need it.
What Trajectories Capture¶
- Every input the agent received
- Every output the agent produced
- Every tool call and its arguments
- Every policy decision and why it was made
Trajectory Lifecycle¶
Every trajectory follows the same lifecycle: initialize, adjudicate, finalize.
sequenceDiagram
participant A as Your Agent
participant H as Harness
A->>H: 1. Initialize (creates trajectory)
A->>H: 2. Adjudicate: "Run bash ls"
H-->>A: ✓ Allowed (step recorded)
A->>H: 3. Adjudicate: "Run bash rm -rf /"
H-->>A: ✗ Denied (step recorded with reason)
A->>H: 4. Adjudicate: "Read config.json"
H-->>A: ✓ Allowed (step recorded)
A->>H: 5. Finalize (closes trajectory)
Each adjudication creates a step in the trajectory. Steps occur at different stages of the agent loop (PRE_MODEL, POST_MODEL, PRE_TOOL, POST_TOOL). Even denied actions are recorded for auditing.
Harness Methods¶
Three methods control the lifecycle: initialize(), adjudicate(), and finalize().
from sondera import Stage, Role, Decision, ToolRequestContent, PolicyViolationError
# Assume harness and agent are already configured (see Getting Started)
# 1. Initialize: creates a new trajectory
await harness.initialize(agent=agent)
trajectory_id = harness.trajectory_id # Store this for later retrieval
# 2. Adjudicate: each call adds a step to the trajectory
result = await harness.adjudicate(
Stage.PRE_TOOL, Role.MODEL,
ToolRequestContent(tool_id="Bash", args={"command": "ls"})
)
if result.decision == Decision.DENY:
# The step is still recorded. Your code decides what to do next.
# Option A: Stop execution
raise PolicyViolationError(Stage.PRE_TOOL, result.reason, result)
# Option B: Steer the agent (feed reason back to model)
# return f"Action denied: {result.reason}. Try a different approach."
# 3. Finalize: marks trajectory as complete
await harness.finalize()
When a step is denied: The harness returns Decision.DENY but doesn't stop your agent automatically. Your agent code decides what to do: skip the action, try an alternative, or halt execution. The denied step is still recorded in the trajectory for auditing. See Decisions for handling patterns.
Real-World Example: Debugging a Failure¶
Here's a typical workflow: your agent failed in production, and you need to figure out why.
from datetime import datetime, timedelta
from sondera import SonderaRemoteHarness, TrajectoryStatus
harness = SonderaRemoteHarness()
# 1. Find recent failed trajectories
cutoff_time = datetime.now(timezone.utc) - timedelta(hours=24)
all_failed = await harness.list_trajectories(
agent_id="customer-support-agent",
status=TrajectoryStatus.FAILED,
)
# Filter client-side by created_at
failed = [t for t in all_failed if t.created_at >= cutoff_time]
print(f"Found {len(failed)} failed trajectories in the last 24 hours")
# 2. Get full details for the most recent failure
if failed:
trajectory = await harness.get_trajectory(failed[0].id)
# 3. Find what went wrong
for adj_step in trajectory.steps:
if adj_step.adjudication.decision == Decision.DENY:
print(f"DENIED at {adj_step.step.stage.value}:")
print(f" Action: {adj_step.step.content}")
print(f" Reason: {adj_step.adjudication.reason}")
# 4. Check metadata for context
print(f"User: {trajectory.metadata.get('user_id')}")
print(f"Session: {trajectory.metadata.get('session_id')}")
print(f"Duration: {trajectory.duration}s")
Trajectory Types¶
Sondera uses two trajectory types. The difference is whether steps include adjudication details:
| Type | Steps contain | When to use |
|---|---|---|
Trajectory |
TrajectoryStep |
Listing and filtering (lightweight, no policy details) |
AdjudicatedTrajectory |
AdjudicatedStep |
Debugging and auditing (full policy decisions) |
Why two types? Performance. When listing hundreds of trajectories, you don't need full adjudication details for each step. Use list_trajectories() to browse (returns Trajectory), then get_trajectory(id) to fetch full details for a specific one (returns AdjudicatedTrajectory).
Trajectory¶
Basic trajectory metadata, returned by list_trajectories():
from sondera import Trajectory, TrajectoryStatus
traj: Trajectory
traj.id # Unique identifier (e.g., "traj-abc123")
traj.agent_id # Agent that executed
traj.status # TrajectoryStatus enum (see below)
traj.created_at # When trajectory was created
traj.updated_at # When trajectory was last updated
traj.started_at # When execution started (after first step)
traj.ended_at # When execution finished (after finalize)
traj.metadata # Custom metadata dict (see Metadata section)
traj.steps # list[TrajectoryStep] - steps WITHOUT adjudication
# Convenience properties
traj.step_count # Number of steps
traj.duration # Seconds between started_at and ended_at (or None)
traj.is_completed # True if status is COMPLETED or FAILED
traj.is_active # True if status is RUNNING
# Filter methods
traj.get_steps_by_role(Role.MODEL) # All model steps
traj.get_steps_by_stage(Stage.PRE_TOOL) # All PRE_TOOL steps
AdjudicatedTrajectory¶
Full trajectory with policy decisions on each step, returned by get_trajectory():
from sondera import AdjudicatedTrajectory
adj_traj: AdjudicatedTrajectory
adj_traj.steps # list[AdjudicatedStep] - steps WITH adjudication
# All other properties same as Trajectory
TrajectoryStep¶
A raw step without adjudication info:
from sondera import TrajectoryStep
step: TrajectoryStep
step.stage # Stage enum: PRE_MODEL, PRE_TOOL, etc.
step.role # Role enum: USER, MODEL, TOOL, SYSTEM
step.content # ToolRequestContent, ToolResponseContent, or PromptContent
step.created_at # When this step occurred
step.state # Optional dict: agent state at this point (see State section)
step.context # Optional dict: request context (see Context section)
AdjudicatedStep¶
A step paired with its policy decision:
from sondera import AdjudicatedStep
adj_step: AdjudicatedStep
adj_step.step # The underlying TrajectoryStep
adj_step.adjudication # The policy decision (Adjudication)
adj_step.mode # MONITOR or GOVERN (see Modes section)
# Access step data through .step
adj_step.step.stage # PRE_MODEL, PRE_TOOL, etc.
adj_step.step.role # USER, MODEL, TOOL, SYSTEM
adj_step.step.content # The evaluated content
# Access decision through .adjudication
adj_step.adjudication.decision # Decision.ALLOW, Decision.DENY, or Decision.ESCALATE
adj_step.adjudication.reason # Why this decision was made
adj_step.message # "Allow: reason" or "Deny: reason"
TrajectoryStatus¶
from sondera import TrajectoryStatus
TrajectoryStatus.PENDING # Created but not started
TrajectoryStatus.RUNNING # Currently executing
TrajectoryStatus.COMPLETED # Finished successfully
TrajectoryStatus.SUSPENDED # Paused mid-execution (use resume() to continue)
TrajectoryStatus.FAILED # Finished with error
Modes: MONITOR vs GOVERN¶
Each adjudicated step has a mode that determines how policy violations are enforced:
| Mode | Behavior | Use case |
|---|---|---|
MONITOR |
Log violations but don't block | Testing new policies, shadow mode, gathering data |
GOVERN |
Block violations and return DENY | Production enforcement |
MONITOR mode is useful when rolling out new policies. You can see what would be denied without disrupting your agent. Once you're confident the policy is correct, switch to GOVERN.
Mode is configured at the harness level when you create it. See Deployment for configuration details.
Accessing Trajectories¶
How you access trajectory data depends on which harness you use.
Platform (SonderaRemoteHarness)¶
The platform persists all trajectory data. Use these methods to retrieve it:
from datetime import datetime, timedelta
from sondera import SonderaRemoteHarness, TrajectoryStatus
harness = SonderaRemoteHarness()
await harness.initialize(agent=agent)
# Get current trajectory ID
trajectory_id = harness.trajectory_id
# ... run agent, call adjudicate() ...
await harness.finalize()
# Retrieve full trajectory (returns AdjudicatedTrajectory with policy decisions)
trajectory = await harness.get_trajectory(trajectory_id)
if trajectory:
for adj_step in trajectory.steps:
print(f"{adj_step.step.stage.value}: {adj_step.adjudication.decision.value}")
Listing and Filtering¶
# List trajectories with filters
trajectories = await harness.list_trajectories(
agent_id="my-agent",
status=TrajectoryStatus.COMPLETED, # Optional: filter by status
page_size=50, # Results per page (default: 100, max: 1000)
)
# Pagination: use page_token for large result sets
first_page = await harness.list_trajectories(agent_id="my-agent", page_size=100)
if first_page.next_page_token:
second_page = await harness.list_trajectories(
agent_id="my-agent",
page_token=first_page.next_page_token,
)
Analytics¶
# Get aggregate statistics
result = await harness.analyze_trajectories(
agent_id="my-agent",
analytics=["trajectory_count"],
)
# Returns a dict with aggregate metrics:
print(f"Total trajectories: {result['analytics']['trajectory_count']['total']}")
print(f"Computed at: {result['computed_at']}")
print(f"Completed: {result['analytics']['completed_count']}")
print(f"Failed: {result['analytics']['failed_count']}")
print(f"Total steps: {[result['analytics']'step_count']}")
print(f"Denied steps: {result['analytics']['denied_count']}")
print(f"Average duration: {result['analytics']['avg_duration_seconds']}s")
print(f"Top denied policies: {result['analytics']['top_denied_policies']}") # list of (policy_id, count)
Local (CedarPolicyHarness)¶
Local harness doesn't store steps
CedarPolicyHarness evaluates policies locally but doesn't persist trajectory steps.
You only have access to trajectory_id. If you need full trajectory history, use
SonderaRemoteHarness instead.
Resuming Trajectories¶
Long-running agents can crash, time out, or need checkpoints. Resume lets you continue from where you left off instead of starting over.
When to use resume:
- Agent process crashed and restarted
- Timeout during a long operation
- Intentional checkpoint/restore pattern
- Human-in-the-loop approval that spans sessions
from sondera import SonderaRemoteHarness
# Session 1: Start a trajectory
harness = SonderaRemoteHarness()
await harness.initialize(agent=agent)
trajectory_id = harness.trajectory_id
# ... agent runs, process crashes or times out ...
# Session 2: Resume in a new process
harness = SonderaRemoteHarness() # Fresh harness instance
# Resume the same trajectory
await harness.resume(trajectory_id, agent=agent)
# Continue adjudicating - new steps append to the same trajectory
await harness.adjudicate(stage, role, content)
await harness.finalize()
Resume requires platform
resume() only works with SonderaRemoteHarness since the platform persists trajectory state.
CedarPolicyHarness doesn't support resuming because it doesn't store trajectory data.
Performance Considerations¶
| Operation | Typical latency | Notes |
|---|---|---|
adjudicate() |
5-20ms | Async, doesn't block agent execution |
get_trajectory() |
50-200ms | Depends on step count |
list_trajectories() |
20-100ms | Use pagination for large result sets |
Tips:
- Use
list_trajectories()with filters to narrow results before fetching full details - Use
get_trajectories_batch()instead of looping overget_trajectory() - Set appropriate
page_sizebased on your UI/processing needs - Consider using MONITOR mode during development to reduce enforcement overhead
Next Steps¶
- Decisions: Understanding ALLOW, DENY, and steering patterns
- Stages: Where in the agent loop steps are captured
- Writing Policies: Cedar syntax and patterns
- Integrations: Automatic trajectory capture in LangGraph, ADK, and Strands