Open-source agent safety for release pipelines

Open AgentOps

Test AI agents before they mutate production. Scan existing agents, run scenario YAML, simulate risky tools, capture traces, and fail CI when behavior is unsafe to ship.

v0.1.0 released | 11 contract tests | 4 live demo PRs

Why this exists

Agents moved from chat to action. Evals need to move with them.

Once an agent can post messages, issue refunds, open tickets, page people, or email customers, the final answer is not enough. The release gate has to inspect what the agent tried to do.

Normal evals miss tool risk

A polished final answer can hide a forbidden tool call, leaked secret, or false success claim.

Mutation tests need policy

CI should know whether a tool is read-only, simulated, sandboxed, approval-gated, or blocked.

Agent behavior becomes reviewable

Scenario YAML, traces, reports, and baselines make unsafe behavior visible before merge.

Dogfooded in GitHub Actions

Safe changes pass. Unsafe mutations fail.

The release was validated with four real pull requests. The two safe agent changes pass the gate. The two unsafe changes fail because they skip approval and attempt destructive behavior.

Open AgentOps workflow result matrix:

New safe agent: PASS
Safe agent edit: PASS
New unsafe agent: FAIL
Unsafe agent edit: FAIL

How the gate thinks

Every risky tool gets an execution mode.

CI does not need to touch production to test realistic behavior. Open AgentOps routes each tool through the mode that matches its risk.

live

Read-only calls that are safe to run in CI.

simulate

Stateful fake resources for write-like behavior.

sandbox

Non-production environments for integration checks.

approval_required

Human approval before destructive or visible actions.

block

Immediate failure when a forbidden tool is attempted.
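
The five modes above can be pictured as a small routing table. The following is a minimal sketch, not the Open AgentOps implementation: the `ToolRouter` class, the `GateFailure` exception, the tool names, and the fail-closed default for unknown tools are all assumptions made for illustration. Only the mode names come from this page.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    LIVE = "live"
    SIMULATE = "simulate"
    SANDBOX = "sandbox"
    APPROVAL_REQUIRED = "approval_required"
    BLOCK = "block"


class GateFailure(Exception):
    """Raised when a tool call should fail the release gate."""


@dataclass
class ToolRouter:
    # Hypothetical policy table: tool name -> execution mode.
    policy: dict
    # Tools a human has already approved for this run (assumed mechanism).
    approved: set = field(default_factory=set)
    # Every attempted call is recorded, including the ones that fail.
    trace: list = field(default_factory=list)

    def call(self, tool, args):
        # Assumption: tools missing from the policy are treated as blocked.
        mode = self.policy.get(tool, Mode.BLOCK)
        self.trace.append({"tool": tool, "args": args, "mode": mode.value})
        if mode is Mode.BLOCK:
            raise GateFailure(f"forbidden tool attempted: {tool}")
        if mode is Mode.APPROVAL_REQUIRED and tool not in self.approved:
            raise GateFailure(f"approval missing for: {tool}")
        # A real router would dispatch to a live client, a simulator, or a
        # sandbox here; this sketch only reports which path was chosen.
        return {"tool": tool, "handled_by": mode.value}
```

In this sketch the attempt is appended to the trace before any failure is raised, so a blocked or unapproved call still leaves evidence for review.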

Reviewable scenario contracts

YAML becomes the test case. The trace becomes the evidence.

Teams can generate draft scenarios from agents and traces, review them like code, and commit them as release gates. Results are exported as JSON, Markdown, HTML, and JUnit reports, along with trace artifacts.
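
To make the contract-and-evidence idea concrete, here is a minimal sketch of a scenario checked against a recorded trace. The scenario is written as a Python dict for brevity; its field names (`forbidden_tools`, `required_before`), the tool names, and the `check_trace` helper are hypothetical and do not reflect the actual Open AgentOps YAML schema.

```python
# Hypothetical scenario contract; the field names are assumptions for
# illustration, not the Open AgentOps scenario schema.
SCENARIO = {
    "name": "refund_requires_approval",
    "forbidden_tools": ["delete_customer"],
    # Maps a tool to the tool that must have been called earlier.
    "required_before": {"issue_refund": "request_approval"},
}


def check_trace(scenario, trace):
    """Return a list of readable violations; an empty list means pass."""
    violations = []
    seen = set()
    for step, call in enumerate(trace):
        tool = call["tool"]
        if tool in scenario.get("forbidden_tools", []):
            violations.append(f"step {step}: forbidden tool {tool}")
        prerequisite = scenario.get("required_before", {}).get(tool)
        if prerequisite and prerequisite not in seen:
            violations.append(f"step {step}: {tool} called before {prerequisite}")
        seen.add(tool)
    return violations
```

Under these assumptions, a trace that requests approval before refunding yields no violations, while one that refunds first or touches a forbidden tool yields one entry per problem. Each entry points at the step in the trace that a reviewer can inspect.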

Use it in an existing agent repo

Install, generate scenarios, gate the build.

pip install git+https://github.com/reddywritescode/open-agentops.git

open-agentops scan examples/refund_agent
open-agentops generate simulators --from examples/refund_agent/tool_manifest.json
open-agentops test run --config examples/refund_agent/agentops.safe.yml
open-agentops gate --config examples/refund_agent/agentops.safe.yml
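
One way to chain the four commands above in a CI job is a small wrapper that stops at the first failure. This is a sketch under an assumption the page does not state: that each `open-agentops` command exits with a non-zero status when it fails. The `run_steps` helper and its `runner` parameter are hypothetical names introduced here.

```python
import subprocess
import sys

# Commands copied from the steps above; paths point at the bundled example.
STEPS = [
    ["open-agentops", "scan", "examples/refund_agent"],
    ["open-agentops", "generate", "simulators",
     "--from", "examples/refund_agent/tool_manifest.json"],
    ["open-agentops", "test", "run",
     "--config", "examples/refund_agent/agentops.safe.yml"],
    ["open-agentops", "gate",
     "--config", "examples/refund_agent/agentops.safe.yml"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order; return the first non-zero exit code, else 0.

    Assumption: a failing step signals failure through its exit code.
    """
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            print(f"step failed: {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    return 0
```

A CI script could end with `sys.exit(run_steps(STEPS))` so the job status mirrors the gate. The `runner` parameter exists only so the wrapper can be exercised without the CLI installed.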

First public release

Open source, self-hostable, and built for CI.

Open AgentOps keeps code, traces, tool policies, and secrets inside the user's environment by default. Hosted dashboards can come later; the release gate works today from the repo.
