The trust layer
for AI agents.AI agents.
Pre-built harnesses for the scenarios you'll actually hit, built with real domain experts. The guardrails you wish your agent had — but never did.
Now in Claude Code
$ pistachio harness add agent-hygiene
✔ Installed as Claude Code MCP tool
✔ 350 checks ready
$ pistachio harness run agent-hygiene
Running 350 / 350 … ▉
Report → pistachio.sh/r/9fZ3k
Three commands.
Zero config.
We handle the hard parts — graders, fixtures, sandboxing. You focus on the agent.
Pick a harness
Browse curated harnesses, from agent hygiene to RAG faithfulness. Filter by tier, model, and failure mode.
Pipe it into Claude Code
One CLI command installs the harness as an MCP tool. Zero config. Runs against the models you already use.
Ship with receipts
Get a signed eval report you can drop into a PR, a launch doc, or a sales deck. No hand-waving.
Curated. Battle-tested.
Opinionated.
RAG Faithfulness
Catch hallucinations before your users do.
Legal Citation Accuracy
Real cases. Real holdings. No fabrications.
PHI Leakage
HIPAA-grade PHI detection on every agent output.
From people who
already pushed agents to prod.
“We stopped shipping agents on vibes the day we plugged Pistachio into our CI. Pre-deploy harness runs catch stuff our staging never did.”
“The RAG faithfulness harness alone pays for our subscription. We found three silent regressions in week one.”
“Finally — evals that I actually run. The Claude Code integration is the whole pitch. It's where my work already lives.”
The trust layer between
your agent and the real world
Pistachio runs the harnesses your agent should pass before it sees a real user. Same harnesses, same receipts, whether you're a solo dev shipping your first agent or an enterprise team managing fleets of them.