The trust layer
for AI agents.

Pre-built harnesses for the scenarios you'll actually hit, built with real domain experts. The guardrails you wish your agent had — but never did.

Live on CLI

~ / claude-code

$ pistachio harness add agent-hygiene

✔ Installed as Claude Code MCP tool

✔ 350 checks ready

$ pistachio harness run agent-hygiene

Running 350 / 350 … ▉

Pass rate 96.3%

13 failures, signed

Report → pistachio.sh/r/9fZ3k
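
The figures in the sample run above agree with each other: 13 failures out of 350 checks leaves 337 passes, or roughly 96.3%. A minimal sketch of that arithmetic in Python follows. `pass_rate` is a hypothetical helper written for illustration, not part of the Pistachio CLI, and the real report may compute or round its pass rate differently.

```python
def pass_rate(total_checks: int, failures: int) -> float:
    """Return the percentage of checks that passed."""
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    if not 0 <= failures <= total_checks:
        raise ValueError("failures must be between 0 and total_checks")
    return 100.0 * (total_checks - failures) / total_checks


# Figures taken from the sample run: 350 checks, 13 failures.
rate = pass_rate(350, 13)
print(f"Pass rate {rate:.1f}%")  # prints "Pass rate 96.3%"
```

The same helper can gate a build, for example by failing when the rate drops below a threshold your team chooses.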

100K+ eval runs shipped

12 curated harnesses

18 failure modes detected

42s average setup time

How it works

Three commands.
Zero config.

We handle the hard parts — graders, fixtures, sandboxing. You focus on the agent.

01

Pick a harness

Browse curated harnesses, from agent hygiene to RAG faithfulness. Filter by tier, model, and failure mode.

02

Pipe it into Claude Code

One CLI command installs the harness as an MCP tool. Zero config. Runs against the models you already use.

03

Ship with receipts

Get a signed eval report you can drop into a PR, a launch doc, or a sales deck. No hand-waving.

Featured harnesses

Curated. Battle-tested.
Opinionated.

RAG Faithfulness

Catch hallucinations before your users do.

Legal · Enterprise

Legal Citation Verification

Your AI hallucinates cases. We catch it.

Benchmark · Enterprise

WebVoyager

Score your browser agent against the updated WebVoyager corpus (2026 version).

Receipts

From people who
already pushed agents to prod.

“Caught a regression we’d never have found otherwise.”

Maya R.

Eng lead, agent team

“Pistachio is our audit trail engine now.”

Theo C.

Eng lead, retail bank

“Replaced our homegrown eval scripts in a week. Now it just runs in CI.”

Iris N.

Platform lead, applied AI

The trust layer between
your agent and the real world

Pistachio runs the harnesses your agent should pass before it sees a real user. Same harnesses, same receipts, whether you're a solo dev shipping your first agent or an enterprise team managing fleets of them.
