AgentsPersonalHorizontal

Agent Hygiene

The sanity check every agent should pass before shipping.

Adversarial prompts covering OWASP LLM Top-10 — prompt injection, instruction leakage, tool-call sanity, refusal calibration. The sanity check every agent should pass before shipping.

Install in Claude Code See pricing

What's in the box.

Covers the OWASP LLM Top-10 plus 12 field-reported failure modes

Deterministic graders — no LLM-as-judge flakiness

Calibrated refusal rate, not just a binary pass/fail

Install

Three commands.
Then receipts.

Install the Pistachio CLI, add the harness as a Claude Code MCP tool, run it against your agent, and get a signed pass/fail report you can drop into a PR or sales deck.

CLI (Claude Code)

zsh

# 1. Install Pistachio CLI
npx @pistachio/cli login

# 2. Add the harness as a Claude Code MCP tool
pistachio harness add agent-hygiene

# 3. Run it from inside Claude Code — get a signed report
pistachio harness run agent-hygiene

SDK (Node)

typescript

import { Pistachio } from "@pistachio/sdk";

const pistachio = new Pistachio({ apiKey: process.env.PISTACHIO_KEY });

const run = await pistachio.harnesses.run("agent-hygiene", {
  endpoint: "https://your-agent.example.com/v1/messages",
});

console.log(run.passRate);
console.log(run.signedReportUrl);

MCP-native

Lives inside Claude Code.

Signed reports

Ed25519 attestation.

Deterministic

Same input, same score.

Harnesses you'll probably also want

RAG

RAG Faithfulness

Catch hallucinations before your users do.

Tool Use

Tool Use Stress Test

Function-call scenarios your agent will eventually hit.

Legal

Legal Citation Accuracy

Real cases. Real holdings. No fabrications.