Encyclopedia Evalica / Evaluation / Toxicity score

Toxicity score

/tah'ksih.suh.tee skawr/A safety metric that estimates harmful/toxic content likelihood in outputs. It's often used to monitor and gate risk-sensitive releases. (noun)

The toxicity score spiked on a small set of adversarial prompts.

Related Evaluation terms

From the docs

Get started with Evals

Braintrust is the AI observability and eval platform for production AI. By connecting evals and observability in one workflow, teams at Notion, Stripe, Zapier, Vercel, and Ramp ship quality AI products at scale.

Start building