
Eval harness

/ee.val 'hah.rnuhs/ (noun)

The code and infrastructure that runs an eval end-to-end: dataset loading, task execution, scoring, and reporting. A harness makes evals repeatable across environments.

The eval harness makes the CI run reproducible across branches.
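
The four stages named in the definition can be illustrated with a minimal sketch. It is not the API of any particular eval platform: the names `Case`, `Result`, `load_dataset`, `task`, `exact_match`, `run_eval`, and `report` are hypothetical, the dataset is a toy in-memory list, and the "task" is a deterministic string normalizer standing in for a model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    """One dataset row: an input and the output we expect for it."""
    input: str
    expected: str


@dataclass(frozen=True)
class Result:
    """The outcome of running the task on one case."""
    case: Case
    output: str
    score: float


def load_dataset() -> list[Case]:
    # Dataset loading: a hard-coded toy dataset (assumption for illustration).
    return [
        Case("  Hello ", "hello"),
        Case("WORLD", "world"),
        Case("Mixed Case", "mixed-case"),  # the stand-in task gets this one wrong
    ]


def task(text: str) -> str:
    # Task execution: a deterministic stand-in for a model or pipeline call.
    return text.strip().lower()


def exact_match(output: str, expected: str) -> float:
    # Scoring: 1.0 when the output equals the expected value, otherwise 0.0.
    return 1.0 if output == expected else 0.0


def run_eval(
    dataset: list[Case],
    task_fn: Callable[[str], str],
    scorer: Callable[[str, str], float],
) -> list[Result]:
    # Run every case through the task, then score each output.
    results = []
    for case in dataset:
        output = task_fn(case.input)
        results.append(Result(case, output, scorer(output, case.expected)))
    return results


def report(results: list[Result]) -> dict:
    # Reporting: aggregate the scores and list the inputs that failed.
    n = len(results)
    return {
        "n": n,
        "mean_score": sum(r.score for r in results) / n if n else 0.0,
        "failures": [r.case.input for r in results if r.score < 1.0],
    }


if __name__ == "__main__":
    print(report(run_eval(load_dataset(), task, exact_match)))
```

Because the dataset, task, and scorer are passed in as plain values and the run has no hidden state, the same call yields the same report on any machine or branch, which is the repeatability the definition refers to.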

