Dataset

/'day.tuh.seht/ (noun)

A versioned collection of test cases used to run evals and track improvements over time. Versioning matters because it keeps comparisons reproducible.

We added 50 new examples to the dataset and re-ran the eval.
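To make the versioning idea concrete, here is a minimal sketch in Python. It is not any platform's actual API: the names `EvalCase`, `VersionedDataset`, `commit`, `fingerprint`, and `run_eval` are hypothetical, and exact-match scoring is assumed purely for illustration. Each commit stores an immutable snapshot, so an eval score can always be tied to the exact set of cases that produced it.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EvalCase:
    """One test case: an input and the output we expect (hypothetical shape)."""
    input: str
    expected: str


@dataclass
class VersionedDataset:
    """Hypothetical in-memory dataset that keeps every committed snapshot."""
    name: str
    versions: list = field(default_factory=list)

    def commit(self, cases):
        """Store an immutable snapshot and return its 1-based version number."""
        self.versions.append(tuple(cases))
        return len(self.versions)

    def get(self, version):
        """Return the snapshot stored under the given 1-based version number."""
        return self.versions[version - 1]

    def fingerprint(self, version):
        """Short content hash, useful for recording which data a score came from."""
        payload = json.dumps([[c.input, c.expected] for c in self.get(version)])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def run_eval(cases, model):
    """Fraction of cases where the model output exactly matches the expected output."""
    if not cases:
        return 0.0
    correct = sum(1 for c in cases if model(c.input) == c.expected)
    return correct / len(cases)


# Assumed toy data: version 2 extends version 1 with one new example.
dataset = VersionedDataset("arithmetic")
v1 = dataset.commit([EvalCase("1+1", "2")])
v2 = dataset.commit(list(dataset.get(v1)) + [EvalCase("2+2", "4")])
```

Because older snapshots are never modified, a score from `run_eval(dataset.get(v1), model)` stays comparable across model changes even after new examples land in `v2`. Recording the fingerprint next to each score is one way to make that link explicit.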
