Expected output (ground truth)

/ih'kspeh.ktuhd 'ow.tpuut grownd trooth/The reference answer/behavior a dataset record defines as correct (when doing reference-based evals). Ground truth can be human-written, programmatically derived, or imported from production. (noun)

“The expected output includes the exact policy language and a citation.”

Related Datasets terms

Adversarial examples

•

Coverage

•

Dataset

•

Dataset record

•

Edge case

•

flush()

•

Golden dataset

•

Input

•

Metadata

•

Multimodal dataset

From the docs

Build datasets

•

Create experiments

•

Evaluate systematically

•

Human review output format: string vs JSON

•

Glossary

Get started with Evals

Braintrust is the AI observability and eval platform for production AI. By connecting evals and observability in one workflow, teams at Notion, Stripe, Zapier, Vercel, and Ramp ship quality AI products at scale.