Multimodal dataset

/multimodal 'day.tuh.seht/ (noun)

A dataset whose inputs or outputs include non-text content such as images, audio, or PDFs, and which therefore requires evals and storage that can handle those modalities. Multimodal datasets often need specialized scoring rubrics and display tooling for review.

“Our multimodal dataset checks whether the model reads values from screenshots.”
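To make the screenshot example concrete, here is a minimal Python sketch of one multimodal record and a simple scorer. The record shape (`input`, `expected`, `metadata`), the helper names `make_screenshot_record` and `exact_value_score`, and the choice to inline the image as a base64 data URL are all illustrative assumptions, not Braintrust's SDK API. A production setup would more likely keep large files in object storage and store only a reference in the record.

```python
import base64
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MultimodalRecord:
    """Hypothetical record shape: a non-text input, a text question, and an expected answer."""
    input: dict[str, Any]
    expected: str
    metadata: dict[str, Any] = field(default_factory=dict)


def make_screenshot_record(png_bytes: bytes, question: str, expected: str) -> MultimodalRecord:
    # Assumption: inline the image as a base64 data URL so the record is self-contained.
    data_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    return MultimodalRecord(
        input={"image": data_url, "question": question},
        expected=expected,
        metadata={"modality": "image", "mime_type": "image/png"},
    )


def _normalize(text: str) -> str:
    # Collapse internal whitespace and ignore case before comparing values.
    return " ".join(text.split()).lower()


def exact_value_score(output: str, expected: str) -> float:
    # Hypothetical scorer: 1.0 if the value the model read matches the ground truth, else 0.0.
    return 1.0 if _normalize(output) == _normalize(expected) else 0.0


# Usage with placeholder bytes standing in for a real PNG screenshot.
record = make_screenshot_record(
    b"\x89PNG\r\n\x1a\n", "What is the total on the invoice?", "$1,240.00"
)
print(record.input["image"][:22])                        # data:image/png;base64,
print(exact_value_score(" $1,240.00 ", record.expected))  # 1.0
```

Exact match suits extracted values like totals or dates; judging tone in audio or layout in a PDF would call for a rubric-based scorer instead.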

Customer example

Navan built a multimodal dataset of raw audio recordings (not just transcripts) to evaluate its voice agent, so scoring captures tone and nuance alongside content.

Related Datasets terms

Adversarial examples • Coverage • Dataset • Dataset record • Edge case • Expected output (ground truth) • flush() • Golden dataset • Input • Metadata

From the docs

Build datasets • Create experiments • Evaluate systematically • Add to dataset captures root span only • Configure dataset schemas via API


Braintrust is the AI observability and eval platform for production AI. By connecting evals and observability in one workflow, teams at Notion, Stripe, Zapier, Vercel, and Ramp ship quality AI products at scale.