Multimodal
/muhl.tee'moh.duhl/ (adjective)
AI systems or data that involve more than one type of input or output, such as text, images, audio, or PDFs. Multimodal traces and datasets require observability infrastructure that can store and display non-text content alongside structured data.
“The new multimodal workflow lets us evaluate how the model answers questions about screenshots.”
Customer example
Navan takes a multimodal approach to evals, passing raw audio (not just transcripts) into automated scoring so the system can pick up on tone and nuance that text-only pipelines miss.
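To make the idea concrete, here is a minimal sketch of what multimodal eval infrastructure has to store and pass around: a dataset record that keeps non-text attachments (audio, screenshots) alongside its structured fields, and a scorer that receives every modality rather than only text. All names here are illustrative, not a real SDK API.

```python
import base64
from dataclasses import dataclass, field

@dataclass
class MultimodalRecord:
    """Hypothetical eval dataset record: non-text content is stored as
    base64 payloads next to the structured fields, so a trace viewer
    can render it alongside scores and metadata."""
    input_text: str
    expected: str
    # attachment name -> (MIME type, base64-encoded payload)
    attachments: dict = field(default_factory=dict)

    def attach(self, name: str, mime_type: str, raw: bytes) -> None:
        self.attachments[name] = (mime_type, base64.b64encode(raw).decode("ascii"))

def score(record: MultimodalRecord, output: str) -> dict:
    """A scorer that sees the full record, so it could use signals
    (e.g., audio tone) that a transcript-only pipeline would drop."""
    return {
        "exact_match": float(output.strip() == record.expected.strip()),
        "modalities_seen": sorted(mime for mime, _ in record.attachments.values()),
    }

record = MultimodalRecord(
    input_text="What does the error dialog in this screenshot say?",
    expected="Payment declined",
)
record.attach("screenshot", "image/png", b"\x89PNG...fake bytes for illustration")
result = score(record, "Payment declined")
```

In a real pipeline the attachment bytes would come from the production trace, and the scorer would forward them to a multimodal model rather than just listing their MIME types.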
Related Evaluation terms
- Absolute scoring
- Agent
- AI eval
- Alignment
- Annotation schema
- Baseline
- Baseline experiment
- Benchmark
- Calibration
- CI/CD integration
- Coherence
- Confidence interval
- Eval harness
- Eval leakage
- Experiment
- Factuality
- Failure mode
- Faithfulness
- Feedback signal
- Groundedness
- Hallucination
- Inter-annotator agreement (IAA)
- LLM-as-a-judge
- Loop
- Model comparison
- Non-determinism
- Offline evaluation
- Pairwise evaluation
- Pass@k
- Playground
- Quality gate
- RAG (retrieval-augmented generation)
- RAG evaluation
- Reference-based scoring
- Reference-free scoring
- Regression testing
- Release criteria
- Remote evaluation
- Rubric
- Safety
- Score distribution
- Scorer
- Semantic failure
- Signal-to-noise ratio
- Task (eval task)
- Toxicity score
From the docs
Braintrust is the AI observability and eval platform for production AI. By connecting evals and observability in one workflow, teams at Notion, Stripe, Zapier, Vercel, and Ramp ship quality AI products at scale.