Remote evaluation

/rih'moht ih.va.lyoo'ay.shuhn/ (noun)

Running evals in a separate environment or service from the main app, often asynchronously. Remote evals can keep user-facing latency low while still measuring quality.

“Remote evals let us score production traces without blocking the user.”
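The pattern in the definition can be sketched in a few lines. This is a minimal illustration, not any platform's API: the `Trace` shape, `toy_scorer`, `handle_request`, and `eval_worker` are all hypothetical names, the "model call" is a stand-in, and an in-process `asyncio.Queue` stands in for what would normally be a separate eval service or job queue.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Trace:
    """One logged request/response pair (hypothetical shape)."""
    prompt: str
    output: str


def toy_scorer(trace: Trace) -> float:
    """Placeholder quality check: 1.0 if the output is non-empty, else 0.0."""
    return 1.0 if trace.output.strip() else 0.0


async def handle_request(prompt: str, queue: asyncio.Queue) -> str:
    """User-facing path: answer, enqueue the trace, return without scoring."""
    output = prompt.upper()  # stand-in for the real model call (assumption)
    queue.put_nowait(Trace(prompt, output))
    return output


async def eval_worker(queue: asyncio.Queue, scores: list) -> None:
    """Eval path: in a real deployment this would run in a separate service."""
    while True:
        trace = await queue.get()
        await asyncio.sleep(0.01)  # simulate a slow scorer, e.g. an LLM judge
        scores.append(toy_scorer(trace))
        queue.task_done()


async def main() -> tuple[list[str], list[float]]:
    queue: asyncio.Queue = asyncio.Queue()
    scores: list[float] = []
    worker = asyncio.create_task(eval_worker(queue, scores))
    # Replies are returned to the caller before any trace has been scored.
    replies = [await handle_request(p, queue) for p in ["hello", "remote evals"]]
    await queue.join()  # wait for the background scoring to drain
    worker.cancel()
    return replies, scores


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The request handler never waits on the scorer, so a slow scorer adds nothing to response time; the cost is that scores arrive some time after the response has been sent.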

Related Evaluation terms

Absolute scoring • Agent • AI eval • Alignment • Annotation schema • Baseline • Baseline experiment • Benchmark • Calibration • CI/CD integration • Coherence • Confidence interval • Eval harness • Eval leakage • Experiment • Factuality • Failure mode • Faithfulness • Feedback signal • Groundedness • Hallucination • Inter-annotator agreement (IAA) • LLM-as-a-judge • Loop • Model comparison • Multimodal • Non-determinism • Offline evaluation • Pairwise evaluation • Pass@k • Playground • Quality gate • RAG (retrieval-augmented generation) • RAG evaluation • Reference-based scoring • Reference-free scoring • Regression testing • Release criteria • Rubric • Safety • Score distribution • Scorer • Semantic failure • Signal-to-noise ratio • Task (eval task) • Toxicity score

From the docs

Evaluate systematically • Create experiments • Create scorers • Test complex agents • Evaluation quickstart
