Alignment

/uh'leyen.muhnt/ (noun)

The process of calibrating an LLM judge so that its scores match human judgment. Alignment typically involves iterating on rubrics and prompts until the judge's agreement with human raters is high.

“After our alignment sessions, the LLM judge scores matched human ratings much more closely.”
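One way to put a number on "agreement is high" is to score the same items with both human raters and the LLM judge, then compare the labels. The sketch below is a minimal illustration, not a prescribed method: the label lists are hypothetical, and the choice of raw percent agreement plus Cohen's kappa (agreement corrected for chance) is an assumption about which statistics a team might track.

```python
from collections import Counter


def percent_agreement(human, judge):
    """Fraction of items where the judge label equals the human label."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == j for h, j in zip(human, judge)) / len(human)


def cohens_kappa(human, judge):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(human)
    p_o = percent_agreement(human, judge)
    human_counts = Counter(human)
    judge_counts = Counter(judge)
    # Expected agreement if both raters labeled independently
    # with their observed label frequencies.
    p_e = sum(human_counts[c] * judge_counts[c] for c in human_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for eight items (illustration only).
human = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "fail"]
judge = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail"]

print(percent_agreement(human, judge))
print(cohens_kappa(human, judge))
```

Tracking a chance-corrected statistic alongside raw agreement can help when one label dominates the dataset, since raw agreement alone may look high even for a judge that always returns the majority label.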

Customer example

Navan treated alignment as an iterative process: it tuned its eval prompt the way one would tune an ML classifier until the evaluator reached a macro F1 above 0.9, and used reasoning to debug failures.
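The macro F1 target mentioned above can be checked with a few lines of code. The following is a sketch under stated assumptions, not Navan's actual setup: the records, field names (`human`, `judge`, `reasoning`), and the helper names are all hypothetical. It treats human labels as ground truth, computes per-label F1, averages the results, and pulls out the disagreements so the judge's reasoning can be read while revising the prompt.

```python
def macro_f1(human, judge):
    """Unweighted mean of per-label F1, with human labels as ground truth."""
    pairs = list(zip(human, judge))
    labels = sorted(set(human) | set(judge))
    f1_scores = []
    for label in labels:
        tp = sum(h == label and j == label for h, j in pairs)
        fp = sum(h != label and j == label for h, j in pairs)
        fn = sum(h == label and j != label for h, j in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def disagreements(records):
    """Records where the judge label differs from the human label."""
    return [r for r in records if r["human"] != r["judge"]]


# Hypothetical scored records (illustration only).
records = [
    {"id": 1, "human": "pass", "judge": "pass", "reasoning": "Answer matches the policy."},
    {"id": 2, "human": "fail", "judge": "pass", "reasoning": "Answer cites a policy, so it looks grounded."},
    {"id": 3, "human": "fail", "judge": "fail", "reasoning": "Answer contradicts the source."},
    {"id": 4, "human": "pass", "judge": "pass", "reasoning": "Answer is complete."},
    {"id": 5, "human": "fail", "judge": "fail", "reasoning": "Answer omits the required step."},
    {"id": 6, "human": "pass", "judge": "pass", "reasoning": "Answer is accurate."},
]

score = macro_f1([r["human"] for r in records], [r["judge"] for r in records])
print(f"macro F1: {score:.3f}")

# Read the judge's reasoning on each miss before editing the rubric or prompt.
for r in disagreements(records):
    print(r["id"], r["reasoning"])
```

Macro averaging weights every label equally, so a judge that does well on the common label but poorly on a rare one is penalized more than it would be under plain accuracy.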

Related Evaluation terms

• Absolute scoring
• Agent
• AI eval
• Annotation schema
• Baseline
• Baseline experiment
• Benchmark
• Calibration
• CI/CD integration
• Coherence
• Confidence interval
• Eval harness
• Eval leakage
• Experiment
• Factuality
• Failure mode
• Faithfulness
• Feedback signal
• Groundedness
• Hallucination
• Inter-annotator agreement (IAA)
• LLM-as-a-judge
• Loop
• Model comparison
• Multimodal
• Non-determinism
• Offline evaluation
• Pairwise evaluation
• Pass@k
• Playground
• Quality gate
• RAG (retrieval-augmented generation)
• RAG evaluation
• Reference-based scoring
• Reference-free scoring
• Regression testing
• Release criteria
• Remote evaluation
• Rubric
• Safety
• Score distribution
• Scorer
• Semantic failure
• Signal-to-noise ratio
• Task (eval task)
• Toxicity score
