ABCDEFGHIJKLMNOPQRSTUVWXYZ

Encyclopedia Evalica / Evaluation / Absolute scoring

Absolute scoring

/'a.bsuh.loot 'skaw.rihng/Assigning a numeric or categorical score to a single output on a fixed scale. This is distinct from pairwise evals, which compares two outputs against each other. (noun)

“We switched to absolute scoring so we could set a consistent pass threshold.”

Related Evaluation terms

Agent

•

AI eval

•

Alignment

•

Annotation schema

•

Baseline

•

Baseline experiment

•

Benchmark

•

Calibration

•

CI/CD integration

•

Coherence

•

Confidence interval

•

Eval harness

•

Eval leakage

•

Experiment

•

Factuality

•

Failure mode

•

Faithfulness

•

Feedback signal

•

Groundedness

•

Hallucination

•

Inter-annotator agreement (IAA)

•

LLM-as-a-judge

•

Loop

•

Model comparison

•

Multimodal

•

Non-determinism

•

Offline evaluation

•

Pairwise evaluation

•

Pass@k

•

Playground

•

Quality gate

•

RAG (retrieval-augmented generation)

•

RAG evaluation

•

Reference-based scoring

•

Reference-free scoring

•

Regression testing

•

Release criteria

•

Remote evaluation

•

Rubric

•

Safety

•

Score distribution

•

Scorer

•

Semantic failure

•

Signal-to-noise ratio

•

Task (eval task)

•

Toxicity score

From the docs

Evaluate systematically

•

Create experiments

•

Create scorers

•

How multi-reviewer scoring works in human review

•

Configure root span filter for online scoring

Get started with Evals

Braintrust is the AI observability and eval platform for production AI. By connecting evals and observability in one workflow, teams at Notion, Stripe, Zapier, Vercel, and Ramp ship quality AI products at scale.

Start building

Manifesto

Adversarial examples →