
Task (eval task)

/task, ee.val task/ (noun)

The AI function being tested and scored during an eval. A task can be a single prompt, a chain of prompts, a full agent, or any function that takes an input and produces an output.

Example: "Our task is 'Answer support questions with citations and the right policy.'"
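
As a rough illustration of "any function that takes an input and produces an output," the sketch below treats a task as a plain Python callable and runs it through a tiny scoring loop. Every name here (`support_task`, `POLICIES`, `has_citation`, `run_eval`) is hypothetical and invented for this example; it does not use any particular eval library's API, and the lookup table only stands in for a real prompt, prompt chain, or agent.

```python
from typing import Callable, List

# Assumption: a task is modeled as any callable from an input string to an output string.
Task = Callable[[str], str]

# Hypothetical policy table standing in for a real model or knowledge base.
POLICIES = {
    "refund": "Refunds are available within 30 days. [policy: refunds]",
    "shipping": "Orders ship within 2 business days. [policy: shipping]",
}


def support_task(question: str) -> str:
    """Toy task: return the first policy answer whose topic appears in the question."""
    lowered = question.lower()
    for topic, answer in POLICIES.items():
        if topic in lowered:
            return answer
    return "I could not find a matching policy."


def has_citation(output: str) -> float:
    """Toy scorer: 1.0 if the output carries a policy citation marker, else 0.0."""
    return 1.0 if "[policy:" in output else 0.0


def run_eval(task: Task, inputs: List[str], scorer: Callable[[str], float]) -> List[float]:
    """Run the task on each input and score each output."""
    return [scorer(task(x)) for x in inputs]


scores = run_eval(
    support_task,
    ["How do I get a refund?", "What is your favorite color?"],
    has_citation,
)
print(scores)
```

Because the eval loop only depends on the callable's input and output, the toy function could be swapped for a model call or an agent without changing `run_eval` or the scorer.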

