
Trace
/trays/ (noun)
A complete record of a single request or interaction in an AI application, containing one or more spans. It captures the full execution flow, including inputs, outputs, latency, and metadata.
Why it matters
In traditional software, a stack trace tells you exactly which function threw an error. In AI applications, the equivalent question is harder because failures are often semantic rather than structural: the model returned a response, but was it the right response?

Traces capture the full execution path of a request, including every LLM call, tool invocation, retrieval step, and intermediate output, so you can reconstruct what happened and why. Without traces, debugging an AI application means guessing which part of a multi-step pipeline went wrong. With traces, you can see that the retriever returned irrelevant documents, that the model ignored its system prompt, or that a tool call timed out and the agent hallucinated an answer instead.

Traces also feed the eval loop. When you find a bad trace in production, you can turn it into a dataset record and add it to your eval suite, ensuring the same failure is caught automatically in the future.
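The trace-and-span structure described above can be sketched in a few lines. This is a hand-rolled illustration, not the Braintrust SDK; names like `Trace.span` are assumptions for the example. Each span records its inputs, outputs, timing, and parent, so the tree of spans reconstructs the request's execution path:

```python
from __future__ import annotations

import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step in a request: an LLM call, retrieval, tool invocation, etc."""
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    parent_id: str | None = None
    input: dict | None = None
    output: dict | None = None
    start: float = 0.0
    end: float = 0.0


class Trace:
    """A complete record of a single request: a tree of spans."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex[:8]
        self.spans: list[Span] = []
        self._stack: list[Span] = []  # currently open spans, innermost last

    @contextmanager
    def span(self, name: str, input: dict | None = None):
        # Nest under whichever span is currently open, if any.
        parent = self._stack[-1].span_id if self._stack else None
        s = Span(name=name, trace_id=self.trace_id, parent_id=parent,
                 input=input, start=time.time())
        self._stack.append(s)
        self.spans.append(s)
        try:
            yield s
        finally:
            s.end = time.time()  # latency = end - start
            self._stack.pop()


# A toy RAG request: retrieval and an LLM call nested under the root span.
trace = Trace()
with trace.span("handle_request", input={"query": "refund policy"}):
    with trace.span("retrieve") as s:
        s.output = {"docs": ["policy.md"]}
    with trace.span("llm_call") as s:
        s.output = {"text": "Refunds are issued within 30 days."}
```

Walking `trace.spans` and following `parent_id` links reproduces the execution tree, which is exactly what lets you pinpoint whether the retriever, the model, or a tool call misbehaved.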
“The trace showed the model searched the wrong index before answering.”
Customer example
Loom evaluates auto-generated video titles by tracing the full workflow and iterating on custom scoring functions for relevance, conciseness, engagement, and clarity, starting with a small dataset of 10-15 examples before scaling to online evals.
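Custom scoring functions like the ones mentioned above can start as simple heuristics before graduating to LLM judges. The sketch below is illustrative only (these are not Loom's actual scorers); each function takes a generated title and returns a score between 0 and 1:

```python
def conciseness(title: str, max_words: int = 8) -> float:
    """Full marks within the word budget, decaying linearly past it."""
    n = len(title.split())
    if n <= max_words:
        return 1.0
    return max(0.0, 1.0 - (n - max_words) / max_words)


def clarity(title: str) -> float:
    """Penalize vague filler words -- a crude stand-in for an LLM judge."""
    filler = {"stuff", "things", "video", "untitled"}
    words = {w.strip(".,!?").lower() for w in title.split()}
    return 1.0 - min(1.0, len(words & filler) / 2)


# Run the scorers over a tiny dataset, as in the small-then-scale workflow.
titles = ["Q3 Roadmap Review", "Untitled video about some stuff"]
scores = [{"conciseness": conciseness(t), "clarity": clarity(t)} for t in titles]
```

Because the scorers are plain functions, the same code can run offline against a 10-15 example dataset and later online against live traces.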