
Trace
/trays/ (noun)
A complete record of a single request or interaction in an AI application, containing one or more spans. It captures the full execution flow, including inputs, outputs, latency, and metadata.
Why it matters
In traditional software, a stack trace tells you exactly which function threw an error. In AI applications, the equivalent question is harder because failures are often semantic rather than structural: the model returned a response, but was it the right response?

Traces capture the full execution path of a request, including every LLM call, tool invocation, retrieval step, and intermediate output, so you can reconstruct what happened and why. Without traces, debugging an AI application means guessing which part of a multi-step pipeline went wrong. With traces, you can see that the retriever returned irrelevant documents, that the model ignored its system prompt, or that a tool call timed out and the agent hallucinated an answer instead.

Traces also feed the eval loop. When you find a bad trace in production, you can turn it into a dataset record and add it to your eval suite, ensuring the same failure is caught automatically in the future.
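The trace-and-span structure described above can be sketched in a few lines. This is a hand-rolled illustration, not the Braintrust SDK; names like `Trace.span` are assumptions for the example. Each span records its inputs, outputs, timing, and parent, so the tree of spans reconstructs the request's execution path:

```python
from __future__ import annotations

import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step in a request: an LLM call, retrieval, tool invocation, etc."""
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    parent_id: str | None = None
    input: dict | None = None
    output: dict | None = None
    start: float = 0.0
    end: float = 0.0


class Trace:
    """A complete record of a single request: a tree of spans."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex[:8]
        self.spans: list[Span] = []
        self._stack: list[Span] = []  # currently open spans, innermost last

    @contextmanager
    def span(self, name: str, input: dict | None = None):
        # Nest under whichever span is currently open, if any.
        parent = self._stack[-1].span_id if self._stack else None
        s = Span(name=name, trace_id=self.trace_id, parent_id=parent,
                 input=input, start=time.time())
        self._stack.append(s)
        self.spans.append(s)
        try:
            yield s
        finally:
            s.end = time.time()  # latency = end - start
            self._stack.pop()


# A toy RAG request: retrieval and an LLM call nested under the root span.
trace = Trace()
with trace.span("handle_request", input={"query": "refund policy"}):
    with trace.span("retrieve") as s:
        s.output = {"docs": ["policy.md"]}
    with trace.span("llm_call") as s:
        s.output = {"text": "Refunds are issued within 30 days."}
```

Walking `trace.spans` and following `parent_id` links reproduces the execution tree, which is exactly what lets you pinpoint whether the retriever, the model, or a tool call misbehaved.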
“The trace showed the model searched the wrong index before answering.”
Customer example
Loom evaluates auto-generated video titles by tracing the full workflow and iterating on custom scoring functions for relevance, conciseness, engagement, and clarity, starting with a small dataset of 10-15 examples before scaling to online evals.
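Custom scoring functions like the ones mentioned above can start as simple heuristics before graduating to LLM judges. The sketch below is illustrative only (these are not Loom's actual scorers); each function takes a generated title and returns a score between 0 and 1:

```python
def conciseness(title: str, max_words: int = 8) -> float:
    """Full marks within the word budget, decaying linearly past it."""
    n = len(title.split())
    if n <= max_words:
        return 1.0
    return max(0.0, 1.0 - (n - max_words) / max_words)


def clarity(title: str) -> float:
    """Penalize vague filler words -- a crude stand-in for an LLM judge."""
    filler = {"stuff", "things", "video", "untitled"}
    words = {w.strip(".,!?").lower() for w in title.split()}
    return 1.0 - min(1.0, len(words & filler) / 2)


# Run the scorers over a tiny dataset, as in the small-then-scale workflow.
titles = ["Q3 Roadmap Review", "Untitled video about some stuff"]
scores = [{"conciseness": conciseness(t), "clarity": clarity(t)} for t in titles]
```

Because the scorers are plain functions, the same code can run offline against a 10-15 example dataset and later online against live traces.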