Harrison Chase
@hwchase17 · 1w ago
traces matter!
Viv
@Vtrivedy10 · 1w ago
standard awesome Anthropic blog, I cannot stress the importance of this step enough: pls look at the agent traces!!!
TONS of value in understanding what happened. if an agent “failed a task”, then ask:
1. how did it fail? (formatting error, logical error, went down wrong track in planning, environment bug…)
2. did it actually fail? should you update your grading if it was wrong?
3. what were successful patterns vs failing patterns (stratify across pass and fail and see the steps)
4. link it back to your instructions and tools/skills. did you not give the right tool to do the task? Are instructions ambiguous?
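a minimal sketch of points 1 and 3 above, in Python. everything here is hypothetical: the `Trace` record, its fields, and the hand-assigned `failure_mode` labels are assumptions for illustration, not LangChain's or Anthropic's actual trace format. it splits traces into pass and fail, counts how the failures failed, and ranks steps by how much more often they show up in passing runs than in failing ones:

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Trace:
    """Hypothetical trace record; real trace formats will differ."""
    task_id: str
    passed: bool
    steps: list = field(default_factory=list)  # e.g. tool names, in call order
    failure_mode: Optional[str] = None         # label assigned by a reviewer


def stratify(traces):
    """Split traces into (passed, failed) lists."""
    passed = [t for t in traces if t.passed]
    failed = [t for t in traces if not t.passed]
    return passed, failed


def failure_modes(traces):
    """Count failure labels among failed traces ("how did it fail?")."""
    return Counter(t.failure_mode or "unlabeled" for t in traces if not t.passed)


def step_rates(traces):
    """Fraction of traces in which each step appears at least once."""
    if not traces:
        return {}
    counts = Counter(step for t in traces for step in set(t.steps))
    return {step: n / len(traces) for step, n in counts.items()}


def distinguishing_steps(traces):
    """Rank steps by (rate in passing traces) - (rate in failing traces)."""
    passed, failed = stratify(traces)
    p, f = step_rates(passed), step_rates(failed)
    diffs = [(s, p.get(s, 0.0) - f.get(s, 0.0)) for s in set(p) | set(f)]
    return sorted(diffs, key=lambda pair: pair[1], reverse=True)
```

steps near the top of `distinguishing_steps` are candidates for "successful patterns", steps near the bottom for "failing patterns". the counts only tell you where to look; you still have to read the traces themselves to answer points 2 and 4.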
Agent building is a deeply iterative process, and this is literally the most golden data you can get to hill climb on. the list of things to look at is massive
at LangChain when we want to make Deep Agents better, we mine the traces with an agent guiding us through what happened. Traces are huge, let agents help you!!
there’s tons of human intuition that doesn’t get baked into the agent in your first pass of designing it. you’ll learn stuff about the agent as you observe it, and it will get better as you tune the knobs (prompts, tools/skills, harness design like context management etc)
there’s a beautiful loop between designing agents and making them better systematically, agent design and eval design are deeply coupled. it’s great to see more people getting into it
if you’re into building agents or making them observable so you can make them better, always down to talk 👀








