Exploring Next

Exploring Next — Ep 201 w/ Justy & Cody — Improving Deep Agents with harness engineering

LangChain improved its coding agent from the Top 30 to the Top 5 on Terminal Bench 2.0 by changing only the harness, the system that wraps around the model. The team used trace analysis to identify failure patterns, then implemented targeted fixes such as self-verification loops, context injection, and reasoning budget optimization. The 13.7-point improvement shows how much of an agent's performance can come from better tooling around the model, not just from a bigger model.
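
For readers who have not seen one, a self-verification loop can be sketched roughly as follows. This is a minimal illustration in Python, not LangChain's actual harness code: the function `run_with_self_verification`, its parameters, and the toy `toy_attempt` and `toy_verify` stand-ins are all hypothetical names chosen for this example. The idea is simply that the harness checks each attempt and feeds any failure message back into the next attempt.

```python
from typing import Callable, Optional, Tuple


def run_with_self_verification(
    attempt: Callable[[Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Retry `attempt` until `verify` reports no problem or rounds run out.

    `attempt` receives the previous failure message (None on the first round).
    `verify` returns None on success, or a failure message to feed back.
    Both are hypothetical stand-ins for a model call and a test run.
    """
    feedback: Optional[str] = None
    result = ""
    for _ in range(max_rounds):
        result = attempt(feedback)
        feedback = verify(result)
        if feedback is None:
            return result, True  # verification passed
    return result, False  # gave up; last attempt is still unverified


# Toy stand-ins: the "agent" fixes a typo only after seeing feedback.
def toy_attempt(feedback: Optional[str]) -> str:
    return "echo hello" if feedback else "ehco hello"


def toy_verify(candidate: str) -> Optional[str]:
    if candidate.startswith("echo "):
        return None
    return "command not found: check spelling"


final, passed = run_with_self_verification(toy_attempt, toy_verify)
```

In a real harness, `attempt` would wrap a model call and `verify` would run tests or a checker in the agent's sandbox; capping the rounds keeps a stubborn failure from looping forever.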
