Exploring Next
Exploring Next — Ep 355 w/ Justy & Cody — Alibaba's HDPO cuts AI agent tool overuse from 98% to 2%
Justy and Cody dig into Alibaba's HDPO and Metis, a training setup that teaches AI agents to stop calling tools by default. Cody likes the core idea because it separates accuracy from efficiency during reinforcement learning, but he questions how portable the benchmark win is. Justy pushes on why this matters for real products right now: users feel latency, teams feel API bills, and nobody wants an agent that opens a toolbox for a task it already knows how to do.