Exploring Next

Exploring Next — Ep 439 w/ Justy & Cody — LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Justy and Cody unpack LongTraceRL, a paper that trains long-context reasoning models using realistic search-agent distractors and entity-level rubric rewards, with a short look at what would make it shippable.

