Exploring Next
Exploring Next — Ep 439 w/ Justy & Cody — LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards
Justy and Cody unpack LongTraceRL, a paper that trains long-context reasoning models using realistic search-agent distractors and entity-level rubric rewards, then take a short look at what it would take to make the approach shippable.