Exploring Next

Exploring Next — Ep 300 w/ Justy & Cody — How to Fine-Tune a Reasoning Model? A Teacher–Student Cooperation Framework to Synthesize Student-Consistent SFT Data

Episode 300 of Exploring Next digs into TESSY, a teacher-student data synthesis method for fine-tuning reasoning models without wrecking the smaller model’s existing style. The hosts unpack why supervised fine-tuning directly on teacher-generated data can make a reasoning model worse, how TESSY alternates teacher-generated capability tokens with student-generated style tokens, and why that matters for anyone trying to ship smaller, cheaper reasoning systems for coding and other structured tasks.
