Ep 181 research 4:55 w/ Justy & Cody

MIT's new fine tuning method lets LLMs learn new skills without losing old ones

MIT researchers developed self-distillation fine-tuning (SDFT), a technique that lets large language models learn new skills without forgetting old ones. By using a model's own in-context learning abilities as both teacher and student, SDFT solves the catastrophic forgetting problem that forces companies to maintain separate models for each task.

Script: Sonnet 4.5 Voice: ElevenLabs

Transcript

Izzo Your AI team just spent three months fine-tuning a model for legal document analysis. It works great — until you realize it can't do basic math anymore.

Izzo You're listening to Exploring Next, episode 182. I'm Izzo, and today Boone and I are diving into MIT's new approach that might finally kill the enterprise model zoo problem.

Boone Hey Izzo. And by model zoo, you mean that nightmare where companies end up maintaining like fifteen different fine-tuned models because each new skill breaks the last one?

Izzo Exactly. It's called catastrophic forgetting, and it's why your procurement team's model can't suddenly help with HR tasks. You need separate models, separate infrastructure, separate headaches.

Boone Right, and the compute costs just stack up. So what's MIT's angle here?

Izzo They call it self-distillation fine-tuning, or SDFT. The key insight is using the model's own in-context learning capabilities to create a teacher-student loop within the same architecture.

Boone Wait, so it's teaching itself? Walk me through how that actually works under the hood.

Izzo Picture this: you have two versions of the same model running simultaneously. The teacher gets the query plus expert demonstrations — it uses ICL to figure out the right answer and reasoning.

Boone And the student?

Izzo Student only sees the raw query, like it would in production. It generates an answer, then the teacher provides feedback based on what it learned from the demonstrations.

Boone That's clever. You're getting on-policy learning benefits without needing a reward function. The teacher is essentially scoring the student's reasoning trajectory.
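A minimal sketch of the teacher-student loop described above, not the researchers' implementation. It assumes the teacher's "feedback" is a per-token KL divergence between the student's next-token distribution and the teacher's, measured on the student's own rollout; the episode does not say which divergence SDFT uses. The helper names (`build_student_prompt`, `build_teacher_prompt`, `self_distillation_loss`) and the toy logits are hypothetical stand-ins for real model calls.

```python
import math

def build_student_prompt(query):
    # The student sees only the raw query, as it would in production.
    return query

def build_teacher_prompt(query, demonstrations):
    # The teacher sees the expert demonstrations first, then the same query.
    shots = "\n\n".join(demonstrations)
    return f"{shots}\n\n{query}"

def log_softmax(logits):
    # Numerically stable log-softmax over one token's vocabulary logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kl(student_logits, teacher_logits):
    # KL(student || teacher) for a single token position (assumed divergence).
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def self_distillation_loss(student_rollout_logits, teacher_rollout_logits):
    # Average the per-token divergence over the student's sampled rollout.
    # Both inputs hold one list of vocabulary logits per rollout token.
    kls = [token_kl(s, t)
           for s, t in zip(student_rollout_logits, teacher_rollout_logits)]
    return sum(kls) / len(kls)

# Toy logits (2 rollout tokens, 3-word vocabulary) standing in for model outputs.
# In a real run, both would come from the same weights: the student scored on
# build_student_prompt(...), the teacher on build_teacher_prompt(...).
student_logits = [[2.0, 0.5, 0.1], [0.3, 1.2, 0.0]]
teacher_logits = [[0.2, 2.5, 0.1], [0.3, 1.2, 0.0]]
loss = self_distillation_loss(student_logits, teacher_logits)
print(f"distillation loss: {loss:.4f}")
```

Because the loss is computed on tokens the student itself generated, the signal is on-policy, and no separate reward function appears anywhere in the loop.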

Izzo Exactly. And here's what got my attention — they tested this on Qwen 2.5 with three enterprise tasks: science Q&A, software tool use, and medical reasoning.

Boone How'd it perform against standard supervised fine-tuning?

Izzo Science Q&A: 70.2% accuracy versus 66.2% for SFT. But the real win was catastrophic forgetting. When the SFT model learned science, its general reasoning collapsed.

Boone And SDFT?

Izzo Held steady at 64.5% on previous tasks while learning the new skill. They even did a sequential learning test — science, then tool use, then medical.

Boone Let me guess — SFT oscillated, losing skills as it gained new ones?

Izzo Yep. SDFT accumulated all three without regression. One model, multiple skills, stable performance.
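One way to put a number on "oscillating" versus "accumulating" in a sequential test like this is a per-task forgetting score: accuracy right after a task is learned, minus accuracy on that task at the very end. The sketch below is an assumption about how such a report could be computed, not the paper's evaluation code, and every accuracy value in it is invented for illustration (whole percentage points, not results from the study).

```python
TASKS = ["science", "tool_use", "medical"]  # training order from the episode

def forgetting_by_task(tasks, acc_after_stage):
    # acc_after_stage[i][task] is the accuracy on `task` after training stage i,
    # where stage i trains on tasks[i]. Forgetting is the drop between
    # "just learned" and "end of the whole sequence".
    final = acc_after_stage[-1]
    return {task: acc_after_stage[i][task] - final[task]
            for i, task in enumerate(tasks)}

# Invented numbers: a run that loses earlier skills as it gains new ones.
oscillating_run = [
    {"science": 70, "tool_use": 40, "medical": 30},
    {"science": 50, "tool_use": 65, "medical": 30},
    {"science": 45, "tool_use": 45, "medical": 60},
]

# Invented numbers: a run that keeps earlier skills while adding new ones.
stable_run = [
    {"science": 70, "tool_use": 40, "medical": 30},
    {"science": 69, "tool_use": 66, "medical": 30},
    {"science": 69, "tool_use": 65, "medical": 61},
]

print(forgetting_by_task(TASKS, oscillating_run))
print(forgetting_by_task(TASKS, stable_run))
```

Large positive scores mean a skill regressed after later training stages; scores near zero mean it was retained.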

Boone The architecture makes sense, but what are the practical constraints? You mentioned this needs models with strong ICL capabilities.

Izzo Currently around 4 billion parameters minimum. The researchers found 3B models were too weak to act as their own teachers, but Qwen 3's 4B works well.

Boone And compute overhead?

Izzo About 2.5x the FLOPs of standard fine-tuning because the model has to generate rollouts during training to compare against the teacher.

Boone Four times slower too, right? Though I guess if you're avoiding the cost of maintaining separate models and retraining cycles, that math might work out.
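A back-of-the-envelope version of that math, as a sketch under stated assumptions: training cost scales with FLOPs (the 2.5x figure from the episode), the alternative is one separately served SFT model per task, and serving cost per model is flat. The function name, parameters, and dollar figures are hypothetical; the four-times wall-clock slowdown is not modeled and would push the break-even point later if you pay for time rather than FLOPs.

```python
import math

def break_even_months(num_tasks, sft_train_cost, serve_cost_per_model_month,
                      sdft_flops_multiplier=2.5):
    # Extra one-time training spend for SDFT across all tasks.
    extra_training = (sdft_flops_multiplier - 1.0) * sft_train_cost * num_tasks
    # Monthly saving from serving one model instead of one per task.
    monthly_saving = (num_tasks - 1) * serve_cost_per_model_month
    if monthly_saving <= 0:
        return math.inf  # a single task never recoups the overhead
    return extra_training / monthly_saving

# Hypothetical figures: 4 tasks, $1,000 per SFT run, $500/month per served model.
months = break_even_months(num_tasks=4, sft_train_cost=1000,
                           serve_cost_per_model_month=500)
print(f"break-even after {months:.1f} months")
```

With these made-up inputs the extra training spend is 1.5 x 1000 x 4 = 6000 and the monthly saving is 3 x 500 = 1500, so the overhead is recovered in 4 months.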

Izzo Exactly my thinking. From a product perspective, this is huge for any organization dealing with multiple domains — legal, HR, engineering, sales — where you want specialized capabilities but can't afford model proliferation.

Boone Plus you're not dealing with the complexity of defining reward functions for RL. A lot of enterprise tasks don't have clear mathematical rewards.

Izzo Right — how do you score 'write a good legal brief' or 'summarize this meeting effectively'? SDFT sidesteps that entirely.

Boone I'm giving this a solid A-minus. The only ding is that compute overhead, but the architectural elegance of using ICL as the feedback mechanism is genuinely impressive.

Izzo They've got code on GitHub and they're working with Hugging Face to integrate it into the TRL library. There's already a pull request if you want to test it.

Boone Adding that to the weekend project list. What should listeners dig into if they want to get hands-on with this?

Izzo First, clone the SDFT repo from MIT's GitHub and run their science Q&A example. Second, if you're already using Hugging Face TRL, check out their open pull request for the integration. And third, experiment wi