Ep 447 article 4:59 w/ Justy & Cody

Introducing Apex: A Fast, Specialized Model for React Native

Cody and Justy dig into Callstack's Apex, a specialized React Native coding model built on Gemma 4. Cody pushes on the self-reported benchmarks, the 'private beta with our own engineers' problem, and whether 'specialized' is real or just branding. Justy defends the economic logic, pointing to GitHub Copilot's shift to usage-based billing as a signal that running agentic workflows on frontier models is expensive, and argues that React Native's genuine cross-platform constraints make it a real candidate for specialization. They find middle ground on where Apex might actually earn its place versus where the claims outpace the evidence.

Script: MiniMax M2.7
Voice: Inworld TTS 1.5 Mini

Transcript

Justy Okay so — Apex. A fast, specialized model for React Native. And I have to ask, Cody, is this actually different, or is this just another 'we fine-tuned something and called it a product' situation?

Cody Oh, it's different in one way, which is that they're at least honest about what they're doing. They say straight up — general models miss the framework conventions, the library behavior, the cross-platform stuff that decides whether a React Native answer is actually useful.

Justy Mm-hm.

Cody And I don't think that's wrong. But then the question becomes — does training on React Native repos actually get you there? Or are you just... building a model that's good at React Native code in the same way a model trained on GitHub is good at GitHub code. Which is to say, kind of, but also not really.

Justy I think the economic argument is actually the more interesting one. The article points to GitHub Copilot's shift to usage-based billing as a signal — running agentic workflows on frontier models is expensive. And smaller, optimized models are proving they can alter that cost-performance curve.

Cody Sure. But Cursor Composer 2 and Windsurf SWE-1 are also general-ish tools that happen to use smaller models well. That's not the same as saying a model trained specifically on React Native is better.

Justy No, but React Native genuinely has weird constraints. Native modules, third-party libraries that break across versions, the whole iOS-Android split. That's not just 'general coding with a React skin.'

Cody That's fair. But here's where I get skeptical. They evaluated Apex against React Native Evals. Who runs React Native Evals?

Justy Oh, that's a good question.

Cody Because if it's Callstack, that's... that's a little bit of a conflict of interest, Justy. They're saying 'within its specific domain, this optimized model alters the performance-to-cost ratio significantly.' Significantly. What does that mean? We don't know. We have their word.

Justy To be fair, they also say the model is in private beta with their own engineers. That's not nothing. They've been running it for a couple of months: experiments started on February thirteenth, and internal testing began on April second.

Cody Right, but that's also the problem. 'Internal testing with our open-source developers.' Their developers. On their code. Which means the training data probably looks a lot like the test set.

Justy Okay but — they specifically say they did not do a random web scrape. They cherry-picked around the libraries and frameworks their engineers see in daily delivery. That's either a weakness or a strength depending on how you look at it.

Cody Both. It's both. It means the model is probably really good at the stuff Callstack works on, and maybe not great at the stuff Callstack doesn't work on. Which is... fine? But it's not 'a specialized model for React Native.' It's a model specialized in what Callstack does in React Native.

Justy That's... actually a pretty important distinction.

Cody Yeah. And then there's the base model choice. They started with proof-of-concept experiments on Devstral and Qwen, then landed on Gemma 4 because it was already stronger for React Native before specialization. So how much is the specialization actually doing versus just picking a good base?

Justy I mean, that's a fair question, but also — isn't that the whole point of fine-tuning? You pick a base that's close and then you push it further?

Cody Sure, but we can't tell from the article how much further it actually got pushed. They trained with SFT and GRPO. Fine. Those are standard techniques. But there's no ablation study, no comparison to just prompting Gemma 4 really well.

Justy Right, right.

Cody So here's my actual read — the economic logic is sound, the specialization thesis is plausible, and React Native might genuinely be a good candidate for it. But the evidence is thin and self-reported. We need public benchmarks. We need someone else running the evals. We need actual users who aren't Callstack engineers.

Justy And the private beta is the right move for exactly that reason. They're being cautious about the claims, at least.

Cody I guess. Though 'private beta while we finish the legal and operational work' is also a very careful way of saying 'we're not ready to be judged yet.' Which is fair, but also — that's not the same as 'this works.'

Justy No, no. And I think where we land is — this is probably useful if you're a React Native team. The cost angle is real, the domain focus is probably real, and Callstack knows this space better than most. But we're taking their word on the performance claims until someone else runs the numbers.

Cody Yeah. That's the honest version. The theory is solid. The execution — we'll see.

Justy Alright, I'll take it. Still better than another 'we built an AI coding tool' press release.

Cody Low bar though, Justy.

Justy It really is. Alright, that's Apex. We'll see where it lands.