Ep 458 · News · June 4, 2026 · 2:27 · w/ Justy & Cody

Google's new open source Gemma 4 12B analyzes audio, video — and runs entirely locally on a typical 16GB enterprise laptop

Justy and Cody debate whether Google's new Gemma 4 12B—an 11.95B-parameter model that runs locally on 16GB laptops with encoder-free multimodal processing—is a genuine breakthrough for edge AI or just a cleverly marketed niche tool. They clash on the practical trade-offs: Cody questions the real-world performance and fine-tuning complexity, while Justy highlights the enterprise use cases where offline, private inference is non-negotiable. They land on it being a specialized win for specific scenarios, not a universal replacement.

Embed this episode

Paste this on any site — the player is a self-contained iframe with no cookies or trackers.

<iframe src="https://sandrise.io/exploring-next/embed/458"
  width="100%" height="180" style="max-width:640px;border:0;border-radius:12px;overflow:hidden"
  title="Exploring Next — Episode 458 audio player"
  loading="lazy" allow="autoplay" referrerpolicy="strict-origin-when-cross-origin"></iframe>

Script: Mistral Medium 3.5 128B · Voice: Hume TTS

Transcript

Justy Okay, so Google drops Gemma 4 12B and the pitch is it does audio and video analysis entirely on a 16 gig laptop.

Cody Sure, and I’ll believe that when I see it actually running on my ThinkPad without turning into a space heater.

Justy It’s open weights, Apache 2.0 license, and they’re saying its encoder-free architecture lets it handle raw audio and visual patches directly in the LLM.

Cody Right. So instead of a proper encoder, they’re just shoving waveforms and image patches through a linear layer and calling it a day.

Justy They’re framing it as a breakthrough for edge cases—offline use, security, compliance.

Cody Mm-hm. And I’m saying that sounds great until you realize the encoder’s job isn’t just overhead—it’s where a lot of the actual understanding happens.

Justy But if you’re in healthcare or finance, Cody, you CAN’T send data to the cloud. Period. So a model that’s good enough and local is… kind of everything.

Cody Yeah, and I get that. But is it good enough? ‘Nearing their 26B MoE on benchmarks’ sounds suspiciously like ‘almost as good’.

Justy They’ve got a 256K-token context window. You could throw an entire earnings call transcript at this thing and ask it to summarize risks.

Cody Right, right. And it’s got that step-by-step reasoning mode and tool-use baked in. But fine-tuning a unified pipeline? That’s gonna be a nightmare.

Justy I mean, they’re not pitching this as a fine-tune-first model. It’s for people who want to run multimodal tasks NOW, offline, on hardware they already have.

Cody Okay okay. But let’s not pretend this is a slam dunk for most teams. If your use case doesn’t need to run locally, you’re probably better off with something bigger and cloud-hosted.

Justy So we’re saying it’s a niche win, not a revolution.

Cody Exactly. A really clever niche win.

Justy There you go. Cody just said something NICE about new tech. I need to mark this on the calendar.

Cody Don’t get used to it. This is still a model that’s trading a lot of accuracy for portability.

Justy And for the right enterprise, that’s a trade worth making. Anyway—

Cody Also, Justy, my ThinkPad is gonna turn into a space heater.

Justy Fair. Okay, four-fifty-eight in the books. I’m gonna go see if this thing actually installs in under ten minutes.