Ep 461 · News · June 4, 2026 · 2:41 · w/ Justy & Cody

MiniMax M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmarks for just 5–10% of the cost

Justy and Cody react to MiniMax-M3’s launch: frontier-tier coding and agentic performance with a 1M-token context window at 5–10% of the cost of GPT-5.5 and Gemini 3.1 Pro, with open weights coming in 10 days. Cody digs into the MiniMax Sparse Attention (MSA) architecture, which is meant to cut the quadratic cost of attention, while Justy asks who will actually switch because of it.

Script: Mistral Medium 3.5 128B · Voice: Inworld TTS 1.5 Mini

Transcript

Justy Okay, MiniMax just dropped M3 and it’s apparently outrunning GPT-five-point-five and Gemini three-point-one Pro on benchmarks… for five to ten percent of the cost.

Cody No way.

Justy Way. Twenty bucks a month for the subscription plan, open weights in like ten days.

Cody That’s… that’s Exploring Next levels of ‘wait, what’.

Justy I know, I know. I was in San Diego all week for my cousin’s thing—

Cody Mm-hm.

Justy —and came back to my inbox full of people losing their minds over this. Anyway— this thing does a million-token context window, native multimodal, and the coding benchmarks are supposedly nuts.

Cody Hold on. A million tokens with those performance numbers?

Justy That’s what the VentureBeat piece says.

Cody Alright, so how? Because traditional attention is O of N squared, right? You can’t just magic that away.
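
To put rough numbers on Cody's point about O(N²), here is a minimal sketch that counts query-key score computations for dense attention versus a generic block-sparse scheme. The function names and the values `blocks_per_query=16` and `block_size=64` are illustrative assumptions, not figures from MiniMax, and causal masking is ignored for simplicity.

```python
def dense_pairs(n_tokens: int) -> int:
    """Every query is scored against every key: N * N pairs."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, blocks_per_query: int, block_size: int) -> int:
    """Each query is scored only against the keys in the blocks it selects."""
    return n_tokens * blocks_per_query * block_size


if __name__ == "__main__":
    # Hypothetical settings, chosen only to show how the two counts diverge.
    for n in (1_000, 100_000, 1_000_000):
        d = dense_pairs(n)
        s = sparse_pairs(n, blocks_per_query=16, block_size=64)
        print(f"N={n:>9,}  dense={d:.3e}  sparse={s:.3e}")
```

The dense count grows with the square of the sequence length, while the sparse count grows linearly once the number of selected blocks per query is fixed. That linear growth is the kind of saving a sparse-attention design is after.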

Justy They’re calling it MiniMax Sparse Attention. MSA.

Cody Right. So MSA pre-filters the KV matrices into blocks—

Justy Mm-hm.

Cody —then it does this KV outer gather Q thing. Treats the blocks as an outer loop, dynamically pulls only the queries that hit them. Each block read once, memory access optimized. So instead of rereading the whole library every time, it’s like… a really good index.
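
The loop order Cody describes can be sketched in plain Python: split the keys and values into blocks, record which queries hit each block, then walk the blocks in an outer loop and update only those queries. This is a toy under stated assumptions, not MiniMax's implementation. The selection rule (rank blocks by the query's dot product with the block's mean key), the running-softmax bookkeeping, and every name here are hypothetical, and causal masking is left out.

```python
import math
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_attention(q, k, v):
    """Reference: every query attends to every key."""
    out = []
    for qi in q:
        scores = [dot(qi, kj) for kj in k]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * vj[d] for wj, vj in zip(w, v)) / z
                    for d in range(len(v[0]))])
    return out


def kv_outer_gather_q(q, k, v, block_size, top_blocks):
    """Block-sparse attention with KV blocks as the outer loop (toy sketch)."""
    n_blocks = math.ceil(len(k) / block_size)
    blocks = [range(b * block_size, min((b + 1) * block_size, len(k)))
              for b in range(n_blocks)]

    # Hypothetical selection rule: summarize each block by its mean key.
    dim = len(k[0])
    means = [[sum(k[j][d] for j in blk) / len(blk) for d in range(dim)]
             for blk in blocks]

    # Invert the selection: for each block, the list of queries that hit it.
    hits = defaultdict(list)
    for i, qi in enumerate(q):
        ranked = sorted(range(n_blocks),
                        key=lambda b: dot(qi, means[b]), reverse=True)
        for b in ranked[:top_blocks]:
            hits[b].append(i)

    # Running softmax state per query: max score, normalizer, weighted sum.
    m = [-math.inf] * len(q)
    z = [0.0] * len(q)
    acc = [[0.0] * len(v[0]) for _ in q]

    # Outer loop over KV blocks; each block is visited once in this pass.
    for b, blk in enumerate(blocks):
        for i in hits.get(b, []):
            for j in blk:
                s = dot(q[i], k[j])
                new_m = max(m[i], s)
                scale = math.exp(m[i] - new_m)  # rescale earlier partial sums
                w = math.exp(s - new_m)
                z[i] = z[i] * scale + w
                acc[i] = [a * scale + w * vd for a, vd in zip(acc[i], v[j])]
                m[i] = new_m

    return [[a / z[i] for a in acc[i]] for i in range(len(q))]
```

When `top_blocks` equals the number of blocks, every query hits every block and the result matches `dense_attention` up to floating-point rounding, which is a handy sanity check. With a smaller `top_blocks`, each query only pays for the blocks it selected.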

Justy So it’s not just cheaper, it’s genuinely smarter about long context.

Cody Yeah. And the pricing table’s wild—M3’s at one-fifty total for input and output right now, limited time. Full price is still under two-fifty. GPT-5.5’s at thirty-five.

Justy Okay but Cody— who actually switches because of this?

Cody Startups, probably. Anyone who’s been priced out of the top-tier APIs.

Justy But enterprises? They’re not touching open weights without a ton of vetting.

Cody True. And we don’t know latency numbers yet. Or how the fine-tuning stack holds up.

Justy Still. Open weights in ten days changes the math for a lot of teams.

Cody Unless the licensing’s weird. Or the model’s got some hidden gotcha.

Justy There it is. Cody’s doom spiral kicks in right on schedule.

Cody I’m just saying, if this checks out… this is the first time the open side of the market doesn’t feel like a compromise.

Justy Fine, fine. I’ll give you this—it’s the first model that makes me think maybe we’re not stuck with the old trade-off.

Cody Yeah. And if MSA scales… that’s a real architectural win.

Justy Alright, I’m convinced. For now. Let’s see what the open weights look like in a week.

Cody Deal. Just promise me you won’t start pitching this to your PM friends as the next big thing before the fine print’s out.

Justy No promises.