Ep 372 · Blog · May 6, 2026 · 8:40 · w/ Justy & Cody

The context window has been shattered: Subquadratic debuts a 12-million-token window

Cody is skeptical that a 12-million-token context window is broadly useful today, while Justy argues that it addresses a real pain point for teams working with giant codebases, logs, and long-running workflows. They land on calling it a real technical milestone with a narrow early market, plus a lot of unanswered questions about cost, latency, and whether most users need this kind of scale.


Embed this episode

Paste this on any site — the player is a self-contained iframe with no cookies or trackers.

<iframe src="https://sandrise.io/exploring-next/embed/372"
  width="100%" height="180" style="max-width:640px;border:0;border-radius:12px;overflow:hidden"
  title="Exploring Next — Episode 372 audio player"
  loading="lazy" allow="autoplay" referrerpolicy="strict-origin-when-cross-origin"></iframe>


Script: GPT-5.4 mini · Voice: Murf.AI Gen2

Transcript

Justy Okay, this one feels like it matters because everyone keeps running into the same dumb wall with giant codebases and long logs. Subquadratic says it’s got a 12-million-token context window, and I’m trying to figure out if that’s actually useful or just a very expensive flex.

Cody My first reaction is, that number is wild, but it also smells like a demo number. Twelve million tokens is the kind of thing that sounds amazing until you ask what the latency looks like, what it costs, and whether the model is actually doing anything coherent at that length.

Justy Yeah, but the pain is real. If you’ve ever tried to get an agent to reason over a whole repo, plus docs, plus a couple of incident threads, you know the chunking dance gets ugly fast. I think that’s the user story they’re aiming at, even if the first buyers are a pretty small club.

Cody Totally, and the article’s interesting because it’s not just saying ‘bigger context’ like a marketing slide. The whole Subquadratic pitch is basically, if you can make long-context attention less painful than the standard quadratic mess, then you can keep more of the raw sequence in play instead of constantly shuffling pieces in and out. That part is clever. I just don’t know how many people need twelve million versus, say, a couple hundred thousand plus good retrieval.
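Cody's "standard quadratic mess" refers to the fact that standard self-attention scores every token against every other token. The arithmetic below is a back-of-the-envelope sketch, not a description of Subquadratic's actual architecture (the episode doesn't detail it); the "linear" figure is a hypothetical lower bound included only for comparison, and the count covers attention scores alone, not a model's full compute or memory.

```python
# Back-of-the-envelope scaling: standard self-attention scores every
# query token against every key token, so one attention map (per head,
# per layer) has n * n entries. This counts score entries only, not the
# full compute or memory cost of a real model.

def attention_pairs(n_tokens: int) -> int:
    """Number of query-key scores in one full (quadratic) attention map."""
    return n_tokens * n_tokens


def growth_ratio(small: int, large: int) -> tuple[float, float]:
    """Factor by which work grows from `small` to `large` tokens under
    quadratic scaling vs. a hypothetical linear-scaling method."""
    quadratic = attention_pairs(large) / attention_pairs(small)
    linear = large / small
    return quadratic, linear


if __name__ == "__main__":
    q, lin = growth_ratio(200_000, 12_000_000)
    print(f"quadratic: {q:,.0f}x, linear: {lin:,.0f}x")
    # prints "quadratic: 3,600x, linear: 60x"
```

Going from "a couple hundred thousand" tokens to twelve million multiplies the pairwise work by 3,600 under full attention but only by 60 under linear scaling, which is why the subquadratic part of the claim matters more than the headline number.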

Justy I was thinking about the adoption barrier, and it’s not just price. It’s also, do teams trust it enough to change workflows? Most product orgs already have search, vector DBs, RAG, all the little scaffolding. To swap that out, this thing has to be weirdly reliable, not just huge.

Cody Right, and the trade-off is probably that once you go this big, you’re making bets on architecture that can survive the scale without falling apart. The article frames it as subquadratic, which suggests they’re trying to avoid the normal attention blow-up. That’s the part I respect. But I’d want to know whether the model is actually using that whole window well, or whether it’s more like a warehouse where everything fits and nobody can find the box they need.

Justy [chuckles] That’s a brutal warehouse metaphor, but yeah, fair. I do think there’s a market if the thing can swallow a codebase plus surrounding context and answer without all the retrieval glue. Enterprise teams hate stitching ten systems together. If this reduces that pile, somebody will pay for it.

Cody Maybe. I’d separate the ‘wow’ from the ‘useful.’ For code, long context helps when you need cross-file consistency, architecture review, or digging through a huge refactor. For a lot of everyday tasks, though, a smaller model plus search is still probably the better economics. So the real question is whether Subquadratic is opening a new category or just making an existing workflow less annoying.
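The "smaller model plus search" baseline, and the chunking dance Justy mentioned earlier, boil down to three steps: split the corpus into chunks, rank them against the question, and send only what fits. A minimal sketch, using plain keyword overlap as a stand-in for the embedding search a real RAG stack would use; the chunk sizes, the 4-characters-per-token estimate, and all function names are hypothetical.

```python
import re
from collections import Counter


def chunk_text(text: str, chunk_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows. Consecutive windows
    share `overlap` characters, so text near a boundary lands in both."""
    if not 0 <= overlap < chunk_chars:
        raise ValueError("need 0 <= overlap < chunk_chars")
    step = chunk_chars - overlap
    return [text[i:i + chunk_chars] for i in range(0, max(len(text) - overlap, 1), step)]


def terms(text: str) -> Counter:
    """Lowercased identifier-like words, counted."""
    return Counter(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def estimate_tokens(text: str) -> int:
    """Crude ~4-characters-per-token heuristic; real counts depend on the tokenizer."""
    return max(1, len(text) // 4)


def retrieve(query: str, chunks: list[str], token_budget: int) -> list[str]:
    """Rank chunks by keyword overlap with the query, then greedily keep
    the best-scoring ones that still fit in the token budget."""
    query_terms = set(terms(query))
    scored = [
        (sum(n for t, n in terms(c).items() if t in query_terms), c) for c in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    picked, used = [], 0
    for score, chunk in scored:
        cost = estimate_tokens(chunk)
        if score > 0 and used + cost <= token_budget:
            picked.append(chunk)
            used += cost
    return picked
```

A real setup would swap keyword scoring for embeddings and chunk on function or file boundaries. Either way, this layer is the "retrieval glue" a giant window promises to let you skip, and it is the baseline Justy's suggested comparison would pit against full context.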

Justy That’s the tension for me too. If I’m a team lead, I’m not buying a 12-million-token headline. I’m buying fewer broken handoffs between tools, fewer lost references, and maybe a better shot at asking one messy question about a giant system and getting a sane answer back. That’s a real job story, even if the window is overkill on paper.

Cody And to be fair, the article’s claim is still meaningful because context length is one of those things that changes the shape of products. Once you can hold absurdly large spans, you stop designing around constant summarization. But I’m still skeptical that most users should care this week. The winners are probably the people building on top of it, not the people reading the number and going, oh cool, twelve million.

Justy [laughs] Yeah, nobody’s waking up and saying, I need twelve million tokens before breakfast. But if you’re shipping tooling for code, docs, or support transcripts, it’s a legit knob. I think the market starts with developers and infra-heavy teams, then maybe spreads if it proves cheaper than the current patchwork.

Cody Exactly. And I’d want to test it in the ugliest possible setting, not a clean toy prompt. Give it a monorepo, issue history, design docs, and a bunch of real edits, then see whether it can stay grounded across the whole thing. If it can, that’s interesting. If not, it’s a very large brag with nice math behind it.
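One cheap way to probe Cody's warehouse worry, whether the model can find a specific box anywhere in the window, is a needle-in-a-haystack test: plant a known fact at different depths in otherwise real material and check whether the model retrieves it. A minimal sketch; the needle wording, the depths, and the helper names are hypothetical, and sending each prompt to a model is left to whatever client you use.

```python
import random


def build_probe(filler_lines: list[str], needle: str, depth: float) -> str:
    """Insert `needle` into the filler at a relative depth
    (0.0 = very start, 1.0 = very end) and join into one context string."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    pos = round(depth * len(filler_lines))
    return "\n".join(filler_lines[:pos] + [needle] + filler_lines[pos:])


def make_probes(filler_lines: list[str], depths: list[float], seed: int = 0) -> list[dict]:
    """One probe per depth, each with a fresh random 6-digit secret."""
    rng = random.Random(seed)
    probes = []
    for depth in depths:
        secret = f"{rng.randrange(10**6):06d}"
        needle = f"NOTE: the deploy override code is {secret}."  # hypothetical fact
        context = build_probe(filler_lines, needle, depth)
        question = "What is the deploy override code mentioned in the material above?"
        probes.append({"depth": depth, "secret": secret, "prompt": f"{context}\n\n{question}"})
    return probes


def grade(answer: str, secret: str) -> bool:
    """Pass if the model's answer contains the planted secret."""
    return secret in answer
```

Using the monorepo text itself as filler (for example, its lines from `splitlines()`) and tallying pass rates by depth shows whether recall sags in the middle of the window. A needle test is a floor, though: finding one planted line is much easier than the cross-file reasoning over real edits that Cody describes.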

Justy So the honest landing is: impressive, possibly real, definitely not universal. It feels like one of those infrastructure jumps that matters more to builders than casual users, at least right now.

Cody Yeah, that’s my read. Good engineering, sharp claim, narrow first market. I’m not dismissing it, I just want to see the receipts in a workflow that hurts.

Justy For Build Next, I’d do a weekend test on a public monorepo. Grab a repo like a big Next.js app or something similarly noisy, stuff the whole thing into a long-context model if the API allows it, and ask it to trace a bug across files without retrieval. Then compare it against a normal RAG setup on answer quality and time.
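The first step of that weekend test, and of the simple Python script Cody describes, is flattening the repo into one prompt. A minimal sketch, assuming a JavaScript/TypeScript repo like a Next.js app; the included suffixes, skipped directories, file-header format, script name, and the 4-characters-per-token estimate are all assumptions, and actually sending the prompt depends on your provider's API.

```python
import os
import sys
from pathlib import Path

# Hypothetical defaults for a Next.js-style repo; adjust to taste.
INCLUDE_SUFFIXES = {".js", ".jsx", ".ts", ".tsx", ".mjs", ".json", ".md", ".css"}
SKIP_DIRS = {".git", "node_modules", ".next", "dist", "build", "coverage"}


def iter_source_files(root: Path):
    """Yield matching files under root in a stable, sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix in INCLUDE_SUFFIXES:
                yield path


def build_prompt(root: Path, question: str) -> str:
    """Concatenate every file with a path header, then append the question."""
    parts = []
    for path in iter_source_files(root):
        text = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f"=== FILE: {path.relative_to(root).as_posix()} ===\n{text}")
    parts.append(f"=== QUESTION ===\n{question}")
    return "\n\n".join(parts)


def estimate_tokens(text: str) -> int:
    """Crude ~4-characters-per-token heuristic; use the provider's tokenizer for real counts."""
    return max(1, len(text) // 4)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage (hypothetical script name): python build_prompt.py /path/to/repo "Trace the bug where ..."
    prompt = build_prompt(Path(sys.argv[1]), sys.argv[2])
    print(f"~{estimate_tokens(prompt):,} tokens", file=sys.stderr)
    print(prompt)
```

Printing the token estimate to stderr first tells you whether the repo even fits before you spend anything; the same question can then go through the RAG baseline for the side-by-side comparison.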

Cody And I’d add one more: measure the boring stuff. Tokens in, latency, cost per successful answer. If you’re solo, that’s enough to learn whether the giant window is magic or just expensive comfort food. Also, the command-line version is probably the cleanest path: a simple Python script that walks the repo, builds the prompt, and logs outputs. No fancy UI needed.
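Those boring metrics fit in a small per-run log. A minimal sketch, assuming you record each attempt yourself and judge correctness by hand; the field names and setup labels are hypothetical, and the per-million-token prices are placeholders you would fill in from your provider's current pricing.

```python
import csv
import os
import time
from dataclasses import asdict, dataclass


@dataclass
class RunRecord:
    setup: str         # e.g. "full-context" or "rag" (labels are up to you)
    question_id: str
    tokens_in: int
    tokens_out: int
    latency_s: float
    correct: bool      # judged by a human against a known answer


def timed(fn, *args, **kwargs):
    """Call fn and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def run_cost(r: RunRecord, usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Dollar cost of one run given per-million-token prices (placeholders)."""
    return r.tokens_in / 1e6 * usd_per_m_in + r.tokens_out / 1e6 * usd_per_m_out


def cost_per_success(records, setup, usd_per_m_in, usd_per_m_out) -> float:
    """Total spend for a setup divided by its correct answers (inf if none)."""
    runs = [r for r in records if r.setup == setup]
    wins = sum(r.correct for r in runs)
    spend = sum(run_cost(r, usd_per_m_in, usd_per_m_out) for r in runs)
    return spend / wins if wins else float("inf")


def append_csv(path: str, record: RunRecord) -> None:
    """Append one run to a CSV log, writing the header on first use."""
    row = asdict(record)
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Dividing spend by correct answers rather than by calls is what keeps the comparison honest: a setup that is cheaper per call but wrong more often can still cost more per useful answer.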

Justy Yeah, that sounds right. If it survives the dumb test, then it’s real. If not, at least you learned something before setting fire to a budget. Anyway, that one’s got enough teeth to keep me curious.