Ep 368 article 7:35 w/ Justy & Cody

The RAG era is ending for agentic AI: a new compilation-stage knowledge layer is what comes next

Pinecone just announced Nexus, a 'knowledge engine' that shifts reasoning from inference time to a compilation stage — meaning agents get pre-built, task-specific knowledge artifacts instead of rediscovering context from scratch every session. Justy and Cody dig into why RAG was never really built for agents, what the architecture actually does, and whether the 98% token reduction claim holds water.

Script: Sonnet 4.6 Voice: ElevenLabs

Transcript

Justy Okay so I keep watching enterprise AI pilots die in the same place — not the model, not the infra, just... the agent keeps re-figuring out the same stuff over and over. And apparently that's like, the actual architectural problem.

Cody Yeah, Pinecone put a number on it — they're claiming 85% of agent compute is going to what they call the re-discovery cycle. Not the task. Just re-learning which data sources exist, which ones are authoritative, how they relate. Every session, cold start, from scratch.

Justy Which is wild when you say it out loud. You're paying for the model to re-read the org chart every time.

Cody And that's kind of the whole Nexus pitch. They're announcing this thing today — they're calling it a knowledge engine, not a retrieval upgrade. The idea is you shift the reasoning work from inference time to a compilation stage that runs before any agent query. You build the context once, store it as a reusable artifact, and the agent just consumes it directly.

Justy Right, and the CEO's quote is basically: RAG was built for humans. One query, one response, a person in the loop to interpret it. Agents don't work that way. Agents are assigned tasks, not questions — and completing a task means assembling context from multiple sources, resolving conflicts, tracking what's already been pulled. RAG just was never designed for that loop.

Cody The governance angle is real too. Non-deterministic results — running the same task twice and getting different answers with no record of which sources drove either — that's not a tuning problem for enterprises, it's a compliance disqualifier. And that's where the architecture gets interesting. The composable retriever returns typed fields with per-field citations and confidence levels, and it does deterministic conflict resolution. When two sources disagree, the system has a
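
To make "typed fields with per-field citations and deterministic conflict resolution" concrete, here is a minimal Python sketch. The transcript cuts off before Cody says which rule Nexus applies when sources disagree, so everything here is an assumption for illustration: the `Candidate` type, the `SOURCE_PRIORITY` list, the source names, and the precedence rule (fixed source priority first, then higher confidence) are hypothetical and are not Pinecone's API or method. The only point is that a fixed ordering makes the same inputs produce the same answer, with the winning source on record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One proposed value for a field, with its citation and confidence."""
    value: str
    source: str        # citation: where the value came from
    confidence: float  # 0.0 to 1.0


# Hypothetical precedence order: earlier entries are treated as more authoritative.
SOURCE_PRIORITY = ["contracts_db", "crm", "wiki"]


def resolve(candidates):
    """Pick a winner with a fixed, order-independent rule (an assumed rule)."""
    def rank(c):
        if c.source in SOURCE_PRIORITY:
            priority = SOURCE_PRIORITY.index(c.source)
        else:
            priority = len(SOURCE_PRIORITY)  # unknown sources rank last
        # The final tie-breakers keep the result stable for any input order.
        return (priority, -c.confidence, c.source, c.value)
    return min(candidates, key=rank)


candidates = [
    Candidate(value="4.2M", source="wiki", confidence=0.95),
    Candidate(value="4.1M", source="contracts_db", confidence=0.80),
]
winner = resolve(candidates)
print(winner.value, "from", winner.source)
```

Under this assumed rule the lower-confidence value from `contracts_db` still wins because source authority is checked first, and reversing the input list gives the same result.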

Justy Okay walk me through KnowQL because that's the piece I kept rereading. A declarative query language — for agents, not humans. Six primitives.

Cody Intent, filter, provenance, output shape, confidence, and budget. So an agent can say: I need revenue context, filtered to Q1 contracts, grounded in these specific source tables, returned as structured JSON, with a minimum confidence threshold, and don't spend more than X tokens getting there. All in one interface. Pinecone's comparing it to what SQL did for relational databases — before a standard existed, every app built its own data access layer. Same problem, different la

Justy The SQL analogy is doing a lot of work there. [chuckles] I mean, it took SQL like fifteen years to actually win.

Cody Fair. And the analysts are saying basically the same thing — the concept of moving reasoning upstream isn't new. What's changed is doing it at scale without a dedicated engineering team for every domain. That's the specific claim Nexus is making, and that's also the part that hasn't been validated in production yet. The benchmark number is eyebrow-raising — 2.8 million tokens down to 4,000 on a financial analysis task — but I'd want to know what that task looked like before N
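
For scale, the benchmark figures Cody quotes can be turned into a percentage with a couple of lines of Python. Only the two token counts come from the episode; the variable names are illustrative.

```python
# Token counts as quoted in the episode for the financial analysis task.
before_tokens = 2_800_000
after_tokens = 4_000

reduction = 1 - after_tokens / before_tokens  # fraction of tokens eliminated
ratio = before_tokens / after_tokens          # how many times smaller

print(f"{reduction:.2%} fewer tokens, {ratio:.0f}x smaller")
# prints: 99.86% fewer tokens, 700x smaller
```

That single task works out to a larger cut than the 98% figure in the episode summary, so the two numbers presumably describe different measurements.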

Justy That's my honest read too. And the competitive picture is getting crowded fast: Microsoft FabricIQ, Google's Agentic Data Cloud, Hindsight for contextual memory. The analysts are basically saying: stop comparing features, start asking whether your stack gives you cost control, governance control, and security control.

Justy Okay, Build Next. What do you actually do with this today?

Cody Nexus early access is live as of today, so if you're on Pinecone already, that's the obvious first move. But even before you get access, KnowQL's six primitives are worth mapping to an agent task you're already running — intent, filter, provenance, output shape, confidence, budget — just as a design exercise. Where in your current pipeline are you implicitly handling those? That tells you where your architecture is leaking.
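
The design exercise Cody describes can be done on paper, but a small Python sketch shows the shape of it. This is not KnowQL syntax, which the episode does not show. `AgentTaskAudit` and its field notes are hypothetical, and the only things taken from the episode are the six primitive names. Each field records where your current pipeline handles that primitive, and anything left as `None` is a gap.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AgentTaskAudit:
    """Where an existing pipeline handles each of the six primitives (None = nowhere)."""
    intent: Optional[str] = None
    filter: Optional[str] = None
    provenance: Optional[str] = None
    output_shape: Optional[str] = None
    confidence: Optional[str] = None
    budget: Optional[str] = None

    def unhandled(self):
        """Return the primitives the pipeline never addresses explicitly."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Example audit of an imaginary revenue-reporting agent.
audit = AgentTaskAudit(
    intent="revenue context, stated in the system prompt",
    filter="Q1 contracts only, applied in the retrieval step",
    output_shape="structured JSON, requested in the prompt",
)
gaps = audit.unhandled()
print(gaps)  # ['provenance', 'confidence', 'budget']
```

In this made-up example the agent never says which sources it trusts, how sure it needs to be, or how many tokens it may spend, which is the kind of leak the exercise is meant to surface.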

Justy And for someone building solo, not running enterprise infra?

Cody Hindsight. It's a standalone contextual memory library — run it against a small RAG prototype you already have. You don't need Nexus to feel the difference between inference-time context assembly and something pre-compiled. That gap becomes really tangible pretty fast when you're the one paying the token bill.

Justy Alright. If nothing else, Cody, I'm never going to stop thinking about agents re-reading the org chart every morning. That image is going to haunt me.

Cody [laughs] Fully deserved. That's just what RAG is doing.

Justy Episode 368, everybody. Or — well, just us. Go look at KnowQL's six primitives and see where your stack is quietly bleeding.