Ep 245 Tool March 31, 2026 1:55 w/ Justy & Cody

7 Steps to Mastering Memory in Agentic AI Systems MachineLearningMastery

Izzo and Boone dive deep into the seven-step framework for implementing memory in agentic AI systems, exploring why memory is a systems design problem rather than a matter of throwing more context at models. They break down the four types of agent memory, explain the crucial differences between RAG and memory, and get into the architectural decisions around storage, retrieval, and forgetting that make production agents actually useful over time.

Script Sonnet 4.5 Voice Google TTS

Transcript

Izzo Every AI agent you've used that felt genuinely helpful? It remembered something about you.

Izzo You're listening to Exploring Next, episode two forty-five. I'm Izzo, here with Boone, and today we're talking about something that separates useful AI agents from glorified chatbots — memory systems.

Boone And not just 'make the context window bigger' memory. Real memory architecture.

Izzo Right, because here's what I keep seeing in production — teams build these elaborate agent workflows, ship them, and then wonder why users bounce after the first session.

Boone The agent forgets everything. User asks about their deployment issue on Monday, comes back Tuesday, and has to explain the whole situation again.

Izzo Exactly. So this framework from MachineLearningMastery breaks down seven steps to actually solving this. Boone, what's the core insight here?

Boone Memory isn't a model problem — it's a systems problem. You can't just throw GPT-4 Turbo with 128K context at this and call it solved.

Izzo Why not though? More context seems like it should help.

Boone Because of what they call 'context rot.' When you stuff everything into the context window, the model starts spending attention budget on noise instead of signal. Performance actually degrades.
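
One way to picture being selective about context is a budgeted selection step: rank candidate snippets by relevance and stop adding them once a token budget is spent. The sketch below is illustrative only and is not taken from the article. The function name `select_for_context`, the precomputed relevance scores, and the word-count stand-in for a tokenizer are all assumptions.

```python
def select_for_context(candidates, budget_tokens):
    """Pick the most relevant snippets that fit within a token budget.

    candidates: list of (relevance_score, text) pairs. The scores are
    assumed to come from some upstream ranking step (hypothetical).
    """
    chosen, used = [], 0
    # Visit snippets from most to least relevant.
    for score, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        cost = len(text.split())  # crude stand-in for a real tokenizer
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen


candidates = [
    (0.9, "deployment failed missing env var"),
    (0.2, "user said hello and talked about weather at length today"),
    (0.7, "user prefers concise answers"),
]
selected = select_for_context(candidates, budget_tokens=10)
```

With a budget of 10, the low-relevance small talk is left out, so the model's attention is not spent on it.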

Izzo Okay, so we need to be selective. What are we actually storing?

Boone Four types. Short-term memory is your context window — everything the model can reason over right now. Think RAM.

Izzo Fast but wiped when the session ends.

Boone Exactly. Then episodic memory — specific past events. Like 'user's deployment failed last Tuesday due to missing environment variable.'

Izzo That's the stuff that makes an agent feel like it actually knows you.

Boone Right. Semantic memory is structured facts — user preferences, domain knowledge. And procedural memory is the workflows and decision rules the agent learns.

Izzo So a customer service agent that knows I prefer concise answers and work in legal — that's semantic memory at work.

Boone Yep. And if it learns to always check dependency conflicts before suggesting library upgrades, that's procedural.
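
The four memory types described above could be modeled as tagged records in one store. This is a minimal sketch under stated assumptions: the names `MemoryType`, `MemoryItem`, and `AgentMemory` are hypothetical, and a production system would likely back each type with different storage.

```python
from dataclasses import dataclass, field
from enum import Enum


class MemoryType(Enum):
    SHORT_TERM = "short_term"  # the live context window, wiped at session end
    EPISODIC = "episodic"      # specific past events
    SEMANTIC = "semantic"      # structured facts and preferences
    PROCEDURAL = "procedural"  # learned workflows and decision rules


@dataclass
class MemoryItem:
    kind: MemoryType
    content: str


@dataclass
class AgentMemory:
    items: list[MemoryItem] = field(default_factory=list)

    def remember(self, kind: MemoryType, content: str) -> None:
        self.items.append(MemoryItem(kind, content))

    def recall(self, kind: MemoryType) -> list[str]:
        return [m.content for m in self.items if m.kind is kind]

    def end_session(self) -> None:
        # Short-term memory behaves like RAM: drop it, keep the rest.
        self.items = [m for m in self.items if m.kind is not MemoryType.SHORT_TERM]


memory = AgentMemory()
memory.remember(MemoryType.SHORT_TERM, "User is asking about a failed deployment.")
memory.remember(MemoryType.EPISODIC, "Deployment failed last Tuesday due to a missing environment variable.")
memory.remember(MemoryType.SEMANTIC, "User prefers concise answers and works in legal.")
memory.remember(MemoryType.PROCEDURAL, "Check dependency conflicts before suggesting library upgrades.")
memory.end_session()
```

After `end_session()`, only the episodic, semantic, and procedural items remain, which mirrors the RAM analogy for short-term memory.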

Izzo Now here's where teams get confused — they think RAG solves this. Break that down for me, Boone.

Boone RAG is read-only retrieval for universal knowledge. Your company docs, product catalogs. It's stateless — each query starts fresh.

Izzo Versus memory which is read-write and user-specific.

Boone Exactly. RAG answers 'what's our refund policy?' Memory answers 'what did this customer tell us about their account last month?'

Izzo So RAG for things true for everyone, memory for things true for this user. Most production agents need both.

Boone Right. They run in parallel, each contributing different signals to the final context.
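
The split between RAG and memory can be sketched as two lookups that feed one context. Everything here is a simplified assumption: `KNOWLEDGE_BASE`, `UserMemory`, `retrieve_docs`, and `build_context` are hypothetical names, the refund text is placeholder data, and the keyword match stands in for a real retriever.

```python
# Shared, read-only knowledge: the same for every user (the RAG side).
# The policy text is made-up placeholder data.
KNOWLEDGE_BASE = {
    "refund policy": "Refunds are available within 30 days of purchase.",
}


class UserMemory:
    """Per-user, read-write store (the memory side)."""

    def __init__(self):
        self._notes = {}

    def write(self, user_id, note):
        self._notes.setdefault(user_id, []).append(note)

    def read(self, user_id):
        return list(self._notes.get(user_id, []))


def retrieve_docs(query):
    # Toy keyword match standing in for a real retriever; it is stateless.
    q = query.lower()
    return [text for topic, text in KNOWLEDGE_BASE.items() if topic in q]


def build_context(user_id, query, memory):
    # Both sources contribute to the final context for the model.
    return {
        "docs": retrieve_docs(query),
        "memory": memory.read(user_id),
        "query": query,
    }


memory = UserMemory()
memory.write("u1", "Customer reported a billing issue last month.")
ctx_u1 = build_context("u1", "What's our refund policy?", memory)
ctx_u2 = build_context("u2", "What's our refund policy?", memory)
```

Both users get the same document hit, but only `u1` gets the stored note, which is the "true for everyone" versus "true for this user" distinction.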

Izzo Okay, so you're designing this memory architecture. What are the key decisions?

Boone Four big ones. What to store, how to store it, how to retrieve it, and crucially — when to forget.

Izzo So you don't just dump raw conversation transcripts into a vector database and hope for the best?

Boone That's a recipe for noisy retrieval. You want to distill interactions into structured memory objects — key facts, preferences, action outcomes.
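
Distilling a transcript into structured memory objects might look roughly like the sketch below. The `MemoryObject` shape, the `distill` function, and the keyword cues in `CUES` are assumptions made for illustration. A real system would more likely use an LLM extraction step than keyword rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryObject:
    category: str  # "preference", "outcome", or "fact"
    text: str


# Hypothetical keyword cues, checked in this order for each turn.
CUES = {
    "preference": ("i prefer", "i like", "please always"),
    "outcome": ("failed", "fixed", "resolved"),
    "fact": ("i work in", "my team", "we use"),
}


def distill(transcript):
    """Keep only turns that look like a preference, outcome, or fact."""
    objects = []
    for turn in transcript:
        lowered = turn.lower()
        for category, cues in CUES.items():
            if any(cue in lowered for cue in cues):
                objects.append(MemoryObject(category, turn.strip()))
                break  # one category per turn keeps the sketch simple
    return objects


transcript = [
    "Hi there!",
    "I prefer concise answers.",
    "I work in legal.",
    "The deployment failed because of a missing environment variable.",
    "Thanks, bye.",
]
memories = distill(transcript)
```

The greeting and sign-off are dropped, so only three compact objects would reach storage instead of the full transcript.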

Izzo And storage options?

Boone Vector databases for semantic similarity, key-value stores like Redis for fast structured lookup, relational for compliance and auditability, graphs for complex relationships.

Izzo When would you reach for graph storage?

Boone Only after vector plus relational becomes a bottleneck. Graphs are powerful but complex to maintain.

Izzo What about retrieval strategies?

Boone Match the strategy to memory type. Semantic search for episodic memories, structured lookup for profiles. But