OpenAI upgrades its Responses API to support agent skills and a complete terminal shell
OpenAI's major Responses API upgrade introduces Server-side Compaction for persistent agent memory, hosted shell containers with full terminal environments, and support for the universal Skills standard, transforming AI agents from forgetful assistants into reliable, long-running digital workers.
Script: Sonnet 4.5 Voice: ElevenLabs
Transcript
Izzo Your AI agent just crashed again because it hit the token limit and forgot what it was doing for the past two hours.
Izzo You're listening to Exploring Next, episode one-hundred-eighty-one. I'm Izzo, and today Boone and I are diving into OpenAI's massive Responses API upgrade that might finally solve the agent memory problem.
Boone The context amnesia problem, as they're calling it. And honestly, this feels like the moment agents stop being demos and start being actual digital workers.
Izzo Right? Because here's what's been driving everyone crazy: you build this sophisticated agent, it's doing great work, making API calls, running scripts, and then boom – it hits the token limit and suddenly has no idea what it was working on.
Boone It's like training a marathon runner who can only remember the last thirty seconds. You'd get maybe a few dozen interactions before the model starts hallucinating or repeating itself because it lost the thread entirely.
Izzo So let's break down what OpenAI actually shipped here, because this isn't just an incremental update. We're talking about three major pieces: Server-side Compaction, hosted shell containers, and the Skills standard.
Boone Server-side Compaction is the big one for me. Instead of just truncating conversation history when you hit token limits, the system can now compress the agent's past actions into what they call a 'compressed state.'
Izzo And the numbers are wild – Triple Whale's agent Moby successfully handled a session with five million tokens and one hundred fifty tool calls without any drop in accuracy. That's not a demo, that's production-grade persistence.
Boone The technical approach here is clever. Rather than dumping the oldest context, it's essentially letting the model summarize its own reasoning into a more compact form while keeping the essential decision-making context alive.
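The idea Boone describes, folding older context into a compact summary while keeping recent turns verbatim, can be illustrated with a toy sketch. This is not OpenAI's implementation: the `Turn` type, the word-count token estimate, the `summarize` placeholder, and the `budget` and `keep_recent` parameters are all assumptions made for illustration. A real system would have the model write the summary.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Crude stand-in: count whitespace-separated words instead of real tokens.
    return len(text.split())


def summarize(turns: list[Turn]) -> str:
    # Hypothetical placeholder: a real system would ask the model for this summary.
    snippets = "; ".join(t.text[:40] for t in turns)
    return f"Summary of {len(turns)} earlier turns: {snippets}"


def compact(history: list[Turn], budget: int, keep_recent: int = 2) -> list[Turn]:
    """Replace older turns with one summary turn when the history exceeds the budget."""
    total = sum(estimate_tokens(t.text) for t in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # Nothing to compact.
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [Turn("summary", summarize(older))] + recent
```

The contrast with plain truncation is that the older turns are not dropped: they survive in condensed form at the front of the history.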
Izzo Which transforms the whole user story, right? Instead of babysitting an agent through a complex workflow, you can actually hand off a multi-hour task and trust it to maintain state throughout.
Boone Now the hosted shell piece – this is where OpenAI is really making a play for the infrastructure layer. They're giving each agent its own Debian 12 container with Python 3.11, Node 22, Java 17, Go 1.23, Ruby 3.1.
Izzo Plus persistent storage through /mnt/data and full networking capabilities. So your agent can install libraries, hit external APIs, generate artifacts, and actually save its work between sessions.
Boone This removes so much scaffolding work for developers. Before this, if you wanted an agent to run code safely, you had to build your own sandboxing, handle state persistence, manage execution environments. Now OpenAI is basically saying 'give us the instructions, we'll provide the computer.'
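To make "save its work between sessions" concrete, here is a minimal sketch of an agent script checkpointing its progress to a persistent directory and reloading it later. The file name `checkpoint.json` and the helper names are hypothetical, and the directory is passed in as a parameter instead of assuming any particular mount path exists.

```python
import json
import os
from pathlib import Path


def save_checkpoint(data_dir: Path, state: dict) -> Path:
    """Write state to data_dir/checkpoint.json, replacing any earlier checkpoint."""
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / "checkpoint.json"
    tmp = data_dir / "checkpoint.json.tmp"
    tmp.write_text(json.dumps(state))
    # os.replace swaps the file in one step, so a crash cannot leave a half-written checkpoint.
    os.replace(tmp, target)
    return target


def load_checkpoint(data_dir: Path) -> dict:
    """Return the last saved state, or an empty dict on a first run."""
    target = data_dir / "checkpoint.json"
    if not target.exists():
        return {}
    return json.loads(target.read_text())
```

Inside a hosted container, `data_dir` would point at whatever persistent mount the environment provides. Locally, any writable directory works for experimenting.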
Izzo The Skills standard might be the most interesting piece long-term though. Both OpenAI and Anthropic have converged on the same SKILL.md manifest format with YAML frontmatter.
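For a sense of what a SKILL.md manifest with YAML frontmatter looks like, here is a small sketch that splits the frontmatter from the Markdown body. The skill shown (`csv-report`) is made up, and the `name` and `description` fields are an assumption about typical frontmatter, since the episode does not list the fields. The parser only handles flat `key: value` lines; a real tool would use a full YAML library.

```python
SKILL_MD = """---
name: csv-report
description: Summarize a CSV file into a short report.
---
# CSV report

Steps the agent should follow go here.
"""


def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md document into (frontmatter fields, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter opening '---'")
    try:
        end = lines.index("---", 1)  # Closing delimiter of the frontmatter.
    except ValueError:
        raise ValueError("missing frontmatter closing '---'") from None
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body


meta, body = parse_skill(SKILL_MD)
```

Because the manifest is plain text with a small metadata header, any runtime that can read the file can discover the skill, which is what makes the format portable.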
Boone And it's actually working across platforms. OpenClaw – that new open-source agent – adopted the exact same standard and can now use skills originally built for Claude. ClawHub has over three thousand community-built extensions.
Izzo That's the network effect kicking in. When you can write a skill once and deploy it across OpenAI, Anthropic, local Llama instances, suddenly you're building portable, versioned assets instead of vendor-locked features.
Boone Though I'm curious about the architectural differences between OpenAI's and Anthropic's approaches. OpenAI seems focused on this 'programmable substrate' idea – bundling the shell, memory, and skills into their API.
Izzo While Anthropic is more about the expertise marketplace with pre-packaged partner integrations from Atlassian, Figma, Stripe. Different strategies for the same open standard.
Boone Glean reported their tool accuracy jumped from seventy-three percent to eighty-five percent using OpenAI's Skills framework. That's a meaningful improvement, not just marketing fluff.
Izzo From a product perspective, this feels like the end of the 'bespoke infrastructure' era for agents. The challenge isn't 'how do I give this agent a terminal' anymore – it's 'which skills are authorized for which users.'
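The "which skills are authorized for which users" question can be sketched as a deny-by-default allowlist check. The role names, skill names, and the `ALLOWED_SKILLS` table below are all hypothetical, and this is not how any particular platform enforces permissions.

```python
# Hypothetical mapping from user role to the skills that role may load.
ALLOWED_SKILLS: dict[str, set[str]] = {
    "analyst": {"csv-report", "sql-query"},
    "support": {"ticket-lookup"},
}


def authorized_skills(role: str, requested: list[str]) -> list[str]:
    """Return the requested skills this role may use, preserving request order."""
    allowed = ALLOWED_SKILLS.get(role, set())  # Unknown roles get nothing.
    return [skill for skill in requested if skill in allowed]


def denied_skills(role: str, requested: list[str]) -> list[str]:
    """Return the requested skills that were filtered out, for audit logging."""
    allowed = ALLOWED_SKILLS.get(role, set())
    return [skill for skill in requested if skill not in allowed]
```

Keeping the denied list separate gives a security team something to log and review when an agent asks for a skill it should not have.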
Boone Security is going to be huge here though. Domain Secrets and Org Allowlists help, but giving an AI model shell access and networking? SecOps teams are probably having some conversations right about now.
Izzo Especially as skills become easier to deploy. You need to watch out for malicious skills that could introduce prompt injection or unauthorized data paths. The ease of deployment cuts both ways.
Boone But the fundamental shift is real – we're moving from 'AI as a chatbot' to 'AI as a persistent system process.' That's a completely different paradigm for how you architect applications.
Izzo For Build Next, if you want to get hands-on with this stuff – first, check out the agentskills.io specification. It's the open standard both companies are using, and you can start writing portable skills today.
Boone Second, clone the OpenClaw repo from GitHub. It supports multiple models and you can experiment with the Skills format without needing OpenAI credits. Plus all those ClawHub extensions work out of the box.
Izzo And if you're already using OpenAI's API, upgrade to the latest Responses API version and try the container_auto option for the Shell Tool. Build something that needs persistent state – maybe a data analysis agent that works across multiple sessions.
Boone I'm definitely adding that to the weekend project list. A persistent agent that can actually finish what it starts? That's been the missing piece.
Izzo The memory problem that's plagued agents since day one might actually be solved. And with portable skills becoming the norm, we're looking at a real ecosystem forming around agent capabilities. This is Exploring Next – the infrastructure just got a lot more interesting.