Ep 192 article 5:19 w/ Justy & Cody

openclaw with ollama (Zero cost AI Assistant)

Izzo and Boone explore OpenClaw, an open-source AI assistant framework that runs entirely locally with Ollama. They dig into how it creates zero-cost AI workflows, the agent architecture with workspace management and subagent spawning, and why running your own AI stack locally matters for both privacy and cost control.

Script: Sonnet 4.5 Voice: OpenAI TTS

Transcript

Izzo Your AI bill hit four figures last month.

Izzo You're listening to Exploring Next, episode 193. I'm Izzo, and with me is Boone. Today we're talking OpenClaw with Ollama — a zero-cost AI assistant that runs entirely on your machine.

Boone And by zero-cost, we mean it. No API keys, no per-token charges, no sending your data to someone else's servers.

Izzo Which honestly, Boone, feels like where a lot of people are headed right now. I'm seeing teams that started with ChatGPT API integration now looking at their monthly bills going 'wait, what?'

Boone Right, and it's not just the cost. It's the latency of round trips to external APIs, plus you're basically broadcasting your workflow to OpenAI or whoever.

Izzo So what exactly is OpenClaw doing here? Break this down for me.

Boone OpenClaw is essentially an agent orchestration framework. It runs locally, manages multiple AI agents that can spawn subagents, and handles workspace isolation so different projects don't step on each other.

Izzo And it's talking to Ollama under the hood?

Boone Exactly. Ollama runs the actual language models locally — in this case they're using gpt-oss:20b, which is a 20-billion parameter model. OpenClaw treats Ollama as just another API provider, using the OpenAI-compatible endpoint.

Izzo Smart. So they get the familiar API surface but everything stays local.

Boone The config is pretty elegant actually. They're hitting localhost:11434 with an OpenAI-style completions API, setting costs to zero across the board, and defining a 131K context window with 8K max output tokens.

Izzo Wait, 131K context window? That's... that's huge for a local model.

Boone That's what caught my eye too. Most local setups you're dealing with much smaller windows. This gives you room for substantial document processing or long conversations without losing context.

Izzo Okay but let's talk about the agent stuff. What does 'maxConcurrent: 4' with 'subagents maxConcurrent: 8' actually mean in practice?

Boone So you can have four primary agents running simultaneously, and each of those can spawn up to eight subagents. Think of it like a main thread that can fork worker processes.

Izzo That's... potentially 32 concurrent operations if everyone's maxed out?

Boone Theoretically, yeah. Though that would probably melt most consumer hardware. The nice thing is it's configurable — you tune it based on your machine's capabilities.

Izzo I'm giving the architecture a solid A-minus. The workspace isolation is particularly smart — keeps different projects from contaminating each other's context.

Boone And look at the tools config. Web search is disabled but web fetch is enabled. That's a practical trade-off — you get document retrieval without the complexity of search integration.

Izzo Makes sense from a product standpoint. Web search is where you'd typically hit external APIs anyway, so disabling that keeps the zero-cost promise intact.

Boone The Telegram integration is interesting too. You can interact with your local AI assistant from your phone, but the processing still happens on your local machine.

Izzo That's actually brilliant. Mobile interface, local compute. Best of both worlds.

Boone Installation looks dead simple — curl pipe bash for both OpenClaw and Ollama. Though I'd probably want to inspect those scripts first.

Izzo Always. But assuming they're clean, you could have this running in what, five minutes?

Boone Maybe ten if you include pulling the gpt-oss:20b model. That's gonna be a few gigs.

Izzo Still faster than most enterprise software deployments. And once it's running, no ongoing costs, no rate limits, no data leaving your network.

Boone I'm definitely adding this to the weekend project list. The config flexibility alone makes it worth exploring.

Izzo Alright, build next time. Three concrete things: First, install both OpenClaw and Ollama with those curl commands, then pull the gpt-oss:20b model. Second, experiment with the concurrency settings. Start conservative — maybe maxConcurrent 2 and subagents 4 — then scale up based on your hardware. And third, if you're feeling ambitious, dig into that Telegram integration. Having a local AI assistant you can chat with from anywhere is pretty compelling. Plus you get to see how a