American AI startup Poolside launches free, high performing open model Laguna XS.2 for local agentic coding
Justy and Cody unpack Poolside’s new Laguna XS.2, an Apache 2.0 open model aimed at local agentic coding, plus the bigger Laguna M.1, the pool agent harness, and the shimmer coding environment.
Script: GPT-5.5 Voice: ElevenLabs
Transcript
Justy Your codebase is private, your deadline is real, and the AI helper still wants the internet.
Justy This is Exploring Next, episode 341, for April 29th. Cody’s in my kitchen today, running on airport coffee.
Cody Barely running, but yes. Poolside just released Laguna XS.2, and the timing is interesting because coding agents are moving from novelty into actual developer workflow.
Justy Right. People want the agent to fix tests, wire up features, maybe touch five files, but they also do not want to paste the whole repo into a cloud box.
Cody Poolside launched two Laguna models. M.1 is the bigger proprietary one. XS.2 is the open one, under Apache 2.0, aimed at local agentic coding. They also released pool, a terminal agent harness, and shimmer, a web-based coding environment with mobile-friendly previews.
Justy So the user story is a developer, a small team, maybe an enterprise team with strict data rules, saying: I want Cursor-like help, but closer to my machine.
Cody The architecture is the headline for me. M.1 is a 225 billion parameter mixture-of-experts model with 23 billion active. XS.2 is 33 billion total, but only 3 billion active. That matters because you get capacity without paying full dense-model cost on every token.
Justy And Cody, that is the part normal buyers will squint at. They hear 33 billion and think, okay, do I need a space heater under my desk?
Cody Sometimes yes, but less than you’d expect. Poolside says XS.2 is built for fine-tuning, quantization, and serving on a single GPU. The local pitch is privacy, offline use, and customization.
Justy They also have a very dramatic training stack. Model Factory, Titan, Muon, AutoMixer. It sounds like a kitchen renovation with venture funding. [chuckles]
Cody The names are a lot. The mechanisms are more useful. Muon is their optimizer, and they claim about 15 percent faster learning versus standard methods at 30 trillion tokens. AutoMixer used a swarm of 60 proxy models to find better mixes of code, math, and web data.
Justy And 13 percent synthetic data, right? That feels relevant for coding, because real examples of clean multi-step bug fixes are not exactly falling from the sky.
Cody Exactly. Then they do reinforcement learning in sandboxed software environments. The model tries fixes, runs against signals, and learns agent behavior. That is different from just predicting the next line of code in a static file.
Justy The benchmarks are the part that made me pause. XS.2 hits 44.5 percent on SWE-bench Pro, near M.1 at 46.9. That is weirdly close.
Cody It is close, and it suggests the smaller model got a lot from the bigger training lessons. M.1 scored 72.5 on SWE-bench Verified. XS.2 also beats Claude Haiku 4.5 on SWE-bench Pro in their numbers, but it trails specialized nano models on Terminal-Bench 2.0.
Justy So my honest concern is not whether this is impressive. It is. My concern is adoption friction: setup, memory, storage, and trust. Developers abandon tools fast.
Cody That is fair. Apple Silicon needs around 36 gigs of unified memory. On PC or Linux, standard weights are heavy, but Q4 quantization puts it into the 24 to 32 gig VRAM range. Storage is around 70 gigs full, or 20 to 35 compressed.
Justy Which means this is not every laptop. The budget laptop crowd is out. The serious local-dev crowd, though, now has something free and permissively licensed to test.
Cody And if it works through Ollama or pool cleanly, that lowers the pain. The clever part is making the model natively useful for tool calls and reasoning, not just hoping an open chat model behaves inside an agent loop.
Justy Also, shoutout to shimmer for coding on the go, though if I see you fixing a repo from the airport boarding line, I’m taking your phone. [laughs]
Cody Honestly, that may improve my code quality. For Build Next, start at Poolside’s Laguna XS.2 Hugging Face model card. Try the listed Q4 quantized build through Ollama, or download weights with huggingface-cli into a local sandbox.
Justy For a solo weekend project: point pool, OpenHands, or Continue.dev at a tiny throwaway repo. Give it one bug, one failing test, and no production secrets. See if it can make a clean patch.
Cody If you want a more serious test, borrow the Harbor Framework idea: sandbox execution, fixed tasks, repeatable scoring. Compare Laguna XS.2 against your current coding model on five real issues, not vibes.
Justy That’s it for this one. Keep the repo private, watch the VRAM, and maybe don’t code from the boarding line.