Hermes Unlocks Self-Improving AI Agents, Powered by NVIDIA RTX PCs and DGX Spark
Hermes is a rapidly growing, self-improving AI agent framework that runs locally on NVIDIA RTX PCs and DGX Spark, using small but powerful Qwen models to do what previously required data-center scale.
Script: Kimi K2.6 Voice: Deepgram TTS
Transcript
Justy You ever have your cloud AI go down right when you're in the middle of something? Or they move a feature behind a new tier? I'm so done with that.
Cody Same. That's exactly why the local agent thing stopped feeling like a hobby project. There's this framework called Hermes — crossed a hundred and forty thousand GitHub stars in three months.
Justy Hermes. I've seen the logo, I think. What's the actual pitch? Like, my notes app already has an AI thing.
Cody So the big difference is it's built to improve itself. Every time it hits a weird task or gets feedback, it writes a new skill and keeps it. It's not just calling an API and forgetting what happened.
Justy Right, right.
Cody And it runs these little contained sub-agents for specific jobs, so one doesn't get confused by what the other is doing. Nous Research curates all the plugins too, so it doesn't break every time you sneeze at it.
Justy Nous, that's the same crew behind some of the open-weight model releases, yeah?
Cody Yeah. So there's pedigree. The other thing is, put the same model in Hermes versus another framework and Hermes returns better results. It's doing active orchestration, not just wrapping a chat call.
Justy Okay, but what model even runs this locally? I don't have a data center.
Cody Qwen 3.6. Twenty-seven billion or thirty-five billion parameters. The thirty-five B runs in about twenty gigs of memory and it's outperforming their old hundred-twenty-billion model. The twenty-seven B is supposedly matching stuff that used to need four hundred billion parameters and seventy-plus gigs.
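A rough back-of-envelope check on the memory figure Cody quotes, offered as a sketch only. It assumes "gigs" means gigabytes of 10^9 bytes and that the roughly twenty gigabytes is dominated by model weights, ignoring KV cache and runtime overhead. Neither assumption is stated in the conversation, and the function names and the 16-bit and 4-bit formats are illustrative choices.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10**9 bytes), ignoring KV cache and overhead."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


def implied_bits_per_weight(params_billions: float, memory_gb: float) -> float:
    """Bits per weight implied if `memory_gb` held nothing but the weights."""
    return memory_gb * 1e9 * 8 / (params_billions * 1e9)


# 35B parameters at 16-bit precision versus a 4-bit quantization (assumed formats).
print(weight_memory_gb(35, 16))
print(weight_memory_gb(35, 4))
# What "about twenty gigs" would imply for a 35B model under the assumptions above.
print(round(implied_bits_per_weight(35, 20), 2))
```

Under these assumptions, twenty gigabytes for thirty-five billion parameters works out to a little over four and a half bits per weight, which would be consistent with a quantized model rather than 16-bit weights.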
Justy That's wild. So my RTX can actually do this without catching fire.
Cody That's the whole point. RTX PCs, RTX Pro workstations, DGX Spark — Tensor Cores matter here because it's not just one prompt. It's multistep tasks, refining skills, running while you sleep. You want throughput and you want it local.
Justy Who's this for, though? Like, am I shipping a product with this or is this a power user weekend thing?
Cody Right now it's mostly developers and serious prosumers. The barrier is you still have to manage the stack — local model server, the agent loop, maybe some Docker stuff. It's not 'install and double-click' yet.
Justy So it's still rough.
Cody A little. But the 'always on' part is real. You set it to watch a folder, or your messages, or whatever, and it just keeps going. The self-improving part means that two weeks in, it's doing things you didn't explicitly teach it.
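To make "watch a folder" concrete, here is a minimal polling sketch using only the Python standard library. It is not how Hermes does it; the conversation does not describe the mechanism. The names `snapshot`, `diff_snapshots`, and `watch` are hypothetical, and polling by modification time is just one simple approach.

```python
import os
import time
from typing import Callable, Dict, List, Optional


def snapshot(folder: str) -> Dict[str, float]:
    """Map each regular file name in `folder` to its last-modified time."""
    result = {}
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file():
                result[entry.name] = entry.stat().st_mtime
    return result


def diff_snapshots(old: Dict[str, float], new: Dict[str, float]) -> Dict[str, List[str]]:
    """Report which files were added, removed, or modified between two snapshots."""
    return {
        "added": sorted(name for name in new if name not in old),
        "removed": sorted(name for name in old if name not in new),
        "modified": sorted(
            name for name in new if name in old and new[name] != old[name]
        ),
    }


def watch(
    folder: str,
    on_change: Callable[[Dict[str, List[str]]], None],
    interval_seconds: float = 5.0,
    max_polls: Optional[int] = None,
) -> None:
    """Poll `folder` and call `on_change(changes)` whenever something differs."""
    previous = snapshot(folder)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval_seconds)
        current = snapshot(folder)
        changes = diff_snapshots(previous, current)
        if any(changes.values()):
            on_change(changes)
        previous = current
        polls += 1
```

Calling `watch("some/folder", print)` would print a summary each time files appear, disappear, or change. An agent loop could pass its own callback in place of `print`.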
Justy That's either exciting or terrifying.
Cody Both. I pointed it at a messy downloads folder and told it to organize the files by project. The first pass was okay. By the third pass, it had written a skill that checked file contents, not just extensions.
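As an illustration of checking file contents rather than extensions, the sketch below guesses a file's kind from its leading bytes. This is not the skill Hermes wrote, which is not shown in the conversation. The function names and the short signature list are assumptions chosen for the example.

```python
import codecs

# Leading byte signatures for a few common formats (a deliberately short list).
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "zip"),  # also the container used by .docx, .xlsx, and similar
]


def sniff_kind(head: bytes) -> str:
    """Guess a file's kind from its first bytes instead of its name."""
    if not head:
        return "empty"
    for signature, kind in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return kind
    # Incremental decoding tolerates a multi-byte character cut off at the sample edge.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return "binary"
    return "text"


def sniff_file(path: str, sample_size: int = 512) -> str:
    """Read only the first `sample_size` bytes of `path` and classify them."""
    with open(path, "rb") as handle:
        return sniff_kind(handle.read(sample_size))
```

A file named `report.txt` that actually starts with `%PDF-` would be classified as a PDF here, which is the kind of mismatch an extension-only pass would miss.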
Justy Okay, that is actually useful.
Cody And it was running on a single GPU while I was also using the machine.
Justy What about the model selection? Do I need to be a Qwen stan or can I swap?
Cody It's provider and model agnostic by design. Qwen 3.6 is just the one they optimized for because the efficiency is so good. But you could point it elsewhere.
Justy Alright. If someone's gonna try this, like this weekend, what do they actually do?
Cody Grab the Hermes repo and get Qwen 3.6 running through Ollama or llama.cpp, on your RTX GPU if you have one. Then pick one boring, repetitive task and let it try to build a skill for it. Don't ask it to do ten things. One thing.
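Before wiring up an agent, it can help to confirm the local model server answers at all. The sketch below sends one prompt to Ollama's `/api/generate` endpoint on its default port. It talks to the model server directly, not to Hermes, whose configuration is not covered in the conversation. The model tag is a placeholder, not a confirmed name, and the helper names are hypothetical.

```python
import json
import urllib.request

# Ollama's default local address; change it if your server listens elsewhere.
OLLAMA_HOST = "http://localhost:11434"
# Placeholder only: substitute the tag shown by `ollama list` on your machine.
MODEL_TAG = "your-qwen-model-tag"


def build_generate_request(
    prompt: str, model: str = MODEL_TAG, host: str = OLLAMA_HOST
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(
    prompt: str, model: str = MODEL_TAG, host: str = OLLAMA_HOST, timeout: float = 120.0
) -> str:
    """Send one prompt to the local server and return the generated text."""
    request = build_generate_request(prompt, model, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["response"]
```

With the server running and a real tag filled in, `ask_local_model("Say hello in five words.")` should return a short reply. If it raises a connection error, the server is probably not running.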
Justy One thing. I like that. Cody, I might actually try this.
Cody Send me a screenshot when it inevitably moves your important PDFs to a folder named 'misc'.
Justy I will blame you directly.