Exploring Next

Exploring Next — Ep 350 w/ Justy & Cody — You don't need an expensive GPU to run a local LLM that actually works

Cody and Justy examine the claim that you don't need an expensive GPU to run capable local LLMs. Cody opens skeptical of quantization trade-offs and real-world inference speed; Justy pushes back with the actual user story: cost-conscious builders and privacy-first home automation. They dig into what 'works' really means, weigh CPU-only against GPU inference, and land on a nuanced take: smaller quantized models on mid-range hardware are genuinely usable now, but marketing around this can oversell the experience. Build Next includes testing Ollama on a specific budget GPU and benchmarking a 7B quantized model on a CPU-only rig.
