Exploring Next

Exploring Next — Ep 358 w/ Justy & Cody — Google AI breakthrough means chatbots use six times less memory during conversations without compromising performance

Google's TurboQuant compresses AI working memory (the KV cache) by up to 6x in real time using two novel techniques — PolarQuant and QJL — without degrading model performance. Justy and Cody dig into what this actually means for inference costs, who benefits first, and why the 'DeepSeek moment' framing is both apt and a little overblown.