Exploring Next
Exploring Next — Ep 358 w/ Justy & Cody — Google AI breakthrough means chatbots use six times less memory during conversations without compromising performance
Google's TurboQuant compresses a language model's working memory (the key-value cache, or KV cache) by up to 6x in real time using two novel techniques, PolarQuant and QJL, without degrading model performance. Justy and Cody dig into what this actually means for inference costs, who benefits first, and why the "DeepSeek moment" framing is both apt and a little overblown.
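
For a rough sense of scale, here is a back-of-the-envelope sketch of what "up to 6x" could mean for KV cache size. It does not implement PolarQuant or QJL; it only applies the standard KV cache size formula and divides by the headline ratio. The model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values, a 32,768-token context) is a hypothetical example, not a figure from Google, and the function names are made up for illustration.

```python
GIB = 1024 ** 3


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Uncompressed KV cache size in bytes.

    The leading factor of 2 counts one key tensor and one value tensor
    per layer. bytes_per_value=2 assumes 16-bit storage (an assumption).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


def compressed_bytes(baseline_bytes, ratio=6.0):
    """Size after compression at the given ratio.

    ratio=6.0 is the best-case "up to 6x" headline number; real savings
    may be lower.
    """
    return baseline_bytes / ratio


# Hypothetical model shape, chosen only to make the arithmetic easy to follow.
baseline = kv_cache_bytes(num_layers=32, num_kv_heads=8,
                          head_dim=128, seq_len=32768)
after = compressed_bytes(baseline)

print(f"baseline KV cache: {baseline / GIB:.2f} GiB")  # 4.00 GiB
print(f"at 6x compression: {after / GIB:.2f} GiB")     # 0.67 GiB
```

Under these assumptions a single 32K-token conversation drops from about 4 GiB of cache to roughly two thirds of a GiB, which is why the savings matter most for long contexts and for serving many users at once.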