Exploring Next

Exploring Next — Ep 340 w/ Justy & Cody — Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing

Justy and Cody dig into Stochastic KV Routing, a paper on cutting transformer KV cache memory by sharing caches across layers (the depth axis) rather than only compressing along the token axis. They unpack random cross-layer attention, why it helps models tolerate missing per-layer caches, and where this could matter in real serving stacks.
