Exploring Next

Exploring Next — Ep 470 w/ Justy & Cody — FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

Researchers propose Lookahead Sparse Attention (LSA), which pairs sparse attention with a Neural Memory Indexer to cut GPU memory usage for ultra-long LLM contexts. The indexer predicts ahead of time which KV cache chunks will matter, and it is trained independently, without the full backbone. FlashMemory-DeepSeek-V4 reduces the physical KV cache to 13.5% of the baseline on average while maintaining or improving accuracy (+0.6% absolute) across LongBench-v2, LongMemEval, and RULER. At 500K tokens, it cuts KV overhead by more than 90%. The project is paused due to organizational changes, and the code is not yet public.
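To make the chunk-selection idea concrete, here is a minimal Python sketch. It is not the authors' implementation (their code is not public): the per-chunk scores stand in for whatever the indexer would output, and the names `select_chunks`, `retained_ratio`, and `keep_fraction` are hypothetical, chosen only for illustration. It shows the basic pattern of ranking KV cache chunks by predicted relevance and keeping a small fraction resident.

```python
import math


def select_chunks(scores, keep_fraction):
    """Return the sorted indices of the highest-scoring chunks.

    `scores` is a stand-in for per-chunk relevance predictions; in the
    described system these would come from a learned indexer (assumption).
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Keep at least one chunk so attention always has something to read.
    n_keep = max(1, math.ceil(len(scores) * keep_fraction))
    # Rank chunk indices from most to least relevant.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return kept indices in their original (positional) order.
    return sorted(ranked[:n_keep])


def retained_ratio(kept, total_chunks):
    """Fraction of chunks that stay resident after selection."""
    return len(kept) / total_chunks


# Made-up scores for eight chunks, purely for illustration.
scores = [0.05, 0.90, 0.10, 0.02, 0.75, 0.01, 0.03, 0.60]
kept = select_chunks(scores, keep_fraction=0.25)
print(kept, retained_ratio(kept, len(scores)))
```

With these made-up scores, a keep fraction of 0.25 retains two of the eight chunks. A real system would also need to decide where evicted chunks live and how they are fetched back, which this sketch leaves out.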
