Exploring Next

Exploring Next — Ep 484 w/ Justy & Cody — End-to-End Context Compression at Scale

Justy and Cody dig into Latent Context Language Models (LCLMs): encoder-decoder compressors that shrink long prompts into short latent sequences, cutting memory use and latency at compression ratios of up to 1:16 while staying competitive on accuracy. They cover the architecture search, the training recipe, the agent use case, and what production deployment actually looks like.
