Exploring Next

Exploring Next — Ep 310 w/ Justy & Cody — LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Justy and Cody dig into LeWorldModel, a pixel-to-latent world model that tries to make JEPA (Joint-Embedding Predictive Architecture) training boring in the best way. The paper’s claim is simple but important: you can jointly train the encoder and the dynamics model from raw pixels without exponential-moving-average (EMA) tricks, stop-gradient, pretraining, rewards, or reconstruction, and still avoid representation collapse. They unpack the Gaussian latent regularizer, the autoregressive next-embedding prediction setup, and why a 15M-parameter model that runs on one GPU could matter more for builders than a flashier giant model.
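
For readers who want a concrete picture of the objective described above, here is a rough sketch in plain Python of a next-embedding prediction loss combined with a Gaussian-style latent penalty. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `reg_weight` parameter, and the simple moment-matching penalty (batch mean toward zero, batch covariance toward identity) are all hypothetical stand-ins for whatever regularizer LeWorldModel actually uses.

```python
import random

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def prediction_loss(pred_next, true_next):
    """Average next-embedding error over a batch of (predicted, actual) latents."""
    return sum(mse(p, t) for p, t in zip(pred_next, true_next)) / len(pred_next)

def gaussian_moment_penalty(latents):
    """Stand-in regularizer (assumption, not the paper's): penalize a batch
    whose mean drifts from 0 or whose covariance drifts from the identity."""
    n, d = len(latents), len(latents[0])
    mean = [sum(z[j] for z in latents) / n for j in range(d)]
    penalty = sum(m * m for m in mean)
    for i in range(d):
        for j in range(d):
            cov = sum((z[i] - mean[i]) * (z[j] - mean[j]) for z in latents) / n
            target = 1.0 if i == j else 0.0
            penalty += (cov - target) ** 2
    return penalty / d

def total_loss(pred_next, true_next, latents, reg_weight=1.0):
    """Prediction term plus weighted latent penalty; reg_weight is hypothetical."""
    return prediction_loss(pred_next, true_next) + reg_weight * gaussian_moment_penalty(latents)

# A collapsed encoder maps every frame to the same point, so predicting the
# next embedding is trivially perfect, but the penalty term stays large.
collapsed = [[0.0] * 4 for _ in range(2000)]

# A well-spread batch: samples from a standard Gaussian in 4 dimensions.
rng = random.Random(0)
spread = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2000)]

print("collapsed:", total_loss(collapsed, collapsed, collapsed))
print("spread:   ", total_loss(spread, spread, spread))
```

The point of the toy comparison is that a collapsed latent space gets a perfect prediction score yet is still penalized, which is the general intuition behind using a distribution-matching regularizer in place of EMA or stop-gradient machinery.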

