Exploring Next

Exploring Next — Ep 190 w/ Justy & Cody — Anthropic Found Out Why AIs Go Insane

Anthropic's breakthrough research reveals why AI models exhibit bizarre failure modes and how their new interpretability technique maps the actual concepts models learn internally. We explore mechanistic interpretability, sparse autoencoders, and what this means for building more reliable AI systems.

