Exploring Next
Exploring Next — Ep 190 w/ Justy & Cody — Anthropic Found Out Why AIs Go Insane
Anthropic's breakthrough research reveals why AI models exhibit bizarre failure modes, and shows how a new interpretability technique maps the concepts models actually learn internally. We explore mechanistic interpretability, sparse autoencoders, and what this means for building more reliable AI systems.