Exploring Next

Exploring Next — Ep 206 w/ Justy & Cody — Towards a Science of AI Agent Reliability

Title: Towards a Science of AI Agent Reliability (arXiv:2602.16666)
Authors: Stephan Rabanser, Sayash Kapoor, Peter Kirgis, Kangheng Liu, Saiteja Utpala, Arvind Narayanan
Abstract: AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still continue to fail in practice.

