Exploring Next
Exploring Next — Ep 379 w/ Justy & Cody — Validating agentic behavior when “correct” isn’t deterministic
GitHub's new validation framework for agentic systems moves away from brittle, step-by-step testing and toward outcome-focused validation. When autonomous agents (like Copilot Coding Agent) interact with real environments, correctness is no longer deterministic: loading screens may appear or vanish, timing shifts, and multiple valid action sequences can all succeed. The framework models successful runs as graphs (Prefix Tree Acceptors) and applies dominator analysis to separate essential outcomes from incidental noise, and it needs only 2–10 successful traces to build a ground-truth model. Cody finds the approach clever but questions whether it scales beyond UI automation; Justy sees real market traction in CI/CD reliability and enterprise adoption.
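
For listeners who want a feel for the idea, here is a minimal Python sketch of how a prefix tree plus dominator analysis could separate essential states from noise. It is not GitHub's implementation. The state labels, the function names (`build_pta`, `fold_by_label`, `dominators`, `essential_states`), and the choice to merge states by exact label match are all assumptions made for illustration; a real system would likely need a fuzzier notion of "same state".

```python
from collections import defaultdict

START, END = "<start>", "<end>"  # synthetic entry/exit nodes (illustrative)


def build_pta(traces):
    """Build a prefix tree acceptor: traces that share a prefix share nodes."""
    root = {"label": START, "children": {}, "accepting": False}
    for trace in traces:
        node = root
        for label in trace:
            node = node["children"].setdefault(
                label, {"label": label, "children": {}, "accepting": False}
            )
        node["accepting"] = True  # a successful trace ends here
    return root


def fold_by_label(root):
    """Merge tree nodes with identical labels into one graph node.

    Assumption: exact label equality stands in for state equivalence.
    """
    edges = defaultdict(set)
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node["children"].values():
            edges[node["label"]].add(child["label"])
            stack.append(child)
        if node["accepting"]:
            edges[node["label"]].add(END)
    return edges


def dominators(edges, entry=START):
    """Iterative dominator sets: dom(n) = {n} plus the intersection of
    dom(p) over all predecessors p of n."""
    nodes = set(edges) | {dst for targets in edges.values() for dst in targets}
    preds = defaultdict(set)
    for src, targets in edges.items():
        for dst in targets:
            preds[dst].add(src)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def essential_states(traces):
    """States that every successful path must pass through before the end."""
    dom = dominators(fold_by_label(build_pta(traces)))
    return dom[END] - {START, END}


# Three hypothetical successful runs; "loading" and "spinner" are incidental.
traces = [
    ["open_app", "loading", "search", "results"],
    ["open_app", "search", "results"],
    ["open_app", "loading", "search", "spinner", "results"],
]
print(sorted(essential_states(traces)))
```

In this toy example, `open_app`, `search`, and `results` dominate the end node, while `loading` and `spinner` drop out because at least one successful run skipped them. Note that merging by label can over-generalize: if the same label appears at very different points in different runs, the folded graph may contain shortcut paths that no real run took.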