Exploring Next

Exploring Next — Ep 339 w/ Justy & Cody — SketchVLM: Vision language models can annotate images to explain thoughts and guide users

In this episode, Justy and Cody dig into SketchVLM, a training-free framework that lets vision-language models explain answers by drawing editable SVG annotations on top of images. They talk through why text-only answers are hard to verify, how SketchVLM uses a draft-and-refine loop plus visual grounding to produce overlays, where it looks production-friendly, and where the trade-offs still show up.
