Ep 473 · Research Paper · June 10, 2026 · 5:23 · w/ Justy & Cody

A New Study from Harvard and Perplexity Finds AI Agents Perform 26 Minutes of Autonomous Work per Session vs 33 Seconds for Search

Justy and Cody unpack a Harvard‑Perplexity study showing AI agents can do tens of minutes of autonomous work per session versus seconds for plain search, discussing the cost‑structure model, real‑world impact, and limits of the findings.

Script: GPT-OSS 120B · Voice: ElevenLabs v3

Transcript

Justy I just skimmed that Harvard‑Perplexity paper and—wow—those numbers are wild.

Cody Right.

Justy They’re saying the Computer agent cranks out about twenty‑six MINUTES of autonomous work per session, while plain Search barely scratches thirty‑three SECONDS. That’s a forty‑eight‑times gap, and it’s not just a fluke.
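As a quick check on the gap quoted here, the rounded figures from the episode can be divided directly. This is only a sketch using the two numbers as spoken (twenty-six minutes and thirty-three seconds); the variable names are illustrative. The rounded inputs give a ratio a little above forty-seven, so the quoted forty-eight presumably comes from unrounded session times, which the transcript does not give.

```python
# Rounded figures as quoted in the episode (the unrounded values are not in the transcript).
computer_seconds = 26 * 60   # about twenty-six minutes of autonomous work per session
search_seconds = 33          # about thirty-three seconds per session

# Ratio of autonomous work time per session, Computer vs Search.
ratio = computer_seconds / search_seconds
print(f"{ratio:.1f}x")  # prints 47.3x
```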

Cody Mm‑hm.

Cody The study matched near‑identical query pairs across the two products over a ninety‑day window, filtering for sessions that actually invoked an execution tool—code runs, browser clicks, file writes, you name it. That gate guarantees the Computer sessions do real work, not just chat.

Justy Exactly.

Justy From a product angle, that means if you have a workflow that needs more than a quick fact, the agent could slash the time you spend juggling steps. It’s the kind of thing product managers love to brag about—less friction, higher user stickiness.

Cody Sure.

Justy By the way, I finally got around to that espresso machine repair I’ve been putting off. The thing was leaking again, so I spent an hour tinkering with the valve. Then I realized I could have just asked my smart home hub to order a new part. Speaking of which, how’s your coffee this morning?

Cody Yeah.

Cody My coffee’s a disaster—spilled the beans everywhere while I was debugging a connector call. Speaking of connectors, the paper notes that Computer invoked a connector in about eight percent of sessions versus less than two percent for Search. Those external tool calls are where the agent really shines, automating steps you’d otherwise do by hand.

Justy No way.

Justy So who should care? Obviously knowledge workers—researchers, analysts, anyone building multi‑step pipelines. If you’re a PM trying to get a quick market snapshot, you still might stick with Search, but for a deep dive that needs data pulling, cleaning, and reporting, the agent could cut your day from hours to minutes.

Cody I see.

Cody The argument holds up until you look at the metric they use—execution time. That measures machine cycles, not the cognitive load on the human. If the agent makes a mistake and you have to debug, that time isn’t captured. Plus, the study’s breakeven step count assumes a fixed cost of about four to ten dollars per task, which might be steep for small teams.

Justy Okay okay, I hear you. The fixed fee could be a barrier, especially for startups that run only a handful of steps per task.

Justy Speaking of coffee, imagine if our espresso machine had a mini‑Computer inside—just tell it ‘brew a double shot and log the temperature,’ and it does it without me hovering.

Cody That’s basically what the agents are doing: they take a high‑level prompt and orchestrate the low‑level actions, whether it’s calling an API, writing a file, or opening a browser tab. The study’s cost‑structure framework says you pay a higher fixed cost for that delegation, but the marginal cost per step drops dramatically.
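The trade-off Cody describes, a higher fixed cost in exchange for a lower marginal cost per step, can be sketched as a simple breakeven calculation. This is an assumed linear model, not the paper's actual formula: doing the steps yourself costs `manual_cost_per_step` each, while delegating costs `fixed_cost` plus `agent_cost_per_step` each. Only the four-to-ten-dollar fixed cost comes from the episode; the per-step costs below are hypothetical values chosen for illustration.

```python
import math

def breakeven_steps(fixed_cost, manual_cost_per_step, agent_cost_per_step):
    """Smallest whole number of steps at which delegating costs no more than doing it by hand.

    Assumed linear model (not taken from the paper):
        manual cost = steps * manual_cost_per_step
        agent cost  = fixed_cost + steps * agent_cost_per_step
    Returns None if the agent's per-step cost is not lower, since delegation never pays off.
    """
    saving_per_step = manual_cost_per_step - agent_cost_per_step
    if saving_per_step <= 0:
        return None
    return math.ceil(fixed_cost / saving_per_step)

# Fixed cost range from the episode; per-step costs are hypothetical.
low = breakeven_steps(4.0, 0.50, 0.05)
high = breakeven_steps(10.0, 0.50, 0.05)
print(low, high)  # prints 9 23
```

Under these made-up per-step costs, a task needs somewhere between nine and twenty-three steps before delegation breaks even, which is consistent with the point that short tasks stay cheaper with plain Search.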

Justy Got it.

Justy One honest take: the adoption numbers are impressive. Computer queries grew eighty‑four times in the first week, and overall search queries rose by point zero five per user. But scaling that to enterprise environments might hit roadblocks: data privacy, tool integration, and the learning curve for prompting.

Cody Right.

Cody And the domain variance matters. The paper shows local‑task sessions got a seventy‑five‑times boost, while scientific queries saw twenty‑six times. That tells us the agent shines when the workflow is repeatable and tool‑rich, but for simple fact lookups, plain Search stays cheaper.

Justy Interesting.

Justy Anyway, episode four‑seven‑three is winding down. Let’s grab a drink and see if our own to‑do list could use a tiny Computer.

Cody Sounds like a plan. Talk soon.