Ep 363 | Research Paper | May 4, 2026 | 10:10 | w/ Justy & Cody

From Skill Text to Skill Structure: The Scheduling Structural Logical Representation for Agent Skills

Justy and Cody dig into the SSL (Scheduling-Structural-Logical) representation paper from Peking University — a structured, three-layer JSON schema designed to replace the messy, text-heavy SKILL.md files that LLM agent systems currently rely on. They cover why parsing natural language skill docs is a real bottleneck, how SSL's three layers (scheduling, structural, logical) map to classical AI theory, what the benchmark numbers actually mean, and whether this is something builders can use today.

Script: Sonnet 4.6 | Voice: ElevenLabs

Transcript

Justy Okay so I've been staring at this paper since last night and I keep coming back to the same thought — this is the problem nobody wants to admit is a problem.

Cody Which part? The SKILL.md thing specifically, or the broader agent skill mess?

Justy The SKILL.md thing. Like, every agent framework I've seen ships with some version of a markdown file that describes what a skill does, and then the system just... reads it, every time, hoping the LLM figures it out. That's not a representation strategy, that's optimism.

Cody Yeah, and the paper frames it pretty cleanly — the issue is that one text blob is doing three totally different jobs at once. It's telling you when to invoke the skill, how it executes step by step, and what it actually touches at runtime — files, APIs, permissions — all collapsed into the same surface. So any downstream component that needs just one of those signals has to re-parse everything to get it.

Justy Right, and that gets expensive and fragile fast once you have a registry of any real size. They mention more than six thousand skills in their corpus. At that scale, re-inferring structure from raw text on every lookup is just not viable.

Cody So what they built is SSL — Scheduling-Structural-Logical — which is a typed three-layer JSON graph. And the fun thing is the theoretical grounding. They're reaching back to Schank and Abelson, which is like, 1970s AI, script theory, conceptual dependency. Stuff that kind of got buried when neural methods took over.
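
To make the "typed three-layer JSON graph" idea concrete, here is a minimal Python sketch of what such a record might look like. Every field name (`goal`, `expects`, `returns`, `next_scenes`, `kind`, `target`) and the `fetch_report` skill are hypothetical, chosen for illustration; this is not the schema from the paper or the repo.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Scheduling:
    """Invocation-level signals: what triggers the skill, what it expects and returns."""
    goal: str
    expects: list[str] = field(default_factory=list)
    returns: list[str] = field(default_factory=list)


@dataclass
class Scene:
    """One step of the execution flow, with the scenes that may follow it."""
    name: str
    next_scenes: list[str] = field(default_factory=list)


@dataclass
class Action:
    """One runtime effect: which scene performs it, what kind it is, what it touches."""
    scene: str
    kind: str
    target: str


@dataclass
class SkillGraph:
    """Assumed container for the three layers; not the paper's actual schema."""
    skill_id: str
    scheduling: Scheduling
    structural: list[Scene]
    logical: list[Action]


# Hypothetical skill used only to show the shape of the record.
skill = SkillGraph(
    skill_id="fetch_report",
    scheduling=Scheduling(
        goal="download a report from a remote API",
        expects=["report id", "api token"],
        returns=["parsed report"],
    ),
    structural=[
        Scene("auth", next_scenes=["request"]),
        Scene("request", next_scenes=["parse"]),
        Scene("parse"),
    ],
    logical=[
        Action(scene="auth", kind="file_read", target="config/token.txt"),
        Action(scene="request", kind="network_out", target="api.example.com"),
    ],
)

# asdict() recurses into nested dataclasses, so the result is plain JSON.
skill_json = json.dumps(asdict(skill), indent=2)
print(skill_json)
```

The point of the split is that a retriever can read only `scheduling`, a planner only `structural`, and a risk checker only `logical`, without any of them re-parsing the original text.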

Justy I did not expect a 2026 agent paper to cite work from 1977. [chuckles] That was a surprise.

Cody Honestly it makes sense though. Script theory is literally about representing stereotyped multi-step activities as ordered scenes with expected transitions — which is exactly what an agent skill is. You go to a restaurant, there's a seating scene, an ordering scene, a payment scene. Same idea: you invoke a data-fetch skill, there's an auth scene, a request scene, a parsing scene. The structural layer in SSL maps directly onto that. And then conceptual dependency — that's the

Justy And the scheduling layer is the invocation-level stuff — like, what goal triggers this skill, what context it expects going in.

Cody Exactly. That's where Memory Organization Packets from Schank come in: goal-oriented organizers for retrieving relevant experience. In SSL terms that's your interface signals: when does this skill get called, what does it expect, what does it return. That's the layer that actually helps with discovery, because most retrieval systems right now are just embedding the description text and doing cosine similarity. SSL gives you structured interface-level features to match against.
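
As a rough illustration of matching against interface-level features, the sketch below ranks skills by token overlap (Jaccard similarity) between a query and the scheduling fields only. The field names and the two registry entries are assumptions, and plain token overlap stands in for whatever retriever the paper actually evaluates.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def interface_view(skill):
    """Build the matching surface from scheduling fields only (assumed field names)."""
    s = skill["scheduling"]
    return tokens(" ".join([s["goal"], *s["expects"], *s["returns"]]))


def rank(query, registry):
    """Return skill ids ordered by Jaccard overlap with the query, best first."""
    q = tokens(query)

    def score(skill):
        view = interface_view(skill)
        union = q | view
        return len(q & view) / len(union) if union else 0.0

    return [s["skill_id"] for s in sorted(registry, key=score, reverse=True)]


# Hypothetical two-skill registry.
registry = [
    {
        "skill_id": "send_email",
        "scheduling": {
            "goal": "send an email message",
            "expects": ["recipient address", "subject", "body"],
            "returns": ["delivery status"],
        },
    },
    {
        "skill_id": "fetch_weather",
        "scheduling": {
            "goal": "fetch current weather for a city",
            "expects": ["city name"],
            "returns": ["temperature", "conditions"],
        },
    },
]

ranked = rank("what is the temperature in this city", registry)
print(ranked)
```

In a real system the same interface view could be embedded instead of tokenized; the sketch only shows that the match runs against structured fields, not the full document.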

Justy And the numbers bear that out. MRR for skill discovery goes from 0.573 with text-only to 0.707 with the SSL-derived view. That's not a tiny bump — that's like, meaningfully better retrieval.
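
For readers unfamiliar with the metric, MRR averages the reciprocal of the rank at which the correct skill first appears for each query. The sketch below computes it on a made-up three-query example and then works out the size of the reported gain (0.573 to 0.707), which comes to about 0.134 absolute, or roughly 23 percent relative. The toy rankings are invented; only the two MRR values come from the episode.

```python
def mean_reciprocal_rank(rankings, gold):
    """rankings: query -> ranked skill ids; gold: query -> the one correct skill id."""
    total = 0.0
    for query, ranked in rankings.items():
        if gold[query] in ranked:
            total += 1.0 / (ranked.index(gold[query]) + 1)
        # A query whose correct skill is never retrieved contributes 0.
    return total / len(rankings)


# Toy data, invented for illustration.
rankings = {
    "q1": ["a", "b", "c"],  # correct skill at rank 1
    "q2": ["b", "a", "c"],  # correct skill at rank 2
    "q3": ["b", "c", "d"],  # correct skill not retrieved
}
gold = {"q1": "a", "q2": "a", "q3": "a"}
mrr = mean_reciprocal_rank(rankings, gold)  # (1 + 1/2 + 0) / 3

# Reported numbers from the discussion above.
text_only, ssl_view = 0.573, 0.707
absolute_gain = ssl_view - text_only
relative_gain = absolute_gain / text_only
print(f"toy MRR={mrr:.3f} absolute={absolute_gain:.3f} relative={relative_gain:.1%}")
```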

Cody Yeah, and the risk assessment result is interesting in a different way. Macro F1 goes from 0.744 to 0.787 — smaller gain, but the task is harder. You're trying to flag things like data exfiltration risk or privilege escalation from a third-party skill, and the logical layer is what exposes that. Because the raw text might say 'accesses user data' in a way that sounds benign, but the normalized action graph shows a file-read on a credential path followed by an outbound network

Justy That's the one that matters most to me from a product angle, honestly. If you're running an agent platform and letting people install third-party skills, that's your supply chain risk right there. And right now most platforms are doing... what, reading the README and hoping?

Cody Pretty much. Or running the skill in a sandbox and watching what happens, which is expensive and still doesn't give you a reusable representation you can reason over before execution. SSL is pre-execution — you normalize the skill doc once, you get a graph you can inspect, query, or run rules against. The paper is careful to say it's a step toward inspectable skills, not a finished standard, which I think is the right call.
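
A pre-execution rule of the kind described here could look like the sketch below: it walks a normalized action list and flags a credential-path file read that is later followed by outbound network activity. The action kinds (`file_read`, `network_out`), the `target` field, and the credential path hints are all assumptions made for illustration, not the paper's risk taxonomy.

```python
# Assumed, illustrative substrings that suggest a credential file.
CREDENTIAL_HINTS = (".ssh/", ".aws/credentials", ".env", "id_rsa")


def credential_read_then_network(actions):
    """True if a credential-like file read occurs before any outbound network action."""
    saw_credential_read = False
    for action in actions:
        is_credential_read = action["kind"] == "file_read" and any(
            hint in action["target"] for hint in CREDENTIAL_HINTS
        )
        if is_credential_read:
            saw_credential_read = True
        elif action["kind"] == "network_out" and saw_credential_read:
            return True
    return False


# Hypothetical action lists in execution order.
benign = [
    {"kind": "file_read", "target": "data/input.csv"},
    {"kind": "network_out", "target": "api.example.com"},
]
risky = [
    {"kind": "file_read", "target": "/home/user/.ssh/id_rsa"},
    {"kind": "network_out", "target": "collector.example.net"},
]

print(credential_read_then_network(benign), credential_read_then_network(risky))
```

Because the rule runs on the graph rather than on the skill itself, it can be applied to every skill in a registry without executing any of them. It is only as reliable as the normalized actions it is given.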

Justy What's your actual concern with it, if you have one? Like, the LLM normalizer is doing the heavy lifting to convert a SKILL.md into the SSL schema — that step has to introduce some noise.

Cody That's the thing I'd want to stress-test. If the source document is underspecified — which a lot of real SKILL.md files are — the normalizer is going to hallucinate structure that isn't there, or flatten things that should be distinct scenes. And then you've got a graph that looks clean but is wrong in ways that are hard to detect because it's typed JSON, so it passes any schema check. I'd want to see ablations on normalizer quality specifically — what happens when you feed i

Justy Fair. Though I'd argue even a slightly noisy structured representation is probably more useful than re-parsing a wall of markdown every time, as long as you're not treating the graph as ground truth.
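
One cheap way to avoid treating the graph as ground truth is a grounding check that a schema validator cannot provide. The sketch below assumes, hypothetically, that the normalizer is asked to attach a verbatim `evidence` quote to each scene, and it reports scenes whose quote does not appear in the source document. Neither the `evidence` field nor this check comes from the paper.

```python
def normalize_ws(text):
    """Lowercase and collapse whitespace so line wrapping does not break matching."""
    return " ".join(text.lower().split())


def ungrounded_scenes(source_text, scenes):
    """Names of scenes whose evidence quote is missing or absent from the source."""
    source = normalize_ws(source_text)
    missing = []
    for scene in scenes:
        quote = normalize_ws(scene.get("evidence", ""))
        if not quote or quote not in source:
            missing.append(scene["name"])
    return missing


# Hypothetical skill text and normalizer output.
source = "Reads the report id, calls the reports API, and returns the parsed result."
scenes = [
    {"name": "request", "evidence": "calls the reports API"},
    {"name": "retry", "evidence": "retries three times on failure"},
]

print(ungrounded_scenes(source, scenes))  # ['retry']
```

Both scenes are well-formed records, so a schema check accepts them; only the comparison against the source shows that the `retry` scene was invented.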

Cody Yeah, I don't disagree. And the fact that they keep the original source document paired with the SSL graph — the best risk assessment results use both together — that's a good hedge. You're not throwing away the original, you're adding a structured view on top.

Justy Alright, build next — because the repo is actually public. It's github.com/COOLPKU/SSL, and they've released the SSL guidelines, the annotated corpus, and the evaluation datasets. Six thousand plus skills, four hundred and three task-grounded queries for discovery, five hundred skills with six-dimensional risk labels.

Cody That corpus alone is useful even if you don't care about SSL specifically — as a benchmark for any skill retrieval system you're building. And for a solo weekend thing: grab a handful of your own tool docs or README files, write a prompt that targets the SSL schema, and run them through whatever LLM you're using. Then do a quick before-and-after on retrieval against your own little skill registry. You'll feel the difference pretty fast.
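
The weekend experiment could start from something like the sketch below: a prompt that asks for the three layers, a `normalize` function that parses the model's reply, and a basic check that all three layers came back. The prompt wording, the `call_llm` callable, and the `fake_llm` stand-in are all hypothetical; swap in whatever client you actually use, and consult the released SSL guidelines for the real schema.

```python
import json

REQUIRED_LAYERS = ("scheduling", "structural", "logical")

# Assumed prompt wording, not taken from the SSL guidelines.
PROMPT_TEMPLATE = """Convert the skill document below into a JSON object with exactly
these top-level keys: scheduling, structural, logical.
Return JSON only.

Skill document:
{doc}
"""


def normalize(doc, call_llm):
    """Send the document to an LLM callable and return the parsed three-layer graph."""
    raw = call_llm(PROMPT_TEMPLATE.format(doc=doc))
    graph = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    missing = [key for key in REQUIRED_LAYERS if key not in graph]
    if missing:
        raise ValueError(f"normalizer output is missing layers: {missing}")
    return graph


def fake_llm(prompt):
    """Stand-in for a real model call so the sketch runs offline."""
    return json.dumps(
        {
            "scheduling": {"goal": "resize an image"},
            "structural": [{"name": "load"}, {"name": "resize"}, {"name": "save"}],
            "logical": [{"kind": "file_read", "target": "input.png"}],
        }
    )


graph = normalize("# resize-image\nResizes a PNG to a given width.", fake_llm)
print(sorted(graph))
```

Running each document through `normalize` once and storing the result keeps the LLM call out of the lookup path, which is the before-and-after comparison suggested above.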

Justy Cody, I feel like we basically just described the episode we recorded about agent memory like six months ago, except now there's a structured layer underneath. Maybe we're converging on something. Anyway — go look at the repo, it's worth an hour.