Ep 31 article 0:51 w/ Justy & Cody

Let’s Build the GPT Tokenizer: A Complete Guide to Tokenization in LLMs – fast.ai

18 months ago, Andrej Karpathy set a challenge: “Can you take my 2h13m tokenizer video and translate the video into the format of a book chapter”. We’ve done it, and the chapter is below, including key pieces of code inlined and images from the video at key points (each hyperlinked to the video timestamp).

Voice: OpenAI TTS

Transcript

Host A Welcome back to Exploring Next! Today we're looking at Let’s Build the GPT Tokenizer: A Complete Guide to Tokenization in LLMs – fast.ai.

Host B Yeah, this one caught our eye because 18 months ago, Andrej Karpathy set a challenge: “Can you take my 2h13m tokenizer video and translate the video into the format of a book chapter”.

Host A So the big idea is that they've done it: the chapter includes key pieces of code inlined, plus images from the video at key points, each hyperlinked to the video timestamp.

Host B What stood out to me is that it's a great video for learning this key piece of how LLMs work, and the new text version is great too.

Host A If you're curious, give the original a read: https://www.fast.ai/posts/2025-10-16-karpathy-tokenizers.

Host B And let us know what you try next!