Exploring Next
Exploring Next — Ep 371 w/ Justy & Cody — How NetEase Games cut LLM cold starts from 42 minutes to 30 seconds
Justy and Cody unpack how NetEase Games used Kubernetes-native data orchestration with Fluid to shrink LLM inference cold starts from 42 minutes to about 30 seconds, and what that means for teams running their own models.