Scaling real-time performance with Bigtable in-memory tier | Google Cloud Blog
Justy and Cody geek out over Bigtable's new in-memory tier, which uses RDMA to deliver sub-millisecond reads. Justy sees a product manager's dream for removing cache-layer nightmares, while Cody explains how direct memory access avoids CPU bottlenecks and why the hotspot resistance is the real game-changer.
Script: DeepSeek V4 Pro Voice: Inworld TTS 1.5 Max
Transcript
Justy So the promo goes live, and suddenly it’s two in the morning and you’re staring at a pager that won’t stop screaming because your cache node just melted.
Cody Yeah, I know that sound. It’s the sound of a hot key taking down the whole house of cards. You’ve got a database struggling to breathe and a caching layer that’s supposed to be the shield, but it just saturated. Now you’re on a call debating whether to double the instance size or just add more read replicas at 2 AM.
Justy Right. And you’re managing two systems, praying the cache hasn’t drifted from the source of truth. I’ve definitely overprovisioned RAM just to avoid that complexity, paying for warm data that didn’t need to be in memory at all. It feels like throwing money at a problem just to go back to sleep.
Cody Mm-hm.
Cody And that’s exactly the loop the Bigtable in-memory tier is supposed to break. They’ve basically taken the cache and baked it into the database itself using a hybrid storage architecture. It’s not a separate layer you have to keep in sync. When a row gets hot, it automatically moves into RAM. When it cools off, it drops back to SSD or even HDD. You stop being a plumber connecting two pipes and just let the table manage its own heat.
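To make the "table manages its own heat" idea concrete, here is a minimal Python sketch of heat-based tiering. It is a toy model only, not Bigtable's actual algorithm: the class name `TieredStore`, the `promote_after` threshold, and the halving rule in `cool()` are all assumptions made up for illustration.

```python
class TieredStore:
    """Toy two-tier store: every row lives on 'disk', hot rows also live in 'ram'."""

    def __init__(self, promote_after=3):
        self.disk = {}   # stands in for SSD/HDD: holds every row
        self.ram = {}    # stands in for the in-memory tier: hot rows only
        self.heat = {}   # per-row read counter (hypothetical heat metric)
        self.promote_after = promote_after

    def write(self, key, value):
        # Writes always land on the durable tier first.
        self.disk[key] = value
        if key in self.ram:
            self.ram[key] = value  # keep the hot copy in sync

    def read(self, key):
        """Return (value, tier), where tier is the tier that served the read."""
        self.heat[key] = self.heat.get(key, 0) + 1
        if key in self.ram:
            return self.ram[key], "ram"
        value = self.disk[key]
        if self.heat[key] >= self.promote_after:
            self.ram[key] = value  # row is hot: later reads come from RAM
        return value, "disk"

    def cool(self):
        """Decay heat (halve every counter) and demote rows that cooled off."""
        for key in list(self.heat):
            self.heat[key] //= 2
            if key in self.ram and self.heat[key] < self.promote_after:
                del self.ram[key]  # row drops back to disk only


store = TieredStore(promote_after=3)
store.write("user#123", "profile")
tiers = [store.read("user#123")[1] for _ in range(4)]
print(tiers)  # the first reads are served from disk, later ones from RAM
```

The point of the sketch is that the caller only ever talks to one store; promotion and demotion happen behind the `read` and `cool` calls rather than in application code.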
Justy Wait, so they’re claiming sub-millisecond reads without the cache-aside logic? That’s a product manager’s dream. It removes the operational overhead that every team I know complains about. But how do they actually pull that off without just creating another CPU bottleneck on the server side? Usually, saturating memory bandwidth is the next wall you hit.
Cody The secret is RDMA. Remote Direct Memory Access. It’s a networking trick that lets one machine grab data directly from another machine’s memory, completely bypassing the operating system and the CPU on the serving node. So the throughput and latency aren’t bound by the server’s CPU at all. It’s a direct path to the memory sticks.
Justy Oh interesting.
Justy So it’s kind of like Data Boost, but for memory instead of disk. I remember Data Boost let heavy analytics jobs bypass the main serving CPUs to hit storage directly, which stopped them from clobbering production traffic. This feels like the real-time equivalent of that.
Cody Exactly. Data Boost is a high-speed direct path to disk, and RDMA provides a high-speed direct path to memory. It’s the same philosophy. The real killer feature hidden in the announcement, though, is the hotspot resistance. They’re claiming 120,000 queries per second on a single row. That’s absurd. In a traditional system, a single row getting that much traffic would completely crush the server hosting that shard, but RDMA decouples the read load from that server’s compute.
Justy That solves the celebrity problem. You know, the social media example where a handful of users have a hundred million followers while everyone else has a few hundred. Or in e-commerce, when a single product page goes viral during a flash sale. Normally, you’d have to manually shard that product key or build a complex multi-layered cache just to survive the stampede. If Bigtable handles that automatically, the TCO argument almost writes itself.
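For context on what "manually shard that product key" usually involves, here is a rough Python sketch of key salting, the kind of workaround a built-in hotspot-resistant tier would make unnecessary. The helper names (`salted_keys`, `write_copies`, `read_any`), the `#` separator, and the plain dict standing in for a table are all hypothetical; this is one common pattern, not anything taken from the announcement.

```python
import random


def salted_keys(row_key, copies):
    """All physical keys that hold a copy of one logical hot row."""
    return [f"{row_key}#{i}" for i in range(copies)]


def write_copies(table, row_key, value, copies):
    # Every write fans out to all copies, which is the cost of this approach.
    for key in salted_keys(row_key, copies):
        table[key] = value


def read_any(table, row_key, copies, rng=random):
    # Each read picks one copy at random, spreading load across key ranges.
    return table[f"{row_key}#{rng.randrange(copies)}"]


table = {}  # a dict standing in for a real table
write_copies(table, "product-42", {"price": 19.99}, copies=4)
print(sorted(table))
print(read_any(table, "product-42", copies=4))
```

Note the trade-off: reads get spread out, but every write is amplified by the number of copies, and the application has to remember which keys are salted.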
Cody Yeah, you stop overpaying for idle RAM. The tiering logic keeps only the hot blocks in memory, and the cold stuff sits cheaply on disk. But I do wonder about the write path. The announcement focuses heavily on read throughput and latency. If the in-memory tier is mostly a read-optimized layer, you still need to make sure writes are durable on the SSD nodes. There’s a consistency contract there that the blog post glosses over a bit. I’d want to see the p99 latency during a hea
Justy Fair point. The peace of mind pitch is great for reads, but if the write path introduces a weird stall during compaction, you’re still getting paged at 2 AM. Still, for a team sitting on a massive read-heavy table, this feels like a pretty big deal. It’s one less system to manage.
Cody For sure. If you want to play with the mechanics of it without spinning up a full enterprise cluster, you can actually simulate the RDMA behavior locally. There’s a library called libfabric that lets you experiment with direct memory access patterns. You could set up a little weekend project where you run a gRPC server in front of a simple in-memory store, but have the client read via RDMA to see how the CPU usage flatlines compared to a standard socket read.
Justy I like that. And for a solo builder who just wants to see the managed version, a free Bigtable instance paired with a simple load generator script in Go or Python would be a solid Build Next. Just hammer a single row key with read requests and watch the latency graph stay flat while the CPU graph barely twitches. That’s a compelling demo to show a CTO.
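A load generator along those lines could start as small as the Python sketch below. It deliberately does not call any Bigtable client API: `read_row` is a parameter you would replace with your own read function, and `fake_read_row` is a hypothetical stand-in so the script runs on its own. The function name `hammer` and the simple index-based p99 are assumptions for illustration.

```python
import statistics
import time


def hammer(read_row, row_key, requests=1000):
    """Call read_row(row_key) repeatedly; return latency stats in milliseconds."""
    latencies_ms = []
    for _ in range(requests):
        start = time.perf_counter()
        read_row(row_key)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    # Simple index-based percentile: the value below which ~99% of samples fall.
    p99_index = min(len(latencies_ms) - 1, int(len(latencies_ms) * 0.99))
    return {
        "count": len(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "p99_ms": latencies_ms[p99_index],
        "max_ms": latencies_ms[-1],
    }


def fake_read_row(row_key):
    # Hypothetical stand-in for a real database read of a single row.
    return {"key": row_key, "value": b"payload"}


if __name__ == "__main__":
    print(hammer(fake_read_row, "hot-row", requests=1000))
```

This version is single-threaded, so it measures per-request latency rather than pushing high query rates; getting anywhere near the throughput discussed above would need many concurrent clients.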
Cody Yeah, just don’t forget to clean up the instance or you’ll get a nasty surprise on your billing dashboard. Anyway, this feels like one of those rare releases that actually shrinks the architecture diagram instead of adding a new box to it.
Justy Agreed. And honestly, if it means one less middle-of-the-night call about a saturated cache node, I’d call that a win for the whole industry. Okay, I need to go actually test this thing. Talk to you later, Cody.