We recognized two main alternatives with the prevailing implementation: bettering utilization and decreasing CPU overhead.
First, our statistics indicated that we had been utilizing Bloom filters in a decrease than anticipated share of requests. This was on account of our Bloom filter implementation anticipating each the “column household” and the “column” within the learn filter, whereas a excessive share of shoppers filter by “column household” solely — which implies the Bloom filter can’t be used. We elevated utilization by implementing a hybrid Bloom filter that was relevant in each circumstances, leading to a 4x enhance in utilization. Whereas this transformation made the Bloom filters bigger, the general disk footprint elevated by solely a fraction of a %, as Bloom filters are sometimes two orders of magnitude smaller than the information they characterize.
Second, the CPU price of accessing the Bloom filters was excessive, so we made enhancements to Bloom filters that optimize runtime efficiency:
Native cache for particular person reads: When queries choose a number of column households and columns in a single row, it’s common that the question will use the identical Bloom filter. We reap the benefits of this by storing a neighborhood cache of the Bloom filters used for the question being executed.
Bloom filter index cache: Since Bloom filters are saved as knowledge, accessing them for the primary time includes fetching three blocks — two index blocks and an information block — then performing a binary search on all three. To keep away from this overhead we constructed a customized in-memory index for simply the Bloom filters. This cache tracks which Bloom filters now we have in our block cache and gives direct entry to them.
General these adjustments decreased the CPU price of accessing Bloom filters by 60-70%.
Within the earlier part we famous that knowledge for a single row could also be saved in a number of SSTables. Row knowledge from these SSTables is merged right into a remaining outcome set, and since blocks can both be in reminiscence or on disk, there’s a danger of introducing extra latency from filesystem entry. Bigtable’s prefetcher was designed to learn forward of the merge logic and pull in knowledge from disk for all SSTables in parallel.