July 27, 2024

Editor’s note: Today we hear from Grupo Globo, the largest media group in Latin America, which operates the Globoplay streaming service. This post outlines their migration from Apache Cassandra to Bigtable and the lessons they learned along the way.


Grupo Globo, Latin America's largest media group, owns and operates Globoplay, a streaming service where users can access live TV broadcasts along with on-demand video and audio content. Since most of our users don't consume content in a single sitting, and switch between multiple devices, the ability for them to resume watching a title where they left off is a key capability for our service.

Our Continue Watching API (CWAPI) is an application written in Go that processes watched-position timestamps for audio and video, a workload that consists of 85% write and 15% read requests. To handle that write traffic in a performant way, we historically relied on Apache Cassandra, which is known for its high write throughput and low write latencies.

Initially, Cassandra was installed on physical machines in Globo's own data center. Given the easy compatibility with our existing on-premises setup, a lift-and-shift approach to Compute Engine made the most sense at the time. Although the application functioned well in this setup, accommodating variations in user traffic required adding and removing nodes, a time-consuming practice that ultimately resulted in over-provisioned clusters and drove up our infrastructure costs. Running Cassandra also meant patching the software regularly, an often overlooked but significant operational overhead.

We began the process of evaluating possible alternatives to replace Cassandra. We had read about how Bigtable was being used at YouTube, and we were encouraged to see that other streaming services, like Spotify, had made the switch from Cassandra to Bigtable, realizing savings of up to 75%. That said, to be sure, we wanted to conduct our own evaluation using our own specific workloads and serving traffic.

We initially looked at “Cassandra as a Service” offerings on Google Cloud, which provided the convenience of lift-and-shift with no code changes. However, after benchmarking with our own data and running our load tests, we found that Bigtable was the best option for us. It had a lower cost of ownership and more capabilities, although it required more migration work than managed Cassandra in the cloud.

Why we selected Bigtable

Bigtable proved to be a strong alternative, primarily due to its low latency at high read/write throughput, scalability to large volumes of data, resilience, built-in Google Cloud integrations, and validation by Google products with billions of users. As a managed service, it also simplifies operations compared to self-managed databases: capabilities such as high availability, data durability, and security are guaranteed out of the box.

Moreover, multi-primary replication between globally distributed regions increases availability and ensures faster access by automatically routing requests to the nearest region, which also helped us deliver read-your-writes consistency for our use case with ease. Bigtable provides native tools such as usage-metrics dashboards and Key Visualizer, which helps uncover opportunities for performance improvements by analyzing key-access patterns. Data in Bigtable can also be queried from BigQuery without having to copy the data into the data warehouse.

Implementation, data migration and rollout

After deciding that Bigtable would be the best alternative to replace Cassandra, the team planned the migration in the following steps.

Porting Cassandra code to Bigtable

Bigtable provides a wide variety of client libraries, including one for Go. We focused first on the code paths that write to the database, which would allow us to migrate data between the databases. Once writing was done, we implemented the read features and converted all Cassandra code to Bigtable without any issues. Tests were created to verify that each feature integrated properly with the rest of the system.
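To give a feel for the write path, here is a minimal sketch using the Go client library (cloud.google.com/go/bigtable). The table name, column family, and user#title row-key layout are illustrative placeholders, not our production schema:

```go
package main

import (
	"context"
	"encoding/binary"
	"log"

	"cloud.google.com/go/bigtable"
)

func main() {
	ctx := context.Background()

	// Connect to a Bigtable instance (project and instance IDs are placeholders).
	client, err := bigtable.NewClient(ctx, "my-project", "my-instance")
	if err != nil {
		log.Fatalf("bigtable.NewClient: %v", err)
	}
	defer client.Close()

	tbl := client.Open("continue_watching") // illustrative table name

	// Encode the watched position (in seconds) as a big-endian uint64.
	pos := make([]byte, 8)
	binary.BigEndian.PutUint64(pos, 4521)

	// One mutation per progress event; the row key combines user and title.
	mut := bigtable.NewMutation()
	mut.Set("progress", "position", bigtable.Now(), pos)

	if err := tbl.Apply(ctx, "user123#title456", mut); err != nil {
		log.Fatalf("Apply: %v", err)
	}
}
```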

Enabling dual writes to both databases

To ensure that no new data would be lost during the migration, we enabled dual writes to both databases. We began by duplicating 1% of writes, then gradually increased the percentage as we confirmed that there were no issues. This allowed us to validate how the database and the application behaved without impacting users, since Cassandra remained the primary database throughout the transition.
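A gradual rollout like this can be gated per user, along these lines; the hashing scheme and the rolloutPercent knob are our illustration, not the actual implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// rolloutPercent controls what share of users are dual-written to Bigtable.
// In practice this would come from a config service, not a constant.
const rolloutPercent = 1

// shouldDualWrite deterministically buckets a user into [0, 100) so the same
// user is always routed the same way while the percentage ramps up.
func shouldDualWrite(userID string) bool {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32()%100 < rolloutPercent
}

func saveProgress(userID string, positionSec uint64) {
	writeToCassandra(userID, positionSec) // Cassandra stays primary
	if shouldDualWrite(userID) {
		writeToBigtable(userID, positionSec) // shadow write to Bigtable
	}
}

// Stubs standing in for the real persistence code.
func writeToCassandra(userID string, pos uint64) { fmt.Println("cassandra:", userID, pos) }
func writeToBigtable(userID string, pos uint64)  { fmt.Println("bigtable:", userID, pos) }

func main() {
	saveProgress("user123", 4521)
}
```

Hashing the user ID, rather than sampling randomly per request, keeps each user's data consistently on one side of the rollout while the percentage increases.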

Data migration

We decided to create a Dataflow pipeline to perform the batch migration. Using Bigtable's Dataflow template as our starting point, we found the process straightforward to implement and very performant.

The script read a static file from the Cassandra dump, which was stored in a bucket on Cloud Storage. Each line of the file represented a row from the Cassandra table. The script then transformed the data for Bigtable and inserted it into the table. At the same time, CWAPI was writing new traffic to Bigtable.
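The real job ran inside Dataflow, but the per-line logic amounts to something like the following Go sketch, which reads a local copy of the dump and bulk-writes rows with ApplyBulk. The dump format ("userID,titleID,positionSec"), table name, and column family are assumptions for illustration:

```go
package main

import (
	"bufio"
	"context"
	"encoding/binary"
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"

	"cloud.google.com/go/bigtable"
)

// lineToMutation converts one dump line (assumed "userID,titleID,positionSec"
// format) into a Bigtable row key and mutation.
func lineToMutation(line string) (string, *bigtable.Mutation, error) {
	parts := strings.Split(strings.TrimSpace(line), ",")
	if len(parts) != 3 {
		return "", nil, fmt.Errorf("malformed line: %q", line)
	}
	pos, err := strconv.ParseUint(parts[2], 10, 64)
	if err != nil {
		return "", nil, err
	}
	val := make([]byte, 8)
	binary.BigEndian.PutUint64(val, pos)

	mut := bigtable.NewMutation()
	mut.Set("progress", "position", bigtable.Now(), val)
	return parts[0] + "#" + parts[1], mut, nil
}

// flush writes one batch of rows with ApplyBulk, logging per-row failures.
func flush(ctx context.Context, tbl *bigtable.Table, keys []string, muts []*bigtable.Mutation) {
	errs, err := tbl.ApplyBulk(ctx, keys, muts)
	if err != nil {
		log.Fatalf("ApplyBulk: %v", err)
	}
	for _, e := range errs {
		if e != nil {
			log.Printf("row failed: %v", e)
		}
	}
}

func main() {
	ctx := context.Background()
	client, err := bigtable.NewClient(ctx, "my-project", "my-instance")
	if err != nil {
		log.Fatalf("bigtable.NewClient: %v", err)
	}
	defer client.Close()
	tbl := client.Open("continue_watching")

	f, err := os.Open("cassandra_dump.csv") // stand-in for the Cloud Storage object
	if err != nil {
		log.Fatalf("open dump: %v", err)
	}
	defer f.Close()

	var keys []string
	var muts []*bigtable.Mutation
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		key, mut, err := lineToMutation(scanner.Text())
		if err != nil {
			log.Printf("skipping: %v", err)
			continue
		}
		keys, muts = append(keys, key), append(muts, mut)
		if len(keys) == 1000 { // flush in batches of 1,000 rows
			flush(ctx, tbl, keys, muts)
			keys, muts = nil, nil
		}
	}
	if len(keys) > 0 {
		flush(ctx, tbl, keys, muts)
	}
}
```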

After validating the script in the development environment, we prepared it for execution in production. Due to the large volume of data, we split the dump into multiple files, each roughly 190 GB in size. This strategy reduced the likelihood of having to reprocess data in the event of an unexpected error during the execution of the Dataflow script.

Validating the migration

To validate the migration, we created a simple API that was deployed internally. This API exposed two ports, each with an endpoint that was equivalent in terms of parameters and response, but that fetched data from its respective database: Cassandra or Bigtable.
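In the spirit of that setup, here is a minimal sketch: one process listening on two ports, with the same endpoint shape backed by a different database on each. The ports, path, and lookup stubs are hypothetical:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// progressHandler builds a handler that answers the same request shape
// (GET /progress?user=...&title=...) from a given backend.
func progressHandler(lookup func(user, title string) (uint64, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		user, title := r.URL.Query().Get("user"), r.URL.Query().Get("title")
		pos, err := lookup(user, title)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprintf(w, `{"user":%q,"title":%q,"position":%d}`, user, title, pos)
	}
}

// Stubs standing in for the real Cassandra and Bigtable reads.
func readFromCassandra(user, title string) (uint64, error) { return 4521, nil }
func readFromBigtable(user, title string) (uint64, error)  { return 4521, nil }

func main() {
	// Same endpoint on two ports, one per backend, so responses can be diffed.
	go func() {
		mux := http.NewServeMux()
		mux.HandleFunc("/progress", progressHandler(readFromCassandra))
		log.Fatal(http.ListenAndServe(":8081", mux))
	}()
	mux := http.NewServeMux()
	mux.HandleFunc("/progress", progressHandler(readFromBigtable))
	log.Fatal(http.ListenAndServe(":8082", mux))
}
```

With both endpoints live, the same query can be issued against each port and the responses compared to confirm the two databases agree.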
