Cloudsviewer

Apache Hive to BigQuery | Google Cloud Blog

August 16, 2022


Are you looking to migrate a large number of Hive ACID tables to BigQuery?

ACID-enabled Hive tables support transactions that accept update and delete DML operations. In this blog, we will explore migrating Hive ACID tables to BigQuery. The approach explored in this blog works for both compacted (major / minor) and non-compacted Hive tables. Let's first understand the term ACID and how it works in Hive.

ACID stands for four traits of database transactions:

  • Atomicity (an operation either succeeds completely or fails; it does not leave partial data)

  • Consistency (once an application performs an operation, the results of that operation are visible to it in every subsequent operation)

  • Isolation (an incomplete operation by one user does not cause unexpected side effects for other users)

  • Durability (once an operation is complete, it will be preserved even in the face of machine or system failure)

Starting in version 0.14, Hive supports all ACID properties, which enables it to use transactions, create transactional tables, and run queries like Insert, Update, and Delete on tables.

Underlying the Hive ACID table, files are in the ORC ACID version. To support ACID features, Hive stores table data in a set of base files and all the insert, update, and delete operation data in delta files. At read time, the reader merges both the base file and delta files to present the latest data. As operations modify the table, a lot of delta files are created and need to be compacted to maintain adequate performance. There are two types of compactions, minor and major.

  • Minor compaction takes a set of existing delta files and rewrites them to a single delta file per bucket.

  • Major compaction takes one or more delta files and the base file for the bucket and rewrites them into a new base file per bucket. Major compaction is more expensive but is more effective.

Organizations configure automatic compactions, but they also need to perform manual compactions when the automatic ones fail. If compaction is not performed for a long time after a failure, it results in a lot of small delta files. Running compaction on this large number of small delta files can become a very resource-intensive operation and can run into failures as well.
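
The merge-on-read and compaction behavior described above can be sketched with a toy in-memory model. This is a simplified illustration, not Hive's actual ORC ACID file layout: the `DeltaRecord` type, its `write_id` ordering field, and the three functions are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DeltaRecord:
    """One change recorded in a delta file (hypothetical, simplified shape)."""
    write_id: int                # assumed ordering key: higher means newer
    op: str                      # "insert", "update", or "delete"
    row_id: int
    row: Optional[dict] = None   # None for deletes


def merge_on_read(base: Dict[int, dict],
                  deltas: List[List[DeltaRecord]]) -> Dict[int, dict]:
    """Apply every delta record to a copy of the base, oldest first."""
    result = dict(base)
    records = sorted((r for d in deltas for r in d), key=lambda r: r.write_id)
    for rec in records:
        if rec.op == "delete":
            result.pop(rec.row_id, None)
        else:
            result[rec.row_id] = rec.row
    return result


def minor_compact(deltas: List[List[DeltaRecord]]) -> List[List[DeltaRecord]]:
    """Rewrite many delta files as a single delta file; the base is untouched."""
    merged = sorted((r for d in deltas for r in d), key=lambda r: r.write_id)
    return [merged]


def major_compact(base: Dict[int, dict],
                  deltas: List[List[DeltaRecord]]
                  ) -> Tuple[Dict[int, dict], List[List[DeltaRecord]]]:
    """Fold all deltas into a new base; no delta files remain afterwards."""
    return merge_on_read(base, deltas), []
```

In this model a reader sees the same rows before and after either compaction; only the number of files that must be merged at read time changes.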

Some of the issues with Hive ACID tables are:

  • NameNode capacity problems due to small delta files.

  • Table locks during compaction.

  • Running major compactions on Hive ACID tables is a resource-intensive operation.

  • Longer time taken for data replication to DR due to small files.
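
One way to spot a small-file backlog before it becomes a problem is to count the delta directories under a table or partition path. The sketch below assumes the directory naming commonly seen in Hive ACID tables (`base_N`, `delta_X_Y[_stmt]`, `delete_delta_X_Y[_stmt]`, optionally with a `_vN` suffix); the function names and the `max_deltas` threshold are hypothetical, and the directory listing is passed in as plain strings rather than read from HDFS.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed naming patterns for Hive ACID directories; verify against your cluster.
_BASE = re.compile(r"^base_\d+(_v\d+)?$")
_DELTA = re.compile(r"^(delete_)?delta_\d+_\d+(_\d+)?(_v\d+)?$")


def classify(dirname: str) -> str:
    """Label a directory name as base, delta, delete_delta, or other."""
    if _BASE.match(dirname):
        return "base"
    match = _DELTA.match(dirname)
    if match:
        return "delete_delta" if match.group(1) else "delta"
    return "other"


def compaction_backlog(dirnames: Iterable[str], max_deltas: int = 10) -> Dict[str, object]:
    """Count base and delta directories and flag a backlog above max_deltas."""
    counts = Counter(classify(d) for d in dirnames)
    deltas = counts["delta"] + counts["delete_delta"]
    return {
        "bases": counts["base"],
        "deltas": deltas,
        "needs_compaction": deltas > max_deltas,
    }
```

The names could come from any listing tool, for example the output of an HDFS directory listing for one partition.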

Benefits of migrating Hive ACID tables to BigQuery

Some of the benefits of migrating Hive ACID tables to BigQuery are:

  • Once data is loaded into managed BigQuery tables, BigQuery manages and optimizes the data stored in its internal storage and handles compaction. So there will not be any small-file issue like the one we have in Hive ACID tables.

  • The locking issue is resolved here, as the BigQuery Storage Read API is gRPC based and is highly parallelized.

  • As ORC files are completely self-describing, there is no dependency on Hive Metastore DDL. BigQuery has a built-in schema inference feature that can infer the schema from an ORC file and supports schema evolution without any need for tools like Apache Spark to perform schema inference.
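
Because ORC files carry their own schema, a load into BigQuery does not need a schema argument. As a minimal sketch, the helper below only builds a `bq load` command line with `--source_format=ORC` and does not run it; the function name, dataset, table, and bucket path are hypothetical, and the sketch does not address how ACID metadata columns or delta files should be handled after loading.

```python
import shlex
from typing import List


def build_bq_load_command(table: str, gcs_uri: str) -> List[str]:
    """Build a `bq load` invocation for ORC files; no schema is passed
    because BigQuery infers it from the self-describing ORC files."""
    return ["bq", "load", "--source_format=ORC", table, gcs_uri]


if __name__ == "__main__":
    # Hypothetical dataset, table, and bucket names for illustration only.
    cmd = build_bq_load_command(
        "mydataset.employee_trans",
        "gs://example-bucket/employee_trans/*",
    )
    print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run` on a machine where the `bq` tool is installed and authenticated.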

Hive ACID table structure and sample data

Here is the schema of the sample Hive ACID table “employee_trans”:




© 2022 Cloudsviewer - Cloud computing news. Quick and easy.
