The BigQuery Java client library makes it easier to access BigQuery APIs using Java. Since its inception in 2016, numerous features and improvements have been added to the client library to enable BigQuery developers and practitioners. Most notably, a new Connection interface was recently introduced in the Java client library. It aims to improve the usability and performance of the main feature of BigQuery: its ability to run SQL queries. This new interface defines BigQuery Java client APIs in an industry-standard way for database applications.
The first method we’re introducing in this new interface is `executeSelect`. It supports read-only SELECT queries and provides more than 30x faster query performance on high-throughput workloads (reading 100 million rows). We plan on introducing `executeUpdate` and `execute` later on to support any SQL (i.e. DML, DDL, and scripts).
In this blog post, we’ll delve into the design and implementation of this new interface and discuss how you can quickly get started with it.
What has changed compared to bigquery.query?
The legacy bigquery.query client library method only uses BigQuery’s jobs.getQueryResults and jobs.query (when applicable) APIs. The new executeSelect method, on the other hand, also uses the more performant tabledata.list API and, for high-throughput queries, the BigQuery Storage Read API, which uses Apache Arrow as the row serialization format. In addition, we return BigQueryResult (instead of TableResult), which contains an underlying java.sql.ResultSet object for industry-standard consumption of query results.
Query execution logic
Prior to introducing executeSelect, bigquery.query only used REST API endpoints (jobs.getQueryResults and jobs.query) to retrieve query results. One major improvement introduced by the new executeSelect method is the integration with the BigQuery Storage Read API under the hood. The library determines the optimal mechanism for returning rows based on heuristics such as result size. If the prerequisite conditions for using the Storage Read API are met, we initialize a background thread which reads a stream of records from the table. The BigQuery Java client library uses Apache Arrow for row serialization and takes care of the column-to-row translation so that the data can be consumed row by row in the conventional way.
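To make the background-thread idea concrete, here is a minimal sketch using only the Java standard library. It is not the client library's implementation: ColumnBatch, startReader, and drain are hypothetical names, and plain arrays stand in for Arrow record batches. A producer thread pivots each columnar batch into rows and hands them to the caller through a bounded queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: a background producer that pivots columnar batches into rows.
// ColumnBatch, startReader and drain are hypothetical names, not client library APIs.
class BackgroundReaderSketch {

    // Hypothetical stand-in for one columnar batch: one array per column, equal lengths.
    static final class ColumnBatch {
        final String[] names;
        final long[] counts;

        ColumnBatch(String[] names, long[] counts) {
            this.names = names;
            this.counts = counts;
        }
    }

    // Sentinel the producer enqueues once every batch has been emitted.
    static final Object[] END_OF_STREAM = new Object[0];

    // Starts a daemon thread that converts each batch to rows and puts them on a bounded queue.
    static BlockingQueue<Object[]> startReader(List<ColumnBatch> batches, int capacity) {
        BlockingQueue<Object[]> queue = new LinkedBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (ColumnBatch batch : batches) {
                    // Column-to-row translation: row i takes element i of every column.
                    for (int i = 0; i < batch.names.length; i++) {
                        queue.put(new Object[] {batch.names[i], batch.counts[i]});
                    }
                }
                queue.put(END_OF_STREAM);
            } catch (InterruptedException e) {
                // A real reader would also signal the failure to the consumer.
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
        return queue;
    }

    // Consumes rows in order until the sentinel arrives.
    static List<String> drain(BlockingQueue<Object[]> queue) {
        List<String> rows = new ArrayList<>();
        try {
            Object[] row = queue.take();
            while (row != END_OF_STREAM) {
                rows.add(row[0] + "=" + row[1]);
                row = queue.take();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading rows", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<ColumnBatch> batches = Arrays.asList(
            new ColumnBatch(new String[] {"a", "b"}, new long[] {1L, 2L}),
            new ColumnBatch(new String[] {"c"}, new long[] {3L}));
        System.out.println(drain(startReader(batches, 2)));
    }
}
```

The bounded queue applies back-pressure: the producer blocks when the consumer falls behind, so memory use stays roughly proportional to the queue capacity rather than to the result size.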
Additionally, when the BigQuery Storage Read API is not used, we use the tabledata.list API instead of the jobs.getQueryResults API, since tabledata.list is faster at fetching query results.
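The choice between the two fetch paths can be pictured with a small sketch. The thresholds, the FetchPath enum, and the choose method below are all assumptions made for illustration; the heuristics the library actually applies may differ.

```java
// Sketch only: a size-based choice between fetch paths.
// The enum, method and threshold values are hypothetical, not the client library's internals.
class FetchPathSketch {

    enum FetchPath { STORAGE_READ_API, TABLEDATA_LIST }

    // Hypothetical thresholds, chosen purely for illustration.
    static final long MIN_ROWS_FOR_READ_API = 20_000L;
    static final long MIN_TOTAL_TO_PAGE_RATIO = 3L;

    // Picks the Storage Read API only for results that are both large and span many pages.
    static FetchPath choose(boolean readApiEnabled, long totalRows, long firstPageRows) {
        if (!readApiEnabled || firstPageRows <= 0) {
            return FetchPath.TABLEDATA_LIST;
        }
        boolean largeEnough = totalRows >= MIN_ROWS_FOR_READ_API;
        boolean manyPages = totalRows / firstPageRows >= MIN_TOTAL_TO_PAGE_RATIO;
        return (largeEnough && manyPages) ? FetchPath.STORAGE_READ_API : FetchPath.TABLEDATA_LIST;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, 100_000_000L, 10_000L)); // large result, many pages
        System.out.println(choose(true, 500L, 500L));            // small result, single page
        System.out.println(choose(false, 100_000_000L, 10_000L)); // Read API disabled
    }
}
```

Keeping the decision in one pure function like this makes the fallback explicit: anything that does not qualify for the Storage Read API is served through tabledata.list.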
Return type
We decided to return BigQueryResult, which provides a BigQuery-esque ResultSet object so that users can migrate their workloads more conveniently. The underlying ResultSet object also allows us to abstract away the implementation details of REST API and Read API result handling (pagination, row serialization, etc.). However, not all methods in ResultSet are relevant to BigQuery. Therefore, the methods that are needed are implemented in BigQueryResultImpl, and the methods that are irrelevant are handled in AbstractJdbcResultSet. We implemented all of the data type accessors. The code follows a JDBC-esque syntax, which should reduce the onboarding time for developers. Let’s take a look at how you can get started quickly below.
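The split between an abstract base class and a concrete implementation can be illustrated with a self-contained sketch. Everything here is hypothetical: MiniResultSet is a cut-down stand-in for java.sql.ResultSet (the real interface is far larger), and the classes only mirror the pattern described above, not the library's actual code. The assumption that unsupported methods throw SQLFeatureNotSupportedException is ours as well.

```java
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Sketch only: hypothetical classes that mirror the base-class/implementation split.
class ResultSetSplitSketch {

    // Hypothetical, cut-down stand-in for java.sql.ResultSet.
    interface MiniResultSet {
        boolean next() throws SQLException;
        String getString(String column) throws SQLException;
        void updateString(String column, String value) throws SQLException;
    }

    // Base class: every method is rejected unless a subclass overrides it.
    abstract static class AbstractMiniResultSet implements MiniResultSet {
        @Override
        public boolean next() throws SQLException {
            throw new SQLFeatureNotSupportedException("next");
        }

        @Override
        public String getString(String column) throws SQLException {
            throw new SQLFeatureNotSupportedException("getString");
        }

        @Override
        public void updateString(String column, String value) throws SQLException {
            throw new SQLFeatureNotSupportedException("updateString");
        }
    }

    // Concrete class: overrides only the read-side methods that make sense for query results.
    static final class InMemoryResultSet extends AbstractMiniResultSet {
        private final Iterator<Map<String, String>> rows;
        private Map<String, String> current;

        InMemoryResultSet(List<Map<String, String>> rows) {
            this.rows = rows.iterator();
        }

        @Override
        public boolean next() {
            current = rows.hasNext() ? rows.next() : null;
            return current != null;
        }

        @Override
        public String getString(String column) throws SQLException {
            if (current == null) {
                throw new SQLException("no current row; call next() first");
            }
            return current.get(column);
        }
    }

    // JDBC-style consumption loop: advance the cursor, then read by column name.
    static List<String> readAll(MiniResultSet rs, String column) {
        List<String> values = new ArrayList<>();
        try {
            while (rs.next()) {
                values.add(rs.getString(column));
            }
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return values;
    }

    // Returns true when the write-side method falls through to the rejecting base class.
    static boolean rejectsUpdates(MiniResultSet rs) {
        try {
            rs.updateString("name", "x");
            return false;
        } catch (SQLFeatureNotSupportedException e) {
            return true;
        } catch (SQLException e) {
            return false;
        }
    }

    static MiniResultSet sample() {
        return new InMemoryResultSet(Arrays.asList(
            Collections.singletonMap("name", "alice"),
            Collections.singletonMap("name", "bob")));
    }

    public static void main(String[] args) {
        System.out.println(readAll(sample(), "name"));
        System.out.println(rejectsUpdates(sample()));
    }
}
```

The benefit of this layout is that the concrete class stays small and readable, while callers still receive a predictable exception for any method that does not apply to read-only query results.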
How can you get started with executeSelect?
You can start using the executeSelect method by upgrading your BigQuery Java client library to version 2.12.0 or above.
We expect client library users to code against the new Connection interface. SQL statements are executed and results are returned within the context of a Connection. Therefore, the first thing to do is to create a Connection object; the Java client library will use it to determine whether to use jobs.query, jobs.getQueryResults, tabledata.list, or the BigQuery Storage Read API to execute the query and fetch query results. By default, the Read API is enabled.
Create a Connection
To create a Connection without any special configuration: