In a previous article I described how to use BigQuery from a client machine with DataGrip (a cross-platform IDE for database management and development).

This is useful when you want to quickly get a result set for a query while staying within a tool you already use.

In this article I'll describe how to integrate Google BigQuery with an Oracle database installed on your local network.

Instead of an Oracle database, you can also integrate BigQuery with an ESB (Enterprise Service Bus) or a streaming platform such as Kafka.

First you need to get the JSON key for your BigQuery service account, as described in my previous article, and save that file on your local machine.

https://www.josip-pojatina.com/en/accessing-google-bigquery-from-your-local-machine/

 

After that you need to install the BigQuery client libraries, which are available for several programming languages: C#, Go, Java, Node.js, PHP, Python and Ruby.

Installation is usually very simple; a good description is available at the following link:

https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python

 

In this article I'll use Python, one of the most popular languages today, to keep the size of this post under control (other programming languages tend to be more verbose).

The following command will install the Google BigQuery client libraries, as described at the link above.

pip install --upgrade google-cloud-bigquery
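
A quick sanity check after installation is to import the library and print its version (the __version__ attribute is exposed by current google-cloud-bigquery releases):

# Verify that the BigQuery client library is importable
from google.cloud import bigquery
print(bigquery.__version__)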

If you are using the Anaconda Python distribution, use the following link instead:

https://anaconda.org/conda-forge/google-cloud-bigquery

 

The following command (executed from your conda virtual environment) will install the BigQuery client libraries for the Anaconda Python distribution.

conda install -c conda-forge google-cloud-bigquery

 

The next step is to find the BigQuery table (or view) definition.

To do that, connect to your Google Cloud instance and find your table, as shown in the following picture:

Dataset tables

Next, I suggest clicking on the Preview tab to get familiar with the table content:

Table data preview

Finally, click on the Schema tab to get the datatypes of all columns:

BigQuery Table definition
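
If you prefer to inspect the schema programmatically, the client library can fetch the same information. A minimal sketch, assuming your JSON key is already in place (get_table accepts a table ID string in recent client-library versions):

import os
from google.cloud import bigquery

# Credentials must be set before the client is created;
# the path below is a placeholder for your own key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your/key.json"

client = bigquery.Client()

# Fetch table metadata for the public dataset used in this article
table = client.get_table("bigquery-public-data.usa_names.usa_1910_current")

# Print every column name and its BigQuery datatype
for field in table.schema:
    print(field.name, field.field_type)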

The next step is to connect to the Oracle database and create a target table.

Here I'm using the famous SCOTT demo schema.

create table bigquery_test
(
  tst_state   varchar2(2),
  tst_gender  varchar2(1),
  tst_year    number(4),
  tst_name    varchar2(255),
  tst_number  number
)
tablespace users;

Table BIGQUERY_TEST created.

 

Now it's time to code the hybrid cloud solution.

For the Oracle integration, you need to install the cx_Oracle Python package, which I've already described (just search for cx_Oracle on this site).
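
If you don't have it yet, the package can be installed from PyPI in the usual way:

pip install cx_Oracle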

After that, create a Python file and fill it with the code below.

import os

import cx_Oracle
from google.cloud import bigquery

# Oracle setup: point at the Oracle client installation and
# connect to the SCOTT schema where the target table was created
os.environ["ORACLE_HOME"] = "/home/app/product/18.0.0/client_1"

connection = cx_Oracle.connect('scott/tiger@orcl')
cursor = connection.cursor()

# BigQuery setup: credentials must be set before the client is created
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/GoogleCloud/Google_BigQuery/bigquery/bigquery-test-8943236d1bf.json"

client = bigquery.Client()

# Query I want to execute
query = (
    "SELECT * FROM `bigquery-public-data.usa_names.usa_1910_current` "
    'WHERE state = "TX" '
    "LIMIT 100"
)
query_job = client.query(
    query,
    # Location must match that of the dataset(s) referenced in the query.
    location="US",
)  # API request - starts the query

for row in query_job:  # API request - fetches results
    # Row values can be accessed by field name or index
    cursor.execute(
        "insert into bigquery_test "
        "(tst_state, tst_gender, tst_year, tst_name, tst_number) "
        "values (:tst_state, :tst_gender, :tst_year, :tst_name, :tst_number)",
        {
            'tst_state': row.state,
            'tst_gender': row.gender,
            'tst_year': row.year,
            'tst_name': row.name,
            'tst_number': row.number,
        },
    )

connection.commit()
cursor.close()
connection.close()
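
A side note: the loop above issues one INSERT round trip per row, which is fine for 100 rows but slow for larger result sets. As a sketch of an alternative, cx_Oracle's executemany() can send all rows in a single batch (it accepts a list of dictionaries for named bind variables):

# Batch alternative: collect the BigQuery rows first...
rows = [
    {
        'tst_state': row.state,
        'tst_gender': row.gender,
        'tst_year': row.year,
        'tst_name': row.name,
        'tst_number': row.number,
    }
    for row in query_job
]

# ...then insert them in one round trip
cursor.executemany(
    "insert into bigquery_test "
    "(tst_state, tst_gender, tst_year, tst_name, tst_number) "
    "values (:tst_state, :tst_gender, :tst_year, :tst_name, :tst_number)",
    rows,
)
connection.commit()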

 

To check whether the records have been populated, open Oracle SQL Developer and execute the following query:

select * from bigquery_test;

Oracle table populated with BigQuery data

 

Summary:

This is an example of how to build a hybrid cloud solution.

Similar code can be used to populate BigQuery with data from your local (Oracle) database, making the data flow in the opposite direction from this example.
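
A minimal sketch of that reverse flow, assuming the same SCOTT schema and a hypothetical BigQuery target table my_dataset.usa_names_copy that you would create first (insert_rows_json is the client library's streaming insert API):

import cx_Oracle
from google.cloud import bigquery

# Read the rows back from the local Oracle table
connection = cx_Oracle.connect('scott/tiger@orcl')
cursor = connection.cursor()
cursor.execute(
    "select tst_state, tst_gender, tst_year, tst_name, tst_number "
    "from bigquery_test"
)

rows = [
    {'state': state, 'gender': gender, 'year': year, 'name': name, 'number': number}
    for state, gender, year, name, number in cursor
]

cursor.close()
connection.close()

# Stream the rows into the (hypothetical) BigQuery target table
client = bigquery.Client()
errors = client.insert_rows_json("my_dataset.usa_names_copy", rows)
if errors:
    print("Insert errors:", errors)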

It would be interesting to measure the performance of a hybrid cloud solution, as it spans on-premise infrastructure and the cloud and adds many moving parts, but I'll leave that discussion for a future post.


