In a previous article, I described how to use BigQuery from a client machine through DataGrip (a cross-platform IDE for database management and development).
This is useful when you want to quickly get a result set from a query, and it allows you to stay within a tool you already use.
In this article I’ll describe how to integrate Google BigQuery with an Oracle database installed on your local network.
Instead of an Oracle database, you could also integrate BigQuery with an ESB (Enterprise Service Bus) or a streaming platform such as Kafka.
First you need to get the JSON key for your BigQuery service account, which is described in my previous article, and save that file on your local machine:
https://www.josip-pojatina.com/en/accessing-google-bigquery-from-your-local-machine/
After that, you need to install the BigQuery client libraries, which are available for several programming languages: C#, Go, Java, Node.js, PHP, Python and Ruby.
Installation is usually very simple, and you can find a good description at the following link:
https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python
In this article I’ll use Python, one of the most popular languages today, to keep the size of this post under control (other programming languages are usually more verbose).
The following command installs the Google BigQuery client library, as described at the link above.
pip install --upgrade google-cloud-bigquery
If you are using the Anaconda Python distribution, you should use the following link instead:
https://anaconda.org/conda-forge/google-cloud-bigquery
The following command (executed from a conda virtual environment) installs the BigQuery client libraries for the Anaconda Python distribution.
conda install -c conda-forge google-cloud-bigquery
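Regardless of which distribution you use, a quick way to verify the installation is to import the library and print its version (a minimal sanity check; the exact version number will vary):

from google.cloud import bigquery

# if the import succeeds, the client library is installed correctly
print(bigquery.__version__)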
The next step is to find the definition of the BigQuery table (or view).
To do that, connect to your Google Cloud project and find your table in the BigQuery console, as you can see in the following picture:
Next, I suggest clicking on the Preview tab to get familiar with the table content:
Finally, click on the Schema tab to see the data type of every column:
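If you prefer to stay in code, the same schema information can be retrieved with the client library itself. Here is a minimal sketch; it assumes the GOOGLE_APPLICATION_CREDENTIALS environment variable is already set (as in the full script below) and a reasonably recent version of the library that accepts plain string table IDs:

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("bigquery-public-data.usa_names.usa_1910_current")  # API request

# print the name, data type and mode of every column
for field in table.schema:
    print(field.name, field.field_type, field.mode)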
The next step is to connect to the Oracle database and create a target table.
Here I’m using the famous SCOTT demo schema.
create table bigquery_test
(
tst_state varchar2(2),
tst_gender varchar2(1),
tst_year number(4),
tst_name varchar2(255),
tst_number number
)
tablespace users;
Table BIGQUERY_TEST created.
Now it’s time to code the hybrid cloud solution.
For the integration with Oracle, you need to install the cx_Oracle Python package, which I’ve already described (just search for cx_Oracle within this site).
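Before wiring everything together, it’s worth doing a quick connectivity check from Python. A minimal sketch, using the same scott/tiger@orcl demo credentials as the rest of this article:

import cx_Oracle

# connect with the demo schema credentials and print the database version
connection = cx_Oracle.connect('scott/tiger@orcl')
print(connection.version)
connection.close()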
After that, create a Python file and fill it with the code below.
import os

# Oracle setup
import cx_Oracle

os.environ["ORACLE_HOME"] = "/home/app/product/18.0.0/client_1"

connection = cx_Oracle.connect('scott/tiger@orcl')
cursor = connection.cursor()

# BigQuery setup
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/GoogleCloud/Google_BigQuery/bigquery/bigquery-test-8943236d1bf.json"

from google.cloud import bigquery

client = bigquery.Client()

# the query I want to execute
query = (
    "SELECT * FROM `bigquery-public-data.usa_names.usa_1910_current` "
    'WHERE state = "TX" '
    "LIMIT 100"
)

query_job = client.query(
    query,
    # Location must match that of the dataset(s) referenced in the query.
    location="US",
)  # API request - starts the query

for row in query_job:  # API request - fetches results
    # Row values can be accessed by field name or index
    cursor.execute(
        "insert into bigquery_test "
        "(tst_state, tst_gender, tst_year, tst_name, tst_number) "
        "values (:tst_state, :tst_gender, :tst_year, :tst_name, :tst_number)",
        {'tst_state': row.state, 'tst_gender': row.gender,
         'tst_year': row.year, 'tst_name': row.name,
         'tst_number': row.number},
    )

# commit the inserts, then release resources
connection.commit()
cursor.close()
connection.close()
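For larger result sets, the row-by-row insert above becomes a bottleneck, since every execute() call is a separate round trip to Oracle. A batched variant with cursor.executemany() is usually noticeably faster. A sketch of just the loading part, under the same assumptions as the script above:

# collect the BigQuery rows first, then insert them in a single batch
rows = [(row.state, row.gender, row.year, row.name, row.number)
        for row in query_job]

cursor.executemany(
    "insert into bigquery_test "
    "(tst_state, tst_gender, tst_year, tst_name, tst_number) "
    "values (:1, :2, :3, :4, :5)",
    rows,
)
connection.commit()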
To check whether the records have been populated, execute the following query in Oracle SQL Developer (or any other SQL client):
select * from bigquery_test;
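If you don’t have SQL Developer at hand, the same check can be done from Python (a small self-contained sketch with the same demo credentials):

import cx_Oracle

connection = cx_Oracle.connect('scott/tiger@orcl')
cursor = connection.cursor()
cursor.execute("select count(*) from bigquery_test")
print(cursor.fetchone()[0])  # should print 100 for this example
cursor.close()
connection.close()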
Summary:
This is an example of how to create a hybrid cloud solution.
Similar code can be used to populate BigQuery with data from your local (Oracle) database, creating a data flow in the opposite direction from this example.
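As a rough illustration of that opposite direction, the streaming insert API can push Oracle rows into a BigQuery table. A minimal sketch, assuming a writable table my_dataset.bigquery_test with matching columns already exists in your project (both names are hypothetical), the credentials environment variable is set as before, and a client library version that provides insert_rows_json():

import cx_Oracle
from google.cloud import bigquery

connection = cx_Oracle.connect('scott/tiger@orcl')
cursor = connection.cursor()
client = bigquery.Client()

# read the rows from the local Oracle table ...
cursor.execute(
    "select tst_state, tst_gender, tst_year, tst_name, tst_number "
    "from bigquery_test"
)
rows = [
    {"state": state, "gender": gender, "year": year,
     "name": name, "number": number}
    for state, gender, year, name, number in cursor
]

# ... and stream them into BigQuery (returns a list of per-row errors)
errors = client.insert_rows_json("my_dataset.bigquery_test", rows)
print(errors or "all rows inserted")

cursor.close()
connection.close()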
It would be interesting to measure the performance of a hybrid cloud solution, as it spans on-premises infrastructure and adds many moving parts, but I’ll leave that discussion for a future post.