In this article I want to check whether there is any truth to the claim that Python is slow when it comes to retrieving data directly from an Oracle database.

The second goal is to show graphically the impact of changing the array size on performance, which is not convenient to do with SQL*Plus.

 

Python is widely regarded as the best tool for Machine Learning and AI, which are among the most compute-intensive tasks a computer can solve.

On the other hand, Python is a dynamic, interpreted language, which should mean that it is also very slow at CPU-intensive tasks.

Although fetching data from a database is more I/O intensive than CPU intensive, I want to explore how efficient Python/cx_Oracle is, compared with SQL*Plus & SQL*Net, at fetching data out of an Oracle database.

This article also shows how convenient Python is for testing and proving a point (in this case, the effect of the array size).

I’m using Oracle’s SH demo schema and its SALES table, which has almost 1 million rows.

I assume you have the Oracle client (or Instant Client) installed. If you don’t, you can search my blogs, where I explain how to do that properly.

As a prerequisite you need Python 3+ (I’m using 3.6) and the cx_Oracle driver installed.

You can find all of that in one of my previous articles.
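If you want a quick sanity check that everything is in place before running the script, a few lines like these will do (just a sketch; it only prints versions and fails if the Oracle client libraries can’t be found):

# Quick sanity check: Python version, cx_Oracle driver version, Oracle client version
import sys
import cx_Oracle

print("Python version:   ", sys.version)
print("cx_Oracle version:", cx_Oracle.version)
print("Client version:   ", cx_Oracle.clientversion())   # raises an error if no Oracle client libraries are found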

Here is the code you can use:

import cx_Oracle
import datetime
import os

import matplotlib.pyplot as plt

# Point to the Oracle client installation (adjust the path for your environment)
os.environ["ORACLE_HOME"] = "/path/to/oracle/client/home"

connection_string = cx_Oracle.makedsn(host='localhost', port=9999, service_name='orcl.localdomain')
con = cx_Oracle.connect(user='sh', password='sh', dsn=connection_string)

sql = 'SELECT * FROM sales'
cur_array_size = (1, 10, 100, 500, 1000, 2000, 3000, 4000, 5000)
elapsed = []

try:
    for size in cur_array_size:
        cur = con.cursor()
        cur.arraysize = size                                  # rows fetched per round trip
        begin_time = datetime.datetime.now()
        rows = cur.execute(sql).fetchall()                    # fetch the whole SALES table
        delta = datetime.datetime.now() - begin_time
        elapsed.append(round(delta.total_seconds(), 2))
        print('array size =', size, 'elapsed(sec)', delta)
        cur.close()
except cx_Oracle.DatabaseError as ora_exception:
    error, = ora_exception.args
    print("Oracle Error Code: ", error.code)
    print("Oracle Error Message: ", error.message)

con.close()

# Print the raw (array size, elapsed seconds) pairs and plot them
mat = list(zip(cur_array_size, elapsed))
print(mat)

plt.plot(cur_array_size, elapsed, linewidth=4)
plt.title("Array size / Elapsed time in sec")
plt.xlabel("Array size")
plt.ylabel("Seconds")
plt.grid(True, linestyle='-', color='0.75')
plt.show()
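A side note on the script above: fetchall() pulls the entire result set into client memory at once. If that becomes a concern, you can iterate the cursor instead; cx_Oracle still transfers rows from the database in batches of arraysize rows, so the round-trip savings are the same. A minimal sketch, reusing the same connection details as in the script:

import cx_Oracle

dsn = cx_Oracle.makedsn(host='localhost', port=9999, service_name='orcl.localdomain')
con = cx_Oracle.connect(user='sh', password='sh', dsn=dsn)

cur = con.cursor()
cur.arraysize = 1000                      # rows transferred per round trip
cur.execute('SELECT * FROM sales')

row_count = 0
for row in cur:                           # only one batch of rows is held in memory at a time
    row_count += 1

print('rows fetched:', row_count)
cur.close()
con.close()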

The next two figures show the impact of the array size graphically.

Everything else remains the same as in the previous article.

Here are the numerical results, so you can compare them with SQL*Plus & SQL*Net:

  • array size = 1 elapsed(sec) 0:06:49.438523
  • array size = 10 elapsed(sec) 0:01:14.309449
  • array size = 100 elapsed(sec) 0:00:03.177730
  • array size = 500 elapsed(sec) 0:00:01.890235
  • array size = 1000 elapsed(sec) 0:00:01.612259
  • array size = 2000 elapsed(sec) 0:00:01.457572
  • array size = 3000 elapsed(sec) 0:00:01.500079
  • array size = 4000 elapsed(sec) 0:00:01.413637
  • array size = 5000 elapsed(sec) 0:00:01.254974

If you compare the Python results with the SQL*Plus & SQL*Net results from the previous article, they are almost identical.
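To put those numbers into perspective, a few lines of Python turn the list above into speedup factors relative to array size 1 (the elapsed values below are simply the results above, rounded to two decimals):

# Elapsed seconds copied from the results above (array size -> seconds)
results = {1: 409.44, 10: 74.31, 100: 3.18, 500: 1.89, 1000: 1.61,
           2000: 1.46, 3000: 1.50, 4000: 1.41, 5000: 1.25}

baseline = results[1]
for size, secs in results.items():
    print('array size {:>5}: {:>7.2f} s  (speedup vs. size 1: {:>5.1f}x)'.format(size, secs, baseline / secs))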

 

Summary:

I’ve already described the impact of the array size on performance.

Here I added a graph to visualize the results, which is not convenient to do with SQL*Plus.

But the main reason for writing this article is to show that the performance of cx_Oracle is practically identical to the SQL*Plus & SQL*Net combination.

I’ll continue to break the myths around Python, as I’ll use it in many future articles.


