How to create a real time machine learning pipeline with StreamSets Transformer
Artificial Intelligence (AI), together with its subset Machine Learning (ML), is probably one of the hottest topics in the IT industry today. Many companies are struggling, with more or less success, to build AI algorithms into data pipelines to make smarter decisions. First of all, AI is a wide topic that requires knowledge of math, statistics, […]
Complex near real-time transformations in data pipelines
For many years, the daily ETL batch job was the dominant way to perform data transformations before loading data into a data warehouse (DWH). These days the requirements are quite different, starting with the most important one: new data has to be available for AI/ML and analysis in near real time. Moreover, classical DWH databases are […]
Processing billions of records with Python & Oracle
Suppose you want to analyze a data set using your favorite tools (Pandas/NumPy). After reading my previous article, How to efficiently load data into Python from the Oracle RDBMS, you should realize how important it is to do as much of the data processing as possible at the database SQL engine layer to get a dataset suitable for […]
How to efficiently load data into Python from the Oracle RDBMS
Python offers many different ways to fetch the data needed for processing. Although the majority of examples you can find use a CSV file load, fetching data from a database is still the most common way in practice. For all tests I’ll use the SALES table from the Oracle SH sample schema. 1. […]
CUDA Tuning – GPU card details
In the last post I explained how to install the NVIDIA CUDA Toolkit 10.1 on Ubuntu 18.04 LTS. Details can be found on the following pages: https://www.josip-pojatina.com/en/how-to-install-cuda-toolkit-on-ubuntu-18-04-lts/ or https://www.performatune.com/en/how-to-install-cuda-toolkit-on-ubuntu-18-04-lts/ In this article I’ll explain the most important card details you need to know. The prerequisite for this article (besides having the NVIDIA drivers and Toolkit 10.1 installed) is to […]
Performance comparison: Python & cx_Oracle versus SQL*Plus & SQL*Net
In this article I want to check whether Python really is slow when it comes to retrieving data directly from an Oracle database. The second goal is to show graphically the impact of changing the array size on performance, which is not possible or convenient to do with SQL*Plus. Python is widely used as […]
Python & Oracle Instant Client connection setup on Linux – part 2
In the previous article I described how to install Oracle Instant Client and set up the cx_Oracle Python driver correctly. You only need to install the Basic package: Version 18.3.0.0.0. Base – one of these packages is required. Basic Package – All files required to run OCI, OCCI, and JDBC-OCI applications. Download instantclient-basic-linux.x64-18.3.0.0.0dbru.zip (72,794,506 bytes) (cksum – 3435694482) […]
Python & Oracle Instant Client connection setup on Linux
In the previous post, https://www.josip-pojatina.com/en/python-oracle-connection-options/, you can find how to connect to an Oracle database using the cx_Oracle Python driver, a full Oracle Client installation and a Red Hat RPM-based distribution (Red Hat, CentOS, Oracle Linux, Fedora). In reality, more than 90% of all Linux servers in the cloud run Ubuntu (unlike the on-premise situation where […]
Python – Oracle connection options
In this blog I’ll present several ways of connecting to an Oracle database. As a first step, to connect to an Oracle database you need to import the cx_Oracle package. You can think of cx_Oracle as the equivalent of the Oracle JDBC driver for Java programmers. Since its first public appearance in July 2017, Oracle has been constantly improving the cx_Oracle Python driver by adding […]
Python as bash replacement
Even today, on many projects I find developers still using bash and Korn shell, or even Pro*C, as the main tool for developing scripts that will be executed as part of a batch job in one of the following ways: Unix/Linux cron, Oracle’s dbms_job / dbms_scheduler, commercial enterprise job scheduling software […]