Complex near real-time transformations in data pipelines
For many years, the daily ETL batch job was the dominant way to perform data transformations before loading data into a Data Warehouse. These days the requirements are quite different, starting with the most important one: new data has to be available for AI/ML and analysis in near real time. Moreover, classical DWH databases are […]
When a visual monitoring tool appears to lie
A few days ago I was asked to take a look at two queries that show up among the top queries in the Oracle SQL Developer instance viewer. I’ve extracted two statements that are relevant for this case. The SELECT statements for both records are almost identical: From the SELECT statement it is obvious […]
Functional monitoring of a Microservices architecture using Apache Superset
Many of you who have started to develop modern apps using the Microservices approach have already learned that development tools, debuggers, performance monitoring and tracing lag behind the desired architecture. The situation is even worse when it comes to functional monitoring, where your goal is to find out what is going on with your system from […]
Missing columns in PrestoSQL
One of the first issues you run into when starting to use the PrestoSQL distributed query engine is missing columns of certain data types, especially numeric and all variants of date. This issue is usually caused by missing precision at the data source, which is not only one of the most common, but also one of the […]
Hibernate FetchType Eager performance issue
Quite recently I had an interesting case related to the quality of the code generated by Hibernate, in which I developed code that runs 25 thousand times faster than the code generated by the framework. It’s well known that when you decide to use a framework to speed up the development process, you silently agree […]
Tuning the connection pool in a modern Microservice architecture
A connection pool has always been a great way to ensure low latency when establishing a connection with a database, while at the same time keeping the number of open sessions under control. It’s one of the best ways to balance speed with resource consumption. With a connection pool in place, the connection is already established and ready […]
Trino (ex. Presto) – troubleshooting distributed transactions among various data sources
In this post I’ll demonstrate one of the many use cases of Presto technology that you might have overlooked: how to troubleshoot distributed transactions, which are very common these days as a result of complex Microservices architectures. In the following SELECT statement I’ll combine three different data sources (Oracle, Postgres and Kafka) by using good old […]
Trino (ex. Presto) – high performance distributed query engine
In this article I’ll share some of my experiences with Trino (ex. Presto), a high performance distributed query engine. First, some background on the Presto project. A couple of members of the Facebook infrastructure team created Presto to address problems they had with their 300-petabyte Hadoop Data Warehouse. The main goal of the […]
Postgres monitoring with Percona PMM
For those who are coming from the Oracle world, the best alternative database is probably Postgres, because of the many similarities between the two database engines (data types, tablespace concept etc.). However, one of the first things you want to do is to gain full control over what is going on in your database. If […]
Dockly - Swiss army knife for managing Docker containers without Kubernetes
These days more and more vendors embrace Docker containers as the preferred way of distributing software. In many cases you don’t need to deal with all the complexities that come with Kubernetes orchestration. Instead, you just want a simple tool that provides full control and visibility over what is going on in containers […]