30. 01. 2023.

Real-time Deep Learning at Bug Future Show conference

The tenth Bug Future Show will be held on Thursday, February 2, 2023. For the first time it will include a third parallel track, “.debug Future Show”, intended primarily for IT people, where, among other things, you can find my lecture titled “Real-time Deep Learning”, starting at 11:15. This is a short […]

30. 05. 2022.

How to get 100% cache hit rate by using Change Data Capture & Redis

In this blog post I’ll explain how to get a 100% cache hit rate by using CDC (Change Data Capture) technology and a Redis cache. There are multiple benefits to having a caching layer in front of a back-end database system. By fetching data from the cache instead of the back end, we actually free up valuable database resources for […]

25. 02. 2022.

Redis performance tuning – Top 10 mistakes

Redis is the most popular key-value data store and one of the most popular database systems overall. According to the DB-Engines ranking, https://db-engines.com/en/ranking/key-value+store (you can check the picture below), Redis holds the top position among key-value data stores by a large margin. Amazon DynamoDB, in second place, trails Redis by more than a factor of two. […]

26. 01. 2022.

StreamSets review – creating real time data pipelines in no time

In this post I’ll try to review StreamSets Data Collector, one of the most popular tools for creating smart data pipelines for streaming, batch and change data capture, which allows you to move data around in near real time. First I’d like to point out that the whole review is my own personal […]

06. 12. 2021.

How to create a real time machine learning pipeline with StreamSets Transformer

Artificial Intelligence (AI), with its subset ML (Machine Learning), is probably one of the hottest topics in the IT industry today. Many companies are struggling, with more or less success, to implement AI algorithms in data pipelines to make smarter decisions. First of all, AI is a wide topic which requires knowledge of math, statistics, […]

29. 11. 2021.

Complex near real-time transformations in data pipelines

For many years, the daily ETL batch job was the dominant way to perform data transformations before loading data into a Data Warehouse. These days the requirements are quite different, starting with the most important one: new data has to be available for AI/ML and analysis in near real time. Moreover, classical DWH databases are […]

05. 06. 2021.

Missing columns in PrestoSQL

One of the first issues you run into when starting to use the PrestoSQL distributed query engine is missing columns of certain data types, especially numeric types and all variants of date. This issue is usually caused by missing precision at the data source, which is not only one of the most common, but also one of the […]

01. 04. 2021.

Trino (ex. Presto) – troubleshooting distributed transactions among various data sources

In this post I’ll demonstrate one of the many use cases of Presto technology that you might have overlooked: how to troubleshoot distributed transactions, which are very common these days as a result of complex microservices architectures. In the following SELECT statement I’ll combine three different data sources (Oracle, Postgres and Kafka) by using good old […]

17. 03. 2021.

Trino (ex. Presto) – high performance distributed query engine

In this article I’ll share some of my experiences with Trino (ex. Presto), a high-performance distributed query engine. First, some introduction to the Presto project. A couple of members of the Facebook infrastructure team created Presto to address problems they had with a 300-petabyte Hadoop Data Warehouse. The main goal of the […]

16. 02. 2021.

Postgres monitoring with Percona PMM

For those who are coming from the Oracle world, the best alternative database is probably Postgres, because of the many similarities between the two DB engines (data types, tablespace concept, etc.). However, one of the first things you want to do is to get full control over what is going on in your database. If […]