06. 12. 2021.

How to create a real time machine learning pipeline with StreamSets Transformer

Artificial Intelligence (AI), with its subset Machine Learning (ML), is probably one of the hottest topics in the IT industry today. Many companies are trying, with more or less success, to build AI algorithms into data pipelines to make smarter decisions. First of all, AI is a wide topic that requires knowledge of math, statistics, […]

29. 11. 2021.

Complex near real-time transformations in data pipelines

For many years, the daily ETL batch job was the dominant way to perform data transformations before loading data into a Data Warehouse. These days the requirements are quite different, starting with the most important one: new data has to be available for AI/ML and analysis in near real time. Moreover, classical DWH databases are […]

21. 01. 2021.

Kafka & TCP Retrans Error rate

Recently I had an interesting case where I found duplicate messages in the Kafka topics of a data pipeline. Duplicate records in Kafka topics can appear for many different reasons, but most of the explanations you can find cover only those related to the Kafka settings. In this article you […]