Recently I got a chance to try YugabyteDB, one of the new-age databases that try to tackle new requirements such as scalability, resilience, high availability, Cloud/Hybrid readiness, and microservice-based architecture styles.
Although Yugabyte is a relatively young company, it attracts a lot of attention, not only from architects, developers, and admins, but also from the business side.
You can find more about the latest Yugabyte funding round and company valuation online; in the NewSQL field, Yugabyte has become the second unicorn, after CockroachDB:
https://www.cockroachlabs.com/blog/explosive-growth-2021/
According to the DB-Engines rankings, both distributed databases are on the rise, which confirms NewSQL as an excellent concept for the new age. You can find out more here:
https://db-engines.com/en/ranking/relational+dbms
Although I haven't tested CockroachDB yet, it shares a similar concept with Yugabyte: the Raft consensus protocol, a storage layer built on Facebook's RocksDB key/value store, and a Postgres API on top.
According to the documentation, the major difference between them is the level of compatibility with Postgres, where Yugabyte scores higher than CockroachDB.
You can check CockroachDB compatibility with Postgres here:
https://www.cockroachlabs.com/docs/stable/postgresql-compatibility.html
and Yugabyte compatibility with Postgres here:
https://docs.yugabyte.com/latest/explore/ysql-language-features/
In essence, like most other NewSQL databases, YugabyteDB is built on a layered architecture:
- Query layer: YSQL (Postgres API), YCQL (Cassandra API), and YEDIS (Redis API) are the endpoints that accept client requests and are responsible for query compilation, execution plan optimization, and query execution
- DocDB: a distributed document store inspired by Google Spanner, built on top of Facebook's RocksDB, with the following features:
- consistent writes based on the Raft protocol
- failure resiliency based on the number of nodes/replicas
- automatic sharding and load balancing in the case of hash partitioning
- zone/region-aware Cloud data placement based on the tablespace concept (see the sketch below)
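For illustration, here is a minimal sketch of how zone-aware placement and tablet pre-splitting can be requested through YSQL. The tablespace name, cloud/region/zone values, and the example tables are all hypothetical, so treat this as a conceptual example rather than a recipe:

-- tablespace that pins one replica to each of three zones
CREATE TABLESPACE us_east_ts WITH (replica_placement='{
  "num_replicas": 3,
  "placement_blocks": [
    {"cloud":"aws","region":"us-east-1","zone":"us-east-1a","min_num_replicas":1},
    {"cloud":"aws","region":"us-east-1","zone":"us-east-1b","min_num_replicas":1},
    {"cloud":"aws","region":"us-east-1","zone":"us-east-1c","min_num_replicas":1}]}');

-- hash-sharded table pinned to that tablespace
CREATE TABLE orders (id bigint, details text, PRIMARY KEY (id HASH))
TABLESPACE us_east_ts;

-- a hash-sharded table can also be pre-split into a fixed number of tablets
CREATE TABLE events (id bigint, PRIMARY KEY (id HASH)) SPLIT INTO 8 TABLETS;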
Although YugabyteDB is an Apache 2.0 licensed product, you'll normally need to purchase Yugabyte Platform to easily install and manage your clusters and to get support.
During the installation you can choose from the following list of deployment options, which should satisfy all your needs:
- Cloud (AWS, Azure, GCP)
- VMware Tanzu
- RedHat OpenShift
- Kubernetes
- On-premises
The following picture shows the starting screen of the Yugabyte Platform Web UI.
A three-node cluster is the smallest production-ready configuration you can have.
With three nodes, the cluster can survive the failure of one node with no interruptions.
I've tested that case while streaming data into YugabyteDB, and it really works like a charm.
Transaction handling in Yugabyte:
Transactions are the most critical part of every RDBMS.
Yugabyte is an ACID-compliant distributed database that supports the SERIALIZABLE and SNAPSHOT (REPEATABLE READ) isolation levels, while READ COMMITTED is mapped to REPEATABLE READ.
When a transaction needs to update multiple rows across multiple shards/tablets and multiple nodes, Yugabyte relies on a globally synchronized clock called Hybrid Logical Clocks (HLC), similar to TrueTime in Google Spanner.
The HLC is critical for ordering events (transactions) across multiple shards/nodes, since it maintains tight error bounds among the nodes in the cluster.
This is much better than using the ancient two-phase commit (2PC) protocol for committing distributed transactions.
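As a minimal sketch (the accounts table and values are hypothetical), a multi-row, multi-shard transaction looks exactly like it would in Postgres; the HLC-based ordering happens transparently underneath:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- the two rows may live in different tablets on different nodes
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;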
Yugabyte “performance tests”
The reason I put double quotes around "performance tests" is that these are not classical performance tests on a production cluster.
The main goal of the tests I performed is to try features on my local VirtualBox cluster and see conceptually how they work.
For that reason, instead of real measurements, I'll use terms like "faster" or "slower"; if you want real numbers, you can run the same tests on your own system.
I'm comparing Yugabyte with a single-instance Postgres 13 database: two completely different RDBMSs that share the same interface.
It is also important to note that Postgres has fewer options when you need to satisfy HA and scale-out requirements as the volume of data grows.
Test 1: Loading 1.5 million records into a table
For the initial load, I'm getting better results with Postgres, mainly due to the work a distributed database needs to perform behind the scenes: table sharding, replication (three copies for a three-node cluster), indexing, network round trips, etc.
For that reason, this is an expected result.
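Conceptually, such a load boils down to something like the following sketch (the load_test table and row layout are illustrative); the same statements run on both Postgres and YSQL:

-- create a simple test table and load 1.5 million synthetic rows
CREATE TABLE load_test (id bigint PRIMARY KEY, payload text);
INSERT INTO load_test
SELECT i, md5(i::text) FROM generate_series(1, 1500000) AS i;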
Test 2: Reading 1.5 million records
To read all rows from the table, Yugabyte needs to read all shards across all nodes.
Even in a single-instance RDBMS, reading all rows from a hash-partitioned table is slower than from an unpartitioned table.
If you add shards placed on different nodes, plus network latency on top of that, it is expected that Postgres is faster than YugabyteDB.
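The read side of this test is simply a full scan over the same illustrative table:

-- every shard on every node has to be read
EXPLAIN ANALYZE SELECT * FROM load_test;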
Test 3: Reading small number of records
In this test Yugabyte and Postgres both return the result in approximately the same amount of time.
Therefore, the best use case is a query that uses an index to avoid a full table scan (FTS) and returns a small number of rows (a WHERE condition on a highly selective column).
The reason is that the table is HASH partitioned into shards, with indexes cached across the nodes.
There might be some variance in the results, depending on whether all rows come from a single shard/node or not.
This is the kind of load you can expect in a microservice-based architecture.
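A sketch of this access pattern on the same illustrative table:

-- point lookup on the primary key: only the owning tablet is touched
EXPLAIN ANALYZE SELECT * FROM load_test WHERE id = 42;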
Here are a couple of slides from the performance tests:
Execution plan tests:
In this test I'm using the famous SCOTT schema from the Oracle examples, imported into both Postgres and YugabyteDB.
Here is the execution plan for the query of interest in Postgres:
explain analyze
select * from test_in.emp e, test_in.dept d where e.deptno = d.deptno;
Hash Join  (cost=23.95..40.02 rows=480 width=244) (actual time=11.210..11.700 rows=14 loops=1)
  Hash Cond: (e.deptno = d.deptno)
  ->  Seq Scan on emp e  (cost=0.00..14.80 rows=480 width=142) (actual time=6.069..6.072 rows=14 loops=1)
  ->  Hash  (cost=16.20..16.20 rows=620 width=102) (actual time=5.040..5.169 rows=4 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on dept d  (cost=0.00..16.20 rows=620 width=102) (actual time=5.014..5.142 rows=4 loops=1)
Planning Time: 29.897 ms
Execution Time: 11.763 ms
The same query in YugabyteDB:
explain analyze
select * from emp e, dept d where e.deptno = d.deptno;
Nested Loop  (cost=0.00..213.89 rows=1000 width=244) (actual time=36.547..65.982 rows=14 loops=1)
  ->  Seq Scan on emp e  (cost=0.00..100.00 rows=1000 width=142) (actual time=32.725..32.924 rows=14 loops=1)
  ->  Index Scan using dept_pkey on dept d  (cost=0.00..0.11 rows=1 width=102) (actual time=2.302..2.302 rows=1 loops=14)
        Index Cond: (deptno = e.deptno)
Planning Time: 0.408 ms
Execution Time: 66.180 ms
The important thing to note here is that Postgres uses a hash join, while Yugabyte uses a nested loop.
If you look at the estimated costs/rows, Yugabyte is using predefined values (rows=1000), while Postgres bases its estimates on statistics (it's cost based, not heuristics based).
All in all, the Postgres CBO generates a better execution plan than Yugabyte, and execution for this test is about 6x faster.
The reason is the complexity every distributed database has to face: locality, network round trips across the nodes, sharding by HASH by default, etc.
My guess is that this might be the reason why YugabyteDB lets you use hints through the pg_hint_plan Postgres extension, which is enabled by default.
The same extension doesn't work (or at least I failed to make hints work) with Postgres 13.
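For reference, in vanilla Postgres the pg_hint_plan extension normally has to be installed and loaded explicitly before hints are honored; assuming its packages are present on the server, a typical session-level setup looks like this:

-- load the extension for the current session and enable hint parsing
LOAD 'pg_hint_plan';
SET pg_hint_plan.enable_hint = on;
-- alternatively, add pg_hint_plan to shared_preload_libraries in postgresql.conf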
A corrected execution plan in Yugabyte can be produced by using hints, as in this example:
explain analyze
select /*+ HashJoin(e d) */ * from emp e, dept d where e.deptno = d.deptno;
Hash Join  (cost=112.50..215.14 rows=1000 width=244) (actual time=4.333..4.522 rows=14 loops=1)
  Hash Cond: (e.deptno = d.deptno)
  ->  Seq Scan on emp e  (cost=0.00..100.00 rows=1000 width=142) (actual time=2.480..2.646 rows=14 loops=1)
  ->  Hash  (cost=100.00..100.00 rows=1000 width=102) (actual time=1.829..1.830 rows=4 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on dept d  (cost=0.00..100.00 rows=1000 width=102) (actual time=1.799..1.814 rows=4 loops=1)
Planning Time: 0.304 ms
Execution Time: 4.722 ms
This time we get the correct execution plan, which is about 2x faster than Postgres, although the estimations (e.g. rows=1000) are still wrong.
Summary:
Yugabyte is a modern, scalable, geo-distributed, resilient (every shard/tablet is replicated among the data nodes) database, which is very easy to use and probably the most Postgres-compatible distributed database on the market.
If you are coming from the Oracle world, you'll learn to appreciate the simplicity of all the operations that are part of the daily DBA routine, such as backup and recovery, creating a cluster of nodes, horizontal scaling, and monitoring.
In a minute you can install and set up a cluster on the Cloud, on-prem, or on Kubernetes.
On the positive side I would also add the team's readiness to listen, whether by adding missing features or by dealing with other issues.
It is important to understand what use cases Yugabyte is trying to solve.
If you are migrating legacy architecture and applications to a modern microservice-based architecture, where you join a couple of tables and execute queries with a highly selective WHERE clause (usually primary key - foreign key relations), then YugabyteDB is the right choice for transactional/OLTP loads.
In other cases it might be better to stay with classical SQL databases such as Oracle, Postgres, MySQL, or MS SQL Server.
What I miss the most is the ability to leverage Yugabyte for data analysis/BI types of workload, which would probably require adding a columnar analytic storage engine alongside the existing DocDB.
It means that if you want to perform data analysis for AI or Business Intelligence, you'll need to move the data out by creating an ETL workflow that ingests it into a BI database, which adds complexity and expense.
That is probably also the reason why LIST partitioning is not supported yet, while RANGE partitioning is possible but not recommended due to possible partition imbalance.
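For context, in YugabyteDB the sharding strategy is chosen per table through the primary key definition; a minimal sketch with illustrative table names:

-- default hash sharding: rows are spread evenly across tablets
CREATE TABLE events_hash (id bigint, PRIMARY KEY (id HASH));
-- range sharding: preserves key order, but a skewed key can leave tablets imbalanced
CREATE TABLE events_range (id bigint, PRIMARY KEY (id ASC));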
However, since the move towards microservice-based architecture is predominant, there will be more than enough cases where Yugabyte is a perfect fit.
I've also found a couple of issues: the database Web console UI could be better, transaction savepoints and some other features are missing (for example, it is only possible to perform recovery at the database level, not at the tablespace or table level), plus a few bugs, but nothing unsolvable.
Since YugabyteDB will cover just a fraction of your use cases, you'll still need to maintain classical RDBMS databases, which is something your DBAs won't like (they want to maintain one DB for all cases).
Execution plans are a weak point of YugabyteDB, but we need to understand the context first.
Oracle, for example, is a database leader that has been developing its RBO/CBO (rule-based and cost-based optimizers) for decades, and it has been 24 years since Oracle introduced the CBO in version 8.
Despite that maturity and all the technologies implemented over the years (cardinality feedback, adaptive plans, SQL plan baselines, profiles, outlines, histograms, etc.), it still often generates wrong execution plans or decides to change a plan.
While Oracle and other RDBMSs are mainly single-instance systems (everything happens within a single namespace), with distributed databases it's many times more complex.
This is why, for all distributed databases, I would suggest writing simple queries with PK-FK relations over a few tables, which is exactly the use case for a microservice-based architecture.
You might still need to correct the optimizer's execution plans, since the optimizer is mainly rule based (driven by heuristics, not real costs), and for that you can use pg_hint_plan.
In the next blog post I'll describe another wonderful distributed database, which is quite different from this one.
Disclaimer:
All impressions and opinions presented in this article are mine alone.
If you are considering YugabyteDB or any other NewSQL distributed database, you will need to perform a deep analysis for your particular case.