Connection pool has always been a great way to ensure a low latency when establishing connection with a database, while at the same time keeping the number of open sessions under control.
It’s one of the best ways to balance speed with resource consumption.
With connection pool in place, connection is already established and ready to receive client requests for service with a minimal latency.
Additionally, connection pool also protects the database server by limiting the maximum number of connections that can be used.
Vast majority of connection pools share common parameters to configure pool such as:
- Maximum pool size – maximum size that the pool can reach, measured in number of connections.
- Minimum pool size – minimum number of open idle connections that will always be available in the pool.
- Connection timeout – max number of milliseconds that a client will wait for connection from the pool.
- Idle Timeout – max number of milliseconds that connection will be idle in the pool. After the time is reached, pool will automatically close the connection, assuming that a new number of connections in the pool will be grater or equal to minimum pool size.
There are a plenty of other parameters (e.g. auto commit, prepare cache etc.), but previously listed parameters are the most important.
Despite the fact that connection pool is rather old technology available for a long time (especially with three tier architecture), it is also one of the most misunderstood technology overall.
With three tier architecture style, Middleware admins will just guess and throw some numbers for connection pool parameters, without understanding the technology in the background.
Luckily, at that time, they usually cannot make a big mistake.
On the picture above, you can see the classical three tier architecture.
Basically, you have clients that connect to Middleware server (AppServer or ESB – Enterprise service bus) where you can define connection pool, and Database server at the end.
You can define domains and clusters of machines on Middleware server side, but usually small domains or cluster of a few machines will cover all your needs.
As long as the number of Middleware is small, you can’t make a huge mistake with overestimating the connection pool size, and other important parameters, even without knowing the technology behind.
With a modern architecture styles, things are starting to get complicated.
On the slide above you can see the classical problem with Microservice architecture deployed on Kubernetes with many pods, where each pod has its own connection pool.
Since the number of pods can easily reach large numbers, you can end up with a huge number of idle connections on the database server which can slow down instead of improving performance.
You can use the following command to figure out what is going on:
[user@server ~]$ netstat -tn 2>/dev/null | grep :5432 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head
395 <OpenShift IP address>
18 <IP address 1>
6 <IP address 2>
1 <IP address 3>
By using the command above, you can observe that most connections are coming from one server (OpenShift in this case), where many poods are establishing connection pool targeting the same database.
Keeping hundreds or thousands of (mostly) idle connection is, for many reasons (context switching, starting the inactive process, memory distribution on the Db side etc.), against “best practices”.
In a lack of knowledge, people are trying to fix the problem by introducing Database connection pool.
If you are using Oracle, you could leverage RCP – Resident Connection Pooling, which is introduced in 11g (long time ago).
In case of Postgres, you might use PgBouncer as you can see on the following slide.
Although at the beginning database connection pool looks promising to fix lack of knowledge problem, you should also understand the downside of using that technology as well (e.g. failover or, in a case of PgBouncer issue with prepared statements etc).
For that reason, it’s always the best to tune connection pool parameters by performing your typical load.
In couple of iterations, you’ll end up with correct parameters without using an extra component (Db side connection pool), but still using modern, resilient architecture such as Microservices based architecture.
At the end, I want to show one real world example, where by reducing the connection pool from 320 → 40 and changing the pool parameters, results for the three critical database SQL/functional calls has been improved by up to the 50%, and resource consumption (CPU, memory, DB time etc.) was down by 20%.
With connection pool saying – “less is more”, most accurately describe the technology behind.
Comments