This post is a kind of extension of my previous post named “How to index only rows of interest”
which can be found on the following link:
https://www.josip-pojatina.com/en/how-to-index-only-rows-of-interest/
Having issues with pooling queries are very common in old, SOA based architectural style, where ESB (Enterprise Service Bus) are throwing the same query again and again just to check if there are new rows you need to fetch, no matter if such rows exist or not (this is defined by pool interval).
Such queries may adds significant (unnecessary) overhead to your database which, in case you are using commercial rdbms, will add additional license costs (commercial databases are usual licensed by CPU).
Additional issue with ESB (all popular Enterprise Service Bus software such as Oracle Service Bus, Tibco etc. share the similar principle of work) is in that you can only control the pool interval (frequency of how often you want to execute the query to check if there are new records or not), but not the query itself, since query is generated by ESB.
In this particular case, besides the old and inefficient architectural style and method used (pooling instead of CDC – Change Data Capture event pushing), additional mistake is in very complex pooling query with complex conditions within the WHERE clause.
Here is the original SQL pooling query:
SELECT /*+ index(A API_COMMANDS)*/MESSAGE_ID FROM API_COMMANDS A WHERE A.SYNC_STATUS IS NULL AND A.SOURCE_ID NOT LIKE '0%' AND A.SOURCE_ID NOT LIKE '%PRI' AND MOD (REGEXP_REPLACE(SOURCE_ID, '[^0-9]', ''),4)= 0 AND ROWNUM < 10 AND EXISTS ( SELECT 'x' FROM XCUS WHERE ACCOUNT_NUMBER=SUBSTR(A.SOURCE_ID,1,8) AND CONTRACT_NUMBER=SUBSTR(A.SOURCE_ID,9,3) AND CAT_CODE='PRI' )
The query takes 127,237 sec to comolete and here is its execution plan.
Take a note of a very high cost of execution.
To speedup the query, considering you can’t change the query itself since it is generated by the ESB, I’ll combine SQL Plan Baseline Oracle’s feature with virtual columns.
Here is what you need to do:
alter table jp_commands add virtualcol number generated always as
(case when sync_status is null and source_id not like '0%' and source_id not like '%PRI' then
(mod (regexp_replace(source_id, '[^0-9]', ''),4)) else Null end)
virtual;
create index jp_commands_virtualcol on jp_commands(virtualcol);
with tmp as
(SELECT /*+ index(a jp_commands_virtualcol)*/ MESSAGE_ID, SUBSTR(A.SOURCE_ID,1,8) as ACCOUNT_NUMBER, SUBSTR(A.SOURCE_ID,9,3) as CONTRACT_NUMBER
FROM jp_commands a
WHERE
virtualcol = 0
)
select * from tmp a
where EXISTS ( SELECT /*+ index(xcu cucont_pk) */ 'x' FROM XCU WHERE ACCOUNT_NUMBER=a.ACCOUNT_NUMBER AND CONTRACT_NUMBER=a.CONTRACT_NUMBER AND CAT_CODE='PRI' )
and rownum < 10;
After the change, pooling query now takes 3,067 sec which is 40,2x performance boost when compared with original query, and down below you can find a new execution plan.
Additionally, you can see significantly lower value for the execution plan.
After some talks with the folks involved I realized that the reason for REGEX part of the query
“REGEXP_REPLACE(SOURCE_ID, ‘[^0-9]’, ”),4)= 0”
was because of the very poor performance of the pooling query, where the purpose of REGEX was to create 4 parallel threads which actually add significant overhead on the source Oracle database.
After removing the REGEX part from the query (virtual column and its corresponding function based index), we get the pooling query to run in 0,037 sec which is 3.330,2x (almost three and a half thousand times faster) performance boost over the initial query.
Summary:
This is a really good example why you need to follow advantages in IT.
By replacing pooling in SOA driven architecture with event driven CDC (Change Data Capture) architecture style, you will avoid the performance issues from the beginning.
However, even with older architecture styles you can avoid performance issues by having a good database design which will be reflected in simpler WHERE clause in pooling queries.
Thus by correcting database/schema design, you will also achieve excellent performance.
Unfortunately, correcting the database design is time consuming since you’ll need to change the surrounding code, which can take a long time to complete.
Luckily, in this case it was an Oracle database in the background which allows me to correct all above mentioned mistakes and restrictions (SQL code is generated by the ESB, thus you can’t change it in a way to influence the execution plan).
In case of different rdbms (e.g. Postgres, MySQL, MS SQL Server…) it might be more difficult or even impossible to do the same.
Comments