One of the largest table in Oracle Retail is the TRAN_DATA_HISTORY table which holds stock ledger financial transaction data history.
Due to business reporting reason, customer decided to disable salprg Pro*C batch which deletes TRAN_DATA_HISTORY records that are older than the user defined number of retention days defined in SYSTEM_OPTIONS table in the column tran_data_retained_days_no.
For that reason, TRAN_DATA_HISTORY table reach 1 Tb size, and batches/queries were running slowly.
Default partitioning strategy is range partitioning with 13 values based on TRAN_DATE column.
After performing detail analyze, there is another one column used in all queries: LOCATION
For that reason I decided to use both columns, one for partition and the other one for the sub partition.
There are two kind of tests I wanted to perform:
1.
To check if order of columns has influence on performance (to choose first LOCATION for partition and TRAN_DATE for sub partition, and other case when I choose TRAN_DATE for partition and LOCATION for sub partition).
2.
If there are influence regarding partitioning type on query and DML performance (YEAR will always be range partitioned (by column TRAN_DATE), but I change partitioning type for the LOCATION column: range, list and hash).
In case that I’m going to describe here, there are about 1300 locations and 4 years of data history.
I’m using the following SQL statement that targets both: partition and sub partition (partition elimination):
select * from tdh
where
location = 96451
and tran_date between to_date('01.01.2016', 'dd.mm.rrrr') and to_date('22.05.2016', 'dd.mm.rrrr');
——————————
LOCATION – YEAR
——————————
HASH – RANGE 41,28 sec
RANGE – RANGE 49,51 sec
LIST – RANGE 40,22 sec
——————————
YEAR – LOCATION
——————————
RANGE – HASH %
RANGE – RANGE %
RANGE – LIST 39,56 sec
Summary for first question: Table partitioned by LOCATION and sub partitioned by YEAR has identical performance as table partitioned by YEAR and sub partitioned by LOCATION.
You have to decide which column should be use as partition will probably be decided by some logic like archiving for example. If your table is of type YEAR – LOCATION, then it should be easier to archive or delete year 2013 for example.
To answer the second question, I’ve partitioned LOCATION column first by list, then by by range and at the end by hash, and leave YEAR partitioning as before (range).
Summary for second question:
You cannot make a big mistake with partition types if you are in position to choose among range-list-hash partition types as partition type doesn’t influence much on SELECT query performance.
Very often there could be some other reasons to choose particular partition type. For example, if the number of LOCATION will increase, you’ll have to perform split partition maintenance in case of range or list partition types.
If the column on which partitions are based is of DATE type, it is natural to choose RANGE partition over LIST.
But in case like this, where the column on which partitions will be created is of type NUMBER (or VARCHAR) with distinct values, it would be better to choose LIST partition type, because it will be easier to identify partition that must be split.
With hash partition type, you don’t have to perform any partition maintenance.
Once the partitions becomes too large, you can easily add 2^N partitions, and Oracle will equally redistribute rows among all available partitions.
It’s clear to see that hash partitioning type is the most elegant type as it require the least amount of partition maintenance, equally rows distribution and excellent query performance when you are using hash column in the where condition.
There is still one question that I didn’t answer.
All results comes from the SELECT statement.
What are the best option for DML (and batch loading)?
I’ve created two tables: first for LOCATION – YEAR partition – sub partition types and the second one for the YEAR – LOCATION combination, and execute the following statement:
insert /*+ append */ into tdh_test
select * from rms.tran_data_history where rownum < 5000000;
——————————
LOCATION – YEAR
——————————
HASH – RANGE 13,73 sec
RANGE – RANGE 13,96 sec
LIST – RANGE 24,35 sec
——————————
YEAR – LOCATION
——————————
RANGE – HASH 14,28 sec
RANGE – RANGE 16,35 sec
RANGE – LIST 26,76 sec
Summary of results:
Ordering of columns used for build partition – sub partition table (LOCATION – YEAR or YEAR – LOCATION) doesn’t have influence on DML performance .
But, as can be seen from the results, if you are using LIST type to create either partitions or sub partitions, DML performance, in this case, were 2 times slower comparing it with HASH or RANGE types.
I didn’t show DML execution statistics here (to make a blog a bit shorter), but results for LIST partition (recursive calls, db block gets, consistent gets) are even better than results for HASH and PARTITION types.
To be able to find out why DML (INSERT) are almost 2 times slower for LIST partition type, I did tracing for two cases for LOCATION – YEAR: first LIST – RANGE and after that RANGE -RANGE.
This is summary for the LIST -RANGE:
Sort options: prsela exeela fchela
1 session in tracefile.
2 user SQL statements in trace file.
40 internal SQL statements in trace file.
42 SQL statements in trace file.
36 unique SQL statements in trace file.
283608 lines in trace file.
48 elapsed seconds in trace file.
and this is for RANGE – RANGE
Sort options: prsela exeela fchela
1 session in tracefile.
2 user SQL statements in trace file.
4 internal SQL statements in trace file.
6 SQL statements in trace file.
6 unique SQL statements in trace file.
2623 lines in trace file.
32 elapsed seconds in trace file.
As you can see, in case of LIST partition for LOCATION, there are 42 SQL statements (36 unique), and trace file has 283608 lines.
In case of RANGE partition for LOCATION, there are only 6 SQL statements and trace file has only 2623 lines.
This is one of 42 SQL statements:
select type#,blocks,extents,minexts,maxexts,extsize,extpct,user#,iniexts,
NVL(lists,65535),NVL(groups,65535),cachehint,hwmincr, NVL(spare1,0),
NVL(scanhint,0),NVL(bitmapranges,0)
from
seg$ where ts#=:1 and file#=:2 and block#=:3
with elapsed time 2,18 sec.
The other one is:
select obj#, dataobj#, subpart#, hiboundlen, hiboundval, flags, ts#, file#,
block#, pctfree$, initrans, maxtrans, analyzetime, samplesize, rowcnt,
blevel, leafcnt, distkey, lblkkey, dblkkey, clufac, spare2,
length(bhiboundval), bhiboundval
from
indsubpart$ where pobj# = :1 order by subpart#
with 1,44 sec etc.
As you can see, there are a lot of recursive SQL statements that doesn’t exist in case of RANGE – RANGE partition type.
For that reason, LIST partitioning always produce the slowest result against INSERT statements.
Comments
2022-12-12 09:09:13