Josip Pojatina

This is a classical example of what you can achieve by combining several existing features and implemented in a new way, not described within Oracle docs.

Here I’m using the fact that Oracle won’t add entry into the index structure if indexed column value is null, in combination with the function-based indexes functionality.

I’ve created small table (255000 records, with only few narrow columns), with one column of interest named row_status.

That column can have only two values:
– P for process
– X for records not yet processed

Rows with the later status (‘X’) is of our interest, that we’d like to index.

Scenario like this one is pretty common, especially when you have some ETL Data Warehousing processing, or even more often when you need to integrate your Db with external system (for example through the JMS service – Oracle Service Bus or similar product from IBM/JBoss…, as is a case described here).

1.
In the first example, I want to show time and statistics for the full table scan.


SCOTT@demo> select /*+ full(t) */ * from ttest t where row_status = 'X';

229 rows selected.

Elapsed: 00:00:00.16

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       6129  consistent gets
          0  physical reads
          0  redo size
      29756  bytes sent via SQL*Net to client
        546  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        229  rows processed

As table is very small (full table scan within 0.1 sec), we are not expecting to see huge difference in time.
But we have 6129 consistent gets to get the proper result.

Explain plan confirms we are using FTS.


explain plan for
select /*+ full(t) */ * from ttest t where row_status = 'X';


SET LINESIZE 130
set pagesize 0
select * from table(dbms_xplan.display);

Plan hash value: 1227254841                                                                                                       
                                                                                                                                  
---------------------------------------------------------------------------                                                       
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |                                                       
---------------------------------------------------------------------------                                                       
|   0 | SELECT STATEMENT  |       |   127K|    20M|  1710   (2)| 00:00:01 |                                                       
|*  1 |  TABLE ACCESS FULL| TTEST |   127K|    20M|  1710   (2)| 00:00:01 |                                                       
---------------------------------------------------------------------------

2.
In the second example, I’m creating regular index on top of the row_status column to see how efficient query will execute.


create index ttest_i1 on ttest(row_status) tablespace users nologging;     

SCOTT@demo> select /*+ index(t, ttest_i1) */ * from ttest t where row_status = 'X';    

229 rows selected.

Elapsed: 00:00:00.04

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        235  consistent gets
          2  physical reads
          0  redo size
      43612  bytes sent via SQL*Net to client
        546  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        229  rows processed

As you can notice, we get dramatically faster execution time (0.04 sec instead of 0.16 sec), but as the table is relatively small, we won’t noticed that.

It is interesting to see: 235 consistent gets, instead of 6129, what we had in the first test.

As in the first test, explain plan is here just to confirm that created index has been used.


explain plan for
select /*+ index(t, ttest_i1) */ * from ttest t where row_status = 'X';    

SET LINESIZE 130
set pagesize 0
select * from table(dbms_xplan.display);

Plan hash value: 594085215                                                                                                        
                                                                                                                                  
----------------------------------------------------------------------------------------                                          
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |                                          
----------------------------------------------------------------------------------------                                          
|   0 | SELECT STATEMENT            |          |   127K|    20M|  3419   (1)| 00:00:01 |                                          
|   1 |  TABLE ACCESS BY INDEX ROWID| TTEST    |   127K|    20M|  3419   (1)| 00:00:01 |                                          
|*  2 |   INDEX RANGE SCAN          | TTEST_I1 |   127K|       |   234   (1)| 00:00:01 |                                          
----------------------------------------------------------------------------------------

3.
In the third test I’m going to index only rows of interest (rows where row_status = ‘X’).


drop index ttest_i1;

create index ttest_i9 on ttest(case row_status when 'X' then 'X' end) tablespace users nologging;     

SCOTT@demo> select /*+ index(t, ttest_i9) */ * from ttest t where (case row_status when 'X' then 'X' end) = 'X';

229 rows selected.

Elapsed: 00:00:00.01

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        235  consistent gets
          0  physical reads
          0  redo size
      30435  bytes sent via SQL*Net to client
        546  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        229  rows processed

This time we get even better execution time (0.01 sec instead of 0.04 sec), but it is interesting to see how we get better results.

If you look at the statistics, number of consistent gets (235) is identical as in previous example.

Explain plan is here just to confirm that we are using created function-based index.


explain plan for
select /*+ index(t, ttest_i9) */ * from ttest t where (case row_status when 'X' then 'X' end) = 'X';

SET LINESIZE 130
set pagesize 0
select * from table(dbms_xplan.display);


Plan hash value: 1858771934                                                                                                       
                                                                                                                                  
----------------------------------------------------------------------------------------                                          
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |                                          
----------------------------------------------------------------------------------------                                          
|   0 | SELECT STATEMENT            |          |  2557 |   414K|   230   (0)| 00:00:01 |                                          
|   1 |  TABLE ACCESS BY INDEX ROWID| TTEST    |  2557 |   414K|   230   (0)| 00:00:01 |                                          
|*  2 |   INDEX RANGE SCAN          | TTEST_I9 |   229 |       |     1   (0)| 00:00:01 |                                          
----------------------------------------------------------------------------------------

To be able to understand how we get better time results where everything else looks to be identical, we need to get out more information from the ttest_i1 (regular, B-tree index) and ttest_i9 (function-based index).


analyze index ttest_i9 validate structure;

select name, btree_space, lf_rows, height from index_stats;

NAME                                                       BTREE_SPACE    LF_ROWS     HEIGHT
------------------------------ --------------------------------------- ---------- ----------
TTEST_I9                                                          7996        229          1


create index ttest_i1 on ttest(row_status) tablespace users nologging;     

analyze index ttest_i1 validate structure;


select name, btree_space, lf_rows, height from index_stats;

NAME                                                       BTREE_SPACE    LF_ROWS     HEIGHT
------------------------------ --------------------------------------- ---------- ----------
TTEST_I1                                                       3718172     255688          2

Now it’s clear what is it about.

In the case of function-based index, where only rows of interest (row_status = ‘X’) has been indexed, we have much smaller and more efficient index.

For example, index height is 1 instead of 2 with regular index, but even more important is the number of leaf rows (229), which is exactly the number of rows which we’d like to have indexed (rows where row_status = ‘X’).

In the case of regular B-tree index, the number of leafs rows is 255688, which is the number of rows in the whole table.

Every index range scan in case of function-based index (test 3) will be much more efficient then with regular index.

You can also take a look at the index space (BTREE_SPACE), which is also much smaller in case of function-based index (7996) versus regular, B-tree index (3718172).

4.
There is one more think that can be done (from 11g+ version), which won’t improve performance, but will allow us to write more logical and compact SQL statements that are using row_status column. The name of that “thing” is Virtual columns.

Here we are not only going to create virtual column, but also an index will be created on top of it, to create “indexed virtual column”.

In the background, Oracle will create function based index, identical as in the previous example.

First I’ll create a new table (identical as ttest)


create table ttest_vc tablespace users nologging 
as
select * from ttest;

and add virtual column on it.


alter table ttest_vc add (row_status2 varchar2(32) generated always as (case row_status when 'X' then 'X' end) virtual);

Now let’s create index on virtual column.


create index ttest_vc_i9 on ttest_vc(row_status2) tablespace users nologging;     

SCOTT@demo> select /*+ index(t, ttest_vc_i9) */ * from ttest_vc t where row_status2 = 'X';

229 rows selected.

Elapsed: 00:00:00.01

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        235  consistent gets
          0  physical reads
          0  redo size
      30492  bytes sent via SQL*Net to client
        546  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        229  rows processed

Again, explain plan is here just to be sure we are using index created on virtual column.


explain plan for
select /*+ index(t, ttest_vc_i9) */ * from ttest_vc t where row_status2 = 'X';

SET LINESIZE 130
set pagesize 0
select * from table(dbms_xplan.display);


Plan hash value: 2461066658                                                                                                       
                                                                                                                                  
-------------------------------------------------------------------------------------------                                       
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |                                       
-------------------------------------------------------------------------------------------                                       
|   0 | SELECT STATEMENT            |             |  2227 |  1207K|     2   (0)| 00:00:01 |                                       
|   1 |  TABLE ACCESS BY INDEX ROWID| TTEST_VC    |  2227 |  1207K|     2   (0)| 00:00:01 |                                       
|*  2 |   INDEX RANGE SCAN          | TTEST_VC_I9 |   891 |       |     1   (0)| 00:00:01 |                                       
-------------------------------------------------------------------------------------------                   



analyze index ttest_vc_i9 validate structure;

select name, btree_space, lf_rows, height from index_stats;

NAME                                                       BTREE_SPACE    LF_ROWS     HEIGHT
------------------------------ --------------------------------------- ---------- ----------
TTEST_I9                                                          7996        229          1

As you can see, the results with virtual column and index created on top of it, ensures identical results as function based index (consistent gets, index characteristics, time etc).

The only difference is that now I can write SQL statement in a more logical way.

Instead of using function inside the WHERE clause, which must be identical as function in CREATE INDEX statement like in example 3:


select /*+ index(t, ttest_i9) */ * from ttest t where (case row_status when 'X' then 'X' end) = 'X';

, in case of virtual column/function-based index combination I can write SQL as usual, and virtual column will translate it transparently to the user.


select /*+ index(t, ttest_vc_i9) */ * from ttest_vc t where row_status2 = 'X';

Comments

There are no comments yet. Why not start the discussion?

How to index only rows of interest

Comments

Leave a Reply Cancel reply

Blog

Tags