Last week I’ve been faced with issue that is slowing down production system for more than 30 minutes. During that time, users were unable to work, Cluster services were failing and system was unresponsive.
After completing some initial investigation, I’ve found the problematic SQL:
SELECT
/*+ PARALLEL*/
TRAN_NO,
REF_NO1,
REF_NO25
FROM SA_TRAN_HEAD
WHERE STATUS <> 'D'
AND TRAN_NO IS NOT NULL
AND REF_NO1 IS NOT NULL
AND REF_NO25 IS NOT NULL
GROUP BY TRAN_NO,
REF_NO1,
ref_no25
HAVING COUNT(1) > 1;
Execution plan:
----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 25959 (100)| | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 6715K| 204M| | 25959 (1)| 00:05:12 | | | Q1,01 | P->S | QC (RAND) |
| 3 | FILTER | | | | | | | | | Q1,01 | PCWC | |
| 4 | HASH GROUP BY | | 6715K| 204M| 6227M| 25959 (1)| 00:05:12 | | | Q1,01 | PCWP | |
| 5 | PX RECEIVE | | 147M| 4500M| | 14412 (1)| 00:02:53 | | | Q1,01 | PCWP | |
| 6 | PX SEND HASH | :TQ10000 | 147M| 4500M| | 14412 (1)| 00:02:53 | | | Q1,00 | P->P | HASH |
| 7 | PX BLOCK ITERATOR | | 147M| 4500M| | 14412 (1)| 00:02:53 | 1 | 64 | Q1,00 | PCWC | |
| 8 | TABLE ACCESS FULL| SA_TRAN_HEAD | 147M| 4500M| | 14412 (1)| 00:02:53 | 1 | 384 | Q1,00 | PCWP | |
----------------------------------------------------------------------------------------------------------------------------------------------
As the table affected by parallel hint is big one (more than half billion rows), I figure out what was going on.
There are two issues in this SQL:
1.
Developer didn’t manage to describe desirable execution plan completely. That is one of the curses related to using hints, as you must know exactly what are you doing.
2.
Second problem is that PARALLEL hint without any limitation is very dangerous.
In this case, execution plan has changed, and instead of using index FFS, it is using FTS on very large table, with no limitation on parallelism.
As a result, I’ve found more than 200+ processes involved in SQL execution, 4 Gb memory drop and 4 Gb of swaping and AWR is showing 70 thousand seconds for executing that query.
To fix this I have to do what I described in points 1 & 2: to full describe desired execution plan, and to put limits in number of parallelism used for executing the query.
select
/*+ index_ffs (th sa_tran_head_ix) parallel_index(th sa_tran_head_ix 4) */
TRAN_NO,
REF_NO1,
ref_no25
FROM rms.SA_TRAN_HEAD th
WHERE STATUS <> 'D'
AND TRAN_NO IS NOT NULL
AND REF_NO1 IS NOT NULL
AND REF_NO25 IS NOT NULL
GROUP BY TRAN_NO,
REF_NO1,
REF_NO25
having count(1) > 1;
------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 1197K(100)| | | | |
| 1 | PX COORDINATOR | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 8141K| 248M| | 1197K (2)| 00:00:31 | Q1,01 | P->S | QC (RAND) |
|* 3 | FILTER | | | | | | | Q1,01 | PCWC | |
| 4 | HASH GROUP BY | | 8141K| 248M| 7079M| 1197K (2)| 00:00:31 | Q1,01 | PCWP | |
| 5 | PX RECEIVE | | 167M| 5115M| | 486K (1)| 00:00:13 | Q1,01 | PCWP | |
| 6 | PX SEND HASH | :TQ10000 | 167M| 5115M| | 486K (1)| 00:00:13 | Q1,00 | P->P | HASH |
| 7 | PX BLOCK ITERATOR | | 167M| 5115M| | 486K (1)| 00:00:13 | Q1,00 | PCWC | |
|* 8 | INDEX FAST FULL SCAN| SA_TRAN_HEAD_IX | 167M| 5115M| | 486K (1)| 00:00:13 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------------------
AWR clearly shows that a optimized version of the query dropped significant from 70.101 sec—> 1.528, which is almost 47 times better results.
In addition, memory consumption and swapping have also dropped from 4 Gb —> 0,6 Gb which means not only that optimized version is many times faster, but the resource consumption is several times smaller than it was before optimization.
Conclusions:
1.
Use hints only when you know exactly what are you doing.
2.
Take care to describe desired execution plan with enough amount of hints.
Comments