Context switching is one of the most overlooked topics when tuning the Oracle database, as you can’t simply see it directly (e.g. “context switching” wait event doesn’t exist).
Instead you should use indirect approach (through the increased CPU consumption), as it’s recorded as a CPU wait event.
Context switching what it is and what impact on SQL performance it has?
SQL – as nonprocedural, declarative language, has separate execution engine within Oracle database kernel.
PL/SQL on the other side is a typical procedural language with its own execution engine.
If you want to call PL/SQL function from SQL, Oracle must switch between execution engines which brings an overhead as you need to pass all variables, states… between two different engines (alias context switching).
Despite overhead is usually very small, context switching is happening for each row.
In case you have 10.000 rows, it means you’ll call PL/SQL function 10.000 times.
If overhead for each call is 0.01 sec, you’ll end up with 100 seconds of overhead and increased CPU consumption.
Although Oracle SQL has advanced incredibly through the history, it is still not possible to code everything inside SQL, which means sometimes it’s inevitable to use PL/SQL function calls.
Still, in vast majority of cases, context switching is triggered by lack of knowledge of SQL capabilities, developer’s laziness and ignorance of new SQL functionality offered by each release of Oracle database.
The most important is to know is what impact of context switching you can expect and how to reduce it.
Let’s start with finding the answers on those questions.
The best place to start with is to take a look at Top SQL order by CPU in AWR report.
If you observe SQL’s with significant CPU consumption, and after inspecting SQL code you find function call, you need to dig further to check the PL/SQL code.
Through this article I’m going to use SH sample schema, more specifically sales and customers tables, which data model is presented on the picture below.
To simulate context switching I need to create a simple function:
create or replace function fcust_name(p_cust_id in number) RETURN varchar2
as
l_cust_name varchar2(256);
begin
select cust_first_name || ' ' || cust_last_name into l_cust_name from customers where cust_id = p_cust_id;
return l_cust_name;
end;
and to execute the following code:
SH@orcl> select fcust_name(s.cust_id), s.* from sales s;
918843 rows selected.
Elapsed: 00:00:15.67
Statistics
----------------------------------------------------------
918844 recursive calls
0 db block gets
2767309 consistent gets
16 physical reads
0 redo size
40204257 bytes sent via SQL*Net to client
101691 bytes received via SQL*Net from client
9190 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
918843 rows processed
As you can see, elapsed time is 15.67 seconds to fetch almost 1 million records.
If I execute SQL that will fetch the same result but this time without using PL/SQL function calls, I’m getting the following results.
SH@orcl> select c.cust_first_name || ' ' || c.cust_last_name , s.* from sales s, customers c where s.cust_id = c.cust_id;
918843 rows selected.
Elapsed: 00:00:01.98
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
12301 consistent gets
1472 physical reads
0 redo size
23926382 bytes sent via SQL*Net to client
101691 bytes received via SQL*Net from client
9190 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
918843 rows processed
The code completes in 1.98 seconds which is 7x faster (15.67 sec was result with a function calls).
While in the first case I had 918844 recursive calls, in the second case I have only 1 which is indicator what is going on behind the scene.
To further investigate what is going on you need to create 10024 trace files for both cases.
Here is the execution plan for SQL + PL/SQL function calls case.
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
918843 918843 918843 PARTITION RANGE ALL PARTITION: 1 28 (cr=10780 pr=0 pw=0 time=589642 us starts=1 cost=517 size=26646447 card=918843)
918843 918843 918843 TABLE ACCESS FULL SALES PARTITION: 1 28 (cr=10780 pr=0 pw=0 time=412394 us starts=28 cost=517 size=26646447 card=918843)
Execution plan for the second SQL (no PL/SQL function calls) seems to be more complex.
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
918843 918843 918843 HASH JOIN (cr=12303 pr=1269 pw=0 time=637267 us starts=1 cost=942 size=45023307 card=918843)
55500 55500 55500 NESTED LOOPS (cr=1522 pr=0 pw=0 time=31512 us starts=1 cost=942 size=45023307 card=918843)
55500 55500 55500 NESTED LOOPS (cr=1522 pr=0 pw=0 time=24349 us starts=1)
55500 55500 55500 STATISTICS COLLECTOR (cr=1522 pr=0 pw=0 time=17827 us starts=1)
55500 55500 55500 TABLE ACCESS FULL CUSTOMERS (cr=1522 pr=0 pw=0 time=10409 us starts=1 cost=423 size=1110000 card=55500)
0 0 0 PARTITION RANGE ALL PARTITION: 1 28 (cr=0 pr=0 pw=0 time=0 us starts=0)
0 0 0 BITMAP CONVERSION TO ROWIDS (cr=0 pr=0 pw=0 time=0 us starts=0)
0 0 0 BITMAP INDEX SINGLE VALUE SALES_CUST_BIX PARTITION: 1 28 (cr=0 pr=0 pw=0 time=0 us starts=0)(object id 74422)
0 0 0 TABLE ACCESS BY LOCAL INDEX ROWID SALES PARTITION: 1 1 (cr=0 pr=0 pw=0 time=0 us starts=0 cost=517 size=493 card=17)
918843 918843 918843 PARTITION RANGE ALL PARTITION: 1 28 (cr=10781 pr=1269 pw=0 time=262488 us starts=1 cost=517 size=26646447 card=918843)
918843 918843 918843 TABLE ACCESS FULL SALES PARTITION: 1 28 (cr=10781 pr=1269 pw=0 time=248242 us starts=28 cost=517 size=26646447 card=918843)
Trick is that by looking at execution plan you want be able to see the problem as part of PL/SQL is missing.
In the SQL + PL/SQL function call example there is another SQL (recursive call) you need to inspect:
SELECT CUST_FIRST_NAME || ' ' || CUST_LAST_NAME
FROM
CUSTOMERS WHERE CUST_ID = :B1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 TABLE ACCESS BY INDEX ROWID CUSTOMERS (cr=3 pr=2 pw=0 time=1259 us starts=1 cost=2 size=20 card=1)
1 1 1 INDEX UNIQUE SCAN CUSTOMERS_PK (cr=2 pr=2 pw=0 time=1254 us starts=1 cost=1 size=0 card=1)(object id 74388)
Now we clearly see that main SQL execution plan for SQL + PL/SQL function call case is incomplete, but we still have no evidence where 7x larger elapsed time is coming from.
To answer that question we need to look at the different part of the SQL trace file.
In SQL + PL/SQL function call example, we have the following fragment:
select fcust_name(s.cust_id), s.*
from
sales s
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 9190 25.43 31.53 0 10780 0 918843
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 9192 25.44 31.54 0 10780 0 918843
As you can see. 25.44 seconds was spent on CPU (fetch phase as it’s SELECT statement).
Furthermore, you can find within the same trace file the following:
SELECT CUST_FIRST_NAME || ' ' || CUST_LAST_NAME
FROM
CUSTOMERS WHERE CUST_ID = :B1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 918843 23.21 27.36 0 0 0 0
Fetch 918843 3.09 4.02 106 2756529 0 918843
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1837687 26.30 31.38 106 2756529 0 918843
From previous output you can see actual number of function calls (918843) and CPU consumption (26.3 sec, where majority of time is coming from execute phase).
Thus this is the point where we spent majority of elapsed time.
By now you should be able to find out what impact of context switching you might expect.
Let’s see context switching impact from different perspective – by using top or similar Unix/Linux tool.
On the following picture you can find CPU consumption while executing SQL + PL/SQL function call case.
As you can see, CPU consumption, taking into account it’s for just one, single SQL that runs in one thread (not in parallel) is huge.
Let’s see the same SQL without PL/SQL function calls.
Despite CPU load is significantly lower when comparing it with context switching case, it’s still not neglectable, as SQL engine needs to hard parse the SQL when executing for the first time.
In the following picture you can see the load for executing the same SQL for the second time (meaning without hard parsing).
CPU load in case without context switching and with no hard parsing is almost neglectable.
How to fix context switching performance issues?
The best approach is to rewrite SQL & PL/SQL in a way that all logic can be performed in SQL.
Keep in mind that over a decades, Oracle has enhanced SQL language significantly.
Thus many cases where you used to use SQL + PL/SQL function call approach, today can be rewritten to use only SQL.
In rare cases where removing PL/SQL function calls is not possible, you need to check a logic behind to reduce the number of PL/SQL calls, as each call counts.
Additionally you can check cashing (there are many ways like context switching, table caching, materialized views, function cache, collections…).
Summary:
From this article you can learn about context switching: what is it, what issues they introduced, what is the impact of it, how you can indirectly monitor performance penalties related to context switching and how to fix it.
Besides allowing SQL to run much faster (7x faster in previous example), removing context switching might affect the number of Oracle licenses significantly.
Last point is that reducing the context switching is used by PL/SQL bulk operations (Forall…), which usually brings 10x speed up.
Comments
2019-09-06 16:15:38
2019-09-07 21:52:07
2019-09-19 01:22:16
2019-09-19 08:20:42
2019-09-19 16:07:12
2019-09-21 14:15:41
2019-09-21 17:22:46
2019-12-23 21:08:37
2019-12-25 16:16:13