Recently I was assigned to help resolve an issue reported as “name_of_the_critical_app slowing down” between 2 and 3 AM.

As the system architecture is fairly complex, I decided to start by analyzing the OS data collected by Oracle’s OS Watcher utility.

From the generated report, it’s clear that during the one-hour period several disks had a large service time and a high percent busy (see the picture below). All other components (CPU, network…) were normal.

disk_slowdown

All of the overloaded disks belong to the DATA diskgroup (ASM instance).
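If you want to double-check which OS devices sit behind the DATA diskgroup, a minimal sketch against the ASM instance could look like the query below. Note that these counters are cumulative since instance startup, so they are only a rough cross-check, not an interval view.

```sql
-- Map OS devices to ASM diskgroups and show cumulative per-disk I/O counters.
-- Run on the ASM instance.
SELECT dg.name      AS diskgroup,
       d.path       AS os_device,
       d.reads,
       d.writes,
       d.read_time,
       d.write_time
FROM   v$asm_disk d
       JOIN v$asm_diskgroup dg ON dg.group_number = d.group_number
ORDER  BY d.read_time DESC;
```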

As the Oracle database is the only client of the ASM disks, the next step is to analyze the AWR report for the same one-hour period.
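For reference, a minimal sketch for pinning down the snapshots that bracket the slow hour and generating the report (the date below is just a placeholder) could be:

```sql
-- Find the AWR snapshots around the 2-3 AM window (placeholder date).
SELECT snap_id, begin_interval_time, end_interval_time
FROM   dba_hist_snapshot
WHERE  begin_interval_time >= TIMESTAMP '2015-06-15 01:00:00'
AND    begin_interval_time <  TIMESTAMP '2015-06-15 04:00:00'
ORDER  BY snap_id;

-- Then generate the report with the standard script, supplying the two snap_ids:
-- @?/rdbms/admin/awrrpt.sql
```

The two snapshot IDs found here are what I refer to as :begin_snap and :end_snap in the sketches further down.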

The AWR report contains a wealth of information, but we only need a tiny part of it: the IO Stats part of the main section.

First, let’s look at the amount of reads and writes during the issue.

IO_stats_by_filetype_summary

Notice the amount of data read from data files (1.3 TB).
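If you prefer to query this instead of reading the report, a sketch against the AWR base table DBA_HIST_IOSTAT_FILETYPE (with :begin_snap and :end_snap being the two snapshot IDs found above; querying AWR tables requires the Diagnostics Pack license) could be:

```sql
-- Approximate the "IO Stat by Filetype summary" section:
-- delta of cumulative counters between the two bracketing snapshots.
SELECT e.filetype_name,
       (e.small_read_megabytes  + e.large_read_megabytes)
     - (b.small_read_megabytes  + b.large_read_megabytes)  AS read_mb,
       (e.small_write_megabytes + e.large_write_megabytes)
     - (b.small_write_megabytes + b.large_write_megabytes) AS write_mb
FROM   dba_hist_iostat_filetype b
       JOIN dba_hist_iostat_filetype e
         ON  e.filetype_id     = b.filetype_id
         AND e.dbid            = b.dbid
         AND e.instance_number = b.instance_number
WHERE  b.snap_id = :begin_snap
AND    e.snap_id = :end_snap
ORDER  BY read_mb DESC;
```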

To see who is responsible for that amount of reading, take a look at the IO Stat by Function summary report (there is an even more detailed report, IO Stat by Function/Filetype summary).

IO_stats_by_function_summary

Now we know that the RMAN backup process is responsible for the majority of reads.
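The backup job history can confirm this independently of AWR. A minimal sketch against V$RMAN_BACKUP_JOB_DETAILS (the date is again a placeholder) could be:

```sql
-- Which RMAN backup jobs overlapped the 2-3 AM window, and how much did they read?
SELECT input_type,
       status,
       start_time,
       end_time,
       ROUND(input_bytes  / 1024 / 1024 / 1024, 1) AS input_gb,
       ROUND(output_bytes / 1024 / 1024 / 1024, 1) AS output_gb
FROM   v$rman_backup_job_details
WHERE  start_time < TO_DATE('2015-06-15 03:00', 'YYYY-MM-DD HH24:MI')
AND    end_time   > TO_DATE('2015-06-15 02:00', 'YYYY-MM-DD HH24:MI')
ORDER  BY start_time;
```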

Let’s check the Top 5 Timed Foreground Events and the Background Wait Events.
top_5_timed_foreground_events

background_wait_events
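The same picture can be pulled from the AWR base tables. A rough sketch of a top-10 wait events query for the interval (again using the :begin_snap/:end_snap placeholders; this lists all sessions, not split into foreground and background as the report does) could be:

```sql
-- Top wait events for the interval, from the cumulative system event history.
SELECT *
FROM (
  SELECT e.event_name,
         e.wait_class,
         e.total_waits - b.total_waits                             AS waits,
         ROUND((e.time_waited_micro - b.time_waited_micro) / 1e6)  AS time_waited_s
  FROM   dba_hist_system_event b
         JOIN dba_hist_system_event e
           ON  e.event_id        = b.event_id
           AND e.dbid            = b.dbid
           AND e.instance_number = b.instance_number
  WHERE  b.snap_id = :begin_snap
  AND    e.snap_id = :end_snap
  ORDER  BY time_waited_s DESC
)
WHERE ROWNUM <= 10;
```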

To see what else was going on, it’s time to look at the tablespace IO stats.

The tablespace with the most reads (name masked) and the most writes holds indexes for one application.

IO_stats_tablespace
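Outside of the report, a similar per-tablespace breakdown can be approximated from the data file statistics history. A sketch:

```sql
-- Physical reads/writes per tablespace for the interval,
-- summed over the data files of each tablespace.
SELECT e.tsname,
       SUM(e.phyrds  - b.phyrds)  AS physical_reads,
       SUM(e.phywrts - b.phywrts) AS physical_writes
FROM   dba_hist_filestatxs b
       JOIN dba_hist_filestatxs e
         ON  e.file#           = b.file#
         AND e.dbid            = b.dbid
         AND e.instance_number = b.instance_number
WHERE  b.snap_id = :begin_snap
AND    e.snap_id = :end_snap
GROUP  BY e.tsname
ORDER  BY physical_reads DESC;
```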

As the last step, let’s take a look at the SQL statistics ordered by reads (some columns, such as SQL module and SQL text, are hidden).

sql_ordered_by_reads
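A rough equivalent of the “SQL ordered by Reads” section can be pulled straight from DBA_HIST_SQLSTAT (top 10 statements by physical reads in the interval):

```sql
-- Top SQL by physical reads between the two snapshots.
SELECT *
FROM (
  SELECT s.sql_id,
         SUM(s.disk_reads_delta) AS disk_reads,
         SUM(s.executions_delta) AS executions,
         MAX(s.module)           AS module
  FROM   dba_hist_sqlstat s
  WHERE  s.snap_id >  :begin_snap
  AND    s.snap_id <= :end_snap
  GROUP  BY s.sql_id
  ORDER  BY disk_reads DESC
)
WHERE ROWNUM <= 10;
```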

To conclude this discussion, this is what I’ve found:

  1. An RMAN backup was running, and it was responsible for most of the reads during the one-hour period.
  2. An external ETL process was running at the same time (loading data into the data warehouse).
  3. Two batch processes, responsible for the majority of writes (see the Tablespace IO Stats picture above), were also running at the same time, both writing into one large globally partitioned index.

The next step will be to re-schedule the ETL process (responsible for the reads) and the batch processes (responsible for the writes), and to optimize the SQL statements and program logic.


