Last month I was working on RMAN backup optimization.

One of the largest databases I manage had started to overrun the time frame given for its full backup.

The full backup starts every Saturday at midnight and should finish by Sunday at 10 AM.

Suddenly, for no apparent reason, the full backup began finishing around 2:30 PM.

As the database grows steadily at about 50-100 GB per week, growth alone cannot explain a full backup that takes 4.5 hours longer.

Below is the most important fragment of the backup script:


RUN
{
    allocate channel t1 type 'sbt_tape' parms 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
    allocate channel t2 type 'sbt_tape' parms 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
    allocate channel t3 type 'sbt_tape' parms 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
    allocate channel t4 type 'sbt_tape' parms 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
    backup as compressed backupset tag='${backup_tag}' incremental level=0 database plus archivelog delete input;
    delete noprompt obsolete;
    release channel t1;
    release channel t2;
    release channel t3;
    release channel t4;
}

A few words about the architecture.
As you can see from the excerpt above, instead of the FRA, the backup targets IBM Tivoli server disks, from where Tivoli later copies the contents to tape.

Although the backup in fact lands on disk, RMAN believes (through the Tivoli-provided MML library) that it is writing to tape.

This database server uses 4 channels for backup, and everything else is left at its default (apart from CONTROLFILE AUTOBACKUP ON).

After a thorough analysis I found the problem: although 4 channels were allocated, the last set of files was backed up by only 1 channel.
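
If you want to watch this happen, per-channel progress is visible in V$SESSION_LONGOPS while the backup runs. A minimal sketch (the filter and column choice are mine, not taken from the original script):

    -- which RMAN channels are still actively working, and how far along they are
    SELECT sid,
           opname,
           sofar,
           totalwork,
           ROUND(100 * sofar / totalwork, 1) AS pct_done
    FROM   v$session_longops
    WHERE  opname LIKE 'RMAN%'
    AND    totalwork > 0
    AND    sofar < totalwork;

Towards the end of the run you would expect to see only one row still moving, which matches the single-channel tail described here.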

Several parameters need to be tweaked to shorten the full backup.
The ones with the most impact in this case are:

1. filesperset

The filesperset parameter specifies the maximum number of files in each backup set. Its default value is 64 (strictly speaking, RMAN uses the lower of 64 and the number of files divided by the number of channels, which in this case still comes out to 64).

As the database is large (383 data files) and the default filesperset is 64, this is what happened: 383 mod 64 = 63.

In the last round of data file backups, RMAN therefore puts the remaining 63 data files into a single backup set handled by only one channel.

Adding data files thus lengthens the RMAN backup whenever the total is not evenly divisible by 64, because the leftover files are processed by one channel instead of all 4.

This parameter needs to be changed.
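
To see how badly the default behaves on a given database, you can repeat the arithmetic above straight from the dictionary. A quick sketch (the MOD by 64 simply mirrors the reasoning above):

    -- number of data files, and how many would be left over for the
    -- final, single-channel backup set with filesperset = 64
    SELECT COUNT(*)          AS datafiles,
           MOD(COUNT(*), 64) AS files_in_last_set
    FROM   v$datafile;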

2. maxopenfiles

This parameter limits the number of input files that each allocated channel can have open for reading at the same time during a backup.

Its default value is 8.

For example, with filesperset = 8 and maxopenfiles = 4, a single allocated channel reads at most 4 of those files in parallel at any one time.
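
For completeness: instead of repeating maxopenfiles on every allocate channel line (as in the final script below), the same limit can be made persistent per device type with CONFIGURE. A sketch, assuming the same TDPO options file as in the script above:

    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE'
        PARMS 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)'
        MAXOPENFILES 1;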

3. bigfile tablespace

This is of course not a parameter. I mention it because you should check whether you are using bigfile tablespaces. If you are, there are two options (see the sketch after this list):
- either convert the bigfile tablespaces into regular (smallfile) tablespaces, where the data file size limit is 32 GB with an 8 KB block size,
- or use multisection backups (backup section size 2g …) to split the backup of large data files into multiple sections of a fixed size.
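
A quick way to check the first point is to ask the data dictionary which tablespaces are bigfile. A minimal sketch:

    -- bigfile tablespaces are candidates for multisection backup
    SELECT tablespace_name
    FROM   dba_tablespaces
    WHERE  bigfile = 'YES';

Multisection backups (SECTION SIZE) are available from 11g onwards and allow several channels to work on different sections of the same large data file in parallel.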

4. blksize

As explained above, the RMAN backup is first stored on Tivoli disks and only in a second phase moved to tape (a phase that is outside Oracle's control).

But because RMAN sees the Tivoli disks as tape, it behaves differently (for example, synchronous instead of asynchronous I/O, different buffer sizes, and so on).

For that reason it is important to adjust the BLKSIZE value as well.
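
Whether a channel is doing synchronous or asynchronous I/O against the SBT device, and with which buffers, can be checked while the backup runs in V$BACKUP_SYNC_IO and V$BACKUP_ASYNC_IO. A rough sketch (the column selection is my own):

    -- SBT I/O without tape I/O slaves shows up here as synchronous I/O
    SELECT device_type,
           type,
           status,
           buffer_size,
           buffer_count
    FROM   v$backup_sync_io
    WHERE  device_type = 'SBT_TAPE';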

The final version of the RMAN backup script, the one that gives the best performance in this case, is:


RUN
{
    allocate channel t1 type 'sbt_tape' parms 'BLKSIZE=1048576, ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)' maxopenfiles 1;
    allocate channel t2 type 'sbt_tape' parms 'BLKSIZE=1048576, ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)' maxopenfiles 1;
    allocate channel t3 type 'sbt_tape' parms 'BLKSIZE=1048576, ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)' maxopenfiles 1;
    allocate channel t4 type 'sbt_tape' parms 'BLKSIZE=1048576, ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)' maxopenfiles 1;
    allocate channel t5 type 'sbt_tape' parms 'BLKSIZE=1048576, ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)' maxopenfiles 1;
    allocate channel t6 type 'sbt_tape' parms 'BLKSIZE=1048576, ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)' maxopenfiles 1;
    allocate channel t7 type 'sbt_tape' parms 'BLKSIZE=1048576, ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)' maxopenfiles 1;
    allocate channel t8 type 'sbt_tape' parms 'BLKSIZE=1048576, ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)' maxopenfiles 1;
    allocate channel t9 type 'sbt_tape' parms 'BLKSIZE=1048576, ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)' maxopenfiles 1;
    allocate channel t10 type 'sbt_tape' parms 'BLKSIZE=1048576, ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)' maxopenfiles 1;
    allocate channel t11 type 'sbt_tape' parms 'BLKSIZE=1048576, ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)' maxopenfiles 1;
    allocate channel t12 type 'sbt_tape' parms 'BLKSIZE=1048576, ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)' maxopenfiles 1;
    backup as compressed backupset tag='${backup_tag}' incremental level=0 database filesperset 8 plus archivelog filesperset 8 delete input;
    delete noprompt obsolete;
    release channel t1;
    release channel t2;
    release channel t3;
    release channel t4;
    release channel t5;
    release channel t6;
    release channel t7;
    release channel t8;
    release channel t9;
    release channel t10;
    release channel t11;
    release channel t12;
}

Performance results can be found in the following table:


INPUT_MB_PER_SEC  OUTPUT_MB_PER_SEC  END_TIME             Channels  filesperset  maxopenfiles  bigfiles  blksize
-----------------------------------------------------------------------------------------------------------------
         316.24M             83.19M  18.12.2016 07:54:17        12            8             1  no        1 MB
         319.82M             84.69M  11.12.2016 07:47:55        12            8             1  no        1 MB
         295.56M             77.52M  04.12.2016 08:18:51        12            4             1  no        256 KB
         337.78M             89.05M  27.11.2016 07:10:38        12            8             1  no        256 KB
         283.47M             74.96M  20.11.2016 08:32:07         8            8             1  yes       256 KB
         179.72M             47.50M  13.11.2016 13:12:35         4            8             1  yes       256 KB
         198.54M             52.78M  06.11.2016 11:51:32         4            8             8  yes       256 KB
         150.14M             40.10M  23.10.2016 15:19:48         4           64             8  yes       256 KB
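
The throughput columns in the table match the column names of V$RMAN_BACKUP_JOB_DETAILS, so figures like these can be pulled with something along the following lines (a sketch, not necessarily the exact query used here):

    -- throughput of recent RMAN backup jobs, most recent first
    SELECT end_time,
           time_taken_display,
           input_bytes_per_sec_display  AS input_mb_per_sec,
           output_bytes_per_sec_display AS output_mb_per_sec,
           status
    FROM   v$rman_backup_job_details
    ORDER  BY end_time DESC;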

By fully understanding the architecture of the existing backup solution, knowing how RMAN works, and properly identifying the problem, I could concentrate on the RMAN tweaks that radically change the backup duration.

As you can see from the table, the full backup now finishes almost 7.5 hours earlier.

There are many more tweaks you can make, and whole books have been written about Oracle backup, but they would not lower the full backup duration significantly (at least not in this case).

One more important thing.

You can keep improving performance by watching what is going on from every perspective (RMAN views, OS monitoring, DB monitoring, network…) until you start hitting a wall.

In this case the database server could deliver even more power (more channels) and the storage system could deliver more IOPS (I could increase the number of files read in parallel, see points 1 and 2), but the network card set the limit. As I increased the number of channels, the number of dropped packets also increased.

As there is nothing more we can do about that (several network cards are already bonded), and the results are more than satisfactory, I decided to stop the RMAN tuning here.


