TSM Topics Feed

Friday, March 28, 2014

Poor Performance

Currently I work in an environment where we have a dedicated TSM instance for a large SAP DB (99 TB at the moment). We just upgraded the drives in the tape library (yes, we use tape! I know... I know...) from MagStar 3592 TS1130 (E06) drives to TS1140 (E07) drives. The upgrade was pushed through in hopes of a jump in write/backup performance, but I was skeptical. TSM adds so much overhead that you cannot use the raw tape read/write numbers from any manufacturer. Typically IBM is somewhat reasonable with their numbers, but in this case I have seen NO performance increase whatsoever. Here is a query of the storage pool backup processes.

UPDATE (04/04/2014): Let me give you some more specs. We have the 99 TB DB split between 4 TSM storage agents, each with four 8 Gb HBAs. Each storage agent runs 4 sessions (allocating 4 drives) for its backup process, so all 4 storage agents together account for 16 simultaneous sessions, and it still takes over 24 hours to perform the 99 TB backup. The backups average around 70-78 MB/sec per session. Is this a TSM overhead issue, or do I have a tuning issue with the TDP and TSM? I'm getting less than 50% of the throughput I should see.
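A quick back-of-the-envelope check (plain arithmetic on the numbers above, not TSM output, and assuming the 99 TB is 99 × 2^40 bytes as TSM usually reports it) shows the per-session rate those figures imply:

```shell
# 99 TB moved in a ~24-hour window across 16 simultaneous sessions.
awk 'BEGIN {
  bytes    = 99 * 2^40           # 99 TB, assuming TSM means TiB
  seconds  = 24 * 3600           # 24-hour window
  sessions = 16
  agg_mbs  = bytes / seconds / 2^20      # aggregate MB/s (MiB/s)
  per_sess = agg_mbs / sessions          # per-session MB/s
  printf "aggregate: %.0f MB/s, per session: %.1f MB/s\n", agg_mbs, per_sess
}'
```

That lands at roughly 75 MB/s per session - right on top of what the processes report - which says the sessions are running flat out for the entire window, not idling.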

Here's the command that is run to execute the DB backup:

ksh -c 'export DB2NODE=7 ; db2 "backup db DB8 LOAD /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a OPEN 4 SESSIONS OPTIONS /db2/DB8/dbs/tsm_config/vendor.env.7 WITH 14 BUFFERS BUFFER 1024 PARALLELISM 8 WITHOUT PROMPTING" ; echo BACKUP_RC=$?'
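For context on what those options commit: the DB2 backup BUFFER size is specified in 4 KB pages, so "WITH 14 BUFFERS BUFFER 1024" is simple arithmetic:

```shell
# DB2 "WITH 14 BUFFERS BUFFER 1024": BUFFER is in 4 KB pages,
# so each buffer is 1024 pages x 4 KB = 4 MB, and there are 14 of them.
buffers=14
pages=1024
page_kb=4
per_buffer_mb=$(( pages * page_kb / 1024 ))    # 4 MB per buffer
total_mb=$(( buffers * per_buffer_mb ))        # 56 MB total
echo "per buffer: ${per_buffer_mb} MB, total: ${total_mb} MB"
```

That is a 56 MB pool feeding 4 TSM sessions with 8 parallel readers behind it. Buffer starvation is one of the usual suspects when drives can't stream, so a larger BUFFER value is a cheap experiment - though whether it helps in this particular environment is untested.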

PROCESS_NUM: 2667
    PROCESS: Backup Storage Pool
 START_TIME: 03-27 23:21:54
   DURATION: 00 23:20:13
      BYTES: 6.0TB
 AVG_THRPUT: 75.87 MB/s

PROCESS_NUM: 2668
    PROCESS: Backup Storage Pool
 START_TIME: 03-27 23:21:55
   DURATION: 00 23:20:12
      BYTES: 6.2TB
 AVG_THRPUT: 78.48 MB/s

PROCESS_NUM: 2669
    PROCESS: Backup Storage Pool
 START_TIME: 03-27 23:21:55
   DURATION: 00 23:20:12
      BYTES: 6.2TB
 AVG_THRPUT: 77.99 MB/s

PROCESS_NUM: 2670
    PROCESS: Backup Storage Pool
 START_TIME: 03-27 23:21:55
   DURATION: 00 23:20:12
      BYTES: 6.4TB
 AVG_THRPUT: 80.13 MB/s

I average anywhere from 75 to 80 MB/sec. Here is the MagStar performance chart. I am using JB media, not JC, so I do take a small performance hit for that.

[TS1140 performance chart (image not preserved): native streaming rates by media type; JB media tops out around 200 MB/sec.]
So with JB media I could get as high as 200 MB/sec, but I am not even at 50% of that number. Is there any specific tuning parameter I should look at that could be hindering performance?

FYI - the backup of the 99 TB DB runs LAN-free, using 16 tape drives, over about 26 hrs.
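Putting that in fabric terms (again just arithmetic, and assuming an 8 Gb FC link delivers roughly 800 MB/s of usable payload - a rule-of-thumb figure, not a measurement from this SAN):

```shell
# 99 TB over a 26-hour LAN-free window, versus the raw bandwidth of
# the 16 x 8Gb HBAs across the 4 storage agents.
awk 'BEGIN {
  bytes   = 99 * 2^40            # 99 TB (assuming TiB, as TSM reports)
  seconds = 26 * 3600            # 26-hour window
  mbs     = bytes / seconds / 2^20
  hba_mbs = 16 * 800             # 16 HBAs x ~800 MB/s usable each
  printf "aggregate: %.0f MB/s (%.1f%% of total HBA bandwidth)\n",
         mbs, 100 * mbs / hba_mbs
}'
```

The aggregate works out to roughly 1,100 MB/s - under 10% of the combined HBA bandwidth - so the HBAs as a group are nowhere near saturated. If there is a fabric limit, it is more likely a shared ISL or a single oversubscribed link than total HBA capacity.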

4 comments:

  1. What is the HBA count and what are the line speeds? Assuming an average 2:1 compression ratio, a single drive is capable of 400 MB/s, which is almost the full bandwidth of a 4 Gb FC connection. If you have more than one drive going through the same SAN connection, the TS1140 will need to step down its speed when the link gets saturated. 74 MB/s is the lowest streaming speed it can handle with JB media.
    It looks like your backup stg was run with 4 processes. Any performance difference when you run just a single job?

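The step-down math in the comment above can be sanity-checked the same way (assumptions: ~400 MB/s per drive at 2:1 compression, per the commenter; ~400 MB/s usable on a 4 Gb FC link and ~800 MB/s on 8 Gb, both rule-of-thumb figures):

```shell
# How many compressed-rate TS1140 drives fit on one FC link?
awk 'BEGIN {
  drive_mbs = 400     # one drive at ~2:1 compression (commenter figure)
  link4_mbs = 400     # ~usable payload of a 4 Gb FC link
  link8_mbs = 800     # ~usable payload of an 8 Gb FC link
  printf "drives per 4Gb link: %.1f\n", link4_mbs / drive_mbs
  printf "drives per 8Gb link: %.1f\n", link8_mbs / drive_mbs
}'
```

So one compressed drive alone fills a 4 Gb link, and even at 8 Gb only two fit per link. With 4 drives per storage agent, any zoning that funnels several drives through one path would force the step-down toward that 74 MB/s floor.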
  2. Yeah, 99TB in 26 hrs is about 8 Gb/s. Sounds like you hit an HBA or SAN fabric limit.

  3. I updated the story with more details.

  4. OK, so there's a whole slew of possible problems - time to start tracing the path the data takes. Each storage agent reads from disk, yes? Is it internal or external disk? JBOD or RAID array? How is it attached - Fibre Channel in an x16 PCIe slot, SCSI-320, onboard SATA II, your friend's NAS, etc.? If there is more than one disk controller, are they on separate PCIe buses?
    Next look at CPU and RAM usage on each storage agent, via "top", "monitor", etc. If this is old hardware, or part of a VM or LPAR, you may have hit a limit there.
    Then the data goes out on the SAN via four 8 Gb HBAs. Separate PCI (or whatever) buses? What's the SAN fabric like - do all HBAs and tape drives plug into one switch or director? Is every switch 8 Gb capable? Does each storage agent have 4 dedicated tape drives, or can they see all 16? Are you using multipathing and Atape load balancing?
    Are you dizzy yet? I think I am.
