Monday, January 23, 2012

TSM Backup Issue

Has anyone had an issue where backups were extremely slow and the interrupt counts were huge? I've got 400GB DBs taking 40 hours to back up over a 4-port EtherChannel connection. There are no errors in my AIX errpt, and the network guys tell me they don't think it's them. Any suggestions on what to look at are appreciated. Below is sample output from entstat.

ETHERNET STATISTICS (en8) :
Device Type: IEEE 802.3ad Link Aggregation
Hardware Address: 00:14:5e:e7:26:41
Elapsed Time: 9 days 19 hours 20 minutes 35 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 5470416553                           Packets: 24510516113
Bytes: 440661650021                           Bytes: 32245892708954
Interrupts: 0                                 Interrupts: 6027433898
Transmit Errors: 0                            Receive Errors: 691
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 298
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 355

Broadcast Packets: 8786                       Broadcast Packets: -1346793420
Multicast Packets: 225928                     Multicast Packets: 136913
No Carrier Sense: 0                           CRC Errors: 0
DMA Underrun: 0                               DMA Overrun: 691
Lost CTS Errors: 0                            Alignment Errors: 0
Max Collision Errors: 0                       No Resource Errors: 0
Late Collision Errors: 0                      Receive Collision Errors: 0
Deferred: 141004                              Packet Too Short Errors: 0
SQE Test: 0                                   Packet Too Long Errors: 0
Timeout Errors: 0                             Packets Discarded by Adapter: 0
Single Collision Count: 0                     Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 355

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 1701737521
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport ChecksumOffload
        PrivateSegment LargeSend DataRateSet
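
For anyone who wants to sanity-check those numbers, here is a rough Python sketch that turns the counters above into rates. It only does arithmetic on figures copied from the output. Note the assumptions: the counters are cumulative since the last reset, so they cover all traffic on en8 and not just the backup windows, and I'm treating "400GB" as decimal gigabytes. The function and variable names are mine, not anything entstat provides.

```python
import re

def elapsed_seconds(text):
    """Convert an entstat 'Elapsed Time' string into seconds."""
    units = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}
    total = 0
    # Matches both singular and plural unit names ("1 day", "9 days").
    for value, unit in re.findall(r"(\d+)\s+(day|hour|minute|second)", text):
        total += int(value) * units[unit]
    return total

# Figures copied from the entstat output above.
elapsed = elapsed_seconds("9 days 19 hours 20 minutes 35 seconds")
rx_bytes = 32245892708954
rx_packets = 24510516113
rx_interrupts = 6027433898

rx_mb_per_s = rx_bytes / elapsed / 1e6            # average receive rate
packets_per_interrupt = rx_packets / rx_interrupts
avg_packet_bytes = rx_bytes / rx_packets

# The backup itself: 400 GB in 40 hours (decimal GB is an assumption).
backup_mb_per_s = 400e9 / (40 * 3600) / 1e6

print(f"elapsed:               {elapsed} s")
print(f"avg receive rate:      {rx_mb_per_s:.1f} MB/s")
print(f"packets per interrupt: {packets_per_interrupt:.2f}")
print(f"avg packet size:       {avg_packet_bytes:.0f} bytes")
print(f"backup rate:           {backup_mb_per_s:.2f} MB/s")
```

With these inputs the interface averages roughly 38 MB/s on receive and about 4 packets per interrupt, while a 400GB backup in 40 hours works out to under 3 MB/s. Since the averages include everything else on the link, treat this as a starting point for comparison, not a diagnosis.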



Thursday, January 19, 2012

TSM Device Handling in Windows

I have to say that TSM on Windows is good for small to medium-size solutions, and I'm not anti-Windows. I just cringe when dealing with devices in Windows. I hate its driver handling and, most of all, I hate how Windows presents libraries and tape drives. So I was working with a TSM server where the tape library would not initialize. It was an older SCSI library, not Fibre Channel. I tried restarting the library, restarting the TSM server, reloading the drivers, and updating the drivers, and nothing worked.

Duh! <Head Slap!!!>

That's because, on the initial reboot that caused the library to stop communicating, the device IDs changed. The library went from LB1.0.0.2 to LB1.0.0.3. Nobody touched the SCSI card or library, but the device definition changed! Seriously? All the drives changed to mtX.X.X.3 also. Now I don't use Windows all that much, but luckily I remembered the TSMDLST program that is installed with the TSM server. It's under C:\Program Files\tivoli\tsm\console and will pull the information from Windows for you in a readable format. So next time your library goes offline, make sure you use it to compare the device definitions and serials with what is defined in TSM. It will save you a lot of time and headache. You can find more information on issues like this here.
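
If you have more than a couple of drives, the comparison itself is easy to script. Below is a minimal Python sketch of the idea: match devices by serial number and flag any whose device name no longer matches. It assumes you have already copied the serial-to-device pairs by hand from the TSMDLST output and from the TSM server's definitions; it does not parse either one. The library names come from the story above, but the serials and drive names are made up for illustration.

```python
def find_device_changes(tsm_defined, os_reported):
    """Return {serial: (defined_name, reported_name)} for devices that differ.

    Both arguments map a serial number to a Windows device name.
    A reported_name of None means Windows no longer lists that serial.
    Names are compared case-insensitively.
    """
    changes = {}
    for serial, defined_name in tsm_defined.items():
        reported_name = os_reported.get(serial)
        if reported_name is None or reported_name.lower() != defined_name.lower():
            changes[serial] = (defined_name, reported_name)
    return changes

# Hypothetical serials and drive names, for illustration only.
tsm_defined = {
    "LIB0001": "LB1.0.0.2",
    "DRV0001": "mt2.0.0.2",
    "DRV0002": "mt3.0.0.2",
}
os_reported = {
    "LIB0001": "LB1.0.0.3",
    "DRV0001": "mt2.0.0.3",
    "DRV0002": "mt3.0.0.3",
}

changes = find_device_changes(tsm_defined, os_reported)
for serial, (old, new) in sorted(changes.items()):
    print(f"{serial}: TSM points at {old}, Windows now reports {new}")
```

Keying on the serial number is the point: the serial stays with the hardware, while the Windows device name is exactly the thing that moved.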

Wednesday, January 18, 2012

TSM Client Scheduler Issue

I recently had an issue with a handful of TSM clients that would not run their backups. The clients all back up to a TSM 5.5.2 server and were all running Windows 2008 with TSM client version 6.2.3. The five clients had been missing their backups for days, and what makes the situation more interesting is that other Windows 2008 servers with the same TSM version installed are all running their schedules without issue.

When I reviewed the TSM schedule log, the scheduler reported that it had received the schedule info and was waiting for the TSM server to initiate the schedule. The TSM server never made an attempt to contact the clients in question and never showed any errors other than ANR2578W, stating that the client missed its schedule. There were no errors in the client error log and not much to go by in the TSM server activity log. Even though the TSM client backs up over the public network, I switched to polling mode to see if client-based initiation of the backup would work. It didn't! The TSM client scheduler would receive the schedule upon polling the TSM server but would never execute it. So now what? I added TCPCLIENTADDRESS and TCPCLIENTPORT and switched back to SCHEDMODE PROMPTED, and still the scheduler would not run backups.

Now I was getting frustrated. I removed the scheduler service and redefined it using dsmcutil and voila, the schedule ran...ONCE! After the initial schedule ran, the previous problem returned. Schedules were not running, and the TSM server would not show any errors saying it could not contact the client. It just would not run the schedule. Well, that left me no choice but to call support. IBM support's response was to make sure TCPCLIENTADDRESS and TCPCLIENTPORT were defined in the dsm.opt and also to define the client HLADDRESS and LLADDRESS on the TSM server. Define the HL and LL address? TSM gets that when the client connects, doesn't it? Yes and no! It appears that without the optional setting the TSM server can have issues contacting some clients. Why? No idea, but adding the HL and LL address did the trick, and the backups have been running without issue since.
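
To make the fix concrete, here is a small Python sketch that prints the two pieces side by side: the client option lines for dsm.opt and the matching UPDATE NODE command to run on the TSM server. The option and parameter names are the ones mentioned above; the node name, address, and port are placeholders, and the helper function is just my own illustration, so adjust everything to your environment.

```python
def prompted_mode_settings(node_name, client_address, client_port):
    """Build the dsm.opt lines and the server admin command as plain text."""
    opt_lines = [
        "SCHEDMODE        PROMPTED",
        f"TCPCLIENTADDRESS {client_address}",
        f"TCPCLIENTPORT    {client_port}",
    ]
    # The server-side address should match what the client advertises.
    admin_command = (
        f"UPDATE NODE {node_name} "
        f"HLADDRESS={client_address} LLADDRESS={client_port}"
    )
    return opt_lines, admin_command

# Placeholder values, for illustration only.
opt_lines, admin_command = prompted_mode_settings("WINNODE01", "192.0.2.10", 1501)

print("Client dsm.opt:")
for line in opt_lines:
    print("  " + line)
print("Server admin command:")
print("  " + admin_command)
```

The main thing the sketch shows is that the same address and port appear on both sides, so the server has an explicit place to contact the client instead of relying on whatever it learned at the last connection.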

How many of you define the HLADDRESS and LLADDRESS when registering nodes? I never suspected it was needed until now.