Tuesday, September 04, 2012

Solaris LAN-Free



I recently had to configure a Solaris box for LAN-Free and had to dig up my old documentation. Here's what I did to get LAN-Free working after the drivers were loaded but the devices were still not showing up in the tape list. These directions are for IBM LTO drives only.

http://www-01.ibm.com/support/docvie...S7002972&aid=1
Adobe Reader page 137 (actual doc page 117)
We need to make sure the native "st" driver is not loaded. Run

rem_drv st

to unload it, and comment out everything in /kernel/drv/st.conf.
Then we need to run the following:

rm /dev/rmt/*                 
removes any tape drive definitions in the rmt folder. Do this only if the IBM tape drives are the only drives used on the server

/opt/IBMtape/tmd -s
Stops the Tape Monitor Daemon
/usr/sbin/rem_drv IBMtape     
Removes the IBMtape driver
The commands to reload the device driver are:

/usr/sbin/add_drv -m '* 0666 bin bin' IBMtape
This reloads the driver but does not set the correct driver type

/usr/sbin/update_drv -av -i '"scsiclass,01.vIBM.pULTRIUM-TD3"' IBMtape
This will add the drive type to the /etc/driver_aliases file.
/opt/IBMtape/tmd
Reloads the IBM Tape Monitor Daemon

Then run /opt/IBMtape/tapelist -Ac to see if the drives are discovered correctly.
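
To review the sequence before touching a live box, the commands above can be collected in order and printed as a dry run. This is only a sketch: the function names are made up for illustration, the default alias is the ULTRIUM-TD3 string from this post (other drive models need their own alias), and nothing is executed. The rem_drv st and rm /dev/rmt/* steps are left out on purpose because they depend on what else is attached to the server.

```python
import shlex

def ibmtape_reload_commands(alias="scsiclass,01.vIBM.pULTRIUM-TD3"):
    """Return the IBMtape unload/reload commands from this post, in order.

    Each command is an argv list. The alias default is the one used in the
    post; treat any other value as something to confirm for your drive.
    """
    return [
        ["/opt/IBMtape/tmd", "-s"],                          # stop the Tape Monitor Daemon
        ["/usr/sbin/rem_drv", "IBMtape"],                    # remove the IBMtape driver
        ["/usr/sbin/add_drv", "-m", "* 0666 bin bin", "IBMtape"],   # reload the driver
        ["/usr/sbin/update_drv", "-av", "-i", f'"{alias}"', "IBMtape"],  # add the drive type alias
        ["/opt/IBMtape/tmd"],                                # restart the Tape Monitor Daemon
        ["/opt/IBMtape/tapelist", "-Ac"],                    # check that the drives are discovered
    ]

def as_shell(commands):
    """Render argv lists as shell-quoted command lines for review."""
    return [" ".join(shlex.quote(arg) for arg in argv) for argv in commands]

if __name__ == "__main__":
    # Dry run only: print the commands instead of running them.
    for line in as_shell(ibmtape_reload_commands()):
        print(line)
```

Printing rather than running keeps the destructive steps a deliberate, manual decision.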

Wednesday, August 08, 2012

NFS Mount Issue

I recently had a number of AIX server backups miss because the backup hung while doing its initial filesystem listing. At some point within the last day the server behind the NFS mounts was rebooted and all the mount points went bad. The problem is that TSM sits there trying to query them even though the default action is to not back up NFS mount points. So I had to log into each server, umount -f the file system, and remount it before TSM could run successfully. TSM does not allow a DOMAIN -ALL-NFS, so no matter what I do TSM is going to hang on the listing of file systems. Of course doing a df on the server hangs also, so it's not just a TSM issue. Has anyone else run across this issue?
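
One way to spot the bad mounts before the backup window is to probe each mount point with a timeout instead of letting df hang. The following is a rough sketch, not something taken from the setup described here: it assumes Python is available on the client, the function names are illustrative, and the list of mount points is something you would supply yourself. It only reports; the umount -f and remount are still done by hand.

```python
import subprocess

def path_responds(path, timeout_s=5.0):
    """Return True if listing `path` finishes within timeout_s seconds.

    A missing path still counts as "responds" (ls exits quickly with an
    error); only a hang is reported as False.
    """
    proc = subprocess.Popen(
        ["ls", "-d", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        # Best effort: a process blocked on a hard NFS mount may not exit
        # until the NFS server comes back, so we do not wait for it here.
        proc.kill()
        return False

def hung_mounts(mount_points, timeout_s=5.0):
    """Return the mount points that did not answer in time."""
    return [m for m in mount_points if not path_responds(m, timeout_s)]
```

Running hung_mounts() from cron shortly before the scheduled backup would at least turn a silent miss into an alert.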

Tuesday, July 31, 2012

TSM 5.5 to 6.2.4 Upgrade

I recently did a network-based upgrade of a 270GB DB. Previously TSM could take days to do the upgrade of a DB that large, but my experience with performing upgrades for other companies and some suggestions from my Tivoli Consultant had me convinced it could be done in under 24 hrs. The network method is kind of a misnomer here since we used the loopback address, so the upgrade was done in place on the server. After allocating additional disk space (a lot of disk space) and defining the user ID TSM would run under, I started the upgrade process by running the dsmupgrd preparedb process. That step took less than an hour and completed successfully. I copied our devconfig and volhist files to an alternate location and then started the insert under the ID of the new TSM 6.2.4 instance. I then switched back to root, cd'd to the upgrade directory, set my environment variables accordingly, and then started the extract.

The extract took 5 hours and ran without issue. The insert ran for 11 hours and completed without errors. The overall time from start of the upgrade to end was 13 hours. My reason for testing was to verify the time frame needed so the applications that rely on TSM for log offloading could add log space ahead of time. If anyone has suggestions on how I can make the extract/insert performance even better, feel free to post a comment.

Friday, May 25, 2012

TSM 6.2.3 and Lower DB Reorg Issue

One of our TSM servers started to experience large numbers of ANR0530W "internal server error detected" messages. With further investigation we identified that these were related to ANR0162W messages, which indicate DB deadlock or timeout problems. These errors were causing our DB TDP backups to fail, and I eventually called support. I provided our db2diag.log file and a dump of our actlog for the last 24 hours, and they found that the DB2 reorg process was locking records and tables and creating the deadlock situation. The problem was compounded because with TSM versions 6.2.3 or lower the DB2 reorg process cannot be scheduled, so it can kick off during backups or other processes and create these deadlocks. To resolve the issue I was told I needed to upgrade our TSM server to 6.2.4 or higher. With TSM 6.2.4 and higher you can use the REORGBEGINTIME and REORGDURATION parameters to keep the reorg within a window. You can see the details of the APAR here.
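
When picking values for those two parameters, the point is that the reorg window should not collide with the backup window. Here is a small sketch for sanity-checking a candidate window. It assumes REORGBEGINTIME is given as hh:mm and REORGDURATION as a number of hours (check the server options reference for your level), and the backup window in the usage note is a made-up example. The function names are illustrative.

```python
def to_minutes(hhmm):
    """Convert 'hh:mm' to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def daily_windows_overlap(start_a, hours_a, start_b, hours_b):
    """Return True if two daily recurring windows ever overlap.

    Each window is a start time ('hh:mm') plus a length in hours, assumed
    to be shorter than 24 hours. Window A is compared against window B
    shifted by -1, 0 and +1 days so windows crossing midnight are handled.
    """
    a0 = to_minutes(start_a)
    a1 = a0 + hours_a * 60
    b0 = to_minutes(start_b)
    b1 = b0 + hours_b * 60
    day = 24 * 60
    return any(a0 + shift < b1 and b0 < a1 + shift for shift in (-day, 0, day))
```

For example, with a hypothetical backup window of 20:00 for 10 hours, daily_windows_overlap("10:00", 4, "20:00", 10) is False, while a reorg starting at 02:00 for 2 hours would overlap.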

Thursday, May 10, 2012

Client Lockdown

A directive came down for 75+ Windows servers to be "locked down" when it came to accessing their backup data. The TSM client will allow anyone to open the client GUI and restore files they have permission to access. So I considered the various ways to keep anyone from accessing TSM and restoring data: we could set permissions on the GUI and command line to not allow executing unless in the admin group, we could delete the command line and GUI executables, or we could simply set the SESSIONINIT option on the server. After weighing the options, SESSIONINIT was the easiest and most direct way to keep anyone but a TSM admin from restoring data. Once SESSIONINIT is set, the TSM client GUI, command line, and web GUI will not be allowed to initiate a session. All restores will have to be executed through a schedule from the TSM server. Of course you can temporarily turn SESSIONINIT off, but only a TSM admin can do so, making it easier to track who's accessed the data.


(Note: SESSIONINIT does not support the CAD)


The problem was how to update the options file on 75+ Windows servers and then restart the TSM Scheduler. You can change any MANAGEDSERVICES options to WEBCLIENT using a client option set, but SESSIONINIT is another problem. As it turns out, if you set SESSIONINIT on the TSM server you have to put the HLAddress and LLAddress in the node definition on the server. The client dsm.opt must have TCPCLIENTADDRESS and TCPCLIENTPORT set. What we didn't know was that we also had to put SESSIONINIT SERVERONLY in the dsm.opt. If you set SESSIONINIT on the TSM server and not on the client, and the scheduler was defined with /validate:yes, then you will get "password" errors and the scheduler will crash. The reason for this is that TSM does not allow client-initiated sessions, but the scheduler is trying to validate its password when it starts. Since the scheduler can't validate the password, it fails much as it does when the password has not been set when using PASSWORDACCESS GENERATE.
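
Since a missing client-side line is what bit us, a quick check of each dsm.opt for the three options named above can save a round of scheduler crashes. The following is a minimal sketch under some assumptions: it only recognizes the option spellings used in this post (plus the long form SESSIONINITIATION), it treats lines starting with * as comments, and the function name is illustrative.

```python
def missing_lockdown_options(opt_text):
    """Return the options from this post that are absent from a dsm.opt.

    opt_text is the full text of the options file. Only the spellings used
    in the post are recognized; other abbreviations are not handled.
    """
    found = {}
    for raw in opt_text.splitlines():
        parts = raw.split()
        if not parts or parts[0].startswith("*"):
            continue  # skip blank lines and comments
        found[parts[0].upper()] = parts[1:]

    missing = []
    session = found.get("SESSIONINIT") or found.get("SESSIONINITIATION") or []
    if [value.upper() for value in session[:1]] != ["SERVERONLY"]:
        missing.append("SESSIONINIT SERVERONLY")
    for name in ("TCPCLIENTADDRESS", "TCPCLIENTPORT"):
        if not found.get(name):
            missing.append(name)
    return missing
```

Reading each file over the admin share and passing its text to this function gives a list of servers that still need attention.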


We had the TCP Client settings, but adding SESSIONINIT to all 75+ servers would have been a chore...unless you know how to use the Windows command prompt and dsmcutil. Here's how I added the SESSIONINIT option to all 75 servers.

Example of how to remotely add a line to the dsm.opt


C:\>echo SESSIONINIT SERVERONLY >> \\WINSERVPRD20\C$\progra~1\tivoli\tsm\baclient\dsm.opt

I put the command for all 75 into a batch file and ran it looking for errors (not all our servers had TSM installed in the default location). Then I used the dsmcutil command to stop and start the TSM scheduler remotely.

Example of stopping and starting the TSM Scheduler service

C:\>C:\progra~1\tivoli\tsm\baclient\dsmcutil stop /name:"TSM Client Acceptor" /machine:WINSERVPRD20
C:\>C:\progra~1\tivoli\tsm\baclient\dsmcutil start /name:"TSM Client Acceptor" /machine:WINSERVPRD20

I needed to restart the scheduler on all 75 servers, so once again I created a Windows batch file listing these commands for each server, and it restarted all 75 quickly and easily from my own desktop.
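
Typing 75 sets of near-identical lines is error prone, so the batch file contents can be generated from a server list. This is a sketch only: the function names are made up, the service name and paths are the ones shown in the examples above, and the per-server override is there because not every server had TSM in the default location. It produces text for review and runs nothing.

```python
DEFAULT_DIR = r"C$\progra~1\tivoli\tsm\baclient"          # admin-share path from the example
DSMCUTIL = r"C:\progra~1\tivoli\tsm\baclient\dsmcutil"    # local dsmcutil from the example
SERVICE = "TSM Client Acceptor"                           # service name from the example

def unc_opt_path(server, install_dir):
    """Build \\\\server\\install_dir\\dsm.opt as a UNC path string."""
    return "\\\\" + "\\".join([server, install_dir, "dsm.opt"])

def batch_lines(servers, install_dirs=None):
    """Return batch-file lines that append the option and restart the service.

    install_dirs optionally maps a server name to a non-default install
    directory (expressed relative to the server, e.g. via its D$ share).
    """
    install_dirs = install_dirs or {}
    lines = []
    for server in servers:
        opt_path = unc_opt_path(server, install_dirs.get(server, DEFAULT_DIR))
        lines.append(f"echo SESSIONINIT SERVERONLY >> {opt_path}")
        for action in ("stop", "start"):
            lines.append(f'{DSMCUTIL} {action} /name:"{SERVICE}" /machine:{server}')
    return lines

if __name__ == "__main__":
    # Print the batch file for a single example server.
    print("\n".join(batch_lines(["WINSERVPRD20"])))
```

Redirecting the output to a .cmd file gives one batch file; splitting the echo lines and the dsmcutil lines into two files, as described above, makes it easier to check for errors before restarting anything.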


 

Monday, April 30, 2012

DBMEMPERCENT...Where'd That Come From?

I was having performance issues with a couple of TSM 6.2 servers and could not find anything that pointed to the issue. I'm not one to call support unless I'm totally stumped and cannot find help through the web, but this time I finally relented and made the call. The issue was that backups were failing repeatedly, and when I researched it we were getting internal server errors along with DB table errors. IBM support asked for some DB2 log files and within 30 or so minutes had identified the problem.

TSM has a server option I had never used or heard of that had somehow been set and was adversely affecting all backups. Somehow the option DBMEMPERCENT was set in the dsmserv.opt file. This option tells TSM what percentage of the overall server's memory it can allocate for use. The default is AUTO and would have been fine, but somehow DBMEMPERCENT was set to 10 in the dsmserv.opt. Which means out of 16GB of RAM I was only using 1.6GB?!?!? How'd that happen? I didn't set it, none of my coworkers remember setting it, so where did it come from? IBM support stated the default was AUTO, so the option was manually set. Since I had never used this option and it's 6.x specific, I never would have looked for it. Good thing I called support.
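
For anyone who wants to check their own servers, here is a small sketch that looks for DBMEMPERCENT in the text of a dsmserv.opt and works out what a numeric value means in GB. It assumes the option appears as "DBMEMPERCENT value" on its own line and that lines starting with * are comments. The function names are illustrative, and the calculation is just the straight percentage of RAM described above.

```python
def dbmempercent_setting(opt_text):
    """Return the DBMEMPERCENT value found in dsmserv.opt text, or 'AUTO'.

    'AUTO' is returned when the option is not present, since that is the
    default according to IBM support.
    """
    for raw in opt_text.splitlines():
        parts = raw.split()
        if len(parts) >= 2 and parts[0].upper() == "DBMEMPERCENT":
            return parts[1].upper()
    return "AUTO"

def db_memory_gb(total_ram_gb, setting):
    """Return the memory cap in GB for a numeric setting, or None for AUTO."""
    if setting == "AUTO":
        return None
    return total_ram_gb * int(setting) / 100

if __name__ == "__main__":
    setting = dbmempercent_setting("COMMMETHOD TCPIP\nDBMEMPERCENT 10\n")
    print(setting, db_memory_gb(16, setting))
```

With the values from this post (16GB of RAM and a setting of 10) the cap works out to 1.6GB.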