Wednesday, December 14, 2005

Unload/Load of DB pitfall

I am writing this article to save others from the situation we ran into at one of our customers. It began with a request to replace the RAID array that hosts the primary storage pools. I took that as an excellent opportunity to reorganize the DB, which was 66GB with about 50% utilization (after a massive deletion in the last month or so). Downtime was already planned for copying the data from one RAID to another, so I thought I had plenty of time for an unload/load.
The "dsmserv unloaddb" utility went smoothly, taking about 4hrs and creating a dump of about 20GB (which was expected, as ESTIMATE DBREORGSTATS had revealed we could save about 7GB). Then I proceeded with "dsmserv loadformat" and "dsmserv loaddb" (taking approx. 5hrs). So far everything was seemingly OK. I was able to start the server, reduce the DB and run some tests. The problem appeared when I tried to create a new stgpool:


12/13/05 12:42:21 ANR2017I Administrator HARRY issued command: DEFINE STGPOOL archivedisk disk (SESSION: 3884)
12/13/05 12:42:21 ANR0102E sspool.c(1648): Error 1 inserting row in table "SS.Pool.Ids". (SESSION: 3884)
12/13/05 12:42:21 ANR2032E DEFINE STGPOOL: Command failed - internal server error detected. (SESSION: 3884)



Google revealed that this is a known error, fixed in 5.3.2.1 (we were on 5.3.2.0, upgraded to that level just a few days before the fix appeared; bad luck). Basically, there is a bug in LOADFORMAT/LOADDB that corrupts some DB values:


http://www-1.ibm.com/support/docview.wss?uid=swg1IC47516


I did not want to go through loaddb again: changes had already been made to the DB, some migrations had run (luckily for me they were from a stgpool with caching set on), etc.
So I tried running "dsmserv auditdb inventory fix=yes", which IBM says can help if you run it after a loaddb. Long story short: after 8hrs of audit (with a message that some values were corrected) the problem was still there.
So the only option was to apply the patch and run loaddb again, which meant another 4hrs of waiting. Now it seems to work (still running tests). So watch for this problem and check your TSM level before reorganizing your DB.
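
The "check your TSM level" advice can be turned into a small pre-flight check. The following is only a minimal Python sketch: the function names are made up for illustration, the level string is assumed to be typed in or copied from the server by hand, and the only fact taken from this post is that the fix arrived in 5.3.2.1. It says nothing about which other release streams are or are not affected, so it simply treats anything below 5.3.2.1 as "not confirmed safe".

```python
# Assumption: the only known-good level comes from this post (fix in 5.3.2.1).
FIXED_LEVEL = (5, 3, 2, 1)


def parse_level(text):
    """Turn a dotted server level such as '5.3.2.0' into a tuple of ints."""
    parts = tuple(int(p) for p in text.strip().split("."))
    if len(parts) != 4:
        raise ValueError("expected a level like 5.3.2.1, got %r" % text)
    return parts


def safe_to_reload(level_text, fixed=FIXED_LEVEL):
    """Return True only if the level is at or above the fixed level.

    Conservative on purpose: lower levels are reported as not confirmed
    safe, even though this sketch cannot know whether they have the bug.
    """
    return parse_level(level_text) >= fixed


if __name__ == "__main__":
    for level in ("5.3.2.0", "5.3.2.1"):
        verdict = "OK" if safe_to_reload(level) else "patch first"
        print(level, "->", verdict)
```

Tuples of integers compare element by element, which avoids the trap of comparing version strings as text (where "5.3.10.0" would sort before "5.3.2.0").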

Tuesday, December 06, 2005

System State Backup Issue

We recently had to rebuild a box (thank goodness it was not production) and found that the system state and objects had not been backed up.  Stranger still, there was no error stating they had failed. In fact the only error was that the LVSA directory was not set and it would not be used during the backup.  Why would a failure to completely set up the LVSA affect the system objects and state? TSM doesn't require the LVSA to back them up.  We checked multiple systems and found this happening on more than one.  The client level we are on is 5.2.4 and higher.  Has anyone else experienced this?  BTW - The backup schedules were showing in the event log as COMPLETED, not Failed.
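
Since a COMPLETED status evidently does not prove that everything expected was backed up, one option is to scan the client log for the items you care about. The Python sketch below is a generic illustration only: the function name, the marker strings and the sample log text are all made up and are not real TSM client output, so you would need to substitute whatever your own logs actually print.

```python
def missing_markers(log_text, required_markers):
    """Return the required markers that never appear in the log text.

    Matching is a case-insensitive substring search, nothing smarter.
    """
    lowered = log_text.lower()
    return [m for m in required_markers if m.lower() not in lowered]


if __name__ == "__main__":
    # Hypothetical log excerpt and markers, for illustration only.
    sample_log = "Backing up SYSTEM STATE\nSchedule finished\n"
    wanted = ["system state", "system objects"]
    gaps = missing_markers(sample_log, wanted)
    if gaps:
        print("No mention in log of:", ", ".join(gaps))
    else:
        print("All expected items were mentioned")
```

An empty result only means the strings were found somewhere in the log; it does not prove the backup of those items succeeded.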

Saturday, December 03, 2005

Got HACMP?

Well, our group got stung again this last week. While working on an HACMP implementation we were asked to install the Oracle TDP and TSM client. The TSM client installed fine, but once we had installed the TDP the DBAs could not get the password file to generate. Each time they tried, the utility core dumped, no matter which way they ran it. After calling support we were informed that the current 64-bit Oracle TDP (and from what I have found through Google it could be any 64-bit TDP) is not compatible with HACMP 5.3. It seems HACMP was rebuilt for version 5.3 and the TDP is not compatible with the new way it was compiled. A patch is projected for release at the end of the month, but we cannot wait that long, so our options are either to use temporary disk and mirror the DB, then break the mirror and back up the mirrored copy whenever we want a backup, or to roll back to HACMP 5.2. The problem with the latter is that we basically have to rebuild the boxes. So be warned: there are issues with the latest and greatest version of HACMP at this time.