Wednesday, December 14, 2005

Unload/Load of DB pitfall

I am writing this article to save others from the situation we had at one of our customers. In the beginning a request came for replacing the RAID array which hosts the primary storage pools. I taked that as an excellent opportunity for reorganizing the DB which was 66GB with cca 50% utilization (after a massive deletion in last month or so). There was a planned downtime needed for copying the data from one RAID to another so I thought I had a plenty of time for unload/load.
The "dsmserv unloaddb" utility went smoothly - taking about 4hrs, creating cca 20GB dump (which was expected as the ESTIMATE DBREORGSTAT revealed we can save cca 7GB). Then I proceed with "dsmserv loadformat" and "dsmserv loaddb" (taking approx. 5hrs). So far everything was seemingly OK. I was able to start the server, reduce the DB and run some tests. The problem appeared when I tried to create a new stgpool


12/13/05 12:42:21 ANR2017I Administrator HARRY issued command: DEFINE STGPOOL archivedisk disk (SESSION: 3884)
12/13/05 12:42:21 ANR0102E sspool.c(1648): Error 1 inserting row in table "SS.Pool.Ids". (SESSION: 3884)
12/13/05 12:42:21 ANR2032E DEFINE STGPOOL: Command failed - internal server error detected. (SESSION: 3884)



Google revealed that this error is a known one and is fixed in 5.3.2.1 (we was on 5.3.2.0 - upgraded to this level just a few days before the fix appeared .. bad luck). Basically - there is an error in LOADFORMAT/LOADDB corrupting some DB values


http://www-1.ibm.com/support/docview.wss?uid=swg1IC47516


I did not want to go for loaddb again (there were changes already made to the DB, some migrations were run (luckily for me they were from stgpool with caching set on) .. etc.
So I tried to run the "dsmserv auditdb inventory fix=yes" - IBM says it can help if you do it after a loaddb - long story short - after 8hrs of audit (with message that some vaules were corrected) the problem was still there ...
So the only option was to apply the patch and do use loaddb again - so another 4hrs of waiting - and now it seems to work (still running tests). So watch for this problem and check your TSM level before reorganizing your DB.

4 comments:

  1. Defragmenting the TSM DB is a nightmare. We had one that ran over a week and in the meantime we started a new TSM instance...well guess what we never went back to the defragged one because there was too much data now in the new instance. We actually had TSM support tell us you are better off starting a new instance and letting the old data expire rather than defragging. This is why TSM needs an enterprise DB backend. Too much is dependent on DB performance and current the TSM DB is too old school to keep up with demands.

    ReplyDelete
  2. Chad,
    Per reading your and harry's comment..I having a second thought to do re-org of database...database size is 141Gb and about 89% utilized...database backup take 3-4/hrs...I update the TSM server from 5.1.8 to 5.3.2.0...any thing i should watch for..?

    -Atif
    813-486-4953

    ReplyDelete
  3. Hi,
    5.3.2.0 is not enough - I was on 5.3.2.0 and had to go to 5.3.2.1
    Your DB size is about 5 times larger than mine (so the full DB backup time) so I expect the unload/load
    is going to take 5 times more time - so a day for each (with no problem encountered!). Maybe more .... can your backup system be down for such a long time?
    Maybe Chad's suggestion for starting a fresh instance is a way - but it depends on your environment.

    ReplyDelete
  4. Any TSM Server patch levels ending with a ~.0 always has some exposures/bugs. Always go for a version that does not end with ~.0 as this can be a beta release ie. not properly regression tested. AC 2006.

    ReplyDelete