Monday, August 07, 2017

TSM/Spectrum Protect 7.x Client Issue

I recently ran across an issue with the TSM/Spectrum Protect client. Investigation showed that the issue has not been patched and that the supposed resolution works... sometimes. When an AIX backup is running, you can experience the following error:

08/02/17   03:24:52 ANS1512E Scheduled event 'CS-FS-U-02.00' failed.  Return code = 12.
08/02/17   13:54:36 ANS2820E An interrupt has occurred. The current operation will end and the client will shut down.
08/03/17   03:39:20 calloc() failed: Size 31496 File ../mem/mempool.cpp Line 1090
08/03/17   03:39:20 ANS1999E Incremental processing of '/usr/ibm' stopped.
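
To spot this failure in a long client error log, a small filter can help. The following is only a sketch: the two patterns are taken from the excerpt above, while the function names and the log path in the usage comment are hypothetical and should be adjusted to your environment.

```python
import re

# Patterns taken from the log excerpt above: the failed allocation itself
# and the message reporting that incremental processing stopped.
FAILURE_PATTERNS = [
    re.compile(r"calloc\(\) failed"),
    re.compile(r"\bANS1999E\b"),
]


def find_failures(lines):
    """Return the lines that match any of the failure patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS)
    ]


def scan_log(path):
    """Read a client error log and return the matching lines."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, errors="replace") as handle:
        return find_failures(handle)


# Example use (the path is an assumption, not taken from the post):
#   for hit in scan_log("/path/to/dsmerror.log"):
#       print(hit)
```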

This is a memory issue that can occur when TSM cannot allocate enough memory during the scheduled backup. Interestingly, manual backups run without issue; only scheduled backups fail. The server in question has only 8 GB, so real memory is known to be limited. I followed the troubleshooting tips I found online and checked the file system for an excessive number of files, but querying the file system showed that was not the issue.

I could not exclude the file system, so I added the MEMORYEFFICIENTBACKUP YES option, thinking that would resolve the issue. Unfortunately, it did not. Subsequent investigation showed that some people had to downgrade their TSM client to a 6.4 level to resolve the problem. Unwilling to do so, I changed the option to use the disk cache method instead. So far, using the disk cache has worked without issue, but it is concerning that a change to the TSM/SP client has created this memory issue. The fix was to add the following two options to my dsm.sys:

   memoryefficientbackup DISKCACHEM
   diskcachelocation     /tmp

Please note that my /tmp file system is over 2GB in size and only 1% used, so make sure you have a sufficiently sized file system available if you use the DISKCACHEMETHOD setting.
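
Before pointing diskcachelocation at a directory, it may be worth confirming how much free space its file system has. The sketch below uses Python's standard shutil.disk_usage call. The function name and the 2 GiB threshold are illustrative assumptions (the threshold simply mirrors the size of my /tmp), not a documented requirement; the space actually needed depends on how many files are being backed up.

```python
import shutil

# Illustrative threshold only: roughly the size of the /tmp file system
# described above, not a value documented by IBM.
MIN_FREE_BYTES = 2 * 1024**3


def has_enough_free_space(path, min_free_bytes=MIN_FREE_BYTES):
    """Return True if the file system holding `path` has at least
    `min_free_bytes` bytes free."""
    usage = shutil.disk_usage(path)
    return usage.free >= min_free_bytes


# Example use:
#   if not has_enough_free_space("/tmp"):
#       print("Pick a larger file system for diskcachelocation")
```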

2 comments:

  1. I found your blog quite interesting and the concern in the blog is really impressive. Thanks for sharing.

    Voice And Data Cabling

  2. In my case, I found this wasn't a TSM bug but rather a user limit issue. Setting 'ulimit -d unlimited' for the user running the backup (root in my case) solved the problem. It's documented by IBM at the link below. Other operating systems (e.g., Linux) leave the 'data' limit unlimited by default, so this is an AIX-specific issue.

    https://www.ibm.com/support/pages/backup-large-file-system-aix-fails
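
A quick way to see which data-segment limit a process actually inherits is to query it from that process. The sketch below uses Python's standard resource module (Unix only). The function names are hypothetical, and the result reflects the limits of whichever user and shell launch the script, which may differ from the environment the backup scheduler runs in.

```python
import resource


def format_limit(value):
    """Render an rlimit value as text; getrlimit reports sizes in bytes."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


def data_limits():
    """Return the (soft, hard) data-segment limits of this process as text."""
    soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
    return format_limit(soft), format_limit(hard)


# Example use:
#   soft, hard = data_limits()
#   print(f"data segment limit: soft={soft}, hard={hard}")
```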
