TSM Topics Feed

Tuesday, July 17, 2018

Spectrum Protect Server 8.1.4 DB Issue

Recently we ran into a situation where our Spectrum Protect 8.1.4 server started to crash. We reviewed the db2diag.log file and identified the error as ANR0131E Server DB space exhausted. Upon review of our DB file systems the OS showed 7GB available in the 5 file systems assigned for DB space.

/dev/tsm20log 111616.00   9784.93   92%      205     1% /tsmserver/tsm20log
/dev/tsm20arch 102400.00  66821.52   35%      100     1% /tsmserver/tsm20arch
/dev/tsm20fail 101376.00 100964.16    1%       21     1% /tsmserver/tsm20fail
/dev/tsm20dblv01 101376.00   7622.45   93%      104     1% /tsmserver/tsm20dblv01
/dev/tsm20dblv02 101376.00   7622.48   93%      104     1% /tsmserver/tsm20dblv02
/dev/tsm20dblv03 101376.00   7622.45   93%      104     1% /tsmserver/tsm20dblv03
/dev/tsm20dblv04 101376.00   7622.46   93%      105     1% /tsmserver/tsm20dblv04
/dev/tsm20dblv05 101376.00   7622.45   93%      105     1% /tsmserver/tsm20dblv05


The server had been crashing from the previous night and we had at first thought it was due to the active log filling and not writing out to the archive log fast enough. What was weird is we could bring TSM up and it would run for an hour or two then crash again. When we identified it was not the active log but the DB space (earlier the next morning) we were confused. Why would we get a space notification when the DB has 35GB of space still available. My co-worker created an additional file system of the same size as the other DB file systems and ran the dsmserv extend dbspace command and the file systems almost immediately jumped to 99% utilized. So it appears Spectrum Protect needed to write 35+GB of DB data and could not due to space available? We also noticed that our offshore people had added tsm20dblv05 a few months back but had used a different disk array (slow SATA) and had not informed us, which we think might have contributed to the performance issues we had been seeing on this server recently and maybe was related to the DB space issue??? Not sure, but be aware that this can occur. I will need to research how DB2 allocates space and in what chunk size it does so.

Tuesday, January 23, 2018

Audit Occupancy Issue

1/24/2018 Update 
IBM Support contacted me and stated that the developers said the issue was actually a defect (APAR IT23153) and that you can fix the issue by logging on the SP server as the instance owner and running the following commands:

  • db2 connect to tsmdb1
  • db2 set schema tsmdb1
  • db2 "update tsmdb1.sd_pool_clusters set physoccupancy=1 where physoccupancy=0"
  • login as admin to Spectrum Protect instance
  • run AUDIT LICense command
  • run Q AUDITOCCupancy command

You should now see the AUDITOCC table is populated.


1/23/2018 
 We setup a new Spectrum Protect instance with a Directory Container Storage Pool and when we went to gather storage data for billing purposes we discovered that the AUDITOCC table in the database showed no storage data. The first troubleshooting step was to check to see if AUDITSTORAGE option was set to YES which it was. Then we reran the AUDIT LICENSE command to have Spectrum Protect audit the storage and after completing there was still no data in the AUDITOCC table. I contacted IBM and asked for support to tell me why I had no data in my AUDITOCC table and was met with level 1 support unable to provide an answer (which was probably due to them not knowing my servers full configuration). When the PMR was passed up to the chain to level 2 support and they reviewed the log details I sent in, they responded that DIRECTORY CONTAINER storage pools do not provide occupancy data to the AUDITOCC table. To gather the required data that the AUDITOCC table normally would provide you have to run the GENERATE DEDUPSTATS command.

GENERATE DEDUPSTATS <STORAGE POOL> <NODE NAME>

Please note that there is nothing in the Spectrum Protect manuals that will inform you that DIRECTORY CONTAINER storage pools do not provide AUDITOCC table data. IBM Level 2 support has opened a PMR requesting that the documentation be updated for the QUERY AUDITOCC command. I am not sure if you can use a * with the node name and have GENERATE DEDUPSTATS but will be trying it today and will let you all know the results in the comments below.

Monday, December 04, 2017

Stay Away From The Spectrum Protect For Windows 8.x Client

3/7/2017 Update
I have had a number of people request the test flag and one response said it was kicking up errors when used with the CAD service/process. Please note this flag is only needed in the event of a system state restore and you would not (I hope) be using the CAD/Scheduler to restore your system state. So the testflag should only be activated in the dsm.opt in the event you are doing a manual restore of the system state and not kept turned on for daily use.


1/24/2018 Update
After waiting for IBM to post an update to this issue I decided I would let you know that IBM has an undocumented flag for this issue that allows the client to restore the System State online rather than work with the PE. Our Windows admins spoke with Microsoft and they state emphatically that they support online system state restores, so why IBM is stating that they do not is "interesting". Here is the flag to allow the SP client to perform online system state restores.

ONLINE_SYSTEMSTATE_RESTORE_ALLOWED

12/4/2018
I wanted to make everyone aware that starting with the 8.1 Windows Client, IBM has deprecated the System State restore process so that any restore attempt will require the use of a WinPE environment. With the 8.1.x client you can no longer do a complete restore of a server within the normal windows environment and will have to make sure you have a compatible WinPE for every Windows Hardware type or environment. This, in my opinion, is a huge break of the software. Although IBM will swear it's because Microsoft does not support online system state restores it appears to be a weak argument. I say this knowing that if you are on a 7.x or earlier version of the client IBM will support you with the system state restore, but as of 8.1 you will be directed to build a Windows PE environment and follow a series of step that the support person I spoke with treated as if it should be second nature to any Windows Admin. 

We recently were migrating a clients server and in the process the disks were corrupted. The client installed on the new machine was 8.1.2 and when the Windows Admins tried to restore the system state they experienced the following error.
 


After speaking with support we were directed to the following document by Andrew Raibeck that discussed creating the WinPE disk image with the TSM client code that would allow the offline restore of the System State. We attempted to build the ISO but when the Windows Admin tried to run the dsmc client we experienced the following error:



Who do we call when we experience an error like this in a WinPE environment? IBM or Microsoft? This is ridiculous and I don't think IBM fully grasps how big of an issue this will be with their customers. Am I now required to make sure that to be ready for a DR scenario or test that we have WinPE ISOs and that my admins are fully aware how to use and troubleshoot them? So basically make sure you add a handful of hours to any Windows full server restore situation.  If you were fully aware of this and took precautions what were they? My thinking is that many admins have no idea this is going to bite them and its better you know now than find out the hard way.
 

Wednesday, November 22, 2017

Spectrum Protect 8.1 New DB Functions Supported - UPDATE to the UPDATE!

UPDATE (11/22/2017):

I have found another interesting select option I was previously unaware of that coincides with the previous LOCATE option. We have an interesting host naming convention where I work and we use the hostname for the NODE NAME when defining the servers to TSM/SP. It is as follows:

<servername>.<datacenterID>.<OS_type>.<customer>.<domain.net>
Example:   CQSDBS01.US01.ULX.BS.US32118.NET
The issue I have is with trying to seperate out the data is a report by customer. Of course each customer has their own domains but I wanted a report that broke out the report by the customer ID. I wanted something like this:
Customer      Occupancy GB
----------    ------------
UDC2                 38144
903                  33358
CFG                  30321
ULT01                 1942
MAQ                    639
MSM                   1557
MX                       9
MX3                      0
ELO                   6741
GUS                      4
BFG                  11234
VS                     166
PMTX                    80
POS                  14148
SAM                   8031
SUX                  18562
The problem was not with breaking out the customer ID because the LOCATE select worked perfectly for extracting the Customer IDs with their various lengths. The issue was how to sum each customers data without experiencing an error requiring I use GROUP BY which caused more headaches because I could not figure out how to group by the NODE NAME from which I was getting the customer ID. So I googled GROUP BY in DB2 with SUBSTR since I was using a SUBSTR to extract the customer IDs. As turns out the GROUP BY is more flexible than I originally thought. Check this out:

Select -                           
 VARCHAR(SUBSTR(NODE_NAME, -        
 LOCATE('.',NODE_NAME, -           
 LOCATE('.', NODE_NAME, -          
 LOCATE('.', NODE_NAME)+1)+1)+1, - 
 LOCATE('.',NODE_NAME, -           
 LOCATE('.',NODE_NAME, -           
 LOCATE('.', NODE_NAME, -          
 LOCATE('.', NODE_NAME)+1)+1)+1)- -
 LOCATE('.',NODE_NAME, -           
 LOCATE('.', NODE_NAME, -          
 LOCATE('.', NODE_NAME)+1)+1)-1),8) -
 as CUSTOMER, -                    
 sum(TOTAL_MB)/1024 AS TOTAL_GB -   
from auditocc -                    
 where node_name like '%.%.%.%.%' - 
group by VARCHAR(SUBSTR(NODE_NAME, -
 LOCATE('.',NODE_NAME, -           
 LOCATE('.', NODE_NAME, -          
 LOCATE('.', NODE_NAME)+1)+1)+1, - 
 LOCATE('.',NODE_NAME, -           
 LOCATE('.',NODE_NAME, -           
 LOCATE('.', NODE_NAME, -          
 LOCATE('.', NODE_NAME)+1)+1)+1)- -
 LOCATE('.',NODE_NAME, -           
 LOCATE('.', NODE_NAME, -          
 LOCATE('.', NODE_NAME)+1)+1)-1),8) 

In DB2 I could actually group by the SUBSTR of the value I was extracting, even with it being complex with the LOCATE options in use. MIND BLOWN! 

mind blown wat animated GIF

UPDATE (5/9/2017):

I posted on StackOverflow (a great developer/scripting/DB) listserv asking how I could get similar output with the functions available in versions prior to DB2 11 and a user was able to help me out with a script that could return the same data as the one below although it's a little more complicated.  So as you know the SUBSTR function works like this:


Select SUBSTR(FILESPACE_NAME, <START POSITION>,<LENGTH>)

The issue with the earlier DB2 versions is I could only get the LOCATION function to play nice with my script. I was previously only able to figure out a select that got close:
Select -                                                                        
 SUBSTR(FILESPACE_NAME, LOCATE('\',FILESPACE_NAME, LOCATE('\',FILESPACE_NAME, - 
 LOCATE('\', FILESPACE_NAME, -                                                  
 LOCATE('\', FILESPACE_NAME)+1)+1)-1)+1) -                                      
 as FSNAME -                                                                    
from filespaces -                                                               
 where node_name='TEST-AP-DAG'
Which output this:

FSNAME                                                           
-----------------------------------------------------------------
TEST-MB-08\6bc391ef-a370-49a1-8f05-b1bed9e5ad55                  
Mailbox Database 1143943276\55ac1670-efc9-4301-ac5c-beb5cd2d77cb 
TEST-MB-05\11dedbd4-1757-45c8-8991-f0f7134ef210                  

I could not figure out a way to get the length. So a kind reader at StackOverflow helped me understand how to get the length by subtracting the 4th '\' location from the 3rd '\' -1. This gives the correct length and allows for the length size to vary rather than be fixed.
Select -
 SUBSTR(FILESPACE_NAME, -
 LOCATE('\',FILESPACE_NAME, -
 LOCATE('\', FILESPACE_NAME, -
 LOCATE('\', FILESPACE_NAME)+1)+1)+1, -
 LOCATE('\',FILESPACE_NAME, -
 LOCATE('\',FILESPACE_NAME, -
 LOCATE('\', FILESPACE_NAME, -
 LOCATE('\', FILESPACE_NAME)+1)+1)+1)- -
 LOCATE('\',FILESPACE_NAME, -
 LOCATE('\', FILESPACE_NAME, -
 LOCATE('\', FILESPACE_NAME)+1)+1)-1) -
 as FSNAME -
from filespaces -
 where node_name='TEST-AP-DAG'
Here it is color coded:
(Select SUBSTR(FILESPACE_NAME, <START POSITION>,<LENGTH>)
Select -
 SUBSTR(FILESPACE_NAME, LOCATE('\',FILESPACE_NAME, LOCATE('\', 
 FILESPACE_NAME, LOCATE('\', FILESPACE_NAME)+1)+1)+1, -
 LOCATE('\',FILESPACE_NAME, LOCATE('\',FILESPACE_NAME, LOCATE('\', 
 FILESPACE_NAME,  LOCATE('\', FILESPACE_NAME)+1)+1)+1)-  -
 LOCATE('\',FILESPACE_NAME, LOCATE('\', FILESPACE_NAME,  -
 LOCATE('\', FILESPACE_NAME)+1)+1)-1) -
 as FSNAME -
from FILESPACES -
 where - 
  NODE_NAME='TEST-AP-DAG'
So the final script looks like this
Select -
 VARCHAR(NODE_NAME,12) as NODE, -
 VARCHAR(SUBSTR(FILESPACE_NAME, -
  LOCATE('\',FILESPACE_NAME, -
  LOCATE('\', FILESPACE_NAME, -
  LOCATE('\', FILESPACE_NAME)+1)+1)+1, -
  LOCATE('\',FILESPACE_NAME, -
  LOCATE('\',FILESPACE_NAME, -
  LOCATE('\', FILESPACE_NAME, -
  LOCATE('\', FILESPACE_NAME)+1)+1)+1)- -
  LOCATE('\',FILESPACE_NAME, -
  LOCATE('\', FILESPACE_NAME, -
  LOCATE('\', FILESPACE_NAME)+1)+1)-1),30) -
  as Exch_DB,
 date(backup_end) AS LAST_GOOD_BACKUP -
from filespaces -
 where - 
  node_name='TEST-AP-DAG' -
 order by Exch_DB, backup_end asc
The results are exactly like the script that only works with TSM 8.1.


NODE              EXCH_DB                          LAST_GOOD_BACKUP
-------------     ------------------------------- -----------------
TEST-AP-DAG       \TEST-MB-01                            2017-05-08
TEST-AP-DAG       \TEST-MB-02                            2017-05-08
TEST-AP-DAG       \TEST-MB-03                            2017-05-08
TEST-AP-DAG       \TEST-MB-04                            2017-05-08
TEST-AP-DAG       \TEST-MB-05                            2017-05-08
TEST-AP-DAG       \TEST-MB-TEMP                          2017-05-08
TEST-AP-DAG       \MISC-MB-TEMP                          2017-05-08
TEST-AP-DAG       \Mailbox Database 1128394465           2017-05-08
TEST-AP-DAG       \Mailbox Database 1397248650           2017-05-08



(4/20/17):
I upgraded some of my 6.3.4 servers to TSM 8.1 recently and have found the newer DB2 version (v11.1) offers some added functions for SQL queries. I have been trying to build a report to track an Exchange DAG servers backups. The problem is that the easiest way to track the backups of DB's is to view the last completed backup for each of nodes file spaces. That's the easy part. The hard part is that the file space name is huge.

TEST-AP-DAG\Microsoft Exchange Writer\{76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}\TEST-MB-05\0e41e645-9acf-4b80-bc85-d606e04fe4d8

TEST-AP-DAG\Microsoft Exchange Writer\{76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}\Mailbox Database 1223544393\2bc06db2-1966-4fd7-9545-f667102b0b7d

So how to extract the DB name, in this case TEST-MB-05 and Mailbox Database 1397248650 from the name when the length of the DB name changes? After some investigation and trial and error I found that the LOCATE_IN_STRING function works in v8.1. Whether it works in 7.x (DB2 v10.5) I can't say since I don't have a 7.x server to run it against. If any of you out there try this and it works let me know. So here's the script. I think you can read it and see what I am doing. If not let me know in the comments and I'll explain...

tsm: TSMSERV>q script DAG-EXCH-RPT f=raw

select -
 VARCHAR(NODE_NAME,12) as NODE, -
 varchar(substring(filespace_name, LOCATE_IN_STRING(filespace_name, '\', 1, 3), -
 LOCATE_IN_STRING(filespace_name, '\', 1, 4) -  LOCATE_IN_STRING(filespace_name, '\', 1, 3)),30) as Exch_DB, -
 date(backup_end) AS LAST_GOOD_BACKUP -
from FILESPACES -
 where -
  node_name='TEST-AP-DAG' -
 order by Exch_DB, backup_end asc


Here is sample output:

NODE              EXCH_DB                          LAST_GOOD_BACKUP
-------------     ------------------------------- -----------------
TEST-AP-DAG       \TEST-MB-01                            2017-04-18
TEST-AP-DAG       \TEST-MB-02                            2017-04-18
TEST-AP-DAG       \TEST-MB-03                            2017-04-18
TEST-AP-DAG       \TEST-MB-04                            2017-04-18
TEST-AP-DAG       \TEST-MB-05                            2017-04-18
TEST-AP-DAG       \TEST-MB-TEMP                          2017-04-18
TEST-AP-DAG       \MISC-MB-TEMP                          2017-04-18
TEST-AP-DAG       \Mailbox Database 1128394465           2017-04-18
TEST-AP-DAG       \Mailbox Database 1397248650           2017-04-18

If another function would work better feel free to correct the script and leave it for everyone in the comments. Now to go and find more functions I can use to manipulate my data!

Wednesday, October 25, 2017

Spectrum Protect 8.1.0 vs. 8.1.2 Client Issue

I just wanted to let everyone know that there is a compatibility issue with the Spectrum Protect 8.1.2 client with the 8.1.0 server. If you upgrade the client to SP 8.1.2 and your server is at 8.1.1 or 8.1.0 your server will be unable to start. The error you will get is

Spectrum Protect ANR0162W Supplemental database diagnostic information:  -1:08001:-30082 ([IBM][CLI Driver] SQL30082N  Security processing failed with reason "24" ("USERNAME AND/OR PASSWORD INVALID").

The issue has to do with the GSKit8.gskcrypt64.ppc.rte and GSKit8.gskssl64.ppc.rte in the 8.1.2.0 client. When we rolled back to the 8.1.0 client the SP server started without an issue.

Monday, August 07, 2017

TSM/Spectrum Protect 7.x Client Issue

I ran across an issue with the TSM/Spectrum Protect client recently that through investigation showed that the issue has not been patched and the supposed resolution works....sometimes. The issue is that when an AIX backup is running you can experience the following error:

08/02/17   03:24:52 ANS1512E Scheduled event 'CS-FS-U-02.00' failed.  Return code = 12.
08/02/17   13:54:36 ANS2820E An interrupt has occurred. The current operation will end and the client will shut down.
08/03/17   03:39:20 calloc() failed: Size 31496 File ../mem/mempool.cpp Line 1090
08/03/17   03:39:20 ANS1999E Incremental processing of '/usr/ibm' stopped.

This is a memory issue and can occur if TSM cannot allocate enough memory during the scheduled backup. The interesting thing is that manual backups run without issue. It is only when a scheduled backup is run that we experience the error. The server in question has only 8GB so we do know that real memory is limited so I followed the troubleshooting tips I saw online and checked the file system for excessive files. Querying the file system determined that was not the issue.  I could not exclude the file system and so I added the MEMORYEFFICIENTBACKUP YES option thinking that would resolve the issue. Unfortunately the addition of the MEMORYEFFICIENTBACKUP YES option did not work. Subsequent investigation showed some people had to downgrade their TSM client version to a 6.4 level to resolve the problem. Unwilling to do so I changed the option so it used the disk cache function. So far using disk cache has worked without issue, but it's concerning that a change to the TSM/SP client has created this memory issue. So the fix was to add the following two options to my dsm.sys:

   memoryefficientbackup DISKCACHEM
   diskcachelocation     /tmp

Please note that my /tmp file system is over 2GB in size and only 1% used so make sure you have a sufficiently sized file system you can use if you use the DISKCACHEMETHOD option.