I recently had an issue with a handful of TSM clients that would not run their backups. The clients all back up to a TSM 5.5.2 server, all run Windows 2008, and all use TSM client version 6.2.3. The five clients had been missing their backups for days. What makes the situation more interesting is that other Windows 2008 servers with the same TSM version installed are all running their schedules without issue.
When I reviewed the TSM schedule log, the scheduler reported that it had received the schedule info and was waiting for the TSM server to initiate the schedule. The TSM server never attempted to contact the clients in question and never showed any errors other than ANR2578W, stating that the client had missed its schedule. There were no errors in the error log and not much to go on in the TSM server activity log. Even though the TSM client backs up over the public network, I switched to polling mode to see if client-initiated backups would work. They didn't! The TSM client scheduler would receive the schedule upon polling the TSM server but would never execute it. So now what? I added TCPCLIENTADDRESS and TCPCLIENTPORT and switched back to SCHEDMODE PROMPTED; still the scheduler would not run backups.
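One thing worth ruling out in prompted mode is whether the TSM server host can open a TCP connection to the client's scheduler port at all, since in that mode the server is the side that initiates contact. The following is a minimal sketch of such a check in Python, not anything IBM support suggested. The function name `port_is_reachable` is hypothetical, and the host name and port in the usage note are placeholders.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only shows that something accepts connections on that port.
    It does not prove that the TSM scheduler is the listener.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run from the TSM server host, a call such as `port_is_reachable("client01.example.com", 1501)` would check the address and port the server is expected to use. Replace 1501 with whatever TCPCLIENTPORT is set to on the client. A `False` result points toward a firewall or addressing problem; a `True` result means the cause lies elsewhere.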
Now I was getting frustrated. I removed the scheduler service and redefined it using dsmcutil and voila, the schedule ran...ONCE! After the initial schedule ran, the previous problem returned. Schedules were not running, and the TSM server showed no errors saying it could not contact the client. It just would not run the schedule. That left me no choice but to call support. IBM support's response was to make sure TCPCLIENTADDRESS and TCPCLIENTPORT were defined in the dsm.opt and also to define the client's HLADDRESS and LLADDRESS on the TSM server. Define the HL and LL address? TSM gets that when the client connects, doesn't it? Yes and no! It appears that without these optional settings the TSM server can have issues contacting some clients. Why? No idea, but adding the HL and LL address did the trick, and the backups have been running without issue since.
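For anyone wanting to see the two halves of that fix side by side, here is a small Python sketch that prints the client option lines and the matching server command for one node. The helper name `scheduler_settings`, the node name, the address, and the port are all made up for illustration. The option names and the `UPDATE NODE` parameters are the ones mentioned above; check the exact syntax against the documentation for your TSM level.

```python
def scheduler_settings(node: str, address: str, port: int) -> dict:
    """Build the client options and server command for one node.

    The values passed in are placeholders; nothing here talks to TSM.
    """
    # Lines for the client's dsm.opt (Windows) or dsm.sys stanza (Unix).
    opt_lines = [
        "SCHEDMODE PROMPTED",
        f"TCPCLIENTADDRESS {address}",
        f"TCPCLIENTPORT {port}",
    ]
    # Administrative command to run on the TSM server so it knows
    # which address and port to use when prompting the client.
    server_command = f"UPDATE NODE {node} HLADDRESS={address} LLADDRESS={port}"
    return {"dsm_opt": opt_lines, "server_command": server_command}


# Hypothetical node, address, and port.
settings = scheduler_settings("NODE1", "10.0.0.5", 1501)
print("\n".join(settings["dsm_opt"]))
print(settings["server_command"])
```

The point of keeping both in one place is that the address and port on the server side must match what the client scheduler actually listens on. The scheduler service needs a restart after the option file changes.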
How many of you define the HLADDRESS and LLADDRESS when registering nodes? I never suspected it was needed until now.
If I remember well in the last 8 years: never.
;-)
hot damn! finally, the fix! thanks for posting this, Chad - it had really been eating at me. who knew it would be THOSE values? I define those values for nodes that I am doing firewall tricks with - not for my typical day-to-day Windows crap nodes... well, until now, I suppose... grrrr
HL and LL are set on exactly 0 of the 130+ TSM servers in our scope.
I had such problems too. It seems to be an issue with the Windows Firewall being active on these nodes. Could this be your issue too?
I have not needed to enter these values for years, but I have been seeing this issue crop up randomly on the Windows 2008 servers, with and without the firewall enabled.
How about the reverse DNS entry? Is it missing (as it commonly is)?
The first schedule runs fine... but the same problem continues with a Linux client during the next schedules... I even tried with multiple client actions...
Please help.
Hi! Maybe the problem is the communication between TSM and the node. If a firewall is responsible, the communication between them is not bidirectional, and that causes the problem. If so, then on the node, in dsm.sys, you might set (uncomment) TCPCLIENTPORT and leave SCHEDMODE set to POLLING, then restart the services on the node.