Where I work we have a process that bi-monthly generates a mksysb then archives it to TSM. Recently an attempt to use an archived mksysb found that sometimes the mksysb process does not create a valid file, but it is still archived to TSM. So the other AIX admins asked me to generate a report that would show the amount of data that was archived and on what date it occurred. Now I would have told them it was impossible if they had asked for data from the backup table, but our archive table is not as large as the backups so I gave it a go.
First problem was determining the best table(s) to use. I could use the summary table, but it doesn't tell me what schedule ran and some of these UNIX servers do have archive schedules other than the mksysb process. The idea I came up with was to query the contents table and join it with the archive table using the object_id field. Here's an example of the command:
select a.node_name, a.filespace_name, a.object_id, cast((b.file_size/1048576)as integer(9,2))AS SIZE_MB , cast((a.ARCHIVE_DATE)as date) as ARCHIVE from archives a, contents b where a.node_name=b.node_name and a.filespace_name='/mksysb_apitsm' and a.filespace_name=b.filespace_name and a.object_id=b.object_id and a.node_name like 'USA%'
This select takes at least 20 hours to run across 6 TSM servers. I guess that I should be happy it returns at all, but TSM is DB2! It should be a lot faster, so I am wondering if I could clean up the script or add something that would make the index the data faster??? I am considering dropping the "like" and just matching node_name between the two tables. Would putting node_name matching first then matching object_id be faster? Would I be better off running it straight out of DB2? Suggestions appreciated.