Friday, May 15, 2009

Enabling Backup/Archive Statistics and Logging

Enable the processing of all events to the TSM server activity log.

en ev actlog all node=*

Start logging events to all receivers, which includes the TSM server activity log.

beg ev all

Adjust the length of time that the activity log retains messages to avoid insufficient or outdated data. Here we set the activity log retention to 180 days, managed by date.

s actl 180 m=d

The SQL activity summary table contains statistics about each client session and server process. These statistics are reflected in the daily report produced by Operational Reporting. Adjust the length of time that TSM retains these statistics. Here we set the server to retain the SQL activity summary table information for 60 days.

s sum 60

Using the undocumented server option CLIENTSUMMARYSTATISTICS OFF in the dsmserv.opt file will prevent clients from logging events to the summary table.

The event retention period for event records in the server database allows you to monitor completed schedules. An event record is created whenever processing of a scheduled command is started or missed. You can adjust the length of time that the server maintains event information to avoid insufficient or outdated data. Here we set the event retention period to 180 days.

s ev 180

Issue the QUERY STATUS command and verify the Activity Log Retention, Event Record Retention Period and Activity Summary Retention Period values.
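The commands in this post can be collected into a single macro. The sketch below prints them in unabbreviated form. The function name and its default values are illustrative assumptions that mirror the numbers used above, not part of TSM.

```shell
#!/bin/sh
# Print the setup commands from this post in unabbreviated form.
# Arguments (hypothetical defaults mirror the values used in the post):
#   $1 = activity log retention in days
#   $2 = activity summary retention in days
#   $3 = event record retention in days
logging_setup_commands() {
    actlog_days=${1:-180}
    summary_days=${2:-60}
    event_days=${3:-180}
    cat <<EOF
enable events actlog all nodename=*
begin eventlogging all
set actlogretention $actlog_days mgmtstyle=date
set summaryretention $summary_days
set eventretention $event_days
query status
EOF
}

# Example: logging_setup_commands 180 60 180 > logging.mac
```

The resulting file could then be run from the administrative client with its macro command, for example dsmadmc -id=admin -pa=secret macro logging.mac (the credentials are placeholders).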


For the PDF version of this document, send a blank email, with subject line "Enabling Backup/Archive Statistics and Logging", to TSM Assist

Extracting a list of failed files from the Activity Log

Enable the processing of all events to the TSM server activity log.

en ev actlog all node=*

Start logging events to all receivers, which includes the TSM server activity log.

beg ev all

Adjust the length of time that the activity log retains messages to avoid insufficient or outdated data. Here we set the activity log retention to 180 days, managed by date.

s actl 180 m=d

To display all failure messages that originate from clients, run:

select nodename, date(date_time) as DATE, time(date_time) as TIME, msgno, message from actlog where originator='CLIENT' and severity='E' order by 1,2
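When the list is long, a per-node count is often easier to scan. The sketch below assumes the query above was run through dsmadmc with the -commadelimited and -dataonly=yes options, so that each row arrives as one comma-separated line with the node name first. The function name is hypothetical.

```shell
#!/bin/sh
# Count failure rows per node from comma-delimited query output on stdin.
# Only the first field (the node name) is used, so commas inside the
# message text do not affect the count.
failures_per_node() {
    awk -F',' 'NF > 0 && $1 != "" { count[$1]++ }
        END { for (n in count) printf "%s %d\n", n, count[n] }' | sort
}

# Example (assumes the query output was saved to failures.csv):
#   failures_per_node < failures.csv
```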

As of TSM Server 3.7, the DateFormat option in the Server Options file has been deprecated. The date format is now governed by the Locale in which the TSM server is running.

Note that TSM keeps only one version of an event record in the database. If a client schedule is changed, all previous event records for that schedule are removed from the database.



Monday, May 11, 2009

SCRATCH volumes become PRIVATE when checked in

After volumes were checked in the library as SCRATCH, TSM changed their status to PRIVATE.


Inspect the activity log for ANR8356E and ANR8778W.

q ac s=ANR8356E
q ac s=ANR8778W



If the messages have already aged past your activity log retention period, the SELECT statement below will identify these tapes.

select volume_name from libvolumes where status='Private' and last_use is Null and volume_name not in (select volume_name from volumes) and volume_name not in (select volume_name from volhistory where type in ('BACKUPFULL', 'BACKUPINCR', 'DBSNAPSHOT', 'DBDUMP'))



This situation typically happens when volumes were previously labelled by a different TSM server, or when a tape that should have been checked in as SCRATCH was checked in as PRIVATE.


The resolution is to relabel the volumes and check them in as scratch using the following command:

label libv <LIBR_NAME> labels=b search=b checkin=scr overwrite=y waitt=0



Thursday, April 23, 2009

Finding ‘lost’ Tape Volumes

The DELETE VOLHISTORY command deletes volume history file records that are no longer needed (for example, records for obsolete database backup volumes).

When you delete records for volumes that are not in storage pools (for example, database backup or export volumes), the volumes return to scratch status if TSM acquired them as scratch volumes. Scratch volumes of device type FILE are deleted. When you delete the records for storage pool volumes, the volumes remain in the TSM database.

For users of DRM, the database backup expiration should be controlled with the SET DRMDBBACKUPEXP command instead of this DELETE VOLHISTORY command. Using the DELETE VOLHISTORY command removes TSM's record of the volume. This can cause volumes to be lost that were managed by the MOVE DRMEDIA command. The following bash script identifies these volumes:

#!/bin/bash
# --------------------------------------------------------
#
# Description: 'Missing Tapes' volumes.
# Date: 29th March 2007
# Queries: A Singh - singh.ajith@gmail.com
#
# --------------------------------------------------------


# Update this with the highest value of the volume labels.
MAX_VOLUMES=1300


# Update this with the lowest value of the volume labels.
MIN_VOLUMES=1000


# --------------------------------------------------------


# Tape Label Parameters - update as necessary
PREFIX="BL"
SUFFIX="L3"
# length of label excluding PREFIX and SUFFIX
LABEL_LENGTH=4
ZERO="0"


# --------------------------------------------------------


# TSM Server administrator account details - update as necessary
DSM_DIR=/opt/tivoli/tsm/client/ba/bin
DSM_ADMIN=admin
DSM_PWD=secret
DSM_CMD="$DSM_DIR/dsmadmc -id=$DSM_ADMIN -pa=$DSM_PWD -datao=y"


# --------------------------------------------------------


# Stop early if the administrative command-line client is not installed.
test -x $DSM_DIR/dsmadmc || { echo "TSM Client Administrative CLI not installed."; if [ "$1" = "stop" ]; then exit 0; else exit 5; fi; }


DATA_VOLS_SQL="select volume_name from volumes order by 1 asc"
VOLH_SQL="select volume_name from volhistory order by 1 asc"
LIBVOLS_SQL="select volume_name from libvolumes order by 1 asc"


MISSING_VOLS=" "
DATA_VOLS=`$DSM_CMD $DATA_VOLS_SQL`
VOLH_VOLS=`$DSM_CMD $VOLH_SQL`
LIB_VOLS=`$DSM_CMD $LIBVOLS_SQL`


# Merge the three volume lists into one sorted list without duplicates.
DSM_VOLS=`echo $DATA_VOLS $VOLH_VOLS $LIB_VOLS | tr " " "\n" | sort | uniq`

# Build the list of every label expected in the configured range.
for (( i=$MIN_VOLUMES; i<=$MAX_VOLUMES; i++ ))
do
  tmpvol="$i"
  # Left-pad the number with zeros up to LABEL_LENGTH digits.
  for (( j=${#tmpvol}; j<$LABEL_LENGTH; j++ )); do tmpvol=$ZERO$tmpvol; done
  tmpvol=$PREFIX$tmpvol$SUFFIX
  MISSING_VOLS=" "$tmpvol$MISSING_VOLS
done

# Remove every label that TSM already knows about.
for i in $DSM_VOLS
do
  MISSING_VOLS=`(for j in $MISSING_VOLS; do echo $j; done) | grep -v -x -F "$i"`
done

echo "'MISSING' TAPE VOLUMES"
echo "----------------------"
echo
echo "This is a list of tape volumes that are not in the tape libraries and are not listed in the volume history file and the TSM volumes list."
echo
echo $MISSING_VOLS | tr " " "\n"

# --------------------------------------------------------
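The same comparison can also be written as a set difference, which avoids running grep once per known volume. This is only a sketch of an alternative, not the script's original method. The function names are hypothetical, and the label layout is assumed to match the PREFIX, SUFFIX and LABEL_LENGTH settings above.

```shell
#!/bin/sh
# Label layout, mirroring the settings in the script above.
PREFIX="BL"
SUFFIX="L3"
LABEL_LENGTH=4

# Print one label per line for every number from $1 to $2 inclusive.
expected_labels() {
    n=$1
    while [ "$n" -le "$2" ]; do
        # Zero-pad the number to LABEL_LENGTH digits.
        printf "%s%0${LABEL_LENGTH}d%s\n" "$PREFIX" "$n" "$SUFFIX"
        n=$((n + 1))
    done
}

# Print the expected labels that do not appear in the file named by $3,
# which holds one known volume name per line.
missing_labels() {
    known_sorted=$(mktemp) || return 1
    sort -u "$3" > "$known_sorted"
    # comm -23 keeps the lines that appear only in the first input.
    expected_labels "$1" "$2" | sort | comm -23 - "$known_sorted"
    rm -f "$known_sorted"
}

# Example (assumes known_volumes.txt holds the merged dsmadmc output):
#   missing_labels 1000 1300 known_volumes.txt
```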

Monday, April 20, 2009

Recovery Log Pinning

The recovery log can appear to be out of space when it is in fact being pinned by an operation, or a combination of operations, on the server. A recovery log is pinned when space in it cannot be reclaimed and used by current transactions because an existing transaction is processing too slowly or is hung.

To determine if the recovery log is pinned, issue SHOW LOGPINNED repeatedly over many minutes. If this reports the same client session or server processes as pinning the recovery log, it may be necessary to take action to cancel or terminate that operation in order to keep the recovery log from running out of space.

To cancel or terminate a session or process that is pinning the recovery log, issue SHOW LOGPINNED CANCEL. Server versions 5.1.7.0 and above, as well as 5.2.0.0 and above, can also recognize automatically that the recovery log is running out of space and, where possible, detect and resolve a pinned recovery log using the SHOW LOGPINNED processing.
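Issuing SHOW LOGPINNED repeatedly by hand is tedious, so a small polling loop can help. The sketch below is a generic helper with a hypothetical name. It runs whatever command it is given, so the dsmadmc invocation in the example comment, including its credentials, is an assumption.

```shell
#!/bin/sh
# Run a command repeatedly and print each result under a sample marker.
#   $1 = number of samples
#   $2 = seconds to wait between samples
#   remaining arguments = the command to run
poll_command() {
    samples=$1
    interval=$2
    shift 2
    i=1
    while [ "$i" -le "$samples" ]; do
        echo "--- sample $i of $samples ---"
        "$@"
        # Wait between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}

# Example: ten samples, one minute apart
#   poll_command 10 60 dsmadmc -id=admin -pa=secret -dataonly=yes "show logpinned"
```

If the same session or process shows up in every sample, that is the candidate for cancellation.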



Sunday, April 19, 2009

Delaying the Re-use of Tape Storage Pools

The REUSEDELAY attribute of a sequential access storage pool (tape or file disk pools) specifies the number of days that must elapse before a volume can be reused or returned to scratch status, after all files have been expired, deleted, or moved from the volume.

When you delay reuse of such volumes and they no longer contain any files, they enter the pending state. Volumes remain in the pending state for as long as specified with the REUSEDELAY parameter for the storage pool to which the volume belongs. Server internals will take care of finally deleting the Pending Volume from the storage pool when its time is up.

Delaying reuse of volumes can be helpful under certain conditions for disaster recovery. When TSM expires, deletes, or moves files from a volume, the files are not actually erased from the volumes: the database references to these files are removed. Thus the file data may still exist on sequential volumes if the volumes are not immediately reused.

If a disaster forces you to restore the TSM database using a database backup that is old or is not the most recent backup, some files may not be recoverable because TSM cannot find them on current volumes. However, the files may exist on volumes that are in pending state. You may be able to use the volumes in pending state to recover data by doing the following:

1. Restore the database to a point-in-time prior to file expiration.
2. Use a primary or copy storage pool volume that has not been rewritten and contains the expired file at the time of database backup.

If you back up your primary storage pools, set the REUSEDELAY parameter for the primary storage pools to 0 to efficiently reuse primary scratch volumes. For your copy storage pools, you should delay reuse of volumes for as long as you keep your oldest database backup. No useful purpose is served by setting REUSEDELAY to a value dramatically larger than the Retention period for Database backups.

Volumes in a storage pool with a non-zero REUSEDELAY may not remain in the storage pool for the REUSEDELAY period if access is set to destroyed. If REUSEDELAY is set to zero (zero is the default), this problem does not apply. Volumes which are in a destroyed state will be immediately deleted from the storage pool and set to scratch once they have been restored or deleted. Avoid updating a volume's access to DESTROYED; use UNAVAILABLE instead.

The TSM database backup retention period is specified using the SET DRMDBBACKUPEXPIREDAYS command. Setting the REUSEDELAY period in the copy pool definition to the same value ensures that the database can be restored to an earlier level while database references to files in the storage pool are still valid.
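As a minimal sketch of keeping the two values in step, the helper below prints the matching commands for one retention value. The function name and the pool names in the example are hypothetical.

```shell
#!/bin/sh
# Print commands that keep database backups for $1 days and give each
# copy storage pool named in the remaining arguments the same reuse delay.
align_reuse_delay() {
    days=$1
    shift
    echo "set drmdbbackupexpiredays $days"
    for pool in "$@"; do
        echo "update stgpool $pool reusedelay=$days"
    done
}

# Example: align_reuse_delay 14 COPYPOOL1 COPYPOOL2
```

Generating both commands from one number avoids the two settings drifting apart when the retention period changes.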



Thursday, April 16, 2009

Define a RAW volume to TSM

One of the main advantages of disk pools is that they let you control when high loads are sent to your tape drives.
Within TSM, there are three types of disk pools: Random Access Disk Pools (of device class DISK), File Disk Pools (of device class FILE) – files on hard drives that store data sequentially as on tape, and RAW Disk Pools.

The three types differ in how they are used and in the performance you can reach. RAW volumes give the best performance for large file migrations. Random access disk pools are best for small files. In between are file disk pools, whose sequential read and write operations give them an advantage over random access disk pools.

The size of each volume within a disk pool appears to matter a great deal in TSM. To improve performance, reduce the size of the volumes and increase their count. Furthermore, on random access volumes only, a single corrupt volume can be varied offline without halting operations for the entire storage pool.

To define a RAW volume to TSM, follow these steps:


1. Prepare a raw volume using operating system commands; the raw volume ls_name on AIX is used here.

2. Define to a storage pool:

def v stgp_name /dev/rls_name

3. Define as a TSM database volume:

def dbv /dev/rls_name

4. Define as a TSM log volume:

def logv /dev/rls_name


Monday, April 13, 2009

TSM Server-Side Daily Administrator Checklist

1. List TSM license compliance.

audit lic
select compliance from licenses


2. Query server processes, pending requests and sessions to determine if any jobs are waiting on operator action.

q pr
q req
q se

3. Query all disk storage pools to determine if the migration process has completed.

select stgpool_name, pct_utilized from stgpools where devclass='DISK'

4. List all drives that are OFFLINE.

select drive_name from drives where not online='YES'

5. List all paths that are OFFLINE.

select source_name, source_type, destination_name, destination_type from paths where not online='YES'

6. List all locked nodes.

select node_name from nodes where not locked='NO'

7. List all non-writeable tape and disk volumes.

q v acc=unavail
q v acc=reado
q v acc=destroyed

select volume_name, read_errors, write_errors from volumes where (read_errors>0 or write_errors>0)

select volume_name from volumes where devclass_name='DISK' and not status='ONLINE'


8. Verify that the library has sufficient scratch volumes.

select library_name,status,count(*) as "VOLUMES" from libvolumes group by library_name,status

9. Verify that the database extension and reduction values are non-zero and that the Cache Hit Ratio is above 99%.

q db f=d

10. Verify that the recovery log extension and reduction values are non-zero and that the Wait Percentage is zero.

q log f=d


11. Verify that database and recovery log volumes are online and synchronized.

q dbv f=d
q logv f=d


12. Inspect TSM database fragmentation level.

select cast((100 - (cast(max_reduction_mb as float) * 256 ) / (cast(usable_pages as float) - cast(used_pages as float) ) * 100) as decimal(4,2)) as PERCENT_FRAG from db

13. Verify that the scheduled database backups completed successfully.

select date (date_time) as date, time(date_time) as time, volume_name, type from volhistory where type in ('BACKUPFULL', 'BACKUPINCR', 'DBSNAPSHOT', 'DBDUMP')

14. Verify that all CLIENT schedules for the last day succeeded.

q ev * * begind=-1 endd=today begint=00:00:00 endt=00:00:00

To restrict the listing to only those nodes with non-completed status:

q ev * * begind=-1 endd=today begint=00:00:00 endt=00:00:00 ex=y

15. Verify that all ADMINISTRATIVE schedules for the last day succeeded.

q ev * t=a begind=-1 endd=today begint=00:00:00 endt=00:00:00

To restrict the listing to only those schedules with non-completed status:

q ev * t=a begind=-1 endd=today begint=00:00:00 endt=00:00:00 ex=y

16. Check the activity log for error messages.

q actl search=AN?????E begind=-1 begint=00:00 endd=today endt=00:00

17. Open files and other missed files will often not have the schedule name in activity log error messages. This query will list these files:

select nodename,date_time,message from actlog where (date_time>current_timestamp-1 day) and msgno in (4005,4007,4018,4037,4046,4047,4987,4973,4034,4042)


18. List nodes that are not associated with a backup schedule.

select node_name from nodes where node_name not in (select node_name from associations)

19. Cross match the TSM node name with the host name or computer name.

select node_name, tcp_address, tcp_name from nodes

20. List PRIMARY POOL volumes that have been checked out of the library.

select volume_name, stgpool_name from volumes where stgpool_name in (select stgpool_name from stgpools where devclass<>'DISK' and pooltype='PRIMARY') and volume_name not in (select volume_name from libvolumes)

21. Check out all D/R media for offsite storage.

move drm * wherest=mo tost=va rem=b

22. Verify that all D/R volumes have been checked out.

select volume_name from libvolumes where volume_name in (select volume_name from volumes where stgpool_name in (select stgpool_name from stgpools where devclass<>'DISK' and pooltype='COPY'))

23. Verify that all TSM database backup volumes have been checked out.

select volume_name from libvolumes where last_use='DbBackup'

24. Identify previous offsite volumes that can be recycled to scratch status and check them in.

q drm wherest=vaultr
move drm * wherest=vaultr tost=onsiter
checki libv <LIBR_NAME> checkl=b stat=scr search=b waitt=0


25. Generate a list of unlocked TSM administrator accounts with full system privileges.

select admin_name from admins where not system_priv='No' and upper(locked)='NO'

26. List TSM Nodes and Client (BA/TDP) versions by platform.

select platform_name as OS, client_os_level as OS_VER, node_name as Node, cast(cast(client_version as char(2)) || '.' || cast(client_release as char(2)) || '.' || cast(client_level as char(2)) || '.' || cast(client_sublevel as char(2)) as char(15)) as "TSM Client" from nodes order by platform_name, "TSM Client", Node

27. Data backed up in the last 24 hours:

select entity, date(start_time) as DATE, time(start_time) as START_TIME, time(end_time) as END_TIME, substr(char(end_time-start_time),3,8) as DURATION, cast((bytes/1024/1024/1024) as decimal(18,2)) as GB_BACKED_UP, successful from summary where start_time>=current_timestamp-24 hours and activity='BACKUP' order by entity

28. Size and duration of archive operations for each node in the last 24 hours:

select entity as "Node Name ", cast(sum(bytes/1024/1024) as decimal(10,3)) as "Total MB", substr(cast(min(start_time) as char(26)),1,19) as "Date/Time ", cast(substr(cast(max(end_time)-min(start_time) as char(20)),3,8) as char(8)) as "Length " from summary where start_time>=current_timestamp-24 hours and activity='ARCHIVE' group by entity

29. Compare PRIMARY and COPY pool occupancy totals.

select sum(num_files) as num_of_files,sum(physical_mb) as Physical_mb,sum(logical_mb) as logical_mb from occupancy where stgpool_name in (select stgpool_name from stgpools where pooltype='PRIMARY')

select sum(num_files) as num_of_files,sum(physical_mb) as Physical_mb,sum(logical_mb) as logical_mb from occupancy where stgpool_name in (select stgpool_name from stgpools where pooltype='COPY')
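Many of the checks above can be run in one pass from a shell. The sketch below reads "label|command" pairs and runs each one through the administrative client. The function name, the input format and the DSM_CMD default (including its credentials) are assumptions made for illustration.

```shell
#!/bin/sh
# DSM_CMD is an assumption: replace it with your own dsmadmc invocation.
DSM_CMD=${DSM_CMD:-"dsmadmc -id=admin -pa=secret -dataonly=yes"}

# Read "label|command" lines from stdin, print each label as a heading,
# then run the command through DSM_CMD. Blank lines are skipped.
run_checks() {
    while IFS='|' read -r label cmd; do
        [ -n "$cmd" ] || continue
        echo "== $label"
        # Redirect stdin so the command cannot consume the check list.
        $DSM_CMD "$cmd" </dev/null
    done
}

# Example (checks.txt holds one "label|command" pair per line):
#   run_checks < checks.txt
```

A line in checks.txt could look like: Offline drives|select drive_name from drives where not online='YES'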




Running a TSM Library Audit

The AUDIT LIBR command synchronizes the TSM server’s library volume inventory with volumes that are physically located in an automated library. If TSM detects inconsistencies, it updates its inventory to reflect the current state of the library: missing volumes are removed from the server inventory list (q libv). The server does not automatically add new volumes; you must check in new volumes with the CHECKIN LIBVOLUME command.
Before running a library audit, it is usually a good idea to make sure that the library is inactive:

1. Use the DISABLE SE command to prevent starting new client node sessions.
2. Use the QUERY SE command to identify any existing administrative and client node sessions.
3. Use the CANCEL SE command to cancel any existing administrative or client node sessions.
4. Use the Q PR command to identify active background processes.
5. Use the CANCEL PR command to cancel any active background processes.
6. Use the Q MO command to identify the status of any mounted tape volumes.
7. Use the DISMOUNT VOL command to dismount idle volumes.

With the library inactive, run the AUDIT LIBR command with the switch CHECKL=b. This switch is optional, but it will make the audit run much faster. This audit involves your robot scanning the barcode labels of all tapes. If the robot cannot read the barcode label or the barcode label is missing, TSM mounts the tape to read the label.

AUDIT LIBR <LIBR_NAME> CHECKL=B

The default action is to mount each tape to identify the volume. The audit runs until all tapes are dismounted.

Lastly, check in any new volumes (first SCRATCH volumes, then PRIVATE volumes) that the audit process may discover:

CHECKIN LIBV <LIBR_NAME> CHECKL=B STAT=SCR SEARCH=Y WAITT=0

CHECKIN LIBV <LIBR_NAME> CHECKL=B STAT=PRI SEARCH=Y WAITT=0

End this process by running the ENABLE SE command to enable new client node sessions.



Sunday, April 12, 2009

Halting the TSM Server

The HALT command forces an abrupt shutdown, which cancels all the administrative and client node sessions even if they are not completed. Any transactions in progress interrupted by the HALT command are rolled back when you restart the server.

Use the HALT command only after the administrative and client node sessions are completed or cancelled. To shut down the server without severely impacting administrative and client node sessions, perform the following steps:

  1. Use the DISABLE SE command to prevent starting new client node sessions.
  2. Use the QUERY SE command to identify any existing administrative and client node sessions.
  3. Use the CANCEL SE command to cancel any existing administrative or client node sessions.
  4. Use the Q PR command to identify active background processes.
  5. Use the CANCEL PR command to cancel any active background processes.
  6. Use the Q MO command to identify the status of any mounted tape volumes.
  7. Use the DISMOUNT VOL command to dismount idle volumes.
  8. With no existing administrative and client node sessions, no active background processes and no mounted volumes, run the HALT command to shut down the TSM server.
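Steps 3 and 5 need the session and process numbers found in steps 2 and 4. As a sketch, the helper below turns a list of numbers into CANCEL commands that can be reviewed before they are run. The function name is hypothetical, and the SELECT statements in the example comments assume that the SESSIONS and PROCESSES tables expose SESSION_ID and PROCESS_NUM columns on your server level.

```shell
#!/bin/sh
# Turn a list of numeric IDs on stdin (one per line) into cancel commands.
#   $1 = "session" or "process"
cancel_commands() {
    kind=$1
    while read -r id; do
        case $id in
            ''|*[!0-9]*) continue ;;   # skip blank and non-numeric lines
        esac
        echo "cancel $kind $id"
    done
}

# Examples (credentials and column names are assumptions):
#   dsmadmc -id=admin -pa=secret -dataonly=yes "select session_id from sessions" | cancel_commands session
#   dsmadmc -id=admin -pa=secret -dataonly=yes "select process_num from processes" | cancel_commands process
```

Printing the commands instead of executing them leaves room to remove your own administrative session from the list first.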
