The AIX Error Logging Facility (explored in the article "The AIX Error Logging Facility," published in the Supplement to the June 2001 issue of Sys Admin Magazine) provides the administrator of an RS/6000 with unparalleled monitoring and reporting of the health and general welfare of system components. It often warns of impending problems with ample time to act before they cause unscheduled downtime or data loss. In addition to error logging and analysis, Error Notification (errnotify) objects enable the sysadmin to automate troubleshooting and problem resolution, reducing both the time and resources required to monitor the error log and the risk of "missing" a vital message.

An errnotify object is a "hook" into the error logging facility that causes a program to be executed whenever an error message matching user-defined criteria is recorded. Using simple "one-liners" or complex scripts, the program can notify the administrator, analyze sense data received from a device, run system-level diagnostics, or perform any number of other actions.

Creating an Error Notification object

Error Notification (errnotify) objects are installed by creating a text file with the properly formatted contents of the object, and then adding it to the "errnotify" class of the ODM with the "odmadd" command. Each errnotify object should have the following descriptors (other descriptors are available, but they are not required for creating basic errnotify objects):

Table 1: errnotify object class descriptors

en_name
    Value: name (text string)
    Specifies the name used to identify this object within the errnotify class.

en_class
    Value: H (hardware errors), S (software errors), O (operator messages generated by the errlogger command), or U (undetermined)
    Specifies the class of error log entries to match. If this descriptor is omitted or set to a null string, entries of all classes are matched.

en_crcid
    Value: identifier (text string)
    Specifies the unique identifier of the error log entries to match. Valid identifiers can be listed with the "errpt -t" command.

en_label
    Value: label (text string)
    Specifies the label associated with a particular identifier. Valid labels can be listed with the "errpt -t" command.

en_persistenceflg
    Value: 0 (non-persistent) or 1 (persistent)
    Specifies whether this object survives a system restart. A non-persistent object (0) is deleted from the errnotify class when the system restarts; this is useful for objects that act on a process identified by its PID, since that PID will not be valid after the restart.

en_pid
    Value: process ID (numeric value)
    Specifies a process ID for use in identifying the Error Notification object. Objects that specify the en_pid descriptor should also have en_persistenceflg set to "0".

en_rclass
    Value: device class (text string)
    Specifies the device class of the hardware resources to match.

en_type
    Value: INFO (informational), PEND (impending loss of resource), PERM (permanent), TEMP (temporary), or UNKN (unknown)
    Specifies the severity of the error log entries to match.

en_method
    Value: path and arguments of an executable program
    Specifies the program to run when an error log entry matches the object. The parameters that can be passed as arguments are listed in Table 2.

Note that if a descriptor used to match part of an error log entry is not included in the object, or if its value is a null string, that descriptor matches all possible values.
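To make the matching rule concrete, the Perl sketch below models it: every match descriptor that is present with a non-empty value must equal the corresponding field of the entry, and an absent or empty descriptor matches anything. This only illustrates the documented rule; it is not the error notification daemon's actual code. The subroutine name entry_matches, the sample objects, and the choice to model an entry as a hash keyed by the same descriptor names are all hypothetical.

```perl
use strict;
use warnings;

# Descriptors that select error log entries (from Table 1).
my @MATCH_DESCRIPTORS = qw(en_label en_crcid en_class en_type en_rclass);

# Illustrative model of the matching rule: a descriptor that is absent
# or set to "" matches anything; any other value must equal the
# corresponding field of the entry.  For convenience the entry is
# modeled as a hash keyed by the same descriptor names.
sub entry_matches {
    my ($object, $entry) = @_;
    for my $descriptor (@MATCH_DESCRIPTORS) {
        my $want = $object->{$descriptor};
        next if !defined $want || $want eq '';    # wildcard
        my $have = $entry->{$descriptor};
        return 0 unless defined $have && $have eq $want;
    }
    return 1;
}

# Hypothetical objects and entries.
my %hw_only    = (en_name => 'hwonly', en_class => 'H');
my %everything = (en_name => 'everything');    # no match descriptors at all
my %tape_entry = (en_label => 'SIM_MIM_RECORD_3590', en_crcid => 'D1A1AE6F',
                  en_class => 'H', en_type => 'INFO', en_rclass => 'tape');
my %opmsg      = (en_label => 'OPMSG', en_class => 'O', en_type => 'INFO');

for my $pair ([\%hw_only, \%tape_entry], [\%hw_only, \%opmsg],
              [\%everything, \%opmsg]) {
    my ($object, $entry) = @$pair;
    printf "%-10s vs %-19s: %s\n", $object->{en_name}, $entry->{en_label},
        entry_matches($object, $entry) ? 'match' : 'no match';
}
```

An object with only en_class = "H" catches every hardware entry, while an object with no match descriptors at all, like the "mailroot" example that follows, catches everything.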

The most important descriptor is "en_method", as it holds the command that is executed each time an error matching this object is logged. A number of parameters are made available to "en_method", and they may be passed as arguments to the specified program.

The parameters and a description of their contents are:

Table 2: en_method parameters

Parameter   Description
$1          Sequence number from the error log entry
$2          Error ID from the error log entry
$3          Class from the error log entry
$4          Type from the error log entry
$5          Alert flags from the error log entry
$6          Resource name from the error log entry
$7          Resource type from the error log entry
$8          Resource class from the error log entry
$9          Error label from the error log entry
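A method receives these values only through the argument list written in en_method. A method registered as "/usr/local/sbin/cinnamon $1", for example, is passed just the sequence number, which it then sees as its first argument. The Perl sketch below assumes a hypothetical method registered with all nine parameters and gives each one a name. The field names, the path /usr/local/sbin/mymethod, and the helper errnotify_args are illustrative, not part of AIX.

```perl
use strict;
use warnings;

# Illustrative names for $1 .. $9, in the order given in Table 2.
my @FIELDS = qw(sequence error_id class type alert_flags
                resource_name resource_type resource_class label);

# Hypothetical helper: map a method's argument list onto named fields.
# Assumes the object was registered with all nine parameters, e.g.
#   en_method = "/usr/local/sbin/mymethod $1 $2 $3 $4 $5 $6 $7 $8 $9"
sub errnotify_args {
    my @args = @_;
    die "expected 9 arguments, got " . scalar(@args) . "\n" unless @args == 9;
    my %entry;
    @entry{@FIELDS} = @args;    # hash slice: one field per argument
    return \%entry;
}

# A real method would call errnotify_args(@ARGV).  These values come from
# the sample SIM entry shown later in this article, except the alert flag,
# which is a placeholder.
my $e = errnotify_args('151433', 'D1A1AE6F', 'H', 'INFO', 'FALSE',
                       'rmt6', '3590', 'tape', 'SIM_MIM_RECORD_3590');
printf "entry %s: %s on %s\n", $e->{sequence}, $e->{label}, $e->{resource_name};
```

Passing only the parameters a method actually needs, as the examples in this article do, keeps the command line short and the method's argument handling simple.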

A Simple Example: "mailroot"

To add an Error Notification object that sends an e-mail to root each time an error of any type is added to the error log, create a file "/tmp/mailroot" with the following contents:

errnotify:
en_name = "mailroot"
en_persistenceflg = 1
en_method = "/usr/bin/errpt -a -l $1 | mail -s \"errpt: $9\" root"

After saving that file, run the command "odmadd /tmp/mailroot", and the object will be added to the "errnotify" ODM class. To verify that the object was installed correctly, run the command "odmget -q 'en_name=mailroot' errnotify", and the contents of the object will be displayed.

Once the above errnotify object is installed, an e-mail will be sent to root for every new entry in the error log. To confirm this, run the command "errlogger 'this is a test'", and root will receive an e-mail with the subject "errpt: OPMSG", containing the contents of the error log entry.

If, at some point, you wish to remove this object, execute the command "odmdelete -q 'en_name=mailroot' -o errnotify", and the object will be deleted from the ODM.
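If you create many objects, writing stanza files by hand becomes tedious. The Perl sketch below generates a stanza in the same layout as the "mailroot" file. The helper name errnotify_stanza is hypothetical, and the sketch assumes that only en_persistenceflg and en_pid take bare numeric values and that embedded double quotes are escaped with a backslash, as in the mailroot en_method line.

```perl
use strict;
use warnings;

# Descriptors whose values are numeric and are written without quotes.
my %NUMERIC = map { $_ => 1 } qw(en_persistenceflg en_pid);

# Hypothetical helper: build errnotify stanza text from descriptor => value
# pairs, keeping the order given.  String values are double-quoted, with any
# embedded double quotes escaped as \" (as in the mailroot en_method line).
sub errnotify_stanza {
    my @pairs = @_;
    my $text  = "errnotify:\n";
    while (my ($descriptor, $value) = splice(@pairs, 0, 2)) {
        if ($NUMERIC{$descriptor}) {
            $text .= "$descriptor = $value\n";
        } else {
            (my $quoted = $value) =~ s/"/\\"/g;
            $text .= "$descriptor = \"$quoted\"\n";
        }
    }
    # A trailing blank line separates this stanza from any that follow.
    return "$text\n";
}

# Single quotes keep $1 and $9 literal, so they reach the ODM unexpanded.
print errnotify_stanza(
    en_name           => 'mailroot',
    en_persistenceflg => 1,
    en_method         => '/usr/bin/errpt -a -l $1 | mail -s "errpt: $9" root',
);
```

Redirecting the output to a file such as /tmp/mailroot and running "odmadd /tmp/mailroot" should then install the object exactly as described above.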

An Advanced Example: "cinnamon"

Several of our RS/6000s are deployed as TSM servers and have multiple IBM Magstar 3590 tape drives attached. These "intelligent" tape drives send the host messages about the condition of the drive itself (a "SIM", or Service Information Message) and of the tapes used in it (a "MIM", or Media Information Message). How these messages are processed depends on the type of host the drive is connected to. On AIX systems, SIM and MIM messages are recorded in the error log with the identifier "D1A1AE6F" and the label "SIM_MIM_RECORD_3590".

---------------------------------------------------------------------------
LABEL: SIM_MIM_RECORD_3590
IDENTIFIER: D1A1AE6F
Date/Time: Mon Sep 17 09:03:21
Sequence Number: 151433
Machine Id: 00018294A400
Node Id: rescue
Class: H
Type: INFO
Resource Name: rmt6
Resource Class: tape
Resource Type: 3590
Location: 00-13-01-1,0
VPD:
Manufacturer................IBM
Machine Type and Model......03590E1A
Serial Number...............000000031230
Device Specific.(FW)........D25D
Loadable Microcode Level....A0B00E26
Description
AAA0
Probable Causes
TAPE DRIVE
Failure Causes
TAPE DRIVE
Recommended Actions
REFER TO PRODUCT DOCUMENTATION FOR ADDITIONAL INFORMATION
Detail Data
DIAGNOSTIC EXPLANATION
3100 0044 0000 6140 0130 3030 3030 3030 3233 3245 3431 3030 3737 3230 3030 3030
4438 3338 3036 3338 3036 3830 3030 4942 4D31 332D 3030 3030 3030 3033 3132 3330
3033 3539 3045 3141

Unfortunately, the information in a SIM or MIM message is encoded in a 144-character hexadecimal string, making it difficult to tell whether the message reports damaged media or is a simple notification that the drive was cleaned. To make these messages more useful, I wrote a script that the Error Notification daemon invokes as an errnotify method.
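Much of the decoding amounts to slicing the string at fixed offsets and turning each pair of hex digits into a character with Perl's pack("H*", ...). The sketch below applies offsets taken from Listing 1 to the detail data of the sample entry above. The offsets are the listing's, not an independent reading of IBM's SARS documentation, and the helper names hex_field and decode_sense are hypothetical.

```perl
use strict;
use warnings;

# Detail data from the sample error log entry (144 hex digits).
my $detail = '3100004400006140013030303030303032333245343130303737323030303030'
           . '443833383036333830363830303049424D31332D303030303030303331323330'
           . '3033353930453141';

# Convert the hex digits at a fixed offset to ASCII text.
sub hex_field {
    my ($hex, $offset, $length) = @_;
    return pack('H*', substr($hex, $offset, $length));
}

# Pull a few fields out of the sense data, using Listing 1's offsets.
sub decode_sense {
    my ($hex) = @_;
    die "expected 144 hex digits\n" unless length($hex) == 144;
    return {
        type      => substr($hex, 16, 2) eq '01' ? 'SIM' : 'MIM',
        microcode => hex_field($hex, 32, 8),
        code      => substr($hex, 40, 4),    # raw message code, looked up later
        model     => hex_field($hex, 138, 6),
    };
}

my $m = decode_sense($detail);
print "$m->{type} from model $m->{model}, microcode $m->{microcode}, code $m->{code}\n";
```

For this entry the sketch reports a SIM from model E1A with microcode 232E and message code 3431, which Listing 1's table translates to "Device Degraded".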

Each time a message with the "D1A1AE6F" identifier is recorded in the error log, the script (named "cinnamon") is invoked with the sequence number of the error log entry as its argument. The script retrieves the complete entry from the error log and parses the encoded message to determine its severity and contents; if the severity is at or above a specified threshold, it sends an e-mail containing a "readable" version of the message.

To add the object for the cinnamon script, create the text file /tmp/cinnamon with the following contents, and add it to the ODM with the command "odmadd /tmp/cinnamon".

errnotify:
en_name = "cinnamon"
en_persistenceflg = 1
en_label = "SIM_MIM_RECORD_3590"
en_class = "H"
en_type = "INFO"
en_method = "/usr/local/sbin/cinnamon $1"

After the above stanza is added to the errnotify ODM class, each SIM or MIM message received at or above the severity level defined in the script is parsed and mailed to the specified addresses. For example, the SIM shown above produced the following e-mail:

Subject:  SIM posted by rmt6: Device Degraded
Sequence Number : 151433
Host : rescue
Drive : rmt6
Model : E1A
Microcode : 232E
Message Type : SIM
Message Code : Device Degraded
Severity : 2 -- Serious
First FSC : 3806
Last FSC : 3806

Raw Sense Data:
310000440000614001303030303030303233324534313030373732303030303044383338
3036333830363830303049424D31332D3030303030303033313233303033353930453141
------------------------------------------------------------------------
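Because severity 1 is the most serious level, "at or above the threshold" means a numerically smaller or equal value, which is easy to get backwards. The script's test reduces to the comparison in this sketch. The helper name should_mail is hypothetical; the default thresholds are those in Listing 1, which mail every message unless they are lowered.

```perl
use strict;
use warnings;

# Default thresholds from Listing 1: 4 is the least severe SIM level,
# 3 the least severe MIM level, so the defaults mail everything.
my %threshold = (SIM => 4, MIM => 3);

# Severity 1 is the most serious, so a message qualifies when its
# level is numerically less than or equal to the threshold.
sub should_mail {
    my ($type, $level, $limit) = @_;
    $limit = $threshold{$type} unless defined $limit;
    return $level <= $limit ? 1 : 0;
}

# With $min_sim_sev set to 2, the "2 -- Serious" SIM above is mailed,
# but a "4 -- Service" SIM is not.
print should_mail('SIM', 2, 2), should_mail('SIM', 4, 2), "\n";
```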

Summary

The Error Logging Facility is one of the features that help AIX stand out from other Unix platforms. By making use of the Error Notification object class, administrators of AIX systems can reduce the time they spend monitoring their systems, automate solutions to common problems, and improve the overall availability of their systems.

References

"The AIX Error Logging Facility", published in the AIX Supplement to the June 2001 issue of Sys Admin Magazine, is available online at <http://www.samag.com/documents/s=1150/sam0106a/0106a.htm>.

Chapter 4 of the IBM manual "General Programming Concepts: Writing and Debugging Programs" describes Error Notification, and was the primary source of information for this article. It can be read online at <http://www.rs6000.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/toc.htm>.

Additional examples of Error Notification objects can be found in "/usr/samples/findcore", installed by the fileset bos.sysmgt.serv_aid, and in several documents from the IBM TechDocs website at <http://techsupport.services.ibm.com/rs6k/techbrowse/>.

About the Author

Sandor W. Sklar is a Unix Systems Administrator at Stanford University, in California.


Sidebar: The Object Data Manager


The Object Data Manager ("ODM") is a system-wide database used by AIX to store device configuration, system resource information, installed product data, and other information. It consists of files located in the directories "/usr/lib/objrepos", "/usr/share/lib/objrepos", and "/etc/objrepos", and is made up of "objects" and "classes". The class that stores error notification objects is called "errnotify", and is located in the /usr/lib/objrepos directory.

Administrators should not manipulate the ODM at the file-level; instead, the ODM commands ("odmadd", "odmdelete", "odmshow", etc.) should be used.
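Because odmget prints objects in the same stanza format that odmadd reads, scripts can work from its output instead of touching the ODM files. The parser below is a sketch. It assumes the layout used in this article (a "class:" line followed by "descriptor = value" lines, with string values in double quotes, embedded quotes escaped as \", and stanzas separated by blank lines). The subroutine name parse_stanzas is hypothetical. In practice its input would come from a command such as "odmget -q 'en_name=mailroot' errnotify".

```perl
use strict;
use warnings;

# Parse ODM stanza text into a list of hashes, one per object.
sub parse_stanzas {
    my ($text) = @_;
    my (@objects, $current);
    for my $line (split /\n/, $text) {
        if ($line =~ /^(\w+):\s*$/) {                  # "errnotify:" starts an object
            $current = { _class => $1 };
            push @objects, $current;
        } elsif ($current && $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/) {
            my ($descriptor, $value) = ($1, $2);
            if ($value =~ /^"(.*)"$/) {                # quoted string value
                ($value = $1) =~ s/\\"/"/g;            # undo escaped quotes
            }
            $current->{$descriptor} = $value;
        }
    }
    return @objects;
}

my $sample = <<'END';
errnotify:
        en_name = "mailroot"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | mail -s \"errpt: $9\" root"
END

for my $object (parse_stanzas($sample)) {
    print "$object->{en_name}: $object->{en_method}\n";
}
```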


Listing 1: cinnamon


#!/usr/local/bin/perl -w
#=====================================================================
# cinnamon -- a perl script that translates the sense data from SIM
#             and MIM messages posted by IBM 3590 tape drives into
#             human-readable format, and sends the messages via email
#---------------------------------------------------------------------
# $Id: errnotify.html,v 1.1 2001/09/23 05:29:57 ssklar Exp $
#=====================================================================

use strict;  $|++;

#=====================================================================
# USER-DEFINABLE VALUES
#=====================================================================

# the variable "$recipient" should be set to a comma-separated list of
# addresses to which this script will send the parsed SARS email.
# Note:  don't forget to backslash any "@" signs, or the script will die.
# If this variable is not set, all mail will be sent to root.

my $recipient = "";

# the variables "$min_sim_sev" and "$min_mim_sev" should be set to the
# minimum severity value that emails should be sent for.  Note that
# "1" is the highest severity for both MIMs and SIMs, while "4" is the
# lowest value for SIMs and "3" is the lowest value for MIMs.  If these
# variables are not set, email will be sent for messages at all
# severity levels.

my $min_sim_sev = "";

my $min_mim_sev = "";

#======================================================================
# END OF USER-DEFINABLE VALUES
#======================================================================

#----------------------------------------------------------------------
# error checking and defaults setting ...
#----------------------------------------------------------------------

die "cinnamon is useful only on AIX systems.  Sorry.\n" unless ($^O =~ /aix/);

$recipient = "root" unless $recipient;

$min_sim_sev = "4"  unless ($min_sim_sev =~ /\d/);

$min_mim_sev = "3"  unless ($min_mim_sev =~ /\d/);

#----------------------------------------------------------------------
# the sequence number of the error log entry that we were invoked for
# will be passed as the single argument; make sure that it consists
# of nothing other than digits ...
#----------------------------------------------------------------------

my $sequence_number = shift;

unless (defined $sequence_number && $sequence_number =~ /^\d+$/)
	{ die "cinnamon: error log sequence number needed as argument\n" };

#----------------------------------------------------------------------
# read in the full unformatted error log entry with the specified
# sequence number ...
#----------------------------------------------------------------------

open (ERROR, "/usr/bin/errpt -g -l $sequence_number |") or
	die "cinnamon: couldn't run errpt: $!\n";

#------------------------------------------------------------------
# pull out the detail data from the error log entry ...
#------------------------------------------------------------------

my %message;

while (<ERROR>) {

	$message{host}   = (split)[1], next if /^el_nodeid/;
	$message{drive}  = (split)[1], next if /^el_resource/;
	$message{detail} = (split)[1], last if /^el_detail_data/;
		
};

close (ERROR);

#------------------------------------------------------------------
# make sure that there are 144 digits in $message{detail};  if not,
# something went wrong, so die ...
#------------------------------------------------------------------

die "cinnamon: incomplete or incorrect error log values retrieved\n"
	unless (defined $message{detail} && length($message{detail}) == 144);
	
#------------------------------------------------------------------
# get the "Machine Type" and convert it from hex to ascii ...
#------------------------------------------------------------------

$message{machine_type} = pack ("H*", substr($message{detail}, 128, 10));

#------------------------------------------------------------------
# get the "Model" and convert it from hex to ascii ...
#------------------------------------------------------------------

$message{model} = pack ("H*", substr($message{detail}, 138, 6));

#------------------------------------------------------------------
# get the "Model and Microcode Level" and convert it from hex
# to ascii ...
#------------------------------------------------------------------	
	
$message{mml} = pack ("H*", substr($message{detail}, 32, 8));

#------------------------------------------------------------------
# get the "Message Code" and look up its meaning ...
#------------------------------------------------------------------

my %message_code = (

	3030	=>	"No Message",
	3430	=>	"Operator Intervention Required",
	3431	=>	"Device Degraded",
	3432	=>	"Device Hardware Failure",
	3433	=>	"Service Circuits Failed, Operations not Affected",
	3535	=>	"Clean Device",
	3537	=>	"Device has been cleaned",
	3630	=>	"Bad Media, Read-Only Permitted",
	3631	=>	"Rewrite Data if Possible",
	3632	=>	"Read Data if Possible",
	3634	=>	"Bad Media, Cannot Read or Write",
	3732	=>	"Replace Cleaner Cartridge"

);

$message{code} = $message_code{substr($message{detail}, 40, 4)} || "UNKNOWN";

#------------------------------------------------------------------
# determine if we're dealing with a SIM or a MIM ...
#------------------------------------------------------------------

if (substr($message{detail}, 16, 2) eq "01") {

	#--------------------------------------------------------------
	# it's a SIM ...
	#--------------------------------------------------------------
	
	$message{type} = "SIM";
		
	#--------------------------------------------------------------
	# convert the FID Severity Code into something meaningful ...
	#--------------------------------------------------------------
			
	my %fid_severity_code = (
		33		=>	"1 -- Acute",
		32		=>	"2 -- Serious",
		31		=>	"3 -- Moderate",
		30		=>	"4 -- Service"
	);
	
	$message{severity} = $fid_severity_code{substr($message{detail}, 52, 2)} || "UNKNOWN";

	#--------------------------------------------------------------
	# if the SIM is less severe than $min_sim_sev (remember that
	# "1" is the most severe level), exit now ...
	#--------------------------------------------------------------
	
	exit 0 unless (substr($message{severity}, 0, 1) <= $min_sim_sev);
	
	#--------------------------------------------------------------
	# get the FID (FRU Identification Number), and convert it from
	# hex to ascii ...
	#--------------------------------------------------------------
	
	$message{fid} = pack ("H*", substr($message{detail}, 64, 4));
		
	#--------------------------------------------------------------
	# get the "First FSC" (Fault Symptom Code), and convert it from
	# hex to ascii ...
	#--------------------------------------------------------------
	
	$message{first_fsc} = pack ("H*", substr($message{detail}, 68, 8));
	
	#--------------------------------------------------------------
	# get the "Last FSC" (Fault Symptom Code), and convert it from
	# hex to ascii ...
	#--------------------------------------------------------------
	
	$message{last_fsc} = pack ("H*", substr($message{detail}, 76, 8));
	
} else {

	#--------------------------------------------------------------
	# it's a MIM ...
	#--------------------------------------------------------------
	
	$message{type} = "MIM";
	
	#--------------------------------------------------------------
	# convert the MIM Severity Code into something meaningful ...
	#--------------------------------------------------------------
			
	my %mim_severity_code = (
		31		=>	"3 -- Moderate: high temporary read or write errors have occurred",
		32		=>	"2 -- Serious:  permanent read or write errors have occurred",
		33		=>	"1 -- Acute: tape directory errors have occurred"
	);
	
	$message{severity} = $mim_severity_code{substr($message{detail}, 52, 2)} || "UNKNOWN";
	
	#--------------------------------------------------------------
	# if the MIM is less severe than $min_mim_sev (remember that
	# "1" is the most severe level), exit now ...
	#--------------------------------------------------------------
	
	exit 0 unless (substr($message{severity}, 0, 1) <= $min_mim_sev);
	
	#--------------------------------------------------------------
	# get the VOLSER (Volume Serial Number), and convert it from
	# hex to ascii ...
	#--------------------------------------------------------------
	
	$message{volser} = pack ("H*", substr($message{detail}, 68, 12));

};

#------------------------------------------------------------------
# format the data and store it in the array @mail ...
#------------------------------------------------------------------

my @mail;

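# the blank line after the Subject: header separates the headers
# from the body of the message ...

push (@mail, sprintf("Subject:  %s posted by %s: %s\n\n", $message{type}, $message{drive}, $message{code}));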

push (@mail, sprintf("%-16s: %-20s\n", "Sequence Number", $sequence_number));
push (@mail, sprintf("%-16s: %-20s\n", "Host",            $message{host}));
push (@mail, sprintf("%-16s: %-20s\n", "Drive",           $message{drive}));
push (@mail, sprintf("%-16s: %-20s\n", "Model",           $message{model}));
push (@mail, sprintf("%-16s: %-20s\n", "Microcode",       $message{mml}));
push (@mail, sprintf("%-16s: %-20s\n", "Message Type",    $message{type}));
push (@mail, sprintf("%-16s: %-20s\n", "Message Code",    $message{code}));
push (@mail, sprintf("%-16s: %-20s\n", "Severity",        $message{severity}));

if ($message{type} eq "SIM") {

	push (@mail, sprintf("%-16s: %-20s\n", "First FSC",   $message{first_fsc}));
	push (@mail, sprintf("%-16s: %-20s\n", "Last FSC",    $message{last_fsc}));
	
} else {

	push (@mail, sprintf("%-16s: %-20s\n", "VOLSER",       $message{volser}));
	
};

push (@mail, "\n\nRaw Sense Data:\n$message{detail}\n" . "-" x 72 . "\n\n");

#------------------------------------------------------------------
# open a pipe to sendmail and send the message ...
#------------------------------------------------------------------

open (SENDMAIL, "|/usr/sbin/sendmail $recipient") or
	die "cinnamon: couldn't open sendmail: $!";
	
print SENDMAIL @mail;

close (SENDMAIL);

exit 0;

#======================================================================
# PROGRAM DOCUMENTATION: Run "perldoc cinnamon" to view ...
#======================================================================

=pod

=head1 NAME

B<cinnamon> -- an errnotify object method that translates the sense data posted to the AIX error log by an IBM 3590 tape drive (a SIM or a MIM) into a readable format, and mails it to a specified address

=head1 DESCRIPTION

B<cinnamon> (so named because I thought it sounded like "sim-mim-mon", my original name for the program) parses and mails AIX error log entries posted with the identifier B<D1A1AE6F>, which is the ERROR ID for B<SIM_MIM_RECORD_3590>.

SIM and MIM records are part of the "Statistical Analysis and Reporting System" (SARS), and are messages created by IBM 3590 tape drives that report on the condition of the drive (a SIM) or of the medium (a MIM).  These records are presented by the operating system in different ways.  In AIX, SIMs and MIMs are recorded in the error log, with the actual information encoded in a 144-character hexadecimal string.

=head1 CONFIGURATION

There are three user-definable values that can be set at the beginning of this script.  If they are not defined, default values will be used, as described below.

=over 4

=item B<$recipient>

The variable B<$recipient> may be set to one or more e-mail addresses to which the output of this script will be mailed.  Any "@" signs in the string B<must> be back-slash protected; multiple addresses should be separated by commas, with all addresses inside a single set of double-quotes.

If this variable is not set, the output of the script will be mailed to "root".

=item B<$min_sim_sev>

The variable B<$min_sim_sev> defines the lowest severity level of SIM messages that will be parsed and mailed.  SIM severity levels range from "4" (a "Service" message, the lowest severity) to "1" (an "Acute" problem, probably resulting from a hardware failure).  To have the script parse and mail only SIMs with a severity of "1" or "2", set $min_sim_sev to "2".

If this variable is not set, SIMs of all severity levels will be parsed and mailed.

=item B<$min_mim_sev>

The variable B<$min_mim_sev> defines the lowest severity level of MIM messages that will be parsed and mailed.  MIM severity levels range from "3" (a "Moderate", temporary error) to "1" (an "Acute" problem, resulting from tape directory errors).  To have the script parse and mail only MIMs with a severity of "1" or "2", set $min_mim_sev to "2".

If this variable is not set, MIMs of all severity levels will be parsed and mailed.

=back

=head1 USAGE

This program is designed to be used as an B<errnotify> method added to the ODM, so that it will be invoked by the system each time an errpt entry is logged that matches the descriptor values of a 3590 SIM or MIM message.

To create the B<errnotify> object, save the following text to the file B</tmp/cinnamon.add>:

  errnotify:
    en_name = "cinnamon"
    en_persistenceflg = 1
    en_label = "SIM_MIM_RECORD_3590"
    en_class = "H"
    en_type = "INFO"
    en_method = "/usr/local/bin/perl /usr/local/sbin/cinnamon $1"
		
(Note: use the proper paths to your perl executable and to this program in the above "en_method" line.)

After saving the above text, run the command:

	odmadd /tmp/cinnamon.add
	
The error notification object will be added to the ODM.  To verify that the object was added to the ODM properly, run the command:

	odmget -q "en_name='cinnamon'" errnotify
	
To remove the object from the ODM (why would you want to do that?), run the command:

	odmdelete -q "en_name='cinnamon'" -o errnotify
	
=head1 AUTHOR

	Sandor W. Sklar
	Unix Systems Administrator
	Stanford University ITSS-CSS
	
If this script is useful to you, or even if it is of no use to you,  or you have some changes/improvements/questions/extra money, please send me an email.
	
=head1 FOR MORE INFORMATION

Most of the parsing that this script does was derived from the IBM publication "Statistical Analysis and Reporting System User Guide", which can be downloaded from .

Information about creating custom error notification objects can be found in Chapter 4 of the IBM manual "General Programming Concepts: Writing and Debugging Programs", available online at 

=head1 COPYRIGHT

This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.

=cut

