Consequently, it is better to read the configuration file, process the entries, and validate the data values before using them. In our procmon script, if the delay_between value contains anything other than a number, the value is not used, and a default of 300 seconds replaces the requested value. The same is true for ConfigDir: if the value is not a directory, the default of /etc is used.
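The validation rule just described can be sketched in a few lines. This is a minimal illustration in Python, not the procmon code itself: the function name validate_config and the idea of passing the entries in as a dictionary of strings are assumptions made for the example, while the two entry names and their defaults come from the text.

```python
import os
import re

DEFAULT_DELAY = 300          # seconds, the default named in the text
DEFAULT_CONFIG_DIR = "/etc"  # the default directory named in the text


def validate_config(raw):
    """Return cleaned config values, falling back to defaults.

    `raw` is a hypothetical dict of strings as read from the config file.
    """
    cleaned = {}

    # delay_between must consist only of digits; anything else is ignored.
    delay = str(raw.get("delay_between", "")).strip()
    if re.fullmatch(r"[0-9]+", delay):
        cleaned["delay_between"] = int(delay)
    else:
        cleaned["delay_between"] = DEFAULT_DELAY

    # ConfigDir must name an existing directory; otherwise /etc is used.
    config_dir = str(raw.get("ConfigDir", "")).strip()
    if os.path.isdir(config_dir):
        cleaned["ConfigDir"] = config_dir
    else:
        cleaned["ConfigDir"] = DEFAULT_CONFIG_DIR

    return cleaned
```

Validating every entry once, right after the file is read, means the rest of the program never has to guard against a bad value.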
The procmon.cmd file contains the list of processes that are to be monitored. Each entry consists of two fields separated by an exclamation mark (!): the first is the pattern to search for in the process list; the second is the name of the command to execute if the pattern is not found.
named!/etc/named
cron!/etc/cron
This file indicates that procmon will be watching for named and cron. If named is not in the process list, the command /etc/named is started. The same holds true for the cron command. The purpose of using a configuration file for this information is to allow the system administrator to configure this file on the fly. If the contents of this file change, the procmon daemon must be restarted to read the changes.
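Reading a file in this format amounts to splitting each line at the exclamation mark. The following Python sketch shows one way to do it; the function name parse_command_file is hypothetical, and the decision to skip blank or malformed lines silently is an assumption of the example rather than documented procmon behavior.

```python
def parse_command_file(text):
    """Map each search pattern to the command to run when it is missing."""
    processes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split at the first '!' only, so the command itself may contain '!'.
        pattern, sep, command = line.partition("!")
        pattern, command = pattern.strip(), command.strip()
        if not sep or not pattern or not command:
            continue  # malformed entry: ignored in this sketch
        processes[pattern] = command
    return processes
```

Given the two entries shown above, the function returns a dictionary that maps named to /etc/named and cron to /etc/cron.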
Some startup messages are recorded by syslog when procmon starts. The appropriate information is substituted for value in each message; timestamp is replaced by the current time, which syslog supplies; PID is the process identification number of the procmon process; and system_name is the name of the system.
timestamp system_name procmon[PID]: Process Monitor started
timestamp system_name procmon[PID]: Loaded config file value
timestamp system_name procmon[PID]: Command File: value
timestamp system_name procmon[PID]: Loop Delay = value
timestamp system_name procmon[PID]: Adding value to stored process list
timestamp system_name procmon[PID]: Monitoring: value processes
Monitoring messages are printed during the monitoring process. These messages represent the status of the monitored processes:
timestamp system_name procmon[PID]: process running as PID PID
This record is printed after every check, and indicates that the monitored process is running.

timestamp system_name procmon[PID]: process is NOT running
This record is printed when the monitored process cannot be found in the process list.

timestamp system_name procmon[PID]: Last Failure of process time
This record is printed to show when the process last failed before the current failure.

timestamp system_name procmon[PID]: issuing start_command to system
This record is printed before the identified command is executed.

timestamp system_name procmon[PID]: start_command returns return_code
This last message is printed after the command has been issued to the system. The syslog may be able to give you clues regarding the status of the system after the command was issued. Actual procmon syslog entries are included here:
Feb 20 07:31:21 nic procmon[943]: Process Monitor started
Feb 20 07:31:21 nic procmon[943]: Loaded config file /etc/procmon.cfg
Feb 20 07:31:22 nic procmon[943]: Command File: /etc/procmon.cmd
Feb 20 07:31:22 nic procmon[943]: Loop Delay = 300
Feb 20 07:31:22 nic procmon[943]: Adding named to stored process list
Feb 20 07:31:22 nic procmon[943]: Monitoring: 1 processes
Feb 20 07:31:22 nic procmon[943]: named running as PID 226
Feb 20 07:36:22 nic procmon[943]: named is NOT running
Feb 20 07:36:24 nic procmon[943]: Last Failure of named, @ Sun Feb 12 13:29:02 EST 1995
Feb 20 07:36:26 nic procmon[943]: issuing /etc/named to system
Feb 20 07:36:42 nic procmon[943]: /etc/named returns 0
Feb 20 07:41:22 nic procmon[943]: named running as PID 4814
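The check that produces the "running as PID" and "is NOT running" records can be illustrated with a short Python sketch. It assumes the process list arrives as text in the four-column layout of System V ps -e (PID, TTY, TIME, command) with one header line, and that a simple substring match on the command column is enough. Both are assumptions made for the example, as are the function names; the actual matching logic is in the procmon listing.

```python
def find_pid(pattern, ps_output):
    """Return the PID of the first process whose command contains pattern."""
    for line in ps_output.splitlines()[1:]:   # skip the header line
        fields = line.split(None, 3)          # PID, TTY, TIME, command
        if len(fields) == 4 and fields[0].isdigit() and pattern in fields[3]:
            return int(fields[0])
    return None                               # pattern not in the process list


def status_message(pattern, ps_output):
    """Build the monitoring message text for one watched process."""
    pid = find_pid(pattern, ps_output)
    if pid is None:
        return f"{pattern} is NOT running"
    return f"{pattern} running as PID {pid}"
```

Keeping the parsing in a function that takes text, rather than one that runs ps itself, makes the check easy to exercise with canned process lists.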
The procmon code displayed at the end of this chapter has been written to run on System V systems. It has been in operation successfully since December 18, 1994. However, some enhancements could be made to the program. For example, it makes sense to report a critical message in syslog if the command returns anything other than 0, because a non-zero return code generally indicates that the command did not start. Another improvement would be to include a BSD option to parse the ps output, and add an option in the configuration file to choose System V or BSD.
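The first of these enhancements might look something like the following Python sketch. It is not part of procmon: the function name run_start_command is hypothetical, the command is passed as an argument list rather than a single string, and the choice of "info" for a successful start is an assumption. The returned priority name would be mapped to the matching syslog level (for example, LOG_CRIT) by the caller.

```python
import subprocess
import sys


def run_start_command(argv):
    """Run a start command and return (priority, message) for syslog."""
    result = subprocess.run(argv)
    message = f"{' '.join(argv)} returns {result.returncode}"
    # A non-zero return code generally means the command did not start,
    # so it is reported at a critical priority instead of a routine one.
    priority = "info" if result.returncode == 0 else "crit"
    return priority, message
```

A call such as run_start_command(["/etc/named"]) would then yield a pair like ("info", "/etc/named returns 0") when the command succeeds.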
Run levels, which are equivalent to system operation levels, have not been around as long as Unix. In fact, they are a relatively recent development in System V; early versions of System V did not include the concept of run levels. A run level is an operating state that determines which facilities will be available for use. There are three primary run levels: halt, single-user, and multiuser, although there can be more.
The run level is adjustable by sending a signal to init. Whether this can be done depends on the version of Unix in use, and the version of init. Many Unix versions only have single-user, or system maintenance, and multiuser modes.
On SunOS 4.1.x systems, for example, init terminates multiuser operations and resumes single-user mode if it is sent a terminate (SIGTERM) signal with kill -TERM 1. If processes are outstanding because they're deadlocked (due to hardware or software failure), init does not wait for all of them to die (which might take forever), but times out after 30 seconds and prints a warning message.
When init is sent a terminal stop (SIGTSTP) signal using kill -TSTP 1, it ceases to create new processes, and allows the system to slowly die away. If this is followed by a hang-up signal with kill -HUP 1, init resumes full multiuser operations; if it is followed instead by a terminate signal, again with kill -TERM 1, init starts a single-user shell. This mechanism of switching between multiuser and single-user modes is used by the reboot and halt commands.
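The three signals described above can be summarized in a small table. The Python sketch below only builds the kill command line and never runs it, because signaling process 1 changes the state of the whole system. The names INIT_SIGNALS and init_kill_command are hypothetical, and the effects listed are those the text gives for SunOS 4.1.x; other versions of init may behave differently.

```python
# Signal name -> effect on init, as described for SunOS 4.1.x.
INIT_SIGNALS = {
    "TERM": "terminate multiuser operations and start a single-user shell",
    "TSTP": "stop creating new processes and let the system die away",
    "HUP": "resume full multiuser operations after an earlier TSTP",
}


def init_kill_command(signal_name):
    """Return the kill command (as an argument list) for a known init signal."""
    if signal_name not in INIT_SIGNALS:
        raise ValueError(f"no documented init behavior for {signal_name}")
    return ["kill", f"-{signal_name}", "1"]   # PID 1 is init
```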