Internet Security Professional Reference:Understanding and creating


Consequently, it is better to read the configuration file, process the entries, and validate the data values before using them. In our procmon script, if the delay_between value contains anything other than a number, the value is not used, and a default of 300 seconds replaces the requested value. The same is true for ConfigDir: if the value is not a directory, the default of /etc is used.
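The validation rule just described can be sketched as follows. This is an illustration in Python, not the procmon code itself, which is not shown in this section. It assumes the configuration entries have already been read into a dictionary of strings, because the file syntax is not given here; the function name validate_config is hypothetical.

```python
import os

DEFAULT_DELAY = 300           # seconds, the default named in the text
DEFAULT_CONFIG_DIR = "/etc"   # the default directory named in the text


def validate_config(raw):
    """Return a cleaned copy of the raw settings, falling back to defaults."""
    delay_text = str(raw.get("delay_between", "")).strip()
    # Accept only a plain run of ASCII digits; anything else gets the default.
    if delay_text.isascii() and delay_text.isdigit():
        delay = int(delay_text)
    else:
        delay = DEFAULT_DELAY

    config_dir = str(raw.get("ConfigDir", "")).strip()
    # A missing value or a path that is not a directory gets the default.
    if not os.path.isdir(config_dir):
        config_dir = DEFAULT_CONFIG_DIR

    return {"delay_between": delay, "ConfigDir": config_dir}
```

Validating once, immediately after reading, means the rest of the program can rely on the values without rechecking them.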

The procmon.cmd file contains the list of processes that are to be monitored. Each line of this file contains two fields separated by an exclamation mark (!): the first is the pattern to search for in the process list; the second is the name of the command to execute if the pattern is not found.

named !/etc/named
cron!/etc/cron

This file indicates that procmon will be watching for named and cron. If named is not in the process list, the command /etc/named is started. The same holds true for the cron command. The purpose of using a configuration file for this information is to allow the system administrator to configure this file on the fly. If the contents of this file change, the procmon daemon must be restarted to read the changes.
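A minimal sketch of reading this two-field format is shown here in Python. It is not the procmon implementation. The function name parse_command_file is hypothetical, and skipping blank or malformed lines is an assumption, because the text does not say how such lines are handled. The pattern is kept exactly as written, including any space before the exclamation mark.

```python
def parse_command_file(text):
    """Return a list of (pattern, start_command) pairs from procmon.cmd text."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines (an assumption)
        # Split on the first "!" only, so the command itself may contain "!".
        pattern, sep, command = line.partition("!")
        if not sep or not pattern.strip() or not command.strip():
            continue  # malformed entry: ignore it rather than guess
        entries.append((pattern, command.strip()))
    return entries


sample = "named !/etc/named\ncron!/etc/cron\n"
for pattern, command in parse_command_file(sample):
    print(repr(pattern), "->", command)
```

Because the file is read only once in this sketch, a changed file has no effect until the program is restarted, which matches the behavior described above.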

Some startup messages are recorded by syslog when procmon starts. In the message formats that follow, value stands for the appropriate information, timestamp is the current time as supplied by syslog, PID is the process identification number of the procmon process, and system_name is the name of the system.

timestamp system_name procmon[PID]: Process Monitor started
timestamp system_name procmon[PID]: Loaded config file value
timestamp system_name procmon[PID]: Command File: value
timestamp system_name procmon[PID]: Loop Delay = value
timestamp system_name procmon[PID]: Adding value to stored process list
timestamp system_name procmon[PID]: Monitoring: value processes

Monitoring messages are printed during the monitoring process. These messages represent the status of the monitored processes:

timestamp system_name procmon[PID]: process running as PID PID
    This record is printed after every check, and indicates that the
  monitored process is running.

timestamp system_name procmon[PID]: process is NOT running
    This record is printed when the monitored process cannot be found
  in the process list.

timestamp system_name procmon[PID]: Last Failure of process time
    This record is printed to show when the process last (previously)
  failed.

timestamp system_name procmon[PID]: issuing start_command to system
    This record is printed before the identified command is executed.

timestamp system_name procmon[PID]: start_command returns return_code

This last message is printed after the command has been issued to the system. The syslog may be able to give you clues regarding the status of the system after the command was issued. Actual procmon syslog entries are included here:

Feb 20 07:31:21 nic procmon[943]: Process Monitor started
Feb 20 07:31:21 nic procmon[943]: Loaded config file /etc/procmon.cfg
Feb 20 07:31:22 nic procmon[943]: Command File: /etc/procmon.cmd
Feb 20 07:31:22 nic procmon[943]: Loop Delay = 300
Feb 20 07:31:22 nic procmon[943]: Adding named  to stored process list
Feb 20 07:31:22 nic procmon[943]: Monitoring: 1 processes
Feb 20 07:31:22 nic procmon[943]: named  running as PID 226
Feb 20 07:36:22 nic procmon[943]: named  is NOT running
Feb 20 07:36:24 nic procmon[943]: Last Failure of named, @ Sun Feb  12
  13:29:02 EST 1995
Feb 20 07:36:26 nic procmon[943]: issuing /etc/named to system
Feb 20 07:36:42 nic procmon[943]: /etc/named returns 0
Feb 20 07:41:22 nic procmon[943]: named  running as PID 4814
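The sequence of monitoring messages above suggests the shape of a single checking pass. The following Python sketch is an assumption-laden illustration, not the procmon source: the function check_once and its parameters are hypothetical, the pattern is matched as a plain substring, and the PID is assumed to be the first column of each process-list line. The process list, the command runner, and the logger are passed in so that the logic can be exercised without touching the system.

```python
import time


def check_once(entries, ps_lines, run, log, last_failure):
    """Make one monitoring pass over (pattern, start_command) pairs.

    ps_lines     -- lines of process-list output, PID assumed in column one
    run          -- callable that starts a command and returns its exit code
    log          -- callable that records one message
    last_failure -- dict remembering when each pattern was last found missing
    """
    for pattern, command in entries:
        match = next((line for line in ps_lines if pattern in line), None)
        if match is not None:
            log(f"{pattern} running as PID {match.split()[0]}")
            continue
        log(f"{pattern} is NOT running")
        if pattern in last_failure:
            log(f"Last Failure of {pattern}, @ {last_failure[pattern]}")
        last_failure[pattern] = time.ctime()
        log(f"issuing {command} to system")
        return_code = run(command)
        log(f"{command} returns {return_code}")
```

A daemon built on this sketch would call check_once, sleep for the configured loop delay, and repeat.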

The procmon code displayed at the end of this chapter has been written to run on System V systems. It has been in operation successfully since December 18, 1994. However, some enhancements could be made to the program. For example, it makes sense to report a critical message in syslog if the command returns anything other than 0, because a non-zero return code generally indicates that the command did not start. Another improvement would be to include a BSD option to parse the ps output, and add an option in the configuration file to choose System V or BSD.
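The first suggested enhancement might look something like the following Python sketch. It is one possible approach, not part of procmon. The function name start_and_report is hypothetical; the log parameter defaults to the standard syslog.syslog call (available on Unix only) and exists so that the function can be tried without writing to the system log.

```python
import subprocess
import syslog  # Unix only


def start_and_report(command, log=syslog.syslog):
    """Run a start command and log its return code, critically if non-zero."""
    return_code = subprocess.call(command, shell=True)
    if return_code != 0:
        # A non-zero code generally means the command did not start.
        log(syslog.LOG_CRIT, f"{command} returns {return_code}")
    else:
        log(syslog.LOG_INFO, f"{command} returns {return_code}")
    return return_code
```

Raising only the priority, and keeping the message wording unchanged, lets an existing syslog configuration route failures to the console or to an operator without any change to log-scanning tools.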

Unix Run Levels

Run levels, which are equivalent to system operation levels, have not been around as long as Unix. In fact, they are a recent development in System V; early versions of System V did not include the concept of run levels. A run level is an operating state that determines which facilities are available for use. There are three primary run levels: halt, single-user, and multiuser, although there can be more.

The run level is adjustable by sending a signal to init. Whether this can be done depends on the version of Unix in use, and the version of init. Many Unix versions only have single-user, or system maintenance, and multiuser modes.

On SunOS 4.1.x systems, for example, init terminates multiuser operations and resumes single-user mode if it is sent a terminate (SIGTERM) signal with ‘kill -TERM 1’. If processes are outstanding because they’re deadlocked (due to hardware or software failure), init does not wait for all of them to die (which might take forever), but times out after 30 seconds and prints a warning message.

When init is sent a terminal stop (SIGTSTP) signal using ‘kill -TSTP 1’, it ceases to create new processes and allows the system to slowly die away. If this is followed by a hang-up signal with ‘kill -HUP 1’, init resumes full multiuser operations; if it is followed instead by a terminate signal, again with ‘kill -TERM 1’, init starts a single-user shell. This mechanism of switching between multiuser and single-user modes is used by the reboot and halt commands.
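The signals described for SunOS 4.1.x init can be summarized in a small lookup, sketched here in Python. The names INIT_SIGNALS and init_kill_command are hypothetical, and the function deliberately returns the command line as text instead of running it, because sending these signals to process 1 changes the state of the whole system. Other versions of init may respond differently.

```python
# Effects as described in the text for SunOS 4.1.x init.
INIT_SIGNALS = {
    "TERM": "terminate multiuser operations and resume single-user mode",
    "TSTP": "stop creating new processes and let the system die away",
    "HUP": "resume full multiuser operations after an earlier TSTP",
}


def init_kill_command(signal_name):
    """Return the kill command line for init (PID 1) without running it."""
    if signal_name not in INIT_SIGNALS:
        raise ValueError(f"no init behavior described for {signal_name!r}")
    return f"kill -{signal_name} 1"


for name, effect in INIT_SIGNALS.items():
    print(init_kill_command(name), "->", effect)
```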
