On startup, statmgr reads the configuration file named on the command line. Commands in this file set up all parameters used in monitoring the health of an Earthworm system. In the control file, lines may begin with a valid statmgr command (listed below) or with one of 2 special characters:
# marks the line as a comment (example: # This is a comment).
@ allows control files to be nested; one control file can be accessed from another with the command "@" followed by a string representing the path name of the next control file (example: @model.d).
Command names must be typed in the control file exactly as shown in this document (upper/lower case matters!).
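The line rules above can be sketched in a few lines of Python. This is only an illustration, not statmgr's own parser: the function name read_commands, the in-memory files dictionary standing in for the disk, the nesting limit, and the handling of trailing comments are all assumptions made for the example.

```python
def read_commands(path, files, depth=0, max_depth=10):
    """Return (command, args) pairs from a control file, following @ nesting.

    `files` is a hypothetical in-memory stand-in for the disk: it maps a
    path name to the text of that file.
    """
    if depth > max_depth:  # guard against files that include each other
        raise RuntimeError("control files nested too deeply: " + path)
    commands = []
    for raw in files[path].splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@"):
            nested = line[1:].strip()  # path name of the next control file
            commands.extend(read_commands(nested, files, depth + 1, max_depth))
            continue
        # Assumption: text after a '#' is a trailing comment.
        # A '#' inside a quoted string is not handled by this sketch.
        line = line.split("#", 1)[0].strip()
        name, *args = line.split()
        commands.append((name, args))  # name is kept exactly as typed
    return commands


files = {
    "statmgr.d": "# This is a comment\nMyModuleId MOD_STATMGR\n@model.d\n",
    "model.d": "RingName HYPO_RING  # trailing comment\n",
}
print(read_commands("statmgr.d", files))
```

Command names are never lower-cased or otherwise normalized here, which mirrors the rule that upper/lower case matters.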
# Status Manager Configuration File
# (statmgr.d)
#
# This file controls the notifications of earthworm error conditions.
# The status manager can send pager messages to a pageit system, and
# it can also send email messages to a list of recipients.
# Earthquake notifications are not handled by the status manager.
# In this file, comment lines are preceded by #.
#
MyModuleId  MOD_STATMGR

# "RingName" specifies the name of the transport ring to check for
# heartbeat and error messages. Ring names are listed in file
# earthworm.h. Example -> RingName HYPO_RING
#
RingName  HYPO_RING

# If CheckAllRings is set to 1 then ALL rings startstop currently
# knows about will be checked for status messages. The above
# single RingName, however, still needs to be a valid ring name.
# If you use CheckAllRings, you don't want to use any
# copystatus modules. Note statmgr may not be able to keep up
# on a system with a very busy ring, and you may need to
# set CheckAllRings to 0 and go back to the old way of using copystatus
CheckAllRings  0

# "GetStatusFrom" lists the installations & modules whose heartbeats
# and error messages statmgr should grab from transport ring:
#
#              Installation  Module        Message Types
GetStatusFrom  INST_MENLO    MOD_WILDCARD  # heartbeats & errors

# "LogFile" sets the switch for writing a log file to disk.
# Set to 1 to write a file to disk.
# Set to 0 for no log file.
# Set to 2 for module log file but no logging to stderr/stdout
#
LogFile  1

# "heartBeatPageit" is the time in seconds between heartbeats
# sent to the pageit system. The pageit system will report an error
# if heartbeats are not received from the status manager at regular
# intervals.
#
heartbeatPageit  60

# "pagegroup" is the pager group name.
# The pageit program maps this name to a list of pager recipients.
# This line is required. Individual modules can override this group
# by including the "pagegroup" command in their descriptor file.
#
pagegroup  larva_test

# Between 1 and 10 names of computers to use as a mail server.
# They will be tried in the order listed
# This system must be alive for mail to be sent out.
# This parameter is used by Windows NT only.
#
# Syntax
#   MailServer
#   MailServer
#   ...
#   MailServer
#
MailServer  andreas

# Any number (or none) of email recipients may be specified below.
# These lines are optional.
#
# Syntax
#   mail emailAddress1
#   mail emailAddress2
#   ...
#   mail emailAddressN
#

# Mail program to use, e.g /usr/ucb/Mail (not required)
# If given, it must be a full pathname to a mail program
MailProgram  /usr/ucb/Mail

#
# Subject line for the email messages. (not required)
#
Subject  "This is an earthworm status message"

#
# Message Prefix - useful for paging systems, etc.
# this parameter is optional
#
MsgPrefix  "(("

#
# Message Suffix - useful for paging systems, etc.
# this parameter is optional
#
MsgSuffix  "))"

# Now list the descriptor files which control error reporting
# for earthworm modules. One descriptor file is needed
# for each earthworm module. If a module is not listed here,
# no errors will be reported for the module. The file name of a
# module may be commented out, if it is temporarily not to be used.
# To comment out a line, insert # at the beginning of the line.
#
Descriptor  statmgr.desc
# Descriptor  adsend_a.desc      # Data source (adsend) on lardass
# Descriptor  adsend_b.desc      # Data source (adsend) on honker
# Descriptor  picker_a.desc      # Picker programs on redhot
# Descriptor  picker_b.desc      # Picker programs on redhot
# Descriptor  coaxtoring.desc
# Descriptor  diskmgr.desc
# Descriptor  binder.desc
# Descriptor  eqproc.desc
# Descriptor  startstop.desc
# Descriptor  pagerfeeder.desc
# Descriptor  pick_client.desc
# Descriptor  pick_server.desc
Below are the commands recognized by statmgr, grouped by the function they influence. Most of the commands are required.
Earthworm system setup:
    GetStatusFrom     required
    MyModuleId        required
    RingName          required

Monitor system:
    heartbeatPageit   required
    Descriptor        required
    mail
    pagegroup         required

Output Control:
    LogFile           required
In the following section, all configuration file commands are listed in alphabetical order. Listed along with each command are its arguments, the name of the subroutine that processes the command, and the function within the module that the command influences. A detailed description of the command is also given. Default values and example commands are listed after each command description.
command arg1 processed by function
Descriptor descfile statmgr_config Monitor system
Registers patients with the statmgr. descfile is the name of a file (up to 29 characters long) that describes a module that statmgr will monitor. One "Descriptor" command must give the name of statmgr's own descriptor file (i.e., the statmgr is a patient of itself). Up to MAXDESC (currently defined as 15 in statmgr.h) "Descriptor" commands may be issued. All descriptor files should live in the directory specified by the EW_PARAMS environment variable. Each descriptor file contains the patient module's name and ID, its heartbeat interval, and all its possible error codes and what they mean. It also contains information on how and how often the statmgr should notify system operators when errors do occur (see section 3 for more details on the descriptor files).
Default: none
Examples: Descriptor statmgr.desc
          Descriptor "statmgr.desc"
GetStatusFrom inst mod_id statmgr_config Earthworm setup
Controls the heartbeat and error messages input to statmgr. statmgr will only process TYPE_HEARTBEAT and TYPE_ERROR messages that come from module mod_id at installation inst. inst and mod_id are character strings (valid strings are listed in earthworm.h/earthworm.d) which are related to single-byte numbers that uniquely identify each installation and module. Up to 2 "GetStatusFrom" commands may be issued; wildcards (INST_WILDCARD and MOD_WILDCARD) will force statmgr to process all heartbeat and error messages, regardless of their place of origin.
Default: none
Calnet: GetStatusFrom INST_WILDCARD MOD_WILDCARD
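The filtering rule described for GetStatusFrom can be pictured with the following sketch. It is illustrative only: statmgr itself compares the single-byte numbers behind these names, whereas this example compares the strings directly, and the function name wanted and the names INST_UTAH and MOD_PICKER_A are hypothetical.

```python
STATUS_TYPES = {"TYPE_HEARTBEAT", "TYPE_ERROR"}


def wanted(msg_type, msg_inst, msg_mod, sources):
    """True if a message should be processed under the GetStatusFrom entries.

    `sources` is a list of (inst, mod_id) pairs, one per GetStatusFrom command.
    """
    if msg_type not in STATUS_TYPES:
        return False  # only heartbeats and errors are processed
    for inst, mod in sources:
        inst_ok = inst == "INST_WILDCARD" or inst == msg_inst
        mod_ok = mod == "MOD_WILDCARD" or mod == msg_mod
        if inst_ok and mod_ok:
            return True
    return False


# One entry, as in the sample file: GetStatusFrom INST_MENLO MOD_WILDCARD
sources = [("INST_MENLO", "MOD_WILDCARD")]
print(wanted("TYPE_ERROR", "INST_MENLO", "MOD_PICKER_A", sources))
print(wanted("TYPE_ERROR", "INST_UTAH", "MOD_PICKER_A", sources))
print(wanted("TYPE_PAGE", "INST_MENLO", "MOD_PICKER_A", sources))
```

With the single entry shown, only the first message is accepted: the second comes from another installation and the third is not a heartbeat or error message.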
heartbeatPageit nsec statmgr_config Monitor system
Defines the number of seconds nsec between heartbeat messages issued by statmgr to the Pageit computer. This heartbeat serves as the heartbeat for the entire Earthworm system being monitored by statmgr. A statmgr heartbeat is actually a TYPE_PAGE message that contains a character string (example: "alive: sysname#"). statmgr places this TYPE_PAGE message into shared memory where the pagerfeeder module can find it and send it to the Pageit system via the serial port. If the Pageit computer doesn't receive a heartbeat within a specified time interval, it will issue an "obituary" page for the Earthworm system.
Default: none
Calnet: heartbeatPageit 60
LogFile switch statmgr_config output
Sets the on-off switch for writing a log file to disk. If switch is 0, no log file will be written. If switch is 1, statmgr will write daily log file(s) called statmgrxx.log_yymmdd, where xx is statmgr's module id (set with the "MyModuleId" command) and yymmdd is the current UTC date (ex: 960123) on the system clock. The file(s) will be written in the directory named by the EW_LOG environment variable.
Default: none
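As a small illustration of the naming pattern statmgrxx.log_yymmdd, the sketch below builds a file name from a module id and a UTC date. The function name log_file_name is hypothetical, and how statmgr actually renders the module id xx (width, padding) is not stated here, so the plain decimal form used below is an assumption.

```python
from datetime import datetime, timezone


def log_file_name(module_id, when=None):
    """Build a daily log file name of the form statmgrxx.log_yymmdd.

    `when` must be a timezone-aware datetime; it defaults to the current time.
    Assumption: the module id is written as a plain decimal number.
    """
    when = when or datetime.now(timezone.utc)
    utc = when.astimezone(timezone.utc)  # the date part follows UTC
    return f"statmgr{module_id}.log_{utc:%y%m%d}"


# Hypothetical module id 11 on 23 January 1996 (UTC).
stamp = datetime(1996, 1, 23, 12, 0, tzinfo=timezone.utc)
print(log_file_name(11, stamp))
```

Because the date follows UTC rather than local time, a new file starts at 00:00 UTC regardless of the time zone of the machine.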
mail recipient statmgr_config Monitor system
Registers one recipient email address with the statmgr. As configured by descriptor files, statmgr will send every recipient an email message about patient-module errors and state of health (dead/alive) changes. Up to MAXRECIP (currently defined as 10 in statmgr.h) "mail" commands may be issued, but none are required. Each recipient address can be up to 59 characters long.
Default: none
Example: mail jdoe@yourmachine.edu
MyModuleId mod_id statmgr_config Earthworm setup
Sets the module id for labeling all outgoing messages. mod_id is a character string (valid strings are listed in earthworm.d) that relates (in earthworm.d) to a unique single-byte number.
Default: none
Calnet: MyModuleId MOD_STATMGR
pagegroup group statmgr_config Monitor system
Registers a pager group (string up to 79 characters long) with the statmgr. statmgr will address all of its TYPE_PAGE messages to group unless the module's descriptor file includes its own pagegroup command. When the paging system computer receives the message, it maps group to a list of pager recipients and sends a page to each one. Only one "pagegroup" command is allowed and it is required.
Default: none
Example: pagegroup ew_operators
RingName ring statmgr_config Earthworm setup
Tells statmgr which shared memory region to use for input/output. ring is a character string (valid strings are listed in earthworm.d) that relates (in earthworm.d) to a unique number for the key to the shared memory region.
Default: none
Calnet: RingName HYPO_RING
All errors received by the statmgr are written to its daily log file. Each descriptor file specifies when error messages are to be reported via email and pager. The default pager group name and the list of email recipients are given in statmgr's configuration file. A different pagegroup can be listed in each module's descriptor file to override the default.
Here are the lines that make up a descriptor file:
instId inst
inst is the installation at which the patient-module is running. inst is a character string (valid strings are listed in earthworm.h) that relates (in earthworm.h) to a unique single-byte number. This line is required; inst and modId allow statmgr to match an error message with its proper descriptor file instructions.
modId modid
modid is the module id of the patient module. modid is a character string (valid strings are listed in earthworm.d) that relates (in earthworm.d) to a unique single-byte number. modid must match that used in the patient module's own configuration file. This line is required; inst and modId allow statmgr to match an error message with its proper descriptor file instructions.
modName name
Gives the name of the patient module. name is a text string (up to 39 characters) which statmgr includes in each logged and reported error message from this patient. This line is required.
system sysname
This is an optional parameter. sysname is a string (up to 29 characters) giving the name of the computer on which the patient module is running. statmgr includes this text string in each logged and reported error message from this patient. If the "system" line is omitted, statmgr assumes the module is running on the local computer and uses the environment variable SYS_NAME in its place.
pagegroup group
This is an optional parameter. group is a string (up to 79 characters) to which statmgr will address all TYPE_PAGE messages regarding this specific module. If the "pagegroup" line is omitted here, statmgr uses the pagegroup listed in its own configuration file.
tsec: tsec page: npage mail: nmail
If the statmgr does not receive a heartbeat message every tsec seconds from this patient module, an error will be reported (LOCAL_time modName/sysname module dead). If statmgr receives a heartbeat from a module that it has reported "dead," it will send out an "alive" message (LOCAL_time modName/sysname module alive). tsec is generally set to 2*(heartbeat-interval) of the patient module. npage is the maximum number of pager messages that will be reported and nmail is the maximum number of email messages that will be reported. Each "dead" and "alive" message counts as a separate message. If the page or mail limit is exceeded, no further errors will be reported until the status manager is restarted.
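The dead/alive bookkeeping described above can be sketched as a small state machine. This is an illustration under stated assumptions, not statmgr's implementation: the class name HeartbeatMonitor, its methods, the use of plain numbers for time, and the wording of the returned strings are all hypothetical.

```python
class HeartbeatMonitor:
    """Tracks heartbeats from one patient module (illustrative only)."""

    def __init__(self, name, tsec, npage, nmail, now):
        self.name = name
        self.tsec = tsec          # maximum silence before "dead" is declared
        self.pages_left = npage   # remaining pager messages
        self.mails_left = nmail   # remaining email messages
        self.last_beat = now
        self.dead = False

    def _report(self, state):
        """Return the notices sent, spending one unit of each remaining budget."""
        sent = []
        if self.pages_left > 0:
            self.pages_left -= 1
            sent.append(f"page: {self.name} module {state}")
        if self.mails_left > 0:
            self.mails_left -= 1
            sent.append(f"mail: {self.name} module {state}")
        return sent

    def heartbeat(self, now):
        """Call when a heartbeat arrives from the patient module."""
        self.last_beat = now
        if self.dead:
            self.dead = False
            return self._report("alive")
        return []

    def check(self, now):
        """Call periodically; declares the module dead after tsec of silence."""
        if not self.dead and now - self.last_beat > self.tsec:
            self.dead = True
            return self._report("dead")
        return []


mon = HeartbeatMonitor("picker_a", tsec=60, npage=1, nmail=2, now=0)
print(mon.check(61))      # silent for more than 60 s: page and mail "dead"
print(mon.heartbeat(70))  # page budget spent, so only mail "alive"
print(mon.check(200))     # dead again, but both budgets are exhausted
```

In this sketch "dead" and "alive" notices draw on the same page and mail budgets, matching the rule that each counts as a separate message, and nothing resets the budgets short of creating a new monitor, which stands in for restarting the status manager.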
err: code nerr: nerr tsec: tsec page: npage mail: nmail
text: description
code is the error code generated by the patient module. Error codes can be any unsigned integer, not necessarily sequential. nerr and tsec specify the maximum allowable error rate. If the error rate exceeds nerr errors per tsec seconds, an email or pager message may be reported. To report all errors, set nerr to 1 and tsec to 0.
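One way to picture the nerr/tsec rate test is a sliding window over the most recent error times. The sketch below is an interpretation, not statmgr's code: it assumes a report is due when nerr errors fall within tsec seconds, an assumption chosen because it makes the documented setting nerr 1, tsec 0 report every error. The class name ErrorRate and the reset after each report are hypothetical.

```python
from collections import deque


class ErrorRate:
    """Decides when one error code has occurred often enough to report."""

    def __init__(self, nerr, tsec):
        self.nerr = nerr
        self.tsec = tsec
        self.times = deque(maxlen=nerr)  # keeps only the last nerr error times

    def record(self, now):
        """Record an error at time `now`; return True if it should be reported."""
        self.times.append(now)
        window_full = len(self.times) == self.nerr
        if window_full and now - self.times[0] <= self.tsec:
            self.times.clear()  # assumption: start counting afresh after a report
            return True
        return False


every = ErrorRate(nerr=1, tsec=0)
print([every.record(t) for t in (5, 6, 7)])

bursty = ErrorRate(nerr=3, tsec=10)
print([bursty.record(t) for t in (0, 20, 25, 28)])
```

In the second example the first three errors span 25 seconds and are not reported, while the errors at 20, 25 and 28 fall within 10 seconds, so the fourth call returns True.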
npage is the maximum number of pager messages that will be reported and nmail is the maximum number of email messages that will be reported. If the page or mail limit is exceeded, no further errors will be reported until the statmgr is restarted.
description is the default text string (up to 79 characters) that statmgr will report for this error code. Enclose the string in double-quotes if it contains embedded blanks. Each module may include a (hopefully more informative) text string in its error message; if so, that string overrides the default, description.
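To make the layout of the "err:" and "text:" lines concrete, the following sketch parses one such pair. It is not the parser statmgr uses; the function names parse_err_line and parse_text_line, the sample values, and the sample description are hypothetical, and the sketch assumes keys and values are separated by whitespace exactly as in the templates above.

```python
import shlex


def parse_err_line(line):
    """Parse 'err: code nerr: n tsec: t page: p mail: m' into a dict of ints."""
    tokens = line.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected 'key: value' pairs: " + line)
    keys, values = tokens[0::2], tokens[1::2]
    return {key.rstrip(":"): int(value) for key, value in zip(keys, values)}


def parse_text_line(line):
    """Parse 'text: description'; double quotes protect embedded blanks."""
    parts = shlex.split(line)
    if len(parts) != 2 or parts[0] != "text:":
        raise ValueError("expected 'text: description': " + line)
    return parts[1]


# Hypothetical entry: report every occurrence of error code 3.
entry = parse_err_line("err: 3  nerr: 1  tsec: 0  page: 5  mail: 20")
entry["text"] = parse_text_line('text: "Bad packet from digitizer"')
print(entry)
```

A description with embedded blanks that is not quoted splits into several tokens and is rejected by parse_text_line, which is the situation the double-quote rule above is meant to prevent.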