Earthworm Modules:
Startstop Overview
(last revised 9 Apr, 2010)
This program starts and stops an Earthworm system. It reads its
configuration file, which specifies the message transport rings to be
created, which modules are to be run, and the names of the parameter
files each module is to read on startup. The program is system
dependent; versions are available for the Linux, Sun Solaris and
Windows NT operating systems.
For startstop to work, it must know about the Earthworm
environment. This is typically done by setting the environment
variables within the environment/ew_* file specific to your platform,
and then sourcing that file, or executing the cmd if you're on Windows.
Startstop typically reads its configuration file from the EW_PARAMS
directory (as defined in your environment) and creates the specified
rings. It then starts each module as a child process, passing the
module's configuration file name and any other parameters as its
command line parameters (argc, argv). Each module (child process) is
started with the priority indicated in startstop*.d. Note that each
module and each ring specified must be defined within earthworm.d or
earthworm_global.d, which should be in the EW_PARAMS directory. The
system continues to run until "quit<cr>" is typed in startstop's command
window. Startstop then sets a terminate flag in each transport ring.
Each well-behaved module (child process) should periodically check for
the terminate flag, and exit gracefully if it is set.
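The terminate-flag convention described above can be sketched from the module's side. This is a minimal illustration, not Earthworm code: a `threading.Event` stands in for the flag that startstop sets in the transport ring, and `run_module`, `do_work` and `poll_interval` are hypothetical names chosen for the example.

```python
import threading


def run_module(terminate_flag, do_work, poll_interval=0.01):
    """Call do_work() repeatedly until terminate_flag is set.

    terminate_flag is a threading.Event used here as a stand-in for the
    terminate flag in a transport ring (an assumption for illustration).
    Returns the number of work iterations completed before exiting.
    """
    iterations = 0
    while not terminate_flag.is_set():
        do_work()
        iterations += 1
        # Sleep between polls, but wake immediately if the flag is set.
        terminate_flag.wait(poll_interval)
    # A real module would detach from its rings and close files here.
    return iterations


if __name__ == "__main__":
    flag = threading.Event()
    # Simulate the operator typing "quit" a moment after startup.
    threading.Timer(0.1, flag.set).start()
    count = run_module(flag, lambda: None)
    print(f"module exited cleanly after {count} iterations")
```

The point of the sketch is only the shape of the loop: the flag is checked on every pass, so the module stops soon after the flag is set rather than having to be killed.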
Note that two copies of startstop pointing at the same startstop*.d
file are not allowed to run simultaneously. The second one started will
fail and quit. (If you really want to do this for some reason, you'd
need to make sure that the second copy uses entirely different rings,
different ports for the modules, and a different startstop*.d file,
specified as a parameter when starting startstop.)
If the user presses the "Enter" key while the startstop command
window is selected, or enters the command "status",
startstop will print a status table showing various
statistics for each module, including whether it is dead or alive. If
a module is dead because it could not be started (for example, the
executable's name was mistyped so the executable could not be found),
it will be reported as NoExec.
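The NoExec case can be illustrated with a short sketch of launching a child process and recording a status when the executable cannot be found. It is not startstop's actual code: `start_module` is a hypothetical helper, and only the status strings "Alive" and "NoExec" come from the text above.

```python
import subprocess
import sys


def start_module(command):
    """Try to launch command (a list of argv strings) as a child process.

    Returns (process, status). status is "NoExec" when the executable
    could not be started (for example, a mistyped name), else "Alive".
    """
    try:
        proc = subprocess.Popen(command)
    except (FileNotFoundError, PermissionError):
        # The executable is missing or cannot be run.
        return None, "NoExec"
    return proc, "Alive"


if __name__ == "__main__":
    # A deliberately mistyped module name, paired with its config file.
    _, status = start_module(["no_such_module_xyz", "no_such_module.d"])
    print(status)
    # A command that does exist: the running Python interpreter.
    proc, status = start_module([sys.executable, "-c", "pass"])
    print(status)
    proc.wait()
```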
Startstop will also react to 'restart' messages from statmgr. This
is part of a scheme which works as follows: A module may have the token
"restartMe" in its .desc file (the file given to statmgr, which tells
it how to process exception conditions from that module). If the
module's heartbeat ceases, statmgr will send a restart request to
startstop. Startstop will then kill the offending module, and restart it
with the same arguments as it did at startup time. Some features are
system specific; they are listed in the platform sections below.
Interactive commands:
Startstop will respond to the following commands from the status
console window. There are similar command line versions of each command
as well.
- status
  - Startstop will display information about the status of
    Earthworm, including a listing of the rings and of the modules.
  - Within startstop, status can also be invoked by pressing the
    "Enter" key.
- restart <pid> or restart <module name>
  - Startstop will send the module a message to exit, and may try
    to kill it if it doesn't quit by itself within a certain period of
    time. Startstop will then attempt to start the process back up.
  - Note that the <module name> must be unique for this to
    work as an argument. The command line version can only accept the
    pid (process ID) as an argument.
- stopmodule <pid> or stopmodule <module name>
  - Startstop will send the module a message to exit, and may try
    to kill it if it doesn't quit by itself within a certain period of
    time. Startstop will not try to start the process back up, and
    statmgr shouldn't try to restart it either.
  - Note that the <module name> must be unique for this to
    work as an argument. The command line version can only accept the
    pid (process ID) as an argument.
  - Within startstop, this can be abbreviated to just "stop <pid>" or
    "stop <module name>".
  - The command-line "stopmodule" should mark the module as
    intentionally stopped, showing up as "Stop" in the status listing.
    This differs from the command line tool "pidpau", which simply kills
    a module without marking it as "Stop". If statmgr is set to monitor
    and restart that module, a process killed by "pidpau" will be
    started back up again; a module stopped by "stopmodule" should not
    be.
  - The module is stopped only for as long as this startstop
    session is running! If you want to permanently stop a module,
    you'll also want to remove it from the startstop*.d and statmgr.d
    files so it doesn't get started up next time around.
- reconfigure
  - Startstop will re-read startstop_nt.d, startstop_unix.d or
    startstop_sol.d, allocate any new rings, and start up any new
    modules it finds in the new .d file. In the process it re-reads
    earthworm.d and earthworm_global.d, in case new module IDs or new
    ring IDs have been added there.
  - As the final reconfigure step, statmgr is restarted as well
    so that it re-reads its config file. Any modules that were added to
    startstop*.d should be added to the statmgr.d config file as well.
  - The command line version does the same thing.
  - Within startstop, this can be abbreviated to just "recon".
- quit
  - Startstop will send all child processes (modules) a request to
    quit, and will kill them if they don't quit within 30 seconds or so.
    It will then shut itself down.
  - The command line equivalent to "quit" is called "pau".
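The "ask the module to exit, then kill it if it does not quit in time" behaviour shared by restart, stopmodule and quit can be sketched as follows. This is an illustration under stated assumptions, not startstop's implementation: startstop sends its exit request as an Earthworm message, whereas the sketch substitutes `Popen.terminate()` as the polite request, and `stop_module` and `grace_seconds` are hypothetical names. The 30-second default echoes the "30 seconds or so" mentioned for quit.

```python
import subprocess
import sys


def stop_module(proc, grace_seconds=30.0):
    """Ask a child process to exit, and kill it if it does not comply.

    proc is a subprocess.Popen object. terminate() stands in for the
    exit message startstop would send (an assumption for illustration).
    Returns "exited" if the child stopped within the grace period,
    or "killed" if it had to be forcibly terminated.
    """
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
        return "exited"
    except subprocess.TimeoutExpired:
        # The polite request was ignored; fall back to a hard kill.
        proc.kill()
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # A stand-in "module" that would otherwise run for a minute.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print(stop_module(child, grace_seconds=5.0))
```

A restart would follow the same stop step with a fresh launch using the original arguments; a stopmodule would instead record the module as intentionally stopped so that nothing starts it again.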
Solaris, Linux, Mac OS X versions:
- Solaris startstop reads a configuration file named
'startstop_sol.d'
- Mac OS X and Linux startstop read a configuration file named
'startstop_unix.d'
- If a child process does not exit within a user-specified time
after the user types "quit<cr>" (or "stopmodule" or "restart"),
startstop terminates the child process. Startstop will resort to a more
draconian but reliable approach to quitting a module if the standard
approach fails, but only if a command to do so is included in the
configuration file.
- The amount of CPU time used by each child process is listed in
the process status table.
- As of Version 3.0, Startstop can run in background. This
modification was made by Pete Lombard at the University of Washington. Instructions
- To run Earthworm as a user other than root, you must set the file
characteristics. Instructions
- For Mac OS X you must adjust the shared memory settings by editing the
/etc/sysctl.conf file and rebooting. We recommend values like this:
kern.sysv.shmmax=16777216
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=16
kern.sysv.shmall=4096
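As a quick sanity check on the recommended values, the arithmetic below shows how they relate. It assumes that kern.sysv.shmall is counted in pages of 4096 bytes while kern.sysv.shmmax is in bytes; that unit convention is an assumption not stated in this document, and `PAGE_SIZE_BYTES` and `shmall_bytes` are hypothetical names used only for the example.

```python
# Values copied from the recommendation above.
RECOMMENDED = {
    "kern.sysv.shmmax": 16777216,
    "kern.sysv.shmmin": 1,
    "kern.sysv.shmmni": 32,
    "kern.sysv.shmseg": 16,
    "kern.sysv.shmall": 4096,
}

# Assumption: shmall is expressed in pages of this many bytes.
PAGE_SIZE_BYTES = 4096


def shmall_bytes(settings, page_size=PAGE_SIZE_BYTES):
    """Total shared memory allowed system-wide, converted to bytes."""
    return settings["kern.sysv.shmall"] * page_size


if __name__ == "__main__":
    max_segment_mib = RECOMMENDED["kern.sysv.shmmax"] / 2**20
    total_mib = shmall_bytes(RECOMMENDED) / 2**20
    print(f"largest single segment: {max_segment_mib:.0f} MiB")
    print(f"total shared memory:    {total_mib:.0f} MiB")
```

Under that assumption the two limits agree at 16 MiB, so anyone raising shmmax for larger rings would want to raise shmall to match.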
Windows, Windows Service version:
- Windows startstop and Windows startstop service read a
configuration file named 'startstop_nt.d'
- If, when startstop starts up on Windows, the binary executables
for certain modules are missing or misnamed, startstop will start
up anyway. These processes will be shown with a nonexistent negative
process ID, and "NoExec" as their status. If such a process is restarted
once the problem that caused the error has been fixed, the process ID
will return to a normal ID, and the status will change to "Alive".
- Startstop can be set to start automatically when Windows boots up,
but it is probably better to set startstop up as a Windows service. Note
that if you set startstop up as a Windows service, you'll need to use
other command line utilities like 'status' and 'restart' to monitor and
control Earthworm modules, since there's no interface to the startstop
service. You can run StartstopConsole to connect to the session running
Earthworm if you're not logged in as administrator. You'll be able to
start and stop Earthworm with the Windows Services Control Panel.
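The per-platform configuration file names given in this document can be collected into one small lookup. The sketch below is illustrative only: `startstop_config_path` and the platform keys are hypothetical, and the only facts taken from the text are the three file names and the EW_PARAMS directory that startstop typically reads from.

```python
import os

# File names as given in this document; the keys are hypothetical labels.
CONFIG_BY_PLATFORM = {
    "solaris": "startstop_sol.d",
    "linux": "startstop_unix.d",
    "macosx": "startstop_unix.d",
    "windows": "startstop_nt.d",
}


def startstop_config_path(platform_name, environ=None):
    """Return the expected startstop config path for a platform.

    environ defaults to os.environ; EW_PARAMS must be set in it, as it
    would be after sourcing the platform's Earthworm environment file.
    """
    environ = os.environ if environ is None else environ
    params_dir = environ.get("EW_PARAMS")
    if not params_dir:
        raise RuntimeError("EW_PARAMS is not set in the environment")
    if platform_name not in CONFIG_BY_PLATFORM:
        raise ValueError(f"unknown platform: {platform_name!r}")
    return os.path.join(params_dir, CONFIG_BY_PLATFORM[platform_name])


if __name__ == "__main__":
    # Example directory only; a real EW_PARAMS is site specific.
    env = {"EW_PARAMS": "/opt/earthworm/params"}
    for name in CONFIG_BY_PLATFORM:
        print(name, startstop_config_path(name, env))
```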