Earthworm Modules:
Wave_serverV Overview

(last revised 9 Dec 2008)

Introduction

This is the latest version of Wave_server as of Earthworm version v7.0. It is based on the original Wave_server. It supports the SCNL protocol and processes TYPE_TRACEBUF2 messages; it is not backward compatible with SCN.

Wave_serverV provides a network-based service of trace data. It acquires Earthworm trace data messages for specified channels, and maintains a disk-based circular buffer for each channel. It then offers a network service capable of supplying specified portions of trace data from specified channels. The basic features include:

* Up to 10 concurrent clients can be serviced. This limit may be changed to suit the available hardware resources.

* The size of the disk-based buffer is user-specified, up to 2 gigabytes per trace (approximately 100 days, assuming 100 SPS, 16-bit data).

* The module serves either 'sanitized' trace data in ASCII format, suitable for 'casual' purposes such as visual displays, or raw binary data containing all information (and flaws) as received from the telemetry source.

* Several crash-recovery strategies are used to permit rapid restarts with minimal loss of data after catastrophic crashes, such as system errors or power loss.

* Extensible client-server protocol to permit additional queries to be easily implemented.

* A set of client routines is available to simplify implementing client applications.

* Handles interruptions of trace data without wasting disk space. Data from highly intermittent event sources will not be overwritten by long periods of quiescence.
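As a rough check on the 2-gigabyte figure in the list above, the following sketch estimates how long a tank lasts from the sample rate and sample width alone. The function name is illustrative, not part of Earthworm, and the estimate ignores per-message header overhead and record padding, so a real tank will hold somewhat less than this.

```python
SECONDS_PER_DAY = 86400.0


def tank_duration_days(tank_bytes: int, sample_rate_hz: float,
                       bytes_per_sample: int) -> float:
    """Days of continuous samples that fit in a tank of tank_bytes.

    Counts sample bytes only; trace message headers are ignored.
    """
    bytes_per_second = sample_rate_hz * bytes_per_sample
    return tank_bytes / bytes_per_second / SECONDS_PER_DAY


# 2 gigabytes (taken here as 2**31 bytes), 100 SPS, 16-bit (2-byte) samples
days = tank_duration_days(2 ** 31, 100.0, 2)
print(f"about {days:.0f} days of raw samples")
```

This upper bound comes out somewhat above 100 days; the overhead left out of the sketch accounts for at least part of the difference from the approximate figure quoted above.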

Wave Server Startup

Following is a description of the process wave_serverV goes through when it starts up. This is intended to give users an understanding of how configuration changes are implemented by wave_serverV.

When wave_serverV first starts up, it reads the configuration file, and then starts a three-stage process of opening and checking tank and index files. In the first stage, it reads one of the `tank structure' files (a *.str file), if any exist. If no tank structure files are found, wave_serverV proceeds to the second stage, below.

Wave_serverV assumes that the tank structure file has the most up-to-date information about existing tank files (*.tnk) and their indexes (*.inx), especially tank file size, record size, and tank insertion position. This last crucial bit of information tells the `starting point' in the tank. Data just behind this point is the most recent information in the tank. Data after the insertion point is the oldest and will be overwritten as new data is added to the tank in circular fashion.
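The role of the insertion point can be illustrated with a small in-memory model. This is only a sketch of the circular-buffer idea described above, assuming fixed-size records; the class and attribute names are hypothetical and do not reflect the actual tank file layout.

```python
class CircularTank:
    """Toy model of a tank: fixed-size records written in circular fashion."""

    def __init__(self, tank_size: int, record_size: int) -> None:
        if tank_size % record_size != 0:
            raise ValueError("tank_size must be a multiple of record_size")
        self.buf = bytearray(tank_size)
        self.record_size = record_size
        self.insert_at = 0  # insertion point: next write goes here

    def write(self, record: bytes) -> None:
        """Overwrite the oldest record and advance the insertion point."""
        if len(record) != self.record_size:
            raise ValueError("record has the wrong size")
        end = self.insert_at + self.record_size
        self.buf[self.insert_at:end] = record
        # Wrap around to the start of the tank when the end is reached.
        self.insert_at = end % len(self.buf)


tank = CircularTank(tank_size=6, record_size=2)
for rec in (b"aa", b"bb", b"cc", b"dd"):
    tank.write(rec)
# The newest record ("dd") now sits just behind the insertion point,
# and the oldest surviving record ("bb") sits at the insertion point.
```

Losing the insertion point is what makes existing data unusable: without it there is no way to tell which record in the buffer is the newest.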

The server runs through the list of tanks from the structure file, verifying that each SCNL is listed in the config file. The one piece of information that wave_serverV takes from the config file in this stage is the index file size. Wave_serverV tries to open each tank file and its index. If index files are missing or out of date, they are recreated by reading through the tank file. Depending on the amount of reconstruction needed for an index, this process may take several minutes for each tank. Note that there is no provision for checking the insertion point, read from the structure file, against the tank file.

If the tank and index files are read successfully, then that tank is marked as OK. If there are errors opening these files for a tank, then one of two things may happen. If ReCreateBadTanks is set, then new (empty) tank and index files are created using the information from the tank structure file. If ReCreateBadTanks is not set, then that tank is marked as BAD for later disposition. Once this loop has been completed for all the tanks listed in the structure file, stage one is complete.

For stage two of the startup sequence, wave_serverV scans the list of tanks in the config file. Any tanks that were not already found in the structure file will be created using the parameters listed in the config file. Any SCNLs added to the config file since wave_serverV was last run will be created now. If any errors are encountered creating new files here, that tank will be marked as BAD. Since the config file does not list the insertion point, any data that is in existing tank files for this SCNL will be effectively erased.

For the final stage of startup, wave_serverV goes through its internal list of tanks. Any that were marked as BAD previously now come to light. If PleaseContinue is set, wave_serverV will remove that tank from its internal list. Otherwise, wave_serverV will exit now.
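The way ReCreateBadTanks and PleaseContinue interact during startup can be summarized in a few lines. The sketch below models only the decisions described above; the function names and the status strings ("OK", "RECREATED", "BAD") are illustrative and are not taken from the wave_serverV source.

```python
from typing import Dict


def stage_one_status(files_ok: bool, recreate_bad_tanks: bool) -> str:
    """Status of one tank after trying to open its tank and index files."""
    if files_ok:
        return "OK"
    # Files could not be opened: either start over with empty files,
    # or mark the tank BAD for later disposition.
    return "RECREATED" if recreate_bad_tanks else "BAD"


def final_stage(statuses: Dict[str, str], please_continue: bool) -> Dict[str, str]:
    """Drop BAD tanks if PleaseContinue is set; otherwise give up."""
    bad = [name for name, status in statuses.items() if status == "BAD"]
    if bad and not please_continue:
        raise RuntimeError("bad tanks, exiting: " + ", ".join(bad))
    return {name: s for name, s in statuses.items() if s != "BAD"}


statuses = {
    "STA1.EHZ.NC.--": stage_one_status(True, recreate_bad_tanks=False),
    "STA2.EHZ.NC.--": stage_one_status(False, recreate_bad_tanks=False),
}
surviving = final_stage(statuses, please_continue=True)
```

With PleaseContinue set and ReCreateBadTanks not set, a damaged tank is dropped from the running server but its files are left untouched on disk.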

Once this third pass is done, wave_serverV writes its internal list to the structure file and starts adding traces to their respective tanks. Each new packet of trace data causes the current index entry for that tank to be extended in time and file position. If there is a gap between the end of a tank and the new trace data (determined by GapThresh in config file), a new index entry is started. Wave_serverV periodically writes the index and structure list to disk, to save the latest information in case of wave server crashes.
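The index bookkeeping can be sketched as follows. This is a simplified model, assuming that GapThresh is expressed in sample intervals (an assumption; check the configuration documentation for its actual units) and ignoring wrap-around of file positions. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexEntry:
    start_time: float  # time of the first sample covered by this entry
    end_time: float    # time of the last sample covered by this entry
    end_offset: int    # file position just past the last record


def add_packet(index: List[IndexEntry], pkt_start: float, pkt_end: float,
               pkt_end_offset: int, sample_rate: float,
               gap_thresh: float) -> None:
    """Extend the current index entry, or start a new one after a gap."""
    max_gap = gap_thresh / sample_rate  # largest allowed gap, in seconds
    if index and (pkt_start - index[-1].end_time) <= max_gap:
        index[-1].end_time = pkt_end
        index[-1].end_offset = pkt_end_offset
    else:
        index.append(IndexEntry(pkt_start, pkt_end, pkt_end_offset))


index: List[IndexEntry] = []
add_packet(index, 0.0, 0.99, 264, sample_rate=100.0, gap_thresh=1.5)
add_packet(index, 1.0, 1.99, 528, sample_rate=100.0, gap_thresh=1.5)    # contiguous
add_packet(index, 10.0, 10.99, 792, sample_rate=100.0, gap_thresh=1.5)  # after a gap
```

Contiguous packets cost no extra index entries; only gaps do, which is why the index file size matters most for intermittent channels.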

Shutting Down Wave_serverV

As of Earthworm version 5.1, wave_serverV has a signal handler that allows graceful shutdowns (without having to stop all of earthworm.)

WindowsNT: this shutdown mechanism can be invoked in only one way: you must give the Control-C interrupt to the console window where wave_serverV is running. DO NOT try using `restart <pid>', as this will terminate wave_serverV immediately without sending a signal to it. Once wave_serverV has terminated and its console window is gone, you can start a new instance of wave_serverV by using `restart <pid>' or by letting statmgr send a restart request. This last method requires that the `restartMe' command was set in wave_serverV's .desc file.

Solaris (Unix): wave_serverV can be terminated using the `kill <pid>' command. To stop and immediately restart wave_serverV, you can use `restart <pid>'.

With earlier versions of wave_serverV (Earthworm V5.0 and earlier) the only way to shut down wave_serverV is to shut down earthworm, using the pau command or entering "quit" at the startstop window.

How to Make Configuration Changes

After you have been running wave_serverV for a while, you will eventually find that you need to make some configuration changes. While you can always shut down the server, delete all the tank, index, and structure files, and start with a new configuration, this is usually not necessary or desirable. Depending on what changes you need to make, existing tank files can often be preserved. Below is a description of how to change each of the tank, index and structure file parameters. These procedures depend on having PleaseContinue set to one, and ReCreateBadTanks not set. If your configuration file does not currently have these values, change the file now to include them. When you restart wave_serverV as one of the steps below, the new values for PleaseContinue and ReCreateBadTanks will take effect immediately.

Be sure you understand how to restart wave_serverV. If you inadvertently shut down wave_serverV without letting it go through its normal shutdown sequence, you risk doing damage to the tank, index and structure files. In the following discussion, restart means a quick but graceful shutdown and startup of wave_serverV, using the method appropriate for your platform and wave_serverV version. When some action must be taken between wave_serverV shutdown and startup, that will be spelled out.

IMPORTANT: Before taking any action below, it is recommended that you shut down wave_serverV and make a full backup of all tanks and TankStructFiles, if you can afford the disk space and the downtime to do it.

Also worth noting: the tank structure files may take precedence over the .d configuration file. If you change your config file and then see problems, or your changes do not take effect, try the following: shut down wave_serverV; delete the TankStructFile and, if you are using it, the TankStructFile2; then start wave_serverV again. The structure files should be recreated based on the .d configuration file.

How do I:

Wave_serverV Tools

Three simple utilities are available to assist with wave_serverV problems. They basically read the tank, index, and structure files and write out their contents in human-readable form.

Protocol Notes

* The server is synchronous, in the sense that it receives a request, issues the corresponding reply, and then processes the next request. Since there may be several server threads, more than one client can be connected and requesting data at one time. The <request id> was implemented to assist asynchronous clients, such as clients which are so structured that the code issuing the requests is tightly linked to the code processing the requests, and thus has trouble remembering which reply goes with which request. It is generated by the client, and is simply echoed by the server. The request id is echoed as a fixed length 12 character string, null padded on the right as required.

* A client establishes a connection to a wave server by requesting a TCP connection on an agreed-upon address and port number. The client may issue as many requests as desired once the connection is made. The client may close the connection at any time to terminate the interaction.

* <s><c><n><l> is shorthand for site code, channel code, network id, and location code. The format is four space-separated ASCII strings.

* <flags> is currently used by the server to indicate special conditions. Currently three flags are used, but additional flags can be added as needed. Formally, it is:

<flags>:: F | F<letter>...<letter>

That is, it consists of the letter F followed by zero or more letters. A space terminates the <flags>. The bare letter F by itself means that the requested data was returned; there may be gaps in the data but it is up to the client to detect those. Currently "FR", "FL", and "FG" are implemented to indicate that the request totally missed the tank. "FL" means that the requested interval was before anything in the tank; "FR" means the requested interval was after anything in the tank. "FG" is used to indicate that the requested interval fell wholly within a gap in the tank.

* <datatype> is a two-character code à la CSS. Currently, only i2, i4, s2, and s4 are implemented. i means Intel byte order; s means Sparc byte order; 2 and 4 mean two- and four-byte signed integers.

* All times are given as ASCII representations of floating point seconds since 1970.

* Currently most of the following requests and replies are handled by the ws_clientII routines that are included in the libsrc part of the Earthworm source tree.
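The <flags> and <datatype> conventions above are simple enough to decode by hand. The sketch below is not the ws_clientII code; the helper names are hypothetical, and it relies only on the rules stated in the notes (Intel byte order is little-endian, Sparc byte order is big-endian).

```python
import struct
from typing import List, Set

_BYTE_ORDER = {"i": "<", "s": ">"}  # Intel: little-endian, Sparc: big-endian
_INT_CODE = {"2": "h", "4": "i"}    # 2-byte and 4-byte signed integers


def parse_flags(token: str) -> Set[str]:
    """Return the set of flag letters after the leading 'F' (empty for 'F')."""
    if not token.startswith("F") or not (token == "F" or token[1:].isalpha()):
        raise ValueError("not a <flags> token: %r" % token)
    return set(token[1:])


def struct_format(datatype: str) -> str:
    """Map a <datatype> code such as 'i2' or 's4' to a struct format."""
    if (len(datatype) != 2 or datatype[0] not in _BYTE_ORDER
            or datatype[1] not in _INT_CODE):
        raise ValueError("unsupported <datatype>: %r" % datatype)
    return _BYTE_ORDER[datatype[0]] + _INT_CODE[datatype[1]]


def decode_samples(datatype: str, raw: bytes) -> List[int]:
    """Decode a run of binary samples; trailing partial samples are dropped."""
    fmt = struct_format(datatype)
    width = struct.calcsize(fmt)
    count = len(raw) // width
    return list(struct.unpack(fmt[0] + str(count) + fmt[1], raw[:count * width]))
```

For example, parse_flags("FG") yields the single letter G, which a client would treat as "the requested interval fell within a gap".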

Requests and Responses

MENU: <request id>

This request is used by a client to learn what the server 'knows'. The reply contains the list of channels being served, and the time interval available for each channel. The available time interval is as of the time of the reply. The client is responsible for tracking the time delays between the MENU reply and any subsequent data requests. <request id> is an arbitrary ASCII string of 12 characters or less (see above). The reply is terminated by the ASCII "newline" character:

<request id> pin# <s><c><n><l> <starttime> <endtime> <datatype> . . . . . pin# <s><c><n><l> <starttime> <endtime> <datatype> \n
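A client can split a MENU reply into per-channel entries by walking the whitespace-separated tokens in groups of eight. The sketch below assumes that every field, including the location code, is a non-empty token; the function name and the sample reply are made up for illustration.

```python
from typing import Dict, List, Tuple


def parse_menu_reply(reply: str) -> Tuple[str, List[Dict[str, object]]]:
    """Split a MENU reply into (request id, list of channel entries)."""
    tokens = reply.split()
    if not tokens:
        raise ValueError("empty MENU reply")
    # The echoed request id may be null padded; strip the padding.
    request_id = tokens[0].rstrip("\x00")
    fields = tokens[1:]
    if len(fields) % 8 != 0:
        raise ValueError("MENU reply is not a whole number of entries")
    entries = []
    for i in range(0, len(fields), 8):
        pin, sta, chan, net, loc, start, end, dtype = fields[i:i + 8]
        entries.append({
            "pin": int(pin),
            "scnl": (sta, chan, net, loc),
            "starttime": float(start),  # seconds since 1970
            "endtime": float(end),
            "datatype": dtype,
        })
    return request_id, entries


# Illustrative reply only; station names and times are invented.
sample = "menu1 1 STA1 EHZ NC -- 1000.0 2000.0 i2 2 STA2 EHZ NC -- 1500.0 2500.0 s4 \n"
request_id, entries = parse_menu_reply(sample)
```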

MENUPIN: <request id> <pin#>

As above, but returns the information only for specified pin number:

<request id> <pin#> <s><c><n> <datatype> <\n>

MENUSCNL: <request id> <s><c><n><l>

As above, but returns the information for specified <s><c><n><l> name:

<request id> <pin#> <s><c><n> <datatype> <\n>

GETPIN: <request id> <pin#> <starttime> <endtime> <fill-value>

Returns the trace data for the specified <pin#> and requested time interval. Any data gaps within the interval are filled with <fill-value>. Only internal gaps will be filled. No fill will be provided for any requested data which is either before or after the range of the available period (as stated in the MENU reply). The reply data is represented in ASCII as blank-delimited signed integers.

<request id> <pin#> <s><c><n><l> F <datatype> <starttime> <sampling rate>

sample(1) sample(2)... sample(nsamples) <\n> {the samples are ASCII}

If the requested time is older than anything in the tank, the reply is:

<request id> <pin#> <s><c><n><l> FL <datatype> \n

For the case when the requested interval is younger than anything in the tank, the reply is

<request id> <pin#> <s><c><n><l> FR <datatype> <youngest time in tank> <sampling rate> \n
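The three GETPIN reply shapes above can be told apart by the <flags> token. The following sketch parses an ASCII reply under the assumption that all fields are whitespace-separated, non-empty tokens; the function name and the sample reply are hypothetical.

```python
from typing import Dict


def parse_ascii_reply(reply: str) -> Dict[str, object]:
    """Parse a GETPIN-style ASCII reply into a dictionary."""
    tokens = reply.split()
    if len(tokens) < 8:
        raise ValueError("reply header is incomplete")
    result: Dict[str, object] = {
        "request_id": tokens[0].rstrip("\x00"),  # may be null padded
        "pin": int(tokens[1]),
        "scnl": tuple(tokens[2:6]),
        "flags": tokens[6],
        "datatype": tokens[7],
        "samples": [],
    }
    if tokens[6] == "F":
        # Data was returned: start time, sampling rate, then the samples.
        result["starttime"] = float(tokens[8])
        result["sampling_rate"] = float(tokens[9])
        result["samples"] = [int(tok) for tok in tokens[10:]]
    # For FL, FR and similar replies there are no samples to decode.
    return result


# Illustrative reply only; the values are invented.
parsed = parse_ascii_reply("req7 42 STA1 EHZ NC -- F i2 1000.0 100.0 1 2 -3 0 \n")
missed = parse_ascii_reply("req8 42 STA1 EHZ NC -- FL i2 \n")
```

Note that fill values are indistinguishable from real samples in the returned list, so a client should pick a <fill-value> that cannot occur in valid data.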

GETSCNL: <request id> <s><c><n><l> <starttime> <endtime> <fill-value>

As above, but for the specified SCNL name.

<request id> <pin#> <s><c><n><l> F <datatype> <starttime> <sampling-rate>

sample(1) sample(2)... sample(nsamples) <\n>

GETSCNLRAW: <request id> <s><c><n><l> <starttime> <endtime>

As above, but returns the trace data in the form in which it was circulating within the system. The original trace data ("TYPE_TRACEBUF2" messages) spanning the requested period will be returned in binary form. Only whole trace data messages will be supplied, so that the actual <starttime> may be older than requested, and the <endtime> may be younger than requested. The initial part of the reply is ASCII as above, terminated by a "\n". Following that are the binary "TYPE_TRACEBUF2" messages. The reply is terminated when the stated number of binary bytes have been sent:

<request id> <pin#> <s><c><n><l> F <datatype> <starttime> <endtime> <bytes of binary data to follow> \n <trace_buf msg> ... <trace_buf msg>

If the requested time interval is older than anything in the tank, the reply is:

<request id> <pin#> <s><c><n><l> FL <datatype> <oldest time in tank> \n

If the requested time interval is younger than anything in the tank, the reply is:

<request id> <pin#> <s><c><n><l> FR <datatype> <youngest time in tank> \n

For the case when the requested interval falls completely within a data gap, the reply is:

<request id> <pin#> <s><c><n><l> FG <datatype> \n
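Because a GETSCNLRAW reply mixes an ASCII header with binary data, a client has to split it at the first newline and then use the stated byte count. The sketch below works on a buffer that has already been read from the socket; the function name and the sample bytes are illustrative, and decoding of the individual trace messages is left out.

```python
from typing import Dict


def split_raw_reply(data: bytes) -> Dict[str, object]:
    """Separate the ASCII header of a GETSCNLRAW reply from its binary payload."""
    header, newline, rest = data.partition(b"\n")
    if not newline:
        raise ValueError("header line is not complete yet")
    tokens = header.decode("ascii").split()
    if len(tokens) < 8:
        raise ValueError("reply header is incomplete")
    result: Dict[str, object] = {
        "request_id": tokens[0].rstrip("\x00"),  # may be null padded
        "pin": int(tokens[1]),
        "scnl": tuple(tokens[2:6]),
        "flags": tokens[6],
        "datatype": tokens[7],
        "payload": b"",
    }
    if tokens[6] != "F":
        return result  # FL, FR or FG: no binary data follows
    nbytes = int(tokens[10])
    if len(rest) < nbytes:
        raise ValueError("binary payload is not complete yet")
    result["starttime"] = float(tokens[8])
    result["endtime"] = float(tokens[9])
    result["payload"] = rest[:nbytes]  # whole trace messages, back to back
    return result


# Illustrative reply only; four arbitrary bytes stand in for trace messages.
raw = b"req9 42 STA1 EHZ NC -- F i2 1000.0 1001.0 4 \n" + b"\x01\n\x03\x04"
reply = split_raw_reply(raw)
```

Splitting on the first newline only matters here: the binary payload may itself contain newline bytes, as the sample deliberately does.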

HISTORY

The original Wave_server module was written by Will Kohler in a rather spectacularly short time: We came in on a Monday to find Will comatose, all waste cans full of espresso cups, and a working wave server. The motivation was to support the effort with the Alaska Geophysical Institute to integrate Earthworm with DataScope and to provide a playback facility for testing real-time algorithms. Lynn Dietz then proceeded to add numerous enhancements. It stores all trace data messages received, and serves all trace data received during a specified time interval. It can thus be used to recreate the pattern of trace data messages inside an Earthworm system during a specified period of time. This has proven to be valuable for testing and debugging, and this module still exists as "wave_server".

Kent Lindquist then produced an enhanced version, including the idea of segmenting the tank into one partition for each trace. Since then, several authors were involved in wave_server development: Alex Bittenbinder wrote the main thread; Mac McKenzie wrote the parser of the client thread (server_thread.c); Eureka Young wrote the server routines; Dave Kragness pretty much re-wrote the main thread while implementing crash-recovery, and Pete Lombard produced the suite of associated client routines.

 

TROUBLESHOOTING

From Lynn Dietz 8/07: The waveserver messages "Circular queue lapped : 3456 messages lost." mean that waveserver is not writing data to its tanks fast enough. This is an obvious source of gaps in your waveserver tanks. [To fix this you] can try increasing "InputQueueLen". Also, 270 channels is a lot for one waveserver process to handle. We usually try to limit each waveserver process to ~100 channels on one disk dedicated to that process.

 
