Last modified on 01/19/12 16:39:34
TRANSPORT.C 4/24/98
Table of contents:
I. Introduction
1. Overview of transport routines
1.1 Transport.h structures used by the calling program.
1.2 Initializing/terminating access to shared memory.
1.3 Writing messages to shared memory.
1.4 Retrieving messages from shared memory.
1.5 Buffering messages in a private memory region.
1.6 Communicating with the shared memory header flag.
1.7 Error reporting by transport functions.
2. Function calls
2.1 tport_create
2.2 tport_destroy
2.3 tport_attach
2.4 tport_detach
2.5 tport_putmsg
2.6 tport_getmsg
2.7 tport_copyto
2.8 tport_copyfrom
2.9 tport_buffer
2.10 tport_bufthr
2.11 tport_putflag
2.12 tport_getflag
2.13 tport_syserr
2.14 tport_buferror
3. Programming tips
4. Bug fixes and program modifications
4.1 Mishandled shared memory pointer wraps in tport_putmsg.
4.2 Missing argument to shmctl.
4.3 Speed enhancement using memcpy.
4.4 Making tport_putmsg multi-thread safe.
4.5 Mishandled shared memory pointer resets in tport_getmsg.
4.6 Minor crack in tport_getmsg and tport_copyfrom.
4.7 Logo-tracking problem with GET_TOOBIG messages,
tport_getmsg and tport_copyfrom.
4.8 Tracking problem when no messages of requested logo
are ever returned, tport_getmsg and tport_copyfrom.
4.9 Variable name changed to allow use of C++ compilers.
4.10 Semaphore operations problem in tport_putmsg and tport_copyto
(Solaris version).
I. INTRODUCTION
Transport.c contains a set of functions for accessing System V IPC shared
memory regions under SunOS 4.1.1 and Solaris 2.4. These routines, with
exactly the same function calls, have also been ported to OS/2 and Windows NT.
void tport_create();
void tport_destroy();
void tport_attach();
void tport_detach();
int tport_putmsg();
int tport_getmsg();
int tport_copyto();
int tport_copyfrom();
void tport_putflag();
int tport_getflag();
void tport_syserr();
In June 1995, a set of functions was added to transport.c to create
multi-threaded, message-buffering applications under Solaris 2.4, OS/2, and NT
(SunOS does not support multi-threaded applications):
int tport_buffer();
void *tport_bufthr();
void tport_buferror();
On Solaris, source files using transport functions should include these lines:
#include <earthworm.h> /* required by multi-thread transport functions */
#include <transport.h>
On OS/2, source files using transport functions should include these lines
(the first 3 lines must be before the transport.h include line):
#define INCL_DOSMEMMGR
#define INCL_DOSSEMAPHORES
#include <os2.h>
#include <earthworm.h> /* required by multi-thread transport functions */
#include <transport.h>
1. OVERVIEW OF TRANSPORT ROUTINES
In the following paragraphs, anything written in all capital letters is
defined in transport.h. The following topics are explained in more detail
below:
1.1 Transport.h structures used by the calling program.
1.2 Initializing/terminating access to shared memory.
1.3 Writing messages to shared memory.
1.4 Retrieving messages from shared memory.
1.5 Buffering messages in a private memory region.
1.6 Communicating with the shared memory header flag.
1.7 Error reporting by transport functions.
1.1 Transport.h structures used by the calling program.
Many constants and five structure types are defined in transport.h.
Two of the structure types are used as arguments to transport functions. The
other defined structure types are used internally by the transport functions;
for more information on those, please read the comments in transport.h. The
first structure type used as an argument in transport calls is a shared memory
information structure:
Solaris version:
typedef struct {
SHM_HEAD *addr; /* pointer to beginning of memory region */
long key; /* key to shared memory region */
long mid; /* shared memory region identifier */
long sid; /* associated semaphore identifier */
} SHM_INFO;
OS/2 version:
typedef struct {
SHM_HEAD *addr; /* pointer to beginning of memory region */
long key; /* key to shared memory region */
PVOID objAlloc; /* pointer to memory object */
HMTX hmtx; /* mutex semaphore handle */
} SHM_INFO;
All the values in this structure are set within function tport_create or
tport_attach. It contains all the information needed to identify and use the
memory region in all other transport function calls.
The second structure type used as an argument is the message logo structure:
typedef struct {
unsigned char type; /* message is of this type */
unsigned char mod; /* was created by this module id */
unsigned char instid; /* at this installation */
} MSG_LOGO;
This structure describes the message it is associated with. A single
MSG_LOGO structure is passed as an argument to tport_putmsg. tport_getmsg
takes an array of MSG_LOGO structures as a list of requested logos and it
sets values in an individual MSG_LOGO structure to identify the retrieved
message.
1.2 Initializing/terminating access to shared memory.
Four of the transport functions deal with getting a program ready to
use or to finish with a shared memory region. tport_create() creates the
memory region given a unique "key" to identify the region and the size (in
bytes) of the region. The created memory region consists of 2 parts: a header
section (SHM_HEAD) for keeping track of pointers, etc., and a circular buffer
area for storing variable-length messages. The region should be made large
enough compared to the size of the messages it holds to give each message a
reasonable residence time in the memory before it is overwritten. All
information needed to identify and use the memory region in other transport
function calls is stored in a shared memory information structure (SHM_INFO).
To access an existing shared memory region, a program must first attach to it
by passing tport_attach() the region's unique key. tport_attach then sets up
the shared memory information structure. Note: A program should call EITHER
tport_create() to create and attach to a memory region OR tport_attach() to
attach to an existing region. It should never call both.
Just before exiting, a program that had attached to a memory region
should detach from it using tport_detach() and one that had created it should
destroy it using tport_destroy().
None of these four functions has a return value; if a system error
occurs, they will write a message to stdout and exit.
1.3 Writing messages to shared memory.
Messages are written to a shared memory region using tport_putmsg() or
tport_copyto(), given the region's shared memory information structure. When
one tport_putmsg or tport_copyto is writing to memory, no other tport_putmsg or
tport_copyto can access the same region. Both functions write a transport
layer header (TPORT_HEAD) in front of each message in shared memory. The first
byte of this header is always set to FIRST_BYTE to signal the beginning of a
new message. The header also includes the length of the following message,
its "message logo" (MSG_LOGO; its message type, module id and installation id),
and a sequence number. If tport_copyto is used, the sequence number is passed
as an argument to the function, and sequencing from another source can be
preserved. If tport_putmsg is used, the sequence number is assigned and
tracked by tport_putmsg; any previous sequencing of messages will be lost.
tport_putmsg has a limit to the number of different logos for which it can
keep track of sequence numbers (NTRACK_PUT). If this limit is exceeded,
tport_putmsg will not write messages with new logos to memory; it will return
PUT_NOTRACK, write a warning to stdout and continue. tport_copyto has no
tracking limits. tport_putmsg and tport_copyto are multi-thread safe (they
can be used by multiple threads of the same process without problems).
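The per-logo sequence tracking described above can be sketched roughly as
follows. This is an illustration only, not the transport.c source: the names
demo_next_seq, PUT_TRACK, NTRACK_PUT_DEMO and the DEMO_PUT_* codes are
hypothetical, the limit of 4 is arbitrary, and starting each new logo at
sequence number 0 is an assumption. MSG_LOGO is repeated from section 1.1 so
the sketch stands alone.

```c
#include <stddef.h>

#define NTRACK_PUT_DEMO  4   /* hypothetical limit; the real NTRACK_PUT is in transport.h */
#define DEMO_PUT_OK      0   /* stand-in return codes, not the transport.h values */
#define DEMO_PUT_NOTRACK 1

typedef struct {
    unsigned char type;      /* message is of this type */
    unsigned char mod;       /* was created by this module id */
    unsigned char instid;    /* at this installation */
} MSG_LOGO;

typedef struct {
    MSG_LOGO      logo;      /* logo being tracked */
    unsigned char seq;       /* next sequence number to assign */
} PUT_TRACK;

static PUT_TRACK demo_trak[NTRACK_PUT_DEMO];
static int       demo_ntrak = 0;

/* Look up the logo in the tracking list, add it if there is room,
   and hand back the sequence number for the outgoing message. */
int demo_next_seq(const MSG_LOGO *logo, unsigned char *seq)
{
    int i;

    for (i = 0; i < demo_ntrak; i++) {
        if (demo_trak[i].logo.type   == logo->type &&
            demo_trak[i].logo.mod    == logo->mod  &&
            demo_trak[i].logo.instid == logo->instid)
            break;
    }
    if (i == demo_ntrak) {               /* new logo combination */
        if (demo_ntrak == NTRACK_PUT_DEMO)
            return DEMO_PUT_NOTRACK;     /* no room: message is not written */
        demo_trak[i].logo = *logo;
        demo_trak[i].seq  = 0;
        demo_ntrak++;
    }
    *seq = demo_trak[i].seq++;           /* unsigned char wraps from 255 to 0 */
    return DEMO_PUT_OK;
}
```

In the real routine this lookup happens while the semaphore is held (see
section 4.4), which is what makes it safe for several threads of one process.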
1.4 Retrieving messages from shared memory.
Messages of a given logo are retrieved from a shared memory region
using tport_getmsg() or tport_copyfrom(). A single logo can be requested or
an array of logos can be requested. Additionally, any or all components (type,
module, instid) of the requested message logo(s) can be wildcarded (set to
WILD). tport_getmsg or tport_copyfrom will return when it has found the first
message which matches any of the requested logos. Both functions also keep
track of the sequence number they expect to see for the next message of each
logo; therefore, tport_getmsg or tport_copyfrom can tell if they have missed
any messages. If tport_getmsg misses messages, it returns GET_MISS; if
tport_copyfrom misses messages, it returns either GET_MISS_LAPPED (if memory
was over-written by tport_putmsg or tport_copyto) or GET_MISS_SEQGAP (if a
gap in sequence numbers was passed along by tport_copyto). There is a limit
(NTRACK_GET) to the number of logos for which tport_getmsg or tport_copyfrom
can track sequence numbers. If this limit is exceeded, both functions will
still return a message matching any requested logo, but they won't know if
they have missed any; they will return GET_NOTRACK, write a warning to stdout
and continue. Both functions write the message logo, length (bytes), and
message to addresses in their argument lists. tport_copyfrom has one
additional address argument to which it writes the TPORT_HEAD sequence number
of the returned message. Since both functions have their own private tracking
variables, it is very important that each module use only one of these
functions to grab messages for a given region-logo combination. Otherwise,
the module may see the same message twice! tport_getmsg and tport_copyfrom
are not multi-thread safe; they cannot be used safely by two threads of the
same process.
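As an illustration of how wildcarded logo requests might be matched, here is
a minimal sketch. It is not taken from transport.c: demo_find_match and
field_matches are hypothetical helpers, and DEMO_WILD = 0 is an assumed
value; real code should use WILD from transport.h. MSG_LOGO is repeated from
section 1.1 so the sketch stands alone.

```c
#define DEMO_WILD 0   /* assumed wildcard value; use WILD from transport.h in real code */

typedef struct {
    unsigned char type;      /* message is of this type */
    unsigned char mod;       /* was created by this module id */
    unsigned char instid;    /* at this installation */
} MSG_LOGO;

/* A requested field matches if it is wildcarded or equal to the message's field. */
static int field_matches(unsigned char want, unsigned char have)
{
    return want == DEMO_WILD || want == have;
}

/* Return the index of the first requested logo that matches the
   message's logo, or -1 if none of the requested logos match. */
int demo_find_match(const MSG_LOGO *getlogo, short nget, const MSG_LOGO *have)
{
    short i;

    for (i = 0; i < nget; i++) {
        if (field_matches(getlogo[i].type,   have->type) &&
            field_matches(getlogo[i].mod,    have->mod)  &&
            field_matches(getlogo[i].instid, have->instid))
            return i;
    }
    return -1;
}
```

For example, a request of { 10, DEMO_WILD, 3 } accepts type-10 messages from
any module at installation 3, but rejects the same type from installation 4.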
1.5 Buffering messages in a private memory region.
Several functions have been added to transport.c to give modules a
multi-threaded message-buffering capability. After attaching to or creating
a public shared memory region and creating (tport_create) a private shared
memory region, a module can call tport_buffer() to start the buffering thread,
passing it 2 shared memory information structures (public and private), an
array of logos, and the module id and installation id of the calling module.
tport_buffer creates a thread, tport_bufthr(), which uses tport_copyfrom and
tport_copyto to transfer all messages of the given logo(s) from the public
region to the private region. All sequence numbers from the public region
are preserved in the private region. The buffering-thread reports errors by
calling tport_buferror(), which writes error messages, labeled with the main
thread's module id and installation, to the public region using tport_putmsg.
The main thread must retrieve all of its buffered messages from the private
region using tport_getmsg. [tport_copyfrom and tport_getmsg are not multi-
thread safe, and since the buffering-thread is hard-wired to call
tport_copyfrom, the main thread must use tport_getmsg.] The buffering-thread
will exit when the shared memory header flag in the public region is set to
TERMINATE. The main thread must destroy its private buffering region
(tport_destroy) before it exits.
1.6 Communicating with the shared memory header flag.
Two transport functions deal only with the flag in the shared memory
header structure. This flag is included as a means of communication between
different programs accessing the same region. For instance, if the flag is
set to a certain value, all attached programs should detach and terminate. To
change the value of the flag in a given region, use tport_putflag(). To find
out the current value of the flag, use tport_getflag().
1.7 Error reporting by transport functions.
Transport routines report errors by use of one of 2 functions,
tport_syserr() or tport_buferror(). Both are meant for internal use only by
the other transport functions. tport_syserr is called when a system error has
occurred; it writes a message to stdout and exits. tport_buferror is called by
tport_bufthr (the buffering-thread) when return values from other transport
routines indicate a problem. tport_buferror writes an error message, tagged
with the main thread's module id and installation id, to the public shared
memory region using tport_putmsg and then it returns.
2. FUNCTION CALLS
Below are the function calls, return values and comment lines from the
transport.c source code. They provide a general description of each
function's purpose and its program flow.
2.1 tport_create
2.2 tport_destroy
2.3 tport_attach
2.4 tport_detach
2.5 tport_putmsg
2.6 tport_getmsg
2.7 tport_copyto
2.8 tport_copyfrom
2.9 tport_buffer
2.10 tport_bufthr
2.11 tport_putflag
2.12 tport_getflag
2.13 tport_syserr
2.14 tport_buferror
2.1 tport_create: create a shared memory region & its semaphore, attach
to it and initialize shared memory header values.
void tport_create( SHM_INFO *region, /* info structure for memory region */
long nbytes, /* size of shared memory region */
long memkey ) /* key to shared memory region */
Arguments used as passed: nbytes, memkey
Arguments reset by function: *region
Return Value: None. If any system error occurs during its execution,
tport_create writes a message to stdout and exits.
Program flow:
/* Destroy memory region if it already exists */
/* Create shared memory region */
/* Attach to shared memory region */
/* Initialize shared memory region header */
/* Make semaphore for this shared memory region & set semval = SHM_FREE */
/* set values in the shared memory information structure */
2.2 tport_destroy: destroy a shared memory region.
void tport_destroy( SHM_INFO *region ) /* info structure for memory region */
Arguments used as passed: region
Arguments reset by function: none
Return Value: None. If any system error occurs during its execution,
tport_destroy writes a message to stdout and exits.
Program flow:
/* Set kill flag, give other attached programs time to terminate */
/* Detach from shared memory region */
/* Destroy semaphore set for shared memory region */
/* Destroy shared memory region */
2.3 tport_attach: map to an existing shared memory region.
void tport_attach( SHM_INFO *region, /* info structure for memory region */
long memkey ) /* key to shared memory region */
Arguments used as passed: memkey
Arguments reset by function: *region
Return Value: None. If any system error occurs during its execution,
tport_attach writes a message to stdout and exits.
Program flow:
/* attach to header; find out size memory region; detach */
/* reattach to the entire memory region; get semaphore */
/* set values in the shared memory information structure */
2.4 tport_detach: detach from a shared memory region.
void tport_detach( SHM_INFO *region ) /* info structure for memory region */
Arguments used as passed: region
Arguments reset by function: none
Return Value: None. If any system error occurs during its execution,
tport_detach writes a message to stdout and exits.
2.5 tport_putmsg: write a message into a shared memory region.
int tport_putmsg( SHM_INFO *region, /* info structure for memory region */
MSG_LOGO *putlogo, /* type,module,instid of incoming msg */
long length, /* size of incoming message */
char *msg ) /* pointer to incoming message */
Arguments used as passed: region, putlogo, length, msg
Arguments reset by function: none
Return values: PUT_OK if it put the message in memory with no problems.
PUT_NOTRACK if it did not put the message in memory because
its sequence number tracking limit (NTRACK_PUT) was
exceeded.
PUT_TOOBIG if it did not put the message in memory because
it was too long to fit in the region.
If a system error occurs while tport_putmsg is executing or if a
pointer into the memory region gets lost (doesn't point to a FIRST_BYTE),
tport_putmsg writes a message to stdout and exits.
Program flow:
/* First time around, init the sequence counters, semaphore controls */
/* Set up pointers for shared memory, etc. */
/* First, see if the incoming message will fit in the memory region */
/* Change semaphore; let others know you're using tracking structure & memory */
/* Next, find incoming logo in list of combinations already seen */
/* Incoming logo is a new combination; store it, if there's room */
/* Store everything you need in the transport header */
/* First see if keyin will wrap; if so, reset both keyin and keyold */
/* Then see if there's enough room for new message in shared memory */
/* If not, "delete" oldest messages until there's room */
/* Now copy transport header into shared memory by chunks... */
/* ...and copy message into shared memory by chunks */
/* Finished with shared memory, let others know via semaphore */
2.6 tport_getmsg: read a message out of shared memory.
int tport_getmsg( SHM_INFO *region, /* info structure for memory region */
MSG_LOGO *getlogo, /* requested logo(s) */
short nget, /* number of logos in getlogo */
MSG_LOGO *logo, /* logo of retrieved msg */
long *length, /* size of retrieved message */
char *msg, /* retrieved message */
long maxsize ) /* max length for retrieved message */
Arguments used as passed: region, getlogo, nget, maxsize
Arguments reset by function: *logo, *length, *msg
Return values: GET_OK if it got a message of requested logo(s).
GET_NONE if there were no new messages of requested logo(s).
GET_MISS if it got a message, but missed some. Messages could
be missed for one of 3 reasons:
1) memory was overwritten before the message was retrieved.
2) message was lost before being written to memory and a
sequence # gap was passed to memory by tport_copyto.
3) previous message of returned logo was skipped because
it was longer than maxsize.
GET_NOTRACK if it got a message, but couldn't tell if it
had missed any because its sequence # tracking limit
(NTRACK_GET) was exceeded.
GET_TOOBIG if it found a message of requested logo(s) but
it was too long to fit in caller's buffer. No message
returned, but length and logo of the "toobig" message
are returned.
If a pointer into the memory region gets lost (doesn't point to a FIRST_BYTE),
tport_getmsg writes a message to stdout and exits.
Program flow:
/* Get the pointers set up */
/* First time around, initialize sequence counters, outpointers */
/* find latest starting index to look for any of the requested logos */
/* See if keyin and keyold were wrapped and reset by tport_putmsg; */
/* If so, reset trak[xx].keyout and go back to findkey */
/* Find next message from requested type, module, instid */
/* make sure you haven't been lapped by tport_putmsg */
/* load next header; make sure you weren't lapped */
/* make sure it starts at beginning of a header */
/* see if this msg matches any requested type */
/* Found a message of requested logo; retrieve it! */
/* complain if retrieved msg is too big */
/* copy message by chunks to caller's address */
/* see if we got run over by tport_putmsg while copying msg */
/* if we did, go back and try to get a msg cleanly */
/* set other returned variables */
/* find logo in tracked list */
/* new logo, track it if there's room */
/* check if sequence #'s match; update sequence # */
/* Ok, we're finished grabbing this one */
/* If you got here, there were no messages of requested logo(s) */
/* update outpointer ->msg after retrieved one for all requested logos */
2.7 tport_copyto: put a message into a shared memory region; preserve the
sequence number (passed as an argument) as the transport
layer sequence number.
int tport_copyto( SHM_INFO *region, /*info structure for memory region */
MSG_LOGO *putlogo, /*type,module,instid of incoming msg */
long length, /*size of incoming message */
char *msg, /*pointer to incoming message */
unsigned char seq ) /*preserve as sequence# in TPORT_HEAD*/
Arguments used as passed: region, putlogo, length, msg, seq
Arguments reset by function: none
Return values: PUT_OK if it put the message in memory with no problems.
PUT_TOOBIG if it did not put the message in memory because
it was too long to fit in the region.
If a system error occurs while tport_copyto is executing or if a
pointer into the memory region gets lost (doesn't point to a FIRST_BYTE),
tport_copyto writes a message to stdout and exits.
Program flow:
/* First time around, initialize semaphore controls */
/* Set up pointers for shared memory, etc. */
/* First, see if the incoming message will fit in the memory region */
/* Store everything you need in the transport header */
/* Change semaphore to let others know you're using memory */
/* First see if keyin will wrap; if so, reset both keyin and keyold */
/* Then see if there's enough room for new message in shared memory */
/* If not, "delete" oldest messages until there's room */
/* Now copy transport header into shared memory by chunks... */
/* ...and copy message into shared memory by chunks */
/* Finished with shared memory, let others know via semaphore */
2.8 tport_copyfrom: get a message out of public shared memory; save the
sequence number from the transport layer.
int tport_copyfrom( SHM_INFO *region, /* info structure for memory region */
MSG_LOGO *getlogo, /* requested logo(s) */
short nget, /* number of logos in getlogo */
MSG_LOGO *logo, /* logo of retrieved message */
long *length, /* size of retrieved message */
char *msg, /* retrieved message */
long maxsize, /* max length for retrieved message */
unsigned char *seq ) /* TPORT_HEAD seq# of retrieved msg */
Arguments used as passed: region, getlogo, nget, maxsize
Arguments reset by function: *logo, *length, *msg, *seq
Return values: GET_OK if it got a message of requested logo(s).
GET_NONE if there were no new messages of requested logo(s).
GET_MISS_LAPPED if it got a message, but missed some due to
msgs being overwritten (by tport_putmsg or tport_copyto)
before it got to them.
GET_MISS_SEQGAP if it got a message, but noticed a gap in the
sequence numbers in the ring. This means one of 2 things:
1) a msg was lost before being placed in shared memory and
the sequence gap was transferred into shared memory by
tport_copyto.
2) the previous message of the returned logo was skipped
because it was longer than maxsize.
GET_NOTRACK if it got a message, but couldn't tell if it
had missed any because its sequence # tracking limit
(NTRACK_GET) was exceeded.
GET_TOOBIG if it found a message of requested logo(s) but
it was too long to fit in caller's buffer. No message
returned, but length and logo of the "toobig" message
are returned.
If a pointer into the memory region gets lost (doesn't point to a FIRST_BYTE),
tport_copyfrom writes a message to stdout and exits.
Program flow:
Same as tport_getmsg program flow (see section 2.6).
2.9 tport_buffer: initialize the input buffering thread.
int tport_buffer( SHM_INFO *region1, /* transport ring */
SHM_INFO *region2, /* private ring */
MSG_LOGO *getlogo, /* array of logos to copy */
short nget, /* number of logos in getlogo */
unsigned maxMsgSize, /* size of message buffer */
unsigned char module, /* module id of main thread */
unsigned char instid ) /* inst id of main thread */
Arguments used as passed: region1, region2, getlogo, nget, maxMsgSize,
module, instid
Arguments reset by function: none
Return values: 0 if there were no errors.
-1 if there was an error allocating the internal message buffer,
or if there was an error creating the thread.
Program flow:
/* Allocate internal message buffer */
/* Copy function arguments to global variables */
/* Start the input buffer thread, tport_bufthr */
/* Yield to the buffer thread */
2.10 tport_bufthr: thread to buffer input from one transport region to another.
void *tport_bufthr( void *dummy )
Arguments: none
Return values: none
Program flow:
This function is an infinite loop which will exit only when the termination
flag is set in the public shared memory region's header:
/* Check the flag in the public region; exit if it's set to TERMINATE */
/* Try to copy a message from the public memory region with tport_copyfrom */
/* Handle return values from tport_copyfrom */
/* If you did get a message, copy it to private ring with tport_copyto */
2.11 tport_putflag: set the flag in a shared memory header.
void tport_putflag( SHM_INFO *region, /* shared memory info structure */
short flag ) /* value to set header flag to */
Arguments used as passed: region, flag
Arguments reset by function: none
Return Value: none
2.12 tport_getflag: return the value of the flag from a shared memory header.
int tport_getflag( SHM_INFO *region ) /* shared memory info structure */
Arguments used as passed: region
Arguments reset by function: none
Return value: The value of the shared memory header flag.
2.13 tport_syserr: print a system error and exit.
void tport_syserr( char *msg, /* message to print */
long key ) /* key of memory region that had an error */
Arguments used as passed: msg, key
Arguments reset by function: none
Return Value: None. In fact it never returns, but always exits after
writing the error message to stdout.
2.14 tport_buferror: build an ascii earthworm error message and put it
in the public memory region using tport_putmsg.
void tport_buferror( short ierr, /* 2-byte error word */
char *note ) /* string describing error */
Arguments used as passed: ierr, note
Arguments reset by function: none
Return Value: none
3. PROGRAMMING TIPS
Here are some tips for writing and running programs using transport.c:
Region key(s) should be defined in a .h file which is included by all
programs that will access the region(s). One program should create the
memory region(s) (tport_create); other programs accessing those regions
will attach to them (tport_attach). The "creator" can also be a "putter"
or "getter" or it can be a program with no purpose other than
creating/destroying memory regions.
When deciding how large to make a memory region (tport_create), remember
that the transport layer uses a portion of the memory region for its own
bookkeeping. The region size is NOT required to be an even multiple of the
size of the messages it will contain. However, suppose a user wants the
region to be exactly large enough to store NUMRING messages of size MSGSIZE.
To include space for transport bookkeeping too, the region size should be:
sizeof(SHM_HEAD) + NUMRING * ( sizeof(TPORT_HEAD) + MSGSIZE )
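The size formula above can be wrapped in a small helper, sketched below. The
two structures are hypothetical stand-ins so that the example compiles on its
own; in a real program, include transport.h and use sizeof(SHM_HEAD) and
sizeof(TPORT_HEAD) instead, since the actual structure layouts and sizes will
differ.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real SHM_HEAD and TPORT_HEAD live in transport.h. */
typedef struct {
    long  nbytes;
    long  keymax;
    long  keyin;
    long  keyold;
    short flag;
} DEMO_SHM_HEAD;

typedef struct {
    char          start;
    long          size;
    unsigned char type, mod, instid;
    unsigned char seq;
} DEMO_TPORT_HEAD;

/* Bytes needed to hold exactly numring messages of msgsize bytes each,
   plus the region header and one transport header per message. */
long demo_region_size(long numring, long msgsize)
{
    return (long)sizeof(DEMO_SHM_HEAD)
         + numring * ((long)sizeof(DEMO_TPORT_HEAD) + msgsize);
}
```

The result would then be passed as the nbytes argument of tport_create.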
At run time, the "creator" must be started first. A few seconds should
be allowed for the regions to be set up before starting "attachers".
Otherwise the "attachers" will exit immediately because they can't find
the memory regions.
Any program accessing shared memory should periodically look at the flag
in the memory's header structure (tport_getflag). If the flag is set to
TERMINATE, any "attacher" should detach from memory (tport_detach) and
exit, and the "creator" should destroy the memory region(s) (tport_destroy)
and exit.
To initiate such a polite termination of all programs, one program
must set that termination flag (tport_putflag). A "killer" program,
whose only purpose is to attach to a region and set the flag, is a useful
tool for keyboard-initiated exits.
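The polite-termination pattern from the two tips above might look like the
sketch below. Everything here is a stand-in so that the example compiles
without transport.h: DEMO_HEAD, DEMO_INFO, demo_putflag, demo_getflag,
demo_killer and demo_run are hypothetical, and DEMO_TERMINATE is a placeholder
value. A real program would call tport_getflag and tport_putflag on an
attached SHM_INFO and compare against TERMINATE.

```c
#define DEMO_TERMINATE (-1)   /* placeholder; real code uses TERMINATE from transport.h */

typedef struct { short flag; }      DEMO_HEAD;  /* stand-in for the shared memory header */
typedef struct { DEMO_HEAD *addr; } DEMO_INFO;  /* stand-in for SHM_INFO */

void demo_putflag(DEMO_INFO *region, short flag) { region->addr->flag = flag; }
int  demo_getflag(DEMO_INFO *region)             { return region->addr->flag; }

/* What a "killer" program does after attaching to the region. */
void demo_killer(DEMO_INFO *region)
{
    demo_putflag(region, DEMO_TERMINATE);
}

/* Main loop of a putter or getter: check the flag before each pass.
   Returns the number of passes completed before termination was
   requested or maxpasses was reached. */
int demo_run(DEMO_INFO *region, int maxpasses)
{
    int passes = 0;

    while (passes < maxpasses) {
        if (demo_getflag(region) == DEMO_TERMINATE)
            break;   /* a real attacher would tport_detach and exit here */
        /* ... retrieve and process messages here ... */
        passes++;
    }
    return passes;
}
```

A real loop would normally sleep briefly when no messages are available
rather than spin.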
Simple examples of these types of programs reside in the same directory
as transport.c. They are:
putter1.c creates regions and writes messages as module 1.
putter2.c attaches to regions and writes messages as module 2.
getter.c attaches to regions and retrieves messages, printing them.
killer.c sets terminate flag to stop all programs.
keys.h include file defining shared memory region keys.
go simple script to start the programs.
Makefile
Note: Transport.c was designed to work in programs which run continuously.
If, however, a putter or getter is a transient beast that is run only
intermittently, the getter may return the "GET_MISS" status without actually
missing any messages. This is due to the fact that every time a putter or
getter starts up, its sequence # trackers are set to 0.
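To see why a restarted putter looks like missed messages, consider this
minimal sketch of a getter-side sequence check. demo_check_seq is a
hypothetical helper, not the transport.c logic; it only assumes that sequence
numbers are unsigned char values that wrap from 255 to 0, as the seq argument
of tport_copyto suggests.

```c
/* Compare the sequence number of a retrieved message with the one the
   getter expected. Returns the apparent number of missed messages
   (modulo 256) and updates *expected for the next message. */
unsigned char demo_check_seq(unsigned char *expected, unsigned char got)
{
    unsigned char missed = (unsigned char)(got - *expected);

    *expected = (unsigned char)(got + 1);
    return missed;
}
```

If the getter expects sequence number 5 and a freshly restarted putter sends
0, the check reports a gap even though nothing was lost; a caller would treat
any nonzero result as a "missed" status.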
4. BUG FIXES AND PROGRAM MODIFICATIONS
4.1 Mishandled shared memory pointer wraps in tport_putmsg.
4.2 Missing argument to shmctl.
4.3 Speed enhancement using memcpy.
4.4 Making tport_putmsg multi-thread safe.
4.5 Mishandled shared memory pointer resets in tport_getmsg.
4.6 Minor crack in tport_getmsg and tport_copyfrom.
4.7 Logo-tracking problem with GET_TOOBIG messages,
tport_getmsg and tport_copyfrom.
4.8 Tracking problem when no messages of requested logo
are ever returned, tport_getmsg and tport_copyfrom.
4.9 Variable name changed to allow use of C++ compilers.
4.10 Semaphore operations problem in tport_putmsg and tport_copyto
(Solaris version).
4.1 Mishandled shared memory pointer wraps.
Problem: tport_putmsg mishandled wraps in the shared memory header's
unsigned long keyin and keyold. This caused the transport layer to
lose its place in the memory ring and die.
The Fix: After resetting keyin and keyold, check to make sure that keyin is
larger than keyold. If not, make keyin = keyin + keymax.
Change made in tport_putmsg on 10/24/94 by Lynn Dietz.
I also changed transport.c so that it writes warning and error
messages to stdout (instead of stderr as it was doing) so that the
messages can easily be redirected to a log file.
Change made in transport.c on 10/24/94 by Lynn Dietz.
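The keyin/keyold check described in the fix can be sketched as below.
DEMO_KEYS and demo_reset_keys are hypothetical names, and reducing both keys
modulo keymax as the "reset" step is an assumption; the text above only
states the check that follows the reset.

```c
typedef struct {
    unsigned long keymax;   /* size of the ring, in bytes */
    unsigned long keyin;    /* where the next message will be written */
    unsigned long keyold;   /* start of the oldest complete message */
} DEMO_KEYS;

/* Reset both keys, then make sure keyin is still larger than keyold. */
void demo_reset_keys(DEMO_KEYS *k)
{
    k->keyin  %= k->keymax;      /* assumed form of the reset */
    k->keyold %= k->keymax;
    if (k->keyin <= k->keyold)   /* keyin is not larger than keyold */
        k->keyin += k->keymax;
}
```

With keymax = 100, keyin = 1050 and keyold = 980, the reset alone would give
keyin = 50 and keyold = 80; the check restores keyin to 150, so the 70 bytes
of live data between the two keys are still accounted for.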
4.2 Missing argument to shmctl.
Problem: tport_create and tport_destroy each have a call to shmctl().
Shmctl() takes 3 arguments, but I only had the first two passed.
The compiler under SunOS never complained about it, but the
Solaris compiler 3.0.1 did.
The Fix: I added the 3rd argument (struct shmid_ds shmbuf) to both of
the shmctl() calls.
Change made in transport.c on 3/28/95 by Lynn Dietz.
4.3 Speed enhancement using memcpy.
Problem: I noticed that coaxtoring, a program that just reads messages
from ethernet and puts them into shared memory using tport_putmsg,
took a big chunk of the cpu on a Sparc2 when handling large
messages (>50,000 bytes). I suspected that something wasn't optimized.
The Fix: I changed how tport_putmsg and tport_getmsg copy messages from
one address to another. A byte-by-byte for loop was replaced with
one or two (if the message was wrapped around the end of the ring)
calls to memcpy(). This sped up the coaxtoring program by 20-30%.
Change made in transport.c on 6/20/95 by Lynn Dietz.
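The one-or-two memcpy() approach can be illustrated with the following
sketch. demo_ring_write is a hypothetical helper, not the transport.c code;
it assumes pos is less than ringsize and len is no larger than ringsize.

```c
#include <string.h>

/* Copy len bytes from src into a circular buffer of ringsize bytes,
   starting at offset pos. Uses one memcpy if the data fits before the
   end of the ring, two if it wraps. Returns the offset just past the
   copied data. */
size_t demo_ring_write(char *ring, size_t ringsize, size_t pos,
                       const char *src, size_t len)
{
    size_t first = ringsize - pos;   /* bytes available before the end of the ring */

    if (len <= first) {
        memcpy(ring + pos, src, len);
    } else {
        memcpy(ring + pos, src, first);          /* tail of the ring */
        memcpy(ring, src + first, len - first);  /* wrap to the start */
    }
    return (pos + len) % ringsize;
}
```

Reading a message back out of the ring follows the same pattern with source
and destination swapped.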
4.4 Making tport_putmsg multi-thread safe.
Problem: Previously, the semaphore was set in tport_putmsg after the incoming
logo was found in the tracking list. If more than one thread of the
same process was using tport_putmsg, they could have competed for
access to the tracking structure, potentially causing duplicated
sequence numbers or other errors.
The Fix: tport_putmsg now sets the semaphore before it looks for the logo in
the tracking structure. Since only one tport_putmsg can access the
tracking structure at a time, multiple threads of one process can
safely use the same routine.
Change made in transport.c on 6/27/95 by Lynn Dietz.
4.5 Mishandled shared memory pointer resets in tport_getmsg.
Problem: Each tport_getmsg() and tport_copyfrom() must reset its tracking
pointers (trak[xx].keyout) after shared memory header keyin & keyold
are wrapped and reset (by tport_putmsg or tport_copyto). Sometimes,
keyout was mistakenly reset to a number less than keyold, causing
the getter to grab messages from the ring starting with the oldest
complete message in the ring. This results in a "missed message"
error, because of a gap in transport sequence numbers. It also
causes some messages to be processed twice.
The Fix: After resetting a keyout value in tport_getmsg() and tport_copyfrom(),
first see if it still points to the FIRST_BYTE of a message.
If it does, make sure the value of keyout lies between keyold and
keyin. If it doesn't point to a FIRST_BYTE, the getter was lapped
by a putter; reset keyout to keyold.
Change made in transport.c on 1/17/96 by Lynn Dietz
4.6 Minor crack in tport_getmsg and tport_copyfrom.
Problem: When reading shared memory, both tport_getmsg and tport_copyfrom use
this logic: make sure I haven't been lapped by a putter, grab a
TPORT_HEAD from the ring, make sure that the TPORT_HEAD starts with
a FIRST_BYTE. On very rare occasions, a putter will overwrite the
first byte (or the TPORT_HEAD) between the getter's lap-check and its
grabbing the header from the ring. In this case, the getter will
complain that the header doesn't begin with a FIRST_BYTE and it will
exit.
The Fix: Add another lap-check just after tport_getmsg and tport_copyfrom
grab a TPORT_HEAD from the ring. Their logic now looks like this:
make sure I haven't been lapped by a putter, grab a TPORT_HEAD from
the ring, make sure I haven't been lapped by a putter, make sure
that the TPORT_HEAD starts with a FIRST_BYTE. Note: another lap-
check is done after each message is grabbed from the ring.
Change: In a move totally unrelated to the above problem, I changed
the word "WARNING" to "NOTICE" in all references to wraps of
keyin/keyout/keyget to reflect the fact that this is really a
normal, albeit rare, occurrence.
Changes made in transport.c on 6/12/96 by Lynn Dietz
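
The revised read sequence can be sketched as below. This is a simplified
illustration, not the code in transport.c: HEAD and REGION are stand-ins for
TPORT_HEAD and the shared memory header, grab_head() and the HEAD_* codes are
hypothetical, the FIRST_BYTE value is a placeholder, and the sketch assumes
the header does not straddle the end of the ring.

```c
#include <string.h>

#define FIRST_BYTE 111   /* placeholder; use the value from transport.h */

enum { HEAD_OK, HEAD_LAPPED, HEAD_CORRUPT };

typedef struct { char start; long size; } HEAD;   /* stand-in for TPORT_HEAD */
typedef struct {
    volatile long keyold;    /* oldest complete message; moved by putters */
    long          keymax;    /* size of the ring in bytes                 */
    char         *ring;      /* start of the message ring                 */
} REGION;

int grab_head(const REGION *r, long keyget, HEAD *out)
{
    if (keyget < r->keyold)              /* lap-check before the copy */
        return HEAD_LAPPED;

    memcpy(out, r->ring + keyget % r->keymax, sizeof *out);

    if (keyget < r->keyold)              /* lap-check again after the copy */
        return HEAD_LAPPED;

    if (out->start != FIRST_BYTE)        /* only now trust the header */
        return HEAD_CORRUPT;

    return HEAD_OK;
}
```

With the second check in place, a header that was overwritten during the copy
is reported as a lap rather than as a corrupted ring.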
4.7 Logo-tracking problem with GET_TOOBIG messages,
tport_getmsg and tport_copyfrom.
Problem: Whenever tport_getmsg or tport_copyfrom find a message that matches
the requested logo(s) but is too long for the target address, they
return the logo and length of the message, but they never enter the
logo-tracking part of the routine. This causes a problem only if the
very first message is GET_TOOBIG; since no logos are being tracked,
these functions don't record the fact that they've looked at this
GET_TOOBIG message already. On the next call, they look at the same
GET_TOOBIG message, and thus get stuck looking at this same message
forever (which may put your program into an infinite loop, depending
on how it handles the return codes).
The Fix: Modify the program flow of tport_getmsg and tport_copyfrom such that
after a TOOBIG message is found, they enter the logo-tracking part
of the routine. Also make sure that the return code does NOT get
changed from GET_TOOBIG!
Changes made in transport.c on 6/12/96 by Lynn Dietz
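
The corrected flow can be illustrated with a toy getter. Nothing here is
taken from transport.c: TOYRING and toy_getmsg() are hypothetical, the "ring"
is just an array of message lengths, and the GET_* values are placeholders
(include transport.h for the real ones).

```c
enum { GET_NONE, GET_OK, GET_TOOBIG };   /* placeholder values */

typedef struct {
    const long *len;     /* lengths of the messages in the toy ring */
    int         nmsg;    /* number of messages                      */
    int         next;    /* tracking: next message to look at       */
} TOYRING;

/* Report the next message; maxlen is the size of the caller's buffer. */
int toy_getmsg(TOYRING *r, long maxlen, long *length)
{
    int rc;

    if (r->next >= r->nmsg)
        return GET_NONE;

    *length = r->len[r->next];
    rc = (*length > maxlen) ? GET_TOOBIG : GET_OK;

    /* The tracking step runs for BOTH outcomes, so an oversized
       message is skipped on the next call instead of seen again. */
    r->next++;

    return rc;           /* still GET_TOOBIG if the message did not fit */
}
```

A caller that receives GET_TOOBIG can log the reported length and simply call
again; the next call moves on to the following message.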
4.8 Tracking problem when no messages of requested logo are ever returned,
tport_getmsg and tport_copyfrom.
Symptom: If a module never finds a message of any requested logo in a given
memory region, that module eventually becomes a CPU hog. We know
something is wrong because the module has nothing to process; it
should be doing a loop something like: call tport_getmsg, get a
return code of GET_NONE, sleep a little bit, try again. Where is
the CPU going?
Problem: The problem is essentially the same as that described in section 4.7.
No entries exist in the logo-tracking list until a message of a
requested logo is actually found in shared memory. If no such
message has been found, tport_getmsg and tport_copyfrom have no way
to record the position in shared memory of the last message that
they considered (and rejected). So on every single call, tport_getmsg
or tport_copyfrom start at the oldest complete message in memory and
look at every single one (even though they've probably seen most of
them already...) before concluding that none of them match their
request. If the memory region is large and there are a lot of
little messages in it, this can take a lot of CPU!
The Fix: Modify tport_getmsg and tport_copyfrom so that the first thing they
do is verify that each of the requested logos is entered in the
logo-tracking list. This way, even if none of the requested logos
is found, there is a place to record the position of the last message
that was considered for each requested logo. (The sequence number
tracking for each logo remains "inactive" until the first message
with that logo is found). On subsequent calls, tport_getmsg and
tport_copyfrom will only look at messages they haven't seen before.
Changes made in transport.c on 6/18/96 by Lynn Dietz
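
The up-front registration step might look something like this. It is a sketch
only: LOGO, TRAK, MAX_TRAK and track_logo() are hypothetical names, wildcard
logos are ignored, and the "active" field is an assumed way of representing
the inactive sequence-number tracking mentioned above.

```c
#define MAX_TRAK 8                       /* hypothetical tracking-list size */

typedef struct { unsigned char type, mod, instid; } LOGO;  /* stand-in for MSG_LOGO */
typedef struct {
    LOGO logo;
    long keyout;     /* position of the next message to consider          */
    int  active;     /* 0 until the first message with this logo is found */
} TRAK;

static TRAK trak[MAX_TRAK];
static int  ntrak = 0;

/* Make sure a requested logo has a tracking entry before the ring is
   scanned.  Returns the entry's index, or -1 if the list is full. */
int track_logo(LOGO want, long keyold)
{
    int i;

    for (i = 0; i < ntrak; i++)          /* already tracked: keep its keyout */
        if (trak[i].logo.type   == want.type &&
            trak[i].logo.mod    == want.mod  &&
            trak[i].logo.instid == want.instid)
            return i;

    if (ntrak == MAX_TRAK)
        return -1;

    trak[ntrak].logo   = want;
    trak[ntrak].keyout = keyold;         /* first scan starts at the oldest msg */
    trak[ntrak].active = 0;              /* sequence tracking not started yet   */
    return ntrak++;
}
```

After each scan the getter stores the position it reached in the entry's
keyout, so the next call resumes there even if nothing matched.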
4.9 Variable name changed to allow use of C++ compilers.
Problem: We had used "class" as the variable name for the installation in
the MSG_LOGO structure. However, "class" is also a keyword in C++,
so if you want to use a C++ compiler, you cannot use "class" as
a variable name.
The Fix: Change all references to "class" to "instid" to allow this software
to be compiled with a C++ compiler.
Changes made in transport.c and transport.h on 3/13/97 by Lynn Dietz
4.10 Semaphore operations problem in tport_putmsg and tport_copyto (Solaris).
Symptom: All modules attached to a given transport ring (running on a Solaris
system) suddenly die with a message like:
"ERROR: tport_getmsg; keyget not at FIRST_BYTE, Region xxxx"
This message implies that the transport ring is corrupted. The symptom
was first noticed when Doug Neuhauser ran his transport-based UCB code
on a dual-processor Ultra. A dual-processor X86 Solaris machine has
also exhibited this symptom while running Earthworm v3.1 code.
Problem: Many thanks go to Doug Neuhauser for tracking down the bug! In both
tport_putmsg and tport_copyto, the structure sembuf sops, used as
an argument to the semaphore operation function semop(), had been
declared as a static struct. In multi-threaded code, you can have two
simultaneous invocations of tport_putmsg(), e.g., one for a heartbeat
and one for data. Each one will overwrite the values of sops for the
other thread. This bug shows up readily on a multi-processor machine
on which two threads can really run simultaneously. It could also
presumably occur on a single-processor machine, but we have not yet
experienced it. This bug can manifest itself with these symptoms:
1) a corrupted transport ring, from both threads writing to the
ring at the same time, and
2) deadlock, where both threads are waiting for the semaphore.
The Fix: Remove "static" from the declaration of "struct sembuf sops;" in
tport_putmsg and tport_copyto. Also, pull the initialization of
sops structure members out of one-time-only initialization loops.
Changes made in solaris/transport.c on 4/24/98 by Lynn Dietz
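
The shape of this fix can be sketched as below. This is not the code in
solaris/transport.c: put_sketch(), sem_take_op() and sem_give_op() are
hypothetical names, and the semaphore number, operation values and flags are
assumptions chosen for the illustration.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Fill in a "take the semaphore" operation in caller-owned storage. */
void sem_take_op(struct sembuf *sops)
{
    sops->sem_num = 0;       /* assumed: one semaphore in the set     */
    sops->sem_op  = -1;      /* decrement: wait until it is available */
    sops->sem_flg = 0;       /* assumed: no special flags             */
}

/* Fill in the matching "give the semaphore back" operation. */
void sem_give_op(struct sembuf *sops)
{
    sops->sem_num = 0;
    sops->sem_op  = 1;       /* increment: release */
    sops->sem_flg = 0;
}

/* Outline of a putter.  Returns 0 on success, -1 if semop() fails. */
int put_sketch(int semid)
{
    struct sembuf sops;      /* automatic, NOT static: one copy per call */

    sem_take_op(&sops);      /* members are set on every call */
    if (semop(semid, &sops, 1) == -1)
        return -1;

    /* ... write the message into the ring here ... */

    sem_give_op(&sops);
    if (semop(semid, &sops, 1) == -1)
        return -1;
    return 0;
}
```

Since sops lives on each caller's stack and is filled in on every call, one
thread can no longer overwrite the operation another thread is about to pass
to semop().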
For more information contact: Lynn Dietz
dietz@andreas.wr.usgs.gov
415-329-5520
