nettee(1)                        nettee Manual                       nettee(1)

NAME

       nettee - a network "tee" program

SYNOPSIS

       nettee [options]

DESCRIPTION

       nettee passes a data stream to one or more child nodes using a
       daisy-chain method.  On each node nettee may also direct the stream
       to a file or pipe.  nettee allows large amounts of data to be quickly
       distributed to multiple nodes on a network at a rate limited only by
       the network bandwidth.  The distribution chain is typically linear
       for each network switch but may branch when nodes utilize multiple
       switches.  For maximum throughput only one instance of nettee should
       utilize each network interface.

       When nettee starts it waits for a connection from the upstream node
       before attempting to connect to its downstream nodes.  Consequently
       nettee may be started on the nodes in any order (by a script, rsh,
       ssh, and so forth).  Typically only the node that reads the data
       stream from stdin or a file will be set to log messages, so that the
       progress of the transfer may be monitored.  Transmission errors are
       detected by comparing the total number of bytes read by each child
       node with the number of bytes transmitted to that child.
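
       As a rough illustration of the daisy chain described above, the
       following Python sketch builds one nettee command line per node for
       a linear chain.  It uses only the -in, -out, -next and -v options
       documented in this page.  The node names, the file name image.tar
       and the helper chain_commands() are hypothetical, and the choice to
       write nothing locally on the head node is an assumption, not a
       requirement.

```python
def chain_commands(nodes, src_file, dst_file):
    """Return {node: argv list} for a linear nettee chain (illustrative only).

    nodes[0] is assumed to be the head of the chain: it reads src_file and
    is the only node asked to log errors and messages (-v 5 = 1 + 4).
    Every other node reads from upstream, which is the default when no
    -in option is given, and writes the stream to dst_file.
    """
    commands = {}
    for i, node in enumerate(nodes):
        argv = ["nettee"]
        if i == 0:
            # Head node: read the file, write nothing locally, log progress.
            argv += ["-in", src_file, "-out", "none", "-v", "5"]
        else:
            # Downstream node: default -in (upstream), write to a local file.
            argv += ["-out", dst_file]
        if i + 1 < len(nodes):
            # A single hostlist gives a linear chain.
            argv += ["-next", nodes[i + 1]]
        # The last node has no -next option, which indicates End of Chain.
        commands[node] = argv
    return commands


if __name__ == "__main__":
    chain = chain_commands(["node1", "node2", "node3"], "image.tar", "image.tar")
    for node, argv in chain.items():
        print(node + ":", " ".join(argv))
```

       Because each node waits for its upstream connection before
       connecting downstream, the generated commands could be started in
       any order.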

   Error Handling
       By default severe errors cause the entire chain to abort.  By using
       the -conwf and -colwf options nettee may be instructed to do its best
       to continue processing in the event of certain write failures of the
       data stream.  Note that failures which occur while the distribution
       chain is forming are still fatal events.  To allow the program to
       continue with a truncated or alternate chain if chain formation
       errors are encountered, use the -connf option, and optionally specify
       alternate targets in each hostlist.  If the node above the failed
       node is allowed to emit messages and errors (for instance: -v 5),
       messages similar to these will be sent to the log destination (-log):

       Failures detected in child 0 [node34]: NWF
       Failures detected in child 1 [node35]: NONE
       Failures detected in chain: NWF

       The first type of message describes the failures that were detected
       in the named child node, that is, those named in the -next option.
       The second type of message describes failures that were detected
       anywhere further on in the chain.  The error codes currently defined
       are: NONE (no errors), NWF (network write failure), LWF (local write
       failure), BBC (child returned incorrect byte count), BSTAT (child
       returned unknown or bad status), and NNF (could not connect to one
       or more downstream chain nodes).
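
       A log produced this way could be post-processed with a small
       script.  The Python sketch below parses the two message forms shown
       above and maps the codes to their descriptions.  The function name
       and the returned dictionary layout are hypothetical.  The page does
       not say how several codes on one line would be separated, so
       whitespace separation is an assumption.

```python
import re

# Code descriptions as listed in the Error Handling section.
FAILURE_CODES = {
    "NONE": "no errors",
    "NWF": "network write failure",
    "LWF": "local write failure",
    "BBC": "child returned incorrect byte count",
    "BSTAT": "child returned unknown or bad status",
    "NNF": "could not connect to one or more downstream chain nodes",
}

# search() is used so that a prefix (for example a node name added by
# -v 16) does not prevent a match.
_CHILD = re.compile(r"Failures detected in child (\d+) \[([^\]]+)\]: (.+)")
_CHAIN = re.compile(r"Failures detected in chain: (.+)")


def parse_failure_line(line):
    """Parse one log line; return a dict, or None if it is another message."""
    line = line.strip()
    m = _CHILD.search(line)
    if m:
        return {
            "scope": "child",
            "index": int(m.group(1)),
            "node": m.group(2),
            # Assumption: multiple codes, if any, are whitespace separated.
            "codes": m.group(3).split(),
        }
    m = _CHAIN.search(line)
    if m:
        return {"scope": "chain", "index": None, "node": None,
                "codes": m.group(1).split()}
    return None


if __name__ == "__main__":
    result = parse_failure_line("Failures detected in child 0 [node34]: NWF")
    for code in result["codes"]:
        print(result["node"], code, FAILURE_CODES.get(code, "unknown code"))
```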

   Exit Status
       nettee will normally emit an EXIT_SUCCESS status (0 on Unix).  This
       is true even if errors were detected and handled in the node itself
       or in a child node.  nettee will emit an EXIT_FAILURE status if it
       was forced to close by an unhandled event such as a timeout, write
       failure, or unexpected socket closure.

OPTIONS

       -h     Print help information.

       -hexamples
              Print examples.

       -herrors
              Print error status codes.

       -i     Print version, license, and copyright information.

       -in <SRC>
              Reads data from <SRC>, which may have one of four values:
              nettee reads from the upstream node; - reads from stdin;
              socket reads the output of a command from a socket; filename
              reads from a file.  If no -in option is present the program
              reads data from the upstream node.

       -out <DST>
              Writes data locally to <DST>, which may have one of four
              values: none writes nothing locally; - writes to stdout;
              socket writes the datastream to a command through a socket;
              filename writes to a file.  If no -out option is present the
              program writes data to stdout.

       -next <HOSTLISTS>
              Writes data to downstream destination[s]
              hostlist1(,hostlist2(,hostlist3(...))) where the hostlist
              entries are separated by commas or spaces.  A hostlist
              consists of either a single hostname, or a comma separated
              list of hostnames enclosed in square brackets.  Example:
              node1,[node2,node3],[node4,node5,node6],node7.  The bracketed
              form allows for automatic failover if unreachable nodes are
              encountered and if -connf is specified.  The first hostname
              in the list is tried, then the next, and so on.  There may be
              1-8 hostlists.  The number of hostlists controls the topology
              of the distribution chain.  Use a linear distribution chain
              (a single hostlist) when all nodes share a single network
              switch.  Use a forked distribution chain (multiple hostlists)
              when nodes are connected to two or more network switches.
              The End of Chain condition (no downstream write) is indicated
              by a <HOSTLISTS> value of . , "" , or _EOC_ .  An End of Chain
              condition is also indicated by the absence of a -next option.
              If End of Chain is indicated there may not be any other
              hostlists specified.
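
              To make the <HOSTLISTS> syntax concrete, the Python sketch
              below splits a value into hostlists following the rules
              stated above.  It is not nettee's own parser: the function
              name and the error handling are hypothetical, and an End of
              Chain value is represented here as an empty list.

```python
END_OF_CHAIN = {".", "", "_EOC_"}


def parse_hostlists(spec):
    """Split a -next value into a list of hostlists (each a list of hosts).

    Returns [] for an End of Chain value.  Raises ValueError when the
    value breaks one of the documented rules.
    """
    if spec.strip() in END_OF_CHAIN:
        return []
    hostlists = []
    i, n = 0, len(spec)
    while i < n:
        ch = spec[i]
        if ch in ", ":
            # Hostlist entries are separated by commas or spaces.
            i += 1
        elif ch == "[":
            # Bracketed form: a comma separated list of failover candidates.
            end = spec.find("]", i)
            if end == -1:
                raise ValueError("unterminated '[' in hostlists")
            hosts = [h.strip() for h in spec[i + 1:end].split(",") if h.strip()]
            if not hosts:
                raise ValueError("empty bracketed hostlist")
            hostlists.append(hosts)
            i = end + 1
        else:
            # Bare form: a single hostname is a hostlist of length one.
            j = i
            while j < n and spec[j] not in ", [":
                j += 1
            hostlists.append([spec[i:j]])
            i = j
    if any(hl[0] in END_OF_CHAIN for hl in hostlists if len(hl) == 1):
        raise ValueError("End of Chain may not be combined with other hostlists")
    if not 1 <= len(hostlists) <= 8:
        raise ValueError("there may be 1-8 hostlists")
    return hostlists


if __name__ == "__main__":
    print(parse_hostlists("node1,[node2,node3],[node4,node5,node6],node7"))
```

              For the example value given above this yields four hostlists,
              that is, a chain that forks four ways at this node.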

       -cmd <COMMAND>
              Specifies the command to use in conjunction with an -in socket
              or -out socket option.  Since only a single <COMMAND> may be
              specified, socket may not be applied to both -in and -out at
              the same time.  When -cmd is used with -in socket, a child
              process running <COMMAND> reads data from a disk or other
              device and writes the resulting data stream to stdout.  When
              -cmd is used with -out socket, a child process running
              <COMMAND> reads the datastream from stdin and writes the
              processed data to a disk or other device.  Typically the
              <COMMAND> string invokes tar or some other archiving program.
              In some instances using sockets and -cmd will be faster than
              using the same command in a pipe due to the larger buffer
              size used for the socket.  Run nettee -hexamples to see a
              usage example.

       -stm <EOS>
              Stream text through a nettee chain until the string <EOS> is
              encountered, then exit.  This allows short text messages to
              traverse the chain without waiting for a buffer to fill.
              Since the text message can very rapidly traverse the nettee
              chain it can be piped into execinput (or any other program
              that will execute its stdin as commands) to produce
              essentially simultaneous execution on all target nodes.  The
              <EOS> string is not passed through the data chain and its
              length is ignored.  When used to start further nettee
              processes on the target nodes, <PORT> values must be chosen
              to avoid interference.  While this mode may be convenient for
              setting up Beowulf nodes it is exceedingly dangerous for
              general use, since any command introduced into the command
              stream will execute on all chain nodes as if submitted by the
              owner of the nettee process on that node.  Run nettee
              -hexamples to see a usage example.

       -name <STRING>
              Specify the node name used in messages (<=127 characters).
              If not supplied, the values of the environment variables
              MYHOSTNAME and HOSTNAME are first checked, and if those are
              not defined, the result of a gethostname() call is used.

       -log <LDST>
              Errors and messages are written to <LDST>, which may have one
              of two values: - writes to stderr or filename writes to a
              file.  If no -log option is present the program writes
              messages to stderr.

       -p,-port <PORT>
              First of two consecutive ports used for communication.  If no
              -port option is present the program uses the default value of
              9997.

       -v <VERBOSE>
              <VERBOSE> is a bit mask which controls the types of warning
              and error messages which are sent to the -log destination.
              Bit values indicate: 1 show error messages; 2 show command
              line settings; 4 show messages; 8 show periodic status
              messages during transfer; 16 prepend nodename to all
              messages.  Use a <VERBOSE> value of 0 to eliminate all
              messages.  If no -v is present the program uses a default
              <VERBOSE> value of 1.
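
              Since <VERBOSE> is a sum of bit values, a small helper can
              make a chosen value easier to read.  The Python sketch below
              mirrors the bit values listed above.  The flag names are
              hypothetical labels chosen for this example and are not
              identifiers used by nettee.

```python
from enum import IntFlag


class Verbose(IntFlag):
    """Bit values of -v as listed in this page; the names are illustrative."""
    ERRORS = 1      # show error messages
    SETTINGS = 2    # show command line settings
    MESSAGES = 4    # show messages
    STATUS = 8      # show periodic status messages during transfer
    NODENAME = 16   # prepend nodename to all messages


def describe(mask):
    """Return the names of the bits that are set in a -v value."""
    return [flag.name for flag in Verbose if mask & flag]


if __name__ == "__main__":
    # -v 5, the value used in the Error Handling example.
    print(describe(5))
    # Compose a value: errors, periodic status and node name prefix.
    print(int(Verbose.ERRORS | Verbose.STATUS | Verbose.NODENAME))
```

              Here -v 5 decodes to error messages plus messages, and the
              composed value for errors, periodic status and node name
              prefix is 25.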

       -q     Suppress "ignored signal" messages.

       -t <WAIT>
              Wait up to <WAIT> seconds for a connection from upstream in
              the chain to form or data to be received.  If neither of
              these events occurs, exit with an error.  A value of 0 waits
              forever and will only exit on an end of data condition.  If
              no -t is present the program uses a default <WAIT> value of
              0.  The -connf <WAIT> and -w options control timeouts for
              downstream connections.

       -w     Wait for the next node to boot or attach to the network.  If
              not specified and the next node is not reachable, nettee will
              exit with an error no matter what the -t <WAIT> and -connf
              <WAIT> timeout values are.

       -colwf Continue on Local Write Failure.  Normally the failure of a
              write of the data stream to the local output will be fatal
              and the entire distribution chain will collapse immediately.
              (Typically this happens when data is written to disk and a
              partition fills or there is an ownership problem.  A complete
              disk failure may initially present this way but often goes on
              to crash the node, resulting also in a network write
              failure.)  When -colwf is set and a local write failure
              occurs on a node, that node will continue to relay data down
              the chain.  The node that failed will not have correctly
              processed the data stream locally but all other nodes will be
              unaffected by this failure.  The top node will emit an error
              message when this occurs so that a subsequent analysis with
              other tools may locate the node(s) which failed.  This option
              may only be employed on a node that reads data from an
              upstream node.

       -conwf Continue on Network Write Failure.  Normally the failure of a
              write of the data stream to the next node will be fatal and
              the entire distribution chain will collapse immediately.
              (Typically this happens when a node crashes while nettee is
              running.)  When -conwf is set and a network write failure
              occurs on a node (indicating that the next node has failed),
              the node will continue to process the data stream locally but
              will make no further attempts to transfer data to the next
              node in the chain.  This allows the data transfer to complete
              on a chain down to the node above a failed node.  The top
              node will emit an error message when this occurs so that a
              subsequent analysis with other tools may locate the node(s)
              which failed.  This option may only be employed on a node
              that reads data from an upstream node.

       -connf <WAIT>
              Continue on Next Node Failure.  Give each node in a hostlist
              <WAIT> seconds to join the chain.  After that each successive
              host in the hostlist is given <WAIT> seconds to join, and if
              none succeed, no data will be sent to any of those hosts.  If
              -connf is not specified or the wait time is set to zero
              seconds, the program will wait forever for a connection to
              the first node in each hostlist.
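
              Read literally, this description means a bracketed hostlist
              is tried in order and each candidate costs at most <WAIT>
              seconds.  The Python sketch below models that reading only.
              It makes no network connections, the function names and the
              reachable() callback are hypothetical, and nettee's actual
              timing may differ.

```python
import math


def worst_case_wait(hostlist, wait):
    """Upper bound, in seconds, before a hostlist is abandoned.

    Assumes each host in the hostlist is given `wait` seconds in turn.
    A wait of 0 means waiting forever for the first host.
    """
    if wait == 0:
        return math.inf
    return len(hostlist) * wait


def select_host(hostlist, reachable):
    """Return the first host for which reachable(host) is true, else None.

    None models the case where no data is sent to any host in the list.
    """
    for host in hostlist:
        if reachable(host):
            return host
    return None


if __name__ == "__main__":
    candidates = ["node4", "node5", "node6"]
    print(worst_case_wait(candidates, 10))
    # Pretend node4 is down and the others are up.
    print(select_host(candidates, lambda h: h != "node4"))
```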

       -progress <INTERVAL>
              If -v 8 is used a status message is emitted every <INTERVAL>
              bytes transferred.  The default value of 10000000 will be too
              small for a very fast network.
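
              The arithmetic behind that remark can be sketched in Python.
              The throughput figure used below (125,000,000 bytes per
              second, roughly a saturated 1 Gbit/s link) is an
              illustrative assumption, and the function names are
              hypothetical.

```python
DEFAULT_INTERVAL = 10_000_000  # default -progress value, in bytes


def messages_per_second(throughput_bytes_per_s, interval_bytes=DEFAULT_INTERVAL):
    """Status messages emitted per second at a given sustained throughput."""
    return throughput_bytes_per_s / interval_bytes


def interval_for_period(throughput_bytes_per_s, seconds_between_messages):
    """-progress value giving about one message per requested period."""
    return int(throughput_bytes_per_s * seconds_between_messages)


if __name__ == "__main__":
    rate = 125_000_000  # assumed throughput in bytes per second
    print(messages_per_second(rate))      # messages per second with the default
    print(interval_for_period(rate, 5))   # interval for one message every 5 s
```

              Under that assumption the default interval produces 12.5
              status messages per second, while an interval of 625000000
              bytes gives about one message every five seconds.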

SEE ALSO

       netcat(1).

       nettee is derived from Felix Rauch's dolly, which is available here:
       http://www.cs.inf.ethz.ch/CoPs/patagonia/#dolly

       The nettee home page is: http://saf.bio.caltech.edu/nettee.html

COPYRIGHTS

       Copyright: 2008 David Mathog and Caltech.
       Copyright: Felix Rauch and ETH Zurich

LICENSE

       Freely distributed under version 2 of the GNU General Public License
       (GPL 2).

AUTHOR

       David Mathog
       Biology Division, Caltech

nettee 0.1.9                       OCT 2008                          nettee(1)