1nettee(1) nettee Manual nettee(1)
2
3
4
5 NAME
6 nettee - a network "tee" program
7
8
9 SYNOPSIS
10 nettee [options]
11
12
13 DESCRIPTION
14 nettee passes a data stream to one or more child nodes using a daisy‐
15 chain method. On each node nettee may also direct the stream to a file
16 or pipe. nettee allows large amounts of data to be quickly distributed
17 to multiple nodes on a network at a rate limited only by the network
18 bandwidth. The distribution chain is typically linear for each network
19 switch but may branch when nodes utilize multiple switches. For maxi‐
20 mum throughput only one instance of nettee should utilize each network
21 interface.
22
23
24 When nettee starts it waits for a connection from the upstream node
25 before attempting to connect to its downstream nodes. Consequently
26 nettee may be started on the nodes in any order (by a script, rsh, ssh,
27 and so forth). Typically only the node that reads the data stream from
28 stdin or a file will be set to log messages, so that the progress of
29 the transfer may be monitored. Transmission errors are detected by
30 comparing the total number of bytes read by each child node with the
31 number of bytes transmitted to that child.
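Because the nodes may be started in any order, the per-node command lines can be generated up front and handed to whatever launcher is in use. The following is a minimal Python sketch, not part of nettee: the helper name chain_commands and the path /tmp/image.bin are hypothetical, and only options documented on this page are used. It assumes a linear chain in which the first node reads a local file and logs errors and messages (-v 5), while the remaining nodes write the stream to a file and stay silent (-v 0).

```python
import shlex


def chain_commands(source, dest, nodes):
    """Return {node: command string} for a linear chain (hypothetical helper).

    nodes[0] reads `source` locally; every later node writes `dest`.
    """
    commands = {}
    for i, node in enumerate(nodes):
        argv = ["nettee"]
        if i == 0:
            # Head node: read a local file, write nothing locally,
            # and log errors plus messages (-v 5 = 1 + 4).
            argv += ["-in", source, "-out", "none", "-v", "5"]
        else:
            # Other nodes read from upstream (the default -in) and stay quiet.
            argv += ["-out", dest, "-v", "0"]
        if i + 1 < len(nodes):
            argv += ["-next", nodes[i + 1]]
        # The last node gets no -next, which indicates End of Chain.
        commands[node] = shlex.join(argv)
    return commands


if __name__ == "__main__":
    chain = chain_commands("/tmp/image.bin", "/tmp/image.bin",
                           ["node1", "node2", "node3"])
    for node, cmd in chain.items():
        print(f"{node}: {cmd}")
```

Each resulting string could then be run on its node by ssh or a similar launcher; how that is done is left to the site.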
32
33
34 Error Handling
35 By default severe errors cause the entire chain to abort. By utilizing
36 the -conwf and -colwf options nettee may be instructed to do its best
37 to continue processing in the event of certain write failures of the
38 data stream. Note that failures which occur while the distribution
39 chain is forming are still fatal events. To allow the program to
40 continue with a truncated or alternate chain when chain formation errors
41 occur, use the -connf option and optionally specify alternate targets
42 in each hostlist. If the node above the failed node is allowed to emit
43 messages and errors (for instance, -v 5), messages similar to these
44 will be sent to the log destination (-log):
45
46 Failures detected in child 0 [node34]: NWF
47 Failures detected in child 1 [node35]: NONE
48 Failures detected in chain: NWF
49
50 The first type of message describes the failures that were detected in
51 the named child node, that is, those named in the -next option. The
52 second message describes failures that were detected anywhere further
53 on in the chain. The error codes currently defined are: NONE (no
54 errors), NWF (network write failure), LWF (local write failure), BBC
55 (child returned incorrect byte count), BSTAT (child returned unknown or
56 bad status), and NNF (could not connect to one or more downstream nodes).
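A log collected from the top node can be scanned for these messages after a transfer. The sketch below is one possible way to do that in Python. The function name parse_failures is hypothetical, and the patterns assume the message layout shown above with a single error code per line, which this page does not guarantee.

```python
import re

# Error codes as listed on this page.
ERROR_CODES = {
    "NONE": "no errors",
    "NWF": "network write failure",
    "LWF": "local write failure",
    "BBC": "child returned incorrect byte count",
    "BSTAT": "child returned unknown or bad status",
    "NNF": "could not connect to one or more downstream chain nodes",
}

# Assumed message layouts, copied from the sample log lines.
CHILD_RE = re.compile(r"Failures detected in child (\d+) \[([^\]]+)\]: (\S+)")
CHAIN_RE = re.compile(r"Failures detected in chain: (\S+)")


def parse_failures(log_text):
    """Return ([(child_index, node_name, code), ...], chain_code_or_None)."""
    children, chain = [], None
    for line in log_text.splitlines():
        m = CHILD_RE.search(line)
        if m:
            children.append((int(m.group(1)), m.group(2), m.group(3)))
            continue
        m = CHAIN_RE.search(line)
        if m:
            chain = m.group(1)
    return children, chain


if __name__ == "__main__":
    sample = (
        "Failures detected in child 0 [node34]: NWF\n"
        "Failures detected in child 1 [node35]: NONE\n"
        "Failures detected in chain: NWF\n"
    )
    children, chain = parse_failures(sample)
    for index, node, code in children:
        print(index, node, code, ERROR_CODES.get(code, "unknown code"))
    print("chain:", chain)
```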
57
58 Exit Status
59 nettee will normally emit an EXIT_SUCCESS status (0 on Unix). This
60 is true even if errors were detected and handled in the node itself
61 or in a child node. nettee will emit an EXIT_FAILURE status if it was
62 forced to close by an unhandled event such as a timeout, write failure,
63 or unexpected socket closure.
64
65
66 OPTIONS
67 -h Print help information.
68
69
70 -hexamples
71 Print examples.
72
73
74 -herrors
75 Print error status codes.
76
77
78 -i Print version, license, and copyright information.
79
80
81 -in <SRC>
82 Reads data from <SRC> which may have one of four values: nettee
83 reads from the upstream node; - reads from stdin; socket reads
84 the output of a command from a socket; filename reads from a
85 file. If no -in option is present the program reads data from
86 the upstream node.
87
88
89 -out <DST>
90 Writes data locally to <DST> which may have one of four values:
91 none writes nothing locally; - writes to stdout; socket writes
92 the datastream to a command through a socket; filename writes to
93 a file. If no -out option is present the program writes data to
94 stdout.
95
96
97 -next <HOSTLISTS>
98 Writes data to downstream destination[s]
99 hostlist1(,hostlist2(,hostlist3(...))) where the hostlist
100 entries are separated by commas or spaces. A hostlist consists
101 of either a single hostname, or a comma separated list of host‐
102 names enclosed in square brackets. Example:
103 node1,[node2,node3],[node4,node5,node6],node7. The bracketed
104 form allows for automatic failover if unreachable nodes are
105 encountered and if -connf is specified. The first hostname in
106 the list is tried, then the next, and so on. There may be 1-8
107 hostlists. The number of hostlists controls the topology of the
108 distribution chain. Use a linear distribution chain (a single
109 hostlist) when all nodes share a single network switch. Use a
110 forked distribution chain (multiple hostlists) when nodes are
111 connected to two or more network switches. The End of Chain
112 condition (no downstream write) is indicated by a <HOSTLISTS> value
113 of . , "" , or _EOC_ . An End of Chain condition is also
114 indicated by the absence of a -next option. If End of Chain is
115 indicated there may not be any other hostlists specified.
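The <HOSTLISTS> syntax can be illustrated with a small parser. This Python sketch is not nettee's own parser. The name parse_hostlists is hypothetical, and the handling of malformed input (empty brackets, an End of Chain marker mixed with real hostlists) is an assumption drawn from the rules stated above.

```python
import re

# Values that indicate End of Chain, as listed above.
EOC_VALUES = {".", "", "_EOC_"}


def parse_hostlists(value):
    """Split a -next argument into a list of hostlists (lists of hostnames).

    An End of Chain value yields an empty list.
    """
    value = value.strip()
    if value in EOC_VALUES:
        return []
    # A token is either a bracketed failover group or a bare hostname.
    tokens = re.findall(r"\[[^\]]*\]|[^\s,\[\]]+", value)
    hostlists = []
    for tok in tokens:
        if tok.startswith("["):
            hosts = [h for h in re.split(r"[,\s]+", tok[1:-1]) if h]
        else:
            hosts = [tok]
        if not hosts:
            raise ValueError("empty hostlist")
        if any(h in EOC_VALUES for h in hosts):
            raise ValueError("End of Chain cannot be combined with hostlists")
        hostlists.append(hosts)
    if not 1 <= len(hostlists) <= 8:
        raise ValueError("expected 1-8 hostlists")
    return hostlists


if __name__ == "__main__":
    print(parse_hostlists("node1,[node2,node3],[node4,node5,node6],node7"))
    print(parse_hostlists("_EOC_"))
```

In the result each inner list is one branch of the chain, and the hosts within it are the failover candidates tried in order when -connf is given.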
116
117
118 -cmd <COMMAND>
119 Specifies the command to use in conjunction with an -in socket
120 or -out socket option. Since only a single <COMMAND> may be
121 specified socket may not be applied to both -in and -out at the
122 same time. When -cmd is used with -in socket a child process
123 running <COMMAND> reads data from a disk or other device and
124 writes the resulting data stream to stdout. When -cmd is used
125 with -out socket a child process running <COMMAND> reads the
126 datastream from stdin and writes the processed data to a disk or
127 other device. Typically the <COMMAND> string invokes tar or
128 some other archiving program. In some instances using sockets
129 and -cmd will be faster than using the same command in a pipe
130 due to the larger buffer size used for the socket. Run nettee
131 -hexamples to see a usage example.
132
133
134 -stm <EOS>
135 Streams text through a nettee chain until the string <EOS> is
136 encountered, then exit. This allows short text messages to tra‐
137 verse the chain without waiting for a buffer to fill. Since the
138 text message can very rapidly traverse the nettee chain it can
139 be piped into execinput (or any other program that will execute
140 its stdin as commands) to produce essentially simultaneous exe‐
141 cution on all target nodes. The <EOS> string is not passed
142 through the data chain and its length is ignored. When used to
143 start further nettee processes on the target nodes <PORT> values
144 must be chosen to avoid interference. While this mode may be
145 convenient for setting up Beowulf nodes it is exceedingly dan‐
146 gerous for general use since any command introduced into the
147 command stream will execute on all chain nodes as if submitted
148 by the owner of the nettee process on that node. Run nettee
149 -hexamples to see a usage example.
150
151
152 -name <STRING>
153 Specify the node name used in messages (<=127 characters). If
154 not supplied, the environment variables MYHOSTNAME and HOSTNAME
155 are checked first, and if neither is defined, the result of a
156 gethostname() call is used.
157
158
159 -log <LDST>
160 Errors and messages are written to <LDST> which may have one of
161 two values: - writes to stderr or filename writes to a file. If
162 no -log option is present the program writes messages to stderr.
163
164
165 -p,-port <PORT>
166 First of two consecutive ports used for communication. If no
167 -port option is present the program uses the default value of
168 9997.
169
170
171 -v <VERBOSE>
172 <VERBOSE> is a bit mask which controls the types of warning and
173 error messages which are sent to the -log destination. Bit val‐
174 ues indicate: 1 show error messages; 2 show command line set‐
175 tings; 4 show messages; 8 show periodic status messages during
176 transfer; 16 prepend nodename to all messages. Use a <VERBOSE>
177 value of 0 to eliminate all messages. If no -v is present the
178 program uses a default <VERBOSE> value of 1.
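Since <VERBOSE> is a bit mask, values are formed by adding the bits wanted. The Python sketch below decodes a mask into the message classes listed above. The names VERBOSE_BITS and describe_verbose are illustrative only.

```python
# Bit values and meanings as listed for the -v option.
VERBOSE_BITS = {
    1: "error messages",
    2: "command line settings",
    4: "messages",
    8: "periodic status messages during transfer",
    16: "prepend nodename to all messages",
}


def describe_verbose(mask):
    """Return the message classes enabled by a <VERBOSE> mask."""
    return [label for bit, label in VERBOSE_BITS.items() if mask & bit]


if __name__ == "__main__":
    print(describe_verbose(5))           # ['error messages', 'messages']
    print(describe_verbose(1 | 8 | 16))  # errors, periodic status, nodename prefix
```

For example, -v 5 (1 + 4) selects error messages and messages, and -v 25 (1 + 8 + 16) selects error messages and periodic status messages with the nodename prepended.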
179
180
181 -q Suppresses "ignored signal" messages.
182
183
184
185 -t <WAIT>
186 Wait up to <WAIT> seconds for a connection from upstream in the
187 chain to form or data to be received. If neither of these
188 events occurs, exit with an error. A value of 0 waits forever and
189 will only exit on an end of data condition. If no -t is present
190 the program uses a default <WAIT> value of 0. The -connf <WAIT>
191 and -w options control timeouts for downstream connections.
192
193
194 -w Wait for the next node to boot or attach to the network. If not
195 specified and the next node is not reachable nettee will exit
196 with an error no matter what the -t <WAIT> and -connf <WAIT>
197 timeout values are.
198
199
200 -colwf Continue on Local Write Failure. Normally the failure of a
201 write of the data stream to the local output will be fatal and
202 the entire distribution chain will collapse immediately. (Typi‐
203 cally this happens when data is written to disk and a partition
204 fills or there is an ownership problem. A complete disk failure
205 may initially present this way but often goes on to crash the
206 node, resulting also in a network write failure.) When -colwf
207 is set and a local write failure occurs on a node that node will
208 continue to relay data down the chain. The node that failed
209 will not have correctly processed the data stream locally but
210 all other nodes will be unaffected by this failure. The top
211 node will emit an error message when this occurs so that a sub‐
212 sequent analysis with other tools may locate the node(s) which
213 failed. This option may only be employed on a node that reads
214 data from an upstream node.
215
216
217 -conwf Continue on Network Write Failure. Normally the failure of a
218 write of the data stream to the next node will be fatal and the
219 entire distribution chain will collapse immediately. (Typically
220 this happens when a node crashes while nettee is running.) When
221 -conwf is set and a network write failure occurs on a node
222 (indicating that the next node has failed) the node will con‐
223 tinue to process the data stream locally but will make no fur‐
224 ther attempts to transfer data to the next node in the chain.
225 This allows the data transfer to complete on a chain down to the
226 node above a failed node. The top node will emit an error mes‐
227 sage when this occurs so that a subsequent analysis with other
228 tools may locate the node(s) which failed. This option may only
229 be employed on a node that reads data from an upstream node.
230
231
232 -connf <WAIT>
233 Continue on Next Node Failure. Give the first node in a hostlist
234 <WAIT> seconds to join the chain. After that each successive
235 host in the hostlist is given <WAIT> seconds to join, and if
236 none succeeds, no data will be sent to any of those hosts. If
237 -connf is not specified or the wait time is set to zero seconds,
238 the program will wait forever for a connection to the first node
239 in each hostlist.
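The failover order described for -connf can be sketched as follows. This is a Python illustration of the documented behavior, not nettee's implementation. pick_host is a hypothetical name, and try_connect(host, timeout) is a hypothetical callable that returns True when the host joins, with a timeout of None meaning wait forever.

```python
def pick_host(hostlist, wait, try_connect):
    """Return the first host in `hostlist` that joins, or None if none do.

    `wait` is the -connf value in seconds; 0 models an absent -connf.
    """
    if wait == 0:
        # No -connf, or a wait of zero: only the first host is tried,
        # and the attempt is allowed to block indefinitely.
        return hostlist[0] if try_connect(hostlist[0], None) else None
    for host in hostlist:
        # Each successive host gets the same <WAIT> seconds to join.
        if try_connect(host, wait):
            return host
    # None joined: no data is sent to any host in this hostlist.
    return None


if __name__ == "__main__":
    reachable = {"node3"}  # pretend node2 is down

    def fake_connect(host, timeout):
        return host in reachable

    print(pick_host(["node2", "node3"], 5, fake_connect))  # node3
```

Under this model the longest delay before a hostlist is abandoned is the number of hosts in it multiplied by <WAIT>.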
240
241
242 -progress <INTERVAL>
243 If -v 8 is used a status message is emitted every <INTERVAL>
244 bytes transferred. The default value of 10000000 will be too
245 small for a very fast network.
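The effect of <INTERVAL> on log volume is simple arithmetic. The Python sketch below assumes one status message per <INTERVAL> bytes, as described above. The function names and the example link speeds are illustrative.

```python
DEFAULT_INTERVAL = 10_000_000  # default -progress value, in bytes


def status_message_count(total_bytes, interval=DEFAULT_INTERVAL):
    """Number of status messages emitted over a whole transfer."""
    return total_bytes // interval


def messages_per_second(bytes_per_second, interval=DEFAULT_INTERVAL):
    """Average rate of status messages at a given throughput."""
    return bytes_per_second / interval


if __name__ == "__main__":
    # Roughly 1 Gbit/s and 10 Gbit/s expressed in bytes per second.
    print(messages_per_second(125_000_000))
    print(messages_per_second(1_250_000_000))
    print(status_message_count(4_700_000_000))
```

At about 1 Gbit/s (125,000,000 bytes per second) the default produces 12.5 messages per second, and at about 10 Gbit/s it produces 125 per second. Setting <INTERVAL> close to the expected bytes per second gives roughly one message per second.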
246
247
248
249 SEE ALSO
250 netcat(1).
251
252 nettee is derived from Felix Rauch's dolly which is available here:
253 http://www.cs.inf.ethz.ch/CoPs/patagonia/#dolly
254
255 The nettee home page is: http://saf.bio.caltech.edu/nettee.html
256
257
258
259 COPYRIGHT
260 Copyright: 2008 David Mathog and Caltech.
261 Copyright: Felix Rauch and ETH Zurich
262
263
264 LICENSE
265 Freely distributed under version 2 of the GNU General Public License (GPL 2).
266
267
268 AUTHOR
269 David Mathog
270 Biology Division, Caltech
271
272
273
274
275
276nettee 0.1.9 OCT 2008 nettee(1)