Cflow(3) User Contributed Perl Documentation Cflow(3)

Cflow::find - find "interesting" flows in raw IP flow files

    use Cflow;

    Cflow::verbose(1);
    Cflow::find(\&wanted, <*.flows*>);

    sub wanted { ... }

or:

    Cflow::find(\&wanted, \&perfile, <*.flows*>);

    sub perfile {
        my $fname = shift;
        ...
    }

This module implements an API for processing IP flow accounting
information which has been collected from routers and written into flow
files by one of the various flow collectors listed below.

It was originally conceived and written for use by FlowScan:

    http://net.doit.wisc.edu/~plonka/FlowScan/

This package is of little use on its own. It requires input in the
form of time-stamped raw flow files produced by other software
packages. These "flow sources" either snoop a local ethernet (via
libpcap) or collect flow information from IP routers that are
configured to export said information. The following flow sources are
supported:

argus by Carter Bullard:
    http://www.qosient.com/argus/

flow-tools by Mark Fullmer (with NetFlow v1, v5, v6, or v7):
    http://www.splintered.net/sw/flow-tools/

CAIDA's cflowd (with NetFlow v5):
    http://www.caida.org/tools/measurement/cflowd/
    http://net.doit.wisc.edu/~plonka/cflowd/

lfapd by Steve Premeau (with LFAPv4):
    http://www.nmops.org/

Cflow::find() will iterate across all the flows in the specified files.
It will call your wanted() function once per flow record. If the file
name argument passed to find() is specified as "-", flows will be read
from standard input.

The wanted() function does whatever you write it to do. For instance,
it could simply print interesting flows or it might maintain byte,
packet, and flow counters which could be written to a database after
the find subroutine completes.

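As an illustration of the counter-keeping style of wanted() described
above, here is a minimal sketch. So that it runs without any flow
files, it does not call Cflow::find(). Instead a hypothetical helper,
replay(), sets stand-in variables $protocol, $pkts, and $bytes from a
few made-up records and calls wanted() once per record. In real use
those would be the imported Cflow flow variables, and Cflow::find()
would do the iterating.

```perl
use strict;
use warnings;

# Stand-ins for the flow variables that "use Cflow qw(:flowvars)" would
# import. Declared here only so the sketch runs without the module.
our ($protocol, $pkts, $bytes);

# Per-protocol totals maintained by wanted().
my %totals;

sub wanted {
    my $t = $totals{$protocol} ||= { flows => 0, pkts => 0, bytes => 0 };
    $t->{flows}++;
    $t->{pkts}  += $pkts;
    $t->{bytes} += $bytes;
    return 1;    # non-zero: count this flow as "wanted"
}

# Hypothetical replacement for Cflow::find(): feeds made-up records
# (protocol, packets, bytes) to the callback, one at a time.
sub replay {
    my ($callback, @records) = @_;
    my $hits = 0;
    for my $r (@records) {
        ($protocol, $pkts, $bytes) = @$r;
        $hits++ if $callback->();
    }
    return $hits . '/' . scalar(@records);
}

my $ratio = replay(\&wanted, [6, 10, 4000], [17, 1, 76], [6, 3, 180]);

# Report the totals once the scan has completed.
for my $proto (sort { $a <=> $b } keys %totals) {
    printf("proto %3d: %d flows, %d pkts, %d bytes\n",
        $proto, @{ $totals{$proto} }{qw(flows pkts bytes)});
}
```

The totals are only reported after the scan has finished, which is the
point at which a real program might write them to a database.
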
Within your wanted() function, tests on the "current" flow can be
performed using the following variables:

$Cflow::unix_secs
    secs since epoch (deprecated)

$Cflow::exporter
    Exporter IP Address as a host-ordered "long"

$Cflow::exporterip
    Exporter IP Address as dotted-decimal string

$Cflow::localtime
    $Cflow::unix_secs interpreted as localtime with this strftime(3)
    format:

        %Y/%m/%d %H:%M:%S

$Cflow::srcaddr
    Source IP Address as a host-ordered "long"

$Cflow::srcip
    Source IP Address as a dotted-decimal string

$Cflow::dstaddr
    Destination IP Address as a host-ordered "long"

$Cflow::dstip
    Destination IP Address as a dotted-decimal string

$Cflow::input_if
    Input interface index

$Cflow::output_if
    Output interface index

$Cflow::srcport
    TCP/UDP src port number or equivalent

$Cflow::dstport
    TCP/UDP dst port number or equivalent

$Cflow::ICMPType
    high byte of $Cflow::dstport

    Undefined if the current flow is not an ICMP flow.

$Cflow::ICMPCode
    low byte of $Cflow::dstport

    Undefined if the current flow is not an ICMP flow.

$Cflow::ICMPTypeCode
    symbolic representation of $Cflow::dstport

    The value is the type-specific ICMP code, if any, followed by the
    ICMP type. E.g.

        ECHO
        HOST_UNREACH

    Undefined if the current flow is not an ICMP flow.

$Cflow::pkts
    Packets sent in Duration

$Cflow::bytes
    Octets sent in Duration

$Cflow::nexthop
    Next hop router's IP Address as a host-ordered "long"

$Cflow::nexthopip
    Next hop router's IP Address as a dotted-decimal string

$Cflow::startime
    secs since epoch at start of flow

$Cflow::start_msecs
    fractional portion of startime (in milliseconds)

    This will be zero unless the source is flow-tools or argus.

$Cflow::endtime
    secs since epoch at last packet of flow

$Cflow::end_msecs
    fractional portion of endtime (in milliseconds)

    This will be zero unless the source is flow-tools or argus.

$Cflow::protocol
    IP protocol number (as is specified in /etc/protocols, i.e.
    1=ICMP, 6=TCP, 17=UDP, etc.)

$Cflow::tos
    IP Type-of-Service

$Cflow::tcp_flags
    bitwise OR of all TCP flags that were set within packets in the
    flow; 0x10 for non-TCP flows

$Cflow::TCPFlags
    symbolic representation of $Cflow::tcp_flags. The value will be a
    bitwise-or expression. E.g.

        PUSH|SYN|FIN|ACK

    Undefined if the current flow is not a TCP flow.

$Cflow::raw
    the entire "packed" flow record as read from the input file

    This is useful when the "wanted" subroutine wants to write the flow
    to another FILEHANDLE. E.g.:

        syswrite(FILEHANDLE, $Cflow::raw, length $Cflow::raw)

    Note that if you're using a modern version of perl that supports
    PerlIO Layers, be sure that FILEHANDLE is using something
    appropriate like the ":bytes" layer. This can be activated on
    open, or with:

        binmode(FILEHANDLE, ":bytes");

    This will prevent the external LANG setting from causing perl to do
    such things as interpreting your raw flow records as UTF-8
    characters and corrupting the record.

$Cflow::reraw
    the entire "re-packed" flow record formatted like $Cflow::raw.

    This is useful when the "wanted" subroutine wants to write a
    modified flow to another FILEHANDLE. E.g.:

        $srcaddr = my_encode($srcaddr);
        $dstaddr = my_encode($dstaddr);
        syswrite(FILEHANDLE, $Cflow::reraw, length $Cflow::raw)

    These flow variables are packed into $Cflow::reraw:

        $Cflow::index, $Cflow::exporter,
        $Cflow::srcaddr, $Cflow::dstaddr,
        $Cflow::input_if, $Cflow::output_if,
        $Cflow::srcport, $Cflow::dstport,
        $Cflow::pkts, $Cflow::bytes,
        $Cflow::nexthop,
        $Cflow::startime, $Cflow::endtime,
        $Cflow::protocol, $Cflow::tos,
        $Cflow::src_as, $Cflow::dst_as,
        $Cflow::src_mask, $Cflow::dst_mask,
        $Cflow::tcp_flags,
        $Cflow::engine_type, $Cflow::engine_id

    Note that if you're using a modern version of perl that supports
    PerlIO Layers, be sure that FILEHANDLE is using something
    appropriate like the ":bytes" layer. This can be activated on
    open, or with:

        binmode(FILEHANDLE, ":bytes");

    This will prevent the external LANG setting from causing perl to do
    such things as interpreting your raw flow records as UTF-8
    characters and corrupting the record.

$Cflow::Bps
    the minimum bytes per second for the current flow

$Cflow::pps
    the minimum packets per second for the current flow

The following variables are undefined if using NetFlow v1 (which does
not contain the requisite information):

$Cflow::src_as
    originating or peer AS of source address

$Cflow::dst_as
    originating or peer AS of destination address

The following variables are undefined if using NetFlow v1 or LFAPv4
(which do not contain the requisite information):

$Cflow::src_mask
    source address prefix mask bits

$Cflow::dst_mask
    destination address prefix mask bits

$Cflow::engine_type
    type of flow switching engine

$Cflow::engine_id
    ID of the flow switching engine

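Several of the variables above are numeric encodings of values that
also have a string form. The following sketch, which is not part of
Cflow, shows one way to reproduce those conversions by hand: turning a
host-ordered "long" address into a dotted-decimal string, splitting an
ICMP flow's $dstport into type and code as described for
$Cflow::ICMPType and $Cflow::ICMPCode, and formatting a timestamp with
the strftime(3) format given for $Cflow::localtime. The helper names
and sample values are made up for illustration.

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);
use POSIX qw(strftime);

# Convert a host-ordered 32-bit address (as in $srcaddr) to a
# dotted-decimal string (as in $srcip).
sub long_to_ip {
    my ($addr) = @_;
    return inet_ntoa(pack('N', $addr));
}

# Split an ICMP flow's dstport into (type, code):
# the high byte is the type, the low byte is the code.
sub icmp_type_code {
    my ($dstport) = @_;
    return (($dstport >> 8) & 0xff, $dstport & 0xff);
}

# Format epoch seconds using the format documented for $Cflow::localtime.
sub flow_localtime {
    my ($secs) = @_;
    return strftime('%Y/%m/%d %H:%M:%S', localtime($secs));
}

my $ip = long_to_ip(0xC0A80001);               # 192.168.0.1
my ($type, $code) = icmp_type_code(0x0301);    # type 3, code 1
my $stamp = flow_localtime(0);                 # depends on local time zone

print "$ip type=$type code=$code $stamp\n";
```

In practice the string forms ($srcip, $ICMPTypeCode, $localtime) are
already provided, so helpers like these are mainly useful for values
computed from the numeric fields, such as a masked network address.
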
Optionally, a reference to a perfile() function can be passed to
Cflow::find as the argument following the reference to the wanted()
function. This perfile() function will be called once for each flow
file. The argument to the perfile() function will be the name of the
flow file which is about to be processed. The purpose of the perfile()
function is to allow you to periodically report the progress of
Cflow::find() and to provide an opportunity to periodically reclaim
storage used by data objects that may have been allocated or maintained
by the wanted() function. For instance, when counting the number of
active host IP addresses in each time-stamped flow file, perfile() can
reset the counter to zero and clear the search tree or hash used to
remember those IP addresses.

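The host-counting case mentioned above might look like the following
sketch. So that it runs without flow files, a hypothetical scan()
routine stands in for Cflow::find(): it calls perfile() with each
made-up file name and then wanted() once for each made-up source
address in that file. Because perfile() is called before a file is
processed, it records the count for the previous file before clearing
the hash, and the last file is recorded after the scan ends.

```perl
use strict;
use warnings;

our $srcip;     # stand-in for the imported Cflow flow variable
my %hosts;      # source addresses seen in the current file
my $current;    # name of the file being processed
my @report;     # [file name, distinct host count] pairs

sub wanted {
    $hosts{$srcip} = 1;
    return 1;
}

# Called before each file: record the finished file, then reclaim storage.
sub perfile {
    my $fname = shift;
    push @report, [$current, scalar keys %hosts] if defined $current;
    %hosts   = ();
    $current = $fname;
}

# Hypothetical stand-in for Cflow::find(\&wanted, \&perfile, @files).
sub scan {
    my ($wanted, $perfile, %files) = @_;
    for my $fname (sort keys %files) {
        $perfile->($fname);
        for my $ip (@{ $files{$fname} }) {
            $srcip = $ip;
            $wanted->();
        }
    }
}

scan(\&wanted, \&perfile,
    'a.flows' => ['10.0.0.1', '10.0.0.2', '10.0.0.1'],
    'b.flows' => ['10.0.0.3'],
);
push @report, [$current, scalar keys %hosts];    # the last file

printf("%s: %d active hosts\n", @$_) for @report;
```

Clearing %hosts in perfile() keeps memory use bounded by the number of
hosts in a single file rather than in the whole run.
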
Since Cflow is an Exporter, you can request that all those scalar flow
variables be exported (so that you need not use the "Cflow::" prefix):

    use Cflow qw(:flowvars);

Also, you can request that the symbolic names for the TCP flags, ICMP
types, and/or ICMP codes be exported:

    use Cflow qw(:tcpflags :icmptypes :icmpcodes);

The tcpflags are:

    $TH_FIN $TH_SYN $TH_RST $TH_PUSH $TH_ACK $TH_URG

The icmptypes are:

    $ICMP_ECHOREPLY $ICMP_DEST_UNREACH $ICMP_SOURCE_QUENCH
    $ICMP_REDIRECT $ICMP_ECHO $ICMP_TIME_EXCEEDED
    $ICMP_PARAMETERPROB $ICMP_TIMESTAMP $ICMP_TIMESTAMPREPLY
    $ICMP_INFO_REQUEST $ICMP_INFO_REPLY $ICMP_ADDRESS
    $ICMP_ADDRESSREPLY

The icmpcodes are:

    $ICMP_NET_UNREACH $ICMP_HOST_UNREACH $ICMP_PROT_UNREACH
    $ICMP_PORT_UNREACH $ICMP_FRAG_NEEDED $ICMP_SR_FAILED
    $ICMP_NET_UNKNOWN $ICMP_HOST_UNKNOWN $ICMP_HOST_ISOLATED
    $ICMP_NET_ANO $ICMP_HOST_ANO $ICMP_NET_UNR_TOS
    $ICMP_HOST_UNR_TOS $ICMP_PKT_FILTERED $ICMP_PREC_VIOLATION
    $ICMP_PREC_CUTOFF $ICMP_UNREACH $ICMP_REDIR_NET
    $ICMP_REDIR_HOST $ICMP_REDIR_NETTOS $ICMP_REDIR_HOSTTOS
    $ICMP_EXC_TTL $ICMP_EXC_FRAGTIME

Please note that the names above are not necessarily exactly the same
as the names of the flags, types, and codes as set in the values of the
aforementioned $Cflow::TCPFlags and $Cflow::ICMPTypeCode flow variables.

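As a sketch of how the flag names might be used, the following tests a
$tcp_flags-style value for a SYN with no ACK and builds a symbolic
string from the set bits. The constants are defined locally with the
conventional TCP header bit values so that the example runs without
Cflow; it is an assumption that the module's $TH_* variables hold these
same values. The string built by the hypothetical flag_names() helper
is only in the spirit of $Cflow::TCPFlags, whose exact names and
ordering may differ, as noted above.

```perl
use strict;
use warnings;

# Conventional TCP header flag bits; assumed to match Cflow's $TH_* values.
my ($TH_FIN, $TH_SYN, $TH_RST, $TH_PUSH, $TH_ACK, $TH_URG) =
    (0x01, 0x02, 0x04, 0x08, 0x10, 0x20);

# True if the flow saw a SYN but never an ACK (e.g. an unanswered connect).
sub syn_only {
    my ($flags) = @_;
    return (($flags & $TH_SYN) && !($flags & $TH_ACK)) ? 1 : 0;
}

# Join the names of all set bits with "|" (hypothetical helper).
sub flag_names {
    my ($flags) = @_;
    my @table = (
        [FIN  => $TH_FIN],  [SYN => $TH_SYN], [RST => $TH_RST],
        [PUSH => $TH_PUSH], [ACK => $TH_ACK], [URG => $TH_URG],
    );
    return join('|', map { $_->[0] } grep { $flags & $_->[1] } @table);
}

print flag_names($TH_SYN | $TH_ACK | $TH_FIN), "\n";    # FIN|SYN|ACK
print syn_only($TH_SYN) ? "SYN only\n" : "not SYN only\n";
```

With ":flowvars" and ":tcpflags" imported, the same test inside a
wanted() function would read ($tcp_flags & $TH_SYN) && !($tcp_flags &
$TH_ACK).
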
Lastly, as is usually the case for modules, the subroutine names can be
imported, and a minimum version of Cflow can be specified:

    use Cflow qw(:flowvars find verbose 1.031);

Cflow::find() returns a "hit-ratio". This hit-ratio is a string
formatted similarly to that of the value of a perl hash when taken in a
scalar context. This hit-ratio indicates ((# of "wanted" flows) / (#
of scanned flows)). A flow is considered to have been "wanted" if your
wanted() function returns non-zero.

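Because the hit-ratio is a string rather than a number, it has to be
split before it can be used in arithmetic. A minimal sketch, assuming
the value has the form "wanted/scanned" (as the comparison to a hash in
scalar context suggests); the sample string is made up and
hit_percent() is a hypothetical helper:

```perl
use strict;
use warnings;

# Turn a "wanted/scanned" hit-ratio string into a percentage.
# Returns undef if the string is not in the assumed form or if
# nothing was scanned.
sub hit_percent {
    my ($ratio) = @_;
    return undef unless defined $ratio && $ratio =~ m{^(\d+)/(\d+)$};
    my ($hits, $scanned) = ($1, $2);
    return undef if $scanned == 0;
    return 100 * $hits / $scanned;
}

# In real use: my $ratio = Cflow::find(\&wanted, @files);
my $ratio = '25/200';
printf("%s flows wanted (%.1f%%)\n", $ratio, hit_percent($ratio));
```
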
Cflow::verbose() takes a single scalar boolean argument which indicates
whether or not you wish warning messages to be generated to STDERR when
"problems" occur. Verbose mode is set by default.

Here's a complete example with a sample wanted function. It will print
all UDP flows that involve either a source or destination port of 31337
and a port on the other end that is unreserved (1024 or greater):

    use Cflow qw(:flowvars find verbose);

    my $udp = getprotobyname('udp');    # protocol number for UDP
    verbose(0);                         # no warnings on STDERR
    find(\&wanted, @ARGV ? @ARGV : <*.flows*>);

    sub wanted {
        # skip flows in which either end uses a reserved port
        return if ($srcport < 1024 || $dstport < 1024);
        return unless (($srcport == 31337 || $dstport == 31337) &&
                       $udp == $protocol);

        printf("%s %15.15s.%-5hu %15.15s.%-5hu %2hu %10u %10u\n",
               $localtime,
               $srcip,
               $srcport,
               $dstip,
               $dstport,
               $protocol,
               $pkts,
               $bytes)
    }

Here's an example demonstrating a technique for passing arbitrary
arguments to your wanted function: pass a reference to an anonymous
subroutine as the wanted() function argument to Cflow::find():

    sub wanted {
        my @params = @_;
        # ...
    }

    Cflow::find(sub { wanted(@params) }, @files);

Argus uses a bidirectional flow model. This means that some argus
flows represent packets not only in the forward direction (from
"source" to "destination"), but also in the reverse direction (from the
so-called "destination" to the "source"). However, this module uses a
unidirectional flow model, and therefore splits some argus flows into
two unidirectional flows for the purpose of reporting.

Currently, using this module's API there is no way to determine if two
subsequently reported unidirectional flows were really a single argus
flow. This may be addressed in a future release of this package.

Furthermore, for argus flows which represent bidirectional ICMP
traffic, this module presumes that all the reverse packets were
ECHOREPLYs (sic). This is sometimes incorrect as described here:

    http://www.theorygroup.com/Archive/Argus/2002/msg00016.html

and will be fixed in a future release of this package.

Timestamps ($startime and $endtime) are sometimes reported incorrectly
for bidirectional argus flows that represent only one packet in each
direction. This will be fixed in a future release.

Argus flows sometimes contain information which does not map directly
to the flow variables presented by this module. For the time being,
this information is simply not accessible through this module's API.
This may be addressed in a future release.

Lastly, argus flows produced from observed traffic on a local ethernet
do not contain enough information to meaningfully set the values of all
this module's flow variables. For instance, the next-hop and
input/output ifIndex numbers are missing. For the time being, all
argus flows accessed through this module's API will have both the
$input_if and $output_if as 42. Although 42 is the answer to life,
the universe, and everything, in this context, it is just an arbitrary
number. It is important that $output_if is non-zero, however, since
existing FlowScan reports interpret an $output_if value of zero to mean
that the traffic represented by that flow was not forwarded (i.e.
dropped). For similar reasons, the $nexthopip for all argus flows is
reported as "127.0.0.1".

Currently, only NetFlow version 5 is supported when reading
cflowd-format raw flow files.

When built with support for flow-tools and attempting to read a cflowd
format raw flow file from standard input, you'll get the error:

    open "-": No such file or directory

For the time being, the workaround is to write the content to a file
and read it directly from there rather than from standard input.
(This happens because we can't close and re-open file descriptor zero
after determining that the content was not in flow-tools format.)

When built with support for flow-tools and using verbose mode,
Cflow::find will generate warnings if you process a cflowd format raw
flow file. This happens because it will first attempt to open the file
as a flow-tools format raw flow file (which will produce a warning
message), and then revert to handling it as a cflowd format raw flow
file.

Likewise, when built with support for argus and attempting to read a
cflowd format raw flow file from standard input, you'll get this
warning message:

    not Argus-2.0 data stream.

This is because argus (as of argus-2.0.4) doesn't seem to have a mode
in which such warning messages are suppressed.

The $Cflow::raw flow variable contains the flow record in cflowd
format, even if it was read from a raw flow file produced by flow-tools
or argus. Because cflowd discards the fractional portion of the flow
start and end time, only the whole seconds portion of these times will
be retained. (That is, the raw record in $Cflow::raw does not contain
the $start_msecs and $end_msecs, so using $Cflow::raw to convert to
cflowd format is a lossy operation.)

When used with cflowd, Cflow::find() will generate warnings if the flow
data file is "invalid" as far as it's concerned. To avoid this, you
must be using Cisco version 5 flow-export and configure cflowd so that
it saves all flow-export data. This is the default behavior when
cflowd produces time-stamped raw flow files after being patched as
described here:

    http://net.doit.wisc.edu/~plonka/cflowd/

The interface presented by this package is a blatant ripoff of
File::Find.

Dave Plonka <plonka@doit.wisc.edu>

Copyright (C) 1998-2005 Dave Plonka. This program is free software;
you can redistribute it and/or modify it under the terms of the GNU
General Public License as published by the Free Software Foundation;
either version 2 of the License, or (at your option) any later version.

The version number is the module file RCS revision number ($Revision:
1.53 $) with the minor number printed right justified with leading
zeroes to 3 decimal places. For instance, RCS revision 1.1 would yield
a package version number of 1.001.

This is so that revision 1.10 (which is version 1.010), for example,
will test greater than revision 1.2 (which is version 1.002) when you
want to require a minimum version of this module.

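The mapping from RCS revision to package version described above can be
sketched as follows. This illustrates the rule only and is not the
module's actual code; rcs_to_version() is a hypothetical name.

```perl
use strict;
use warnings;

# Convert an RCS revision such as "1.53" (or a whole '$Revision: 1.53 $'
# keyword string) to a version with a zero-padded, 3-digit minor number.
sub rcs_to_version {
    my ($revision) = @_;
    my ($major, $minor) = $revision =~ /(\d+)\.(\d+)/
        or die "no revision number in '$revision'\n";
    return sprintf('%d.%03d', $major, $minor);
}

print rcs_to_version('$Revision: 1.53 $'), "\n";    # 1.053
print rcs_to_version('1.1'), "\n";                  # 1.001

# Numeric comparison now orders revisions correctly: 1.10 > 1.2.
print "ok\n" if rcs_to_version('1.10') > rcs_to_version('1.2');
```

Without the zero padding, a plain numeric comparison would treat 1.10
as 1.1 and so rank it below 1.2.
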
perl(1), Socket, Net::Netmask, Net::Patricia.

perl v5.34.0 2021-07-22 Cflow(3)