Cflow(3)              User Contributed Perl Documentation             Cflow(3)

NAME

       Cflow::find - find "interesting" flows in raw IP flow files

SYNOPSIS

          use Cflow;

          Cflow::verbose(1);
          Cflow::find(\&wanted, <*.flows*>);

          sub wanted { ... }

       or:

          Cflow::find(\&wanted, \&perfile, <*.flows*>);

          sub perfile {
             my $fname = shift;
             ...
          }

BACKGROUND

       This module implements an API for processing IP flow accounting
       information that has been collected from routers and written into
       flow files by one of the flow collectors listed below.

       It was originally conceived and written for use by FlowScan:

          http://net.doit.wisc.edu/~plonka/FlowScan/

Flow File Sources

       This package is of little use on its own.  It requires input in the
       form of time-stamped raw flow files produced by other software
       packages.  These "flow sources" either snoop a local ethernet (via
       libpcap) or collect flow information from IP routers that are
       configured to export said information.  The following flow sources
       are supported:

       argus by Carter Bullard:
              http://www.qosient.com/argus/

       flow-tools by Mark Fullmer (with NetFlow v1, v5, v6, or v7):
              http://www.splintered.net/sw/flow-tools/

       CAIDA's cflowd (with NetFlow v5):
              http://www.caida.org/tools/measurement/cflowd/
              http://net.doit.wisc.edu/~plonka/cflowd/

       lfapd by Steve Premeau (with LFAPv4):
              http://www.nmops.org/

DESCRIPTION

       Cflow::find() will iterate across all the flows in the specified files.
       It will call your wanted() function once per flow record.  If the file
       name argument passed to find() is specified as "-", flows will be read
       from standard input.

       The wanted() function does whatever you write it to do.  For instance,
       it could simply print interesting flows or it might maintain byte,
       packet, and flow counters which could be written to a database after
       the find subroutine completes.

       Within your wanted() function, tests on the "current" flow can be
       performed using the following variables:

       $Cflow::unix_secs
           secs since epoch (deprecated)

       $Cflow::exporter
           Exporter IP Address as a host-ordered "long"

       $Cflow::exporterip
           Exporter IP Address as a dotted-decimal string

       $Cflow::localtime
           $Cflow::unix_secs interpreted as localtime with this strftime(3)
           format:

              %Y/%m/%d %H:%M:%S

       $Cflow::srcaddr
           Source IP Address as a host-ordered "long"

       $Cflow::srcip
           Source IP Address as a dotted-decimal string

       $Cflow::dstaddr
           Destination IP Address as a host-ordered "long"

       $Cflow::dstip
           Destination IP Address as a dotted-decimal string

       $Cflow::input_if
           Input interface index

       $Cflow::output_if
           Output interface index

       $Cflow::srcport
           TCP/UDP src port number or equivalent

       $Cflow::dstport
           TCP/UDP dst port number or equivalent

       $Cflow::ICMPType
           high byte of $Cflow::dstport

           Undefined if the current flow is not an ICMP flow.

       $Cflow::ICMPCode
           low byte of $Cflow::dstport

           Undefined if the current flow is not an ICMP flow.

       $Cflow::ICMPTypeCode
           symbolic representation of $Cflow::dstport

           The value is the type-specific ICMP code, if any, followed by the
           ICMP type.  E.g.

              ECHO
              HOST_UNREACH

           Undefined if the current flow is not an ICMP flow.

       $Cflow::pkts
           Packets sent in Duration

       $Cflow::bytes
           Octets sent in Duration

       $Cflow::nexthop
           Next hop router's IP Address as a host-ordered "long"

       $Cflow::nexthopip
           Next hop router's IP Address as a dotted-decimal string

       $Cflow::startime
           secs since epoch at start of flow

       $Cflow::start_msecs
           fractional portion of startime (in milliseconds)

           This will be zero unless the source is flow-tools or argus.

       $Cflow::endtime
           secs since epoch at last packet of flow

       $Cflow::end_msecs
           fractional portion of endtime (in milliseconds)

           This will be zero unless the source is flow-tools or argus.

       $Cflow::protocol
           IP protocol number (as is specified in /etc/protocols, i.e.
           1=ICMP, 6=TCP, 17=UDP, etc.)

       $Cflow::tos
           IP Type-of-Service

       $Cflow::tcp_flags
           bitwise OR of all TCP flags that were set within packets in the
           flow; 0x10 for non-TCP flows

       $Cflow::TCPFlags
           symbolic representation of $Cflow::tcp_flags.  The value will be a
           bitwise-or expression.  E.g.

              PUSH|SYN|FIN|ACK

           Undefined if the current flow is not a TCP flow.

       $Cflow::raw
           the entire "packed" flow record as read from the input file

           This is useful when the "wanted" subroutine wants to write the flow
           to another FILEHANDLE. E.g.:

              syswrite(FILEHANDLE, $Cflow::raw, length $Cflow::raw)

           Note that if you're using a modern version of perl that supports
           PerlIO Layers, be sure that FILEHANDLE is using something
           appropriate like the ":bytes" layer.  This can be activated on
           open, or with:

              binmode(FILEHANDLE, ":bytes");

           This will prevent the external LANG setting from causing perl to do
           such things as interpreting your raw flow records as UTF-8
           characters and corrupting the record.

       $Cflow::reraw
           the entire "re-packed" flow record formatted like $Cflow::raw.

           This is useful when the "wanted" subroutine wants to write a
           modified flow to another FILEHANDLE. E.g.:

              $srcaddr = my_encode($srcaddr);
              $dstaddr = my_encode($dstaddr);
              syswrite(FILEHANDLE, $Cflow::reraw, length $Cflow::raw)

           These flow variables are packed into $Cflow::reraw:

              $Cflow::index, $Cflow::exporter,
              $Cflow::srcaddr, $Cflow::dstaddr,
              $Cflow::input_if, $Cflow::output_if,
              $Cflow::srcport, $Cflow::dstport,
              $Cflow::pkts, $Cflow::bytes,
              $Cflow::nexthop,
              $Cflow::startime, $Cflow::endtime,
              $Cflow::protocol, $Cflow::tos,
              $Cflow::src_as, $Cflow::dst_as,
              $Cflow::src_mask, $Cflow::dst_mask,
              $Cflow::tcp_flags,
              $Cflow::engine_type, $Cflow::engine_id

           Note that if you're using a modern version of perl that supports
           PerlIO Layers, be sure that FILEHANDLE is using something
           appropriate like the ":bytes" layer.  This can be activated on
           open, or with:

              binmode(FILEHANDLE, ":bytes");

           This will prevent the external LANG setting from causing perl to do
           such things as interpreting your raw flow records as UTF-8
           characters and corrupting the record.

       $Cflow::Bps
           the minimum bytes per second for the current flow

       $Cflow::pps
           the minimum packets per second for the current flow

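       As a rough illustration of how the packed values above relate to
       one another, the following stand-alone sketch converts a
       host-ordered "long" into a dotted-decimal string and splits a
       port-style value into an ICMP type (high byte) and code (low byte).
       The helper names long2ip and icmp_type_code are hypothetical and
       are not part of Cflow; inside wanted() one would presumably pass
       $Cflow::srcaddr or $Cflow::dstport rather than the literal values
       used here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Cflow): render a host-ordered
# 32-bit integer as a dotted-decimal string.
sub long2ip {
   my $addr = shift;
   return join '.', map { ($addr >> $_) & 0xff } (24, 16, 8, 0);
}

# Hypothetical helper: split a port-style value into the ICMP type
# (high byte) and ICMP code (low byte), following the descriptions of
# $Cflow::ICMPType and $Cflow::ICMPCode.
sub icmp_type_code {
   my $port = shift;
   return (($port >> 8) & 0xff, $port & 0xff);
}

print long2ip(3232235777), "\n";            # 192.168.1.1
my ($type, $code) = icmp_type_code(0x0301);
print "type=$type code=$code\n";            # type=3 code=1
```

       In practice the module already provides $Cflow::srcip,
       $Cflow::ICMPType, and $Cflow::ICMPCode, so helpers like these are
       only needed for values the module does not decode for you.
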
       The following variables are undefined if using NetFlow v1 (which does
       not contain the requisite information):

       $Cflow::src_as
           originating or peer AS of source address

       $Cflow::dst_as
           originating or peer AS of destination address

       The following variables are undefined if using NetFlow v1 or LFAPv4
       (which do not contain the requisite information):

       $Cflow::src_mask
           source address prefix mask bits

       $Cflow::dst_mask
           destination address prefix mask bits

       $Cflow::engine_type
           type of flow switching engine

       $Cflow::engine_id
           ID of the flow switching engine

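       One possible use of the mask variables is to aggregate flows by
       network prefix.  The sketch below is stand-alone and makes some
       assumptions: prefix_of is a hypothetical helper (not part of
       Cflow), and the literal address and mask stand in for
       $Cflow::srcaddr and $Cflow::src_mask.  Since those variables may
       be undefined for some flow sources, the helper returns nothing in
       that case.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Cflow): given a host-ordered address
# and a prefix length in bits, return the network prefix in CIDR form.
# Returns an empty list if either value is undefined.
sub prefix_of {
   my ($addr, $bits) = @_;
   return unless defined $addr && defined $bits;
   my $mask = $bits ? (0xffffffff << (32 - $bits)) & 0xffffffff : 0;
   my $net  = $addr & $mask;
   my $ip   = join '.', map { ($net >> $_) & 0xff } (24, 16, 8, 0);
   return "$ip/$bits";
}

print prefix_of(3232235777, 24), "\n";      # 192.168.1.0/24
```
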
       Optionally, a reference to a perfile() function can be passed to
       Cflow::find as the argument following the reference to the wanted()
       function.  This perfile() function will be called once for each flow
       file.  The argument to the perfile() function will be the name of the
       flow file which is about to be processed.  The purpose of the perfile()
       function is to allow you to periodically report the progress of
       Cflow::find() and to provide an opportunity to periodically reclaim
       storage used by data objects that may have been allocated or maintained
       by the wanted() function.  For instance, when counting the number of
       active host IP addresses in each time-stamped flow file, perfile() can
       reset the counter to zero and clear the search tree or hash used to
       remember those IP addresses.

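       The host-counting pattern described above might look roughly like
       the following.  This is a stand-alone sketch: the loop at the
       bottom is a stand-in for Cflow::find(\&wanted, \&perfile, @files),
       the file names and addresses are made up, and wanted() takes the
       source address as an argument here only because the real flow
       variables are not available outside the module.

```perl
use strict;
use warnings;

my %seen;      # source addresses seen in the current file
my @report;    # one "file: count" entry per completed file
my $current;   # name of the file being processed

sub wanted {
   my $srcip = shift;   # with Cflow this would be $Cflow::srcip
   $seen{$srcip}++;
   return 1;
}

# Called before each file: report on the previous file, then reset.
sub perfile {
   my $fname = shift;
   flush_counts();
   $current = $fname;
}

sub flush_counts {
   push @report, sprintf("%s: %d", $current, scalar keys %seen)
      if defined $current;
   %seen = ();
}

# Stand-in driver with made-up data, in place of Cflow::find().
my %fake = (
   'a.flows' => [qw(10.0.0.1 10.0.0.2 10.0.0.1)],
   'b.flows' => [qw(10.0.0.3)],
);
for my $file (sort keys %fake) {
   perfile($file);
   wanted($_) for @{ $fake{$file} };
}
flush_counts();   # account for the last file

print "$_\n" for @report;
```

       Because perfile() is called before a file is processed rather than
       after it, the final flush_counts() call is needed to report on the
       last file.
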
       Since Cflow is an Exporter, you can request that all those scalar flow
       variables be exported (so that you need not use the "Cflow::" prefix):

          use Cflow qw(:flowvars);

       Also, you can request that the symbolic names for the TCP flags, ICMP
       types, and/or ICMP codes be exported:

          use Cflow qw(:tcpflags :icmptypes :icmpcodes);

       The tcpflags are:

          $TH_FIN $TH_SYN $TH_RST $TH_PUSH $TH_ACK $TH_URG

       The icmptypes are:

          $ICMP_ECHOREPLY     $ICMP_DEST_UNREACH $ICMP_SOURCE_QUENCH
          $ICMP_REDIRECT      $ICMP_ECHO         $ICMP_TIME_EXCEEDED
          $ICMP_PARAMETERPROB $ICMP_TIMESTAMP    $ICMP_TIMESTAMPREPLY
          $ICMP_INFO_REQUEST  $ICMP_INFO_REPLY   $ICMP_ADDRESS
          $ICMP_ADDRESSREPLY

       The icmpcodes are:

          $ICMP_NET_UNREACH  $ICMP_HOST_UNREACH $ICMP_PROT_UNREACH
          $ICMP_PORT_UNREACH $ICMP_FRAG_NEEDED  $ICMP_SR_FAILED
          $ICMP_NET_UNKNOWN  $ICMP_HOST_UNKNOWN $ICMP_HOST_ISOLATED
          $ICMP_NET_ANO      $ICMP_HOST_ANO     $ICMP_NET_UNR_TOS
          $ICMP_HOST_UNR_TOS $ICMP_PKT_FILTERED $ICMP_PREC_VIOLATION
          $ICMP_PREC_CUTOFF  $ICMP_UNREACH      $ICMP_REDIR_NET
          $ICMP_REDIR_HOST   $ICMP_REDIR_NETTOS $ICMP_REDIR_HOSTTOS
          $ICMP_EXC_TTL      $ICMP_EXC_FRAGTIME

       Please note that the names above are not necessarily exactly the same
       as the names of the flags, types, and codes as set in the values of the
       aforementioned $Cflow::TCPFlags and $Cflow::ICMPTypeCode flow
       variables.

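       The exported flag names are meant for bitwise tests against
       $tcp_flags.  The stand-alone sketch below defines its own copies
       using the standard TCP header bit values; it is an assumption, not
       verified here, that the module's $TH_* variables carry the same
       values.  The helper syn_only is hypothetical.

```perl
use strict;
use warnings;

# Standard TCP header flag bits.  The $TH_* variables exported by
# Cflow are assumed (not verified here) to have these values.
my ($TH_FIN, $TH_SYN, $TH_RST, $TH_PUSH, $TH_ACK, $TH_URG)
   = (0x01, 0x02, 0x04, 0x08, 0x10, 0x20);

# Hypothetical helper: true if SYN was set in some packet of the flow
# but ACK never was, which may indicate a connection attempt that was
# never completed by the sender.
sub syn_only {
   my $flags = shift;
   my $syn = $flags & $TH_SYN;
   my $ack = $flags & $TH_ACK;
   return ($syn && !$ack) ? 1 : 0;
}

print syn_only($TH_SYN), "\n";                        # 1
print syn_only($TH_SYN | $TH_ACK | $TH_FIN), "\n";    # 0
```

       Within wanted() such a test would presumably also check $protocol,
       since $tcp_flags is documented as 0x10 for non-TCP flows.
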
       Lastly, as is usually the case for modules, the subroutine names can be
       imported, and a minimum version of Cflow can be specified:

          use Cflow qw(:flowvars find verbose 1.031);

       Cflow::find() returns a "hit-ratio".  This hit-ratio is a string
       formatted similarly to that of the value of a perl hash when taken in
       a scalar context.  This hit-ratio indicates ((# of "wanted" flows) /
       (# of scanned flows)).  A flow is considered to have been "wanted" if
       your wanted() function returns non-zero.

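       If the hit-ratio is needed as numbers rather than as a string, it
       can be split apart.  The sketch below assumes the string has the
       form "wanted/scanned" (for example "25/200"), which is how the
       description above reads but is not confirmed here; parse_hit_ratio
       is a hypothetical helper, and the literal string stands in for the
       return value of Cflow::find().

```perl
use strict;
use warnings;

# Hypothetical helper: split a "wanted/scanned" string into its parts
# and compute the fraction of flows that were wanted.  Returns an
# empty list if the string does not have the assumed form.
sub parse_hit_ratio {
   my $ratio = shift;
   my ($hits, $total) = $ratio =~ m{^(\d+)/(\d+)$}
      or return;
   my $fraction = $total ? $hits / $total : 0;
   return ($hits, $total, $fraction);
}

# With Cflow the string would come from: my $ratio = Cflow::find(...);
my ($hits, $total, $fraction) = parse_hit_ratio('25/200');
printf "%d of %d flows wanted (%.1f%%)\n", $hits, $total, 100 * $fraction;
```
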
       Cflow::verbose() takes a single scalar boolean argument which indicates
       whether or not you wish warning messages to be generated to STDERR when
       "problems" occur.  Verbose mode is set by default.

EXAMPLES

       Here's a complete example with a sample wanted function.  It will print
       all UDP flows that involve either a source or destination port of 31337
       and a port on the other end that is unreserved (1024 or greater):

          use Cflow qw(:flowvars find verbose);

          my $udp = getprotobyname('udp');
          verbose(0);
          find(\&wanted, @ARGV ? @ARGV : <*.flows*>);

          sub wanted {
             return if ($srcport < 1024 || $dstport < 1024);
             return unless (($srcport == 31337 || $dstport == 31337) &&
                             $udp == $protocol);

             printf("%s %15.15s.%-5hu %15.15s.%-5hu %2hu %10u %10u\n",
                    $localtime,
                    $srcip,
                    $srcport,
                    $dstip,
                    $dstport,
                    $protocol,
                    $pkts,
                    $bytes);
          }

       Here's an example which demonstrates a technique which can be used to
       pass arbitrary arguments to your wanted function by passing a reference
       to an anonymous subroutine as the wanted() function argument to
       Cflow::find():

          sub wanted {
             my @params = @_;
             # ...
          }

          Cflow::find(sub { wanted(@params) }, @files);

ARGUS NOTES

       Argus uses a bidirectional flow model.  This means that some argus
       flows represent packets not only in the forward direction (from
       "source" to "destination"), but also in the reverse direction (from the
       so-called "destination" to the "source").  However, this module uses a
       unidirectional flow model, and therefore splits some argus flows into
       two unidirectional flows for the purpose of reporting.

       Currently, using this module's API there is no way to determine if two
       subsequently reported unidirectional flows were really a single argus
       flow.  This may be addressed in a future release of this package.

       Furthermore, for argus flows which represent bidirectional ICMP
       traffic, this module presumes that all the reverse packets were
       ECHOREPLYs (sic).  This is sometimes incorrect as described here:

          http://www.theorygroup.com/Archive/Argus/2002/msg00016.html

       and will be fixed in a future release of this package.

       Timestamps ($startime and $endtime) are sometimes reported incorrectly
       for bidirectional argus flows that represent only one packet in each
       direction.  This will be fixed in a future release.

       Argus flows sometimes contain information which does not map directly
       to the flow variables presented by this module.  For the time being,
       this information is simply not accessible through this module's API.
       This may be addressed in a future release.

       Lastly, argus flows produced from observed traffic on a local ethernet
       do not contain enough information to meaningfully set the values of all
       this module's flow variables.  For instance, the next-hop and
       input/output ifIndex numbers are missing.  For the time being, all
       argus flows accessed through this module's API will have both the
       $input_if and $output_if as 42.  Although 42 is the answer to life,
       the universe, and everything, in this context, it is just an arbitrary
       number.  It is important that $output_if is non-zero, however, since
       existing FlowScan reports interpret an $output_if value of zero to mean
       that the traffic represented by that flow was not forwarded (i.e.
       dropped).  For similar reasons, the $nexthopip for all argus flows is
       reported as "127.0.0.1".

BUGS

       Currently, only NetFlow version 5 is supported when reading
       cflowd-format raw flow files.

       When built with support for flow-tools and attempting to read a cflowd
       format raw flow file from standard input, you'll get the error:

          open "-": No such file or directory

       For the time being, the workaround is to write the content to a file
       and read it directly from there rather than from standard input.
       (This happens because we can't close and re-open file descriptor zero
       after determining that the content was not in flow-tools format.)

       When built with support for flow-tools and using verbose mode,
       Cflow::find will generate warnings if you process a cflowd format raw
       flow file.  This happens because it will first attempt to open the file
       as a flow-tools format raw flow file (which will produce a warning
       message), and then revert to handling it as a cflowd format raw flow
       file.

       Likewise, when built with support for argus and attempting to read a
       cflowd format raw flow file from standard input, you'll get this
       warning message:

          not Argus-2.0 data stream.

       This is because argus (as of argus-2.0.4) doesn't seem to have a mode
       in which such warning messages are suppressed.

       The $Cflow::raw flow variable contains the flow record in cflowd
       format, even if it was read from a raw flow file produced by
       flow-tools or argus.  Because cflowd discards the fractional portion
       of the flow start and end time, only the whole seconds portion of
       these times will be retained.  (That is, the raw record in
       $Cflow::raw does not contain the $start_msecs and $end_msecs, so
       using $Cflow::raw to convert to cflowd format is a lossy operation.)

       When used with cflowd, Cflow::find() will generate warnings if the flow
       data file is "invalid" as far as it's concerned.  To avoid this, you
       must be using Cisco version 5 flow-export and configure cflowd so that
       it saves all flow-export data.  This is the default behavior when
       cflowd produces time-stamped raw flow files after being patched as
       described here:

          http://net.doit.wisc.edu/~plonka/cflowd/

NOTES

       The interface presented by this package is a blatant ripoff of
       File::Find.

AUTHOR

       Dave Plonka <plonka@doit.wisc.edu>

       Copyright (C) 1998-2005  Dave Plonka.  This program is free software;
       you can redistribute it and/or modify it under the terms of the GNU
       General Public License as published by the Free Software Foundation;
       either version 2 of the License, or (at your option) any later version.

VERSION

       The version number is the module file RCS revision number ($Revision:
       1.53 $) with the minor number printed right justified with leading
       zeroes to 3 decimal places.  For instance, RCS revision 1.1 would yield
       a package version number of 1.001.

       This is so that revision 1.10 (which is version 1.010), for example,
       will test greater than revision 1.2 (which is version 1.002) when you
       want to require a minimum version of this module.

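       The mapping described above can be sketched as follows.  The
       helper rcs_to_version is hypothetical and only illustrates the
       arithmetic; it is not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper: map an RCS revision such as "1.53" to the
# package version described above (minor number zero-padded to
# three digits).
sub rcs_to_version {
   my $revision = shift;
   my ($major, $minor) = $revision =~ /^(\d+)\.(\d+)$/
      or die "unexpected revision: $revision\n";
   return sprintf '%d.%03d', $major, $minor;
}

print rcs_to_version('1.53'), "\n";    # 1.053
print rcs_to_version('1.1'),  "\n";    # 1.001

# A numeric comparison now orders the revisions as intended.
print "1.10 is newer than 1.2\n"
   if rcs_to_version('1.10') > rcs_to_version('1.2');
```
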

SEE ALSO

       perl(1), Socket, Net::Netmask, Net::Patricia.



perl v5.8.8                       2005-09-28                          Cflow(3)