Cflow(3)              User Contributed Perl Documentation             Cflow(3)

NAME

       Cflow::find - find "interesting" flows in raw IP flow files

SYNOPSIS

          use Cflow;

          Cflow::verbose(1);
          Cflow::find(\&wanted, <*.flows*>);

          sub wanted { ... }

       or:

          Cflow::find(\&wanted, \&perfile, <*.flows*>);

          sub perfile {
             my $fname = shift;
             ...
          }

BACKGROUND

       This module implements an API for processing IP flow accounting
       information which has been collected from routers and written into flow
       files by one of the various flow collectors listed below.

       It was originally conceived and written for use by FlowScan:

          http://net.doit.wisc.edu/~plonka/FlowScan/

Flow File Sources

       This package is of little use on its own.  It requires input in the
       form of time-stamped raw flow files produced by other software
       packages.  These "flow sources" either snoop a local ethernet (via
       libpcap) or collect flow information from IP routers that are
       configured to export said information.  The following flow sources are
       supported:

       argus by Carter Bullard:
              http://www.qosient.com/argus/

       flow-tools by Mark Fullmer (with NetFlow v1, v5, v6, or v7):
              http://www.splintered.net/sw/flow-tools/

       CAIDA's cflowd (with NetFlow v5):
              http://www.caida.org/tools/measurement/cflowd/
              http://net.doit.wisc.edu/~plonka/cflowd/

       lfapd by Steve Premeau (with LFAPv4):
              http://www.nmops.org/

DESCRIPTION

       Cflow::find() will iterate across all the flows in the specified files.
       It will call your wanted() function once per flow record.  If the file
       name argument passed to find() is specified as "-", flows will be read
       from standard input.

       The wanted() function does whatever you write it to do.  For instance,
       it could simply print interesting flows or it might maintain byte,
       packet, and flow counters which could be written to a database after
       the find subroutine completes.

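       As a minimal sketch of the counter approach just described, the
       following keeps per-protocol flow, packet, and byte totals and prints
       them once find() has returned.  The %totals hash and the tally()
       helper are hypothetical names, not part of Cflow, and the call to
       Cflow::find() is guarded so that the helper can also be run on its own:

```perl
use strict;
use warnings;

# Hypothetical accumulator: protocol number => { flows, pkts, bytes }.
my %totals;

# Add one flow's counts to the running totals for its protocol.
sub tally {
    my ($protocol, $pkts, $bytes) = @_;
    my $t = $totals{$protocol} ||= { flows => 0, pkts => 0, bytes => 0 };
    $t->{flows}++;
    $t->{pkts}  += $pkts;
    $t->{bytes} += $bytes;
    return 1;    # non-zero, so find() counts the flow as "wanted"
}

# Only scan flow files if some were named and Cflow is installed.
if (@ARGV && eval { require Cflow; 1 }) {
    no warnings 'once';
    Cflow::find(
        sub { tally($Cflow::protocol, $Cflow::pkts, $Cflow::bytes) },
        @ARGV,
    );
    for my $proto (sort { $a <=> $b } keys %totals) {
        printf "%3d %10d flows %12d pkts %14d bytes\n",
            $proto, @{ $totals{$proto} }{qw(flows pkts bytes)};
    }
}
```

       Writing the totals to a database instead of printing them would
       replace only the final loop.
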
       Within your wanted() function, tests on the "current" flow can be
       performed using the following variables:

       $Cflow::unix_secs
           secs since epoch (deprecated)

       $Cflow::exporter
           Exporter IP Address as a host-ordered "long"

       $Cflow::exporterip
           Exporter IP Address as a dotted-decimal string

       $Cflow::localtime
           $Cflow::unix_secs interpreted as localtime with this strftime(3)
           format:

              %Y/%m/%d %H:%M:%S

       $Cflow::srcaddr
           Source IP Address as a host-ordered "long"

       $Cflow::srcip
           Source IP Address as a dotted-decimal string

       $Cflow::dstaddr
           Destination IP Address as a host-ordered "long"

       $Cflow::dstip
           Destination IP Address as a dotted-decimal string

       $Cflow::input_if
           Input interface index

       $Cflow::output_if
           Output interface index

       $Cflow::srcport
           TCP/UDP src port number or equivalent

       $Cflow::dstport
           TCP/UDP dst port number or equivalent

       $Cflow::ICMPType
           high byte of $Cflow::dstport

           Undefined if the current flow is not an ICMP flow.

       $Cflow::ICMPCode
           low byte of $Cflow::dstport

           Undefined if the current flow is not an ICMP flow.

       $Cflow::ICMPTypeCode
           symbolic representation of $Cflow::dstport

           The value is the type-specific ICMP code, if any, followed by the
           ICMP type.  E.g.

              ECHO
              HOST_UNREACH

           Undefined if the current flow is not an ICMP flow.

       $Cflow::pkts
           Packets sent in Duration

       $Cflow::bytes
           Octets sent in Duration

       $Cflow::nexthop
           Next hop router's IP Address as a host-ordered "long"

       $Cflow::nexthopip
           Next hop router's IP Address as a dotted-decimal string

       $Cflow::startime
           secs since epoch at start of flow

       $Cflow::start_msecs
           fractional portion of startime (in milliseconds)

           This will be zero unless the source is flow-tools or argus.

       $Cflow::endtime
           secs since epoch at last packet of flow

       $Cflow::end_msecs
           fractional portion of endtime (in milliseconds)

           This will be zero unless the source is flow-tools or argus.

       $Cflow::protocol
           IP protocol number (as is specified in /etc/protocols, i.e.
           1=ICMP, 6=TCP, 17=UDP, etc.)

       $Cflow::tos
           IP Type-of-Service

       $Cflow::tcp_flags
           bitwise OR of all TCP flags that were set within packets in the
           flow; 0x10 for non-TCP flows

       $Cflow::TCPFlags
           symbolic representation of $Cflow::tcp_flags

           The value will be a bitwise-or expression.  E.g.

              PUSH|SYN|FIN|ACK

           Undefined if the current flow is not a TCP flow.

       $Cflow::raw
           the entire "packed" flow record as read from the input file

           This is useful when the "wanted" subroutine wants to write the flow
           to another FILEHANDLE. E.g.:

              syswrite(FILEHANDLE, $Cflow::raw, length $Cflow::raw)

           Note that if you're using a modern version of perl that supports
           PerlIO Layers, be sure that FILEHANDLE is using something
           appropriate like the ":bytes" layer.  This can be activated on
           open, or with:

              binmode(FILEHANDLE, ":bytes");

           This will prevent the external LANG setting from causing perl to do
           such things as interpreting your raw flow records as UTF-8
           characters and corrupting the record.

       $Cflow::reraw
           the entire "re-packed" flow record formatted like $Cflow::raw.

           This is useful when the "wanted" subroutine wants to write a
           modified flow to another FILEHANDLE. E.g.:

              $srcaddr = my_encode($srcaddr);
              $dstaddr = my_encode($dstaddr);
              syswrite(FILEHANDLE, $Cflow::reraw, length $Cflow::reraw)

           These flow variables are packed into $Cflow::reraw:

              $Cflow::index, $Cflow::exporter,
              $Cflow::srcaddr, $Cflow::dstaddr,
              $Cflow::input_if, $Cflow::output_if,
              $Cflow::srcport, $Cflow::dstport,
              $Cflow::pkts, $Cflow::bytes,
              $Cflow::nexthop,
              $Cflow::startime, $Cflow::endtime,
              $Cflow::protocol, $Cflow::tos,
              $Cflow::src_as, $Cflow::dst_as,
              $Cflow::src_mask, $Cflow::dst_mask,
              $Cflow::tcp_flags,
              $Cflow::engine_type, $Cflow::engine_id

           Note that if you're using a modern version of perl that supports
           PerlIO Layers, be sure that FILEHANDLE is using something
           appropriate like the ":bytes" layer.  This can be activated on
           open, or with:

              binmode(FILEHANDLE, ":bytes");

           This will prevent the external LANG setting from causing perl to do
           such things as interpreting your raw flow records as UTF-8
           characters and corrupting the record.

       $Cflow::Bps
           the minimum bytes per second for the current flow

       $Cflow::pps
           the minimum packets per second for the current flow

       The following variables are undefined if using NetFlow v1 (which does
       not contain the requisite information):

       $Cflow::src_as
           originating or peer AS of source address

       $Cflow::dst_as
           originating or peer AS of destination address

       The following variables are undefined if using NetFlow v1 or LFAPv4
       (which do not contain the requisite information):

       $Cflow::src_mask
           source address prefix mask bits

       $Cflow::dst_mask
           destination address prefix mask bits

       $Cflow::engine_type
           type of flow switching engine

       $Cflow::engine_id
           ID of the flow switching engine

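       The "high byte" and "low byte" relationship between $Cflow::dstport
       and the ICMP type and code can be sketched in plain Perl.  The
       icmp_type_code() helper below is a hypothetical name for illustration;
       Cflow itself already provides the decoded values in $Cflow::ICMPType
       and $Cflow::ICMPCode, so a helper like this is only of interest for
       port values that were saved elsewhere:

```perl
use strict;
use warnings;

# Split a 16-bit port value into (high byte, low byte).  For ICMP flows the
# variable list above describes these as the ICMP type and code.
sub icmp_type_code {
    my ($dstport) = @_;
    return (($dstport >> 8) & 0xff, $dstport & 0xff);
}

my ($type, $code) = icmp_type_code(0x0301);
printf "type=%d code=%d\n", $type, $code;    # prints "type=3 code=1"
```
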
       Optionally, a reference to a perfile() function can be passed to
       Cflow::find as the argument following the reference to the wanted()
       function.  This perfile() function will be called once for each flow
       file.  The argument to the perfile() function will be the name of the
       flow file which is about to be processed.  The purpose of the perfile()
       function is to allow you to periodically report the progress of
       Cflow::find() and to provide an opportunity to periodically reclaim
       storage used by data objects that may have been allocated or maintained
       by the wanted() function.  For instance, when counting the number of
       active host IP addresses in each time-stamped flow file, perfile() can
       reset the counter to zero and clear the search tree or hash used to
       remember those IP addresses.

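       The active-host example might look like the following sketch.  Because
       perfile() is called with the name of the file that is about to be
       processed, the count for a file is reported when the next file starts,
       and once more after find() returns.  The names %hosts, $current,
       report(), and note_hosts() are hypothetical, and the call to
       Cflow::find() is guarded so the helpers can be exercised without flow
       files:

```perl
use strict;
use warnings;

my %hosts;      # IP addresses seen in the file currently being scanned
my $current;    # name of that file; undef before the first one

# Print the host count for the file just finished, if any, then reset.
sub report {
    printf "%s: %d active hosts\n", $current, scalar keys %hosts
        if defined $current;
    %hosts = ();
}

# perfile() receives the name of the file that is about to be processed.
sub perfile {
    my $fname = shift;
    report();
    $current = $fname;
}

# Remember both endpoints of a flow.
sub note_hosts {
    my ($srcip, $dstip) = @_;
    $hosts{$_} = 1 for $srcip, $dstip;
    return 1;
}

if (@ARGV && eval { require Cflow; 1 }) {
    no warnings 'once';
    Cflow::find(sub { note_hosts($Cflow::srcip, $Cflow::dstip) },
                \&perfile, @ARGV);
    report();    # the last file has no following perfile() call
}
```
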
       Since Cflow is an Exporter, you can request that all those scalar flow
       variables be exported (so that you need not use the "Cflow::" prefix):

          use Cflow qw(:flowvars);

       Also, you can request that the symbolic names for the TCP flags, ICMP
       types, and/or ICMP codes be exported:

          use Cflow qw(:tcpflags :icmptypes :icmpcodes);

       The tcpflags are:

          $TH_FIN $TH_SYN $TH_RST $TH_PUSH $TH_ACK $TH_URG

       The icmptypes are:

          $ICMP_ECHOREPLY     $ICMP_DEST_UNREACH $ICMP_SOURCE_QUENCH
          $ICMP_REDIRECT      $ICMP_ECHO         $ICMP_TIME_EXCEEDED
          $ICMP_PARAMETERPROB $ICMP_TIMESTAMP    $ICMP_TIMESTAMPREPLY
          $ICMP_INFO_REQUEST  $ICMP_INFO_REPLY   $ICMP_ADDRESS
          $ICMP_ADDRESSREPLY

       The icmpcodes are:

          $ICMP_NET_UNREACH  $ICMP_HOST_UNREACH $ICMP_PROT_UNREACH
          $ICMP_PORT_UNREACH $ICMP_FRAG_NEEDED  $ICMP_SR_FAILED
          $ICMP_NET_UNKNOWN  $ICMP_HOST_UNKNOWN $ICMP_HOST_ISOLATED
          $ICMP_NET_ANO      $ICMP_HOST_ANO     $ICMP_NET_UNR_TOS
          $ICMP_HOST_UNR_TOS $ICMP_PKT_FILTERED $ICMP_PREC_VIOLATION
          $ICMP_PREC_CUTOFF  $ICMP_UNREACH      $ICMP_REDIR_NET
          $ICMP_REDIR_HOST   $ICMP_REDIR_NETTOS $ICMP_REDIR_HOSTTOS
          $ICMP_EXC_TTL      $ICMP_EXC_FRAGTIME

       Please note that the names above are not necessarily exactly the same
       as the names of the flags, types, and codes as set in the values of the
       aforementioned $Cflow::TCPFlags and $Cflow::ICMPTypeCode flow variables.

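       As a hedged illustration of testing flag bits, the sketch below picks
       out TCP flows in which a SYN was seen but no ACK.  It defines its own
       $TH_SYN and $TH_ACK with the conventional TCP header bit values; that
       the variables exported by the :tcpflags tag hold these same values is
       an assumption, and syn_only() is a hypothetical helper name:

```perl
use strict;
use warnings;

# Conventional TCP header flag bits.  It is an assumption here that the
# $TH_SYN and $TH_ACK exported by Cflow's :tcpflags tag hold these values.
my ($TH_SYN, $TH_ACK) = (0x02, 0x10);

# True for TCP flows in which SYN was seen but ACK never was.  The protocol
# test comes first because tcp_flags is documented as 0x10 for non-TCP flows.
sub syn_only {
    my ($protocol, $tcp_flags) = @_;
    return 0 unless $protocol == 6;    # 6 = TCP
    return (($tcp_flags & $TH_SYN) && !($tcp_flags & $TH_ACK)) ? 1 : 0;
}
```

       With "use Cflow qw(:flowvars :tcpflags);" in effect, the same test
       could be made inside wanted() on $protocol and $tcp_flags directly.
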
       Lastly, as is usually the case for modules, the subroutine names can be
       imported, and a minimum version of Cflow can be specified:

          use Cflow qw(:flowvars find verbose 1.031);

       Cflow::find() returns a "hit-ratio".  This hit-ratio is a string
       formatted similarly to that of the value of a perl hash when taken in a
       scalar context.  This hit-ratio indicates ((# of "wanted" flows) / (#
       of scanned flows)).  A flow is considered to have been "wanted" if your
       wanted() function returns non-zero.

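       If the hit-ratio string has the form "hits/scanned" (an assumption
       drawn from the description above, not a documented guarantee), it
       could be split into numbers as in this sketch.  parse_hit_ratio() is a
       hypothetical helper, not part of Cflow:

```perl
use strict;
use warnings;

# Split a "hits/scanned" string into its two counts and a percentage.
# Returns an empty list if the string does not have that shape.
sub parse_hit_ratio {
    my ($ratio) = @_;
    return unless defined $ratio && $ratio =~ m{^(\d+)/(\d+)$};
    my ($hits, $scanned) = ($1, $2);
    my $pct = $scanned ? 100 * $hits / $scanned : 0;
    return ($hits, $scanned, $pct);
}

my ($hits, $scanned, $pct) = parse_hit_ratio("25/200");
printf "%d of %d flows wanted (%.1f%%)\n", $hits, $scanned, $pct;
# prints "25 of 200 flows wanted (12.5%)"
```
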
       Cflow::verbose() takes a single scalar boolean argument which indicates
       whether or not you wish warning messages to be generated to STDERR when
       "problems" occur.  Verbose mode is set by default.

EXAMPLES

       Here's a complete example with a sample wanted function.  It will print
       all UDP flows that involve either a source or destination port of 31337
       and a port on the other end that is unreserved (1024 or greater):

          use Cflow qw(:flowvars find verbose);

          my $udp = getprotobyname('udp');
          verbose(0);
          find(\&wanted, @ARGV ? @ARGV : <*.flows*>);

          sub wanted {
             return if ($srcport < 1024 || $dstport < 1024);
             return unless (($srcport == 31337 || $dstport == 31337) &&
                             $udp == $protocol);

             printf("%s %15.15s.%-5hu %15.15s.%-5hu %2hu %10u %10u\n",
                    $localtime,
                    $srcip,
                    $srcport,
                    $dstip,
                    $dstport,
                    $protocol,
                    $pkts,
                    $bytes);
          }

       Here's an example which demonstrates a technique which can be used to
       pass arbitrary arguments to your wanted function by passing a reference
       to an anonymous subroutine as the wanted() function argument to
       Cflow::find():

          sub wanted {
             my @params = @_;
             # ...
          }

          Cflow::find(sub { wanted(@params) }, @files);

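       A variation on the anonymous-subroutine technique is to build the
       callback with a closure, so that each callback carries its own
       parameters.  The sketch below counts flows of at least a given size;
       make_wanted(), its arguments, and the 1,000,000-byte threshold are
       hypothetical choices for illustration:

```perl
use strict;
use warnings;

# Build a callback that remembers its own threshold and counter.
sub make_wanted {
    my ($min_bytes, $count_ref) = @_;
    return sub {
        my ($bytes) = @_;
        return 0 if $bytes < $min_bytes;
        $$count_ref++;
        return 1;    # non-zero marks the flow as "wanted"
    };
}

my $big    = 0;
my $wanted = make_wanted(1_000_000, \$big);

# Only scan flow files if some were named and Cflow is installed.
if (@ARGV && eval { require Cflow; 1 }) {
    no warnings 'once';
    my $ratio = Cflow::find(sub { $wanted->($Cflow::bytes) }, @ARGV);
    print "$big large flows, hit-ratio $ratio\n";
}
```
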

ARGUS NOTES

       Argus uses a bidirectional flow model.  This means that some argus
       flows represent packets not only in the forward direction (from
       "source" to "destination"), but also in the reverse direction (from the
       so-called "destination" to the "source").  However, this module uses a
       unidirectional flow model, and therefore splits some argus flows into
       two unidirectional flows for the purpose of reporting.

       Currently, using this module's API there is no way to determine if two
       subsequently reported unidirectional flows were really a single argus
       flow.  This may be addressed in a future release of this package.

       Furthermore, for argus flows which represent bidirectional ICMP
       traffic, this module presumes that all the reverse packets were
       ECHOREPLYs (sic).  This is sometimes incorrect as described here:

          http://www.theorygroup.com/Archive/Argus/2002/msg00016.html

       and will be fixed in a future release of this package.

       Timestamps ($startime and $endtime) are sometimes reported incorrectly
       for bidirectional argus flows that represent only one packet in each
       direction.  This will be fixed in a future release.

       Argus flows sometimes contain information which does not map directly
       to the flow variables presented by this module.  For the time being,
       this information is simply not accessible through this module's API.
       This may be addressed in a future release.

       Lastly, argus flows produced from observed traffic on a local ethernet
       do not contain enough information to meaningfully set the values of all
       this module's flow variables.  For instance, the next-hop and
       input/output ifIndex numbers are missing.  For the time being, all
       argus flows accessed through this module's API will have both the
       $input_if and $output_if as 42.  Although 42 is the answer to life,
       the universe, and everything, in this context, it is just an arbitrary
       number.  It is important that $output_if is non-zero, however, since
       existing FlowScan reports interpret an $output_if value of zero to mean
       that the traffic represented by that flow was not forwarded (i.e.
       dropped).  For similar reasons, the $nexthopip for all argus flows is
       reported as "127.0.0.1".

BUGS

       Currently, only NetFlow version 5 is supported when reading
       cflowd-format raw flow files.

       When built with support for flow-tools and attempting to read a cflowd
       format raw flow file from standard input, you'll get the error:

          open "-": No such file or directory

       For the time being, the workaround is to write the content to a file
       and read it directly from there rather than from standard input.
       (This happens because we can't close and re-open file descriptor zero
       after determining that the content was not in flow-tools format.)

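       One way to apply that workaround from within a script is to spool
       standard input to a temporary file and hand the file name to
       Cflow::find().  This is a sketch only: spool_to_tempfile() is a
       hypothetical helper, File::Temp is a core Perl module rather than part
       of Cflow, and the placeholder callback simply accepts every flow:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Copy everything from a filehandle into a temporary file and return the
# file's name.  The file is removed automatically when the program exits.
sub spool_to_tempfile {
    my ($in) = @_;
    my ($out, $fname) = tempfile(UNLINK => 1);
    binmode($in);     # flow records are binary data
    binmode($out);
    my $buf;
    while (read($in, $buf, 65536)) {
        print {$out} $buf;
    }
    close($out) or die "close $fname: $!";
    return $fname;
}

# Treat a lone "-" argument as a request to read standard input.
if (@ARGV && $ARGV[0] eq '-' && eval { require Cflow; 1 }) {
    my $ratio = Cflow::find(sub { 1 }, spool_to_tempfile(\*STDIN));
    print "hit-ratio: $ratio\n";
}
```
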
       When built with support for flow-tools and using verbose mode,
       Cflow::find will generate warnings if you process a cflowd format raw
       flow file.  This happens because it will first attempt to open the file
       as a flow-tools format raw flow file (which will produce a warning
       message), and then revert to handling it as a cflowd format raw flow
       file.

       Likewise, when built with support for argus and attempting to read a
       cflowd format raw flow file from standard input, you'll get this
       warning message:

          not Argus-2.0 data stream.

       This is because argus (as of argus-2.0.4) doesn't seem to have a mode
       in which such warning messages are suppressed.

       The $Cflow::raw flow variable contains the flow record in cflowd
       format, even if it was read from a raw flow file produced by flow-tools
       or argus.  Because cflowd discards the fractional portion of the flow
       start and end time, only the whole seconds portion of these times will
       be retained.  (That is, the raw record in $Cflow::raw does not contain
       the $start_msecs and $end_msecs, so using $Cflow::raw to convert to
       cflowd format is a lossy operation.)

       When used with cflowd, Cflow::find() will generate warnings if the flow
       data file is "invalid" as far as it's concerned.  To avoid this, you
       must be using Cisco version 5 flow-export and configure cflowd so that
       it saves all flow-export data.  This is the default behavior when
       cflowd produces time-stamped raw flow files after being patched as
       described here:

          http://net.doit.wisc.edu/~plonka/cflowd/

NOTES

       The interface presented by this package is a blatant ripoff of
       File::Find.

AUTHOR

       Dave Plonka <plonka@doit.wisc.edu>

       Copyright (C) 1998-2005  Dave Plonka.  This program is free software;
       you can redistribute it and/or modify it under the terms of the GNU
       General Public License as published by the Free Software Foundation;
       either version 2 of the License, or (at your option) any later version.

VERSION

       The version number is the module file RCS revision number ($Revision:
       1.53 $) with the minor number printed right justified with leading
       zeroes to 3 decimal places.  For instance, RCS revision 1.1 would yield
       a package version number of 1.001.

       This is so that revision 1.10 (which is version 1.010), for example,
       will test greater than revision 1.2 (which is version 1.002) when you
       want to require a minimum version of this module.

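       The mapping from RCS revision to package version described above can
       be reproduced with sprintf.  This is an illustrative sketch, not the
       module's own code, and rcs_to_version() is a hypothetical name:

```perl
use strict;
use warnings;

# Convert an RCS revision such as "1.53" into the zero-padded package
# version described above ("1.053").
sub rcs_to_version {
    my ($revision) = @_;
    my ($major, $minor) = $revision =~ /^(\d+)\.(\d+)$/
        or die "unexpected revision: $revision";
    return sprintf "%d.%03d", $major, $minor;
}

print rcs_to_version("1.53"), "\n";    # prints "1.053"
```

       Compared numerically, rcs_to_version("1.10") is then greater than
       rcs_to_version("1.2"), which is the ordering a minimum-version test
       needs.
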

SEE ALSO

       perl(1), Socket, Net::Netmask, Net::Patricia.

perl v5.28.1                      2005-09-28                          Cflow(3)