rabins(1)

RABINS(1)                   General Commands Manual                  RABINS(1)

NAME

       rabins - process argus(8) data within specified bins.

SYNOPSIS

       rabins [[-B secs] -M splitmode [options]] [raoptions]
              [-- filter-expression]

DESCRIPTION

       Rabins reads argus data from an argus data source and adjusts the data
       so that it is aligned to a set of bins, or slots, that are based on
       either time, input size, or count. The resulting output is split,
       modified, and optionally aggregated so that the data fits the
       constraints of the specified bins. rabins is designed to be a
       combination of rasplit(1) and racluster(1), acting on multiple
       contexts of argus data.

       The principal function of rabins is to align input data to a series of
       bins, and then process the data within the context of each bin. This
       is the basis for real-time stream block processing. Time series stream
       block processing is critical for graphing, comparing, analyzing, and
       correlating flow data. Fixed load stream block processing, based on
       the number of argus data records ('count') or a fixed volume of data
       ('size'), allows for control of the resources used in processing.
       While load based options are very useful, they are rather esoteric.
       See the online examples and rasplit(1) for examples of using these
       modes of operation.

Time Series Bins

       Time series binning is specified using the -M time option. Time bins
       are specified by the size and granularity of the time bin. The
       granularity, 's'econds, 'm'inutes, 'h'ours, 'd'ays, 'w'eeks, 'M'onths,
       and 'y'ears, dictates where the bin boundaries lie. To ensure that
       0.5d and 12h start on the same point in time, second, minute, hour,
       and day based bins start at midnight, Jan 1st of the year of
       processing. Week, month, and year bins all start on the natural time
       boundaries for the period.

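       The boundary rule for fixed-length bins can be illustrated with a
       short Python sketch. It is not taken from the rabins source: the
       names parse_bin_size and bin_start are hypothetical, only the s, m,
       h, and d granularities are handled, and timestamps are treated as
       naive local times.

```python
from datetime import datetime, timedelta

# Seconds per unit, for the fixed-length granularities only.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_bin_size(spec):
    """Turn a spec such as '12h' or '0.5d' into a bin length in seconds."""
    value, unit = float(spec[:-1]), spec[-1]
    return value * UNIT_SECONDS[unit]

def bin_start(ts, spec):
    """Start of the bin containing ts, counting bins from Jan 1st 00:00."""
    size = parse_bin_size(spec)
    origin = datetime(ts.year, 1, 1)
    elapsed = (ts - origin).total_seconds()
    return origin + timedelta(seconds=(elapsed // size) * size)

ts = datetime(2012, 2, 15, 13, 37, 42)
print(bin_start(ts, "12h"))   # 2012-02-15 12:00:00
print(bin_start(ts, "0.5d"))  # 2012-02-15 12:00:00, same boundary as 12h
print(bin_start(ts, "10s"))   # 2012-02-15 13:37:40
```

       Because every bin is counted from the same origin, two specifications
       of equal length always produce the same boundaries.
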
       rabins provides a separate processing context for each bin, so that
       aggregation and sorting occur only within the context of each time
       period. Records are placed into bins based on load or time. For load
       based bins, input records are processed in received order and are not
       modified. When using time based bins, records are placed into bins
       based on the starting time of the record. By default, records that
       span a time boundary are split into as many records as needed to fit
       within the bin boundaries, using the algorithms used by rasplit(1).
       Metrics are distributed uniformly across all the appropriate bins.
       The result is a series of records and/or record fragments that are
       time aligned, appropriate for time series analysis and visualization.

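       As a rough illustration of uniform distribution, the following Python
       sketch splits one record at bin boundaries and shares a byte count in
       proportion to the time each fragment covers. The function name
       split_record and the tuple layout are assumptions made for this
       example; the actual rasplit(1) algorithms handle many more metrics
       and may round differently.

```python
def split_record(start, end, byte_count, bin_size):
    """Split [start, end) at multiples of bin_size.

    Times are in seconds. Each fragment receives a share of byte_count
    proportional to the time it covers (uniform distribution).
    """
    duration = end - start
    if duration <= 0:
        # Zero-length records cannot be split; return them unchanged.
        return [(start, end, byte_count)]
    pieces = []
    cursor = start
    while cursor < end:
        boundary = (cursor // bin_size + 1) * bin_size  # next bin edge
        piece_end = min(boundary, end)
        share = byte_count * (piece_end - cursor) / duration
        pieces.append((cursor, piece_end, share))
        cursor = piece_end
    return pieces

# A 20 second record carrying 2000 bytes that crosses a 10 minute boundary.
print(split_record(595, 615, 2000, 600))
# [(595, 600, 500.0), (600, 615, 1500.0)]
```

       Note that the fragments keep the original start and end times where
       they fall inside a bin, which is the default (non-hard) behavior
       described in this section.
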
       When a record is split to conform to a time series bin, the resulting
       starting and ending timestamps may or may not coincide with the
       timestamps of the bins themselves. For some applications this
       treatment is critical to the analytics that work on the resulting
       data, such as transaction duration and flow traffic burst behavior.
       However, for other analytics, like average load and rate analysis and
       reporting, the timestamps need to be modified so that they reflect the
       time range of the actual time bin boundaries. Rabins supports the
       optional hard option (-M hard) to specify that timestamps should
       conform to bin boundaries. One result is that all durations in the
       reported records will be the bin duration. This is extremely
       important when processing certain time series metrics, like load.

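       The effect of the hard option on a load calculation can be sketched
       in Python. The helpers harden and load_bps are hypothetical names
       used only for this illustration, and load is taken here to mean bits
       per second over the record duration.

```python
def harden(piece, bin_size):
    """Replace a fragment's timestamps with its bin's boundaries."""
    start, end, byte_count = piece
    bin_begin = (start // bin_size) * bin_size
    return (bin_begin, bin_begin + bin_size, byte_count)

def load_bps(piece):
    """Average load in bits per second over the record's duration."""
    start, end, byte_count = piece
    return byte_count * 8 / (end - start)

fragment = (595, 600, 500)        # 5 seconds, 500 bytes, in bin [0, 600)
print(load_bps(fragment))         # 800.0, load over the fragment itself
hard = harden(fragment, 600)
print(hard)                       # (0, 600, 500)
print(round(load_bps(hard), 2))   # 6.67, load averaged over the whole bin
```

       With hard timestamps every record in a bin has the same duration, so
       per-record loads within a bin can be summed directly.
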

Load Based Bins

       Load based binning is specified using the -M size or -M count options.
       Load bins are used to constrain the resources used in bin processing.
       Input is read and aggregated until a threshold is reached, at which
       point the entire aggregation cache is dumped, reinitialized, and
       reused. These modes can be used effectively to provide real-time data
       reduction within a fixed amount of memory.

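       A minimal Python sketch of the count based cycle follows. It assumes
       records are dictionaries and that aggregation simply sums a "bytes"
       field per key; count_bins is a hypothetical name, and the real
       aggregation performed by rabins merges many more metrics.

```python
from collections import defaultdict

def count_bins(records, key, limit):
    """Aggregate records by key, dumping the cache every `limit` records."""
    cache = defaultdict(int)
    seen = 0
    for rec in records:
        cache[rec[key]] += rec["bytes"]
        seen += 1
        if seen == limit:
            yield dict(cache)   # dump the aggregation cache
            cache.clear()       # reinitialize and reuse it
            seen = 0
    if cache:
        yield dict(cache)       # flush the final partial bin at EOF

records = [
    {"proto": "tcp", "bytes": 100},
    {"proto": "udp", "bytes": 50},
    {"proto": "tcp", "bytes": 25},
    {"proto": "udp", "bytes": 10},
]
print(list(count_bins(records, "proto", 3)))
# [{'tcp': 125, 'udp': 50}, {'udp': 10}]
```

       The cache never holds more than `limit` input records' worth of keys,
       which is what bounds memory use.
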

Output Processing

       rabins has two basic modes of output. The default holds all output in
       main memory until EOF is encountered on input, at which point each
       sorted bin is written out. In the second output mode, rabins writes
       out the contents of individual sorted bins periodically, based on a
       holding time specified using the -B secs option. The secs value
       should be chosen such that rabins will have seen all the appropriate
       incoming data for that time period. This is determined by the
       ARGUS_FLOW_STATUS_INTERVAL used by the collection of argus data
       sources in the input data stream, as well as any time drift that may
       exist among argus data processing elements. When there is good time
       sync, and with an ARGUS_FLOW_STATUS_INTERVAL of 5 seconds,
       appropriate secs values are between 5 and 15 seconds.

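       The holding time rule can be pictured with a small Python sketch. It
       assumes a bin becomes eligible for output once its end time plus the
       holding time has passed; ready_bins is a hypothetical name and the
       internal scheduling in rabins may differ.

```python
def ready_bins(open_bins, now, bin_size, hold):
    """Return start times of bins whose end plus the hold time has passed.

    open_bins holds bin start times; all values are in seconds.
    """
    return sorted(b for b in open_bins if b + bin_size + hold <= now)

# 10 second bins held for 5 more seconds, checked at t = 26.
print(ready_bins({0, 10, 20}, now=26, bin_size=10, hold=5))
# [0, 10]
```

       The bin starting at 20 ends at 30 and is held until 35, so it stays
       open and can still absorb late records.
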
       The output of rabins when using the -B secs option is appropriate to
       drive a number of processing elements, such as near real-time
       visualizations, alarming, and reporting.

Output Stream

       Like all ra(1) client programs, the output of rabins is an argus data
       stream that can be written as binary data to a file or standard
       output, or can be printed. rabins supports all the output functions
       provided by rasplit(1).

       The output file name consists of a prefix, which is specified using
       the -w ra option, and, for all modes except time mode, a suffix, which
       is created for each resulting file. If no prefix is provided, rabins
       will use 'x' as the default prefix. The suffix that is used is
       determined by the mode of operation. When rabins is using the default
       count mode or the size mode, the suffix is a group of letters 'aa',
       'ab', and so on, such that concatenating the output files in sorted
       order by file name produces the original input file. If rabins needs
       to create more output files than are allowed by the default suffix
       strategy, more letters will be added to accommodate the needed files.

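       The two-letter suffix sequence can be generated as in the Python
       sketch below. It assumes the suffix is appended directly to the
       prefix, as split(1) does, and it only covers a fixed suffix width;
       how rabins adds letters once the sequence is exhausted is not
       modeled. The name suffixes is hypothetical.

```python
from itertools import islice, product
from string import ascii_lowercase

def suffixes(width=2):
    """Yield 'aa', 'ab', ..., 'zz' (for width 2) in sorted order."""
    for letters in product(ascii_lowercase, repeat=width):
        yield "".join(letters)

names = ["x" + s for s in islice(suffixes(), 3)]
print(names)                   # ['xaa', 'xab', 'xac']
print(len(list(suffixes())))   # 676 names before more letters are needed
```

       Because the names are produced in lexical order, sorting the output
       files by name recovers the order in which they were written.
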
       When rabins is splitting based on time, rabins uses a default
       extension of %Y.%m.%d.%H.%M.%S. This default can be overridden by
       adding a '%' extension to the name provided using the -w option.

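       The way a '%' template expands can be previewed with Python's
       time.strftime, which accepts the same conversion specifiers as
       strftime(3). This is only a sketch: output_name is a hypothetical
       helper, and UTC is used here to keep the result reproducible,
       whereas rabins may format times in the local time zone.

```python
import calendar
import time

def output_name(template, bin_start_epoch):
    """Expand strftime fields in template using the bin start time (UTC)."""
    return time.strftime(template, time.gmtime(bin_start_epoch))

# Bin starting at 2012-02-15 13:35:00 UTC.
start = calendar.timegm((2012, 2, 15, 13, 35, 0))
print(output_name("/matrix/%Y/%m/%d/argus.%H.%M.%S", start))
# /matrix/2012/02/15/argus.13.35.00
```
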
       When standard out is specified, using -w -, rabins will output a single
       argus stream with START and STOP argus management records inserted
       appropriately to indicate where the output is split. See argus(8) for
       more information on output stream formats.

       When rabins is splitting on output record count (the default), the
       number of records is specified as an ordinal counter; the default is
       1000 records. When rabins is splitting based on the maximum output
       file size, the size is specified in bytes. The scale of the bytes can
       be specified by appending 'b', 'k', or 'm' to the number provided.

       When rabins is splitting based on time, the time period is specified
       with the -M time option, and can be any period based in seconds (s),
       minutes (m), hours (h), days (d), weeks (w), months (M) or years (y).
       Rabins will create and modify records as required to split on
       prescribed time boundaries. If any record spans a time boundary, the
       record is split and the metrics are adjusted using a uniform
       distribution model to distribute the statistics between the resulting
       records.

       See rasplit(1) for specifics.

RABINS SPECIFIC OPTIONS

       rabins, like all ra based clients, supports a number of ra options
       including remote data access, reading from multiple files, and
       filtering of input argus records through a terminating filter
       expression. Rabins also provides all the functions of racluster(1)
       and rasplit(1) for processing and outputting data. rabins specific
       options are:

       -B secs
            Holding time in seconds before closing a bin and outputting its
            contents.

       -M splitmode
            Supported splitting modes are:

              time <n[smhdwMy]>
                   bin records into time slots of n size. This is used for
                   time series analytics, especially graphing. Records, by
                   default, are split so that their timestamps do not span
                   the time range specified. Metrics are uniformly
                   distributed among the resulting records.

              count <n[kmb]>
                   bin records into chunks based on the number of records.
                   This is used for archive management and parallel
                   processing analytics, to limit the size of data processing
                   to fixed numbers of records.

              size <n[kmb]>
                   bin records into chunks based on the number of total bytes.
                   This is used for archive management and parallel
                   processing analytics, to limit the size of data processing
                   to fixed byte limitations.

       -M modes
            Supported processing modes are:
              hard
                   Split on hard time boundaries. Each flow record's start
                   and stop times will be the time boundary times. The
                   default is to use the original start and stop timestamps
                   from the records that make up the resulting aggregation.
              nomodify
                   Do not split the record when including it in a time bin.
                   This allows a time bin to represent times outside of its
                   definition. This option should not be used with the 'hard'
                   option, as doing so will modify metrics and semantics.
       -m aggregation object
            Supported aggregation objects are:
              none           use a null flow key.
              srcid          argus source identifier.
              smac           source mac(ether) addr.
              dmac           destination mac(ether) addr.
              soui           oui portion of the source mac(ether) addr.
              doui           oui portion of the destination mac(ether) addr.
              smpls          source mpls label.
              dmpls          destination mpls label.
              svlan          source vlan label.
              dvlan          destination vlan label.
              saddr/[l|m]    source IP addr/[cidr len | m.a.s.k].
              daddr/[l|m]    destination IP addr/[cidr len | m.a.s.k].
              matrix/l       sorted src and dst IP addr/cidr len.
              proto          transaction protocol.
              sport          source port number. Implies use of 'proto'.
              dport          destination port number. Implies use of 'proto'.
              stos           source TOS byte value.
              dtos           destination TOS byte value.
              sttl           src -> dst TTL value.
              dttl           dst -> src TTL value.
              stcpb          src -> dst TCP base sequence number.
              dtcpb          dst -> src TCP base sequence number.
              inode[/[l|m]]  intermediate node IP addr/[cidr len | m.a.s.k],
                             source of ICMP mapped events.
              sco            source ARIN country code, if present.
              dco            destination ARIN country code, if present.
              sas            source node origin AS number, if available.
              das            destination node origin AS number, if available.
              ias            intermediate node origin AS number, if available.

       -P sort field
            Rabins can sort its output based on a sort field specification.
            Because the -m option is used for aggregation fields, -P is used
            to specify the print priority order. See rasort(1) for the list
            of sortable fields.

       -w filename
            Rabins supports an extended -w option that allows output record
            contents to be inserted into the output filename. Specified
            using '$' (dollar) notation, any printable field can be used.
            Care should be taken to honor any shell escape requirements when
            specifying on the command line. See ra(1) for the list of
            printable fields.

            As another extended feature, when using time mode, rabins will
            process the supplied filename using strftime(3), so that time
            fields can be inserted into the resulting output filename.

INVOCATION

       This invocation aggregates inputfile based on 10 minute time
       boundaries. Input is split to fit within a 10 minute time boundary,
       and within those boundaries argus records are aggregated. The
       resulting output is streamed to a single file.

          rabins -r * -M time 10m -w outputfile

       This next invocation aggregates inputfiles based on 5 minute time
       boundaries, and the output is written to 5 minute files. Input is
       split such that all records conform to hard 5 minute time boundaries,
       and within those boundaries argus records are aggregated, in this
       case based on IP address matrix. The resulting output is streamed to
       files that are named relative to the records' output content, with a
       prefix of /matrix/%Y/%m/%d/argus. and the suffix %H.%M.%S.

          rabins -r * -M hard time 5m -m matrix -w "/matrix/%Y/%m/%d/argus.%H.%M.%S"

       This next invocation aggregates input.stream based on matrix/24 into 10
       second time boundaries, holds the data for an additional 5 seconds
       after the time boundary has passed, and then prints the complete sorted
       contents of each bin to standard output. The output is printed at 10
       second intervals, and the output is the content of the previous 10
       second time bin. This example is meant to provide, every 10 seconds,
       the summary of all Class C subnet activity seen. It is intended to
       run indefinitely, printing out aggregated summary records. By
       modifying the aggregation model, using the "-f racluster.conf" option,
       you can achieve a great deal of data reduction with a lot of semantic
       reporting.

       % rabins -S localhost -m matrix/24 -B 5s -M hard time 10s -p0 -s +1trans - ipv4
                  StartTime  Trans  Proto            SrcAddr   Dir            DstAddr  SrcPkts  DstPkts     SrcBytes     DstBytes State
        2012/02/15.13:37:00      5     ip     192.168.0.0/24   <->     192.168.0.0/24       41       40         2860        12122   CON
        2012/02/15.13:37:00      2     ip     192.168.0.0/24    ->       224.0.0.0/24        2        0          319            0   INT
       [ 10 seconds pass]
        2012/02/15.13:37:10     13     ip     192.168.0.0/24   <->    208.59.201.0/24      269      351        97886       398700   CON
        2012/02/15.13:37:10     14     ip     192.168.0.0/24   <->     192.168.0.0/24       86       92         7814        46800   CON
        2012/02/15.13:37:10      1     ip    17.172.224.0/24   <->     192.168.0.0/24       52       37        68125         4372   CON
        2012/02/15.13:37:10      1     ip     192.168.0.0/24   <->      199.7.55.0/24        7        7          784         2566   CON
        2012/02/15.13:37:10      1     ip     184.85.13.0/24   <->     192.168.0.0/24        6        5         3952         2204   CON
        2012/02/15.13:37:10      2     ip    66.235.132.0/24   <->     192.168.0.0/24        5        6          915         3732   CON
        2012/02/15.13:37:10      1     ip    74.125.226.0/24   <->     192.168.0.0/24        3        4          709          888   CON
        2012/02/15.13:37:10      3     ip       66.39.3.0/24   <->     192.168.0.0/24        3        3          369          198   CON
        2012/02/15.13:37:10      1     ip     192.168.0.0/24   <->     205.188.1.0/24        1        1           54          356   CON
       [ 10 seconds pass]
        2012/02/15.13:37:20      6     ip     192.168.0.0/24   <->    208.59.201.0/24      392      461        60531       623894   CON
        2012/02/15.13:37:20      8     ip     192.168.0.0/24   <->     192.168.0.0/24       95      111         6948        93536   CON
        2012/02/15.13:37:20      3     ip     72.14.204.0/24   <->     192.168.0.0/24       38       32        38568         4414   CON
        2012/02/15.13:37:20      1     ip    17.112.156.0/24   <->     192.168.0.0/24       26       13        21798         7116   CON
        2012/02/15.13:37:20      2     ip    66.235.132.0/24   <->     192.168.0.0/24        6        3         1232         4450   CON
        2012/02/15.13:37:20      1     ip    66.235.133.0/24   <->     192.168.0.0/24        1        2           82          132   CON
       [ 10 seconds pass]
        2012/02/15.13:37:30    117     ip     192.168.0.0/24   <->    208.59.201.0/24      697      663       369769       134382   CON
        2012/02/15.13:37:30     11     ip     192.168.0.0/24   <->     192.168.0.0/24      147      187        11210       193253   CON
        2012/02/15.13:37:30      1     ip     184.85.13.0/24   <->     192.168.0.0/24       13        9        13408         9031   CON
        2012/02/15.13:37:30      2     ip    66.235.132.0/24   <->     192.168.0.0/24        8        7         1920        11563   CON
        2012/02/15.13:37:30      1     ip     192.168.0.0/24   <->    207.46.193.0/24        5        3          802          562   CON
        2012/02/15.13:37:30      1     ip    17.112.156.0/24   <->     192.168.0.0/24        5        2          646         3684   CON
        2012/02/15.13:37:30      2     ip     192.168.0.0/24    ->       224.0.0.0/24        2        0          382            0   REQ
       [ 10 seconds pass]

       This next invocation reads IP argus(8) data from argusfile and
       processes the argus(8) data stream in bins based on an input byte
       size of no greater than 1 megabyte. The resulting output stream is
       written to a single argus.out data file.

          rabins -r argusfile -M size 1m -s +1dur -m proto -w argus.out - ip

       This invocation reads IP argus(8) data from argusfile and aggregates
       the argus(8) data stream in bins of no more than 1K flows. The
       resulting output stream is printed to the screen as standard argus
       records.

          rabins -r argusfile -M count 1k -m proto -s stime dur proto spkts dpkts - ip

COPYRIGHT

       Copyright (c) 2000-2016 QoSient. All rights reserved.

AUTHORS

       Carter Bullard (carter@qosient.com).

rabins 3.0.8                    12 August 2003                       RABINS(1)