RABINS(1)                  General Commands Manual                 RABINS(1)

NAME
rabins - process argus(8) data within specified bins.

SYNOPSIS
rabins [-B secs] -M splitmode [options] [raoptions] [-- filter-expression]

DESCRIPTION
Rabins reads argus data from an argus-data source and adjusts the data
so that it is aligned to a set of bins, or slots, that are based on
either time, input size, or record count. The resulting output is
split, modified, and optionally aggregated so that the data fits the
constraints of the specified bins. rabins is designed to be a
combination of rasplit(1) and racluster(1), acting on multiple contexts
of argus data.

The principal function of rabins is to align input data to a series of
bins and then process the data within the context of each bin. This is
the basis for real-time stream block processing. Time series stream
block processing is critical for graphing, comparing, analyzing, and
correlating flow data. Fixed load stream block processing, based on the
number of argus data records ('count') or a fixed volume of data
('size'), allows the resources used in processing to be controlled.
While the load based options are very useful, they are rather esoteric.
See the online examples and rasplit(1) for examples of using these
modes of operation.

Time series binning is specified using the -M time option. Time bins
are specified by the size and granularity of the bin. The granularity,
's'econds, 'm'inutes, 'h'ours, 'd'ays, 'w'eeks, 'M'onths, or 'y'ears,
dictates where the bin boundaries lie. To ensure that 0.5d and 12h bins
start at the same point in time, second, minute, hour, and day based
bins start at midnight, Jan 1st of the year of processing. Week, month,
and year bins start on the natural time boundaries for their period.

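The alignment rule above can be illustrated with a small Python sketch.
It assumes UTC for the Jan 1st origin (this page does not say which time
zone rabins uses), handles only the fixed-length granularities s, m, h
and d, and uses hypothetical helper names (bin_width, bin_start); it is
not the rabins implementation.

```python
from datetime import datetime, timedelta, timezone

# Seconds per unit for the fixed-length granularities only; weeks, months
# and years follow calendar boundaries and are not modelled here.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def bin_width(spec: str) -> float:
    """Convert a bin spec such as '10m' or '0.5d' into seconds."""
    value, unit = float(spec[:-1]), spec[-1]
    return value * UNIT_SECONDS[unit]


def bin_start(ts: datetime, spec: str) -> datetime:
    """Return the start of the bin that contains ts.

    Bins are counted from midnight, Jan 1st of ts's year (UTC is assumed
    here), so equal-width specs such as '0.5d' and '12h' share boundaries.
    """
    origin = datetime(ts.year, 1, 1, tzinfo=timezone.utc)
    width = bin_width(spec)
    index = int((ts - origin).total_seconds() // width)
    return origin + timedelta(seconds=index * width)


ts = datetime(2012, 2, 15, 13, 37, 7, tzinfo=timezone.utc)
print(bin_start(ts, "10s"))                           # 2012-02-15 13:37:00+00:00
print(bin_start(ts, "0.5d") == bin_start(ts, "12h"))  # True
```

Counting bins from a fixed yearly origin, rather than from the first
record seen, is what makes 0.5d and 12h bins line up exactly.
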
rabins provides a separate processing context for each bin, so that
aggregation and sorting occur only within the context of each time
period. Records are placed into bins based on load or time. For load
based bins, input records are processed in the order received and are
not modified. When using time based bins, records are placed into bins
based on the starting time of the record. By default, records that span
a time boundary are split into as many records as needed to fit the
record into the appropriate bins, using the algorithms used by
rasplit(1). Metrics are distributed uniformly among all the appropriate
bins. The result is a series of time-aligned records and record
fragments, appropriate for time series analysis and visualization.

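The default splitting behaviour can be sketched as follows. This is a
hypothetical Python illustration of a uniform distribution model, not
the rasplit(1) code itself: each fragment receives a share of every
counter proportional to its overlap with its bin. The name split_record
and the dictionary record layout are assumptions. In nomodify mode, a
record would instead stay whole in the bin holding its start time.

```python
import math


def split_record(start, end, metrics, width, origin=0.0):
    """Split a record covering [start, end] (seconds) at every bin boundary.

    Each fragment gets a share of every counter proportional to the time it
    overlaps its bin (a uniform distribution over the record's duration).
    Returns (bin_start, frag_start, frag_end, metrics) tuples. Shares are
    left as floats; how the real tool rounds them is not modelled.
    """
    k = math.floor((start - origin) / width)  # index of the first bin
    if end <= start:  # zero-length record: it belongs to exactly one bin
        return [(origin + k * width, start, end, dict(metrics))]
    duration = end - start
    fragments = []
    t = start
    while t < end:
        bin_lo = origin + k * width
        frag_end = min(end, bin_lo + width)
        if frag_end > t:
            share = (frag_end - t) / duration
            fragments.append(
                (bin_lo, t, frag_end, {n: v * share for n, v in metrics.items()})
            )
            t = frag_end
        k += 1  # advance to the next bin
    return fragments


# A 30 s record from t=95 to t=125 carrying 30 packets, in 10 s bins.
for frag in split_record(95, 125, {"pkts": 30}, width=10):
    print(frag)
```

For the example record, the four fragments fall into the bins starting
at 90, 100, 110 and 120 and carry approximately 5, 10, 10 and 5 packets,
so the total of 30 is preserved.
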
When a record is split to conform to a time series bin, the resulting
starting and ending timestamps may or may not coincide with the
timestamps of the bins themselves. For some applications, such as
transaction duration and flow traffic burst analysis, preserving these
timestamps is critical. For other analytics, like average load and rate
analysis and reporting, the timestamps need to be modified so that they
reflect the actual time bin boundaries. Rabins supports the optional
hard mode (-M hard) to specify that timestamps should conform to bin
boundaries. As a result, every reported record has a duration equal to
the bin duration, which is extremely important when processing certain
time series metrics, such as load.

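Why hard mode matters for load can be shown with a minimal sketch,
assuming a fragment that covers only part of its bin. The Fragment type
and the apply_hard and byte_rate helpers are hypothetical and only
illustrate the timestamp rule described above.

```python
from dataclasses import dataclass, replace


@dataclass
class Fragment:
    bin_start: float  # start of the bin this fragment was placed in
    start: float      # fragment start time (seconds)
    end: float        # fragment end time (seconds)
    nbytes: float     # bytes assigned to this fragment


def apply_hard(frag: Fragment, width: float) -> Fragment:
    """'hard' mode: report the bin boundaries instead of the fragment's own times."""
    return replace(frag, start=frag.bin_start, end=frag.bin_start + width)


def byte_rate(frag: Fragment) -> float:
    """Average bytes per second over the fragment's reported duration."""
    return frag.nbytes / (frag.end - frag.start)


# A fragment that covers only the last 5 s of a 10 s bin.
frag = Fragment(bin_start=90.0, start=95.0, end=100.0, nbytes=500.0)
print(byte_rate(frag))                    # 100.0 bytes/s over its own 5 s
print(byte_rate(apply_hard(frag, 10.0)))  # 50.0 bytes/s averaged over the bin
```

With the original timestamps the rate describes the burst; with hard
timestamps it describes the average load of the whole bin.
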
Load based binning is specified using the -M size or -M count options.
Load bins are used to constrain the resources used in bin processing.
Input is aggregated until a load threshold is reached; the entire
aggregation cache is then dumped, reinitialized, and reused. This can
be used effectively to provide real-time data reduction within a fixed
amount of memory.

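A load bin can be sketched as an aggregation cache that is dumped and
cleared each time a record-count threshold is reached. This is a
hypothetical Python illustration for count mode only; count_bins, the
record fields and the byte-summing aggregation are all assumptions.

```python
from collections import defaultdict


def count_bins(records, key, limit):
    """Aggregate bytes per key, flushing the cache after every `limit` records.

    The cache never holds more than `limit` entries, which is what bounds
    memory use. A size-based variant would count bytes instead of records.
    """
    cache = defaultdict(int)
    seen = 0
    for rec in records:
        cache[key(rec)] += rec["bytes"]
        seen += 1
        if seen == limit:  # threshold reached: dump and reinitialize
            yield dict(cache)
            cache.clear()
            seen = 0
    if cache:  # flush the final, partial bin at EOF
        yield dict(cache)


records = [
    {"proto": "tcp", "bytes": 100},
    {"proto": "udp", "bytes": 50},
    {"proto": "tcp", "bytes": 25},
]
for chunk in count_bins(records, key=lambda r: r["proto"], limit=2):
    print(chunk)
```

With limit=2, the first chunk aggregates the first two records and the
second chunk holds the remaining tcp record.
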
rabins has two basic output modes. By default, it holds all output in
main memory until EOF is encountered on input, and then writes out each
sorted bin. In the second output mode, rabins periodically writes out
the contents of individual sorted bins, based on a holding time
specified using the -B secs option. The secs value should be chosen so
that rabins will have seen all the incoming data for a bin's time
period before the bin is written. This depends on the
ARGUS_FLOW_STATUS_INTERVAL used by the argus data sources in the input
data stream, as well as on any time drift that may exist among argus
data processing elements. With good time synchronization and an
ARGUS_FLOW_STATUS_INTERVAL of 5 seconds, appropriate secs values are
between 5 and 15 seconds.

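The holding-time rule can be sketched as a check of which open bins
ended at least secs seconds ago. This page does not state whether
rabins measures "now" by wall-clock time or by the timestamps of
incoming records, so the sketch takes it as a parameter; ready_bins is
a hypothetical name.

```python
def ready_bins(open_bins, width, hold, now):
    """Return, oldest first, the open bins that may be written out.

    A bin starting at b covers [b, b + width); it is held for `hold` more
    seconds after it ends so that late-arriving records can still be added.
    """
    return sorted(b for b in open_bins if b + width + hold <= now)


# 10 s bins held for 5 s, as with "-M time 10s -B 5s":
print(ready_bins({0, 10, 20}, width=10, hold=5, now=26))  # [0, 10]
```
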
The output of rabins when using the -B secs option is appropriate for
driving a number of processing elements, such as near real-time
visualizations, alarms, and reporting.

Like all ra(1) client programs, rabins outputs an argus data stream
that can be written as binary data to a file or to standard output, or
can be printed. rabins supports all the output functions provided by
rasplit(1).

The output file name consists of a prefix, which is specified using the
-w option, and, for all modes except time mode, a suffix, which is
created for each resulting file. If no prefix is provided, rabins uses
'x' as the default prefix. The suffix is determined by the mode of
operation. When rabins is using the default count mode or the size
mode, the suffix is a group of letters, 'aa', 'ab', and so on, such
that concatenating the output files in sorted order by file name
produces the original input file. If rabins needs to create more output
files than the default suffix strategy allows, more letters are added
to accommodate the needed files.

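The letter suffix scheme can be sketched as a base-26 counter. This
Python sketch covers only a fixed two-letter width; how rabins adds
letters once 'zz' is exhausted is not specified here, so the sketch
simply raises an error at that point. The suffix function is
hypothetical.

```python
import string


def suffix(index: int, width: int = 2) -> str:
    """Map a 0-based output file index to a fixed-width letter suffix."""
    if not 0 <= index < 26 ** width:
        raise ValueError("index does not fit in a suffix of this width")
    letters = []
    for _ in range(width):
        index, r = divmod(index, 26)
        letters.append(string.ascii_lowercase[r])
    return "".join(reversed(letters))


names = ["x" + suffix(i) for i in range(30)]  # 'x' is the default prefix
print(names[:3], names[26])  # ['xaa', 'xab', 'xac'] xba
```

Because every suffix has the same width, sorting the file names
lexically reproduces the order in which the files were created.
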
When rabins is splitting based on time, it uses a default extension of
%Y.%m.%d.%H.%M.%S. This default can be overridden by adding a '%'
extension to the name provided using the -w option.

When standard output is specified, using -w -, rabins outputs a single
argus stream with START and STOP argus management records inserted to
indicate where the output is split. See argus(8) for more information
on output stream formats.

When rabins is splitting on output record count (the default), the
number of records is specified as an ordinal counter; the default is
1000 records. When rabins is splitting based on the maximum output file
size, the size is specified in bytes. The scale of the size can be
specified by appending 'b', 'k', or 'm' to the number provided.

When rabins is splitting based on time, the time period is specified
with the -M time option, and can be any period based in seconds (s),
minutes (m), hours (h), days (d), weeks (w), months (M), or years (y).
Rabins creates and modifies records as required to split on the
prescribed time boundaries. If a record spans a time boundary, it is
split, and its metrics are distributed among the resulting records
using a uniform distribution model.

See rasplit(1) for specifics.


OPTIONS
rabins, like all ra based clients, supports a number of ra options,
including remote data access, reading from multiple files, and
filtering of input argus records through a terminating filter
expression. Rabins also provides all the functions of racluster(1) and
rasplit(1) for processing and outputting data. rabins specific options
are:

-B secs
    Holding time, in seconds, before closing a bin and outputting its
    contents.

-M splitmode
    Supported splitting modes are:

    time <n[smhdwMy]>
        Bin records into time slots of size n. This is used for time
        series analytics, especially graphing. By default, records are
        split so that their timestamps do not span the specified time
        range. Metrics are uniformly distributed among the resulting
        records.

    count <n[kmb]>
        Bin records into chunks based on the number of records. This is
        used for archive management and parallel processing analytics,
        to limit data processing to a fixed number of records.

    size <n[kmb]>
        Bin records into chunks based on the total number of bytes. This
        is used for archive management and parallel processing
        analytics, to limit data processing to a fixed number of bytes.

-M modes
    Supported processing modes are:

    hard
        Split on hard time boundaries. Each flow record's start and
        stop times will be the time boundary times. The default is to
        use the original start and stop timestamps from the records
        that make up the resulting aggregation.

    nomodify
        Do not split the record when including it in a time bin. This
        allows a time bin to represent times outside of its definition.
        This mode should not be combined with the 'hard' mode, as that
        will modify metrics and semantics.

-m aggregation object
    Supported aggregation objects are:

    none          use a null flow key.
    srcid         argus source identifier.
    smac          source mac(ether) addr.
    dmac          destination mac(ether) addr.
    soui          oui portion of the source mac(ether) addr.
    doui          oui portion of the destination mac(ether) addr.
    smpls         source mpls label.
    dmpls         destination mpls label.
    svlan         source vlan label.
    dvlan         destination vlan label.
    saddr/[l|m]   source IP addr/[cidr len | m.a.s.k].
    daddr/[l|m]   destination IP addr/[cidr len | m.a.s.k].
    matrix/l      sorted src and dst IP addr/cidr len.
    proto         transaction protocol.
    sport         source port number. Implies use of 'proto'.
    dport         destination port number. Implies use of 'proto'.
    stos          source TOS byte value.
    dtos          destination TOS byte value.
    sttl          src -> dst TTL value.
    dttl          dst -> src TTL value.
    stcpb         src -> dst TCP base sequence number.
    dtcpb         dst -> src TCP base sequence number.
    inode/[l|m]   intermediate node IP addr/[cidr len | m.a.s.k],
                  source of ICMP mapped events.
    sco           source ARIN country code, if present.
    dco           destination ARIN country code, if present.
    sas           source node origin AS number, if available.
    das           destination node origin AS number, if available.
    ias           intermediate node origin AS number, if available.

-P sort field
    Rabins can sort its output based on a sort field specification.
    Because the -m option is used for aggregation fields, -P is used to
    specify the print priority order. See rasort(1) for the list of
    sortable fields.

-w filename
    Rabins supports an extended -w option that allows output record
    contents to be inserted into the output filename. Specified using
    '$' (dollar) notation, any printable field can be used. Care should
    be taken to honor any shell escape requirements when specifying the
    filename on the command line. See ra(1) for the list of printable
    fields.

    In addition, when using time mode, rabins processes the supplied
    filename using strftime(3), so that time fields can be inserted into
    the resulting output filename.

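The strftime(3) expansion of a -w template can be illustrated with
Python's datetime.strftime, which supports the same standard %Y, %m,
%d, %H, %M and %S conversions. The sketch assumes that the bin's start
time is the time being formatted and does not model the '$' field
substitution; bin_filename is a hypothetical helper.

```python
from datetime import datetime


def bin_filename(template: str, bin_start: datetime) -> str:
    """Expand strftime(3)-style fields in a -w template for one time bin."""
    return bin_start.strftime(template)


start = datetime(2012, 2, 15, 13, 35, 0)
print(bin_filename("/matrix/%Y/%m/%d/argus.%H.%M.%S", start))
# /matrix/2012/02/15/argus.13.35.00
```

Note that %H, %M and %S (hour, minute, second) differ from their
lowercase forms, which strftime(3) treats as other conversions.
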

EXAMPLES
This invocation aggregates the input files based on 10 minute time
boundaries. Input is split to fit within 10 minute time boundaries, and
within those boundaries, argus records are aggregated. The resulting
output is streamed to a single file.

rabins -r * -M time 10m -w outputfile

This next invocation aggregates the input files based on 5 minute time
boundaries, and the output is written to 5 minute files. Input is split
such that all records conform to hard 5 minute time boundaries, and
within those boundaries, argus records are aggregated, in this case
based on the IP address matrix. The resulting output is streamed to
files that are named from the output records' time fields, using a
prefix of /matrix/%Y/%m/%d/argus. and the suffix %H.%M.%S.

rabins -r * -M hard time 5m -m matrix -w "/matrix/%Y/%m/%d/argus.%H.%M.%S"

This next invocation reads the argus data stream from localhost,
aggregates it based on matrix/24 into 10 second time bins, holds the
data for an additional 5 seconds after each time boundary has passed,
and then prints the complete sorted contents of each bin to standard
output. The output is printed at 10 second intervals, and each output
is the content of the previous 10 second time bin. This example is
meant to provide, every 10 seconds, a summary of all Class C subnet
activity seen. It is intended to run indefinitely, printing out
aggregated summary records. By modifying the aggregation model, using
the "-f racluster.conf" option, you can achieve a great deal of data
reduction with a lot of semantic reporting.

% rabins -S localhost -m matrix/24 -B 5s -M hard time 10s -p0 -s +1trans - ipv4
StartTime           Trans Proto         SrcAddr Dir DstAddr         SrcPkts DstPkts SrcBytes DstBytes State
2012/02/15.13:37:00     5 ip     192.168.0.0/24 <-> 192.168.0.0/24       41      40     2860    12122 CON
2012/02/15.13:37:00     2 ip     192.168.0.0/24  -> 224.0.0.0/24          2       0      319        0 INT
[ 10 seconds pass ]
2012/02/15.13:37:10    13 ip     192.168.0.0/24 <-> 208.59.201.0/24     269     351    97886   398700 CON
2012/02/15.13:37:10    14 ip     192.168.0.0/24 <-> 192.168.0.0/24       86      92     7814    46800 CON
2012/02/15.13:37:10     1 ip    17.172.224.0/24 <-> 192.168.0.0/24       52      37    68125     4372 CON
2012/02/15.13:37:10     1 ip     192.168.0.0/24 <-> 199.7.55.0/24         7       7      784     2566 CON
2012/02/15.13:37:10     1 ip     184.85.13.0/24 <-> 192.168.0.0/24        6       5     3952     2204 CON
2012/02/15.13:37:10     2 ip    66.235.132.0/24 <-> 192.168.0.0/24        5       6      915     3732 CON
2012/02/15.13:37:10     1 ip    74.125.226.0/24 <-> 192.168.0.0/24        3       4      709      888 CON
2012/02/15.13:37:10     3 ip       66.39.3.0/24 <-> 192.168.0.0/24        3       3      369      198 CON
2012/02/15.13:37:10     1 ip     192.168.0.0/24 <-> 205.188.1.0/24        1       1       54      356 CON
[ 10 seconds pass ]
2012/02/15.13:37:20     6 ip     192.168.0.0/24 <-> 208.59.201.0/24     392     461    60531   623894 CON
2012/02/15.13:37:20     8 ip     192.168.0.0/24 <-> 192.168.0.0/24       95     111     6948    93536 CON
2012/02/15.13:37:20     3 ip     72.14.204.0/24 <-> 192.168.0.0/24       38      32    38568     4414 CON
2012/02/15.13:37:20     1 ip    17.112.156.0/24 <-> 192.168.0.0/24       26      13    21798     7116 CON
2012/02/15.13:37:20     2 ip    66.235.132.0/24 <-> 192.168.0.0/24        6       3     1232     4450 CON
2012/02/15.13:37:20     1 ip    66.235.133.0/24 <-> 192.168.0.0/24        1       2       82      132 CON
[ 10 seconds pass ]
2012/02/15.13:37:30   117 ip     192.168.0.0/24 <-> 208.59.201.0/24     697     663   369769   134382 CON
2012/02/15.13:37:30    11 ip     192.168.0.0/24 <-> 192.168.0.0/24      147     187    11210   193253 CON
2012/02/15.13:37:30     1 ip     184.85.13.0/24 <-> 192.168.0.0/24       13       9    13408     9031 CON
2012/02/15.13:37:30     2 ip    66.235.132.0/24 <-> 192.168.0.0/24        8       7     1920    11563 CON
2012/02/15.13:37:30     1 ip     192.168.0.0/24 <-> 207.46.193.0/24       5       3      802      562 CON
2012/02/15.13:37:30     1 ip    17.112.156.0/24 <-> 192.168.0.0/24        5       2      646     3684 CON
2012/02/15.13:37:30     2 ip     192.168.0.0/24  -> 224.0.0.0/24          2       0      382        0 REQ
[ 10 seconds pass ]
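
The matrix/24 aggregation key used in this example can be sketched in
Python with the standard ipaddress module. The sketch assumes that
"sorted" means numeric address order, which agrees with rows such as
17.172.224.0/24 <-> 192.168.0.0/24 above. The flows and the matrix_key
and aggregate helpers are hypothetical, and only total packets are
summed (per-direction counters are not tracked).

```python
import ipaddress


def matrix_key(saddr: str, daddr: str, prefix: int = 24):
    """Mask both addresses to /prefix and return the pair in sorted order."""
    a = ipaddress.ip_network(f"{saddr}/{prefix}", strict=False)
    b = ipaddress.ip_network(f"{daddr}/{prefix}", strict=False)
    return tuple(sorted((a, b)))


def aggregate(flows, prefix=24):
    """Merge (saddr, daddr, pkts) flows per matrix key.

    Returns {key: (trans, pkts)}, where trans counts the merged flows.
    """
    out = {}
    for saddr, daddr, pkts in flows:
        key = matrix_key(saddr, daddr, prefix)
        trans, total = out.get(key, (0, 0))
        out[key] = (trans + 1, total + pkts)
    return out


# Hypothetical flows in both directions between the same two /24s.
flows = [("192.168.0.10", "208.59.201.7", 10), ("208.59.201.9", "192.168.0.3", 5)]
for (lo, hi), (trans, pkts) in aggregate(flows).items():
    print(f"{lo} <-> {hi} trans={trans} pkts={pkts}")
```

Sorting the pair is what lets traffic in both directions between two
subnets collapse into a single bidirectional matrix record.
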

This next invocation reads IP argus(8) data from argusfile and
processes the argus(8) data stream in bins based on an input byte size
of no greater than 1 megabyte, aggregating records by protocol within
each bin. The resulting output stream is written to a single argus.out
data file.

rabins -r argusfile -M size 1m -s +1dur -m proto -w argus.out - ip


This invocation reads IP argus(8) data from argusfile and aggregates
the argus(8) data stream by protocol, in bins of no more than 1k input
records. The resulting output stream is printed to the screen as
standard argus records.

rabins -r argusfile -M count 1k -m proto -s stime dur proto spkts dpkts - ip


COPYRIGHT
Copyright (c) 2000-2016 QoSient. All rights reserved.

SEE ALSO
ra(1), racluster(1), rasplit(1), rarc(5), argus(8)

AUTHORS
Carter Bullard (carter@qosient.com).


rabins 3.0.8                  12 August 2003                   RABINS(1)