COLLECTL(1) Collectl COLLECTL(1)

collectl - Collects data that describes the current system status.

Record Mode - read data from live system and write to file or display
on terminal

collectl [-f file] [options]

Playback Mode - read data from one or more raw data files and display
on terminal

collectl -p file1 [file2 ...] [options]

Record Mode

In this mode data is taken from a live system and either displayed on
the terminal or written to one or more files or a socket.
26
--align
If the HiRes module is present, collectl sample monitoring will be
aligned such that a sample will always be taken at the top of a minute
(this does NOT mean the first sample will occur then) so that all
instances of collectl running on any systems which have their clocks
synchronized will all take samples at the same time. Furthermore, if
one is doing process monitoring, those samples will also be taken at
the top of the minute and so can delay the start of sampling up to 2
full process monitoring intervals.
37
--all
Collect summary data for ALL subsystems except slabs, since slab
monitoring requires a different monitoring interval. This also means
you won't get any detail data, which also includes processes and
environmentals. You can use this switch anywhere -s can be used but
not both together. If the system supports lustre and/or interconnect
monitoring those statistics will be provided, but the warnings that are
normally produced when they are not available and you try to select
them with -s will not be displayed.
47
48 --ALL
49 This is actually a superset of --all by adding detail statistics
50 as well with the exception of TCP details when displaying to a
51 terminal since those are only available with -P or -f.
52
53 -A, --address address[:port[:timeout]] | server[:port]
54 In the first form, one specifies an address, optional port and
55 timeout (the first colon is required to specify timeout for
56 default port). All data is then written to that socket prefaced
57 with the current host name at the named address and port until
58 the socket is closed, at which time collectl will exit.
59
In the second form one enters the text "server" and optional port. In
this form, collectl runs as a server, waiting for a connection and once
established writes data on that socket. The key difference here is
that if the client exits collectl keeps running and will again look for
a new connection, allowing it to survive client restarts or crashes.
66
67 The default port is set at 2655 but can be changed - see col‐
68 lectl.conf.
69
70 In both forms, one can additionally request local data logging
71 by specifying a combination of -P and -f. See man collectl-log‐
72 ging for more details.
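
The address[:port[:timeout]] form can be illustrated with a small shell
helper. This is only a sketch: parse_address is a hypothetical function,
not part of collectl, it does not handle IPv6 literals, and the word
"unset" merely marks a timeout that was not given (this page does not
state a default timeout).

```sh
# Hypothetical helper: split address[:port[:timeout]] as described above.
# 2655 is the default port documented on this page.
parse_address() {
    spec=$1
    host=${spec%%:*}              # everything before the first colon
    rest=
    case $spec in
        *:*) rest=${spec#*:} ;;   # port[:timeout]; port may be empty
    esac
    port=${rest%%:*}
    timeout=
    case $rest in
        *:*) timeout=${rest#*:} ;;
    esac
    printf '%s %s %s\n' "$host" "${port:-2655}" "${timeout:-unset}"
}

parse_address 192.168.1.5        # default port, no timeout
parse_address 192.168.1.5:3000   # explicit port
parse_address 192.168.1.5::10    # default port, timeout 10 (empty port field)
```

The last call shows why the first colon is still required when only a
timeout is wanted: the port field is left empty so the default applies.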
73
--comment string
Add the specified string to the end of the headers in the data files.
If it contains any embedded spaces be sure to quote it. This can be
very useful when doing characterizations or benchmarking and you're
frequently changing system/application parameters and restarting
collectl between tests.
80
81 -C, --config filename
82 Name/location of the collectl configuration file. If not speci‐
83 fied, collectl searches for collectl.conf first in /etc (the
84 default), then in the same directory the collectl executable is
85 in, and finally the current working directory.
86
-c, --count Samples
The number of samples to record. This is one of 3 ways of describing
how long collectl should run (see -r and -R). Note that these 3
switches are mutually exclusive.
91
92 -D, --daemon
93 Run collectl as a daemon, primarily used when starting as a ser‐
94 vice. One caveat about this mode is you can only run one copy.
95
96 --export file[,options]
97 This requests that collectl does not print anything on the ter‐
98 minal (or send it to a socket) using the standard brief/ver‐
99 bose/plot formats. Instead it executes a perl "require" on the
100 named file, using an extension of ph if not specified. It first
101 looks in the current directory and if not there the directory
102 the executable is in. It then calls the function
103 "file"Init(options) towards the beginning of collectl and again
104 as simply "file"(@options) to generate the exported formatted
105 output. See the online documentation on Exporting Custom Output
106 and Logging for more details.
107
108 -f, --filename Filename
109 This is the name of a file to write the output to. For details
110 on how the output files are named, see the File Naming section
111 of the documentation on collectl.sourceforge.net OR
112 /usr/share/doc/collectl/FileNaming.html
113
114
115 -F, --flush seconds
116 Flush output buffers after this number of seconds. This is
117 equivalent to issuing kill -s USR1 at the same frequency (but a
118 lot easier!). If 0, a flush will occur every data collection
119 interval.
120
121 --grep pattern
122 The main purpose of this switch is for those users who have dis‐
123 covered there is some data in the raw files that never appears
124 in any display and have taken to displaying it themselves with
125 grep. Unfortunately this method does not include timestamps and
126 so makes it difficult to interpret the results. Even if you
127 include the timestamp from the file it is in UTC and so needs to
128 be translated to be of any real value. This switch does just
129 that and then some.
130
131 Specifically, it allows you to playback a file and instead of
132 processing it normally it simply searches for any entries that
133 match the perl pattern and reports those lines prefaced with
134 time stamps. You can optionally change the time format with the
135 usual -o options and can even select the timeframe with --from
136 and --thru.
137
--home
Always start the display for the current interval at the top of the
screen, also known as the home position (non-plot format only). This
generates a real-time, continuously refreshing display when the data
fits on a single screen.
143
144 --import file1[,options][:file2[,options]...]
145 This loads the named files and executes callbacks to them, which
146 is the API mechanism for importing additional metrics into col‐
147 lectl. See the webpage on the API for further detail.
148
Since these files also include instructions for how to report the
output in all the various forms, you will also need to include --import
during playback. Finally, since the default is to seamlessly include
imported data with everything else collectl reports, if you ONLY want
to display imported data you must explicitly deselect all other
subsystems, either by including -s- (note the trailing minus sign)
followed by all the subsystems that were recorded OR by simply saying
-s-all.
157
158 -i, --interval interval[:interval2[:interval3]]
159 This is the sampling interval in seconds. The default is 10
160 seconds when run as a daemon and 1 second otherwise. The
161 process subsystem and slabs (-sY and -sZ) are sampled at the
162 lower rate of interval2. Environmentals (-sE), which only apply
163 to a subset of hardware, are sampled at interval3. Both inter‐
164 val2 and interval3, if specified, must be an even multiple of
165 interval1. The daemon default is -i10:60:300 and all other
166 modes are -i1:60:300. To sample only processes once every 10
167 seconds use -i:10.
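
The multiple-of-interval1 rule can be checked before starting a long
run. The helper below is a hypothetical sketch that assumes positive
whole-second intervals; intervals_ok is not a collectl command.

```sh
# Hypothetical check: interval2 and interval3 must each be an exact
# multiple of interval1 (whole seconds assumed).
intervals_ok() {
    i1=$1
    i2=${2:-60}     # documented default for interval2
    i3=${3:-300}    # documented default for interval3
    [ $((i2 % i1)) -eq 0 ] && [ $((i3 % i1)) -eq 0 ]
}

intervals_ok 10 60 300 && echo "10:60:300 is consistent"
intervals_ok 7 60 300  || echo "7:60:300 is not: 60 is not a multiple of 7"
```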
168
169 --nohup
170 Whenever collectl finishes a data collection interval, it checks
171 to see if the starting parent has exited. This is to prevent
172 the case in which someone might start a copy of collectl and
173 then the process dies and collectl keeps running. If that is
174 the behavior someone actually intends, they should start col‐
175 lectl with --nohup.
176
177 NOTE - when running as a daemon, --nohup is implied.
178
--quiet
Whenever collectl wants to tell the user something, it assigns a
category to it such as Informational, Warning, Error or Fatal. When run
with -m, all messages are displayed for the user and if logging data to
a file with -f, these messages are also sent to a log file which is in
the data collection directory and has an extension of "log". However,
if -m is not specified Informational messages (such as collectl
starting or stopping) are not reported on the terminal but the other 3
are. Sometimes the warnings can be annoying and one can suppress these
with --quiet, though they will still be written to the message log in
the -f directory. You cannot suppress Error or Fatal messages.
191
-r, --rolllogs time[[,days[:months]][,minutes]]
When selected, collectl runs indefinitely (or at least until the system
reboots). The maximum number of raw and/or plot files that will be
retained (older ones are automatically deleted) is controlled by the
days field, whose default is 7. When -m is also specified to direct
collectl to write messages to a log file in the logging directory, the
number of months to retain those logs is controlled by the months
field, whose default is 12. The increment field (minutes), which is
also optional but is position dependent, specifies the duration of an
individual collection file in minutes, the default being 1440 or 1 day.
203
--rawdskfilt
This switch overrides the DiskFilter setting in collectl.conf and
explicitly defines a perl regular expression against which records from
/proc/diskstats are selected for processing. When there are a lot of
disks to process, this can be a handy way to reduce the amount of data
collected and actually improve performance since there are fewer
patterns to match each input record against. Just remember that unlike
--dskfilt, which only filters during display, records filtered with
this switch are never even recorded and so are lost forever.
214
215 You can optionally specify your filter with a leading plus-sign
216 which tells collectl to just add your filter to the default
217 specification. Care should be taken here as longer filters will
218 slightly increase overhead and with a lot of disks and/or
219 shorter monitoring intervals can add up.
220
221 As a side benefit of this switch, if you really want to look at
222 partition level stats you can do so by leaving off the trailing
223 space in the default pattern.
224
One must also be careful in selecting the correct pattern since it's
easy to get it wrong and you may end up collecting the WRONG data! To
verify you are collecting what you think you are, make a test run using
-d4 to see the raw data being recorded in real-time.
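
The effect of the trailing space can be previewed without collectl. In
the sketch below the sample records are made up, the pattern
'sd[a-z]+ ' is purely illustrative (it is not claimed to be collectl's
default), and grep -E stands in for perl matching, which behaves the
same for a pattern this simple.

```sh
# Made-up sample in the layout of /proc/diskstats
# (major, minor, device name, counters).
sample='   8       0 sda 4520 12 360112 2210 901 77 20480 1300 0 1800 3510
   8       1 sda1 4100 10 340000 2000 850 70 19000 1200 0 1700 3200
 253       0 dm-0 300 0 2400 150 40 0 320 90 0 120 240'

# With a trailing space the pattern matches whole-disk names only.
printf '%s\n' "$sample" | grep -E 'sd[a-z]+ '

# Without it, partition records such as sda1 match as well.
printf '%s\n' "$sample" | grep -E 'sd[a-z]+'
```

Running a candidate pattern against a saved copy of /proc/diskstats in
this way is a cheap complement to the -d4 test run suggested above.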
230
231 --rawdskignore
232 This is the opposite of the rawdskfilt switch. When specified
233 any disks listed are completely ignored and will not appear in
234 the raw file. Typically this switch is useful when you're only
235 interested in recording a subset of disk statistics.
236
237 --rawnetfilt
238 This works just like --rawdskfilt except it applies to networks.
239 Unlike disk filtering which has an explicit default pattern, the
240 default for network filtering is to simply record all network
241 data from /proc/net/dev.
242
243 The -d4 switch also works here, as well as everywhere, to see
244 the raw data as it is being collected.
245
246 --rawnetignore
247 This is the opposite of the rawnetfilt switch and works just
248 like the rawdskignore switch. When specified any networks
249 listed are ignored and will not appear in the raw file. Typi‐
250 cally this switch is useful when you're only interested in
251 recording a subset of network statistics.
252
253 --rawtoo
254 Only available in conjunction with -P, this switch causes the
255 creation/logging of raw data in addition to plottable data.
256 While this may seem excessive, keep in mind that unlike plot‐
257 table data, raw data can be played back with different switches
258 potentially providing more details. The overhead to write out
259 this additional data is minimal, the only real cost being that
260 of extra disk space.
261
-R, --runas uid[:gid]
This switch only works when running in daemon mode and so must be
specified in the DaemonCommands line. Its presence will cause collectl
to write the collectl.pid file into the same directory as its other
output files as specified by -f, since /var/run does not normally grant
non-privileged users write access. Furthermore, the ownership of that
directory must match the specified ownership since collectl needs to
write ALL its files to that directory and can no longer assume global
permissions when run as root.
272
273 This WILL also require manually modifying /etc/init.d/collectl
274 to change the PIDFILE variable to point to the same directory
275 which the -f switch in the DaemonCommands line of collectl.conf
276 points to.
277
278 As a final note of caution, since this mechanism changes where
279 collectl reads/writes its pid file, once you start using
280 --runas, all calls to run collectl as a daemon must use it or it
281 may be confused and exhibit unpredictable behavior.
282
283 -R, --runtime duration
284 Specify the duration of data collection where the duration is a
285 number followed by one of wdhms, indicating how many weeks,
286 days, hours, minutes or seconds the collection is to be taken
287 for.
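
As an illustration of the duration syntax, the hypothetical helper
below converts a value such as 2h into seconds. It is a sketch of the
rule stated above, not collectl's own parsing code.

```sh
# Hypothetical helper: convert <number><w|d|h|m|s> into seconds.
duration_seconds() {
    num=${1%?}            # everything except the last character
    unit=${1#"$num"}      # the last character
    case $num in
        ''|*[!0-9]*) return 1 ;;   # reject a missing or non-numeric count
    esac
    case $unit in
        w) mult=604800 ;;
        d) mult=86400 ;;
        h) mult=3600 ;;
        m) mult=60 ;;
        s) mult=1 ;;
        *) return 1 ;;
    esac
    echo $((num * mult))
}

duration_seconds 2h    # 7200
duration_seconds 1w    # 604800
```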
288
--sep separator
Specify the plot format separator - default is a space. If this is a
numeric field it is interpreted as the decimal value of the associated
ASCII character code. Otherwise it is interpreted as the character
itself. In other words, "--sep :" sets the separator character to a
colon and "--sep 9" sets it to a horizontal tab. "--sep 58" would also
set it to a colon.
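
The numeric-versus-literal rule can be mimicked in a few lines of
shell. sep_char is a hypothetical helper written only to illustrate the
rule; it is not how collectl itself is implemented.

```sh
# Hypothetical helper: resolve a --sep argument to the actual character.
sep_char() {
    case $1 in
        ''|*[!0-9]*)
            printf '%s' "$1" ;;                      # not all digits: literal
        *)
            printf "\\$(printf '%03o' "$1")" ;;      # decimal ASCII code
    esac
}

printf 'a%sb\n' "$(sep_char :)"    # a:b
printf 'a%sb\n' "$(sep_char 58)"   # a:b
printf 'a%sb\n' "$(sep_char 9)"    # a<TAB>b
```

One consequence worth noting is that a digit can never be used as a
literal separator, since any all-digit argument is read as a code.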
296
--tworaw
The switches -G and --group have been replaced by --tworaw, which is
more descriptive of its function. When specified, it tells collectl to
treat process and slab data as an entirely separate group of raw files,
named with the extension "rawp". These separate files can be played
back and processed just like any other collectl raw files and in fact
one can even play back both at the same time if that is what is
desired. The only real purpose of this switch is that on some systems
with many processes, it is possible to generate huge raw files (some
have been observed to be >250MB!) and while collectl will happily play
back/process these files it can take a long time. By using the --tworaw
switch one still gets a huge rawp file, but the normal raw file is a
much more manageable size and as a result will be faster to process
than when all data is combined into the same file.
313
314 Playback Mode
315
316 In this mode, data is read from one or more data files that were gener‐
317 ated in Record Mode
318
--export Filename
When playing back a file, use this switch to create an identical raw
file differing only in the timeframe being covered, so naturally one
must also include --from, --thru or both. Further, since the resultant
file will contain the exact same raw data you cannot select a subset
using -s. This switch is actually intended as a support function for
situations where someone is having problems playing back a file and a
subset of the original raw file that covers the problem time has been
requested, hopefully allowing a significantly smaller file to be posted
or emailed.
329
330 --extract filename
331 If specified, rather than actually play back the file specified
332 with -p, ALL raw data between the date ranges is selected and a
333 subset of that raw file created. The rules for how to interpret
334 the filename are the same as used for -f.
335
336 -f, --filename filename
337 If specified, this is the name of a file or directory to write
338 the output to (rather than the terminal). See the description
339 for details on the format of this field. This requires the -P
340 flag as well.
341
--from time range
Play back data starting with this time, which may optionally include
the ending time as well, in the format [date:]time[-[date:]time]. The
leading 0 of the hour is optional and if the seconds field is not
specified it is assumed to be 0. If no dates are specified the time(s)
apply to each file specified by -p. Otherwise the time(s) only apply
to the first/last dates and any files between those dates will have
all their data reported.
351
--full
Full mode is actually a superset of --verbose and if selected will
force --verbose. It will also force the RECORD separator to be printed
for every interval even if only a single subsystem was requested, and
to include the names of the subsystems that follow, placed after the
UTC timestamp, as a parsing aid for those who may wish to parse the
text output rather than the plot data.
359
--offsettime seconds
This field originally was used before collectl reported the timezone in
the file headers and allowed one to compensate. Since then it is rarely
needed except in two possible cases. One is when data on two systems is
to be compared and they weren't synchronized with ntp. This allows all
the times to be reported as shifted by some number of seconds. The
other case (and this is very rare) is when a clock had changed in the
middle of a sample and will not be converted correctly. When this
happens one may have to play back the samples in pieces and manually
set the time offset.
371
--passwd filename
When reporting usernames associated with a UID, use this file for the
mapping. This is particularly important on systems running NIS where
there are no user names in /etc/passwd.
376
377 -p, --playback Filename
378 Read data from the specified playback file(s), noting that one
379 can use wildcards in the filename if quoted (if playing back
380 multiple files to the terminal you probably want to include -m
381 to see the filenames as they are processed). The filename must
382 either end in raw or raw.gz. As an added feature, since people
383 sometimes automate the running of this option and don't want to
384 hard code a date, you can specify the string YESTERDAY or TODAY
385 and they will be replaced in the filename string by the appro‐
386 priate date.
387
388 --pname name
389 By default, collectl uses the file /var/run/collectl.pid to
390 indicate the pid of the running instance of collectl and prevent
391 multiple copies from being run. If you DO want to run a second
392 copy, this switch will cause collectl to change its process name
393 to collectl-name and use that name as the associated pid file as
394 well.
395
--procanalyze
When specified and there is process data in the raw file, a summary
file will be generated with one entry per unique process containing
such things as the total cpu consumed for both user and system, min/max
utilization of various memory types, total page faults and several
others.

--slabanalyze
When specified and there is slab data in the raw file, a summary file
will be generated with one entry per unique slab containing data on
physical memory usage by that slab.
407
408 --thru time
409 Time thru which to play back a raw file. See --from for more
410
411 Common Switches - both record and playback modes
412
413 -d, --debug debug
414 Control the level of debugging information, not typically used.
415 For details see the source code.
416
417 -h, --help, -x, --helpext, -X, --helpall
418 Display standard, extended help message (which doesn't include
419 the optional displays such as --showoptions, --showsubsys,
420 --showsubopts, --showtopopts) or everything.
421
--hr, --headerrepeat num
Sets the number of intervals to display data for before repeating the
header. A value of -1 will prevent any headers from being displayed and
a value of 0 will cause only a single header to be displayed and never
repeated.
427
428 --iosize
429 In brief mode, include iosize with disk, infiniband and network
430 data.
431
432 -l, --limits limit
433 Override one or more default exception limits. If more than one
434 limit they must be separated by hyphens. Current values are:
435
436 SVC:value
437 Report partition activity with Service times >= 30 msec
438
439 IOS:value
440 Report device activity with 10 or more reads or writes
441 per second
442
443 LusKBS:value
444 Report client or OSS activity greater than limit. Only
445 applies to Client Summary or OSS Detail reporting.
446 [default=100000]
447
448 LusReints:value
449 Report MDS activity with Reint greater than limit. Only
450 applies to MDS Summary reporting. [default=1000]
451
AND
Both the IOS and SVC limits must be reached before a device is
reported. This is the default value and is only included for
completeness.
456
457 OR
458 Report device activity if either IOS or SVC thresholds
459 are reached.
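
The AND/OR behaviour can be modelled as follows. This is a hypothetical
sketch of the rule as described here, using the SVC and IOS defaults
from this page and whole-number inputs; it says nothing about how
collectl evaluates the limits internally.

```sh
# Hypothetical model of the exception test.
# Arguments: service time (msec), I/Os per second, AND or OR (default AND).
is_exception() {
    svc=$1
    ios=$2
    mode=${3:-AND}
    svc_limit=30    # SVC default from this page
    ios_limit=10    # IOS default from this page
    svc_hit=0
    ios_hit=0
    [ "$svc" -ge "$svc_limit" ] && svc_hit=1
    [ "$ios" -ge "$ios_limit" ] && ios_hit=1
    if [ "$mode" = OR ]; then
        [ $((svc_hit + ios_hit)) -ge 1 ]
    else
        [ $((svc_hit + ios_hit)) -eq 2 ]
    fi
}

is_exception 45 3 AND || echo "slow but idle: not reported with AND"
is_exception 45 3 OR  && echo "slow but idle: reported with OR"
is_exception 45 12    && echo "slow and busy: reported by default"
```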
460
-L, --lustsvcs [c|m|o][:seconds]
This switch limits which services lustre checks for and the frequency
of those checks. For more information see the man page collectl-lustre.
465
466 -m, --messages
467 Write status to a monthly log file in the same directory as the
468 output file (requires -f to be specified as well). The name of
469 the file will be collectl-yyyymm.log and will track various mes‐
470 sages that may get generated during every run of collectl.
471
472 -N, --nice
473 Set priority to a nicer one of 10.
474
-o, --options Options
These apply to the way output is displayed OR written to a plot file.
They do not affect the way data is selected for recording. Most of
these switches work in both record as well as playback mode. If you're
not sure, just try it.
480
481 1
482 Data in plotting format should use 1 decimal point of
483 precision as appropriate.
484
485 2
486 Data in plotting format should use 2 decimal points of
487 precision as appropriate.
488
a
Always append data to an existing plot file. By default if a plot file
exists, the playback file will be skipped as a way of assuring it is
associated with a single recorded file. This switch overrides that
mechanism, allowing multiple recorded files to be processed and written
to a single plot file.

c
Always open newly named plot files in create mode, overwriting any old
ones that may already exist. If one processes multiple files for the
same day in append mode multiple times, the same data will be appended
to the same file multiple times. This assures a new file is created at
the start of the processing.
504
d
For use with terminal output and brief mode. Precede each line with a
date/time stamp, the date being in mm/dd format. This option can also
be applied to plot format, which will cause the date portion to also be
displayed in this format as opposed to D format.

D
For use with terminal output and brief mode. Precede each line with a
date/time stamp, the date being in yyyymmdd format.

g
For use with terminal output and brief mode. When displaying values of
1G or greater there is limited precision for 1 digit values. This
option provides a way to display additional digits for more granularity
by substituting a "g" for the decimal point rather than the trailing
"G".
524
525 G
526 For use with terminal output and brief mode. This is
527 similar to "g" but preserves the trailing "G" by sacri‐
528 ficing a digit of granularity.
529
m
Whenever times are reported in plot format, in the normal terminal
reporting format at the beginning of each interval or when one of the
time reporting options (d, D, T or U) is selected, append the
milliseconds to the time.
535
536 n
537 Where appropriate, data such as disk KBs or transfers are
538 normalized to units per second by taking the change in a
539 counter and dividing by the number of seconds in that
540 interval. In the case of CPUs, utilization (calculated
541 in jiffies) is normalized as a percentage of the inter‐
542 val.
543
Normalization can be disabled via this option, the result being the
reported values are not divided by the duration of the interval. This
can be particularly useful for reporting values that are < 1/2 the
sampling interval, which would otherwise be rounded to 0.
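
A short calculation shows why small counts vanish when normalized. The
rate helper is hypothetical and assumes the displayed value is the
per-second rate rounded to a whole number, as the text above implies.

```sh
# Hypothetical helper: (counter change) / (interval in seconds),
# rounded to a whole number for display.
rate() {
    awk -v delta="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", delta / secs }'
}

rate 5000 10   # 500 per second
rate 3 10      # 0.3 per second, which displays as 0
```

With normalization disabled the second case would be shown as 3, the
raw change over the interval, rather than 0.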
549
T
For use with terminal output and brief mode, precedes each line with a
time stamp.

u
Create plot files with unique names by including the starting time of a
collection in the name. This forces multiple collections taken the same
day to be written to multiple files.

-U or --utc
In plot format only, report timestamps in Coordinated Universal Time,
which is more commonly known as UTC.
563
564 x
565 Report only exception records for selected subsystems.
566 Exception reporting also requires --verbose. Currently
567 this only applies to disk detail and Lustre server infor‐
568 mation so one must select at least -s D, l or L for this
569 to apply. If writing to a detail file, this data will go
570 into a separate file with the extension X appended to the
571 regular detail file name.
572
573 X
574 Report both exceptions as well as all details for
575 selected subsystems, for -s D, l or L only.
576
577 z
578 If the compression library has been installed, all output
579 files will be compressed by default. This switch tells
580 collectl not to compress any plottable files. If col‐
581 lectl tries to compress but cannot because the library
582 hasn't been installed, it will generate a warning which
583 can be suppressed with this switch.
584
585 -P, --plot
586 Generate output in plot format. This format is space separated
587 data which consists of a header (prefaced with a # for easy
588 identification by an analysis program as well as identifying it
589 as a comment for programs, such as gnuplot, which honor that
590 convention). When written to disk, which is the typical way
591 this option is used, summary data elements are written to the
592 tab file and the detail elements written to one or more files,
593 one per detail subsystem. If -f is not specified, all output is
594 sent to the terminal. Output is always one line per sampling
595 interval.
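
Because header lines start with # and each interval occupies one
space-separated line, plot output is easy to post-process. The sample
below is made up (the column names Metric1 and Metric2 are
placeholders, not real collectl headers) and avg_col3 is a hypothetical
helper.

```sh
# Made-up plot-style sample: a '#' header line followed by one
# space-separated line per sampling interval.
plot_sample='#Date Time Metric1 Metric2
20240101 00:00:01 12 340
20240101 00:00:02 18 360
20240101 00:00:03 15 350'

# Hypothetical helper: skip '#' lines and average the third column.
avg_col3() {
    awk '!/^#/ { sum += $3; n++ } END { if (n) printf "%.1f\n", sum / n }'
}

printf '%s\n' "$plot_sample" | avg_col3   # 15.0
```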
596
597 --stats
598 This switch will cause brief data to be reported as both totals
599 and averages after processing one or more files for the same day
600 or in playback mode.
601
602 --statopts option(s)
603 This switch controls the way brief stats are reported, the
604 default is to report the totals once, at the end of a day's
605 worth of raw files, if more than one.
606
607 a - include averages along with totals
608 i - include the interval data itself, which is the equivalent of
609 -oA
610 s - print summary stats at the end of each file processed even
611 if more than one per day
612
-s, --subsys subsystem
This field controls which subsystem data is to be collected or played
back. The default for collecting data is "cdn", which stands for CPU,
Disk and Network summary data and the default for playback is
everything that was collected.
618
619 The rules for displaying results vary depending on the type of
620 data selected. If you write data for CPUs and DISKs to a raw
621 file and play it back with -sc, you will only see CPU data. If
622 you play it back with -scm you will still only see CPU data
623 since memory data was not collected. However, when used with
624 -P, collectl will always honor the subsystems specified with
625 this switch so in the previous example you will see CPU data
626 plus memory data of all 0s. To see the current set of default
627 subsystems, which are a subset of this full list, use -h.
628
You can also use + or - to add or subtract subsystems to/from the
default values. For example, "-s-cdn+N" will remove cpu, disk and
network monitoring from the defaults while adding network detail.
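
The + and - arithmetic can be traced with a small model. apply_subsys
is a hypothetical function that follows the description above; treating
a string with no leading + or - as a plain replacement of the defaults
is an assumption made for the sketch.

```sh
# Hypothetical model: apply a -s argument to a set of default subsystems.
apply_subsys() {
    current=$1
    spec=$2
    case $spec in
        [+-]*) ;;                                 # relative to the defaults
        *) printf '%s\n' "$spec"; return 0 ;;     # assumed: plain list replaces them
    esac
    mode=+
    while [ -n "$spec" ]; do
        rest=${spec#?}
        ch=${spec%"$rest"}                        # first character of spec
        spec=$rest
        case $ch in
            [+-]) mode=$ch ;;
            *)
                # drop the letter, then re-add it when in '+' mode
                current=$(printf '%s' "$current" | tr -d "$ch")
                if [ "$mode" = + ]; then current=$current$ch; fi
                ;;
        esac
    done
    printf '%s\n' "$current"
}

apply_subsys cdn -cdn+N   # N
apply_subsys cdn +m       # cdnm
```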
633
634 Refer to data definitions on the sourceforge website OR in
635 /usr/share/collectl/doc/collectl-xxx to see complete descrip‐
636 tions of the data returned.
637
638 SUMMARY SUBSYSTEMS
639
640 b - buddy info (memory fragmentation)
641 c - CPU
642 d - Disk
643 f - NFS V3 Data
644 i - Inode and File System
645 j - Interrupts
646 l - Lustre
647 m - Memory
648 n - Networks
649 s - Sockets
650 t - TCP
651 x - Interconnect
652 y - Slabs (system object caches)
653
654 DETAIL SUBSYSTEMS
655
656 This is the set of detail data from which in most cases the cor‐
657 responding summary data is derived. There are currently 2 types
658 that do not have corresponding summary data and those are "Envi‐
659 ronmental" and "Process". So, if one has 3 disks and chooses
660 -sd, one will only see a single total taken across all 3 disks.
661 If one chooses -sD, individual disk totals will be reported but
662 no totals. Choosing -sdD will get you both.
663
664 C - CPU
665 D - Disk
666 E - Environmental data (fan, power, temp), via ipmitool
667 F - NFS Data
668 J - Interrupts
669 L - Lustre OST detail OR client Filesystem detail
670 M - Memory node data, which is also known as numa data
671 N - Networks
672 T - 65 TCP counters only available in plot format
673 X - Interconnect
674 Y - Slabs (system object caches)
675 Z - Processes
676
--showheader
In collectl mode this command will cause the header that is normally
written to a data file to be displayed on the terminal and collectl
then exits. This can be a handy way to get a brief overview of the
system configuration.

--showoptions
This command shows only the portion of the help text that describes the
-o and --options switches to save the time of wading through the entire
help screen.
687
688 --showcolheaders
689 This command shows the first set of headers that will be printed
690 by collectl and exits. Doesn't really make sense for multi-sec‐
691 tion output like several sets of verbose or detail data. Also
692 note that since it requires one monitoring interval to build up
693 some headers which may be dynamic, it also forces the interval
694 to 0.
695
--showsubopts
List all the subsystem specific options.

--showtopopts
Show all the different values for the --top type field, which specify
the field(s) by which to sort the data.
702
703 --showrootslabs
704 This command only works on systems using the new slab allocator
705 and will list the root name (these are those entries in
706 /sys/slab which are not soft links) along with all its alias
707 names. If a name doesn't have an alias, it will not appear in
708 this report.
709
710 --showslabaliases
711 This command only works on systems using the new slab allocator.
712 Like --showrootslabs, it will name a slab and all its aliases
713 but rather than show the root slab name it will show one of the
714 aliases to provide a more meaningful name. If there are any
715 slabs that only have a single (or no) alias they will not be
716 included in this report.
717
--showsubopts
Similar to --showoptions, this command summarizes just the parameters
associated with -O and --subopts.

--showsubsys
Yet another way to summarize a portion of the help text, this command
only shows valid subsystems.
725
726 --top [type][,num[,v]]
727 Include the top "num" consumers by resource for this interval.
728 The default number is the height of the window if it can be
729 determined otherwise 24, and the default resource is the total
730 cpu time which is taken as the sum of SysT and UsrT. See
731 --showtopopts for a list of other types of data you can sort on.
732
733 This switch can also be used with -s in which case a portion of
734 the window is reserved at the top to fill in the subsystem data,
735 which is currently in verbose mode though a brief format is con‐
736 templated for some time in the future.
737
738 In interactive mode and if not specified, the process monitoring
739 interval will be set to that for other subsystems. The screen
740 will be cleared for each interval resulting in a display similar
741 to the "top" utility. In playback mode the screen will NOT be
742 cleared. You cannot use this switch in "record" mode.
743
744 Finally, if v is specified as the 3rd parameter, the output
745 scrolls vertically (like playback mode) rather than clearing the
746 screen between intervals.
747
748 --umask mask
749 Sets collectl's umask to control output file permissions. Only
750 root can set the umask. See "man umask" for details.
751
752 --utime mask
753 Write periodic micro-timestamps into raw file at different
754 points in time for fine grained measurements of operation times.
755 1 - write timestamps when entering major sections
756 2 - write timestamps for all /proc accesses except for process
757 data
758 4 - write timestamps for /proc data for all processes including
759 threads
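
Since --utime takes a mask, the three values above can presumably be combined by adding them. A minimal sketch of decoding such a mask, assuming ordinary bitmask semantics; the helper name decode_utime is hypothetical and not part of collectl:

```python
# Hypothetical helper (not part of collectl): decode a --utime mask,
# assuming the documented values 1, 2 and 4 combine as ordinary bit flags.
UTIME_BITS = {
    1: "timestamps when entering major sections",
    2: "timestamps for all /proc accesses except process data",
    4: "timestamps for /proc data for all processes including threads",
}


def decode_utime(mask):
    """Return the description of every documented bit set in mask."""
    unknown = mask & ~sum(UTIME_BITS)  # anything outside 1|2|4
    if unknown:
        raise ValueError(f"unsupported bits in mask: {unknown}")
    return [text for bit, text in UTIME_BITS.items() if mask & bit]


for line in decode_utime(5):
    print(line)
```

Under that assumption, a mask of 5 would select major-section timestamps plus per-process /proc timestamps.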
760
761 -v
762 Show version and whether or not Compression and/or HiResTime
763 modules have been installed and exit.
764
765 -V
766 Show default parameter and control settings, all of which can be
767 changed in /etc/collectl.conf
768
769 --verbose
770 Display output in verbose mode. This often displays more data
771 than in the default mode. When displaying detail data, verbose
772 mode is forced. Furthermore, if summary data for a single sub‐
773 system is to be displayed in verbose mode, the headers are only
774 repeated occasionally whereas if multiple subsystems are
775 involved each needs their own header.
776
777 -w
778 Display data in wide mode. When displaying data on the terminal,
779 some data is formatted followed by a K, M or G as appropriate.
780 Selecting this switch will cause the full field to be displayed.
781 Note that there is no attempt to align data with the column
782 headings in this mode.
783
784
786 The following options are subsystem specific and typically filter data
787 for collection and/or display as well as affect the output format:
788
789 --cpufilt [^]perl-regx[,perl-regx...]
790 Works the same as --dskfilt and --netfilt, allowing one to select a
791 subset of CPUs. These filters are also honored by interrupt
792 reporting.
793
794 --cpuopts
795 z - only applies to cpu details, do not report any CPUs with no
796 load. In other words all entries are zero except for IDLE.
797
798 --dskfilt [^]perl-regx[,perl-regx...]
799 NOTE - this does NOT affect data collection and ALL disk data
800 will always be collected, unless --rawdskfilt is specified too.
801 However, only data for disk names that match the pattern(s) will
802 be included in the summary totals and displayed when details are
803 requested. Alternatively, if you preface the first expression
804 with a caret, all names that match all strings will be excluded
805 from the summary totals and detail displays rather than
806 included. If you don't know perl, a partial string will usually
807 work too.
808
809 Just remember, this only applies to collected data and so if for
810 example you specify a partition, such as sda1, you'll never see
811 the data since it was filtered out at the time of data collec‐
812 tion. To see those stats you would need to say --rawdskfilt
813 sda1.
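
The include/exclude behaviour described for --dskfilt can be sketched as follows. This is only a model of the documented semantics, not collectl's own code; select_names is a hypothetical name, and it assumes a name is selected when ANY of the comma separated patterns matches:

```python
import re


def select_names(names, filt):
    """Model a [^]regx[,regx...] filter: include matches, or exclude
    them when the first expression is prefaced with a caret."""
    exclude = filt.startswith("^")  # leading caret negates, it is not an anchor
    body = filt[1:] if exclude else filt
    patterns = [re.compile(p) for p in body.split(",")]

    def hit(name):
        # Unanchored search, so a partial string works as a pattern too.
        return any(p.search(name) for p in patterns)

    return [n for n in names if hit(n) != exclude]


disks = ["sda", "sdb", "dm-0"]
print(select_names(disks, "sd"))    # names containing "sd"
print(select_names(disks, "^sd"))   # everything except those names
```

Because the search is unanchored, a plain substring such as sd behaves the way the text describes for users who do not know perl.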
814
815 --dskopts
816 f - report some columns as fractions for more precision on
817 detail output
818 i - display the i/o sizes in brief mode just like with --iosize
819 o - exclude unused disks from new file headers and plot data
820 z - only applies to disk details, do not report any lines with
821 values of all zeros.
822
823 --dskremap aaa:bbb,ccc:ddd...
824 This will cause disk names matching the perl pattern aaa to be
825 replaced with the string bbb. In some cases, you may simply
826 want to remove the entire string in which case the second string
827 should be left empty. If you want to remove a string containing
828 a /, be sure to escape it with a backslash.
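
A rough model of what a --dskremap specification does to a disk name, assuming each aaa:bbb pair is applied in order and replaces the first match only. The function name remap and the sample disk names are hypothetical, and since this sketch uses Python's re module the backslash escaping of / mentioned above is not needed here:

```python
import re


def remap(name, spec):
    """Apply each pattern:replacement pair of spec to name, in order."""
    for pair in spec.split(","):
        pattern, _, replacement = pair.partition(":")
        # A callable keeps the replacement literal (no backslash expansion);
        # an empty replacement simply removes the matched text.
        name = re.sub(pattern, lambda m: replacement, name, count=1)
    return name


print(remap("cciss/c0d0", "cciss/:"))  # prefix removed
print(remap("sda", "sd:disk"))         # prefix renamed
```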
829
830 --envopts Environmental Options
831 The default is to display ALL data but the following will cause
832 a subset to be displayed
833
834 f - display fan data
835 p - display current (power) data
836 t - display temperature data
837 C - convert temperature to Celsius if in Fahrenheit
838 F - convert temperature to Fahrenheit if in Celsius
839 M - display each type of data on separate line
840 T - display data truncated to whole integers (some
841 implementations displayed them with fractional components)
842 9 - any number, will tell ipmitool to read on this device number
843
844 --envfilt regx
845 If specified, this regx is evaluated against each line of data returned by
846 ipmitool and only those that match are retained. All other data is lost.
847
848 --envremap perl-regx,...
849 If specified as a comma separated list of perl regular
850 substitution expressions without the =~s portion, each expression is
851 applied to each environmental field name, thereby allowing one
852 to rename the column headers. This can be most useful when
853 running on heterogeneous systems and you want consistent column
854 names.
855
856 --intfilt [^]perl-regx[,perl-regx...]
857 NOTE - this does NOT affect data collection, ALL interrupt data
858 will always be collected. However, only data for interrupts
859 that match the pattern(s) will be included in the summary totals
860 and displayed when details are requested. Alternatively, if you
861 preface the first expression with a caret, all names that match
862 all strings will be excluded from the summary totals and detail
863 displays rather than included. If you don't know perl, a partial
864 string will usually work too.
865
866 NOTE - these expressions are applied to the entire line one sees
867 in /proc/interrupts, including the interrupt number, name and
868 even counters so if you do want to include an interrupt number
869 in the pattern be sure to include the trailing colon as well.
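
The note about the trailing colon can be illustrated with a small sketch. The three lines below are made-up samples in the general shape of /proc/interrupts, and matching is a hypothetical helper that applies a pattern to each whole line as described above:

```python
import re

# Hypothetical sample lines in the general shape of /proc/interrupts.
lines = [
    "  1:        9      0   IO-APIC   1-edge      i8042",
    " 11:       42      0   IO-APIC  11-fasteoi   eth0",
    " 19:      111      0   IO-APIC  19-fasteoi   ehci_hcd",
]


def matching(lines, pattern):
    """Return the interrupt numbers of the lines the pattern matches."""
    rx = re.compile(pattern)
    return [ln.split(":")[0].strip() for ln in lines if rx.search(ln)]


print(matching(lines, "11"))   # also hits the counter 111 on interrupt 19
print(matching(lines, "11:"))  # only interrupt 11
```

Without the colon, the pattern 11 also matches a counter value on another line, which is the pitfall the note warns about.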
870
871 --lustopts Lustre Options
872 B - For clients and servers, show buffer stats
873 D - For MDSs and OSTs AND running earlier versions of HPSFS,
874 collect disk block iostats
875 M - For clients, collect metadata
876 O - For OSTs, show detail level stats
877 R - For client, collect readahead stats
878
879 --memopts Memory Options
880 R - show memory values (including swap space) as rates of change
881 as opposed to absolute values. One can also show absolute
882 changes between intervals by including -on.
883
884 --netfilt [^]perl-regx[,perl-regx...]
885 NOTE - this does NOT affect data collection and ALL network data
886 will always be collected, unless --rawnetfilt is specified too.
887 Also note that by default only eth, ib, em and p1p networks when
888 present are included in the summary. When this switch is speci‐
889 fied, only data for network names that match the pattern(s) will
890 be included in the summary and displayed when details are
891 requested. This switch therefore also gives you the ability to
892 add other, possibly new, network devices to the summary totals.
893
894 Alternatively, if you preface the first expression with a caret,
895 all names that match all strings will be excluded from the summary
896 totals and detail displays rather than included. If you
897 don't know perl, a partial string will usually work too.
898
899 --netopts
900 e - include network error counts in brief and explicit error
901 types elsewhere
902 E - only include lines with network errors in them
903 i - include i/o sizes in brief mode
904 o - exclude unused networks from new file headers and plot data
905 w - set width of network device name
906
907 --nfsfilt NFS Filters
908 Specify one or more comma separated filters as a C/S followed by
909 an nfs version number and only those will have data reported on.
910 For example, C2 says to report data on V2 Clients. As a data
911 collection performance optimization, if one or more client fil‐
912 ters are specified, data will actually be collected for all
913 clients as is also done for servers.
914
915 --nfsopts NFS Options
916 z - only display detail lines which have data
917
918 --procfilt Process Filters
919 These filters restrict which processes are selected for
920 collection/display. Using this filter will significantly reduce the
921 load on process data collection since collectl creates a
922 blacklist of those existing processes that do not pass the filter and
923 so are permanently excluded from any future processing.
924
925 The format of a filter is a one character type followed by a match
926 string. Multiple filters may be specified if separated by
927 commas.
928
929 c - substring of the command being executed as explicitly read
930 from /proc/pid/stat. Note that this can actually be a perl
931 expression, so if you want a command that ends in a particular
932 string all you need to do is append a $ to the end of the string.
933 Otherwise it would match any commands containing that string.
934 C - any command that starts with the specified string
935 f - full path of the command, including arguments, as read from
936 /proc/pid/cmdline. Like the c modifier this too can be a perl
937 expression.
938 p - pid
939 P - parent pid
940 u - any process owned by this user's UID or in the range
941 specified by uxxx-yyy
942 U - any process owned by this username
943
944 caution: the process name collectl tries to match with c and C
945 is the second field in /proc/pid/stat which may not necessarily
946 be what you think! e.g. the name for X emacs is actually emacs-x
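
The filter format described above (a one character type followed by a match string, with multiple filters separated by commas) can be sketched as a small parser. This is an illustration only; parse_procfilt is a hypothetical name, and the simple comma split assumes no match string contains a comma itself:

```python
# Filter types listed above: c, C, f, p, P, u, U.
VALID_TYPES = set("cCfpPuU")


def parse_procfilt(spec):
    """Split a --procfilt style string into (type, match) tuples."""
    filters = []
    for item in spec.split(","):
        kind, match = item[:1], item[1:]
        if kind not in VALID_TYPES or not match:
            raise ValueError(f"bad filter: {item!r}")
        filters.append((kind, match))
    return filters


print(parse_procfilt("chttpd$,u1000-2000,P1"))
```

Here chttpd$ would be a c filter with a perl style expression, u1000-2000 a UID range, and P1 a parent pid.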
947
948 --procopts options
949 These options control the way data is displayed and can also
950 improve data collection performance
951
952 c - include CPU time of children who have exited (same as ps -S)
953 f - use cumulative totals for page faults in process data
954 instead of rates
955 i - show process I/O counters in display instead of default for‐
956 mat
957 I - disable collection of I/O counters, see note below
958 k - remove known shells from process names, making it possible
959 to see actual command
960 m - show breakdown of memory utilization instead of default for‐
961 mat
962 p - never look for new pids or threads during data collection
963 r - show root command name only (no directory) for narrower dis‐
964 play. Note that this is applied AFTER 'k' so if arg1 becomes the
965 new command it will be truncated now, which is very handy when
966 running in a virtual python environment
967 R - show ALL process priorities ('RT' currently displayed if
968 realtime)
969 s - show process start time in hh:mm:ss format
970 S - show process start time in mmmdd-hh:mm:ss format
971 t - include ALL process threads (increases collection overhead)
972 u - report username as 12 chars instead of 8, noting uxx will
973 cause column width to be xx but cannot be less than 8
974 w - widen display by including whole argument string, with
975 optional max width
976 x - include extended process attributes (currently only for con‐
977 text switches)
978 z - exclude any processes with 0 in sort field (in --top mode)
979
980 Process data is the most expensive type of data collected, cost‐
981 ing as much as 3 times the CPU load as all other types of data
982 combined. Collecting thread data makes this even more expen‐
983 sive. One can significantly reduce this load by over 25 percent
984 by disabling the collection of I/O stats. However, keep in mind
985 that even if you don't try to optimize process data collection,
986 the overall system load by collectl can still be on the order of
987 about 0.2% when running as a daemon with default collection
988 rates. See the online documentation on measuring performance
989 for more information.
990
991 A security hole was identified that allowed non-privileged
992 users to read /proc/pid/io and guess password lengths and now
993 many distros restrict access to the owner or root. As a result,
994 non-privileged users will see all 0 I/O counts for processes
995 that are not theirs when specifying --procopts i.
996
997 --slabfilt Slab Filters
998 One can specify a list of slab names separated by commas and
999 only those slabs whose names start with those strings will be
1000 listed or summarized.
1001
1002 --slabopts Slab Options
1003 s - exclude any slabs with an allocation of 0
1004 S - only show those slabs whose allocations changed since last
1005 display
1006
1007 --tcpfilt
1008 These filters actually control both what is collected as well as
1009 displayed. If one selects non-collected filters, 0s will be
1010 reported. There is one special case and that is if one includes
1011 T (tcp extended stats) in the filter string, there are no brief
1012 ones and therefore --verbose will be forced.
1013 i - ip stats
1014 t - tcp stats
1015 u - udp stats
1016 c - icmp stats
1017 I - ip extended stats
1018 T - tcp extended stats
1019
1020 --xopts
1021 i - include i/o sizes in brief mode
1022
1023
1025 The collectl utility is a system monitoring tool that records or dis‐
1026 plays specific operating system data for one or more sets of subsys‐
1027 tems. Any set of the subsystems, such as CPU, Disks, Memory or Sockets
1028 can be included in or excluded from data collection. Data can either
1029 be displayed back to the terminal, or stored in either a compressed or
1030 uncompressed data file. The data files themselves can either be in raw
1031 format (essentially a direct copy from the associated /proc structures)
1032 or in a space separated plottable format such that it can be easily
1033 plotted using tools such as gnuplot or excel. Data files can be read
1034 and manipulated from the command line, or through use of command
1035 scripts.
1036
1037 Upon startup, collectl.conf is read, which sets a number of default
1038 parameters and switch values. Collectl searches for this file first in
1039 /etc, then in the directory the collectl executable lives in (typically
1040 /usr/sbin) and finally the current directory. These locations can be
1041 overridden with the -C switch. Unless you're doing something really
1042 special, this file need never be touched, the only exception perhaps
1043 being when choosing to run collectl as a service and you wish to change
1044 its default behavior which is set by the DaemonCommand entry.
1045
1046
1048 Thread reporting currently only works with 2.6 kernels.
1049
1050 The pagesize has been hardcoded for perl 5.6 systems to 4096 for IA32
1051 and 16384 for all others. If you are running 5.6 on a system with a
1052 different pagesize you will see incorrect SLAB allocation sizes and
1053 will need to scale the numbers you're seeing accordingly.
1054
1055 I have recently discovered there is a bug in /proc in that an extra
1056 line is occasionally read with the end of the previous buffer! When
1057 this occurs a message is written (if -m enabled) and always written to
1058 the terminal. Since this happens with a higher frequency with process
1059 data I silently ignore those as the output can get pretty noisy. If
1060 for any reason this is a problem, be sure to let me know.
1061
1062 Since collectl has no control over the frequency at which data gets
1063 written to /proc, one can get anomalous statistics as collectl is only
1064 reporting a snapshot of what is being recorded. For more information
1065 see http://collectl.sourceforge.net/TheMath.html.
1066
1067 At least one network card occasionally generates erroneous network
1068 stats and to try to keep the data rational, collectl tries to detect
1069 this and when it does generates a message that bogus data has been
1070 detected.
1071
1072
1074 http://collectl.sourceforge.net OR /opt/hp/collectl/docs
1075
1076
1078 I would like to thank Rob Urban for his creation of the Tru64 Unix col‐
1079 lect tool, which collectl is based on.
1080
1081
1083 This program was written by Mark Seger (mjseger@gmail.com).
1084 Copyright 2003-2016 Hewlett-Packard Development Company, LP
1085 collectl may be copied only under the terms of either the Artistic
1086 License or the GNU General Public License, which may be found in the
1087 source kit
1088
1089
1090
1091LOCAL APRIL 2003 COLLECTL(1)