COLLECTL(1)                        Collectl                        COLLECTL(1)

NAME

       collectl - Collects data that describes the current system status.

SYNOPSIS

       Record Mode - read data from live system and write to file or display
       on terminal

       collectl [-f file] [options]

       Playback Mode - read data from one or more raw data files and display
       on terminal

       collectl -p file1 [file2 ...] [options]

OPTIONS

       Record Mode

       In this mode data is taken from a live system and either displayed on
       the terminal or written to one or more files or a socket.

       --align
              If the HiRes module is present, collectl sample monitoring will
              be aligned such that a sample will always be taken at the top of
              a minute (this does NOT mean the first sample will occur then)
              so that all instances of collectl running on any systems which
              have their clocks synchronized will all take samples at the same
              time.  Furthermore, if one is doing process monitoring, those
              samples will also be taken at the top of the minute and so can
              delay the start of sampling up to 2 full process monitoring
              intervals.
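
              The alignment idea can be sketched in Python as follows.  This
              is only an illustration, not collectl's own code (collectl is
              written in perl).  The function name seconds_until_aligned is
              hypothetical, and the sketch assumes the interval divides 60
              evenly, so that samples taken at whole multiples of the interval
              since the epoch always include the top of each minute.

```python
import math


def seconds_until_aligned(now: float, interval: int) -> float:
    """Return the wait needed so the next sample lands on an interval boundary.

    Assumption: 60 % interval == 0, so every boundary sequence contains the
    top of each minute. Other intervals are rejected by this sketch.
    """
    if interval <= 0 or 60 % interval != 0:
        raise ValueError("this sketch only handles intervals that divide 60")
    # Round the current time up to the next whole multiple of the interval.
    next_tick = math.ceil(now / interval) * interval
    return next_tick - now
```

              Two hosts with synchronized clocks that each wait this long
              before their first sample would then sample at the same
              instants.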

       --all
              Collect summary data for ALL subsystems except slabs, since slab
              monitoring requires a different monitoring interval.  This also
              means you won't get any detail data, which also includes
              processes and environmentals.  You can use this switch anywhere
              -s can be used but not both together.  If the system supports
              lustre and/or interconnect monitoring those statistics will be
              provided, but the warnings normally produced when they are not
              available and you try to select them with -s will not be
              displayed.

       --ALL
              This is a superset of --all that adds detail statistics as
              well, with the exception of TCP details when displaying to a
              terminal since those are only available with -P or -f.

       -A, --address address[:port[:timeout]] | server[:port]
              In the first form, one specifies an address, optional port and
              timeout (the first colon is required to specify a timeout for
              the default port).  All data is then written to that socket,
              prefaced with the current host name, at the named address and
              port until the socket is closed, at which time collectl will
              exit.

              In the second form one enters the text "server" and an optional
              port.  In this form, collectl runs as a server, waiting for a
              connection and, once established, writes data on that socket.
              The key difference here is that if the client exits, collectl
              keeps running and will again look for a new connection, allowing
              it to survive client restarts or crashes.

              The default port is set at 2655 but can be changed - see
              collectl.conf.

              In both forms, one can additionally request local data logging
              by specifying a combination of -P and -f.  See man
              collectl-logging for more details.
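
              As a rough illustration of the two argument forms, the Python
              sketch below splits an -A value into its parts.  It is not
              collectl's parser: the names AddressSpec and parse_address are
              hypothetical, only host names and IPv4 addresses are assumed
              (no IPv6 literals), and an omitted timeout is reported as None
              because the default timeout is not stated here.

```python
from typing import NamedTuple, Optional

DEFAULT_PORT = 2655  # default given in the text; collectl.conf may change it


class AddressSpec(NamedTuple):
    server_mode: bool        # True for the "server[:port]" form
    host: Optional[str]      # None in server mode
    port: int
    timeout: Optional[int]   # None when not given (default not documented here)


def parse_address(arg: str) -> AddressSpec:
    """Split 'address[:port[:timeout]]' or 'server[:port]' into fields."""
    parts = arg.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':' separated fields in {arg!r}")
    head = parts[0]
    # An empty port field ("host::10") means "use the default port".
    port = int(parts[1]) if len(parts) > 1 and parts[1] else DEFAULT_PORT
    if head == "server":
        if len(parts) > 2:
            raise ValueError("the server form takes only an optional port")
        return AddressSpec(True, None, port, None)
    timeout = int(parts[2]) if len(parts) > 2 and parts[2] else None
    return AddressSpec(False, head, port, timeout)
```

              Note how "host::10" keeps the default port while still setting a
              timeout, which is why the first colon cannot be dropped.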

       --comment string
              Add the specified string to the end of the headers in the data
              files.  If it contains embedded spaces, be sure to quote it.
              This can be very useful when doing characterizations or
              benchmarking and you're frequently changing system/application
              parameters and restarting collectl between tests.

       -C, --config filename
              Name/location of the collectl configuration file.  If not
              specified, collectl searches for collectl.conf first in /etc
              (the default), then in the same directory the collectl
              executable is in, and finally the current working directory.

       -c, --count Samples
              The number of samples to record.  This is one of 3 ways of
              describing how long collectl should run (see -r and -R).  Note
              that these 3 switches are mutually exclusive.

       -D, --daemon
              Run collectl as a daemon, primarily used when starting as a
              service.  One caveat about this mode is you can only run one
              copy.

       --export file[,options]
              This requests that collectl does not print anything on the
              terminal (or send it to a socket) using the standard
              brief/verbose/plot formats.  Instead it executes a perl
              "require" on the named file, using an extension of ph if not
              specified.  It first looks in the current directory and, if not
              there, the directory the executable is in.  It then calls the
              function "file"Init(options) towards the beginning of collectl
              and again as simply "file"(@options) to generate the exported
              formatted output.  See the online documentation on Exporting
              Custom Output and Logging for more details.

       -f, --filename Filename
              This is the name of a file to write the output to.  For details
              on how the output files are named, see the File Naming section
              of the documentation on collectl.sourceforge.net OR
              /usr/share/doc/collectl/FileNaming.html

       -F, --flush seconds
              Flush output buffers after this number of seconds.  This is
              equivalent to issuing kill -s USR1 at the same frequency (but a
              lot easier!).  If 0, a flush will occur every data collection
              interval.

       --grep pattern
              The main purpose of this switch is for those users who have
              discovered there is some data in the raw files that never
              appears in any display and have taken to displaying it
              themselves with grep.  Unfortunately this method does not
              include timestamps and so makes it difficult to interpret the
              results.  Even if you include the timestamp from the file it is
              in UTC and so needs to be translated to be of any real value.
              This switch does just that and then some.

              Specifically, it allows you to play back a file and, instead of
              processing it normally, it simply searches for any entries that
              match the perl pattern and reports those lines prefaced with
              timestamps.  You can optionally change the time format with the
              usual -o options and can even select the timeframe with --from
              and --thru.

       --home
              Always start the display for the current interval at the top of
              the screen, also known as the home position (non-plot format
              only).  This generates a real-time, continuously refreshing
              display when the data fits on a single screen.

       --import file1[,options][:file2[,options]...]
              This loads the named files and executes callbacks to them, which
              is the API mechanism for importing additional metrics into
              collectl.  See the webpage on the API for further detail.

              Since these files also include instructions for how to report
              the output in all the various forms, you will also need to
              include --import during playback.  Finally, since the default is
              to seamlessly include imported data with everything else
              collectl reports, if you ONLY want to display imported data you
              must explicitly deselect all other subsystems either by
              including -s- (note the trailing minus sign) followed by all the
              subsystems that were recorded OR simply say -s-all.

       -i, --interval interval[:interval2[:interval3]]
              This is the sampling interval in seconds.  The default is 10
              seconds when run as a daemon and 1 second otherwise.  The
              process subsystem and slabs (-sZ and -sY) are sampled at the
              lower rate of interval2.  Environmentals (-sE), which only apply
              to a subset of hardware, are sampled at interval3.  Both
              interval2 and interval3, if specified, must be an exact multiple
              of the first interval.  The daemon default is -i10:60:300 and
              all other modes are -i1:60:300.  To sample only processes once
              every 10 seconds use -i:10.
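
              The defaulting and multiple-of rules above can be sketched in
              Python.  This is an illustration under stated assumptions, not
              collectl's implementation: the function name parse_intervals is
              hypothetical and whole-second intervals are assumed.

```python
from typing import Tuple


def parse_intervals(arg: str, daemon: bool = False) -> Tuple[int, int, int]:
    """Expand 'interval[:interval2[:interval3]]' using the documented defaults."""
    defaults = [10 if daemon else 1, 60, 300]
    fields = arg.split(":")
    if len(fields) > 3:
        raise ValueError("at most three intervals may be given")
    # Empty or missing fields fall back to the default for that position.
    values = [int(f) if f else d for f, d in zip(fields, defaults)]
    values += defaults[len(values):]
    # Only explicitly specified secondary intervals are checked, matching the
    # "if specified" wording in the text.
    for idx, name in ((1, "interval2"), (2, "interval3")):
        specified = idx < len(fields) and fields[idx] != ""
        if specified and values[idx] % values[0] != 0:
            raise ValueError(f"{name} must be a multiple of the first interval")
    return values[0], values[1], values[2]
```

              With this sketch, ":10" leaves the first interval at its default
              and changes only interval2, mirroring the -i:10 example.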

       --nohup
              Whenever collectl finishes a data collection interval, it checks
              to see if the starting parent has exited.  This is to prevent
              the case in which someone might start a copy of collectl and
              then the parent process dies and collectl keeps running.  If
              that is the behavior someone actually intends, they should start
              collectl with --nohup.

              NOTE - when running as a daemon, --nohup is implied.

       --quiet
              Whenever collectl wants to tell the user something, it assigns a
              category to it such as Informational, Warning, Error or Fatal.
              When run with -m, all messages are displayed for the user and,
              if logging data to a file with -f, these messages are also sent
              to a log file which is in the data collection directory and has
              an extension of "log".  However, if -m is not specified,
              Informational messages (such as collectl starting or stopping)
              are not reported on the terminal but the other 3 are.  Sometimes
              the warnings can be annoying and one can suppress these with
              --quiet, though they will still be written to the message log
              when logging with -f.  You cannot suppress Error or Fatal
              messages.

       -r, --rolllogs time[[,days[:months]][,minutes]]
              When selected, collectl runs indefinitely (or at least until the
              system reboots).  The maximum number of raw and/or plot files
              that will be retained (older ones are automatically deleted) is
              controlled by the days field; the default is 7.  When -m is also
              specified to direct collectl to write messages to a log file in
              the logging directory, the number of months to retain those logs
              is controlled by the months field and its default is 12.  The
              increment field, which is also optional (but is position
              dependent), specifies the duration of an individual collection
              file in minutes, the default of which is 1440 or 1 day.

       --rawdskfilt
              This switch overrides the DiskFilter setting in collectl.conf
              and explicitly defines a perl regular expression against which
              records from /proc/diskstats are selected for processing.  When
              there are a lot of disks to process, this can be a handy way to
              reduce the amount of data collected and actually improve
              performance since there are fewer patterns to match each input
              record against.  Just remember that unlike --dskfilt, which only
              filters during display, records filtered with this switch are
              never even recorded and so are lost forever.

              You can optionally specify your filter with a leading plus sign,
              which tells collectl to just add your filter to the default
              specification.  Care should be taken here as longer filters will
              slightly increase overhead, which with a lot of disks and/or
              shorter monitoring intervals can add up.

              As a side benefit of this switch, if you really want to look at
              partition level stats you can do so by leaving off the trailing
              space in the default pattern.

              One must also be careful in selecting the correct pattern since
              it's easy to get it wrong and you may end up collecting the
              WRONG data!  To verify you are collecting what you think you
              are, make a test run using -d4 to see the raw data being
              recorded in real-time.

       --rawdskignore
              This is the opposite of the rawdskfilt switch.  When specified,
              any disks listed are completely ignored and will not appear in
              the raw file.  Typically this switch is useful when you're only
              interested in recording a subset of disk statistics.

       --rawnetfilt
              This works just like --rawdskfilt except it applies to networks.
              Unlike disk filtering, which has an explicit default pattern,
              the default for network filtering is to simply record all
              network data from /proc/net/dev.

              The -d4 switch also works here, as it does everywhere, to see
              the raw data as it is being collected.

       --rawnetignore
              This is the opposite of the rawnetfilt switch and works just
              like the rawdskignore switch.  When specified, any networks
              listed are ignored and will not appear in the raw file.
              Typically this switch is useful when you're only interested in
              recording a subset of network statistics.

       --rawtoo
              Only available in conjunction with -P, this switch causes the
              creation/logging of raw data in addition to plottable data.
              While this may seem excessive, keep in mind that unlike
              plottable data, raw data can be played back with different
              switches, potentially providing more details.  The overhead to
              write out this additional data is minimal, the only real cost
              being that of extra disk space.

       --runas uid[:gid]
              This switch only works when running in daemon mode and so must
              be specified in the DaemonCommands line.  Its presence will
              cause collectl to write the collectl.pid file into the same
              directory as its other output files as specified by -f, since
              /var/run does not normally grant non-privileged users write
              access.  Furthermore, the ownership of that directory must match
              the specified ownership since collectl needs to write ALL its
              files to that directory and can no longer assume the global
              permissions it has when run as root.

              This WILL also require manually modifying /etc/init.d/collectl
              to change the PIDFILE variable to point to the same directory
              which the -f switch in the DaemonCommands line of collectl.conf
              points to.

              As a final note of caution, since this mechanism changes where
              collectl reads/writes its pid file, once you start using
              --runas, all calls to run collectl as a daemon must use it or it
              may be confused and exhibit unpredictable behavior.

       -R, --runtime duration
              Specify the duration of data collection where the duration is a
              number followed by one of wdhms, indicating how many weeks,
              days, hours, minutes or seconds the collection is to be taken
              for.
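
              A minimal Python sketch of how such a duration maps to seconds
              is shown below.  The function name runtime_to_seconds is
              hypothetical, and the sketch assumes a single whole number
              followed by exactly one unit letter.

```python
import re

# Seconds per unit letter: weeks, days, hours, minutes, seconds.
_UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}


def runtime_to_seconds(duration: str) -> int:
    """Convert a value such as '2h' or '90s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([wdhms])", duration)
    if match is None:
        raise ValueError(f"expected a number followed by one of wdhms: {duration!r}")
    count, unit = match.groups()
    return int(count) * _UNIT_SECONDS[unit]
```

              For example, a value of 2h corresponds to 7200 seconds of
              collection.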

       --sep separator
              Specify the plot format separator - default is a space.  If this
              is a numeric field it is interpreted as the decimal value of
              the associated ASCII character code.  Otherwise it is
              interpreted as the character itself.  In other words, "--sep :"
              sets the separator character to a colon and "--sep 9" sets it to
              a horizontal tab.  "--sep 58" would also set it to a colon.
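
              The numeric-versus-literal rule can be written as a short Python
              sketch.  The function name resolve_separator is hypothetical and
              this is not collectl's own code.

```python
def resolve_separator(arg: str) -> str:
    """Return the separator character selected by a --sep style argument."""
    # A purely numeric argument is taken as a decimal ASCII code.
    if arg.isascii() and arg.isdigit():
        return chr(int(arg))
    # Anything else is used as the character itself.
    return arg
```

              One consequence of this rule is that a digit cannot be chosen as
              a literal separator, since it would be read as a character code.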

       --tworaw
              The switches -G and --group have been replaced by --tworaw,
              which is more descriptive of its function.  When specified, it
              tells collectl to treat process and slab data as an entirely
              separate group of raw files, named with the extension "rawp".
              These separate files can be played back and processed just like
              any other collectl raw files and in fact one can even play back
              both at the same time if that is what is desired.  The only real
              purpose of this switch is that on some systems with many
              processes, it is possible to generate huge raw files (some have
              been observed to be >250MB!) and while collectl will happily
              play back/process these files it can take a long time.  By using
              the --tworaw switch one still gets a huge rawp file, but the
              normal raw file is a much more manageable size and as a result
              will be faster to process than when all data is combined into
              the same file.

       Playback Mode

       In this mode, data is read from one or more data files that were
       generated in Record Mode.

       --export Filename
              When playing back a file, use this switch to create an identical
              raw file differing only in the timeframe being covered, so
              naturally one must also include --from, --thru or both.
              Further, since the resultant file will contain the exact same
              raw data you cannot select a subset using -s.  This switch is
              actually intended as a support function for situations where
              someone is having problems playing back a file and a subset of
              the original raw file that covers the problem time has been
              requested, hopefully allowing a significantly smaller file to be
              posted or emailed.

       --extract filename
              If specified, rather than actually play back the file specified
              with -p, ALL raw data between the date ranges is selected and a
              subset of that raw file created.  The rules for how to interpret
              the filename are the same as used for -f.

       -f, --filename filename
              If specified, this is the name of a file or directory to write
              the output to (rather than the terminal).  See the description
              of -f under Record Mode for details on the format of this field.
              This requires the -P flag as well.

       --from time range
              Play back data starting with this time, which may optionally
              include the ending time as well, which is of the format of
              [date:]time[-[date:]time].  The leading 0 of the hour is
              optional and if the seconds field is not specified it is assumed
              to be 0.  If no dates are specified the time(s) apply to each
              file specified by -p.  Otherwise the time(s) only apply to the
              first/last dates and any files between those dates will have all
              their data reported.
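
              The [date:]time[-[date:]time] shape can be illustrated with the
              Python sketch below.  It rests on assumptions that are not
              stated in this page: dates are taken to be 8 digits (yyyymmdd)
              and times to be hh:mm[:ss].  The names parse_from and
              _parse_point are hypothetical.

```python
import re
from datetime import time
from typing import Optional, Tuple

# Assumed layout: optional 8-digit date, then h[h]:mm with optional :ss.
_POINT = re.compile(r"(?:(\d{8}):)?(\d{1,2}):(\d{2})(?::(\d{2}))?")

Point = Tuple[Optional[str], time]


def _parse_point(text: str) -> Point:
    match = _POINT.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized time: {text!r}")
    date, hh, mm, ss = match.groups()
    # A missing seconds field is treated as 0, as described in the text.
    return date, time(int(hh), int(mm), int(ss or 0))


def parse_from(arg: str) -> Tuple[Point, Optional[Point]]:
    """Split a --from style value into a start point and optional end point."""
    start, dash, end = arg.partition("-")
    return _parse_point(start), (_parse_point(end) if dash else None)
```

              A value with no date, such as 8:30, yields a start point whose
              date is None, which corresponds to the case where the time
              applies to every file being played back.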

       --full
              Full mode is actually a superset of --verbose and if selected
              will force --verbose.  It will also force the RECORD separator
              to be printed for every interval even if only a single subsystem
              was requested, and to include, following the UTC timestamp, the
              subsystems that follow, as a parsing aid for those who may wish
              to parse the text output rather than the plot data.

       --offsettime seconds
              This field originally was used before collectl reported the
              timezone in the file headers and allowed one to compensate.
              Since then it is rarely needed except in two possible cases.
              The first is when data on two systems is to be compared and
              they weren't synchronized with ntp.  This allows all the times
              to be reported as shifted by some number of seconds.  The other
              case (and this is very rare) is when a clock had changed in the
              middle of a sample and will not be converted correctly.  When
              this happens one may have to play back the samples in pieces and
              manually set the time offset.

       --passwd filename
              When reporting usernames associated with a UID, use this file
              for the mapping.  This is particularly important on systems
              running NIS where there are no user names in /etc/passwd.

       -p, --playback Filename
              Read data from the specified playback file(s), noting that one
              can use wildcards in the filename if quoted (if playing back
              multiple files to the terminal you probably want to include -m
              to see the filenames as they are processed).  The filename must
              end in either raw or raw.gz.  As an added feature, since people
              sometimes automate the running of this option and don't want to
              hard code a date, you can specify the string YESTERDAY or TODAY
              and they will be replaced in the filename string by the
              appropriate date.
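
              The keyword substitution can be pictured with the following
              Python sketch.  It is not collectl's code: the function name
              expand_date_keywords is hypothetical, and the yyyymmdd date
              layout and the example path are assumptions made only for
              illustration.

```python
from datetime import date, timedelta
from typing import Optional


def expand_date_keywords(pattern: str, today: Optional[date] = None) -> str:
    """Replace YESTERDAY and TODAY in a filename pattern with dates."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    fmt = "%Y%m%d"  # assumed date layout, not confirmed by this page
    # Replace YESTERDAY first; neither keyword contains the other, so order
    # does not change the result.
    return (pattern.replace("YESTERDAY", yesterday.strftime(fmt))
                   .replace("TODAY", today.strftime(fmt)))
```

              A scheduled job could then pass an unchanging pattern every day
              and still select the previous day's files.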

       --pname name
              By default, collectl uses the file /var/run/collectl.pid to
              indicate the pid of the running instance of collectl and prevent
              multiple copies from being run.  If you DO want to run a second
              copy, this switch will cause collectl to change its process name
              to collectl-name and use that name as the associated pid file as
              well.

       --procanalyze
              When specified and there is process data in the raw file, a
              summary file will be generated with one entry per unique process
              containing such things as the total cpu consumed for both user
              and system, min/max utilization of various memory types, total
              page faults and several others.

       --slabanalyze
              When specified and there is slab data in the raw file, a summary
              file will be generated with one entry per unique slab containing
              data on physical memory usage by that slab.

       --thru time
              Time thru which to play back a raw file.  See --from for more
              details.

       Common Switches - both record and playback modes

       -d, --debug debug
              Control the level of debugging information, not typically used.
              For details see the source code.

       -h, --help, -x, --helpext, -X, --helpall
              Display the standard help message, the extended help message
              (which doesn't include the optional displays such as
              --showoptions, --showsubsys, --showsubopts, --showtopopts) or
              everything.

       --hr, --headerrepeat num
              Sets the number of intervals to display data for before
              repeating the header.  A value of -1 will prevent any headers
              from being displayed and a value of 0 will cause only a single
              header to be displayed and never repeated.

       --iosize
              In brief mode, include iosize with disk, infiniband and network
              data.

       -l, --limits limit
              Override one or more default exception limits.  If more than one
              limit is specified, they must be separated by hyphens.  Current
              values are:

              SVC:value
                     Report partition activity with Service times >= 30 msec

              IOS:value
                     Report device activity with 10 or more reads or writes
                     per second

              LusKBS:value
                     Report client or OSS activity greater than limit.  Only
                     applies to Client Summary or OSS Detail reporting.
                     [default=100000]

              LusReints:value
                     Report MDS activity with Reint greater than limit.  Only
                     applies to MDS Summary reporting.  [default=1000]

              AND
                     Both the IOS and SVC limits must be reached before a
                     device is reported.  This is the default value and is
                     only included for completeness.

              OR
                     Report device activity if either IOS or SVC thresholds
                     are reached.
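
              How the SVC and IOS limits combine under AND and OR can be
              sketched in Python.  This is an illustration only: the function
              name should_report is hypothetical, and a single I/O rate is
              assumed to stand in for "reads or writes per second".

```python
def should_report(svc_ms: float, ios_per_sec: float, *,
                  svc_limit: float = 30, ios_limit: float = 10,
                  mode: str = "AND") -> bool:
    """Decide whether a device counts as an exception under the -l limits."""
    svc_hit = svc_ms >= svc_limit        # service time threshold reached
    ios_hit = ios_per_sec >= ios_limit   # I/O rate threshold reached
    if mode == "AND":                    # default: both must be reached
        return svc_hit and ios_hit
    if mode == "OR":                     # either one is enough
        return svc_hit or ios_hit
    raise ValueError("mode must be 'AND' or 'OR'")
```

              A device with a 35 msec service time but only 5 I/Os per second
              is skipped in the default AND mode and reported in OR mode.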

       -L, --lustsvcs [c|m|o][:seconds]
              This switch limits which services lustre checks for and the
              frequency of those checks.  For more information see the man
              page collectl-lustre.

       -m, --messages
              Write status to a monthly log file in the same directory as the
              output file (requires -f to be specified as well).  The name of
              the file will be collectl-yyyymm.log and will track various
              messages that may get generated during every run of collectl.

       -N, --nice
              Set priority to a nicer one of 10.

       -o, --options Options
              These apply to the way output is displayed OR written to a plot
              file.  They do not affect the way data is selected for
              recording.  Most of these switches work in both record as well
              as playback mode.  If you're not sure, just try it.

              1
                     Data in plotting format should use 1 decimal place of
                     precision as appropriate.

              2
                     Data in plotting format should use 2 decimal places of
                     precision as appropriate.

              a
                     Always append data to an existing plot file.  By default
                     if a plot file exists, the playback file will be skipped
                     as a way of assuring it is associated with a single
                     recorded file.  This switch overrides that mechanism,
                     allowing multiple recorded files to be processed and
                     written to a single plot file.

              c
                     Always open newly named plot files in create mode,
                     overwriting any old ones that may already exist.  If one
                     processes multiple files for the same day in append mode
                     multiple times, the same data will be appended to the
                     same file multiple times.  This assures a new file is
                     created at the start of the processing.

              d
                     For use with terminal output and brief mode.  Precede
                     each line with a date/time stamp, the date being in mm/dd
                     format.  This option can also be applied to plot format,
                     which will cause the date portion to also be displayed in
                     this format as opposed to D format.

              D
                     For use with terminal output and brief mode.  Precede
                     each line with a date/time stamp, the date being in
                     yyyymmdd format.

              g
                     For use with terminal output and brief mode.  When
                     displaying values of 1G or greater there is limited
                     precision for 1 digit values.  This option provides a way
                     to display additional digits for more granularity by
                     substituting a "g" for the decimal point rather than the
                     trailing "G".

              G
                     For use with terminal output and brief mode.  This is
                     similar to "g" but preserves the trailing "G" by
                     sacrificing a digit of granularity.
529
530              m
531                     Whenever times are reported in plot format, in the normal
532                     terminal  reporting format at the bginning of each inter‐
533                     val or when when one of the time reporting options (d, D,
534                     T or U is selected), append the milliseconds to the time.
535
536              n
537                     Where appropriate, data such as disk KBs or transfers are
538                     normalized to units per second by taking the change in  a
539                     counter  and  dividing  by  the number of seconds in that
540                     interval.  In the case of CPUs,  utilization  (calculated
541                     in  jiffies)  is normalized as a percentage of the inter‐
542                     val.
543
544                     Normalization can be disabled via this option, the result
545                     being the reported values are not divided by the duration
546                     of the interval.  This  can  be  particulary  useful  for
547                     reporting  values that are < 1/2 the sampling, which will
548                     be rounded to 0.
549
550              T
551                     For use with terminal output  and  brief  mode,  preceeds
552                     each line with a time stamp.
553
554              u
555                     Create plot files with unique names by include the start‐
556                     ing time of a colletion in the name.  This forces  multi‐
557                     ple  collections taken the same day to be written to mul‐
558                     tiple files.
559
560              -U or --utc
561                     In plot format only,  report  timestamps  in  Coordinated
562                     Universal time which is more commonly know as UTC.
563
564              x
565                     Report  only  exception  records for selected subsystems.
566                     Exception reporting also requires  --verbose.   Currently
567                     this only applies to disk detail and Lustre server infor‐
568                     mation so one must select at least -s D, l or L for  this
569                     to apply.  If writing to a detail file, this data will go
570                     into a separate file with the extension X appended to the
571                     regular detail file name.
572
573              X
574                     Report both exceptions and all details for selected
575                     subsystems, for -s D, l or L only.
576
577              z
578                     If the compression library has been installed, all output
579                     files  will  be compressed by default.  This switch tells
580                     collectl not to compress any plottable  files.   If  col‐
581                     lectl  tries  to  compress but cannot because the library
582                     hasn't been installed, it will generate a  warning  which
583                     can be suppressed with this switch.
584
585       -P, --plot
586              Generate  output in plot format.  This format is space separated
587              data which consists of a header (prefaced  with  a  #  for  easy
588              identification  by an analysis program as well as identifying it
589              as a comment for programs, such as  gnuplot,  which  honor  that
590              convention).   When  written  to  disk, which is the typical way
591              this option is used, summary data elements are  written  to  the
592              tab  file  and the detail elements written to one or more files,
593              one per detail subsystem.  If -f is not specified, all output is
594              sent  to  the  terminal.  Output is always one line per sampling
595              interval.
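
              A minimal sketch of how an analysis program might read this
              format, assuming only what is stated above (header lines start
              with #, fields are space separated, one line per interval).
              The column names and values in the sample are hypothetical and
              real files may carry additional header lines.

```python
def parse_plot(text):
    """Split plot-format text into (column names, list of data rows)."""
    columns, rows = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Header lines look like comments; keep the last one as the
            # list of column names.
            columns = line.lstrip("#").split()
        else:
            rows.append(line.split())
    return columns, rows


# Hypothetical sample: these column names are illustrative only.
SAMPLE = """\
#Date Time CpuUser CpuSys
20160101 00:00:10 3 1
20160101 00:00:20 5 2
"""

columns, rows = parse_plot(SAMPLE)
for row in rows:
    print(dict(zip(columns, row)))
```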
596
597       --stats
598              This switch will cause brief data to be reported as both  totals
599              and averages after processing one or more files for the same day
600              or in playback mode.
601
602       --statopts option(s)
603              This switch controls the  way  brief  stats  are  reported,  the
604              default  is  to  report  the  totals once, at the end of a day's
605              worth of raw files, if more than one.
606
607              a - include averages along with totals
608              i - include the interval data itself, which is the equivalent of
609              -oA
610              s  -  print summary stats at the end of each file processed even
611              if more than one per day
612
613       -s, --subsys subsystem
614              This field controls which subsystem data is to be  collected  or
615              played  back.   The  default for collecting data is "cdn", which
616              stands for CPU, Disk and Network summary data  and  the  default
617              for playback is everything that was collected.
618
619              The  rules  for displaying results vary depending on the type of
620              data selected.  If you write data for CPUs and DISKs  to  a  raw
621              file  and play it back with -sc, you will only see CPU data.  If
622              you play it back with -scm you will  still  only  see  CPU  data
623              since  memory  data  was not collected.  However, when used with
624              -P, collectl will always honor  the  subsystems  specified  with
625              this  switch  so  in  the previous example you will see CPU data
626              plus memory data of all 0s.  To see the current set  of  default
627              subsystems, which are a subset of this full list, use -h.
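
              The display rules in this paragraph can be modeled roughly as
              below.  This is a sketch of the described behavior only; the
              function name and return shape are assumptions made for
              illustration.

```python
def displayed(collected, requested, plot=False):
    """Return {subsystem: has_data} for what playback would show."""
    if plot:
        # With -P every requested subsystem is honored; any that were not
        # collected are reported as all 0s.
        return {s: s in collected for s in requested}
    # Otherwise only requested subsystems that were actually collected appear.
    return {s: True for s in requested if s in collected}


print(displayed("cd", "c"))              # CPU only
print(displayed("cd", "cm"))             # still CPU only
print(displayed("cd", "cm", plot=True))  # CPU plus zero-filled memory
```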
628
629              You  can  also  use + or - to add or subtract subsystems to/from
630              the default values.  For example, "-s-cdn+N" will  remove  CPU,
631              disk and network monitoring from the defaults while adding
632              network detail.
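
              One way to picture the + and - handling, as a hedged sketch
              rather than collectl's actual parsing: the helper name is
              hypothetical and the treatment of a plain list (replacing the
              defaults) is an assumption.

```python
def apply_subsys(defaults, spec):
    """Apply a -s argument such as "-cdn+N" or "sm" to the default set."""
    if not spec.startswith(("+", "-")):
        return spec  # assumed: a plain list replaces the defaults outright
    selected = list(defaults)
    adding = True
    for ch in spec:
        if ch == "+":
            adding = True
        elif ch == "-":
            adding = False
        elif adding and ch not in selected:
            selected.append(ch)
        elif not adding and ch in selected:
            selected.remove(ch)
    return "".join(selected)


print(apply_subsys("cdn", "-cdn+N"))  # defaults removed, network detail added
print(apply_subsys("cdn", "+m"))      # memory added to the defaults
```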
633
634              Refer to data definitions  on  the  sourceforge  website  OR  in
635              /usr/share/collectl/doc/collectl-xxx  to  see  complete descrip‐
636              tions of the data returned.
637
638              SUMMARY SUBSYSTEMS
639
640              b - buddy info (memory fragmentation)
641              c - CPU
642              d - Disk
643              f - NFS V3 Data
644              i - Inode and File System
645              j - Interrupts
646              l - Lustre
647              m - Memory
648              n - Networks
649              s - Sockets
650              t - TCP
651              x - Interconnect
652              y - Slabs (system object caches)
653
654              DETAIL SUBSYSTEMS
655
656              This is the set of detail data from which in most cases the cor‐
657              responding summary data is derived.  There are currently 2 types
658              that do not have corresponding summary data and those are "Envi‐
659              ronmental"  and  "Process".   So, if one has 3 disks and chooses
660              -sd, one will only see a single total taken across all 3  disks.
661              If  one chooses -sD, individual disk totals will be reported but
662              no totals.  Choosing -sdD will get you both.
663
664              C - CPU
665              D - Disk
666              E - Environmental data (fan, power, temp),  via ipmitool
667              F - NFS Data
668              J - Interrupts
669              L - Lustre OST detail OR client Filesystem detail
670              M - Memory node data, which is also known as numa data
671              N - Networks
672              T - 65 TCP counters only available in plot format
673              X - Interconnect
674              Y - Slabs (system object caches)
675              Z - Processes
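
              The relationship between summary and detail data described for
              disks can be illustrated with a small sketch.  The disk names
              and values are hypothetical.

```python
# Hypothetical per-disk KB read during one interval.
disk_reads = {"sda": 120, "sdb": 30, "sdc": 0}


def summary(per_disk):
    """-sd style: a single total taken across all disks."""
    return sum(per_disk.values())


def detail(per_disk):
    """-sD style: one entry per disk and no overall total."""
    return sorted(per_disk.items())


print("summary:", summary(disk_reads))
for name, kb in detail(disk_reads):
    print("detail:", name, kb)
```

              Selecting both, as with -sdD, corresponds to reporting the
              output of both functions.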
676
677       --showheader
678              In collectl mode this command will cause the header that is nor‐
679              mally written to a data file to be displayed on the terminal and
680              collectl then exits.  This can be a handy way to  get  a  brief
681              overview of the system configuration.
682
683       --showoptions
684              This  command  shows  only  the  portion  of  the help text that
685              describes the -o and --options switches to save the time of
686              wading through the entire help screen.
687
688       --showcolheaders
689              This command shows the first set of headers that will be printed
690              by collectl and exits.  Doesn't really make sense for multi-sec‐
691              tion  output  like several sets of verbose or detail data.  Also
692              note that since it requires one monitoring interval to build  up
693              some  headers  which may be dynamic, it also forces the interval
694              to 0.
695
696       --showsubopts
697              List all the subsystem specific options.
698
699       --showtopopts
700              Show all the different values for the --top  type  field,  which
701              specify the field(s) by which to sort the data.
702
703       --showrootslabs
704              This  command only works on systems using the new slab allocator
705              and will  list  the  root  name  (these  are  those  entries  in
706              /sys/slab  which  are  not  soft links) along with all its alias
707              names.  If a name doesn't have an alias, it will not  appear  in
708              this report.
709
710       --showslabaliases
711              This command only works on systems using the new slab allocator.
712              Like --showrootslabs, it will name a slab and  all  its  aliases
713              but  rather than show the root slab name it will show one of the
714              aliases to provide a more meaningful name.   If  there  are  any
715              slabs  that  only  have  a single (or no) alias they will not be
716              included in this report.
717
718       --showsubopts
719              Similar to --showoptions, this command summarizes just the
720              parameters associated with -O and --subopts.
721
722       --showsubsys
723              Yet another way to summarize a portion of the help text, this
724              command only shows valid subsystems.
725
726       --top [type][,num[,v]]
727              Include the top "num" consumers by resource for  this  interval.
728              The  default  number  is  the  height of the window if it can be
729              determined otherwise 24, and the default resource is  the  total
730              cpu  time  which  is  taken  as  the  sum of SysT and UsrT.  See
731              --showtopopts for a list of other types of data you can sort on.
732
733              This switch can also be used with -s in which case a portion  of
734              the window is reserved at the top to fill in the subsystem data,
735              which is currently in verbose mode though a brief format is con‐
736              templated for some time in the future.
737
738              In interactive mode and if not specified, the process monitoring
739              interval will be set to that for other subsystems.   The  screen
740              will be cleared for each interval resulting in a display similar
741              to the "top" utility.  In playback mode the screen will  NOT  be
742              cleared.  You cannot use this switch in "record" mode.
743
744              Finally,  if  v  is  specified  as the 3rd parameter, the output
745              scrolls vertically (like playback mode) rather than clearing the
746              screen between intervals.
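
              As a rough sketch of how the "[type][,num[,v]]" argument breaks
              down, assuming nothing beyond the synopsis and defaults given
              above.  The helper is hypothetical and "xyz" is a placeholder,
              not a real sort type; see --showtopopts for the real ones.

```python
def parse_top(arg, window_height=None):
    """Parse "[type][,num[,v]]" into (type, num, vertical)."""
    parts = arg.split(",")
    sort_type = parts[0] or None  # None stands for the default sort field
    # Default count: window height if known, otherwise 24.
    default_num = window_height if window_height else 24
    num = int(parts[1]) if len(parts) > 1 and parts[1] else default_num
    vertical = len(parts) > 2 and parts[2] == "v"
    return sort_type, num, vertical


print(parse_top(""))
print(parse_top(",10"))
print(parse_top("xyz,5,v"))  # "xyz" is a placeholder type name
```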
747
748       --umask mask
749              Sets  collectl's umask to control output file permissions.  Only
750              root can set the umask.  See "man umask" for details.
751
752       --utime mask
753              Write periodic  micro-timestamps  into  raw  file  at  different
754              points in time for fine grained measurements of operation times.
755              1 - write timestamps when entering major sections
756              2  -  write timestamps for all /proc accesses except for process
757              data
758              4 - write timestamps for /proc data for all processes  including
759              threads
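
              Since the argument is called a mask and the values are 1, 2
              and 4, it is assumed here that they are bit flags that can be
              added together.  Under that assumption, a mask decodes as in
              this sketch:

```python
# Labels paraphrase the list above; combining values is an assumption.
UTIME_FLAGS = {
    1: "major sections",
    2: "/proc accesses except process data",
    4: "/proc data for all processes including threads",
}


def decode_utime(mask):
    """List the timestamp categories enabled by a --utime mask."""
    return [label for bit, label in UTIME_FLAGS.items() if mask & bit]


print(decode_utime(3))
```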
760
761       -v
762              Show  version  and  whether  or not Compression and/or HiResTime
763              modules have been installed and exit.
764
765       -V
766              Show default parameter and control settings, all of which can be
767              changed in /etc/collectl.conf
768
769       --verbose
770              Display  output  in verbose mode.  This often displays more data
771              than in the default mode.  When displaying detail data,  verbose
772              mode  is forced.  Furthermore, if summary data for a single sub‐
773              system is to be displayed in verbose mode, the headers are  only
774              repeated   occasionally   whereas  if  multiple  subsystems  are
775              involved each needs their own header.
776
777       -w
778              Display data in wide mode.  When displaying data on the terminal,
779              some  data  is formatted followed by a K, M or G as appropriate.
780              Selecting this switch will cause the full field to be displayed.
781              Note  that  there  is  no  attempt to align data with the column
782              headings in this mode.
783
784

SUBSYSTEM OPTIONS

786       The following options are subsystem specific and typically filter  data
787       for collection and/or display as well as affect the output format:
788
789       --cpufilt [^]perl-regx[,perl-regx...]
790              Works the same as --dskfilt and --netfilt, allowing one to
791              select a subset of CPUs.  These filters are also honored by
792              interrupt reporting.
793
794       --cpuopts
795              z  - only applies to cpu details, do not report any CPUs with no
796              load.  In other words all entries are zero except for IDLE.
797
798       --dskfilt [^]perl-regx[,perl-regx...]
799              NOTE - this does NOT affect data collection and  ALL  disk  data
800              will  always be collected, unless --rawdskfilt is specified too.
801              However, only data for disk names that match the pattern(s) will
802              be included in the summary totals and displayed when details are
803              requested.  Alternatively, if you preface the  first  expression
804              with  a caret, all names that match all strings will be excluded
805              from  the  summary  totals  and  detail  displays  rather   than
806              included.  If you don't know perl, a partial string will usually
807              work too.
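
              The include/exclude behavior can be sketched as below.  This
              is an approximation using Python regular expressions in place
              of perl ones; the helper and disk names are hypothetical, and
              a name is assumed to match if ANY of the patterns matches it.

```python
import re


def select_names(names, filt):
    """Apply a "[^]regx[,regx...]" filter to a list of device names."""
    exclude = filt.startswith("^")
    body = filt[1:] if exclude else filt
    patterns = [re.compile(p) for p in body.split(",")]
    # Keep matching names normally, non-matching names after a leading caret.
    return [n for n in names if any(p.search(n) for p in patterns) != exclude]


disks = ["sda", "sdb", "dm-0", "dm-1"]  # hypothetical disk names
print(select_names(disks, "sd"))        # a partial string is enough
print(select_names(disks, "^dm,sdb"))   # leading caret excludes matches
```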
808
809              Just remember, this only applies to collected data and so if for
810              example  you  specify a partition, such as sda1, you'll never see
811              the data since it was filtered out at the time of  data  collec‐
812              tion.   To  see  those  stats you would need to say --rawdskfilt
813              sda1.
814
815       --dskopts
816              f - report some columns  as  fractions  for  more  precision  on
817              detail output
818              i - display the i/o sizes in brief mode just like with --iosize
819              o - exclude unused disks from new file headers and plot data
820              z  -  only applies to disk details, do not report any lines with
821              values of all zeros.
822
823       --dskremap aaa:bbb,ccc:ddd...
824              This will cause disk names matching the perl pattern aaa  to  be
825              replaced  with  the  string  bbb.  In some cases, you may simply
826              want to remove the entire string in which case the second string
827              should  be left empty.  If you want to remove a string containing
828              a /, be sure to escape it with a backslash.
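
              A hedged sketch of the renaming, using Python's re module as a
              stand-in for perl patterns.  The helper and device names are
              hypothetical, and replacing every occurrence of a match is an
              assumption.

```python
import re


def remap(name, spec):
    """Apply "aaa:bbb,ccc:ddd..." replacements to a disk name."""
    for pair in spec.split(","):
        pattern, _, replacement = pair.partition(":")
        name = re.sub(pattern, replacement, name)
    return name


# Hypothetical device names, for illustration only.
print(remap("cciss/c0d0", r"cciss\/:"))  # empty second string removes the match
print(remap("sda", "sd:disk"))
```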
829
830       --envopts Environmental Options
831              The default is to display ALL data but the following will  cause
832              a subset to be displayed
833
834              f - display fan data
835              p - display current (power) data
836              t - display temperature data
837              C - convert temperature to Celsius if in Fahrenheit
838              F - convert temperature to Fahrenheit if in Celsius
839              M - display each type of data on separate line
840              T - display data truncated to whole integers (some
841              implementations display them with fractional components)
842              9 - any number, tells ipmitool to read from that device number
843
844       --envfilt regx
845              If specified, this regx is evaluated against each line of data
846              returned by ipmitool and only those that match are retained.
              All other data is lost.
847
848       --envremap perl-regx,...
849              If specified as a comma separated list of perl regular substitu‐
850              tion  expressions  without  the  =~s portion, each expression is
851              applied to each environmental field name, thereby  allowing  one
852              to rename the column headers.  This can be most useful when
853              running on heterogeneous systems and you want consistent column
854              names.
855
856       --intfilt [^]perl-regx[,perl-regx...]
857              NOTE - this does NOT affect data collection; ALL interrupt  data
858              will always be collected.  However,  only  data  for  interrupts
859              that match the pattern(s) will be included in the summary totals
860              and displayed when details are requested.  Alternatively, if you
861              preface  the first expression with a caret, all names that match
862              all strings will be excluded from the summary totals and  detail
863              displays rather than included.  If you don't know perl, a
864              partial string will usually work too.
865
866              NOTE - these expressions are applied to the entire line one sees
867              in  /proc/interrupts,  including  the interrupt number, name and
868              even counters so if you do want to include an  interrupt  number
869              in the pattern be sure to include the trailing colon as well.
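
              The reason for the trailing colon can be seen in a small
              sketch.  The sample lines only imitate the style of
              /proc/interrupts and are hypothetical.

```python
import re

# Hypothetical lines in the style of /proc/interrupts.
lines = [
    " 19:       1900          0   IO-APIC   19-fasteoi   eth0",
    " 24:         19          0   PCI-MSI   ahci",
]


def matching(pattern):
    """Return the interrupt numbers of the lines the pattern matches."""
    return [ln.split(":")[0].strip() for ln in lines if re.search(pattern, ln)]


print(matching("19"))   # also matches the counter on the second line
print(matching("19:"))  # the trailing colon pins it to the interrupt number
```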
870
871       --lustopts Lustre Options
872              B - For clients and servers, show buffer stats
873              D  -  For  MDSs  and OSTs AND running earlier versions of HPSFS,
874              collect disk block iostats
875              M - For clients, collect metadata
876              O - For OSTs, show detail level stats
877              R - For client, collect readahead stats
878
879       --memopts Memory Options
880              R - show memory values (including swap space) as rates of change
881              as  opposed  to  absolute  values.   One  can also show absolute
882              changes between intervals by including -on.
883
884       --netfilt [^]perl-regx[,perl-regx...]
885              NOTE - this does NOT affect data collection and ALL network data
886              will  always be collected, unless --rawnetfilt is specified too.
887              Also note that by default only eth, ib, em and p1p networks when
888              present are included in the summary.  When this switch is speci‐
889              fied, only data for network names that match the pattern(s) will
890              be  included  in  the  summary  and  displayed  when details are
891              requested.  This switch therefore also gives you the ability  to
892              add other, possibly new, network devices to the summary totals.
893
894              Alternatively, if you preface the first expression with a caret,
895              all names that match all strings will be excluded from the
896              summary totals and detail displays rather than included.  If you
897              don't know perl, a partial string will usually work too.
898
899       --netopts
900              e - include network error counts in  brief  and  explicit  error
901              types elsewhere
902              E - only include lines with network errors in them
903              i - include i/o sizes in brief mode
904              o - exclude unused networks from new file headers and plot data
905              w - set width of network device name
906
907       --nfsfilt NFS Filters
908              Specify one or more comma separated filters as a C/S followed by
909              an nfs version number and only those will have data reported on.
910              For  example,  C2  says to report data on V2 Clients.  As a data
911              collection performance optimization, if one or more client  fil‐
912              ters  are  specified,  data  will  actually be collected for all
913              clients as is also done for servers.
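
              The filter syntax can be read as a list of role and version
              pairs.  A minimal parsing sketch, with a hypothetical helper
              name:

```python
def parse_nfsfilt(spec):
    """Parse filters such as "C2,S3" into (role, version) pairs."""
    roles = {"C": "client", "S": "server"}
    return [(roles[item[0]], int(item[1:])) for item in spec.split(",")]


print(parse_nfsfilt("C2"))     # V2 clients, as in the example above
print(parse_nfsfilt("C3,S3"))
```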
914
915       --nfsopts NFS Options
916              z - only display detail lines which have data
917
918       --procfilt Process Filters
919              These filters restrict which processes are selected for
920              collection/display.  Using this filter will significantly reduce the
921              load  on process data collection since collectl creates a black‐
922              list of those existing processes that do not pass the filter and
923              so are permanently excluded from any future processing.
924
925              The format of a filter is a one character type followed by a match
926              string.  Multiple filters may be specified if separated by  com‐
927              mas.
928
929              c  -  substring of the command being executed as explicitly read
930              from /proc/pid/stat.  Note that this  can  actually  be  a  perl
931              expression,  so  if you want a command that ends in a particular
932              string all you need to do is append a $ to the end of the string.
933              Otherwise it would match any commands containing that string.
934              C - any command that starts with the specified string
935              f  - full path of the command, including arguments, as read from
936              /proc/pid/cmdline.  Like the c modifier this too can be  a  perl
937              expression.
938              p - pid
939              P - parent pid
940              u - any process owned by this user's UID or in the range
941              specified by uxxx-yyy
942              U - any process owned by this username
943
944              CAUTION: the process name collectl tries to match with c  and  C
945              is the second field in /proc/pid/stat, which may not necessarily
946              be what you think!  For example, the name for X emacs is emacs-x.
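
              The filter types can be sketched as follows.  This is an
              illustration under stated assumptions, not collectl's code:
              Python regular expressions stand in for perl ones, a process
              is assumed to pass if ANY listed filter matches, and the Proc
              class and sample processes are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    ppid: int
    uid: int
    user: str
    command: str   # second field of /proc/pid/stat
    cmdline: str   # contents of /proc/pid/cmdline


def passes(proc, filters):
    """True if proc matches any filter in a comma separated list."""
    for item in filters.split(","):
        kind, value = item[0], item[1:]
        if kind == "c" and re.search(value, proc.command):
            return True
        if kind == "C" and proc.command.startswith(value):
            return True
        if kind == "f" and re.search(value, proc.cmdline):
            return True
        if kind == "p" and proc.pid == int(value):
            return True
        if kind == "P" and proc.ppid == int(value):
            return True
        if kind == "u":
            # Either a single UID or a range written as low-high.
            low, _, high = value.partition("-")
            if int(low) <= proc.uid <= int(high or low):
                return True
        if kind == "U" and proc.user == value:
            return True
    return False


# Hypothetical processes, for illustration only.
emacs = Proc(100, 1, 1000, "alice", "emacs", "/usr/bin/emacs notes.txt")
emacs_x = Proc(101, 1, 1000, "alice", "emacs-x", "/usr/bin/emacs-x")

# The trailing $ limits the match to commands that end in "emacs".
print(passes(emacs, "cemacs$"), passes(emacs_x, "cemacs$"))
print(passes(emacs_x, "Cemacs"), passes(emacs_x, "u500-999"))
```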
947
948       --procopts options
949              These options control the way data is  displayed  and  can  also
950              improve data collection  performance
951
952              c - include CPU time of children who have exited (same as ps -S)
953              f  -  use  cumulative  totals  for  page  faults in process data
954              instead of rates
955              i - show process I/O counters in display instead of default for‐
956              mat
957              I - disable collection of I/O counters, see note below
958              k  -  remove known shells from process names, making it possible
959              to see actual command
960              m - show breakdown of memory utilization instead of default for‐
961              mat
962              p - never look for new pids or threads during data collection
963              r - show root command name only (no directory) for narrower dis‐
964              play. Note that this is applied AFTER 'k' so if arg1 becomes the
965              new  command  it will be truncated now, which is very handy when
966              running in a virtual python environment
967              R - show ALL process priorities  ('RT'  currently  displayed  if
968              realtime)
969              s - show process start time in hh:mm:ss format
970              S - show process start time in mmmdd-hh:mm:ss format
971              t - include ALL process threads (increases collection overhead)
972              u  -  report  username as 12 chars instead of 8, noting uxx will
973              cause column width to be xx but cannot be less than 8
974              w - widen display  by  including  whole  argument  string,  with
975              optional max width
976              x - include extended process attributes (currently only for con‐
977              text switches)
978              z - exclude any processes with 0 in sort field (in --top mode)
979
980              Process data is the most expensive type of data collected, cost‐
981              ing  as  much as 3 times the CPU load as all other types of data
982              combined.  Collecting thread data makes this  even  more  expen‐
983              sive.  One can significantly reduce this load by over 25 percent
984              by disabling the collection of I/O stats.  However, keep in mind
985              that  even if you don't try to optimize process data collection,
986              the overall system load by collectl can still be on the order of
987              about  0.2%  when  running  as  a daemon with default collection
988              rates.  See the online documentation  on  measuring  performance
989              for more information.
990
991              A security hole was identified that allowed non-privileged
992              users to read /proc/pid/io and guess password lengths, and now
993              many distros restrict access to the owner or root.  As a result,
994              non-privileged users will see all 0 I/O counts for processes
995              that are not theirs when specifying --procopts i.
996
997       --slabfilt Slab Filters
998              One  can  specify  a  list of slab names separated by commas and
999              only those slabs whose names start with those  strings  will  be
1000              listed or summarized.
1001
1002       --slabopts Slab Options
1003              s - exclude any slabs with an allocation of 0
1004              S  -  only show those slabs whose allocations changed since last
1005              display
1006
1007       --tcpfilt
1008              These filters actually control both what is collected as well as
1009              displayed.   If  one  selects  non-collected filters, 0s will be
1010              reported.  There is one special case and that is if one includes
1011              T  (tcp extended stats) in the filter string, there are no brief
1012              ones and therefore --verbose will be forced.
1013              i - ip stats
1014              t - tcp stats
1015              u - udp stats
1016              c - icmp stats
1017              I - ip extended stats
1018              T - tcp extended stats
1019
1020       --xopts
1021              i - include i/o sizes in brief mode
1022
1023

DESCRIPTION

1025       The collectl utility is a system monitoring tool that records  or  dis‐
1026       plays  specific  operating  system data for one or more sets of subsys‐
1027       tems. Any set of the subsystems, such as CPU, Disks, Memory or  Sockets
1028       can  be  included in or excluded from data collection.  Data can either
1029       be displayed back to the terminal, or stored in either a compressed  or
1030       uncompressed  data file. The data files themselves can either be in raw
1031       format (essentially a direct copy from the associated /proc structures)
1032       or  in  a  space  separated plottable format such that it can be easily
1033       plotted using tools such as gnuplot or Excel.  Data files can  be  read
1034       and  manipulated  from  the  command  line,  or  through use of command
1035       scripts.
1036
1037       Upon startup, collectl.conf is read, which sets  a  number  of  default
1038       parameters and switch values.  Collectl searches for this file first in
1039       /etc, then in the directory the collectl executable lives in (typically
1040       /usr/sbin)  and  finally the current directory.  These locations can be
1041       overridden with the -C switch.  Unless you're doing something really
1042       special, this file need never be touched.  The only exception is perhaps
1043       when choosing to run collectl as a service and you wish to change
1044       its default behavior, which is set by the DaemonCommand entry.
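
       The search order described above can be sketched as follows.  The
       helper name and the injectable exists check are assumptions made so
       the example can run without touching a real system.

```python
import os


def find_conf(exe_dir, cwd, exists=os.path.exists):
    """Return the first collectl.conf found in the documented search order."""
    for directory in ("/etc", exe_dir, cwd):
        candidate = os.path.join(directory, "collectl.conf")
        if exists(candidate):
            return candidate
    return None


# Simulate a system where only the executable's directory has the file.
present = {"/usr/sbin/collectl.conf"}
print(find_conf("/usr/sbin", "/home/user", exists=present.__contains__))
```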
1045
1046

RESTRICTIONS/PROBLEMS

1048       Thread reporting currently only works with 2.6 kernels.
1049
1050       The  pagesize  has been hardcoded for perl 5.6 systems to 4096 for IA32
1051       and 16384 for all others.  If you are running 5.6 on a  system  with  a
1052       different  pagesize  you  will  see incorrect SLAB allocation sizes and
1053       will need to scale the numbers you're seeing accordingly.
1054
1055       I have recently discovered there is a bug in /proc  in  that  an  extra
1056       line  is  occasionally  read with the end of the previous buffer!  When
1057       this occurs a message is written (if -m enabled) and always written  to
1058       the  terminal.  Since this happens with a higher frequency with process
1059       data I silently ignore those as the output can get pretty  noisy.   If
1060       for any reason this is a problem, be sure to let me know.
1061
1062       Since  collectl  has  no  control over the frequency at which data gets
1063       written to /proc, one can get anomalous statistics as collectl is  only
1064       reporting  a  snapshot of what is being recorded.  For more information
1065       see http://collectl.sourceforge.net/TheMath.html.
1066
1067       At least one network  card  occasionally  generates  erroneous  network
1068       stats  and  to  try to keep the data rational, collectl tries to detect
1069       this and when it does generates a message  that  bogus  data  has  been
1070       detected.
1071
1072

FILES, EXAMPLES AND MORE INFORMATION

1074       http://collectl.sourceforge.net OR /opt/hp/collectl/docs
1075
1076

ACKNOWLEDGEMENTS

1078       I would like to thank Rob Urban for his creation of the Tru64 Unix col‐
1079       lect tool, which collectl is based on.
1080
1081

AUTHOR

1083       This program was written by Mark Seger (mjseger@gmail.com).
1084       Copyright 2003-2016 Hewlett-Packard Development Company, LP
1085       collectl may be copied only under the  terms  of  either  the  Artistic
1086       License  or  the  GNU General Public License, which may be found in the
1087       source kit
1088
1089
1090
1091LOCAL                             APRIL 2003                       COLLECTL(1)