rig(1) - f35

rig(1)                      General Commands Manual                     rig(1)

NAME

       rig - Monitor a system for events and trigger specific actions

USAGE

       rig <RESOURCE OR SUBCOMMAND> [OPTIONS] <ACTIONS> [ACTION OPTIONS]

DESCRIPTION

       rig is a tool to assist in troubleshooting seemingly randomly occurring
       events or events that occur at times that make active monitoring by a
       sysadmin difficult.

       rig sets up detached processes, known as 'rigs', that watch a given
       resource for a trigger condition and, once that trigger condition is
       met, take the actions defined by the user.

GLOBAL OPTIONS

       The following are options available to all rigs (resources).

       --delay DELAY
              Specify the number of seconds to wait after a rig is triggered
              before running the configured actions. Note that the rig will
              still trigger and stop all watcher threads immediately. The
              delay comes after thread termination but before action execution
              in order to avoid a possible race condition where multiple
              watcher threads could trigger during a sufficiently long delay.

              Default: 0 seconds, meaning execute actions immediately once the
              rig's trigger condition is met.

       --debug
              Set logging level to debug instead of the default info level.

       --foreground
              Run the rig in the foreground, keeping stdout attached.

       --interval SECONDS
              Specify the amount of time to wait between a rig's polling
              cycles. Most rigs monitor their resources in a flow of update ->
              compare -> wait, where wait is simply sleeping until the next
              needed update. Use this option to set how long a rig should
              sleep before updating its monitors again.

              Default: 1, meaning update and compare once every second.

       --name NAME
              Give the rig a name, rather than generating a random one at
              deployment.

              By default, rigs are given a randomly generated string as a
              name, which will appear in rig info output and in the rig's
              socket name. Using this option will use the provided name
              instead, and may be useful in distinguishing rigs when several
              are deployed at one time.

       --no-archive
              Do not create a tar archive of the collected data after a rig
              has been triggered.

              Normally, once all data has been collected, rig will create a
              gzip'd tar archive under /var/tmp containing all the files
              created by the rig's actions, after which the temp directory at
              /var/tmp/rig/<id>/ is deleted.

              Using this option skips creating the archive and preserves the
              temp directory.

       --repeat COUNT
              Repeat certain actions COUNT number of times after the initial
              execution of the action.

              Unless otherwise specified by this option, actions execute only
              once. With this option, actions that support repetition will be
              repeated an additional COUNT number of times. For example, using
              --repeat 2 will result in repeatable actions being executed
              three (3) total times.

              Not every action supports repetition; in fact most do not. See
              each action's section for information on whether it can be
              repeated.

       --repeat-delay SECONDS
              Number of seconds to wait between repeated executions of the
              same action.

              This can be useful when using an action like gcore when you want
              to get coredumps over a certain time period. For example, using
              --repeat 1 --repeat-delay 60 will give you two (2) coredumps
              taken one minute apart.

              Default: 1 second.
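
              The arithmetic behind --repeat and --repeat-delay can be
              sketched in Python. This is only an illustration of the
              documented behavior, not rig's own code: the function name
              execution_offsets is hypothetical, and the sketch assumes the
              time spent running the action itself can be ignored.

```python
def execution_offsets(repeat=0, repeat_delay=1):
    """Return the seconds after the trigger at which a repeatable action starts.

    Assumption: the run time of the action itself is ignored.
    """
    if repeat < 0:
        raise ValueError("repeat must be zero or greater")
    # One initial execution plus `repeat` additional executions.
    return [run * repeat_delay for run in range(repeat + 1)]


# --repeat 1 --repeat-delay 60: two executions, one minute apart.
print(execution_offsets(repeat=1, repeat_delay=60))
```

              With the defaults (no --repeat), the list holds a single
              entry, matching the rule that actions execute only once.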

       --restart COUNT
              Restart a configured rig up to COUNT number of times after being
              triggered.

              By default, a rig will trigger once and then terminate. Using
              this option, an individual rig may restart itself up to COUNT
              number of times, producing an additional archive of the
              requested data each time the triggering event happens again.

              Note that this is the number of times to restart, not the total
              number of times to run. Using a restart value of '2' means that
              up to 3 total archives will be generated for a rig.

              Default: 0, meaning terminate after the first trigger event.
              Use a value of '-1' to have a rig perpetually restart itself
              without limit.

SUBCOMMANDS

       rig list
              Show a list of known existing rigs and their status. Status
              information is obtained by querying the socket created for that
              particular rig.

       rig destroy -i [ID or 'all']
              Destroy a deployed rig with id ID. If ID is 'all', destroy all
              known rigs. Note that if another entity kills the pid for the
              running rig, destroy will fail as the socket is no longer
              connected to the (now killed) process. In this case use the
              --force option to clean up the lingering socket.

       rig info -i [ID]
              Get detailed information on a rig. This information includes
              configuration options, the entire cmdline string given to launch
              the rig, and information on each action the rig is configured
              to take and what the expected results of those actions are.

              Currently, this data is written to stdout in JSON format.

       rig trigger -i [ID]
              Manually trigger the rig with id ID. This will cause the
              specified rig to begin executing the actions configured for it,
              as if the trigger condition had been met.

              Note that this is only effective on a single rig basis, so using
              a value of 'all' for the ID will not work.

RESOURCES

       These are the system resources that rig can monitor. There may be
       additional manpages for specific resources. Where applicable this will
       be noted below.

       Note that 'resources', 'monitors', and references to 'a rig' as a
       distinct entity all refer to the same thing.

       When a rig is created successfully, its ID is printed to the console.

       logs   Watch one or more log files and/or journald units for a
              specified message. When that message is matched in any watched
              file or journal, the trigger condition is met and configured
              actions are initiated.

              The following options are available for the logs rig:

              -m|--message STRING
                     Define the string that serves as the trigger condition
                     for the rig. This can be a regex string or an exact
                     message. Be very careful in using the '*' regex character
                     as this may cause unintended behavior such as the rig
                     immediately triggering on the first message seen.

                     Note that a small amount of transformation and testing is
                     done on the provided STRING. First, '*' characters are
                     converted to the python-style regex match of '.*'. Then
                     rig tests whether the provided message will regex-match
                     itself, and if that test fails rig aborts the creation
                     process.

                     Aside from the conversion noted above, regexes provided
                     in this option must be python-style and not shell-style.
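
                     The conversion and self-match test described for
                     --message can be approximated with Python's re module.
                     This is a sketch under stated assumptions and not rig's
                     actual code: the function name prepare_message is
                     hypothetical, every '*' is replaced naively, and
                     re.search is assumed as the matching call.

```python
import re


def prepare_message(message):
    """Convert '*' to '.*' and check that the result matches the original text."""
    # Naive conversion: every '*' becomes '.*', including one that already
    # follows a '.', which may or may not be how rig treats that case.
    pattern = message.replace("*", ".*")
    try:
        compiled = re.compile(pattern)
    except re.error as err:
        raise ValueError(f"invalid regex {pattern!r}: {err}") from err
    # Self-match test: the pattern must match the message it was built from.
    if compiled.search(message) is None:
        raise ValueError(f"{pattern!r} does not match its own message")
    return pattern


print(prepare_message("link * is down"))
```

                     Under these assumptions, a message such as 'a+b' fails
                     the self-match test because '+' acts as a regex
                     quantifier, and a bare '*' becomes '.*', which matches
                     every line.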

              --logfile FILE
                     A comma-delimited list of files to watch. Each FILE
                     specified will be monitored from the current end of the
                     file, so old entries will not set off the rig's actions.

                     Default: /var/log/messages

              --no-files
                     Do not monitor any log files.

              --journal UNIT
                     A comma-delimited list of journal units to watch. The
                     journal is watched as a singular entity, and will be
                     filtered to only read from the provided UNIT(s). If no
                     UNIT is specified, the whole system journal will be
                     monitored.

                     Default: 'system'

              --no-journal
                     Do not monitor the journal.

              --count COUNT
                     The number of times the --message string should be
                     matched before the rig is triggered.

                     Default: 1 (trigger on the first occurrence)

       ping   Perform a simple ongoing ping test against a specified host.
              Pings are sent one at a time at a defined interval, and the
              response is evaluated. Ping-type rigs may monitor for the number
              of lost packets and/or packets exceeding a specified RTT in
              milliseconds.

              Packets are first evaluated for loss (including timeouts), then
              for RTT.

              The following options are available for the ping rig:

              --host ADDRESS
                     The target IP or hostname to ping. This option is
                     required in order for a ping rig to be created.

                     During rig creation, a 'sanity check' ping is sent to the
                     ADDRESS to ensure that it is reachable on the network and
                     that it will respond to ICMP packets. If this sanity
                     check fails, rig creation is aborted.

              --ping-timeout SECONDS
                     Specify the number of SECONDS to allow for a ping
                     response. If a ping encounters a timeout, then it is
                     considered both a lost packet and a packet exceeding the
                     RTT threshold (see --ping-ms-max and --ping-ms-count).

              --lost-count PACKETS
                     Specify the number of PACKETS to accept being lost or
                     timed-out, before triggering the rig.

                     Default: 1 (trigger on the first lost packet)

              --ping-interval SECONDS
                     Specify the number of SECONDS to wait between ping
                     requests sent to the target host.

                     Default: 1

              --ping-ms-max MILLISECONDS
                     Specify the RTT threshold to allow for a returned ping
                     request. If the RTT reported by the ping command is above
                     this value in milliseconds, it is counted against the
                     threshold of packets exceeding this value specified by
                     --ping-ms-count.

                     By default, this form of checking is disabled. Any
                     integer value passed to this option will enable RTT
                     monitoring.

              --ping-ms-count PACKETS
                     Specify the number of PACKETS that may exceed the defined
                     --ping-ms-max RTT value before triggering the rig.

                     Default: 5
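
              The way the ping options interact can be sketched in Python.
              This illustrates the rules described above and is not rig's
              implementation: the function name first_trigger, the use of
              None to represent a lost or timed-out packet, and the '>='
              comparisons against the two counts are all assumptions.

```python
def first_trigger(samples, lost_count=1, ms_max=None, ms_count=5):
    """Return (index, reason) for the sample that triggers, or None.

    `samples` holds one RTT in milliseconds per ping, or None when the
    packet was lost or timed out.
    """
    lost = 0
    slow = 0
    for index, rtt in enumerate(samples):
        if rtt is None:
            lost += 1
            # A timeout also counts against the RTT threshold when enabled.
            if ms_max is not None:
                slow += 1
        elif ms_max is not None and rtt > ms_max:
            slow += 1
        # Loss is evaluated first, then RTT.
        if lost >= lost_count:
            return index, "lost"
        if ms_max is not None and slow >= ms_count:
            return index, "rtt"
    return None


print(first_trigger([12.0, 250.0, 300.0], lost_count=3, ms_max=200, ms_count=2))
```

              Leaving ms_max unset mirrors the documented default, where
              RTT checking is disabled and only lost packets are counted.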

       process
              Watch a single process or list of processes for state changes or
              resource consumption thresholds. When the process enters the
              specified state or the specified resource consumption threshold
              is met, the trigger condition is met.

              The following options are available for the process rig:

              --proc A PID or process name of processes to watch. If a process
                     name is specified, then rig will attempt to convert this
                     to a PID during rig creation. If multiple PIDs are found,
                     the default behavior is to fail creation and exit. To
                     have rig monitor all processes found for a process name,
                     use the --all option.

              --state STATE
                     The state that a process needs to be in, in order to
                     trigger the rig. The following is a list of supported
                     states:

                     NAME         DESCRIPTION                   SHORTHAND
                     dead         Dead - should never be seen   'X'
                     disk-sleep   Uninterruptible sleep         'D' or 'UN'
                     running      Currently running             'R' or 'run'
                     sleeping     Interruptible sleep           'S' or 'sleep'
                     stopped      Stopped                       'T' or 'stop'
                     zombie       Exited, still in proc table   'Z' or 'zomb'

                     Users can use either the full state name or the shorthand
                     noted in the final column of the table above. Both the
                     names and the shorthand values are case sensitive.

                     This can also be set to a "not" value by preceding one of
                     the above state strings with an exclamation mark (!),
                     e.g. '!sleeping' will match any non-sleep (S) state for
                     the process(es). Most shells will require you to quote
                     the state string when using the '!' character.

                     Note that using '!running' will cause rig to not trigger
                     against a state of 'sleeping', as generally speaking
                     'running' processes spend much of their time in S state,
                     and it is assumed that triggering against such a process
                     is not desired.

                     Process status is polled once every second.
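
                     The matching rules for --state, including the "not"
                     form, can be sketched in Python. This is an illustration
                     and not rig's own code: the function name state_matches
                     is hypothetical, and the sketch assumes any shorthand
                     value has already been expanded to its full state name.

```python
def state_matches(current, wanted):
    """Return True when `current` satisfies the --state value `wanted`."""
    if wanted.startswith("!"):
        excluded = wanted[1:]
        # Documented special case: '!running' does not fire on 'sleeping'.
        if excluded == "running" and current == "sleeping":
            return False
        return current != excluded
    return current == wanted


print(state_matches("zombie", "!sleeping"))
```

                     Comparisons are plain string equality, which keeps the
                     sketch case sensitive like the option itself.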

              --rss INTEGER
                     The amount of rss (resident set size) memory usage to use
                     as a threshold for triggering the rig. If the process'
                     RSS usage goes above this value, trigger.

                     The value provided here may be suffixed with K, M, or G
                     to denote the IEC unit. Rig will convert the provided
                     value and suffix into a value in bytes.
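
                     The suffix conversion can be sketched in Python using
                     IEC multiples of 1024. The function name to_bytes is
                     hypothetical, and two behaviors are assumptions and not
                     documented: a value without a suffix is treated as a
                     byte count, and lowercase suffixes are accepted.

```python
IEC_MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(value):
    """Convert a string such as '512M' into a number of bytes."""
    text = value.strip()
    if not text:
        raise ValueError("empty size value")
    # Assumption: lowercase suffixes are accepted as well.
    suffix = text[-1].upper()
    if suffix in IEC_MULTIPLIERS:
        return int(text[:-1]) * IEC_MULTIPLIERS[suffix]
    # Assumption: a bare integer is already a byte count.
    return int(text)


print(to_bytes("512M"))
```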

              --vms INTEGER
                     The same as --rss but monitoring Virtual Memory Size
                     instead.

              --memperc PERCENT
                     The percentage of total system memory a process is
                     consuming to use as a threshold for triggering the rig.
                     If the process' %mem meets or exceeds this value, trigger.

                     PERCENT may be a whole integer or a float. When using a
                     float, the process rig respects up to two (2) decimal
                     places of precision. For example, using '--memperc 10.25'
                     is the same as using '--memperc 10.25678'.
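
                     The example above suggests that extra digits are dropped
                     and not rounded, since 10.25678 would otherwise round to
                     10.26. A minimal Python sketch of that reading follows.
                     Truncation is an inference from the example, and the
                     function name truncate_percent is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_percent(text, places=2):
    """Drop digits beyond `places` decimal places without rounding up."""
    step = Decimal(10) ** -places  # places=2 gives Decimal('0.01')
    return Decimal(text).quantize(step, rounding=ROUND_DOWN)


print(truncate_percent("10.25678"))
```

                     Decimal is used here so that values read from the
                     command line are not disturbed by binary floating point
                     representation.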

              --cpuperc PERCENT
                     The percentage of CPU usage a process is consuming to use
                     as a threshold for triggering the rig. If the process'
                     %cpu meets or exceeds this value, trigger.

                     PERCENT may be a whole integer or a float. When using a
                     float and monitoring for CPU usage, rig respects one (1)
                     decimal place of precision due to how CPU usage is
                     reported.

                     PERCENT may be above 100, as CPU usage can exceed 100
                     when a process is running on multiple CPUs.

       system
              Watch the system's utilization of resources as a whole, e.g.
              total CPU or memory usage. When the utilization of a given
              resource either exceeds or falls below the given threshold
              (determined as appropriate for each resource), the trigger
              condition is met.

              The following options are available for the system rig:

              --iowait PERCENT
                     The amount of %iowait as reported by the kernel to use as
                     a threshold value.

                     If exceeded, trigger the rig.

              --steal PERCENT
                     The amount of %steal as reported by the kernel to use as
                     a threshold value.

                     If exceeded, trigger the rig.

              --nice PERCENT
                     The amount of %nice as reported by the kernel to use as a
                     threshold value.

                     If exceeded, trigger the rig.

              --guest PERCENT
                     The amount of %guest as reported by the kernel to use as
                     a threshold value.

                     If exceeded, trigger the rig.

              --user The amount of %user as reported by the kernel to use as a
                     threshold value.

                     If exceeded, trigger the rig.

              --available INTEGER
                     The amount of available memory in MiB as reported by the
                     kernel to use as a threshold value.

                     If the amount of available memory falls below this
                     threshold, trigger the rig.

              --free INTEGER
                     The amount of free memory in MiB as reported by the
                     kernel to use as a threshold value.

                     If the amount of free memory falls below this threshold,
                     trigger the rig.

              --used INTEGER
                     The amount of used memory in MiB as reported by the
                     kernel to use as a threshold value.

                     If the amount of used memory exceeds this threshold,
                     trigger the rig.

              --slab INTEGER
                     The amount of slab memory in MiB as reported by the
                     kernel to use as a threshold value.

                     If the amount of slab memory exceeds this threshold,
                     trigger the rig.

              --cpuperc PERCENT
                     The amount of total CPU usage as reported by the kernel
                     as a percentage to use as a threshold value.

                     If exceeded, trigger the rig.

                     This value may be a whole integer or a float. Floats are
                     precise out to one (1) decimal place.

              --memperc PERCENT
                     The amount of total memory usage as reported by the
                     kernel as a percentage to use as a threshold value.

                     If exceeded, trigger the rig.

                     This value may be a whole integer or a float. Floats are
                     precise out to one (1) decimal place.

              --loadavg FLOAT
                     System load average as reported by the OS to use as a
                     threshold value. If the reported loadavg exceeds this
                     value, trigger the rig. This option can accept either an
                     integer (1) or a float (1.0).

                     Linux returns loadavg data for the past 1, 5, and 15
                     minutes. The system rig will monitor only one (1) of
                     these intervals at a time, as controlled by the
                     --loadavg-interval option.

              --loadavg-interval [1, 5, 15]
                     Which time interval the rig should monitor when watching
                     the system's loadavg. Only 1, 5, and 15 are accepted
                     values for this option, as that is what the Linux kernel
                     returns loadavg data for.

                     Default: 1
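
                     How --loadavg and --loadavg-interval fit together can be
                     sketched with Python's os.getloadavg(), which returns
                     the 1, 5, and 15 minute averages as a tuple on Unix-like
                     systems. The function name loadavg_exceeded and its
                     sample parameter are hypothetical, and this is not rig's
                     own code.

```python
import os

# Position of each supported interval in the (1, 5, 15) minute tuple.
LOADAVG_INDEX = {1: 0, 5: 1, 15: 2}


def loadavg_exceeded(threshold, interval=1, sample=None):
    """Return True when the selected load average is above `threshold`."""
    if interval not in LOADAVG_INDEX:
        raise ValueError("interval must be 1, 5, or 15")
    if sample is None:
        sample = os.getloadavg()  # (1 min, 5 min, 15 min)
    return sample[LOADAVG_INDEX[interval]] > threshold


print(loadavg_exceeded(1.0, interval=5, sample=(0.5, 1.2, 2.0)))
```

                     Passing a fixed sample makes the comparison easy to
                     check by hand; leaving it out reads the live values.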

              --temp INTEGER
                     The CPU temperature, in degrees Celsius, at which to
                     trigger the rig. The rig triggers when the temperature
                     meets or exceeds this value.

                     This option takes an integer value, though temperature
                     data is sensitive to a single decimal place, so a
                     temperature of 50.9 degrees will not trigger a rig that
                     sets this option to 51.

                     By default rig will monitor the first physical CPU
                     package installed on the system. This may be changed via
                     the --cpu-id option. Note that rig will only monitor
                     whole packages and not individual cores, and that the
                     package temperature reported is the highest temperature
                     reported for any core in that package.

              --cpu-id ID
                     If specified, monitor this physical CPU package. By
                     default, rig will monitor physical CPU package 0, meaning
                     the first physically installed CPU.

                     When specifying an ID here, remember that in Linux CPU
                     IDs are zero-indexed, so the first CPU will be ID 0, the
                     second ID 1, and so forth.

                     Default: 0

       filesystem
              Watch a filesystem, directory, or file for utilization changes.
              Currently this rig is focused on space consumption, and will
              trigger when the specified path or backing filesystem exceeds
              the defined threshold for space utilization.

              The following options are available for the filesystem rig:

              --path PATH
                     Specify the filesystem, directory, or file path for the
                     rig to monitor. The location provided must exist when
                     the rig initializes for monitoring to be supported.

              --size SIZE
                     Specify the size threshold to trigger on for the provided
                     --path. The size given must be an integer suffixed with
                     either K, M, G, or T. The provided value will be
                     converted to bytes.

              --fs-size SIZE
                     Use this option instead of --size if you want to monitor
                     the space usage of the backing filesystem for --path
                     rather than the size of the path alone.

                     Similar to --size, this value must be suffixed with
                     either K, M, G, or T.

              --fs-used PERCENT
                     Similar to --fs-size, but instead provide a percentage
                     value to trigger on when the filesystem's %used exceeds
                     this value.

                     Note that using this option is ultimately the same as
                     --fs-size, as rig will convert the specified percentage
                     into a raw bytes value to use for comparisons.
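
                     The percentage-to-bytes conversion mentioned above can be
                     sketched in Python with shutil.disk_usage(). The
                     function names percent_to_bytes and fs_used_exceeded are
                     hypothetical, and the exact rounding rig applies is not
                     documented, so truncation to an integer is an assumption.

```python
import shutil


def percent_to_bytes(total_bytes, percent):
    """Convert a %used threshold into an absolute byte count."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Assumption: fractional bytes are truncated.
    return int(total_bytes * percent / 100)


def fs_used_exceeded(path, percent):
    """Return True when the filesystem backing `path` is above `percent` used."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.used > percent_to_bytes(usage.total, percent)


# An 80% threshold on a 200 GiB filesystem, expressed in bytes.
print(percent_to_bytes(200 * 1024 ** 3, 80))
```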

ACTIONS

       The following actions are supported responses to triggered rigs. These
       may be chained together on a single rig, so deploying multiple rigs
       with matching trigger conditions and single, varying actions is
       unnecessary.

       Actions are executed based on a priority weighting system, where lower
       values represent a higher priority action, and those actions with lower
       values are executed before those with higher values. This is to allow
       more time-sensitive actions to be taken before those that may either
       take a long time to execute or are otherwise unaffected by allowing
       other actions to run before them. Action priority values are set by the
       actions directly and cannot currently be modified by users.
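
       The ordering rule can be sketched in Python as a sort on the weight.
       The numeric weights below are hypothetical values chosen only to
       reproduce the order this section describes (gcore first, tcpdump next,
       sosreport after time-sensitive actions); rig's real values are not
       given in this manpage.

```python
# Hypothetical weights chosen only to reproduce the ordering the text describes.
ACTION_PRIORITY = {"gcore": 1, "tcpdump": 2, "sosreport": 100}


def execution_order(actions):
    """Return the requested actions sorted so lower weights run first."""
    return sorted(actions, key=lambda name: ACTION_PRIORITY[name])


print(execution_order(["sosreport", "tcpdump", "gcore"]))
```

       The order in which actions are named on the command line does not
       matter in this sketch; only the weight decides what runs first.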

       gcore  Collect a coredump of a given process or processes using GDB's
              gcore utility.

              Note that this does _not_ interrupt the running process(es).
              Cores are saved to /tmp and will be named either core.$pid or
              core.$proc_name.$pid depending on whether a PID or process name
              was provided. This action will be executed first when a rig is
              triggered and multiple actions are specified.

              This action supports repetition via the --repeat option.

              The gcore action supports the following options:

              --gcore PROCESS
                     Enables this action and takes either a PID or process
                     name as a value. If a process name is given, the PID is
                     determined at rig creation. If multiple PIDs are found
                     for the same process name, the default behavior is to
                     fail rig creation. Use the --all-pids option to instead
                     use all PIDs discovered for a process name.

                     This option can be specified multiple times. For example,
                     --gcore 12345 --gcore myprocess will generate a coredump
                     for PID 12345 and for a process matching the name
                     'myprocess'.

              --all-pids
                     Tells this action to collect a coredump for all PIDs
                     found for a provided process name.

              --freeze
                     Freeze the process(es) that will be core dumped by
                     sending a SIGSTOP prior to calling gcore on the
                     discovered pid(s).

                     If successful, then rig will send a SIGCONT after the
                     gcore execution has completed in order to thaw the
                     process.

       kdump  Generate a vmcore by triggering a kernel crash via sysrq.

              Note that this action WILL cause node disruption by triggering a
              kernel panic to generate the vmcore. This means your system will
              reboot when this action is triggered.

              The kdump action does not perform any configuration checks on
              the system's kdump installation. It is assumed that kdump has
              been properly configured and tested prior to using this action.

              The kdump action supports the following options:

              --kdump
                     Enables this action.

              --sysrq INTEGER
                     When the rig is deployed, if this option is set, rig will
                     set the system's /proc/sys/kernel/sysrq to the value
                     provided. See the sysrq kernel documentation for
                     information on what values are supported.

       sosreport
              Run a sosreport after the rig has been triggered. Select plugin
              enablement options as well as the --plugin-option from sosreport
              are supported by this action. This action should run after any
              time-sensitive actions otherwise specified by the user for a
              given rig.

              The sosreport action supports the following options:

              --sosreport
                     Enables this action.

              --enable-plugins PLUGINS
                     Force the specified comma-delimited list of PLUGINS to be
                     enabled.

              --plugin-option PLUGOPT
                     Modify a specific plugin's runtime options. This is
                     passed directly to sosreport as the same --plugin-option
                     value, which should take the form 'name.option=value'.
                     For example, to increase the podman plugin timeout use
                     '--plugin-option podman.timeout=600'.

                     If you need to pass multiple sosreport plugin options,
                     use a comma-delimited list here instead of specifying
                     this option multiple times.

              --skip-plugins PLUGINS
                     Do not run these specified plugins. Use a comma-delimited
                     list to skip multiple plugins.

              --only-plugins PLUGINS
                     Only enable these specific plugins, disable all others.
                     Use a comma-delimited list to specify multiple plugins.

       tcpdump
              Start collecting a tcpdump when the rig is initialized, and stop
              the collection when the rig triggers. This action will be
              triggered before most other actions, but after the gcore action.

              Note there will be a slight delay in configuring any rig that
              uses the tcpdump action as rig must verify that the tcpdump
              process started successfully during the initialization process.

              The tcpdump action supports the following options:

              --tcpdump
                     Enables this action.

              --iface INTERFACE
                     Starts the tcpdump to monitor the provided INTERFACE. In
                     almost all situations this should be set to a specific
                     interface on the system, however the value of 'any' is
                     accepted by the tcpdump command in order to listen on
                     all interfaces. Be wary of using this, however, as use of
                     'any' makes it impossible to determine which interface a
                     particular packet came in on in the resulting packet
                     capture.

                     Default: eth0

              --filter FILTER
                     Provide a filter to use with tcpdump in order to reduce
                     the amount of traffic recorded in the packet capture.
                     This value is passed directly to the tcpdump utility, and
                     thus can be any valid filter accepted by tcpdump.

                     For most shells you must quote the filter string for rig
                     to pass it correctly.

              --size SIZE
                     Limit the size of the packet capture file(s) to SIZE in
                     MB.

                     Default: 10

              --captures CAPTURES
                     Specify the number of packet capture files to keep. If
                     more than one (1), then tcpdump will rotate the packet
                     capture file when it reaches the --size value and keep
                     CAPTURES number of files.

                     For example, using a CAPTURES of 2 and a SIZE of 5, you
                     will have up to two 5 MB packet captures when the rig
                     terminates.

                     Default: 1 (packet capture file is replaced upon reaching
                     SIZE limit).

       noop
              Does nothing; this action runs a no-op. This is best used when
              you need to test a rig's configuration to make sure its trigger
              condition is set properly, e.g. a regex string for the logs
              rig's --message option.

              The noop action supports the following options:

              --noop Enables this action.

MAINTAINER

       Jake Hunsaker <jhunsake@redhat.com>

                                 January 2019                           rig(1)