strigger(1)                     Slurm Commands                    strigger(1)

NAME
       strigger - Used to set, get or clear Slurm trigger information.

SYNOPSIS
       strigger --set   [OPTIONS...]
       strigger --get   [OPTIONS...]
       strigger --clear [OPTIONS...]

DESCRIPTION
       strigger is used to set, get or clear Slurm trigger information.
       Triggers include events such as a node failing, a job reaching its
       time limit or a job terminating.  These events can cause actions
       such as the execution of an arbitrary script.  Typical uses include
       notifying system administrators of node failures and gracefully
       terminating a job when its time limit is approaching.  A hostlist
       expression for the nodelist or job ID is passed as an argument to
       the program.

       Trigger events are not processed instantly; instead, a check for
       trigger events is performed periodically (currently every 15
       seconds).  Any trigger events which occur within that interval are
       compared against the registered triggers at the end of the
       interval, and the trigger program is executed once for any event
       occurring in that interval.  The record of those events (e.g.
       nodes which went DOWN in the previous 15 seconds) is then cleared.
       The trigger program must set a new trigger before the end of the
       next interval to ensure that no trigger events are missed, OR the
       trigger must be created with an argument of "--flags=PERM".  If
       desired, multiple trigger programs can be set for the same event.
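
       For example, a node-down trigger can be made permanent with
       --flags=PERM so that it does not need to be re-registered after
       each event (the program path below is illustrative, not a script
       shipped with Slurm):

       # the program path below is only an illustrative placeholder
       $ strigger --set --node --down --flags=PERM \
                  --program=/usr/sbin/notify_node_down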

       IMPORTANT NOTE: This command can set triggers only if run by the
       user SlurmUser unless SlurmUser is configured as user root.  This
       is required for the slurmctld daemon to set the appropriate user
       and group IDs for the executed program.  Also note that the
       trigger program is executed on the same node that the slurmctld
       daemon runs on rather than on some allocated compute node.  To
       check the value of SlurmUser, run the command:

              scontrol show config | grep SlurmUser
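
       If SlurmUser is, for example, an account named "slurm" (an
       assumption; verify it with the command above), an administrator
       with sudo rights could register a trigger as that user with:

       # assumes SlurmUser=slurm; the program path is from the EXAMPLES below
       $ sudo -u slurm strigger --set --node --down \
                  --program=/usr/sbin/slurm_admin_notify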

OPTIONS
       -a, --primary_slurmctld_failure
              Trigger an event when the primary slurmctld fails.

       -A, --primary_slurmctld_resumed_operation
              Trigger an event when the primary slurmctld resumes
              operation after failure.

       -b, --primary_slurmctld_resumed_control
              Trigger an event when the primary slurmctld resumes
              control.

       -B, --backup_slurmctld_failure
              Trigger an event when the backup slurmctld fails.

       -c, --backup_slurmctld_resumed_operation
              Trigger an event when the backup slurmctld resumes
              operation after failure.

       -C, --backup_slurmctld_assumed_control
              Trigger an event when the backup slurmctld assumes control.

       --burst_buffer
              Trigger an event when a burst buffer error occurs.

       --clear
              Clear or delete a previously defined event trigger.  The
              --id, --jobid or --user option must be specified to
              identify the trigger(s) to be cleared.  Only user root or
              the trigger's creator can delete a trigger.

       -d, --down
              Trigger an event if the specified node goes into a DOWN
              state.

       -D, --drained
              Trigger an event if the specified node goes into a DRAINED
              state.

       -e, --primary_slurmctld_acct_buffer_full
              Trigger an event when the primary slurmctld accounting
              buffer is full.

       -F, --fail
              Trigger an event if the specified node goes into a FAILING
              state.

       -f, --fini
              Trigger an event when the specified job completes
              execution.

       --flags=type
              Associate flags with the trigger.  Multiple flags should be
              comma separated.  Valid flags include:

              PERM   Make the trigger permanent.  Do not purge it after
                     the event occurs.

       --front_end
              Trigger events based upon changes in state of front end
              nodes rather than compute nodes.  Applies to Cray ALPS
              architectures only, where the slurmd daemon executes on
              front end nodes rather than on the compute nodes.  Use this
              option with either the --up or --down option.

       -g, --primary_slurmdbd_failure
              Trigger an event when the primary slurmdbd fails.  The
              trigger is launched by slurmctld when it tries to connect
              to slurmdbd but receives no response on the socket.

       -G, --primary_slurmdbd_resumed_operation
              Trigger an event when the primary slurmdbd resumes
              operation after failure.  This event is triggered when
              opening the connection from slurmctld to slurmdbd results
              in a response.  This can happen in several situations:
              periodically (every 15 seconds) when checking the
              connection status, when saving state, when the agent queue
              is filling, and so on.

       --get  Show registered event triggers.  Options can be used for
              filtering purposes.
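
              For example, the list can be restricted to the triggers
              created by one user (the user name "joe" is illustrative):

              # user name is illustrative
              $ strigger --get --user=joe --noheader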

       -h, --primary_database_failure
              Trigger an event when the primary database fails.  This
              event is triggered when the accounting plugin tries to open
              a connection with MySQL and it fails, and the slurmctld
              needs the database for some operations.

       -H, --primary_database_resumed_operation
              Trigger an event when the primary database resumes
              operation after failure.  This happens when the connection
              to MySQL from the accounting plugin is restored.

       -i, --id=id
              Trigger ID number.

       -I, --idle
              Trigger an event if the specified node remains in an IDLE
              state for at least the time period specified by the
              --offset option.  This can be useful to hibernate a node
              that remains idle, thus reducing power consumption.

       -j, --jobid=id
              Job ID of interest.  NOTE: The --jobid option cannot be
              used in conjunction with the --node option.  When the
              --jobid option is used in conjunction with the --up or
              --down option, all nodes allocated to that job will be
              considered the nodes used as a trigger event.

       -M, --clusters=<string>
              Clusters to issue commands to.  Note that the SlurmDBD must
              be up for this option to work properly.

       -n, --node[=host]
              Host name(s) of interest.  By default, all nodes associated
              with the job (if --jobid is specified) or on the system are
              considered for event triggers.  NOTE: The --node option
              cannot be used in conjunction with the --jobid option.
              When the --jobid option is used in conjunction with the
              --up, --down or --drained option, all nodes allocated to
              that job will be considered the nodes used as a trigger
              event.  Since this option's argument is optional, for
              proper parsing the single letter option must be followed
              immediately with the value and not include a space between
              them.  For example "-ntux" and not "-n tux".
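
              For example (the node names are illustrative), a down-node
              trigger limited to a set of hosts can be set with a
              hostlist expression, keeping the value immediately after
              the -n:

              # node names tux[1-4] are illustrative
              $ strigger --set --down -ntux[1-4] \
                         --program=/usr/sbin/slurm_admin_notify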

       -N, --noheader
              Do not print the header when displaying a list of triggers.

       -o, --offset=seconds
              The specified action should follow the event by this time
              interval.  Specify a negative value if the action should
              precede the event.  The default value is zero if no
              --offset option is specified.  The resolution of this time
              is about 20 seconds, so to execute a script not less than
              five minutes prior to a job reaching its time limit,
              specify --offset=-320 (5 minutes plus 20 seconds).
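
              For example, using the job ID and script path from the
              EXAMPLES section below, a clean-up script could be started
              roughly five minutes before the job's time limit:

              $ strigger --set --jobid=1234 --time --offset=-320 \
                         --program=/home/joe/clean_up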

       -p, --program=path
              Execute the program at the specified fully qualified
              pathname when the event occurs.  You may quote the path and
              include extra program arguments if desired.  The program
              will be executed as the user who sets the trigger.  If the
              program fails to terminate within 5 minutes, it will be
              killed along with any spawned processes.
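
              For example, extra arguments can be passed by quoting the
              whole value (the mail address argument is illustrative):

              # the extra argument (a mail address) is illustrative
              $ strigger --set --node --down \
                         --program="/usr/sbin/slurm_admin_notify admin@site.com"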

       -Q, --quiet
              Do not report non-fatal errors.  This can be useful to
              clear triggers which may have already been purged.

       -r, --reconfig
              Trigger an event when the system configuration changes.
              This is triggered when the slurmctld daemon reads its
              configuration file or when a node state changes.

       --set  Register an event trigger based upon the supplied options.
              NOTE: An event is only triggered once.  A new event trigger
              must be established for future events of the same type to
              be processed.  Triggers can only be set if the command is
              run by the user SlurmUser unless SlurmUser is configured as
              user root.

       -t, --time
              Trigger an event when the specified job's time limit is
              reached.  This must be used in conjunction with the --jobid
              option.

       -u, --up
              Trigger an event if the specified node is returned to
              service from a DOWN state.

       --user=user_name_or_id
              Clear or get triggers created by the specified user.  For
              example, a trigger created by user root for a job created
              by user adam could be cleared with the option --user=root.
              Specify either a user name or user ID.
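
              For example, all triggers created by user root could be
              cleared with:

              $ strigger --clear --user=root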

       -v, --verbose
              Print detailed event logging.  This includes time-stamps on
              data structures, record counts, etc.

       -V, --version
              Print version information and exit.

OUTPUT FIELD DESCRIPTIONS
       TRIG_ID
              Trigger ID number.

       RES_TYPE
              Resource type: job or node.

       RES_ID Resource ID: job ID or host names or "*" for any host.

       TYPE   Trigger type: time or fini (for jobs only), down or up (for
              jobs or nodes), or drained, idle or reconfig (for nodes
              only).

       OFFSET Time offset in seconds.  Negative numbers indicate the
              action should occur before the event (if possible).

       USER   Name of the user requesting the action.

       PROGRAM
              Pathname of the program to execute when the event occurs.

PERFORMANCE
       Executing strigger sends a remote procedure call to slurmctld.  If
       enough calls from strigger or other Slurm client commands that
       send remote procedure calls to the slurmctld daemon come in at
       once, it can result in a degradation of performance of the
       slurmctld daemon, possibly resulting in a denial of service.

       Do not run strigger or other Slurm client commands that send
       remote procedure calls to slurmctld from loops in shell scripts or
       other programs.  Ensure that programs limit calls to strigger to
       the minimum necessary for the information you are trying to
       gather.

ENVIRONMENT VARIABLES
       Some strigger options may be set via environment variables.  These
       environment variables, along with their corresponding options, are
       listed below.  (Note: command line options will always override
       these settings.)

       SLURM_CONF          The location of the Slurm configuration file.
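
       For example (the file path is an illustrative, typical location),
       an alternate configuration file can be selected for a single
       invocation:

       # /etc/slurm/slurm.conf is an illustrative path
       $ SLURM_CONF=/etc/slurm/slurm.conf strigger --get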

EXAMPLES
       Execute the program "/usr/sbin/primary_slurmctld_failure" whenever
       the primary slurmctld fails.

       $ cat /usr/sbin/primary_slurmctld_failure
       #!/bin/bash
       # Submit a trigger for the next primary slurmctld failure event
       strigger --set --primary_slurmctld_failure \
                --program=/usr/sbin/primary_slurmctld_failure
       # Notify the administrator of the failure by e-mail
       /bin/mail -s Primary_SLURMCTLD_FAILURE slurm_admin@site.com

       $ strigger --set --primary_slurmctld_failure \
                  --program=/usr/sbin/primary_slurmctld_failure

       Execute the program "/usr/sbin/slurm_admin_notify" whenever any
       node in the cluster goes down.  The subject line will include the
       node names which have entered the down state (passed as an
       argument to the script by Slurm).

       $ cat /usr/sbin/slurm_admin_notify
       #!/bin/bash
       # Submit a trigger for the next node-down event
       strigger --set --node --down \
                --program=/usr/sbin/slurm_admin_notify
       # Notify the administrator by e-mail, listing the down nodes
       /bin/mail -s "NodesDown:$*" slurm_admin@site.com

       $ strigger --set --node --down \
                  --program=/usr/sbin/slurm_admin_notify

       Execute the program "/usr/sbin/slurm_suspend_node" whenever any
       node in the cluster remains in the idle state for at least 600
       seconds.

       $ strigger --set --node --idle --offset=600 \
                  --program=/usr/sbin/slurm_suspend_node

       Execute the program "/home/joe/clean_up" when job 1234 is within
       10 minutes of reaching its time limit.

       $ strigger --set --jobid=1234 --time --offset=-600 \
                  --program=/home/joe/clean_up

       Execute the program "/home/joe/node_died" when any node allocated
       to job 1234 enters the DOWN state.

       $ strigger --set --jobid=1234 --down \
                  --program=/home/joe/node_died

       Show all triggers associated with job 1235.

       $ strigger --get --jobid=1235
       TRIG_ID RES_TYPE RES_ID TYPE OFFSET USER PROGRAM
           123      job   1235 time   -600  joe /home/joe/clean_up
           125      job   1235 down      0  joe /home/joe/node_died

       Delete event trigger 125.

       $ strigger --clear --id=125

       Execute the program "/home/joe/job_fini" upon completion of job
       1237.

       $ strigger --set --jobid=1237 --fini --program=/home/joe/job_fini

COPYING
       Copyright (C) 2007 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf,
       DISCLAIMER).
       Copyright (C) 2008-2010 Lawrence Livermore National Security.
       Copyright (C) 2010-2013 SchedMD LLC.

       This file is part of Slurm, a resource management program.  For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

SEE ALSO
       scontrol(1), sinfo(1), squeue(1)

February 2021                   Slurm Commands                    strigger(1)