strigger(1)                     Slurm Commands                     strigger(1)

NAME

       strigger - Used to set, get or clear Slurm trigger information.

SYNOPSIS

       strigger --set   [OPTIONS...]
       strigger --get   [OPTIONS...]
       strigger --clear [OPTIONS...]

DESCRIPTION

       strigger is used to set, get or clear Slurm trigger information.
       Triggers include events such as a node failing, a job reaching its
       time limit or a job terminating.  These events can cause actions
       such as the execution of an arbitrary script.  Typical uses include
       notifying system administrators of node failures and gracefully
       terminating a job when its time limit is approaching.  A hostlist
       expression for the nodelist or job ID is passed as an argument to
       the program.

       Trigger events are not processed instantly; a check for trigger
       events is performed on a periodic basis (currently every 15
       seconds).  Any trigger events which occur within that interval will
       be compared against the trigger programs set at the end of the time
       interval.  The trigger program will be executed once for any event
       occurring in that interval.  The record of those events (e.g. nodes
       which went DOWN in the previous 15 seconds) will then be cleared.
       The trigger program must set a new trigger before the end of the
       next interval to ensure that no trigger events are missed, OR the
       trigger must be created with an argument of "--flags=PERM".  If
       desired, multiple trigger programs can be set for the same event.

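       The re-registration pattern described above can be sketched as a
       trigger program that re-arms itself before acting.  The script path
       and mail address below are illustrative, not part of Slurm:

```shell
#!/bin/bash
# Illustrative trigger program (/usr/sbin/notify_down is a made-up path).
# Re-arm first so events in the next 15-second interval are not missed:
strigger --set --node --down --program=/usr/sbin/notify_down
# ...then act on the node names passed as arguments by slurmctld:
echo "Nodes down: $*" | /bin/mail -s NodesDown admin@example.com
```

       Alternatively, create the trigger once with "--flags=PERM" and omit
       the re-registration step entirely.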
       IMPORTANT NOTE: This command can only set triggers if run by the
       user SlurmUser unless SlurmUser is configured as user root.  This is
       required for the slurmctld daemon to set the appropriate user and
       group IDs for the executed program.  Also note that the trigger
       program is executed on the same node that the slurmctld daemon uses
       rather than some allocated compute node.  To check the value of
       SlurmUser, run the command:

       scontrol show config | grep SlurmUser

ARGUMENTS

       -a, --primary_slurmctld_failure
              Trigger an event when the primary slurmctld fails.

       -A, --primary_slurmctld_resumed_operation
              Trigger an event when the primary slurmctld resumes operation
              after failure.

       -b, --primary_slurmctld_resumed_control
              Trigger an event when the primary slurmctld resumes control.

       -B, --backup_slurmctld_failure
              Trigger an event when the backup slurmctld fails.

       -c, --backup_slurmctld_resumed_operation
              Trigger an event when the backup slurmctld resumes operation
              after failure.

       -C, --backup_slurmctld_assumed_control
              Trigger an event when the backup slurmctld assumes control.

       --burst_buffer
              Trigger an event when a burst buffer error occurs.

       --clear
              Clear or delete a previously defined event trigger.  The
              --id, --jobid or --user option must be specified to identify
              the trigger(s) to be cleared.  Only user root or the
              trigger's creator can delete a trigger.

       -d, --down
              Trigger an event if the specified node goes into a DOWN
              state.

       -D, --drained
              Trigger an event if the specified node goes into a DRAINED
              state.

       -e, --primary_slurmctld_acct_buffer_full
              Trigger an event when the primary slurmctld accounting buffer
              is full.

       -F, --fail
              Trigger an event if the specified node goes into a FAILING
              state.

       -f, --fini
              Trigger an event when the specified job completes execution.

       --flags=type
              Associate flags with the trigger.  Multiple flags should be
              comma separated.  Valid flags include:

              PERM   Make the trigger permanent.  Do not purge it after the
                     event occurs.

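       A permanent trigger fires on every matching event without being
       purged, so the trigger program need not re-register itself.  A
       minimal sketch (the program path is illustrative):

```shell
# Fires each time any node goes DOWN; no re-registration required.
strigger --set --node --down --flags=PERM \
         --program=/usr/sbin/notify_down
```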
       --front_end
              Trigger events based upon changes in state of front end nodes
              rather than compute nodes.  Applies to Cray ALPS
              architectures only, where the slurmd daemon executes on front
              end nodes rather than the compute nodes.  Use this option
              with either the --up or --down option.

       -g, --primary_slurmdbd_failure
              Trigger an event when the primary slurmdbd fails.

       -G, --primary_slurmdbd_resumed_operation
              Trigger an event when the primary slurmdbd resumes operation
              after failure.

       --get  Show registered event triggers.  Options can be used for
              filtering purposes.

       -h, --primary_database_failure
              Trigger an event when the primary database fails.

       -H, --primary_database_resumed_operation
              Trigger an event when the primary database resumes operation
              after failure.

       -i, --id=id
              Trigger ID number.

       -I, --idle
              Trigger an event if the specified node remains in an IDLE
              state for at least the time period specified by the --offset
              option.  This can be useful to hibernate a node that remains
              idle, thus reducing power consumption.

       -j, --jobid=id
              Job ID of interest.  NOTE: The --jobid option cannot be used
              in conjunction with the --node option.  When the --jobid
              option is used in conjunction with the --up or --down option,
              all nodes allocated to that job will be considered the nodes
              used as a trigger event.

       -M, --clusters=<string>
              Clusters to issue commands to.  Note that the SlurmDBD must
              be up for this option to work properly.

       -n, --node[=host]
              Host name(s) of interest.  By default, all nodes associated
              with the job (if --jobid is specified) or on the system are
              considered for event triggers.  NOTE: The --node option
              cannot be used in conjunction with the --jobid option.  When
              the --jobid option is used in conjunction with the --up,
              --down or --drained option, all nodes allocated to that job
              will be considered the nodes used as a trigger event.  Since
              this option's argument is optional, for proper parsing the
              single letter option must be followed immediately with the
              value and not include a space between them.  For example
              "-ntux" and not "-n tux".

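       The optional-argument parsing rule above can be illustrated as
       follows (node name and program path are illustrative):

```shell
# Correct: the value is attached directly to the short option...
strigger --set --down -ntux --program=/usr/sbin/notify_down
# ...or given via the long form with "=":
strigger --set --down --node=tux --program=/usr/sbin/notify_down
# "-n tux" would be parsed as -n with no value, leaving "tux" dangling.
```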
       -N, --noheader
              Do not print the header when displaying a list of triggers.

       -o, --offset=seconds
              The specified action should follow the event by this time
              interval.  Specify a negative value if the action should
              precede the event.  The default value is zero if no --offset
              option is specified.  The resolution of this time is about 20
              seconds, so to execute a script not less than five minutes
              prior to a job reaching its time limit, specify
              --offset=-320 (5 minutes plus 20 seconds).

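       For example, to act roughly five minutes before a job's time limit,
       the 20-second resolution must be budgeted into the offset (job ID
       and script path are illustrative):

```shell
# 300 s desired lead time + 20 s resolution margin = offset of -320
strigger --set --jobid=1234 --time --offset=-320 \
         --program=/home/joe/clean_up
```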
       -p, --program=path
              Execute the program at the specified fully qualified pathname
              when the event occurs.  You may quote the path and include
              extra program arguments if desired.  The program will be
              executed as the user who sets the trigger.  If the program
              fails to terminate within 5 minutes, it will be killed along
              with any spawned processes.

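       Extra program arguments can be passed by quoting the whole value, as
       in this sketch (script path and argument are illustrative):

```shell
# The quoted value carries both the pathname and an extra argument:
strigger --set --node --down \
         --program="/usr/sbin/notify_down --severity=high"
```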
       -Q, --quiet
              Do not report non-fatal errors.  This can be useful to clear
              triggers which may have already been purged.

       -r, --reconfig
              Trigger an event when the system configuration changes.  This
              is triggered when the slurmctld daemon reads its
              configuration file or when a node state changes.

       --set  Register an event trigger based upon the supplied options.
              NOTE: An event is only triggered once.  A new event trigger
              must be established for future events of the same type to be
              processed.  Triggers can only be set if the command is run by
              the user SlurmUser unless SlurmUser is configured as user
              root.

       -t, --time
              Trigger an event when the specified job's time limit is
              reached.  This must be used in conjunction with the --jobid
              option.

       -u, --up
              Trigger an event if the specified node is returned to service
              from a DOWN state.

       --user=user_name_or_id
              Clear or get triggers created by the specified user.  For
              example, a trigger created by user root for a job created by
              user adam could be cleared with an option --user=root.
              Specify either a user name or user ID.

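       Combining --user with --get and --clear gives a simple inspect-then-
       delete workflow; --quiet suppresses errors for triggers that were
       already purged:

```shell
# List, then clear, all triggers created by user root:
strigger --get --user=root
strigger --clear --user=root --quiet
```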
       -v, --verbose
              Print detailed event logging.  This includes time-stamps on
              data structures, record counts, etc.

       -V, --version
              Print version information and exit.

OUTPUT FIELD DESCRIPTIONS

       TRIG_ID
              Trigger ID number.

       RES_TYPE
              Resource type: job or node.

       RES_ID Resource ID: job ID or host names or "*" for any host.

       TYPE   Trigger type: time or fini (for jobs only), down or up (for
              jobs or nodes), or drained, idle or reconfig (for nodes
              only).

       OFFSET Time offset in seconds.  Negative numbers indicate the action
              should occur before the event (if possible).

       USER   Name of the user requesting the action.

       PROGRAM
              Pathname of the program to execute when the event occurs.

ENVIRONMENT VARIABLES

       Some strigger options may be set via environment variables.  These
       environment variables, along with their corresponding options, are
       listed below.  (Note: command-line options will always override
       these settings.)

       SLURM_CONF          The location of the Slurm configuration file.
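       For example, a query can be pointed at a non-default configuration
       file for a single invocation (the path is illustrative):

```shell
# Use an alternate slurm.conf just for this command:
SLURM_CONF=/etc/slurm/slurm-test.conf strigger --get
```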

EXAMPLES

       Execute the program "/usr/sbin/primary_slurmctld_failure" whenever
       the primary slurmctld fails.

       > cat /usr/sbin/primary_slurmctld_failure
       #!/bin/bash
       # Submit trigger for next primary slurmctld failure event
       strigger --set --primary_slurmctld_failure \
                --program=/usr/sbin/primary_slurmctld_failure
       # Notify the administrator of the failure by e-mail
       /bin/mail slurm_admin@site.com -s Primary_SLURMCTLD_FAILURE

       > strigger --set --primary_slurmctld_failure \
                  --program=/usr/sbin/primary_slurmctld_failure

       Execute the program "/usr/sbin/slurm_admin_notify" whenever any node
       in the cluster goes down.  The subject line will include the node
       names which have entered the down state (passed as an argument to
       the script by Slurm).

       > cat /usr/sbin/slurm_admin_notify
       #!/bin/bash
       # Submit trigger for next event
       strigger --set --node --down \
                --program=/usr/sbin/slurm_admin_notify
       # Notify the administrator by e-mail
       /bin/mail slurm_admin@site.com -s NodesDown:$*

       > strigger --set --node --down \
                  --program=/usr/sbin/slurm_admin_notify

       Execute the program "/usr/sbin/slurm_suspend_node" whenever any node
       in the cluster remains in the idle state for at least 600 seconds.

       > strigger --set --node --idle --offset=600 \
                  --program=/usr/sbin/slurm_suspend_node

       Execute the program "/home/joe/clean_up" when job 1234 is within 10
       minutes of reaching its time limit.

       > strigger --set --jobid=1234 --time --offset=-600 \
                  --program=/home/joe/clean_up

       Execute the program "/home/joe/node_died" when any node allocated to
       job 1234 enters the DOWN state.

       > strigger --set --jobid=1234 --down \
                  --program=/home/joe/node_died

       Show all triggers associated with job 1235.

       > strigger --get --jobid=1235
       TRIG_ID RES_TYPE RES_ID TYPE OFFSET USER PROGRAM
           123      job   1235 time   -600  joe /home/bob/clean_up
           125      job   1235 down      0  joe /home/bob/node_died

       Delete event trigger 125.

       > strigger --clear --id=125

       Execute /home/joe/job_fini upon completion of job 1237.

       > strigger --set --jobid=1237 --fini --program=/home/joe/job_fini

COPYING

       Copyright (C) 2007 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2008-2010 Lawrence Livermore National Security.
       Copyright (C) 2010-2013 SchedMD LLC.

       This file is part of Slurm, a resource management program.  For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
       or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
       License for more details.

SEE ALSO

       scontrol(1), sinfo(1), squeue(1)
August 2016                     Slurm Commands                     strigger(1)