strigger(1)                     Slurm Commands                    strigger(1)


NAME
       strigger - Used to set, get or clear Slurm trigger information.


SYNOPSIS
       strigger --set [OPTIONS...]
       strigger --get [OPTIONS...]
       strigger --clear [OPTIONS...]


DESCRIPTION
       strigger is used to set, get or clear Slurm trigger information.
       Triggers include events such as a node failing, a job reaching its
       time limit or a job terminating.  These events can cause actions
       such as the execution of an arbitrary script.  Typical uses include
       notifying system administrators of node failures and gracefully
       terminating a job when its time limit is approaching.  A hostlist
       expression for the nodelist or job ID is passed as an argument to
       the program.

       Trigger events are not processed instantly, but a check is
       performed for trigger events on a periodic basis (currently every
       15 seconds).  Any trigger events which occur within that interval
       will be compared against the trigger programs set at the end of the
       time interval.  The trigger program will be executed once for any
       event occurring in that interval.  The record of those events (e.g.
       nodes which went DOWN in the previous 15 seconds) will then be
       cleared.  The trigger program must set a new trigger before the end
       of the next interval to ensure that no trigger events are missed OR
       the trigger must be created with an argument of "--flags=PERM".  If
       desired, multiple trigger programs can be set for the same event.
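
       For example, a permanent trigger that survives each event, and so
       need not be re-registered by the trigger program, might be set as
       follows (the script path is illustrative):

       $ strigger --set --node --down --flags=PERM \
                --program=/usr/sbin/notify_admin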

       IMPORTANT NOTE: This command can only set triggers if run by the
       user SlurmUser unless SlurmUser is configured as user root.  This
       is required for the slurmctld daemon to set the appropriate user
       and group IDs for the executed program.  Also note that the trigger
       program is executed on the same node that the slurmctld daemon uses
       rather than some allocated compute node.  To check the value of
       SlurmUser, run the command:

       scontrol show config | grep SlurmUser
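
       A typical response might look like the following (the user name and
       UID shown are illustrative):

       SlurmUser               = slurm(1001)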


ARGUMENTS
       -C, --backup_slurmctld_assumed_control
              Trigger an event when the backup slurmctld assumes control.

       -B, --backup_slurmctld_failure
              Trigger an event when the backup slurmctld fails.

       -c, --backup_slurmctld_resumed_operation
              Trigger an event when the backup slurmctld resumes operation
              after failure.

       --burst_buffer
              Trigger an event when a burst buffer error occurs.

       --clear
              Clear or delete a previously defined event trigger.  The
              --id, --jobid or --user option must be specified to identify
              the trigger(s) to be cleared.  Only user root or the
              trigger's creator can delete a trigger.
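
              For example, all triggers on job 1234 could be cleared as
              follows (the job ID is illustrative):

              $ strigger --clear --jobid=1234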

       -M, --clusters=<string>
              Clusters to issue commands to.  Note that the SlurmDBD must
              be up for this option to work properly.

       -d, --down
              Trigger an event if the specified node goes into a DOWN
              state.

       -D, --drained
              Trigger an event if the specified node goes into a DRAINED
              state.

       -F, --fail
              Trigger an event if the specified node goes into a FAILING
              state.

       -f, --fini
              Trigger an event when the specified job completes execution.

       --flags=<flag>
              Associate flags with the trigger.  Multiple flags should be
              comma separated.  Valid flags include:

              PERM   Make the trigger permanent.  Do not purge it after
                     the event occurs.

       --front_end
              Trigger events based upon changes in state of front end
              nodes rather than compute nodes.  Applies to Cray ALPS
              architectures only, where the slurmd daemon executes on
              front end nodes rather than the compute nodes.  Use this
              option with either the --up or --down option.

       --get  Show registered event triggers.  Options can be used for
              filtering purposes.
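
              For example, to list triggers created by user joe without
              the header line (the user name is illustrative):

              $ strigger --get --user=joe --noheader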

       -i, --id=<id>
              Trigger ID number.

       -I, --idle
              Trigger an event if the specified node remains in an IDLE
              state for at least the time period specified by the --offset
              option.  This can be useful to hibernate a node that remains
              idle, thus reducing power consumption.

       -j, --jobid=<id>
              Job ID of interest.  NOTE: The --jobid option cannot be used
              in conjunction with the --node option.  When the --jobid
              option is used in conjunction with the --up or --down
              option, all nodes allocated to that job will be considered
              the nodes used as a trigger event.

       -n, --node[=host]
              Host name(s) of interest.  By default, all nodes associated
              with the job (if --jobid is specified) or on the system are
              considered for event triggers.  NOTE: The --node option
              cannot be used in conjunction with the --jobid option.  When
              the --jobid option is used in conjunction with the --up,
              --down or --drained option, all nodes allocated to that job
              will be considered the nodes used as a trigger event.  Since
              this option's argument is optional, for proper parsing the
              single letter option must be followed immediately with the
              value and not include a space between them.  For example
              "-ntux" and not "-n tux".
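
              For example, a down-node trigger limited to host tux might
              be set with either of the following equivalent forms (the
              host and script names are illustrative):

              $ strigger --set --node=tux --down --program=/usr/sbin/notify
              $ strigger --set -ntux --down --program=/usr/sbin/notify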

       -N, --noheader
              Do not print the header when displaying a list of triggers.

       -o, --offset=<seconds>
              The specified action should follow the event by this time
              interval.  Specify a negative value if the action should
              precede the event.  The default value is zero if no --offset
              option is specified.  The resolution of this time is about
              20 seconds, so to execute a script not less than five
              minutes prior to a job reaching its time limit, specify
              --offset=-320 (five minutes plus 20 seconds).
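
              For example, to run a script at least five minutes before
              job 1234 reaches its time limit (the job ID and script path
              are illustrative):

              $ strigger --set --jobid=1234 --time --offset=-320 \
                       --program=/home/joe/clean_up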

       -h, --primary_database_failure
              Trigger an event when the primary database fails.  This
              event is triggered when the accounting plugin tries to open
              a connection with mysql and fails, and the slurmctld needs
              the database for some operations.

       -H, --primary_database_resumed_operation
              Trigger an event when the primary database resumes operation
              after failure.  This happens when the connection to mysql
              from the accounting plugin is restored.

       -g, --primary_slurmdbd_failure
              Trigger an event when the primary slurmdbd fails.  The
              trigger is launched by slurmctld on the occasions when it
              tries to connect to slurmdbd but receives no response on the
              socket.

       -G, --primary_slurmdbd_resumed_operation
              Trigger an event when the primary slurmdbd resumes operation
              after failure.  This event is triggered when opening the
              connection from slurmctld to slurmdbd results in a response.
              It can also happen in other situations: periodically every
              15 seconds when checking the connection status, when saving
              state, when the agent queue is filling, and so on.

       -e, --primary_slurmctld_acct_buffer_full
              Trigger an event when the primary slurmctld accounting
              buffer is full.

       -a, --primary_slurmctld_failure
              Trigger an event when the primary slurmctld fails.

       -b, --primary_slurmctld_resumed_control
              Trigger an event when the primary slurmctld resumes control.

       -A, --primary_slurmctld_resumed_operation
              Trigger an event when the primary slurmctld resumes
              operation after failure.

       -p, --program=<path>
              Execute the program at the specified fully qualified
              pathname when the event occurs.  You may quote the path and
              include extra program arguments if desired.  The program
              will be executed as the user who sets the trigger.  If the
              program fails to terminate within 5 minutes, it will be
              killed along with any spawned processes.
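
              For example, extra arguments can be passed by quoting the
              program and its arguments together (the path and argument
              are illustrative):

              $ strigger --set --node --down \
                       --program="/usr/sbin/notify_admin nodes_down"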

       -Q, --quiet
              Do not report non-fatal errors.  This can be useful to clear
              triggers which may have already been purged.
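
              For example, a trigger that may already have been purged can
              be cleared without an error being reported (the trigger ID
              is illustrative):

              $ strigger --clear --quiet --id=125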

       -r, --reconfig
              Trigger an event when the system configuration changes.
              This is triggered when the slurmctld daemon reads its
              configuration file or when a node state changes.

       --set  Register an event trigger based upon the supplied options.
              NOTE: An event is only triggered once.  A new event trigger
              must be established for future events of the same type to be
              processed.  Triggers can only be set if the command is run
              by the user SlurmUser unless SlurmUser is configured as user
              root.

       -t, --time
              Trigger an event when the specified job's time limit is
              reached.  This must be used in conjunction with the --jobid
              option.

       -u, --up
              Trigger an event if the specified node is returned to
              service from a DOWN state.

       --user=<user_name_or_id>
              Clear or get triggers created by the specified user.  For
              example, a trigger created by user root for a job created by
              user adam could be cleared with the option --user=root.
              Specify either a user name or user ID.
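
              For example, to clear every trigger created by user root:

              $ strigger --clear --user=root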

       -v, --verbose
              Print detailed event logging.  This includes time-stamps on
              data structures, record counts, etc.

       -V, --version
              Print version information and exit.


OUTPUT FIELD DESCRIPTIONS
       TRIG_ID
              Trigger ID number.

       RES_TYPE
              Resource type: job or node

       RES_ID Resource ID: job ID or host names or "*" for any host

       TYPE   Trigger type: time or fini (for jobs only), down or up (for
              jobs or nodes), or drained, idle or reconfig (for nodes
              only)

       OFFSET Time offset in seconds.  Negative numbers indicate the
              action should occur before the event (if possible)

       USER   Name of the user requesting the action

       PROGRAM
              Pathname of the program to execute when the event occurs


PERFORMANCE
       Executing strigger sends a remote procedure call to slurmctld.  If
       enough calls from strigger or other Slurm client commands that send
       remote procedure calls to the slurmctld daemon come in at once, it
       can result in a degradation of performance of the slurmctld daemon,
       possibly resulting in a denial of service.

       Do not run strigger or other Slurm client commands that send remote
       procedure calls to slurmctld from loops in shell scripts or other
       programs.  Ensure that programs limit calls to strigger to the
       minimum necessary for the information you are trying to gather.


ENVIRONMENT VARIABLES
       Some strigger options may be set via environment variables.  These
       environment variables, along with their corresponding options, are
       listed below.  (Note: Command line options will always override
       these settings.)

       SLURM_CONF          The location of the Slurm configuration file.
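
       For example, to point strigger at an alternate configuration file
       for a single invocation (the path is illustrative):

       $ SLURM_CONF=/etc/slurm/slurm.conf.test strigger --get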


EXAMPLES
       Execute the program "/usr/sbin/primary_slurmctld_failure" whenever
       the primary slurmctld fails.

       $ cat /usr/sbin/primary_slurmctld_failure
       #!/bin/bash
       # Submit trigger for next primary slurmctld failure event
       strigger --set --primary_slurmctld_failure \
                --program=/usr/sbin/primary_slurmctld_failure
       # Notify the administrator of the failure by e-mail
       /bin/mail -s Primary_SLURMCTLD_FAILURE slurm_admin@site.com

       $ strigger --set --primary_slurmctld_failure \
                --program=/usr/sbin/primary_slurmctld_failure

       Execute the program "/usr/sbin/slurm_admin_notify" whenever any
       node in the cluster goes down.  The subject line will include the
       node names which have entered the down state (passed as an argument
       to the script by Slurm).

       $ cat /usr/sbin/slurm_admin_notify
       #!/bin/bash
       # Submit trigger for next event
       strigger --set --node --down \
                --program=/usr/sbin/slurm_admin_notify
       # Notify the administrator by e-mail
       /bin/mail -s "NodesDown:$*" slurm_admin@site.com

       $ strigger --set --node --down \
                --program=/usr/sbin/slurm_admin_notify

       Execute the program "/usr/sbin/slurm_suspend_node" whenever any
       node in the cluster remains in the idle state for at least 600
       seconds.

       $ strigger --set --node --idle --offset=600 \
                --program=/usr/sbin/slurm_suspend_node

       Execute the program "/home/joe/clean_up" when job 1234 is within 10
       minutes of reaching its time limit.

       $ strigger --set --jobid=1234 --time --offset=-600 \
                --program=/home/joe/clean_up

       Execute the program "/home/joe/node_died" when any node allocated
       to job 1234 enters the DOWN state.

       $ strigger --set --jobid=1234 --down \
                --program=/home/joe/node_died

       Show all triggers associated with job 1235.

       $ strigger --get --jobid=1235
       TRIG_ID RES_TYPE RES_ID TYPE   OFFSET USER PROGRAM
           123      job   1235 time     -600  joe /home/bob/clean_up
           125      job   1235 down        0  joe /home/bob/node_died

       Delete event trigger 125.

       $ strigger --clear --id=125

       Execute /home/joe/job_fini upon completion of job 1237.

       $ strigger --set --jobid=1237 --fini --program=/home/joe/job_fini


COPYING
       Copyright (C) 2007 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf,
       DISCLAIMER).
       Copyright (C) 2008-2010 Lawrence Livermore National Security.
       Copyright (C) 2010-2021 SchedMD LLC.

       This file is part of Slurm, a resource management program.  For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.


SEE ALSO
       scontrol(1), sinfo(1), squeue(1)



October 2021                    Slurm Commands                    strigger(1)