strigger(1)                     Slurm Commands                    strigger(1)


NAME
       strigger - Used to set, get or clear Slurm trigger information.


SYNOPSIS
       strigger --set   [OPTIONS...]
       strigger --get   [OPTIONS...]
       strigger --clear [OPTIONS...]

DESCRIPTION
       strigger is used to set, get or clear Slurm trigger information.
       Triggers include events such as a node failing, a job reaching its
       time limit or a job terminating.  These events can cause actions
       such as the execution of an arbitrary script.  Typical uses include
       notifying system administrators of node failures and gracefully
       terminating a job when its time limit is approaching.  A hostlist
       expression for the nodelist or job ID is passed as an argument to
       the program.

       Trigger events are not processed instantly, but a check is
       performed for trigger events on a periodic basis (currently every
       15 seconds).  Any trigger events which occur within that interval
       will be compared against the trigger programs set at the end of the
       time interval.  The trigger program will be executed once for any
       event occurring in that interval.  The record of those events (e.g.
       nodes which went DOWN in the previous 15 seconds) will then be
       cleared.  The trigger program must set a new trigger before the end
       of the next interval to ensure that no trigger events are missed,
       OR the trigger must be created with an argument of "--flags=PERM".
       If desired, multiple trigger programs can be set for the same
       event.
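       The two re-arming strategies described above can be sketched as
       follows; the script path /usr/sbin/notify_admin is purely
       illustrative, not part of Slurm.

```shell
# One-shot trigger: purged after it fires, so the trigger program itself
# must call "strigger --set ..." again to re-arm for the next event.
strigger --set --node --down --program=/usr/sbin/notify_admin

# Permanent trigger: with PERM the trigger is not purged after the
# event fires, so no re-arming is needed.
strigger --set --node --down --flags=PERM --program=/usr/sbin/notify_admin
```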

       IMPORTANT NOTE: This command can only set triggers if run by the
       user SlurmUser unless SlurmUser is configured as user root.  This
       is required for the slurmctld daemon to set the appropriate user
       and group IDs for the executed program.  Also note that the trigger
       program is executed on the same node that the slurmctld daemon uses
       rather than some allocated compute node.  To check the value of
       SlurmUser, run the command:

       scontrol show config | grep SlurmUser


ARGUMENTS
       -C, --backup_slurmctld_assumed_control
              Trigger an event when the backup slurmctld assumes control.

       -B, --backup_slurmctld_failure
              Trigger an event when the backup slurmctld fails.

       -c, --backup_slurmctld_resumed_operation
              Trigger an event when the backup slurmctld resumes operation
              after failure.

       --burst_buffer
              Trigger an event when a burst buffer error occurs.

       --clear
              Clear or delete a previously defined event trigger.  The
              --id, --jobid or --user option must be specified to identify
              the trigger(s) to be cleared.  Only user root or the
              trigger's creator can delete a trigger.

       -M, --clusters=<string>
              Clusters to issue commands to.  Note that the SlurmDBD must
              be up for this option to work properly.

       -d, --down
              Trigger an event if the specified node goes into a DOWN
              state.

       -D, --drained
              Trigger an event if the specified node goes into a DRAINED
              state.

       -F, --fail
              Trigger an event if the specified node goes into a FAILING
              state.

       -f, --fini
              Trigger an event when the specified job completes execution.

       --flags=<flag>
              Associate flags with the trigger.  Multiple flags should be
              comma separated.  Valid flags include:

              PERM   Make the trigger permanent.  Do not purge it after
                     the event occurs.

       --front_end
              Trigger events based upon changes in state of front end
              nodes rather than compute nodes.  Applies to Cray ALPS
              architectures only, where the slurmd daemon executes on
              front end nodes rather than the compute nodes.  Use this
              option with either the --up or --down option.

       --get  Show registered event triggers.  Options can be used for
              filtering purposes.

       -i, --id=<id>
              Trigger ID number.

       -I, --idle
              Trigger an event if the specified node remains in an IDLE
              state for at least the time period specified by the --offset
              option.  This can be useful to hibernate a node that remains
              idle, thus reducing power consumption.

       -j, --jobid=<id>
              Job ID of interest.  NOTE: The --jobid option cannot be used
              in conjunction with the --node option.  When the --jobid
              option is used in conjunction with the --up or --down
              option, all nodes allocated to that job will be considered
              the nodes used as a trigger event.

       -n, --node[=host]
              Host name(s) of interest.  By default, all nodes associated
              with the job (if --jobid is specified) or on the system are
              considered for event triggers.  NOTE: The --node option
              cannot be used in conjunction with the --jobid option.  When
              the --jobid option is used in conjunction with the --up,
              --down or --drained option, all nodes allocated to that job
              will be considered the nodes used as a trigger event.  Since
              this option's argument is optional, for proper parsing the
              single letter option must be followed immediately with the
              value and not include a space between them.  For example
              "-ntux" and not "-n tux".
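       The attached-value form described above can be sketched as follows;
       the node name tux and the script path are illustrative.

```shell
# Correct: the value is attached directly to the single-letter option.
strigger --set --down -ntux --program=/usr/sbin/slurm_admin_notify

# Equivalent long form, where a space or "=" before the value is fine.
strigger --set --down --node=tux --program=/usr/sbin/slurm_admin_notify
```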

       -N, --noheader
              Do not print the header when displaying a list of triggers.

       -o, --offset=<seconds>
              The specified action should follow the event by this time
              interval.  Specify a negative value if the action should
              precede the event.  The default value is zero if no --offset
              option is specified.  The resolution of this time is about
              20 seconds, so to execute a script not less than five
              minutes prior to a job reaching its time limit, specify
              --offset=-320 (5 minutes plus 20 seconds).

       -h, --primary_database_failure
              Trigger an event when the primary database fails.  This
              event is triggered when the accounting plugin tries to open
              a connection with mysql and it fails and the slurmctld needs
              the database for some operations.

       -H, --primary_database_resumed_operation
              Trigger an event when the primary database resumes operation
              after failure.  It happens when the connection to mysql from
              the accounting plugin is restored.

       -g, --primary_slurmdbd_failure
              Trigger an event when the primary slurmdbd fails.  The
              trigger is launched by slurmctld on the occasions it tries
              to connect to slurmdbd, but receives no response on the
              socket.

       -G, --primary_slurmdbd_resumed_operation
              Trigger an event when the primary slurmdbd resumes operation
              after failure.  This event is triggered when opening the
              connection from slurmctld to slurmdbd results in a response.
              It can also happen in different situations: periodically
              every 15 seconds when checking the connection status, when
              saving state, when the agent queue is filling, and so on.

       -e, --primary_slurmctld_acct_buffer_full
              Trigger an event when the primary slurmctld accounting
              buffer is full.

       -a, --primary_slurmctld_failure
              Trigger an event when the primary slurmctld fails.

       -b, --primary_slurmctld_resumed_control
              Trigger an event when the primary slurmctld resumes control.

       -A, --primary_slurmctld_resumed_operation
              Trigger an event when the primary slurmctld resumes
              operation after failure.
       -p, --program=<path>
              Execute the program at the specified fully qualified
              pathname when the event occurs.  You may quote the path and
              include extra program arguments if desired.  The program
              will be executed as the user who sets the trigger.  If the
              program fails to terminate within 5 minutes, it will be
              killed along with any spawned processes.

       -Q, --quiet
              Do not report non-fatal errors.  This can be useful to clear
              triggers which may have already been purged.
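       For example, a cleanup script might clear a trigger that may
       already have fired; the trigger ID 123 is illustrative.

```shell
# Clearing an already-purged trigger normally reports an error;
# --quiet suppresses it so the script can proceed unconditionally.
strigger --clear --quiet --id=123
```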

       -r, --reconfig
              Trigger an event when the system configuration changes.
              This is triggered when the slurmctld daemon reads its
              configuration file or when a node state changes.

       --set  Register an event trigger based upon the supplied options.
              NOTE: An event is only triggered once.  A new event trigger
              must be established for future events of the same type to be
              processed.  Triggers can only be set if the command is run
              by the user SlurmUser unless SlurmUser is configured as user
              root.

       -t, --time
              Trigger an event when the specified job's time limit is
              reached.  This must be used in conjunction with the --jobid
              option.

       -u, --up
              Trigger an event if the specified node is returned to
              service from a DOWN state.

       --user=<user_name_or_id>
              Clear or get triggers created by the specified user.  For
              example, a trigger created by user root for a job created by
              user adam could be cleared with an option --user=root.
              Specify either a user name or user ID.
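       A brief sketch of the option above (listing, then clearing, all
       triggers created by user root):

```shell
# Show triggers created by user root, then delete all of them.
strigger --get --user=root
strigger --clear --user=root
```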

       -v, --verbose
              Print detailed event logging.  This includes time-stamps on
              data structures, record counts, etc.

       -V, --version
              Print version information and exit.

OUTPUT FIELD DESCRIPTIONS
       TRIG_ID
              Trigger ID number.

       RES_TYPE
              Resource type: job or node

       RES_ID Resource ID: job ID or host names or "*" for any host

       TYPE   Trigger type: time or fini (for jobs only), down or up (for
              jobs or nodes), or drained, idle or reconfig (for nodes
              only)

       OFFSET Time offset in seconds.  Negative numbers indicate the
              action should occur before the event (if possible)

       USER   Name of the user requesting the action

       PROGRAM
              Pathname of the program to execute when the event occurs

PERFORMANCE
       Executing strigger sends a remote procedure call to slurmctld.  If
       enough calls from strigger or other Slurm client commands that send
       remote procedure calls to the slurmctld daemon come in at once, it
       can result in a degradation of performance of the slurmctld daemon,
       possibly resulting in a denial of service.

       Do not run strigger or other Slurm client commands that send remote
       procedure calls to slurmctld from loops in shell scripts or other
       programs.  Ensure that programs limit calls to strigger to the
       minimum necessary for the information you are trying to gather.

ENVIRONMENT VARIABLES
       Some strigger options may be set via environment variables.  These
       environment variables, along with their corresponding options, are
       listed below.  (Note: Command line options will always override
       these settings.)

       SLURM_CONF         The location of the Slurm configuration file.

       SLURM_DEBUG_FLAGS  Specify debug flags for strigger to use.  See
                          DebugFlags in the slurm.conf(5) man page for a
                          full list of flags.  The environment variable
                          takes precedence over the setting in the
                          slurm.conf.
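       As a sketch of the precedence described above; the config path and
       the Protocol debug flag are illustrative, and any flag listed under
       DebugFlags in slurm.conf(5) may be used.

```shell
# Point strigger at an alternate config file and enable protocol
# debugging for this single invocation only.
SLURM_CONF=/etc/slurm/slurm.conf SLURM_DEBUG_FLAGS=Protocol strigger --get
```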

EXAMPLES
       Execute the program "/usr/sbin/primary_slurmctld_failure" whenever
       the primary slurmctld fails.

       $ cat /usr/sbin/primary_slurmctld_failure
       #!/bin/bash
       # Submit trigger for next primary slurmctld failure event
       strigger --set --primary_slurmctld_failure \
                --program=/usr/sbin/primary_slurmctld_failure
       # Notify the administrator of the failure using e-mail
       /bin/mail slurm_admin@site.com -s Primary_SLURMCTLD_FAILURE

       $ strigger --set --primary_slurmctld_failure \
                  --program=/usr/sbin/primary_slurmctld_failure


       Execute the program "/usr/sbin/slurm_admin_notify" whenever any
       node in the cluster goes down.  The subject line will include the
       node names which have entered the down state (passed as an argument
       to the script by Slurm).

       $ cat /usr/sbin/slurm_admin_notify
       #!/bin/bash
       # Submit trigger for next event
       strigger --set --node --down \
                --program=/usr/sbin/slurm_admin_notify
       # Notify administrator using e-mail
       /bin/mail slurm_admin@site.com -s NodesDown:$*

       $ strigger --set --node --down \
                  --program=/usr/sbin/slurm_admin_notify


       Execute the program "/usr/sbin/slurm_suspend_node" whenever any
       node in the cluster remains in the idle state for at least 600
       seconds.

       $ strigger --set --node --idle --offset=600 \
                  --program=/usr/sbin/slurm_suspend_node


       Execute the program "/home/joe/clean_up" when job 1234 is within 10
       minutes of reaching its time limit.

       $ strigger --set --jobid=1234 --time --offset=-600 \
                  --program=/home/joe/clean_up


       Execute the program "/home/joe/node_died" when any node allocated
       to job 1234 enters the DOWN state.

       $ strigger --set --jobid=1234 --down \
                  --program=/home/joe/node_died


       Show all triggers associated with job 1235.

       $ strigger --get --jobid=1235
       TRIG_ID RES_TYPE RES_ID TYPE OFFSET USER      PROGRAM
           123      job   1235 time   -600  joe /home/bob/clean_up
           125      job   1235 down      0  joe /home/bob/node_died


       Delete event trigger 125.

       $ strigger --clear --id=125


       Execute /home/joe/job_fini upon completion of job 1237.

       $ strigger --set --jobid=1237 --fini --program=/home/joe/job_fini


COPYING
       Copyright (C) 2007 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf,
       DISCLAIMER).
       Copyright (C) 2008-2010 Lawrence Livermore National Security.
       Copyright (C) 2010-2022 SchedMD LLC.

       This file is part of Slurm, a resource management program.  For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

SEE ALSO
       scontrol(1), sinfo(1), squeue(1)



August 2022                     Slurm Commands                    strigger(1)