SPANK(8)                     Slurm Component                     SPANK(8)

NAME
SPANK - Slurm Plug-in Architecture for Node and job (K)control

DESCRIPTION
This manual briefly describes the capabilities of the Slurm Plug-in
Architecture for Node and job Kontrol (SPANK), as well as the SPANK
configuration file (by default, plugstack.conf).

SPANK provides a very generic interface for stackable plug-ins which
may be used to dynamically modify the job launch code in Slurm. SPANK
plugins may be built without access to Slurm source code. They need
only be compiled against Slurm's spank.h header file and added to the
SPANK config file plugstack.conf; they will then be loaded at runtime
during the next job launch. Thus, the SPANK infrastructure gives
administrators and other developers a low-cost, low-effort way to
dynamically modify the runtime behavior of Slurm job launch.

Note: SPANK plugins using the Slurm APIs need to be recompiled when
upgrading Slurm to a new major release.

SPANK PLUGINS
SPANK plugins are loaded in up to five separate contexts during a Slurm
job. Briefly, the five contexts are:

local
    In local context, the plugin is loaded by srun (i.e. the "local"
    part of a parallel job).

remote
    In remote context, the plugin is loaded by slurmstepd (i.e. the
    "remote" part of a parallel job).

allocator
    In allocator context, the plugin is loaded in one of the job
    allocation utilities sbatch or salloc.

slurmd
    In slurmd context, the plugin is loaded in the slurmd daemon
    itself. Note: Plugins loaded in slurmd context persist for the
    entire time slurmd is running, so if configuration is changed or
    plugins are updated, slurmd must be restarted for the changes to
    take effect.

job_script
    In job_script context, plugins are loaded in the context of the
    job prolog or epilog. Note: Plugins are loaded in job_script
    context on each run of the job prolog or epilog, in a separate
    address space from plugins in slurmd context. This means there is
    no state shared between this context and other contexts, or even
    between one call to slurm_spank_job_prolog or
    slurm_spank_job_epilog and subsequent calls.

In local context, only the init, exit, init_post_opt, and
local_user_init functions are called. In allocator context, only the
init, exit, and init_post_opt functions are called. Similarly, in
slurmd context, only the init and slurmd_exit callbacks are active, and
in job_script context, only the job_prolog and job_epilog callbacks are
used. Plugins may query the context in which they are running with the
spank_context and spank_remote functions defined in <slurm/spank.h>.

SPANK plugins may be called from multiple points during the Slurm job
launch. A plugin may define the following functions:

slurm_spank_init
    Called just after plugins are loaded. In remote context, this is
    just after the job step is initialized. This function is called
    before any plugin option processing.

slurm_spank_job_prolog
    Called at the same time as the job prolog. If this function
    returns a negative value and the SPANK plugin that contains it is
    required in plugstack.conf, the node that this is run on will be
    drained.

slurm_spank_init_post_opt
    Called at the same point as slurm_spank_init, but after all user
    options to the plugin have been processed. The init and
    init_post_opt callbacks are separated so that plugins can process
    system-wide options specified in plugstack.conf in the init
    callback, then process user options, and finally take some action
    in slurm_spank_init_post_opt if necessary. In the case of a
    heterogeneous job, slurm_spank_init is invoked once per job
    component.

slurm_spank_local_user_init
    Called in local (srun) context only, after all options have been
    processed. This is called after the job ID and step IDs are
    available. This happens in srun after the allocation is made, but
    before tasks are launched.

slurm_spank_user_init
    Called after privileges are temporarily dropped. (remote context
    only)

slurm_spank_task_init_privileged
    Called for each task just after fork, but before all elevated
    privileges are dropped. (remote context only)

slurm_spank_task_init
    Called for each task just before execve(2). If you are restricting
    memory with cgroups, memory allocated here will be in the job's
    cgroup. (remote context only)

slurm_spank_task_post_fork
    Called for each task from the parent process after fork(2) is
    complete. Because slurmd does not exec any tasks until all tasks
    have completed fork(2), this call is guaranteed to run before the
    user task is executed. (remote context only)

slurm_spank_task_exit
    Called for each task as its exit status is collected by Slurm.
    (remote context only)

slurm_spank_exit
    Called once just before slurmstepd exits in remote context. In
    local context, called before srun exits.

slurm_spank_job_epilog
    Called at the same time as the job epilog. If this function
    returns a negative value and the SPANK plugin that contains it is
    required in plugstack.conf, the node that this is run on will be
    drained.

slurm_spank_slurmd_exit
    Called in slurmd when the daemon is shut down.

All of these functions have the same prototype, for example:

    int slurm_spank_init (spank_t spank, int ac, char *argv[])

Where spank is the SPANK handle which must be passed back to Slurm when
the plugin calls functions like spank_get_item and spank_getenv.
Configured arguments (see CONFIGURATION below) are passed in the
argument vector argv with argument count ac.
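As a rough summary, the context rules and per-callback notes above can
be condensed into a lookup table. The sketch below is plain C and does
not use spank.h. The demo_ctx enum, the demo_active table, and
demo_callback_active are hypothetical names introduced only for
illustration, and the remote row is assembled from the "(remote context
only)" notes rather than from a single statement in this manual.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the five contexts; NOT the spank.h enum. */
enum demo_ctx {
    DEMO_LOCAL,
    DEMO_REMOTE,
    DEMO_ALLOCATOR,
    DEMO_SLURMD,
    DEMO_JOB_SCRIPT
};

/*
 * Callbacks (without the slurm_spank_ prefix) described as active in
 * each context. Each row is NULL-terminated; unused slots are NULL.
 */
static const char *const demo_active[][9] = {
    [DEMO_LOCAL]      = { "init", "init_post_opt", "local_user_init",
                          "exit", NULL },
    [DEMO_REMOTE]     = { "init", "init_post_opt", "user_init",
                          "task_init_privileged", "task_init",
                          "task_post_fork", "task_exit", "exit", NULL },
    [DEMO_ALLOCATOR]  = { "init", "init_post_opt", "exit", NULL },
    [DEMO_SLURMD]     = { "init", "slurmd_exit", NULL },
    [DEMO_JOB_SCRIPT] = { "job_prolog", "job_epilog", NULL },
};

/* Return true if the named callback is listed for the given context. */
static bool demo_callback_active(enum demo_ctx ctx, const char *name)
{
    for (size_t i = 0; demo_active[ctx][i] != NULL; i++) {
        if (strcmp(demo_active[ctx][i], name) == 0)
            return true;
    }
    return false;
}
```

Such a table can serve as a checklist when deciding where work belongs.
For instance, demo_callback_active(DEMO_ALLOCATOR, "local_user_init")
is false, so work that must happen under sbatch or salloc cannot live
in slurm_spank_local_user_init.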

SPANK plugins can query the current list of supported slurm_spank
symbols to determine if the current version supports a given plugin
hook. This may be useful because the list of plugin symbols may grow in
the future. The query is done using the spank_symbol_supported
function, which has the following prototype:

    int spank_symbol_supported (const char *sym);

The return value is 1 if the symbol is supported, 0 if not.

SPANK plugins do not have direct access to internally defined Slurm
data structures. Instead, information about the currently executing job
is obtained via the spank_get_item function call.

    spank_err_t spank_get_item (spank_t spank, spank_item_t item, ...);

The spank_get_item call must be passed the current SPANK handle as well
as the item requested, which is defined by the passed spank_item_t. A
variable number of pointer arguments are also passed, depending on
which item was requested by the plugin. A list of the valid values for
item is kept in the spank.h header file. Some examples are:

S_JOB_UID
    User id for running job. (uid_t *) is third arg of spank_get_item.

S_JOB_STEPID
    Job step id for running job. (uint32_t *) is third arg of
    spank_get_item.

S_TASK_EXIT_STATUS
    Exit status for exited task. Only valid from slurm_spank_task_exit.
    (int *) is third arg of spank_get_item.

S_JOB_ARGV
    Complete job command line. Third and fourth args to spank_get_item
    are (int *, char ***).

See spank.h for more details, and EXAMPLES below for an example of
spank_get_item usage.

SPANK functions in the local and allocator contexts should use the
getenv, setenv, and unsetenv functions to view and modify the job's
environment. SPANK functions in the remote context should use the
spank_getenv, spank_setenv, and spank_unsetenv functions to view and
modify the job's environment. spank_getenv searches the job's
environment for the environment variable var and copies the current
value into a buffer buf of length len. spank_setenv allows a SPANK
plugin to set or overwrite a variable in the job's environment, and
spank_unsetenv unsets an environment variable in the job's environment.
The prototypes are:

    spank_err_t spank_getenv (spank_t spank, const char *var,
                              char *buf, int len);
    spank_err_t spank_setenv (spank_t spank, const char *var,
                              const char *val, int overwrite);
    spank_err_t spank_unsetenv (spank_t spank, const char *var);

These functions are only necessary in remote context, since the
standard process environment can be modified directly with setenv(3),
getenv(3), and unsetenv(3) in local context.

Functions are also available from within SPANK plugins to establish
environment variables to be exported to the Slurm PrologSlurmctld,
Prolog, Epilog and EpilogSlurmctld programs (the so-called job control
environment). The names of environment variables established by these
calls will be prepended with the string SPANK_ in order to avoid any
security implications of arbitrary environment variable control. (After
all, the job control scripts do run as root or the Slurm user.)

These functions are available from local context only.

    spank_err_t spank_job_control_getenv(spank_t spank, const char *var,
                                         char *buf, int len);
    spank_err_t spank_job_control_setenv(spank_t spank, const char *var,
                                         const char *val, int overwrite);
    spank_err_t spank_job_control_unsetenv(spank_t spank, const char *var);

See spank.h for more information, and EXAMPLES below for an example of
spank_getenv usage.
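To illustrate the SPANK_ prefixing described above, the following
sketch builds the name under which a job control script would be
expected to see a variable set through spank_job_control_setenv. It is
plain C and does not call Slurm; demo_job_control_name is a
hypothetical helper, not part of spank.h.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "SPANK_<var>" into out (capacity len).
 * Returns 0 on success, -1 if the result would not fit.
 */
static int demo_job_control_name(const char *var, char *out, size_t len)
{
    static const char prefix[] = "SPANK_";

    /* +1 accounts for the terminating NUL byte. */
    if (strlen(prefix) + strlen(var) + 1 > len)
        return -1;

    snprintf(out, len, "%s%s", prefix, var);
    return 0;
}
```

Under this assumption, a plugin that sets RENICE through
spank_job_control_setenv would lead a Prolog script to look for
SPANK_RENICE in its environment.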

Many of the described SPANK functions available to plugins return
errors via the spank_err_t error type. On success, the return value
will be set to ESPANK_SUCCESS, while on failure, the return value will
be set to one of many error values defined in slurm/spank.h. The SPANK
interface provides a simple function

    const char * spank_strerror(spank_err_t err);

which may be used to translate a spank_err_t value into its string
representation.

The slurm_spank_log function can be used to print messages back to the
user at an error level. This is to keep users from having to rely on
the slurm_error function, which can be confusing because it prepends
"error:" to every message.

SPANK OPTIONS
SPANK plugins also have an interface through which they may define and
implement extra job options. These options are made available to the
user through Slurm commands such as srun(1), salloc(1), and sbatch(1).
If the option is specified by the user, its value is forwarded and
registered with the plugin in slurmd when the job is run. In this way,
SPANK plugins may dynamically provide new options and functionality to
Slurm.

Each option registered by a plugin to Slurm takes the form of a struct
spank_option which is declared in <slurm/spank.h> as

    struct spank_option {
        char *         name;
        char *         arginfo;
        char *         usage;
        int            has_arg;
        int            val;
        spank_opt_cb_f cb;
    };

Where

name
    is the name of the option. Its length is limited to
    SPANK_OPTION_MAXLEN defined in <slurm/spank.h>.

arginfo
    is a description of the argument to the option, if the option does
    take an argument.

usage
    is a short description of the option suitable for --help output.

has_arg
    0 if the option takes no argument, 1 if the option takes an
    argument, and 2 if the option takes an optional argument. (See
    getopt_long(3)).

val
    A plugin-local value to return to the option callback function.

cb
    A callback function that is invoked when the plugin option is
    registered with Slurm. spank_opt_cb_f is typedef'd in
    <slurm/spank.h> as

        typedef int (*spank_opt_cb_f) (int val, const char *optarg,
                                       int remote);

    Where val is the value of the val field in the spank_option
    struct, optarg is the supplied argument if applicable, and remote
    is 0 if the function is being called on the "local" host (e.g. the
    host where srun or sbatch/salloc is invoked) or 1 if it is being
    called on the "remote" host (the host where slurmd/slurmstepd
    run). On the remote host the callback is executed only by
    slurmstepd (remote context), and only if the option was registered
    for that context.
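A minimal sketch of how these fields fit together is shown below. So
that it compiles on its own, the struct and typedef are copied from the
text above instead of being pulled in from <slurm/spank.h>, which a
real plugin would include. The option names demo-quiet and demo-tag,
the DEMO_OPT_* values, and the demo_* variables are hypothetical and
exist only for illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Declarations copied from the text above so that this sketch compiles
 * on its own. A real plugin would include <slurm/spank.h> instead.
 */
typedef int (*spank_opt_cb_f)(int val, const char *optarg, int remote);

struct spank_option {
    char *name;
    char *arginfo;
    char *usage;
    int has_arg;
    int val;
    spank_opt_cb_f cb;
};

/* Hypothetical plugin-local identifiers, passed back through "val". */
enum { DEMO_OPT_QUIET = 1, DEMO_OPT_TAG = 2 };

/* Hypothetical plugin state filled in by the callback. */
static int demo_quiet = 0;
static char demo_tag[64] = "";

/* One callback serves both options and dispatches on val. */
static int demo_opt_cb(int val, const char *optarg, int remote)
{
    (void) remote;  /* same handling on the local and remote side */

    switch (val) {
    case DEMO_OPT_QUIET:
        demo_quiet = 1;
        return 0;
    case DEMO_OPT_TAG:
        /* has_arg is 1 for this option, so an argument is expected. */
        if (optarg == NULL || strlen(optarg) >= sizeof(demo_tag))
            return -1;
        snprintf(demo_tag, sizeof(demo_tag), "%s", optarg);
        return 0;
    default:
        return -1;
    }
}

static struct spank_option demo_opts[] = {
    { "demo-quiet", NULL, "Suppress demo output.",
      0, DEMO_OPT_QUIET, demo_opt_cb },
    { "demo-tag", "name", "Attach tag [name] to the job.",
      1, DEMO_OPT_TAG, demo_opt_cb },
};
```

Sharing one callback and dispatching on val is only one possible
design; a plugin could equally give each option its own callback and
leave val at 0. The argument is copied into plugin-owned storage here
because the sketch makes no assumption about how long the optarg
pointer remains valid.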

Plugin options may be registered with Slurm using the
spank_option_register function. This function is only valid when called
from the plugin's slurm_spank_init handler, and registers one option at
a time. The prototype is

    spank_err_t spank_option_register (spank_t sp,
                                       struct spank_option *opt);

This function will return ESPANK_SUCCESS on successful registration of
an option, or ESPANK_BAD_ARG for errors including an invalid spank_t
handle, or when the function is not called from the slurm_spank_init
function. All options need to be registered from all contexts in which
they will be used. For instance, if an option is only used in local
(srun) and remote (slurmd) contexts, then spank_option_register should
only be called from within those contexts. For example:

    if (spank_context() != S_CTX_ALLOCATOR)
        spank_option_register (sp, opt);

If, however, the option is used in all contexts, spank_option_register
needs to be called in every context.

In addition to spank_option_register, plugins may also export options
to Slurm by defining a table of struct spank_option with the symbol
name spank_options. This method, however, is not supported for use with
sbatch and salloc (allocator context), so the use of
spank_option_register is preferred. When using the spank_options table,
the final element in the array must be filled with zeros. A
SPANK_OPTIONS_TABLE_END macro is provided in <slurm/spank.h> for this
purpose.

When an option is provided by the user on the local side, either by
command line options or by environment variables, Slurm will
immediately invoke the option's callback with remote=0. This is meant
for the plugin to do local sanity checking of the option before the
value is sent to the remote side during job launch. If the argument the
user specified is invalid, the plugin should report an error and return
a non-zero value from the callback. The plugin should be able to handle
cases where the SPANK option is set multiple times through environment
variables and command line options. Environment variables are processed
before command line options.
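One way to satisfy these requirements is a callback that validates its
argument, returns non-zero on bad input, and simply lets the most
recent valid call win. The sketch below is an assumption-laden
illustration rather than Slurm code: the option (an integer level from
0 to 9), demo_level, and demo_level_cb are hypothetical, and
fprintf(stderr, ...) stands in for the error reporting a real plugin
would do with slurm_error.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_LEVEL_NOT_SET (-1)

/* Hypothetical plugin state; holds the last valid value supplied. */
static int demo_level = DEMO_LEVEL_NOT_SET;

/* Callback with the spank_opt_cb_f signature described above. */
static int demo_level_cb(int val, const char *optarg, int remote)
{
    char *end = NULL;
    long l;

    (void) val;
    (void) remote;

    if (optarg == NULL || *optarg == '\0') {
        fprintf(stderr, "demo: a level value is required\n");
        return -1;
    }

    errno = 0;
    l = strtol(optarg, &end, 10);
    if (errno != 0 || *end != '\0' || l < 0 || l > 9) {
        fprintf(stderr, "demo: invalid level: %s\n", optarg);
        return -1;  /* non-zero tells the caller the value was rejected */
    }

    /*
     * Safe to call repeatedly: a later valid value overwrites an
     * earlier one, and a rejected value leaves the state untouched.
     */
    demo_level = (int) l;
    return 0;
}
```

Because environment variables are processed before command line
options, a value from the environment would be stored first and then
replaced if the same option also appears on the command line.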

On the remote side, options and their arguments are registered just
after SPANK plugins are loaded and before the slurm_spank_init handler
is called. This allows plugins to modify behavior of all plugin
functionality based on the value of user-provided options. (See
EXAMPLES below for a plugin that registers an option with Slurm).

As an alternative to use of an option callback and global variable,
plugins can use the spank_option_getopt function to check for supplied
options after option processing. This function has the prototype:

    spank_err_t spank_option_getopt(spank_t sp,
                                    struct spank_option *opt,
                                    char **optargp);

This function returns ESPANK_SUCCESS if the option defined in the
struct spank_option opt has been used by the user. If optargp is
non-NULL then it is set to any option argument passed (if the option
takes an argument). The use of this method is required to process
options in job_script context (slurm_spank_job_prolog and
slurm_spank_job_epilog). This function is valid in the following
callbacks: slurm_spank_job_prolog, slurm_spank_local_user_init,
slurm_spank_user_init, slurm_spank_task_init_privileged,
slurm_spank_task_init, slurm_spank_task_exit, and
slurm_spank_job_epilog.

CONFIGURATION
The default SPANK plug-in stack configuration file is plugstack.conf in
the same directory as slurm.conf(5), though this may be changed via the
Slurm config parameter PlugStackConfig. Normally the plugstack.conf
file should be identical on all nodes of the cluster. The config file
lists SPANK plugins, one per line, along with whether the plugin is
required or optional, and any global arguments that are to be passed to
the plugin for runtime configuration. Comments are preceded with '#'
and extend to the end of the line. If the configuration file is missing
or empty, it will simply be ignored.

The format of each non-comment line in the configuration file is:

    required/optional   plugin   arguments

For example:

    optional /usr/lib/slurm/test.so

This tells slurmd to load the plugin test.so, passing no arguments. If
a SPANK plugin is required, then failure of any of the plugin's
functions will cause slurmd to terminate the job, while optional
plugins only cause a warning.
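The line format above can be illustrated with a small parser. This is
a minimal sketch in plain C, not Slurm's own parser: struct demo_entry
and demo_parse_line are hypothetical names, the 512-byte limits are
arbitrary, and the include keyword is deliberately not handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of one parsed plugstack.conf line. */
struct demo_entry {
    int required;       /* 1 for "required", 0 for "optional" */
    char path[512];     /* plugin path or name */
    char args[512];     /* remaining arguments, possibly empty */
};

/*
 * Parse one line of the form "required/optional plugin arguments".
 * Returns 1 if an entry was parsed, 0 for a blank or comment-only
 * line, and -1 if the line is malformed or too long.
 */
static int demo_parse_line(const char *line, struct demo_entry *e)
{
    char buf[512];
    char *p, *kind, *path;
    size_t len;

    if (strlen(line) >= sizeof(buf))
        return -1;
    strcpy(buf, line);

    /* Drop a trailing comment and newline, then trailing blanks. */
    buf[strcspn(buf, "#\n")] = '\0';
    len = strlen(buf);
    while (len > 0 && (buf[len - 1] == ' ' || buf[len - 1] == '\t'))
        buf[--len] = '\0';

    p = buf + strspn(buf, " \t");
    if (*p == '\0')
        return 0;

    /* First word: required or optional. */
    kind = p;
    p += strcspn(p, " \t");
    if (*p == '\0')
        return -1;          /* keyword without a plugin */
    *p++ = '\0';

    /* Second word: the plugin. */
    p += strspn(p, " \t");
    path = p;
    p += strcspn(p, " \t");
    if (*p != '\0') {
        *p++ = '\0';
        p += strspn(p, " \t");
    }

    if (strcmp(kind, "required") == 0)
        e->required = 1;
    else if (strcmp(kind, "optional") == 0)
        e->required = 0;
    else
        return -1;

    /* Whatever is left is the argument string. */
    snprintf(e->path, sizeof(e->path), "%s", path);
    snprintf(e->args, sizeof(e->args), "%s", p);
    return 1;
}
```

For the example line above, this sketch would report an optional entry
with path /usr/lib/slurm/test.so and an empty argument string.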

If a fully-qualified path is not specified for a plugin, then the
currently configured PluginDir in slurm.conf(5) is searched.

SPANK plugins are stackable, meaning that more than one plugin may be
placed into the config file. The plugins will simply be called in
order, one after the other, and appropriate action taken on failure
given the state of the plugin's optional flag.

Additional config files or directories of config files may be included
in plugstack.conf with the include keyword. The include keyword must
appear on its own line, and takes a glob as its parameter, so multiple
files may be included from one include line. For example, the following
syntax will load all config files in the /etc/slurm/plugstack.conf.d
directory, in local collation order:

    include /etc/slurm/plugstack.conf.d/*

This may be a more flexible method for building up a SPANK plugin
stack.

The SPANK config file is re-read on each job launch, so editing the
config file will not affect running jobs. However, care should be taken
so that a partially edited config file is not read by a launching job.
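One common way to avoid exposing a half-written file is to write the
new contents to a temporary file and then rename(2) it over the
original, since POSIX rename replaces the target atomically when both
names are on the same filesystem. The following is a sketch of that
general technique, not something this manual prescribes;
demo_replace_file and the ".tmp" suffix are hypothetical choices.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: replace the file at path with text by writing
 * "<path>.tmp" and renaming it into place. Returns 0 on success and
 * -1 on failure, in which case the original file is left untouched.
 */
static int demo_replace_file(const char *path, const char *text)
{
    char tmp[4096];
    FILE *fp;
    int ok;

    /* sizeof(".tmp") includes the terminating NUL byte. */
    if (strlen(path) + sizeof(".tmp") > sizeof(tmp))
        return -1;
    snprintf(tmp, sizeof(tmp), "%s.tmp", path);

    fp = fopen(tmp, "w");
    if (fp == NULL)
        return -1;

    ok = (fputs(text, fp) != EOF);
    if (fclose(fp) == EOF)
        ok = 0;

    /* Readers see either the old file or the complete new one. */
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

If the file lives in a directory that is pulled in with a glob such as
the include example above, a temporary file created next to it could
itself match the glob, so in that case the temporary file would need a
location outside the included directory but on the same filesystem.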

EXAMPLES
/etc/slurm/plugstack.conf:
This example plugstack.conf file shows a configuration that activates
the renice.so SPANK plugin.

    #
    # SPANK config file
    #
    # required?   plugin                       parameters
    #
    optional      /usr/lib/SPANK_renice.so     min_prio=-10

/usr/local/src/renice.c:
A sample SPANK plugin to modify the nice value of job tasks. This
plugin adds a --renice=[prio] option to srun which users can use to set
the priority of all remote tasks. Priority may also be specified via a
SLURM_RENICE environment variable. A minimum priority may be
established via a "min_prio" parameter in plugstack.conf.
    #include <sys/types.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <string.h>
    #include <sys/resource.h>

    #include <slurm/spank.h>

    /*
     * All spank plugins must define this macro for the
     * Slurm plugin loader.
     */
    SPANK_PLUGIN(renice, 1);

    #define PRIO_ENV_VAR "SLURM_RENICE"
    #define PRIO_NOT_SET -1

    /*
     * Minimum allowable value for priority. May be
     * set globally via plugin option min_prio=<prio>
     */
    static int min_prio = -20;

    static int prio = PRIO_NOT_SET;

    static int _renice_opt_process(int val, const char *optarg, int remote);
    static int _str2prio(const char *str, int *p2int);

    /*
     * Provide a --renice=[prio] option to srun:
     */
    struct spank_option spank_options[] =
    {
        {
            "renice",
            "[prio]",
            "Re-nice job tasks to priority [prio].",
            2,
            0,
            _renice_opt_process
        },
        SPANK_OPTIONS_TABLE_END
    };

    /*
     * Called from both srun and slurmd.
     */
    int slurm_spank_init(spank_t sp, int ac, char **av)
    {
        int i;

        /* Don't do anything in sbatch/salloc */
        if (spank_context () == S_CTX_ALLOCATOR)
            return ESPANK_SUCCESS;

        /* Process arguments configured in plugstack.conf */
        for (i = 0; i < ac; i++) {
            if (!strncmp("min_prio=", av[i], 9)) {
                const char *optarg = av[i] + 9;

                if (_str2prio(optarg, &min_prio))
                    slurm_error ("Ignoring invalid min_prio value: %s", av[i]);
            } else {
                slurm_error ("renice: Invalid option: %s", av[i]);
            }
        }

        if (!spank_remote(sp))
            slurm_verbose("renice: min_prio = %d", min_prio);

        return ESPANK_SUCCESS;
    }

    int slurm_spank_task_post_fork(spank_t sp, int ac, char **av)
    {
        int rc;
        pid_t pid;
        uint32_t taskid;    /* S_TASK_GLOBAL_ID expects a (uint32_t *) */

        if (prio == PRIO_NOT_SET) {
            /* See if SLURM_RENICE env var is set by user */
            char val[1024];

            rc = spank_getenv(sp, PRIO_ENV_VAR, val, sizeof(val));

            /* No value from the environment either: nothing to do */
            if (rc)
                return ESPANK_SUCCESS;

            rc = _str2prio(val, &prio);

            if (rc) {
                slurm_error("Bad value for %s: %s", PRIO_ENV_VAR, val);
                return rc;
            }

            if (prio < min_prio) {
                slurm_error("%s=%d not allowed, using min=%d",
                            PRIO_ENV_VAR, prio, min_prio);
            }
        }

        /* Clamp to the configured minimum */
        if (prio < min_prio)
            prio = min_prio;

        spank_get_item(sp, S_TASK_GLOBAL_ID, &taskid);
        spank_get_item(sp, S_TASK_PID, &pid);

        slurm_info("re-nicing task%u pid %d to %d", taskid, (int) pid, prio);

        if (setpriority(PRIO_PROCESS, (int) pid, (int) prio)) {
            slurm_error("setpriority: %m");
            return -ESPANK_ERROR;
        }

        return ESPANK_SUCCESS;
    }

    /*
     * Convert str to an integer priority in [-20, 20] and store it in
     * *p2int. Returns ESPANK_SUCCESS or a negative error value.
     */
    static int _str2prio(const char *str, int *p2int)
    {
        long l;
        char *p = NULL;

        if (!str || str[0] == '\0')
            return -ESPANK_BAD_ARG;

        l = strtol(str, &p, 10);

        if (!p || (*p != '\0'))
            return -ESPANK_BAD_ARG;

        if ((l < -20) || (l > 20)) {
            slurm_error("Specify value between -20 and 20");
            return -ESPANK_BAD_ARG;
        }

        *p2int = (int) l;

        return ESPANK_SUCCESS;
    }

    /*
     * Option callback for --renice: validate and record the priority.
     */
    static int _renice_opt_process(int val, const char *optarg, int remote)
    {
        int rc;

        if (optarg == NULL) {
            slurm_error("renice: invalid NULL argument!");
            return -ESPANK_BAD_ARG;
        }

        if ((rc = _str2prio(optarg, &prio))) {
            slurm_error("Bad value for --renice: %s", optarg);
            return rc;
        }

        if (prio < min_prio) {
            slurm_error("--renice=%d not allowed, will use min=%d",
                        prio, min_prio);
        }

        return ESPANK_SUCCESS;
    }

Compile command:

    # gcc -ggdb3 -I${SLURM_PATH}/include/ -fPIC -shared -o /usr/lib/SPANK_renice.so /usr/local/src/renice.c

COPYING
Portions copyright (C) 2010-2018 SchedMD LLC. Copyright (C) 2006 The
Regents of the University of California. Produced at Lawrence Livermore
National Laboratory (cf, DISCLAIMER). CODE-OCEC-09-009. All rights
reserved.

This file is part of Slurm, a resource management program. For details,
see <https://slurm.schedmd.com/>.

Slurm is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your
option) any later version.

Slurm is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

FILES
/etc/slurm/slurm.conf - Slurm configuration file.
/etc/slurm/plugstack.conf - SPANK configuration file.
/usr/include/slurm/spank.h - SPANK header file.

SEE ALSO
srun(1), slurm.conf(5)

April 2021                   Slurm Component                     SPANK(8)