SPANK(8)                        Slurm Component                       SPANK(8)

NAME
       SPANK - Slurm Plug-in Architecture for Node and job (K)control

DESCRIPTION
       This manual briefly describes the capabilities of the Slurm Plug-in
       Architecture for Node and job Kontrol (SPANK), as well as the SPANK
       configuration file (by default, plugstack.conf).

       SPANK provides a generic interface for stackable plug-ins that can
       dynamically modify the job launch code in Slurm. SPANK plugins may be
       built without access to the Slurm source code. They need only be
       compiled against Slurm's spank.h header file and added to the SPANK
       config file plugstack.conf; they are then loaded at runtime during
       the next job launch. Thus, the SPANK infrastructure gives
       administrators and other developers a low-cost, low-effort way to
       modify the runtime behavior of Slurm job launch.

       Note: SPANK plugins using the Slurm APIs need to be recompiled when
       upgrading Slurm to a new major release.

SPANK PLUGINS
       SPANK plugins are loaded in up to five separate contexts during a
       Slurm job. Briefly, the five contexts are:

       local  In local context, the plugin is loaded by srun (i.e. the
              "local" part of a parallel job).

       remote In remote context, the plugin is loaded by slurmstepd (i.e.
              the "remote" part of a parallel job).

       allocator
              In allocator context, the plugin is loaded in one of the job
              allocation utilities sbatch or salloc.

       slurmd In slurmd context, the plugin is loaded in the slurmd daemon
              itself. Note: Plugins loaded in slurmd context persist for
              the entire time slurmd is running, so if configuration is
              changed or plugins are updated, slurmd must be restarted for
              the changes to take effect.

       job_script
              In the job_script context, plugins are loaded in the context
              of the job prolog or epilog. Note: Plugins are loaded in
              job_script context on each run of the job prolog or epilog,
              in a separate address space from plugins in slurmd context.
              This means there is no state shared between this context and
              other contexts, or even between one call to
              slurm_spank_job_prolog or slurm_spank_job_epilog and
              subsequent calls.

       In local context, only the init, exit, init_post_opt, and
       local_user_init functions are called. In allocator context, only the
       init, exit, and init_post_opt functions are called. Similarly, in
       slurmd context, only the slurmd_init and slurmd_exit callbacks are
       active, and in the job_script context, only the job_prolog and
       job_epilog callbacks are used. Plugins may query the context in which
       they are running with the spank_context and spank_remote functions
       defined in <slurm/spank.h>.

       SPANK plugins may be called from multiple points during the Slurm job
       launch. A plugin may define the following functions:

       slurm_spank_init
           Called just after plugins are loaded. In remote context, this is
           just after the job step is initialized. This function is called
           before any plugin option processing. This function is not called
           in slurmd context.

       slurm_spank_slurmd_init
           Called in slurmd just after the daemon is started.

       slurm_spank_job_prolog
           Called at the same time as the job prolog. If this function
           returns a negative value and the SPANK plugin that contains it is
           required in the plugstack.conf, the node that this is run on will
           be drained.

       slurm_spank_init_post_opt
           Called at the same point as slurm_spank_init, but after all user
           options to the plugin have been processed. The init and
           init_post_opt callbacks are separated so that plugins can process
           system-wide options specified in plugstack.conf in the init
           callback, then process user options, and finally take some action
           in slurm_spank_init_post_opt if necessary. In the case of a
           heterogeneous job, slurm_spank_init is invoked once per job
           component.

       slurm_spank_local_user_init
           Called in local (srun) context only, after all options have been
           processed. This is called after the job ID and step IDs are
           available. This happens in srun after the allocation is made, but
           before tasks are launched.

       slurm_spank_user_init
           Called after privileges are temporarily dropped. (remote context
           only)

       slurm_spank_task_init_privileged
           Called for each task just after fork, but before all elevated
           privileges are dropped. (remote context only)

       slurm_spank_task_init
           Called for each task just before execve(2). (remote context only)

       slurm_spank_task_post_fork
           Called for each task from the parent process after fork(2) is
           complete. Because slurmd does not exec any tasks until all tasks
           have completed fork(2), this call is guaranteed to run before the
           user task is executed. (remote context only)

       slurm_spank_task_exit
           Called for each task as its exit status is collected by Slurm.
           (remote context only)

       slurm_spank_exit
           Called once just before slurmstepd exits in remote context. In
           local context, called before srun exits.

       slurm_spank_job_epilog
           Called at the same time as the job epilog. If this function
           returns a negative value and the SPANK plugin that contains it is
           required in the plugstack.conf, the node that this is run on will
           be drained.

       slurm_spank_slurmd_exit
           Called in slurmd when the daemon is shut down.
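
The context rules stated above, together with the per-function notes in this list, can be summarised as a small lookup table. The following is only an illustrative sketch: the enum sketch_ctx, the rules table and callback_runs_in are hypothetical names invented here and are not part of <slurm/spank.h>. A real plugin would call spank_context instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical context flags for this sketch only (not the Slurm API). */
enum sketch_ctx {
    CTX_LOCAL      = 1 << 0,  /* srun */
    CTX_REMOTE     = 1 << 1,  /* slurmstepd */
    CTX_ALLOCATOR  = 1 << 2,  /* sbatch, salloc */
    CTX_SLURMD     = 1 << 3,  /* slurmd daemon */
    CTX_JOB_SCRIPT = 1 << 4   /* job prolog or epilog */
};

struct callback_rule {
    const char *name;      /* callback symbol */
    unsigned    contexts;  /* OR of sketch_ctx flags where it is called */
};

/* Transcribed from the prose of this section. */
static const struct callback_rule rules[] = {
    { "slurm_spank_init",          CTX_LOCAL | CTX_REMOTE | CTX_ALLOCATOR },
    { "slurm_spank_init_post_opt", CTX_LOCAL | CTX_REMOTE | CTX_ALLOCATOR },
    { "slurm_spank_exit",          CTX_LOCAL | CTX_REMOTE | CTX_ALLOCATOR },
    { "slurm_spank_local_user_init",      CTX_LOCAL },
    { "slurm_spank_user_init",            CTX_REMOTE },
    { "slurm_spank_task_init_privileged", CTX_REMOTE },
    { "slurm_spank_task_init",            CTX_REMOTE },
    { "slurm_spank_task_post_fork",       CTX_REMOTE },
    { "slurm_spank_task_exit",            CTX_REMOTE },
    { "slurm_spank_slurmd_init",          CTX_SLURMD },
    { "slurm_spank_slurmd_exit",          CTX_SLURMD },
    { "slurm_spank_job_prolog",           CTX_JOB_SCRIPT },
    { "slurm_spank_job_epilog",           CTX_JOB_SCRIPT },
};

/* Returns 1 if the callback is invoked in ctx, 0 if not, -1 if unknown. */
int callback_runs_in(const char *name, unsigned ctx)
{
    size_t i;

    for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        if (strcmp(rules[i].name, name) == 0)
            return (rules[i].contexts & ctx) != 0;
    }
    return -1;
}
```

For instance, callback_runs_in("slurm_spank_task_init", CTX_LOCAL) yields 0, which matches the "(remote context only)" note on that callback.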

       All of these functions have the same prototype, for example:

          int slurm_spank_init (spank_t spank, int ac, char *argv[])

       Where spank is the SPANK handle which must be passed back to Slurm
       when the plugin calls functions like spank_get_item and spank_getenv.
       Configured arguments (See CONFIGURATION below) are passed in the
       argument vector argv with argument count ac.
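
Because every callback receives the configured arguments as a plain ac/argv pair, a plugin usually needs a small helper to pick out the settings it understands. The sketch below assumes a "key=value" convention, as used by the min_prio argument in EXAMPLES; that convention is a plugin's own choice, not something SPANK enforces, and plugin_arg_value is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the value part of the first "key=value" entry in argv whose key
 * matches exactly, or NULL if no such entry exists. The returned pointer
 * aliases the argv storage and must not be freed.
 */
const char *plugin_arg_value(int ac, char *argv[], const char *key)
{
    size_t klen = strlen(key);
    int i;

    for (i = 0; i < ac; i++) {
        /* Require '=' right after the key so "min" does not match "min_prio". */
        if (strncmp(argv[i], key, klen) == 0 && argv[i][klen] == '=')
            return argv[i] + klen + 1;
    }
    return NULL;
}
```

Inside slurm_spank_init a plugin could then call plugin_arg_value(ac, argv, "min_prio") and validate the returned string before using it.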

       SPANK plugins can query the current list of supported slurm_spank
       symbols to determine if the current version supports a given plugin
       hook. This may be useful because the list of plugin symbols may grow
       in the future. The query is done using the spank_symbol_supported
       function, which has the following prototype:

           int spank_symbol_supported (const char *sym);

       The return value is 1 if the symbol is supported, 0 if not.

       SPANK plugins do not have direct access to internally defined Slurm
       data structures. Instead, information about the currently executing
       job is obtained via the spank_get_item function call.

         spank_err_t spank_get_item (spank_t spank, spank_item_t item, ...);

       The spank_get_item call must be passed the current SPANK handle as
       well as the item requested, which is defined by the passed
       spank_item_t. A variable number of pointer arguments are also passed,
       depending on which item was requested by the plugin. A list of the
       valid values for item is kept in the spank.h header file. Some
       examples are:

       S_JOB_UID
           User id for running job. (uid_t *) is third arg of
           spank_get_item.

       S_JOB_STEPID
           Job step id for running job. (uint32_t *) is third arg of
           spank_get_item.

       S_TASK_EXIT_STATUS
           Exit status for exited task. Only valid from
           slurm_spank_task_exit. (int *) is third arg of spank_get_item.

       S_JOB_ARGV
           Complete job command line. Third and fourth args to
           spank_get_item are (int *, char ***).

       See spank.h for more details, and EXAMPLES below for an example of
       spank_get_item usage.

       SPANK functions in the local and allocator environment should use the
       getenv, setenv, and unsetenv functions to view and modify the job's
       environment. SPANK functions in the remote environment should use the
       spank_getenv, spank_setenv, and spank_unsetenv functions to view and
       modify the job's environment. spank_getenv searches the job's
       environment for the environment variable var and copies the current
       value into a buffer buf of length len. spank_setenv allows a SPANK
       plugin to set or overwrite a variable in the job's environment, and
       spank_unsetenv unsets an environment variable in the job's
       environment. The prototypes are:

        spank_err_t spank_getenv (spank_t spank, const char *var,
                                  char *buf, int len);
        spank_err_t spank_setenv (spank_t spank, const char *var,
                                  const char *val, int overwrite);
        spank_err_t spank_unsetenv (spank_t spank, const char *var);

       These are only necessary in remote context, since the standard
       process environment functions setenv(3), getenv(3), and unsetenv(3)
       may be used in local context.

       Functions are also available from within the SPANK plugins to
       establish environment variables to be exported to the Slurm
       PrologSlurmctld, Prolog, Epilog and EpilogSlurmctld programs (the
       so-called job control environment). The names of environment
       variables established by these calls will be prepended with the
       string SPANK_ in order to avoid any security implications of
       arbitrary environment variable control. (After all, the job control
       scripts do run as root or the Slurm user.)

       These functions are available from local context only.

         spank_err_t spank_job_control_getenv(spank_t spank, const char *var,
                                  char *buf, int len);
         spank_err_t spank_job_control_setenv(spank_t spank, const char *var,
                                  const char *val, int overwrite);
         spank_err_t spank_job_control_unsetenv(spank_t spank, const char *var);

       See spank.h for more information, and EXAMPLES below for an example
       of spank_getenv usage.
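
The SPANK_ prefix rule means that a prolog or epilog script must look for a different name than the one the plugin passed to spank_job_control_setenv. The following sketch only models that naming rule so the mapping is easy to see; job_control_env_name and JOB_CONTROL_PREFIX are hypothetical names, and the code does not call into Slurm.

```c
#include <stddef.h>
#include <stdio.h>

/* Prefix described in the text above for job control variables. */
#define JOB_CONTROL_PREFIX "SPANK_"

/*
 * Write into buf the name under which a variable set through the job
 * control interface would be visible to the prolog/epilog programs.
 * Returns 0 on success, -1 if buf is too small.
 */
int job_control_env_name(const char *var, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s%s", JOB_CONTROL_PREFIX, var);

    if (n < 0 || (size_t) n >= len)
        return -1;
    return 0;
}
```

Under this rule, a plugin that sets a variable called RENICE would be read by a job control script as SPANK_RENICE.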

       Many of the described SPANK functions available to plugins return
       errors via the spank_err_t error type. On success, the return value
       will be set to ESPANK_SUCCESS, while on failure, the return value
       will be set to one of many error values defined in slurm/spank.h. The
       SPANK interface provides a simple function

         const char * spank_strerror(spank_err_t err);

       which may be used to translate a spank_err_t value into its string
       representation.


SPANK OPTIONS
       SPANK plugins also have an interface through which they may define
       and implement extra job options. These options are made available to
       the user through Slurm commands such as srun(1), salloc(1), and
       sbatch(1). If the option is specified by the user, its value is
       forwarded and registered with the plugin in slurmd when the job is
       run. In this way, SPANK plugins may dynamically provide new options
       and functionality to Slurm.

       Each option registered by a plugin to Slurm takes the form of a
       struct spank_option which is declared in <slurm/spank.h> as

          struct spank_option {
             char *         name;
             char *         arginfo;
             char *         usage;
             int            has_arg;
             int            val;
             spank_opt_cb_f cb;
          };

       Where

       name   is the name of the option. Its length is limited to
              SPANK_OPTION_MAXLEN defined in <slurm/spank.h>.

       arginfo
              is a description of the argument to the option, if the option
              does take an argument.

       usage  is a short description of the option suitable for --help
              output.

       has_arg
              0 if option takes no argument, 1 if option takes an argument,
              and 2 if the option takes an optional argument. (See
              getopt_long(3)).

       val    A plugin-local value to return to the option callback
              function.

       cb     A callback function that is invoked when the plugin option is
              registered with Slurm. spank_opt_cb_f is typedef'd in
              <slurm/spank.h> as

                typedef int (*spank_opt_cb_f) (int val, const char *optarg,
                                               int remote);

              Where val is the value of the val field in the spank_option
              struct, optarg is the supplied argument if applicable, and
              remote is 0 if the function is being called from the "local"
              host (e.g. the host where srun or sbatch/salloc is invoked)
              or 1 if it is being called from the "remote" host (the host
              where slurmd/slurmstepd run). On the remote host the callback
              is executed only by slurmstepd (remote context), and only if
              the option was registered in that context.

       Plugin options may be registered with Slurm using the
       spank_option_register function. This function is only valid when
       called from the plugin's slurm_spank_init handler, and registers one
       option at a time. The prototype is

          spank_err_t spank_option_register (spank_t sp,
                          struct spank_option *opt);

       This function will return ESPANK_SUCCESS on successful registration
       of an option, or ESPANK_BAD_ARG for errors including invalid spank_t
       handle, or when the function is not called from the slurm_spank_init
       function. All options need to be registered from all contexts in
       which they will be used. For instance, if an option is only used in
       local (srun) and remote (slurmd) contexts, then spank_option_register
       should only be called from within those contexts. For example:

          if (spank_context() != S_CTX_ALLOCATOR)
             spank_option_register (sp, opt);

       If, however, the option is used in all contexts, the
       spank_option_register needs to be called everywhere.

       In addition to spank_option_register, plugins may also export options
       to Slurm by defining a table of struct spank_option with the symbol
       name spank_options. This method, however, is not supported for use
       with sbatch and salloc (allocator context), thus the use of
       spank_option_register is preferred. When using the spank_options
       table, the final element in the array must be filled with zeros. A
       SPANK_OPTIONS_TABLE_END macro is provided in <slurm/spank.h> for this
       purpose.
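
The zero-filled final element is what lets a consumer walk a spank_options table without knowing its length. The sketch below illustrates that convention only. To compile without Slurm installed it repeats the struct spank_option and spank_opt_cb_f declarations quoted earlier in this section, and OPTIONS_TABLE_END is a local stand-in assumed to be equivalent to SPANK_OPTIONS_TABLE_END. The second table entry and count_options are hypothetical.

```c
#include <stddef.h>

/* Local copies of the declarations quoted in this manual; a real plugin
 * would include <slurm/spank.h> instead of redeclaring them. */
typedef int (*spank_opt_cb_f)(int val, const char *optarg, int remote);

struct spank_option {
    char          *name;
    char          *arginfo;
    char          *usage;
    int            has_arg;
    int            val;
    spank_opt_cb_f cb;
};

/* Stand-in for SPANK_OPTIONS_TABLE_END: an entry filled with zeros. */
#define OPTIONS_TABLE_END { NULL, NULL, NULL, 0, 0, NULL }

/* Example table; the second option is invented for illustration. */
static struct spank_option example_options[] = {
    { "renice", "[prio]", "Re-nice job tasks to priority [prio].", 2, 0, NULL },
    { "renice-quiet", NULL, "Hypothetical flag with no argument.", 0, 1, NULL },
    OPTIONS_TABLE_END
};

/* Count entries by walking until the zero-filled terminator is reached. */
size_t count_options(const struct spank_option *table)
{
    size_t n = 0;

    while (table[n].name != NULL)
        n++;
    return n;
}
```

Omitting the terminator would let such a loop read past the end of the array, which is why the table form requires it.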

       When an option is provided by the user on the local side, either by
       command line options or by environment variables, Slurm will
       immediately invoke the option's callback with remote=0. This is meant
       for the plugin to do local sanity checking of the option before the
       value is sent to the remote side during job launch. If the argument
       the user specified is invalid, the plugin should issue an error and
       return a non-zero code from the callback. The plugin should be able
       to handle cases where the spank option is set multiple times through
       environment variables and command line options. Environment variables
       are processed before command line options.
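
A callback that follows the rules above validates its argument, returns non-zero on bad input, and tolerates being invoked more than once. The following is a minimal sketch with the same signature as spank_opt_cb_f; the names prio_opt_cb, requested_prio and prio_is_set, and the accepted range of -20 to 20 (taken from the renice example in EXAMPLES), are assumptions for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Plugin-global state filled in by the option callback (hypothetical). */
static int requested_prio = 0;
static int prio_is_set = 0;

/*
 * Option callback with the spank_opt_cb_f signature. Returns 0 when the
 * argument is a valid integer in [-20, 20], non-zero otherwise. A later
 * valid call overwrites an earlier one, so a command line option replaces
 * a value that came from an environment variable.
 */
int prio_opt_cb(int val, const char *optarg, int remote)
{
    char *end;
    long l;

    (void) val;     /* single option, so val is not needed to dispatch */
    (void) remote;  /* same validation on the local and remote side */

    if (optarg == NULL || *optarg == '\0')
        return -1;

    errno = 0;
    l = strtol(optarg, &end, 10);
    if (errno != 0 || *end != '\0' || l < -20 || l > 20)
        return -1;  /* leave any previously accepted value untouched */

    requested_prio = (int) l;
    prio_is_set = 1;
    return 0;
}
```

Rejecting bad input without touching the stored value keeps an earlier valid setting intact when a later one is malformed.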

       On the remote side, options and their arguments are registered just
       after SPANK plugins are loaded and before the slurm_spank_init
       handler is called. This allows plugins to modify behavior of all
       plugin functionality based on the value of user-provided options.
       (See EXAMPLES below for a plugin that registers an option with
       Slurm).

       As an alternative to use of an option callback and global variable,
       plugins can use the spank_option_getopt function to check for
       supplied options after option processing. This function has the
       prototype:

          spank_err_t spank_option_getopt(spank_t sp,
              struct spank_option *opt, char **optargp);

       This function returns ESPANK_SUCCESS if the option defined in the
       struct spank_option opt has been used by the user. If optargp is
       non-NULL then it is set to any option argument passed (if the option
       takes an argument). The use of this method is required to process
       options in job_script context (slurm_spank_job_prolog and
       slurm_spank_job_epilog). This function is valid in the following
       callbacks: slurm_spank_job_prolog, slurm_spank_local_user_init,
       slurm_spank_user_init, slurm_spank_task_init_privileged,
       slurm_spank_task_init, slurm_spank_task_exit, and
       slurm_spank_job_epilog.


CONFIGURATION
       The default SPANK plug-in stack configuration file is plugstack.conf
       in the same directory as slurm.conf(5), though this may be changed
       via the Slurm config parameter PlugStackConfig. Normally the
       plugstack.conf file should be identical on all nodes of the cluster.
       The config file lists SPANK plugins, one per line, along with whether
       the plugin is required or optional, and any global arguments that are
       to be passed to the plugin for runtime configuration. Comments are
       preceded with '#' and extend to the end of the line. If the
       configuration file is missing or empty, it will simply be ignored.

       The format of each non-comment line in the configuration file is:

         required/optional   plugin   arguments

       For example:

         optional /usr/lib/slurm/test.so

       Tells slurmd to load the plugin test.so passing no arguments. If a
       SPANK plugin is required, then failure of any of the plugin's
       functions will cause slurmd to terminate the job, while optional
       plugins only cause a warning.

       If a fully-qualified path is not specified for a plugin, then the
       currently configured PluginDir in slurm.conf(5) is searched.

       SPANK plugins are stackable, meaning that more than one plugin may be
       placed into the config file. The plugins will simply be called in
       order, one after the other, and appropriate action taken on failure
       given the state of the plugin's optional flag.

       Additional config files or directories of config files may be
       included in plugstack.conf with the include keyword. The include
       keyword must appear on its own line, and takes a glob as its
       parameter, so multiple files may be included from one include line.
       For example, the following syntax will load all config files in the
       /etc/slurm/plugstack.conf.d directory, in local collation order:

         include /etc/slurm/plugstack.conf.d/*

       which might be considered a more flexible method for building up a
       spank plugin stack.
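
The line format described in this section is simple enough to model with a small tokenizer. The sketch below is not Slurm's parser. It is a simplified illustration that splits on whitespace, drops everything after '#', and recognises the required, optional and include keywords. All type and function names are hypothetical, and quoting, PluginDir lookup and glob expansion are deliberately left out.

```c
#include <stddef.h>
#include <string.h>

enum line_kind { LINE_EMPTY, LINE_PLUGIN, LINE_INCLUDE, LINE_INVALID };

#define MAX_PLUGIN_ARGS 16

struct stack_line {
    enum line_kind kind;
    int   required;               /* 1 for "required", 0 for "optional" */
    char *path;                   /* plugin path, or glob for "include" */
    int   ac;                     /* number of plugin arguments */
    char *argv[MAX_PLUGIN_ARGS];  /* arguments, pointing into the line */
};

/* Return the next whitespace-delimited token, or NULL at end of line. */
static char *next_token(char **cursor)
{
    char *s = *cursor + strspn(*cursor, " \t\r\n");
    char *end;

    if (*s == '\0') {
        *cursor = s;
        return NULL;
    }
    end = s + strcspn(s, " \t\r\n");
    if (*end != '\0')
        *end++ = '\0';
    *cursor = end;
    return s;
}

/* Parse one config line in place; the line buffer is modified. */
enum line_kind parse_stack_line(char *line, struct stack_line *out)
{
    char *cursor = line;
    char *tok;
    char *hash = strchr(line, '#');

    memset(out, 0, sizeof(*out));
    if (hash != NULL)
        *hash = '\0';             /* comments extend to end of line */

    tok = next_token(&cursor);
    if (tok == NULL)
        return (out->kind = LINE_EMPTY);

    if (strcmp(tok, "include") == 0) {
        out->path = next_token(&cursor);
        return (out->kind = out->path ? LINE_INCLUDE : LINE_INVALID);
    }

    if (strcmp(tok, "required") == 0)
        out->required = 1;
    else if (strcmp(tok, "optional") != 0)
        return (out->kind = LINE_INVALID);

    out->path = next_token(&cursor);
    if (out->path == NULL)
        return (out->kind = LINE_INVALID);

    while (out->ac < MAX_PLUGIN_ARGS && (tok = next_token(&cursor)) != NULL)
        out->argv[out->ac++] = tok;

    return (out->kind = LINE_PLUGIN);
}
```

Applied to the line "optional renice.so min_prio=-10", the parser reports an optional plugin named renice.so with one argument, which is the ac/argv pair the plugin callbacks would later receive.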

       The SPANK config file is re-read on each job launch, so editing the
       config file will not affect running jobs. However, care should be
       taken so that a partially edited config file is not read by a
       launching job.
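
One common way to avoid exposing a half-written file, not prescribed by this manual, is to write the new contents to a temporary file in the same directory and then rename(2) it over the old one; on POSIX systems a rename within one filesystem replaces the target atomically, so a reader sees either the old file or the new one. The sketch below assumes that approach; replace_file_atomically and the ".tmp" suffix are hypothetical choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Replace the file at path with contents without ever exposing a
 * partially written file. Returns 0 on success, -1 on failure.
 */
int replace_file_atomically(const char *path, const char *contents)
{
    char tmp[4096];
    size_t len = strlen(contents);
    FILE *fp;
    int n;

    /* Temporary file lives next to the target, so it is on the same
     * filesystem and rename() can replace the target atomically. */
    n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t) n >= sizeof(tmp))
        return -1;

    fp = fopen(tmp, "w");
    if (fp == NULL)
        return -1;

    /* Write everything and push it to disk before the file becomes visible. */
    if (fwrite(contents, 1, len, fp) != len ||
        fflush(fp) != 0 ||
        fsync(fileno(fp)) != 0) {
        fclose(fp);
        remove(tmp);
        return -1;
    }
    if (fclose(fp) != 0) {
        remove(tmp);
        return -1;
    }

    /* Atomically swap the complete new file into place. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

An administrator's deployment tool could call this once per node with the full new plugstack.conf text, instead of editing the live file in place.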


EXAMPLES
       Simple SPANK config file:

       #
       # SPANK config file
       #
       # required?       plugin                     args
       #
       optional          renice.so                  min_prio=-10
       required          /usr/lib/slurm/test.so

       The following is a simple SPANK plugin to modify the nice value of
       job tasks. This plugin adds a --renice=[prio] option to srun which
       users can use to set the priority of all remote tasks. Priority may
       also be specified via a SLURM_RENICE environment variable. A minimum
       priority may be established via a "min_prio" parameter in
       plugstack.conf (See above for example).

       /*
        *   To compile:
        *    gcc -fPIC -shared -o renice.so renice.c
        *
        */
       #include <sys/types.h>
       #include <stdint.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <unistd.h>
       #include <string.h>
       #include <sys/resource.h>

       #include <slurm/spank.h>

       /*
        * All spank plugins must define this macro for the
        * Slurm plugin loader.
        */
       SPANK_PLUGIN(renice, 1);

       #define PRIO_ENV_VAR "SLURM_RENICE"
       #define PRIO_NOT_SET 42

       /*
        * Minimum allowable value for priority. May be
        * set globally via plugin option min_prio=<prio>
        */
       static int min_prio = -20;

       static int prio = PRIO_NOT_SET;

       static int _renice_opt_process (int val,
                                       const char *optarg,
                                       int remote);
       static int _str2prio (const char *str, int *p2int);

       /*
        *  Provide a --renice=[prio] option to srun:
        */
       struct spank_option spank_options[] =
       {
           { "renice", "[prio]",
             "Re-nice job tasks to priority [prio].", 2, 0,
             (spank_opt_cb_f) _renice_opt_process
           },
           SPANK_OPTIONS_TABLE_END
       };

       /*
        *  Called from both srun and slurmd.
        */
       int slurm_spank_init (spank_t sp, int ac, char **av)
       {
           int i;

           /* Don't do anything in sbatch/salloc */
           if (spank_context () == S_CTX_ALLOCATOR)
               return (0);

           /* Parse plugin arguments configured in plugstack.conf */
           for (i = 0; i < ac; i++) {
               if (strncmp ("min_prio=", av[i], 9) == 0) {
                   const char *optarg = av[i] + 9;
                   if (_str2prio (optarg, &min_prio) < 0)
                       slurm_error ("Ignoring invalid min_prio value: %s",
                                    av[i]);
               } else {
                   slurm_error ("renice: Invalid option: %s", av[i]);
               }
           }

           if (!spank_remote (sp))
               slurm_verbose ("renice: min_prio = %d", min_prio);

           return (0);
       }


       int slurm_spank_task_post_fork (spank_t sp, int ac, char **av)
       {
           pid_t pid;
           uint32_t taskid;   /* S_TASK_GLOBAL_ID expects a uint32_t * */

           if (prio == PRIO_NOT_SET) {
               /* See if SLURM_RENICE env var is set by user */
               char val [1024];

               if (spank_getenv (sp, PRIO_ENV_VAR, val, sizeof (val))
                   != ESPANK_SUCCESS)
                   return (0);

               if (_str2prio (val, &prio) < 0) {
                   /* Report the offending environment value */
                   slurm_error ("Bad value for %s: %s",
                                PRIO_ENV_VAR, val);
                   return (-1);
               }

               if (prio < min_prio) {
                   slurm_error ("%s=%d not allowed, using min=%d",
                                PRIO_ENV_VAR, prio, min_prio);
               }
           }

           /* Clamp to the configured minimum */
           if (prio < min_prio)
               prio = min_prio;

           spank_get_item (sp, S_TASK_GLOBAL_ID, &taskid);
           spank_get_item (sp, S_TASK_PID, &pid);

           slurm_info ("re-nicing task%u pid %ld to %d",
                       (unsigned int) taskid, (long) pid, prio);

           if (setpriority (PRIO_PROCESS, (int) pid,
                            (int) prio) < 0) {
               slurm_error ("setpriority: %m");
               return (-1);
           }

           return (0);
       }

       /*
        * Convert str to an integer priority in [-20, 20].
        * Returns 0 on success, -1 on empty, malformed or
        * out-of-range input.
        */
       static int _str2prio (const char *str, int *p2int)
       {
           long int l;
           char *p;

           l = strtol (str, &p, 10);
           if ((p == str) || (*p != '\0') || (l < -20) || (l > 20))
               return (-1);

           *p2int = (int) l;

           return (0);
       }

       /* Option callback: validate and store the --renice argument */
       static int _renice_opt_process (int val,
                                       const char *optarg,
                                       int remote)
       {
           if (optarg == NULL) {
               slurm_error ("renice: invalid argument!");
               return (-1);
           }

           if (_str2prio (optarg, &prio) < 0) {
               slurm_error ("Bad value for --renice: %s",
                            optarg);
               return (-1);
           }

           if (prio < min_prio) {
               slurm_error ("--renice=%d not allowed, will use min=%d",
                            prio, min_prio);
           }

           return (0);
       }


COPYING
       Portions copyright (C) 2010-2018 SchedMD LLC. Copyright (C) 2006 The
       Regents of the University of California. Produced at Lawrence
       Livermore National Laboratory (cf, DISCLAIMER). CODE-OCEC-09-009. All
       rights reserved.

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at
       your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.

FILES
       /etc/slurm/slurm.conf - Slurm configuration file.
       /etc/slurm/plugstack.conf - SPANK configuration file.
       /usr/include/slurm/spank.h - SPANK header file.

SEE ALSO
       srun(1), slurm.conf(5)

August 2017                     Slurm Component                       SPANK(8)