SPANK(8)                        Slurm Component                       SPANK(8)

NAME

       SPANK - Slurm Plug-in Architecture for Node and job (K)control

DESCRIPTION

       This manual briefly describes the capabilities of the Slurm Plug-in
       Architecture for Node and job Kontrol (SPANK) as well as the SPANK
       configuration file (by default, plugstack.conf).

       SPANK provides a very generic interface for stackable plug-ins which
       may be used to dynamically modify the job launch code in Slurm. SPANK
       plugins may be built without access to Slurm source code. They need
       only be compiled against Slurm's spank.h header file and added to the
       SPANK config file plugstack.conf; they will then be loaded at runtime
       during the next job launch. Thus, the SPANK infrastructure provides
       administrators and other developers a low-cost, low-effort way to
       dynamically modify the runtime behavior of Slurm job launch.

       NOTE: All SPANK plugins should be recompiled when upgrading Slurm to a
       new major release. The SPANK API is not guaranteed to be ABI compatible
       between major releases. Any SPANK plugin linking to any of the Slurm
       libraries should be carefully checked as the Slurm APIs and headers can
       change between major releases.

SPANK PLUGINS

       SPANK plugins are loaded in up to five separate contexts during a Slurm
       job. Briefly, the five contexts are:

       local   In local context, the plugin is loaded by srun (i.e. the
               "local" part of a parallel job).

       remote  In remote context, the plugin is loaded by slurmstepd (i.e.
               the "remote" part of a parallel job).

       allocator
               In allocator context, the plugin is loaded in one of the job
               allocation utilities salloc, sbatch or scrontab.

       slurmd  In slurmd context, the plugin is loaded in the slurmd daemon
               itself. Note: Plugins loaded in slurmd context persist for the
               entire time slurmd is running, so if configuration is changed
               or plugins are updated, slurmd must be restarted for the
               changes to take effect.

       job_script
               In the job_script context, plugins are loaded in the context of
               the job prolog or epilog. Note: Plugins are loaded in
               job_script context on each run of the job prolog or epilog, in
               a separate address space from plugins in slurmd context. This
               means there is no state shared between this context and other
               contexts, or even between one call to slurm_spank_job_prolog or
               slurm_spank_job_epilog and subsequent calls.

       In local context, only the init, exit, init_post_opt, and
       local_user_init functions are called. In allocator context, only the
       init, exit, and init_post_opt functions are called. Similarly, in
       slurmd context, only the init and slurmd_exit callbacks are active, and
       in the job_script context, only the job_prolog and job_epilog callbacks
       are used. Plugins may query the context in which they are running with
       the spank_context and spank_remote functions defined in
       <slurm/spank.h>.

       SPANK plugins may be called from multiple points during the Slurm job
       launch. A plugin may define the following functions:

       slurm_spank_init
         Called just after plugins are loaded. In remote context, this is just
         after the job step is initialized. This function is called before any
         plugin option processing.

       slurm_spank_job_prolog
         Called at the same time as the job prolog. If this function returns a
         non-zero value and the SPANK plugin that contains it is required in
         the plugstack.conf, the node that this is run on will be drained.

       slurm_spank_init_post_opt
         Called at the same point as slurm_spank_init, but after all user
         options to the plugin have been processed. The init and
         init_post_opt callbacks are separate so that plugins can process
         system-wide options specified in plugstack.conf in the init callback,
         then process user options, and finally take some action in
         slurm_spank_init_post_opt if necessary. In the case of a
         heterogeneous job, slurm_spank_init is invoked once per job
         component.

       slurm_spank_local_user_init
         Called in local (srun) context only after all options have been
         processed. This is called after the job ID and step IDs are
         available. This happens in srun after the allocation is made, but
         before tasks are launched.

       slurm_spank_user_init
         Called after privileges are temporarily dropped. (remote context
         only)

       slurm_spank_task_init_privileged
         Called for each task just after fork, but before all elevated
         privileges are dropped. (remote context only)

       slurm_spank_task_init
         Called for each task just before execve (2). If you are restricting
         memory with cgroups, memory allocated here will be in the job's
         cgroup. (remote context only)

       slurm_spank_task_post_fork
         Called for each task from parent process after fork (2) is complete.
         Because slurmd does not exec any tasks until all tasks have
         completed fork (2), this call is guaranteed to run before the user
         task is executed. (remote context only)

       slurm_spank_task_exit
         Called for each task as its exit status is collected by Slurm.
         (remote context only)

       slurm_spank_exit
         Called once just before slurmstepd exits in remote context. In local
         context, called before srun exits.

       slurm_spank_job_epilog
         Called at the same time as the job epilog. If this function returns a
         non-zero value and the SPANK plugin that contains it is required in
         the plugstack.conf, the node that this is run on will be drained.

       slurm_spank_slurmd_exit
         Called in slurmd when the daemon is shut down.

       All of these functions have the same prototype, for example:

          int slurm_spank_init (spank_t spank, int ac, char *argv[])

       Where spank is the SPANK handle which must be passed back to Slurm when
       the plugin calls functions like spank_get_item and spank_getenv.
       Configured arguments (see CONFIGURATION below) are passed in the
       argument vector argv with argument count ac.

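       As a minimal sketch of how a callback might consume the configured
       arguments described above, the helper below scans the argument vector
       for a key=value entry. The name find_plugin_arg and the min_prio key
       are illustrative assumptions, not part of the SPANK API; only standard
       C is used.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the value part of the first "key=value" entry in argv,
 * or NULL if no entry starts with "key=".
 * Hypothetical helper, not part of the SPANK API.
 */
static const char *find_plugin_arg(int ac, char *argv[], const char *key)
{
    size_t klen = strlen(key);

    for (int i = 0; i < ac; i++) {
        /* Require an exact key match followed by '='. */
        if (!strncmp(argv[i], key, klen) && argv[i][klen] == '=')
            return argv[i] + klen + 1;
    }
    return NULL;
}
```

       A plugin's slurm_spank_init could call find_plugin_arg(ac, argv,
       "min_prio") and fall back to a built-in default when the result is
       NULL.
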
       SPANK plugins can query the current list of supported slurm_spank
       symbols to determine if the current version supports a given plugin
       hook. This may be useful because the list of plugin symbols may grow in
       the future. The query is done using the spank_symbol_supported
       function, which has the following prototype:

           int spank_symbol_supported (const char *sym);

       The return value is 1 if the symbol is supported, 0 if not.

       SPANK plugins do not have direct access to internally defined Slurm
       data structures. Instead, information about the currently executing job
       is obtained via the spank_get_item function call.

         spank_err_t spank_get_item (spank_t spank, spank_item_t item, ...);

       The spank_get_item call must be passed the current SPANK handle as well
       as the item requested, which is defined by the passed spank_item_t. A
       variable number of pointer arguments are also passed, depending on
       which item was requested by the plugin. A list of the valid values for
       item is kept in the spank.h header file. Some examples are:

       S_JOB_UID
         User id for running job. (uid_t *) is third arg of spank_get_item.

       S_JOB_STEPID
         Job step id for running job. (uint32_t *) is third arg of
         spank_get_item.

       S_TASK_EXIT_STATUS
         Exit status for exited task. Only valid from slurm_spank_task_exit.
         (int *) is third arg of spank_get_item.

       S_JOB_ARGV
         Complete job command line. Third and fourth args to spank_get_item
         are (int *, char ***).

       See spank.h for more details, and EXAMPLES below for an example of
       spank_get_item usage.

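       The value returned for S_TASK_EXIT_STATUS is an int. The sketch below
       assumes it is encoded like a wait(2) status, which this manual does not
       state explicitly, so check spank.h for your Slurm version before
       relying on it. task_status_code is a hypothetical helper, not a SPANK
       function.

```c
#include <sys/wait.h>

/*
 * Reduce a wait(2)-style status to a single shell-like code:
 * the exit code for a normal exit, 128 + signal number for a
 * task killed by a signal, and -1 for anything else.
 * Assumption: the int from S_TASK_EXIT_STATUS uses this encoding.
 */
static int task_status_code(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

       Under that assumption, a slurm_spank_task_exit callback could pass the
       int it obtained from spank_get_item to this helper before logging it.
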
       SPANK functions in the local and allocator environment should use the
       getenv, setenv, and unsetenv functions to view and modify the job's
       environment. SPANK functions in the remote environment should use the
       spank_getenv, spank_setenv, and spank_unsetenv functions to view and
       modify the job's environment. spank_getenv searches the job's
       environment for the environment variable var and copies the current
       value into a buffer buf of length len. spank_setenv allows a SPANK
       plugin to set or overwrite a variable in the job's environment, and
       spank_unsetenv unsets an environment variable in the job's environment.
       The prototypes are:

        spank_err_t spank_getenv (spank_t spank, const char *var,
                            char *buf, int len);
        spank_err_t spank_setenv (spank_t spank, const char *var,
                            const char *val, int overwrite);
        spank_err_t spank_unsetenv (spank_t spank, const char *var);

       These functions are only necessary in remote context; in local context
       the standard process environment may be modified directly with setenv
       (3), getenv (3), and unsetenv (3).

       Functions are also available from within the SPANK plugins to establish
       environment variables to be exported to the Slurm PrologSlurmctld,
       Prolog, Epilog and EpilogSlurmctld programs (the so-called job control
       environment). The name of environment variables established by these
       calls will be prepended with the string SPANK_ in order to avoid any
       security implications of arbitrary environment variable control. (After
       all, the job control scripts do run as root or the Slurm user.)

       These functions are available from local context only.

         spank_err_t spank_job_control_getenv(spank_t spank, const char *var,
                              char *buf, int len);
         spank_err_t spank_job_control_setenv(spank_t spank, const char *var,
                              const char *val, int overwrite);
         spank_err_t spank_job_control_unsetenv(spank_t spank, const char *var);

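       To illustrate the SPANK_ prefix from the receiving side, here is a
       small sketch of a lookup that a compiled Prolog or Epilog helper might
       perform. The function name job_control_lookup and the variable name
       MY_FLAG are hypothetical; the only behavior taken from this manual is
       that the name passed to spank_job_control_setenv appears in the job
       control environment with SPANK_ prepended.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Look up a job control variable by the name the plugin used when it
 * called spank_job_control_setenv(); the SPANK_ prefix is added here.
 * Returns NULL if the variable is unset or the name is too long.
 * Hypothetical helper, not part of the SPANK API.
 */
static const char *job_control_lookup(const char *name)
{
    char full[256];
    int n = snprintf(full, sizeof(full), "SPANK_%s", name);

    if (n < 0 || (size_t) n >= sizeof(full))
        return NULL;            /* name does not fit in this sketch's buffer */
    return getenv(full);
}
```

       For a plugin that set MY_FLAG, the helper would read SPANK_MY_FLAG. A
       shell Prolog would reference the same prefixed name directly.
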
       See spank.h for more information, and EXAMPLES below for an example of
       spank_getenv usage.

       Many of the described SPANK functions available to plugins return
       errors via the spank_err_t error type. On success, the return value
       will be set to ESPANK_SUCCESS, while on failure, the return value will
       be set to one of many error values defined in slurm/spank.h. The SPANK
       interface provides a simple function

         const char * spank_strerror(spank_err_t err);

       which may be used to translate a spank_err_t value into its string
       representation.

       The slurm_spank_log function can be used to print messages back to the
       user at an error level. This is to keep users from having to rely on
       the slurm_error function, which can be confusing because it prepends
       "error:" to every message.

SPANK OPTIONS

       SPANK plugins also have an interface through which they may define and
       implement extra job options. These options are made available to the
       user through Slurm commands such as srun(1), salloc(1), and sbatch(1).
       If the option is specified by the user, its value is forwarded and
       registered with the plugin in slurmd when the job is run. In this way,
       SPANK plugins may dynamically provide new options and functionality to
       Slurm.

       Each option registered by a plugin to Slurm takes the form of a struct
       spank_option which is declared in <slurm/spank.h> as

          struct spank_option {
             char *         name;
             char *         arginfo;
             char *         usage;
             int            has_arg;
             int            val;
             spank_opt_cb_f cb;
          };

       Where

       name   is the name of the option. Its length is limited to
              SPANK_OPTION_MAXLEN defined in <slurm/spank.h>.

       arginfo
              is a description of the argument to the option, if the option
              does take an argument.

       usage  is a short description of the option suitable for --help output.

       has_arg
              0 if option takes no argument, 1 if option takes an argument,
              and 2 if the option takes an optional argument. (See getopt_long
              (3)).

       val    A plugin-local value to return to the option callback function.

       cb     A callback function that is invoked when the plugin option is
              registered with Slurm. spank_opt_cb_f is typedef'd in
              <slurm/spank.h> as

                typedef int (*spank_opt_cb_f) (int val, const char *optarg,
                                         int remote);

              Where val is the value of the val field in the spank_option
              struct, optarg is the supplied argument if applicable, and
              remote is 0 if the function is being called from the "local"
              host (e.g. host where srun or sbatch/salloc are invoked) or 1
              from the "remote" host (host where slurmd/slurmstepd run). On
              the remote host, the callback is only executed by slurmstepd
              (remote context), and only if the option was registered for
              that context.

       Plugin options may be registered with Slurm using the
       spank_option_register function. This function is only valid when called
       from the plugin's slurm_spank_init handler, and registers one option at
       a time. The prototype is

          spank_err_t spank_option_register (spank_t sp,
                    struct spank_option *opt);

       This function will return ESPANK_SUCCESS on successful registration of
       an option, or ESPANK_BAD_ARG for errors including invalid spank_t
       handle, or when the function is not called from the slurm_spank_init
       function. All options need to be registered from all contexts in which
       they will be used. For instance, if an option is only used in local
       (srun) and remote (slurmd) contexts, then spank_option_register should
       only be called from within those contexts. For example:

          if (spank_context() != S_CTX_ALLOCATOR)
             spank_option_register (sp, opt);

       If, however, the option is used in all contexts, spank_option_register
       needs to be called everywhere.

       In addition to spank_option_register, plugins may also export options
       to Slurm by defining a table of struct spank_option with the symbol
       name spank_options. This method, however, is not supported for use with
       sbatch and salloc (allocator context), thus the use of
       spank_option_register is preferred. When using the spank_options table,
       the final element in the array must be filled with zeros. A
       SPANK_OPTIONS_TABLE_END macro is provided in <slurm/spank.h> for this
       purpose.

       When an option is provided by the user on the local side, either by
       command line options or by environment variables, Slurm will
       immediately invoke the option's callback with remote=0. This is meant
       for the plugin to do local sanity checking of the option before the
       value is sent to the remote side during job launch. If the argument the
       user specified is invalid, the plugin should report an error and return
       a non-zero value from the callback. The plugin should be able to handle
       cases where the spank option is set multiple times through environment
       variables and command line options. Environment variables are processed
       before command line options.

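       The callback behavior described above can be sketched as follows. The
       function matches the spank_opt_cb_f signature, but the option itself
       (a "level" between 1 and 9 with an optional argument) is hypothetical,
       and fprintf stands in for slurm_error so that the sketch compiles
       without the Slurm headers.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define LEVEL_DEFAULT 1   /* used when the option is given without an argument */

static int level = 0;     /* last accepted value; 0 means "not requested" */

/*
 * Option callback with the spank_opt_cb_f signature.
 * Validates optarg and returns non-zero to reject a bad value.
 * The option name and its 1..9 range are illustrative assumptions.
 */
static int level_opt_cb(int val, const char *optarg, int remote)
{
    char *end = NULL;
    long l;

    (void) val;                 /* single option, so val is not needed */
    (void) remote;              /* same validation on both sides */

    if (!optarg) {              /* has_arg == 2 and no argument supplied */
        level = LEVEL_DEFAULT;
        return 0;
    }

    errno = 0;
    l = strtol(optarg, &end, 10);
    if (errno || end == optarg || *end != '\0' || l < 1 || l > 9) {
        fprintf(stderr, "level: invalid value \"%s\" (expected 1-9)\n",
                optarg);
        return -1;
    }

    level = (int) l;            /* a later call overrides an earlier one */
    return 0;
}
```

       Because each accepted call simply overwrites level, a value given on
       the command line replaces one taken from the environment, which fits
       the processing order described above.
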
       On the remote side, options and their arguments are registered just
       after SPANK plugins are loaded and before the slurm_spank_init handler
       is called. This allows plugins to modify behavior of all plugin
       functionality based on the value of user-provided options. (See
       EXAMPLES below for a plugin that registers an option with Slurm).

       As an alternative to use of an option callback and global variable,
       plugins can use the spank_option_getopt function to check for supplied
       options after option processing. This function has the prototype:

          spank_err_t spank_option_getopt(spank_t sp,
              struct spank_option *opt, char **optargp);

       This function returns ESPANK_SUCCESS if the option defined in the
       struct spank_option opt has been used by the user. If optargp is
       non-NULL then it is set to any option argument passed (if the option
       takes an argument). The use of this method is required to process
       options in job_script context (slurm_spank_job_prolog and
       slurm_spank_job_epilog). This function is valid in the following
       callbacks: slurm_spank_job_prolog, slurm_spank_local_user_init,
       slurm_spank_user_init, slurm_spank_task_init_privileged,
       slurm_spank_task_init, slurm_spank_task_exit, and
       slurm_spank_job_epilog.

CONFIGURATION

       The default SPANK plug-in stack configuration file is plugstack.conf in
       the same directory as slurm.conf(5), though this may be changed via the
       Slurm config parameter PlugStackConfig. Normally the plugstack.conf
       file should be identical on all nodes of the cluster. The config file
       lists SPANK plugins, one per line, along with whether the plugin is
       required or optional, and any global arguments that are to be passed to
       the plugin for runtime configuration. Comments are preceded with '#'
       and extend to the end of the line. If the configuration file is missing
       or empty, it will simply be ignored.

       The format of each non-comment line in the configuration file is:

         required/optional   plugin   arguments

       For example:

         optional /usr/lib/slurm/test.so

       This tells slurmd to load the plugin test.so, passing no arguments. If
       a SPANK plugin is required, then failure of any of the plugin's
       functions will cause slurmd to terminate the job, while optional
       plugins only cause a warning.

       If a fully-qualified path is not specified for a plugin, then the
       currently configured PluginDir in slurm.conf(5) is searched.

       SPANK plugins are stackable, meaning that more than one plugin may be
       placed into the config file. The plugins will simply be called in
       order, one after the other, and appropriate action taken on failure
       given the state of the plugin's optional flag.

       Additional config files or directories of config files may be included
       in plugstack.conf with the include keyword. The include keyword must
       appear on its own line, and takes a glob as its parameter, so multiple
       files may be included from one include line. For example, the following
       syntax will load all config files in the /etc/slurm/plugstack.conf.d
       directory, in local collation order:

         include /etc/slurm/plugstack.conf.d/*

       which might be considered a more flexible method for building up a
       spank plugin stack.

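       Putting the pieces above together, a stacked configuration might look
       like the following fragment. The plugin file names and the min_prio
       argument are hypothetical and only illustrate the line format; the
       keywords, comment syntax, and include line follow the rules described
       in this section.

```
#
# Illustrative plugstack.conf; plugin names and arguments are hypothetical.
#
# Required plugin given by absolute path, no arguments.
required  /usr/lib/slurm/site_policy.so
# Optional plugin without a path: searched for in PluginDir.
optional  renice.so  min_prio=-10
# Pull in any further stack fragments.
include   /etc/slurm/plugstack.conf.d/*
```

       Plugins are called in the order listed, so the required plugin here
       runs before the optional one.
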
       The SPANK config file is re-read on each job launch, so editing the
       config file will not affect running jobs. However, care should be taken
       so that a partially edited config file is not read by a launching job.

ERRORS

       When a SPANK plugin function returns a non-zero value, the outcome
       depends on the command, the function, and the context:

       ┌───────┬────────────────────────────────┬─────────┬────────┬───────────┬─────────┐
       │Command│Function                        │Context  │Exitcode│Drains Node│Fails job│
       ├───────┼────────────────────────────────┼─────────┼────────┼───────────┼─────────┤
       │srun   │slurm_spank_init                │local    │1       │no         │yes      │
       │srun   │slurm_spank_init_post_opt       │local    │1       │no         │yes      │
       │srun   │slurm_spank_local_user_init     │local    │1       │no         │no       │
       │srun   │slurm_spank_user_init           │remote   │0       │no         │no       │
       │srun   │slurm_spank_task_init_privileged│remote   │1       │no         │yes      │
       │srun   │slurm_spank_task_post_fork      │remote   │0       │no         │no       │
       │srun   │slurm_spank_task_init           │remote   │1       │no         │yes      │
       │srun   │slurm_spank_task_exit           │remote   │0       │no         │no       │
       │srun   │slurm_spank_exit                │local    │0       │no         │yes      │
       ├───────┼────────────────────────────────┼─────────┼────────┼───────────┼─────────┤
       │salloc │slurm_spank_init                │allocator│1       │no         │yes      │
       │salloc │slurm_spank_init_post_opt       │allocator│1       │no         │yes      │
       │salloc │slurm_spank_init                │local    │1       │no         │yes      │
       │salloc │slurm_spank_init_post_opt       │local    │1       │no         │yes      │
       │salloc │slurm_spank_local_user_init     │local    │1       │no         │yes      │
       │salloc │slurm_spank_user_init           │remote   │0       │no         │no       │
       │salloc │slurm_spank_task_init_privileged│remote   │1       │no         │yes      │
       │salloc │slurm_spank_task_post_fork      │remote   │0       │no         │no       │
       │salloc │slurm_spank_task_init           │remote   │1       │no         │yes      │
       │salloc │slurm_spank_task_exit           │remote   │0       │no         │no       │
       │salloc │slurm_spank_exit                │local    │0       │no         │yes      │
       │salloc │slurm_spank_exit                │allocator│0       │no         │yes      │
       ├───────┼────────────────────────────────┼─────────┼────────┼───────────┼─────────┤
       │sbatch │slurm_spank_init                │allocator│1       │no         │yes      │
       │sbatch │slurm_spank_init_post_opt       │allocator│1       │no         │yes      │
       │sbatch │slurm_spank_init                │local    │1       │no         │yes      │
       │sbatch │slurm_spank_init_post_opt       │local    │1       │no         │yes      │
       │sbatch │slurm_spank_local_user_init     │local    │1       │no         │yes      │
       │sbatch │slurm_spank_user_init           │remote   │0       │yes        │no       │
       │sbatch │slurm_spank_task_init_privileged│remote   │1       │no         │yes      │
       │sbatch │slurm_spank_task_post_fork      │remote   │0       │yes        │no       │
       │sbatch │slurm_spank_task_init           │remote   │1       │no         │yes      │
       │sbatch │slurm_spank_task_exit           │remote   │0       │no         │no       │
       │sbatch │slurm_spank_exit                │local    │0       │no         │no       │
       │sbatch │slurm_spank_exit                │allocator│0       │no         │no       │
       └───────┴────────────────────────────────┴─────────┴────────┴───────────┴─────────┘

       NOTE: The behavior for ProctrackType=proctrack/pgid may result in
       timeouts for slurm_spank_task_post_fork with remote context on failure.

EXAMPLE: renice.so

       /etc/slurm/plugstack.conf:
              This example plugstack.conf file shows a configuration that
              activates the renice.so SPANK plugin.

              #
              # SPANK config file
              #
              # required?       plugin                     parameters
              #
              optional          /usr/lib/SPANK_renice.so   min_prio=-10

       /usr/local/src/renice.c:
              A sample SPANK plugin to modify the nice value of job tasks.
              This plugin adds a --renice=[prio] option to srun which users
              can use to set the priority of all remote tasks. Priority may
              also be specified via a SLURM_RENICE environment variable. A
              minimum priority may be established via a "min_prio" parameter
              in plugstack.conf.

              #include <sys/types.h>
              #include <stdio.h>
              #include <stdlib.h>
              #include <unistd.h>
              #include <string.h>
              #include <sys/resource.h>

              #include <slurm/spank.h>

              /*
               * All spank plugins must define this macro for the
               * Slurm plugin loader.
               */
              SPANK_PLUGIN(renice, 1);

              #define PRIO_ENV_VAR "SLURM_RENICE"
              #define PRIO_NOT_SET -1

              /*
               * Minimum allowable value for priority. May be
               * set globally via plugin option min_prio=<prio>
               */
              static int min_prio = -20;

              static int prio = PRIO_NOT_SET;

              static int _renice_opt_process(int val, const char *optarg, int remote);
              static int _str2prio(const char *str, int *p2int);

              /*
               *  Provide a --renice=[prio] option to srun:
               */
              struct spank_option spank_options[] =
              {
                  {
                      "renice",
                      "[prio]",
                      "Re-nice job tasks to priority [prio].",
                      2,
                      0,
                      _renice_opt_process
                  },
                  SPANK_OPTIONS_TABLE_END
              };

              /*
               *  Called from both srun and slurmd.
               */
              int slurm_spank_init(spank_t sp, int ac, char **av)
              {
                  int i;

                  /* Don't do anything in sbatch/salloc */
                  if (spank_context () == S_CTX_ALLOCATOR)
                      return ESPANK_SUCCESS;

                  /* Parse plugstack.conf arguments such as min_prio=-10 */
                  for (i = 0; i < ac; i++) {
                      if (!strncmp("min_prio=", av[i], 9)) {
                          const char *optarg = av[i] + 9;

                          if (_str2prio(optarg, &min_prio))
                              slurm_error ("Ignoring invalid min_prio value: %s", av[i]);
                      } else {
                          slurm_error ("renice: Invalid option: %s", av[i]);
                      }
                  }

                  if (!spank_remote(sp))
                      slurm_verbose("renice: min_prio = %d", min_prio);

                  return ESPANK_SUCCESS;
              }

              int slurm_spank_task_post_fork(spank_t sp, int ac, char **av)
              {
                  int rc;
                  pid_t pid;
                  int taskid;

                  if (prio == PRIO_NOT_SET) {
                      /* See if SLURM_RENICE env var is set by user */
                      char val[1024];

                      rc = spank_getenv(sp, PRIO_ENV_VAR, val, sizeof(val));

                      /* Variable not set: no renice was requested */
                      if (rc == ESPANK_ENV_NOEXIST)
                          return ESPANK_SUCCESS;

                      if (rc)
                          return rc;

                      rc = _str2prio(val, &prio);

                      if (rc) {
                          /* Report the rejected value read from the environment */
                          slurm_error("Bad value for %s: %s", PRIO_ENV_VAR, val);
                          return rc;
                      }

                      if (prio < min_prio) {
                          slurm_error("%s=%d not allowed, using min=%d",
                                      PRIO_ENV_VAR, prio, min_prio);
                      }
                  }

                  /* Clamp to the administrator-configured minimum */
                  if (prio < min_prio)
                      prio = min_prio;

                  spank_get_item(sp, S_TASK_GLOBAL_ID, &taskid);
                  spank_get_item(sp, S_TASK_PID, &pid);

                  slurm_info("re-nicing task%d pid %d to %d", taskid, (int) pid, prio);

                  if (setpriority(PRIO_PROCESS, (int) pid, (int) prio)) {
                      slurm_error("setpriority: %m");
                      return ESPANK_ERROR;
                  }

                  return ESPANK_SUCCESS;
              }

              static int _str2prio(const char *str, int *p2int)
              {
                  long l;
                  char *p = NULL;

                  if (!str || str[0] == '\0')
                      return ESPANK_BAD_ARG;

                  l = strtol(str, &p, 10);

                  if (!p || (*p != '\0'))
                      return ESPANK_BAD_ARG;

                  if ((l < -20) || (l > 20)) {
                      slurm_error("Specify value between -20 and 20");
639                      return ESPANK_BAD_ARG;
640                  }
641
642                  *p2int = (int) l;
643
644                  return ESPANK_SUCCESS;
645              }
646
647              static int _renice_opt_process(int val, const char *optarg, int remote)
648              {
649                  int rc;
650
651                  if (optarg == NULL) {
652                      slurm_error("renice: invalid NULL argument!");
653                      return ESPANK_BAD_ARG;
654                  }
655
656                  if ((rc = _str2prio(optarg, &prio))) {
657                      slurm_error("Bad value for --renice: %s", optarg);
658                      return rc;
659                  }
660
661                  if (prio < min_prio) {
662                      slurm_error("--renice=%d not allowed, will use min=%d",
663                                  prio, min_prio);
664                  }
665
666                  return ESPANK_SUCCESS;
667              }
668
669       Compile command:
670              # gcc -ggdb3 -I${SLURM_PATH}/include/ -fPIC -shared -o /usr/lib/SPANK_renice.so /usr/local/src/renice.c
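
       The validation and clamping logic of the listing above can be tried
       outside of Slurm. The following is a minimal sketch, not part of the
       plugin: the names demo_str2prio, demo_clamp, DEMO_SUCCESS and
       DEMO_BAD_ARG are hypothetical stand-ins (the real plugin returns
       ESPANK_* codes from spank.h and logs through slurm_error), and it
       assumes the same -20 to 20 range that _str2prio enforces.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ESPANK_* codes in spank.h, so that this
 * sketch compiles without the Slurm headers. */
enum { DEMO_SUCCESS = 0, DEMO_BAD_ARG = 1 };

/* Parse a nice value from a string, mirroring the checks in _str2prio:
 * non-empty input, no trailing characters, value within [-20, 20]. */
int demo_str2prio(const char *str, int *out)
{
    char *end = NULL;
    long l;

    assert(out != NULL);

    if (!str || str[0] == '\0')
        return DEMO_BAD_ARG;

    l = strtol(str, &end, 10);

    if (*end != '\0')            /* e.g. "abc" or "5x" */
        return DEMO_BAD_ARG;

    if (l < -20 || l > 20)       /* outside the accepted nice range */
        return DEMO_BAD_ARG;

    *out = (int) l;
    return DEMO_SUCCESS;
}

/* Raise a requested priority to the configured floor when it is lower,
 * as slurm_spank_task_post_fork does before calling setpriority(). */
int demo_clamp(int prio, int min_prio)
{
    return (prio < min_prio) ? min_prio : prio;
}
```

       With a floor of -10 (as if min_prio=-10 had been given as a plugin
       argument), the input "-15" parses successfully and is then clamped to
       -10 by demo_clamp, while "25", "abc" and the empty string are
       rejected by demo_str2prio.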

COPYING

       Portions copyright (C) 2010-2021 SchedMD LLC. Copyright (C) 2006 The
       Regents of the University of California. Produced at Lawrence Livermore
       National Laboratory (cf, DISCLAIMER). CODE-OCEC-09-009. All rights
       reserved.

       This file is part of Slurm, a resource management program. For details,
       see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.

FILES

       /etc/slurm/slurm.conf - Slurm configuration file.
       /etc/slurm/plugstack.conf - SPANK configuration file.
       /usr/include/slurm/spank.h - SPANK header file.

SEE ALSO

       srun(1), slurm.conf(5)

October 2021                    Slurm Component                       SPANK(8)