knl.conf(5)                Slurm Configuration File                knl.conf(5)

NAME

       knl.conf - Slurm configuration file for Intel Knights Landing
       processors.

DESCRIPTION

       This ASCII file describes configuration information for Intel Knights
       Landing processors.  Its name may depend upon the NodeFeatures plugin
       configured in Slurm.  For example, on Cray systems NodeFeatures should
       be configured to "knl_cray" and its configuration file will be read
       from "knl_cray.conf".  The file location can be modified at system
       build time using the DEFAULT_SLURM_CONF parameter or at execution time
       by setting the SLURM_CONF environment variable.  The file will always
       be located in the same directory as the slurm.conf file.  This file is
       optional.

       Parameter names are case insensitive.  Any text following a "#" in the
       configuration file is treated as a comment through the end of that
       line.  Changes to the configuration file take effect upon restart of
       Slurm daemons, daemon receipt of the SIGHUP signal, or execution of
       the command "scontrol reconfigure" unless otherwise noted.
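
       As a minimal sketch of the syntax rules above (the value is
       illustrative only), either of the following lines on its own sets the
       same parameter, and the text after "#" is ignored:

              DefaultMCDRAM=flat      # upper and lower case are equivalent
              defaultmcdram=flat      # same parameter as the line above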

       The overall configuration parameters available include:


       AllowMCDRAM
              Specify the MCDRAM modes which jobs are allowed to use.  This
              may be a subset of the MCDRAM modes supported by the node.  If
              not specified, all MCDRAM modes supported by the node are
              available for use.  The comma-separated list of allowed MCDRAM
              modes may include any of the modes listed below.

              cache            All of MCDRAM to be used as cache.

              equal            MCDRAM to be used partly as cache and partly
                               combined with primary memory.

              flat             MCDRAM to be combined with primary memory into
                               a "flat" memory space.
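
              For illustration only, assuming a site that wants to keep jobs
              from requesting the equal mode, the allowed list could be
              restricted as follows:

                     AllowMCDRAM=cache,flat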


       AllowNUMA
              Specify the NUMA modes which jobs are allowed to use.  This may
              be a subset of the NUMA modes supported by the node.  If not
              specified, all NUMA modes supported by the node are available
              for use.  The comma-separated list of allowed NUMA modes may
              include any of the modes listed below.  Note that Slurm can only
              support homogeneous nodes (e.g. the same number of cores per
              NUMA node).  KNL snc4 and quad modes are not homogeneous: each
              NUMA node will have either 16 or 18 cores.  This will result in
              Slurm using the lower core count, finding a total of 256 threads
              rather than 272 threads, and setting the node to a DOWN state.
              Therefore it is recommended that snc4 and quad modes not be
              allowed at this time.

              a2a              All to all

              snc2             Sub-NUMA cluster 2

              snc4             Sub-NUMA cluster 4

              hemi             Hemisphere

              quad             Quadrant
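
              A hedged sketch that follows the recommendation above by
              leaving out snc4 and quad (the choice of remaining modes is an
              assumption about site policy):

                     AllowNUMA=a2a,snc2,hemi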


       AllowUserBoot
              A comma-delimited list of users allowed to modify a node's
              MCDRAM or NUMA state.  If not specified, then any user can
              change a node's state and reboot it.
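
              As an illustrative sketch, the line below would limit such
              changes to two users.  The names "alice" and "bob" are
              hypothetical placeholders, not accounts defined by Slurm:

                     AllowUserBoot=alice,bob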


       BootTime
              Estimated time to reboot a node in seconds.  Used as a basis for
              optimizing scheduling decisions.  The default value is 300
              seconds (5 minutes) for the "knl_generic" plugin and 2700
              seconds (45 minutes) for the "knl_cray" plugin.
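
              For example, a site using the "knl_generic" plugin whose nodes
              typically need about ten minutes to reboot (an assumed figure)
              might set:

                     BootTime=600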


       CapmcPath
              Fully qualified path to the capmc program.  The default value is
              "/opt/cray/capmc/default/bin/capmc".  This parameter is used
              only by the "knl_cray" plugin.


       CapmcPollFreq
              Time interval, in seconds, at which the capmc program should
              poll for node state changes.  The default value is 45 seconds.
              This parameter is used only by the "knl_cray" plugin.


       CapmcRetries
              Number of times to retry failed operations of the capmc program.
              The default value is 4.


       CapmcTimeout
              Time limit, in milliseconds, for the capmc program to return
              status information.  The default value is 60000 milliseconds and
              the minimum value is 1000 milliseconds.  This parameter is used
              by the "knl_cray" plugin, plus the capmc_suspend and
              capmc_resume programs used for suspending and resuming nodes.
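
              A sketch of how these capmc settings might be combined in
              knl_cray.conf.  The numbers are illustrative assumptions, not
              recommendations; CapmcTimeout stays above the 1000 millisecond
              minimum:

                     CapmcPath=/opt/cray/capmc/default/bin/capmc
                     CapmcPollFreq=30       # seconds
                     CapmcRetries=6
                     CapmcTimeout=120000    # milliseconds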


       CnselectPath
              Fully qualified path to the cnselect program.  The default value
              is "/opt/cray/sdb/default/bin/cnselect".  This parameter is used
              only by the "knl_cray" plugin.


       DefaultMCDRAM
              Specify the default MCDRAM mode for jobs which do not specify a
              value.  This is only used when a node is booted and the job
              which has been allocated the node does not specify a desired
              MCDRAM mode.  The value can be one of the possible values
              identified with the AllowMCDRAM configuration parameter above.
              The default value is "cache".


       DefaultNUMA
              Specify the default NUMA mode for jobs which do not specify a
              value.  This is only used when a node is booted and the job
              which has been allocated the node does not specify a desired
              NUMA mode.  The value can be one of the possible values
              identified with the AllowNUMA configuration parameter above.
              The default value is "a2a".
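
              As a minimal sketch (the modes chosen are illustrative), a site
              preferring flat memory and hemisphere clustering for jobs that
              state no preference could use:

                     DefaultMCDRAM=flat
                     DefaultNUMA=hemi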


       Force  If set to a non-zero value then load the node_features/generic
              plugin even on non-KNL nodes.  Used primarily for testing
              purposes.


       LogFile
              Fully qualified path to a log file.  The default value is
              SlurmctldLogFile from the slurm.conf configuration file.  This
              option is used only by the capmc_suspend and capmc_resume
              programs (which power down and reboot nodes in the appropriate
              configuration).


       McPath Fully qualified path to memory controller device file directory.
              Children of this directory with names of the form
              "mc#/csrow#/ue_count" (i.e. the count of unrecoverable memory
              errors) will be monitored for non-zero values.  If such errors
              are detected, the node will be set to a DOWN state and the
              slurmd daemon will shut down.  The default value is
              "/sys/devices/system/edac/mc".  See also UmeCheckInterval.


       NodeRebootWeight
              If a compute node requires a reboot to be usable for a pending
              job, then reset the node's weight to the specified value.  The
              default value is 4,294,967,294 (0xfffffffe).  See also "Weight"
              in the node configuration specification of slurm.conf.


       NumaCpuBind
              Contains pairs of NUMA modes and the CpuBind mode to set a node
              to for that mode.  Any compute node found with or set to the
              specified NUMA mode will have that node's CpuBind field set to
              the configured value.  Each NUMA mode is followed by an equal
              sign and the desired CpuBind mode for that NUMA mode.  Multiple
              pairs of NUMA mode and CpuBind mode should be given as a
              semicolon-separated list.  By default, changes to a node's NUMA
              mode will not affect that node's CpuBind mode.  See the example
              below.
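
              Reading the syntax described above, the illustrative value
              below would set CpuBind to "core" on nodes in a2a mode and to
              "thread" on nodes in snc2 mode; the pairing of modes is an
              assumption chosen for the sketch:

                     NumaCpuBind=a2a=core;snc2=thread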


       SyscfgPath
              Fully qualified path to Intel's syscfg program, which identifies
              the current KNL configuration by viewing BIOS settings.  If not
              defined, the current BIOS setting will not be available.  The
              default value is "/usr/bin/syscfg".  This parameter is used only
              by the "knl_generic" plugin.


       SyscfgTimeout
              Timeout for the syscfg program in milliseconds.  The default
              value is 1000 milliseconds.  For Dell KNL systems, experience
              has shown that a higher value of 10000 milliseconds is more
              appropriate.


       SystemType
              Identifies the flavor of KNL system in use.  Possible options
              are "Dell" and "Intel".  The default value is "Intel".  This
              parameter is used only by the "knl_generic" plugin.
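
              Putting the syscfg-related parameters together, a sketch for a
              Dell KNL system using the "knl_generic" plugin might look like
              this, with the longer timeout suggested above:

                     SystemType=Dell
                     SyscfgPath=/usr/bin/syscfg
                     SyscfgTimeout=10000    # milliseconds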


       UmeCheckInterval
              Interval, in microseconds, between checks for Uncorrectable
              Memory Errors (UME).  If such errors are detected, the node will
              be set to a DOWN state and the slurmd daemon will shut down.
              The default value is 0 (disabled).  See also McPath.
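
              A sketch enabling the check, assuming a one second interval is
              acceptable for the site (the interval is an illustrative
              choice; McPath is shown at its default):

                     McPath=/sys/devices/system/edac/mc
                     UmeCheckInterval=1000000    # microseconds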


       ValidateMode
              If set to 1 then validate, but do not modify, the node's
              configured MCDRAM and NUMA modes from the slurm.conf file.  If
              the actual modes do not match the configured values, the node
              will be set to a DOWN state.  Every KNL node's MCDRAM and NUMA
              states must both be listed in the slurm.conf file.  This
              parameter is used only by the "knl_cray" plugin.

EXAMPLE

       ###################################################################
       # knl_cray.conf
       # Slurm configuration file for Intel Knights Landing on Cray system
       ###################################################################
       CapmcPath=/opt/cray/capmc/default/bin/capmc
       CapmcTimeout=6000
       DefaultMCDRAM=flat
       DefaultNUMA=a2a
       NumaCpuBind=a2a=core;snc2=thread;snc4=thread
       LogFile=/var/tmp/slurm_node_feature.log
       SyscfgPath=/usr/sbin/syscfg

COPYING

       Copyright (C) 2015-2017 SchedMD LLC.

       This file is part of Slurm, a resource management program.  For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
       for more details.

SEE ALSO

       slurm.conf(5)



May 2018                   Slurm Configuration File                knl.conf(5)