knl.conf(5)                Slurm Configuration File                knl.conf(5)


NAME

       knl.conf - Slurm configuration file for Intel Knights Landing
       processor.


DESCRIPTION

       This ASCII file describes configuration information for Intel Knights
       Landing (KNL) processors. Its name may depend upon the NodeFeatures
       plugin configured in Slurm. For example, on Cray systems NodeFeatures
       should be set to "knl_cray" and its configuration file will be read
       from "knl_cray.conf". The file is always located in the same directory
       as the slurm.conf file. That location can be modified at system build
       time using the DEFAULT_SLURM_CONF parameter or at execution time by
       setting the SLURM_CONF environment variable. This file is optional.

       Parameter names are case insensitive. Any text following a "#" in the
       configuration file is treated as a comment through the end of that
       line. Changes to the configuration file take effect upon restart of
       Slurm daemons, daemon receipt of the SIGHUP signal, or execution of
       the command "scontrol reconfigure" unless otherwise noted.
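
       As a brief illustration of the syntax rules above (the value is chosen
       only for the example), either of the following lines could be used to
       set the same parameter, and the text after "#" is ignored:

              DefaultNUMA=a2a     # trailing comment
              defaultnuma=a2a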

       The overall configuration parameters available include:


       AllowMCDRAM
              Specify the MCDRAM modes which jobs are allowed to use. This may
              be a subset of the MCDRAM modes supported by the node. If not
              specified, all MCDRAM modes supported by the node are available
              for use. The comma-separated list of allowed MCDRAM modes may
              include any of the modes listed below.

              cache            All of MCDRAM to be used as cache.

              equal            MCDRAM to be used partly as cache and partly
                               combined with primary memory.

              flat             MCDRAM to be combined with primary memory into
                               a "flat" memory space.
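
              A minimal sketch, assuming a site that wants to keep jobs from
              requesting the "equal" mode:

                     AllowMCDRAM=cache,flat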


       AllowNUMA
              Specify the NUMA modes which jobs are allowed to use. This may
              be a subset of the NUMA modes supported by the node. If not
              specified, all NUMA modes supported by the node are available
              for use. The comma-separated list of allowed NUMA modes may
              include any of the modes listed below. Note that Slurm can only
              support homogeneous nodes (e.g. the same number of cores per
              NUMA node). KNL snc4 and quad modes are not homogeneous: each
              NUMA node will have either 16 or 18 cores. This will result in
              Slurm using the lower core count, finding a total of 256
              threads rather than 272 threads, and setting the node to a DOWN
              state. Therefore it is recommended that snc4 and quad modes not
              be allowed at this time.

              a2a              All to all

              snc2             Sub-NUMA cluster 2

              snc4             Sub-NUMA cluster 4

              hemi             Hemisphere

              quad             Quadrant
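
              For example, a sketch that follows the recommendation above by
              leaving snc4 and quad out of the allowed list:

                     AllowNUMA=a2a,snc2,hemi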


       AllowUserBoot
              A comma-delimited list of users allowed to modify a node's
              MCDRAM or NUMA state. If not specified then any user can change
              a node's state and reboot it.


       BootTime
              Estimated time to reboot a node in seconds. Used as a basis for
              optimizing scheduling decisions. The default value is 300
              seconds (5 minutes) for the "knl_generic" plugin and 2700
              seconds (45 minutes) for the "knl_cray" plugin.


       CapmcPath
              Fully qualified path to the capmc program. The default value is
              "/opt/cray/capmc/default/bin/capmc". This parameter is used
              only by the "knl_cray" plugin.


       CapmcPollFreq
              Interval, in seconds, between polls of the capmc program for
              node state changes. The default value is 45 seconds. This
              parameter is used only by the "knl_cray" plugin.


       CapmcRetries
              Number of times to retry failed operations of the capmc program.
              The default value is 4.


       CapmcTimeout
              Time limit, in milliseconds, for the capmc program to return
              status information. The default value is 60000 milliseconds and
              the minimum value is 1000 milliseconds. This parameter is used
              by the "knl_cray" plugin, plus the capmc_suspend and
              capmc_resume programs used for suspending and resuming nodes.


       CnselectPath
              Fully qualified path to the cnselect program. The default value
              is "/opt/cray/sdb/default/bin/cnselect". This parameter is used
              only by the "knl_cray" plugin.


       DefaultMCDRAM
              Specify the default MCDRAM mode for jobs which do not specify a
              value. This is only used when a node is booted and the job
              which has been allocated the node does not specify a desired
              MCDRAM mode. The value can be one of the possible values
              identified with the AllowMCDRAM configuration parameter above.
              The default value is "cache".


       DefaultNUMA
              Specify the default NUMA mode for jobs which do not specify a
              value. This is only used when a node is booted and the job
              which has been allocated the node does not specify a desired
              NUMA mode. The value can be one of the possible values
              identified with the AllowNUMA configuration parameter above.
              The default value is "a2a".


       Force  If set to a non-zero value then load the node_features/generic
              plugin even on non-KNL nodes. Used primarily for testing
              purposes.


       LogFile
              Fully qualified path to a log file. The default value is
              SlurmctldLogFile from the slurm.conf configuration file. This
              option is used only by the capmc_suspend and capmc_resume
              programs (which power down and reboot nodes in the appropriate
              configuration).


       McPath Fully qualified path to memory controller device file directory.
              Children of this directory with names of the form
              "mc#/csrow#/ue_count" (i.e. the count of unrecoverable memory
              errors) will be monitored for non-zero values. If such errors
              are detected, the node will be set to a DOWN state and the
              slurmd daemon will shut down. The default value is
              "/sys/devices/system/edac/mc". See also UmeCheckInterval.


       NodeRebootWeight
              If a compute node requires a reboot to be usable for a pending
              job, then reset the node's weight to the specified value. The
              default value is 4,294,967,294 (0xfffffffe). See also "Weight"
              in the node configuration specification of slurm.conf.


       NumaCpuBind
              Contains pairs of NUMA modes and the CpuBind mode to set a node
              to for that NUMA mode. Any compute node found with or set to the
              specified NUMA mode will have that node's CpuBind field set to
              the configured value. Each NUMA mode is followed by an equal
              sign and the desired CpuBind mode for that NUMA mode. Multiple
              pairs of NUMA mode and CpuBind mode should be given as a
              semicolon-separated list. By default, changes to a node's NUMA
              mode will not affect that node's CpuBind mode. See the EXAMPLE
              section below.
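
              As an illustrative sketch (the particular pairings are an
              assumption, not a recommendation), the following would set a
              node's CpuBind field to "core" when the node is in a2a mode and
              to "thread" when it is in snc2 mode:

                     NumaCpuBind=a2a=core;snc2=thread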


       SyscfgPath
              Fully qualified path to Intel's syscfg program, which identifies
              the current KNL configuration by viewing BIOS settings. If not
              defined, the current BIOS setting will not be available. The
              default value is "/usr/bin/syscfg". This parameter is used only
              by the "knl_generic" plugin.


       SyscfgTimeout
              Timeout for the syscfg program in milliseconds. The default
              value is 1000 milliseconds. For Dell KNL systems, experience
              has shown that a higher value of 10000 milliseconds is more
              appropriate.


       SystemType
              Used to distinguish the type of KNL system in use. Possible
              options are "Dell" and "Intel". The default value is "Intel".
              This parameter is used only by the "knl_generic" plugin.
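
              A minimal sketch for a Dell KNL system using the "knl_generic"
              plugin, combining the values mentioned above (the syscfg path
              shown is simply the documented default and may differ on a
              given system):

                     SystemType=Dell
                     SyscfgPath=/usr/bin/syscfg
                     SyscfgTimeout=10000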


       UmeCheckInterval
              Interval, in microseconds, between checks for Uncorrectable
              Memory Errors (UME). If such errors are detected, the node will
              be set to a DOWN state and the slurmd daemon will shut down.
              The default value is 0 (disabled). See also McPath.
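
              For example, assuming the default memory controller directory
              and an arbitrarily chosen interval of 1000000 microseconds (one
              second), UME checking might be enabled with:

                     McPath=/sys/devices/system/edac/mc
                     UmeCheckInterval=1000000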


       ValidateMode
              If set to 1 then validate, but do not modify, the node's
              configured MCDRAM and NUMA modes from the slurm.conf file. If
              the actual modes do not match the configured values the node
              will be set to a DOWN state. Every KNL node's MCDRAM and NUMA
              states must both be listed in the slurm.conf file. This
              parameter is used only by the "knl_cray" plugin.


EXAMPLE

       ###################################################################
       # knl_cray.conf
       # Slurm configuration file for Intel Knights Landing on Cray system
       ###################################################################
       CapmcPath=/opt/cray/capmc/default/bin/capmc
       CapmcTimeout=6000
       DefaultMCDRAM=flat
       DefaultNUMA=a2a
       NumaCpuBind=a2a=core;snc2=thread;snc4=thread
       LogFile=/var/tmp/slurm_node_feature.log
       SyscfgPath=/usr/sbin/syscfg


COPYING

       Copyright (C) 2015-2021 SchedMD LLC.

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.


SEE ALSO

       slurm.conf(5)



June 2021                  Slurm Configuration File                knl.conf(5)