knl.conf(5)                Slurm Configuration File                knl.conf(5)

knl.conf - Slurm configuration file for Intel Knights Landing processor.

This ASCII file describes configuration information for Intel Knights
Landing (KNL) processors. Its name may depend upon the NodeFeatures
plugin configured in Slurm. For example, on Cray systems NodeFeatures
should be configured to "knl_cray", and its configuration file will be
read from "knl_cray.conf". The file location follows that of
slurm.conf, which can be modified at system build time using the
DEFAULT_SLURM_CONF parameter or at execution time by setting the
SLURM_CONF environment variable: this file will always be located in
the same directory as the slurm.conf file. This file is optional.

Parameter names are case insensitive. Any text following a "#" in the
configuration file is treated as a comment through the end of that
line. Changes to the configuration file take effect upon restart of
Slurm daemons, daemon receipt of the SIGHUP signal, or execution of the
command "scontrol reconfigure" unless otherwise noted.

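The comment and case rules above can be illustrated with a minimal Python sketch. It assumes one "Name=Value" pair per line, as in the example file at the end of this page; parse_knl_conf is a hypothetical helper, not Slurm's own parser (which is written in C and may accept more syntax).

```python
def parse_knl_conf(text: str) -> dict[str, str]:
    """Hypothetical parser: one Name=Value pair per line (assumption)."""
    params: dict[str, str] = {}
    for raw_line in text.splitlines():
        # Any text after "#" is a comment through the end of the line.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected Name=Value, got {raw_line!r}")
        # Parameter names are case insensitive, so store them lower-cased.
        params[name.strip().lower()] = value.strip()
    return params


conf = parse_knl_conf("""
# knl_cray.conf
DefaultMCDRAM=flat   # trailing comment
defaultnuma=a2a
""")
print(conf["defaultmcdram"], conf["defaultnuma"])  # flat a2a
```

Lower-casing the names on input means later lookups do not need to care how the administrator capitalized them.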
The overall configuration parameters available include:

AllowMCDRAM
    Specify the MCDRAM modes which jobs are allowed to use. This may be
    a subset of the MCDRAM modes supported by the node. If not
    specified, all MCDRAM modes supported by the node are available for
    use. The comma-separated list of allowed MCDRAM modes may include
    any of the modes listed below.

    cache   All of MCDRAM is used as cache.

    equal   MCDRAM is used partly as cache and partly combined with
            primary memory.

    flat    MCDRAM is combined with primary memory into a "flat" memory
            space.

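As a rough sketch of how AllowMCDRAM narrows the modes a job may request, the Python below treats the parameter as a comma-separated string and intersects it with the modes a node supports. The helper name allowed_mcdram_modes and the intersection rule are assumptions for illustration, not Slurm's code.

```python
KNOWN_MCDRAM_MODES = ("cache", "equal", "flat")


def allowed_mcdram_modes(allow_mcdram, node_supported):
    """Return the MCDRAM modes jobs may request on one node (hypothetical)."""
    if allow_mcdram is None:
        # AllowMCDRAM not specified: every mode the node supports is available.
        return list(node_supported)
    requested = [m.strip().lower() for m in allow_mcdram.split(",") if m.strip()]
    unknown = [m for m in requested if m not in KNOWN_MCDRAM_MODES]
    if unknown:
        raise ValueError(f"unknown MCDRAM mode(s): {', '.join(unknown)}")
    # Assumption: the allowed list is intersected with what the node supports.
    return [m for m in requested if m in node_supported]


node_modes = ["cache", "equal", "flat"]
print(allowed_mcdram_modes(None, node_modes))          # ['cache', 'equal', 'flat']
print(allowed_mcdram_modes("cache,flat", node_modes))  # ['cache', 'flat']
```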
AllowNUMA
    Specify the NUMA modes which jobs are allowed to use. This may be a
    subset of the NUMA modes supported by the node. If not specified,
    all NUMA modes supported by the node are available for use. The
    comma-separated list of allowed NUMA modes may include any of the
    modes listed below. Note that Slurm can only support homogeneous
    nodes (that is, the same number of cores per NUMA node). KNL snc4
    and quad modes are not homogeneous: each NUMA node has either 16 or
    18 cores. Slurm then uses the lower core count for every NUMA node,
    finds a total of 256 threads rather than 272 threads, and sets the
    node to a DOWN state. Therefore it is recommended that snc4 and quad
    modes not be allowed at this time.

    a2a     All to all

    snc2    Sub-NUMA cluster 2

    snc4    Sub-NUMA cluster 4

    hemi    Hemisphere

    quad    Quadrant

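The 256-versus-272 thread figure follows from simple arithmetic. The sketch below assumes four hardware threads per KNL core and a hypothetical four-NUMA-node split of 16, 16, 18 and 18 cores (68 cores in total, consistent with the 16-or-18 figure above); the function names are illustrative only.

```python
THREADS_PER_CORE = 4  # KNL cores run four hardware threads


def threads_seen_by_slurm(cores_per_numa_node):
    # Assuming homogeneous NUMA nodes, the lowest core count is used for every node.
    return min(cores_per_numa_node) * len(cores_per_numa_node) * THREADS_PER_CORE


def threads_present(cores_per_numa_node):
    # Threads the hardware actually provides.
    return sum(cores_per_numa_node) * THREADS_PER_CORE


layout = [16, 16, 18, 18]  # hypothetical 68-core split across four NUMA nodes
print(threads_seen_by_slurm(layout), threads_present(layout))  # 256 272
```

The 16 missing threads (two NUMA nodes each losing two cores of four threads) are what leads to the mismatch and the DOWN state described above.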
AllowUserBoot
    A comma-delimited list of users allowed to modify a node's MCDRAM or
    NUMA state. If not specified, any user can change a node's state and
    reboot it.

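A minimal sketch of the AllowUserBoot rule, assuming the list holds plain user names; may_reboot_node is a hypothetical helper, and any special handling of privileged users in the real plugin is not modeled here.

```python
def may_reboot_node(user, allow_user_boot=None):
    """Hypothetical check mirroring the AllowUserBoot description."""
    if allow_user_boot is None:
        # Parameter not specified: any user may change modes and reboot.
        return True
    allowed = {name.strip() for name in allow_user_boot.split(",") if name.strip()}
    return user in allowed


print(may_reboot_node("alice"))             # True
print(may_reboot_node("alice", "bob,eve"))  # False
```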
BootTime
    Estimated time to reboot a node in seconds. Used as a basis for
    optimizing scheduling decisions. The default value is 300 seconds
    (5 minutes) for the "knl_generic" plugin and 2700 seconds (45
    minutes) for the "knl_cray" plugin.

CapmcPath
    Fully qualified path to the capmc program. The default value is
    "/opt/cray/capmc/default/bin/capmc". This parameter is used only by
    the "knl_cray" plugin.

CapmcPollFreq
    Time interval, in seconds, between uses of the capmc program to poll
    for node state changes. The default value is 45 seconds. This
    parameter is used only by the "knl_cray" plugin.

CapmcRetries
    Number of times to retry failed operations of the capmc program.
    The default value is 4.

CapmcTimeout
    Time limit, in milliseconds, for the capmc program to return status
    information. The default value is 60000 milliseconds and the minimum
    value is 1000 milliseconds. This parameter is used by the "knl_cray"
    plugin, as well as by the capmc_suspend and capmc_resume programs
    used for suspending and resuming nodes.

CnselectPath
    Fully qualified path to the cnselect program. The default value is
    "/opt/cray/sdb/default/bin/cnselect". This parameter is used only by
    the "knl_cray" plugin.

DefaultMCDRAM
    Specify the default MCDRAM mode for jobs which do not specify a
    value. This is only used when a node is booted and the job which has
    been allocated the node does not specify a desired MCDRAM mode. The
    value can be any one of the modes listed under the AllowMCDRAM
    configuration parameter above. The default value is "cache".

DefaultNUMA
    Specify the default NUMA mode for jobs which do not specify a value.
    This is only used when a node is booted and the job which has been
    allocated the node does not specify a desired NUMA mode. The value
    can be any one of the modes listed under the AllowNUMA configuration
    parameter above. The default value is "a2a".

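The fallback described for DefaultMCDRAM and DefaultNUMA can be sketched as below. The default arguments mirror the documented defaults ("cache" and "a2a"); modes_for_boot is a hypothetical helper, and how a job actually expresses its desired modes is outside the scope of this sketch.

```python
def modes_for_boot(job_mcdram=None, job_numa=None,
                   default_mcdram="cache", default_numa="a2a"):
    """Modes used when a node is booted for a job (hypothetical helper)."""
    # Fall back to DefaultMCDRAM / DefaultNUMA only when the job has no preference.
    mcdram = job_mcdram if job_mcdram is not None else default_mcdram
    numa = job_numa if job_numa is not None else default_numa
    return mcdram, numa


print(modes_for_boot())                                        # ('cache', 'a2a')
print(modes_for_boot(job_mcdram="flat"))                       # ('flat', 'a2a')
print(modes_for_boot(default_mcdram="flat", job_numa="snc2"))  # ('flat', 'snc2')
```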
Force
    If set to a non-zero value, load the node_features/generic plugin
    even on non-KNL nodes. Used primarily for testing purposes.

LogFile
    Fully qualified path to a log file. The default value is
    SlurmctldLogFile from the slurm.conf configuration file. This option
    is used only by the capmc_suspend and capmc_resume programs (which
    power down and reboot nodes in the appropriate configuration).

McPath
    Fully qualified path to the memory controller device file directory.
    Children of this directory with names of the form
    "mc#/csrow#/ue_count" (i.e. the count of uncorrectable memory errors)
    will be monitored for non-zero values. If such errors are detected,
    the node will be set to a DOWN state and the slurmd daemon will shut
    down. The default value is "/sys/devices/system/edac/mc". See also
    UmeCheckInterval.

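A minimal Python sketch of the McPath check, assuming the "mc#/csrow#/ue_count" form can be matched with the glob pattern mc*/csrow*/ue_count. ume_detected is a hypothetical helper; it only reports whether a non-zero count exists and does not set node state, stop slurmd, or schedule repeated checks (UmeCheckInterval).

```python
import glob
import os


def ume_detected(mc_path="/sys/devices/system/edac/mc"):
    """Return True if any ue_count file under mc_path is non-zero (hypothetical)."""
    pattern = os.path.join(mc_path, "mc*", "csrow*", "ue_count")
    for counter_file in glob.glob(pattern):
        try:
            with open(counter_file) as f:
                # An empty file is treated as zero (assumption).
                if int(f.read().strip() or 0) != 0:
                    return True
        except (OSError, ValueError):
            continue  # unreadable or malformed counter: skip it (assumption)
    return False
```

Pointing mc_path at a scratch directory makes the check easy to exercise on a machine without EDAC devices.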
NodeRebootWeight
    If a compute node requires a reboot to be usable for a pending job,
    then reset the node's weight to the specified value. The default
    value is 4,294,967,294 (0xfffffffe). See also "Weight" in the node
    configuration specification of slurm.conf.

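Because slurm.conf allocates nodes with the lowest Weight first when all else is equal, a very large NodeRebootWeight pushes nodes that would need a reboot to the back of the line. The sketch below models that preference; effective_weight and the node names are hypothetical.

```python
NODE_REBOOT_WEIGHT = 0xFFFFFFFE  # documented default: 4,294,967,294


def effective_weight(configured_weight, needs_reboot, reboot_weight=NODE_REBOOT_WEIGHT):
    """Weight used for a node while a pending job is placed (hypothetical)."""
    return reboot_weight if needs_reboot else configured_weight


# Hypothetical nodes: (name, slurm.conf Weight, needs a reboot for the job?)
nodes = [("nid001", 1, True), ("nid002", 10, False)]
# All else being equal, the node with the lowest weight is preferred.
best = min(nodes, key=lambda n: effective_weight(n[1], n[2]))
print(best[0])  # nid002
```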
NumaCpuBind
    Contains pairs of NUMA modes and the CpuBind mode to set a node to
    for that NUMA mode. Any compute node found with, or set to, the
    specified NUMA mode will have that node's CpuBind field set to the
    configured value. Each NUMA mode is followed by an equal sign and the
    desired CpuBind mode for that NUMA mode. Multiple NUMA mode and
    CpuBind mode pairs should be given as a semicolon-separated list. By
    default, changes to a node's NUMA mode will not affect that node's
    CpuBind mode. See the example below.

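The NumaCpuBind value format (semicolon-separated "numa_mode=cpu_bind" pairs) can be parsed as sketched below, using the value from the example file. parse_numa_cpu_bind is a hypothetical helper; Slurm's own parser may differ in how it handles whitespace or errors.

```python
def parse_numa_cpu_bind(value):
    """Split 'mode=bind;mode=bind' into a dict (hypothetical helper)."""
    mapping = {}
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        numa_mode, sep, cpu_bind = pair.partition("=")
        if not sep or not numa_mode.strip() or not cpu_bind.strip():
            raise ValueError(f"malformed NumaCpuBind entry: {pair!r}")
        mapping[numa_mode.strip()] = cpu_bind.strip()
    return mapping


binding = parse_numa_cpu_bind("a2a=core;snc2=thread;snc4=thread")
print(binding.get("snc2"))  # thread
print(binding.get("quad"))  # None: no entry, so CpuBind is left unchanged
```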
SyscfgPath
    Fully qualified path to Intel's syscfg program, which identifies the
    current KNL configuration by viewing BIOS settings. If not defined,
    the current BIOS setting will not be available. The default value is
    "/usr/bin/syscfg". This parameter is used only by the "knl_generic"
    plugin.

SyscfgTimeout
    Timeout for the syscfg program, in milliseconds. The default value is
    1000 milliseconds. For Dell KNL systems, experience has shown that a
    higher value of 10000 milliseconds is more appropriate.

SystemType
    Used to distinguish the flavor of KNL system being managed. Possible
    options are "Dell" and "Intel". The default value is "Intel". This
    parameter is used only by the "knl_generic" plugin.

UmeCheckInterval
    Interval, in microseconds, between checks for Uncorrectable Memory
    Errors (UME). If such errors are detected, the node will be set to a
    DOWN state and the slurmd daemon will shut down. The default value is
    0 (disabled). See also McPath.

ValidateMode
    If set to 1, validate, but do not modify, the node's MCDRAM and NUMA
    modes configured in the slurm.conf file. If the actual modes do not
    match the configured values, the node will be set to a DOWN state.
    Every KNL node's MCDRAM and NUMA states must both be listed in the
    slurm.conf file. This parameter is used only by the "knl_cray"
    plugin.

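The ValidateMode comparison can be sketched as follows, with the configured and reported modes modeled as small dictionaries (an assumption; in practice the configured modes come from the node's entry in slurm.conf). modes_match is hypothetical: it returns False where the plugin would set the node DOWN, and it never changes the node's modes.

```python
def modes_match(configured, actual):
    """Return False when the node should be set DOWN (hypothetical helper).

    `configured` holds the MCDRAM and NUMA modes listed for the node in
    slurm.conf; `actual` holds the modes the node reports.
    """
    for kind in ("mcdram", "numa"):
        if kind not in configured:
            # With ValidateMode=1 both modes must be listed in slurm.conf.
            raise ValueError(f"{kind} mode missing from slurm.conf")
        if configured[kind] != actual.get(kind):
            return False  # mismatch: validated only, never modified
    return True


print(modes_match({"mcdram": "flat", "numa": "a2a"},
                  {"mcdram": "cache", "numa": "a2a"}))  # False
```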
    ###################################################################
    # knl_cray.conf
    # Slurm configuration file for Intel Knights Landing on Cray system
    ###################################################################
    CapmcPath=/opt/cray/capmc/default/bin/capmc
    CapmcTimeout=6000
    DefaultMCDRAM=flat
    DefaultNUMA=a2a
    NumaCpuBind=a2a=core;snc2=thread;snc4=thread
    LogFile=/var/tmp/slurm_node_feature.log
    SyscfgPath=/usr/sbin/syscfg

Copyright (C) 2015-2017 SchedMD LLC.

This file is part of Slurm, a resource management program. For
details, see <https://slurm.schedmd.com/>.

Slurm is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your
option) any later version.

Slurm is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

slurm.conf(5)

May 2018                   Slurm Configuration File                knl.conf(5)