knl.conf(5)                Slurm Configuration File                knl.conf(5)

NAME
       knl.conf - Slurm configuration file for Intel Knights Landing
       processors.

DESCRIPTION
       This ASCII file describes configuration information for Intel Knights
       Landing processors. Its name may depend on the NodeFeatures plugin
       configured in Slurm. For example, on Cray systems NodeFeatures should
       be set to "knl_cray", and the configuration file will then be read
       from "knl_cray.conf". The file is always located in the same directory
       as slurm.conf. This file is optional.

       Parameter names are case insensitive. Any text following a "#" in the
       configuration file is treated as a comment through the end of that
       line. Changes to the configuration file take effect upon restart of
       the Slurm daemons, daemon receipt of the SIGHUP signal, or execution
       of the command "scontrol reconfigure", unless otherwise noted.
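
       The comment and case rules above can be illustrated with a minimal
       Python sketch. It is not Slurm's own parser. The function name
       parse_knl_conf is hypothetical, and the sketch assumes one
       "Name=Value" pair per line, as in the example file in this page.

```python
def parse_knl_conf(text):
    """Parse 'Name=Value' lines into a dict keyed by lower-cased name."""
    params = {}
    for raw_line in text.splitlines():
        # Everything after "#" is a comment through the end of the line.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        name, _, value = line.partition("=")
        # Parameter names are case insensitive, so normalize them.
        params[name.strip().lower()] = value.strip()
    return params


sample = """
# knl.conf
DefaultMCDRAM=flat   # trailing comment
defaultnuma=a2a
"""
config = parse_knl_conf(sample)
print(config)
```

       Splitting on the first "=" only keeps values such as those of
       NumaCpuBind intact, since they contain further "=" characters.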

       The overall configuration parameters available include:

       AllowMCDRAM
              Specify the MCDRAM modes which jobs are allowed to use. This
              may be a subset of the MCDRAM modes supported by the node. If
              not specified, all MCDRAM modes supported by the node are
              available for use. The comma-separated list of allowed MCDRAM
              modes may include any of the modes listed below.

              cache   All of MCDRAM to be used as cache.

              equal   MCDRAM to be used partly as cache and partly combined
                      with primary memory.

              flat    MCDRAM to be combined with primary memory into a
                      "flat" memory space.

       AllowNUMA
              Specify the NUMA modes which jobs are allowed to use. This may
              be a subset of the NUMA modes supported by the node. If not
              specified, all NUMA modes supported by the node are available
              for use. The comma-separated list of allowed NUMA modes may
              include any of the modes listed below. Note that Slurm can
              only support homogeneous nodes (e.g. the same number of cores
              per NUMA node). KNL snc4 and quad modes are not homogeneous:
              each NUMA node will have either 16 or 18 cores. Slurm will
              use the lower core count, find a total of 256 threads rather
              than 272 threads, and set the node to a DOWN state. It is
              therefore recommended that snc4 and quad modes not be allowed
              at this time.

              a2a     All to all

              snc2    Sub-NUMA cluster 2

              snc4    Sub-NUMA cluster 4

              hemi    Hemisphere

              quad    Quadrant
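
              The 256 versus 272 thread counts mentioned above can be
              reproduced with a short Python sketch. The layout of four
              NUMA nodes with 18, 18, 16 and 16 cores and the value of 4
              hardware threads per core are assumptions chosen to match
              those totals; the function names are hypothetical.

```python
THREADS_PER_CORE = 4  # assumption: 4 hardware threads per KNL core


def threads_seen_by_slurm(cores_per_numa_node):
    """Thread count when every NUMA node is treated as having the lowest core count."""
    cores = min(cores_per_numa_node) * len(cores_per_numa_node)
    return cores * THREADS_PER_CORE


def threads_present(cores_per_numa_node):
    """Thread count actually provided by the hardware."""
    return sum(cores_per_numa_node) * THREADS_PER_CORE


snc4_layout = [18, 18, 16, 16]  # assumption: cores per NUMA node in snc4 mode
print(threads_seen_by_slurm(snc4_layout), threads_present(snc4_layout))  # prints "256 272"
```

              The mismatch between the two totals is what leads to the node
              being set to a DOWN state.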

       AllowUserBoot
              A comma-delimited list of users allowed to modify a node's
              MCDRAM or NUMA state. If not specified, then any user can
              change a node's state and reboot it.

       BootTime
              Estimated time to reboot a node, in seconds. Used as a basis
              for optimizing scheduling decisions. The default value is 300
              seconds (5 minutes) for the "knl_generic" plugin and 2700
              seconds (45 minutes) for the "knl_cray" plugin.

       CapmcPath
              Fully qualified path to the capmc program. The default value
              is "/opt/cray/capmc/default/bin/capmc". This parameter is used
              only by the "knl_cray" plugin.

       CapmcPollFreq
              Interval, in seconds, at which the capmc program should poll
              for node state changes. The default value is 45 seconds. This
              parameter is used only by the "knl_cray" plugin.

       CapmcRetries
              Number of times to retry failed operations of the capmc
              program. The default value is 4.

       CapmcTimeout
              Time limit, in milliseconds, for the capmc program to return
              status information. The default value is 60000 milliseconds
              and the minimum value is 1000 milliseconds. This parameter is
              used by the "knl_cray" plugin, plus the capmc_suspend and
              capmc_resume programs used for suspending and resuming nodes.

       CnselectPath
              Fully qualified path to the cnselect program. The default
              value is "/opt/cray/sdb/default/bin/cnselect". This parameter
              is used only by the "knl_cray" plugin.

       DefaultMCDRAM
              Specify the default MCDRAM mode for jobs which do not specify
              a value. This is only used when a node is booted and the job
              which has been allocated the node does not specify a desired
              MCDRAM mode. The value may be one of the modes listed for the
              AllowMCDRAM configuration parameter above. The default value
              is "cache".

       DefaultNUMA
              Specify the default NUMA mode for jobs which do not specify a
              value. This is only used when a node is booted and the job
              which has been allocated the node does not specify a desired
              NUMA mode. The value may be one of the modes listed for the
              AllowNUMA configuration parameter above. The default value is
              "a2a".
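
              The mode names accepted by the AllowMCDRAM, AllowNUMA,
              DefaultMCDRAM and DefaultNUMA parameters can be checked with a
              small Python sketch. It is only a sanity check for a
              hand-written file, not Slurm's own validation. The helper name
              parse_mode_list is hypothetical, and lower-casing the values
              before comparison is an assumption of this sketch.

```python
MCDRAM_MODES = {"cache", "equal", "flat"}
NUMA_MODES = {"a2a", "snc2", "snc4", "hemi", "quad"}


def parse_mode_list(value, known_modes):
    """Split a comma-separated mode list and reject names not in known_modes."""
    modes = [item.strip().lower() for item in value.split(",") if item.strip()]
    unknown = [mode for mode in modes if mode not in known_modes]
    if unknown:
        raise ValueError("unknown mode(s): " + ", ".join(unknown))
    return modes


allowed_mcdram = parse_mode_list("cache,flat", MCDRAM_MODES)
allowed_numa = parse_mode_list("a2a,snc2,hemi", NUMA_MODES)
print(allowed_mcdram, allowed_numa)
```

              A misspelled name such as "scn4" raises ValueError instead of
              being silently accepted.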

       Force  If set to a non-zero value then load the node_features/generic
              plugin even on non-KNL nodes. Used primarily for testing
              purposes.

       LogFile
              Fully qualified path to a log file. The default value is
              SlurmctldLogFile from the slurm.conf configuration file. This
              option is used only by the capmc_suspend and capmc_resume
              programs (which power down and reboot nodes in the appropriate
              configuration).

       McPath Fully qualified path to the memory controller device file
              directory. Children of this directory with names of the form
              "mc#/csrow#/ue_count" (i.e. the count of unrecoverable memory
              errors) will be monitored for non-zero values. If such errors
              are detected, the node will be set to a DOWN state and the
              slurmd daemon will shut down. The default value is
              "/sys/devices/system/edac/mc". See also UmeCheckInterval.
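
              The file layout described above can be inspected by hand with
              a Python sketch such as the following. It only illustrates
              the "mc#/csrow#/ue_count" naming scheme and is not how slurmd
              performs the check. The function name
              find_uncorrectable_errors is hypothetical, and skipping
              unreadable files is a choice made for this sketch.

```python
import glob
import os


def find_uncorrectable_errors(mc_path):
    """Return {file_path: count} for every ue_count file holding a non-zero value."""
    errors = {}
    pattern = os.path.join(mc_path, "mc*", "csrow*", "ue_count")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                count = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric file: skipped in this sketch
        if count != 0:
            errors[path] = count
    return errors
```

              Calling find_uncorrectable_errors("/sys/devices/system/edac/mc")
              returns an empty dictionary when no errors have been counted
              or when the directory does not exist.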

       NodeRebootWeight
              If a compute node requires a reboot to be usable for a pending
              job, then reset the node's weight to the specified value. The
              default value is 4,294,967,294 (0xfffffffe). See also "Weight"
              in the node configuration specification of slurm.conf.

       NumaCpuBind
              Contains pairs of NUMA modes and the CpuBind mode to set a
              node to for that mode. Any compute node found with or set to
              the specified NUMA mode will have that node's CpuBind field
              set to the configured value. Each NUMA mode is followed by an
              equal sign and the desired CpuBind mode for that NUMA mode.
              Multiple NUMA mode and CpuBind mode pairs should be given as a
              semicolon-separated list. By default, changes to a node's NUMA
              mode will not affect that node's CpuBind mode. See the example
              below.
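
              The pair syntax can be made concrete with a Python sketch
              that splits a NumaCpuBind value into a mapping from NUMA mode
              to CpuBind mode. The function name parse_numa_cpu_bind is
              hypothetical, and rejecting malformed pairs with an exception
              is a choice made for this sketch rather than documented Slurm
              behavior.

```python
def parse_numa_cpu_bind(value):
    """Turn 'mode=bind;mode=bind' into a {numa_mode: cpu_bind_mode} dict."""
    mapping = {}
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        numa_mode, sep, cpu_bind = pair.partition("=")
        numa_mode, cpu_bind = numa_mode.strip(), cpu_bind.strip()
        if not sep or not numa_mode or not cpu_bind:
            raise ValueError("malformed NumaCpuBind entry: " + repr(pair))
        mapping[numa_mode] = cpu_bind
    return mapping


bindings = parse_numa_cpu_bind("a2a=core;snc2=thread;snc4=thread")
print(bindings)
```

              NUMA modes that do not appear in the list, such as hemi here,
              have no entry in the mapping, matching the default of leaving
              CpuBind unchanged.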

       SyscfgPath
              Fully qualified path to Intel's syscfg program, which
              identifies the current KNL configuration by viewing BIOS
              settings. If not defined, the current BIOS setting will not be
              available. The default value is "/usr/bin/syscfg". This
              parameter is used only by the "knl_generic" plugin.

       SyscfgTimeout
              Timeout for the syscfg program, in milliseconds. The default
              value is 1000 milliseconds. For Dell KNL systems, experience
              has shown that a higher value of 10000 milliseconds is more
              appropriate.

       SystemType
              Identifies the type of KNL system in use. Possible options are
              "Dell" and "Intel". The default value is "Intel". This
              parameter is used only by the "knl_generic" plugin.

       UmeCheckInterval
              Interval, in microseconds, between checks for Uncorrectable
              Memory Errors (UME). If such errors are detected, the node
              will be set to a DOWN state and the slurmd daemon will shut
              down. The default value is 0 (disabled). See also McPath.

       ValidateMode
              If set to 1, then validate, but do not modify, the node's
              configured MCDRAM and NUMA modes from the slurm.conf file. If
              the actual modes do not match the configured values, the node
              will be set to a DOWN state. Every KNL node's MCDRAM and NUMA
              states must both be listed in the slurm.conf file. This
              parameter is used only by the "knl_cray" plugin.

EXAMPLE
       ###################################################################
       # knl_cray.conf
       # Slurm configuration file for Intel Knights Landing on Cray system
       ###################################################################
       CapmcPath=/opt/cray/capmc/default/bin/capmc
       CapmcTimeout=6000
       DefaultMCDRAM=flat
       DefaultNUMA=a2a
       NumaCpuBind=a2a=core;snc2=thread;snc4=thread
       LogFile=/var/tmp/slurm_node_feature.log
       SyscfgPath=/usr/sbin/syscfg

COPYING
       Copyright (C) 2015-2022 SchedMD LLC.

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.

SEE ALSO
       slurm.conf(5)

January 2022               Slurm Configuration File                knl.conf(5)