HWLOC-CALC(1) hwloc HWLOC-CALC(1)

NAME
hwloc-calc - Operate on CPU mask strings and objects

SYNOPSIS
hwloc-calc [topology options] [options] <location1> [<location2> [...] ]
11
12 Note that hwloc(7) provides a detailed explanation of the hwloc system
13 and of valid <location> formats; it should be read before reading this
14 man page.

TOPOLOGY OPTIONS
All topology options must be given before all other options.
18
19 --no-smt, --no-smt=<N>
20 Only keep the first PU per core in the input locations. If
21 <N> is specified, keep the <N>-th instead, if any. PUs are
22 ordered by physical index during this filtering.
23
24 Note that this option is applied after searching locations.
25 Hence --no-smt pu:2-5 will first select the PUs #2 to #5 in
26 the machine before keeping one of them per core. To rather
27 get PUs #2 to #5 after filtering one per core, you should
28 combine invocations:
29
30 hwloc-calc --restrict $(hwloc-calc --no-smt all) pu:2-5
31
32
33 --cpukind <n>, --cpukind <infoname>=<infovalue>
Only keep PUs whose CPU kind matches. Either a single CPU kind
35 is specified as an index, or the info attribute name-value
36 will select matching kinds.
37
38 When specified by index, it corresponds to hwloc ranking of
39 CPU kinds which returns energy-efficient cores first, and
40 high-performance power-hungry cores last. The full list of
41 CPU kinds may be seen with lstopo --cpukinds.
42
43 Note that this option is applied after searching locations.
44 Hence --cpukind 0 core:1 will return the second core of the
45 machine if it is of kind 0, and nothing otherwise. To rather
46 get the second core among those of kind 0, you should combine
47 invocations:
48
49 hwloc-calc --restrict $(hwloc-calc --cpukind 0 all) core:1
50
51
52 --restrict <cpuset>
53 Restrict the topology to the given cpuset. This removes some
54 PUs and their now-child-less parents.
55
56 This is useful when combining invocations to filter some ob‐
57 jects before selecting among them.
58
59 Beware that restricting the PUs in a topology may change the
60 logical indexes of many objects, including NUMA nodes.
61
62 --restrict nodeset=<nodeset>
63 Restrict the topology to the given nodeset (unless --re‐
64 strict-flags specifies something different). This removes
65 some NUMA nodes and their now-child-less parents.
66
67 Beware that restricting the NUMA nodes in a topology may
68 change the logical indexes of many objects, including PUs.
69
70 --restrict-flags <flags>
71 Enforce flags when restricting the topology. Flags may be
72 given as numeric values or as a comma-separated list of flag
73 names that are passed to hwloc_topology_restrict(). Those
74 names may be substrings of actual flag names as long as a
75 single one matches, for instance bynodeset,memless. The de‐
76 fault is 0 (or none).
77
78 --disallowed
79 Include objects disallowed by administrative limitations.
80
81 -i <path>, --input <path>
82 Read the topology from <path> instead of discovering the
83 topology of the local machine.
84
If <path> is a file, it may be an XML file exported by a previous
hwloc program. If <path> is "-", the standard input may be used
as an XML file.
88
On Linux, <path> may be a directory containing the topology
files gathered from another machine with hwloc-gather-topology.
92
93 On x86, <path> may be a directory containing a cpuid dump
94 gathered with hwloc-gather-cpuid.
95
96 When the archivemount program is available, <path> may also
97 be a tarball containing such Linux or x86 topology files.
98
99 -i <specification>, --input <specification>
100 Simulate a fake hierarchy (instead of discovering the topol‐
101 ogy on the local machine). If <specification> is "node:2
102 pu:3", the topology will contain two NUMA nodes with 3 pro‐
103 cessing units in each of them. The <specification> string
104 must end with a number of PUs.
105
106 --if <format>, --input-format <format>
107 Enforce the input in the given format, among xml, fsroot,
108 cpuid and synthetic.

OPTIONS
All these options must be given after all topology options above.
112
113 -p --physical
114 Use OS/physical indexes instead of logical indexes for both
115 input and output.
116
117 -l --logical
118 Use logical indexes instead of physical/OS indexes for both
119 input and output (default).
120
121 --pi --physical-input
122 Use OS/physical indexes instead of logical indexes for input.
123
124 --li --logical-input
125 Use logical indexes instead of physical/OS indexes for input
126 (default).
127
128 --po --physical-output
129 Use OS/physical indexes instead of logical indexes for out‐
130 put.
131
132 --lo --logical-output
133 Use logical indexes instead of physical/OS indexes for output
134 (default, except for cpusets which are always physical).
135
136 -n --nodeset
137 Interpret both input and output sets as nodesets instead of
138 CPU sets. See --nodeset-output and --nodeset-input below for
139 details.
140
141 --no --nodeset-output
142 Report nodesets instead of CPU sets. This output is more
143 precise than the default CPU set output when memory locality
144 matters because it properly describes CPU-less NUMA nodes, as
well as NUMA nodes that are local to multiple CPUs.
146
147 --ni --nodeset-input
148 Interpret input sets as nodesets instead of CPU sets.
149
150 --oo --object-output
151 When reporting object indexes (e.g. with -I or --local-mem‐
152 ory), this option prefixes these indexes with types (e.g.
153 Core:0 instead of 0).
154
155 -N --number-of <type|depth>
156 Report the number of objects of the given type or depth that
157 intersect the CPU set. This is convenient for finding how
158 many cores, NUMA nodes or PUs are available in a machine.
159
160 When combined with --nodeset or --nodeset-output, the nodeset
161 is considered instead of the CPU set for finding matching ob‐
162 jects. This is useful when reporting the output as a number
163 or set of NUMA nodes.
164
<type> may contain a filter to select specific objects among
the type. For instance -N "numa[hbm]" counts NUMA nodes
marked with subtype "HBM", while -N "numa[mcdram]" only
counts MCDRAM NUMA nodes on KNL.

If an OS device subtype such as gpu is given instead of osdev,
only the OS devices of that subtype will be counted.
172
173 -I --intersect <type|depth>
174 Find the list of objects of the given type or depth that in‐
175 tersect the CPU set and report the comma-separated list of
176 their indexes instead of the cpu mask string. This may be
177 used for determining the list of objects above or below the
178 input objects.
179
180 When combined with --physical, the list is convenient to pass
181 to external tools such as taskset or numactl --physcpubind or
182 --membind. This is different from --largest since the latter
183 requires that all reported objects are strictly included in‐
184 side the input objects.
185
186 When combined with --nodeset or --nodeset-output, the nodeset
187 is considered instead of the CPU set for finding matching ob‐
188 jects. This is useful when reporting the output as a number
189 or set of NUMA nodes.
190
<type> may contain a filter to select specific objects among
the type. For instance -I "numa[hbm]" lists NUMA nodes marked
with subtype "HBM", while -I "numa[mcdram]" only lists MCDRAM
NUMA nodes on KNL.

If an OS device subtype such as gpu is given instead of osdev,
only the OS devices of that subtype will be returned.
198
199 If combined with --object-output, object indexes are prefixed
200 with types (e.g. Core:0 instead of 0).
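
The combination of -I with --physical described above can be wrapped in a small helper. This is only a sketch: the function name physical_pus_of and the HWLOC_CALC override variable are hypothetical and are not part of hwloc; the hwloc-calc options are the ones documented in this page.

```shell
# Hypothetical helper: print the comma-separated physical PU indexes
# covered by a location, in a form suitable for taskset -c.
# HWLOC_CALC is an assumed override (defaults to hwloc-calc on PATH).
physical_pus_of() {
    "${HWLOC_CALC:-hwloc-calc}" --physical --intersect PU "$1"
}

# Possible usage (not executed here):
#   taskset -c "$(physical_pus_of package:1)" ./my_app
```

Keeping the hwloc-calc call in one place makes it easy to swap the target tool, for instance numactl --physcpubind instead of taskset.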
201
202 -H --hierarchical <type1>.<type2>...
203 Find the list of objects of type <type2> that intersect the
204 CPU set and report the space-separated list of their hierar‐
205 chical indexes with respect to <type1>, <type2>, etc. For
206 instance, if package.core is given, the output would be Pack‐
207 age:1.Core:2 Package:2.Core:3 if the input contains the third
208 core of the second package and the fourth core of the third
209 package.
210
211 Only normal CPU-side object types should be used.
212
213 NUMA nodes may be used but they may cause redundancy in the
output on heterogeneous memory platforms. For instance, on a
215 platform with both DRAM and HBM memory on a package, the
216 first core will be considered both as first core of first
217 NUMA node (DRAM) and as first core of second NUMA node (HBM).
218
219 --largest Report (in a human readable format) the list of largest ob‐
220 jects which exactly include all input objects (by looking at
221 their CPU sets). None of these output objects intersect each
222 other, and the sum of them is exactly equivalent to the in‐
223 put. No larger object is included in the input.
224
225 This is different from --intersect where reported objects may
226 not be strictly included in the input.
227
228 --local-memory
229 Report the list of NUMA nodes that are local to the input ob‐
230 jects.
231
232 This option is similar to -I numa but the way nodes are se‐
233 lected is different: The selection performed by --local-mem‐
234 ory may be precisely configured with --local-memory-flags,
235 while -I numa just selects all nodes that are somehow local
236 to any of the input objects.
237
238 If combined with --object-output, object indexes are prefixed
239 with types (e.g. NUMANode:0 instead of 0).
240
241 --local-memory-flags
242 Change the flags used to select local NUMA nodes. Flags may
243 be given as numeric values or as a comma-separated list of
244 flag names that are passed to hwloc_get_local_numan‐
245 ode_objs(). Those names may be substrings of actual flag
246 names as long as a single one matches. The default is 3 (or
247 smaller,larger) which means NUMA nodes are displayed if their
248 locality either contains or is contained in the locality of
249 the given object.
250
251 This option enables --local-memory.
252
253 --best-memattr <name>
254 Enable the listing of local memory nodes with --local-memory,
255 but only display the local node that has the best value for
256 the memory attribute given by <name> (or as an index).
257
258 If the memory attribute values depend on the initiator, the
259 hwloc-calc input objects are used as the initiator.
260
261 Standard attribute names are Capacity, Locality, Bandwidth,
262 and Latency. All existing attributes in the current topology
263 may be listed with
264
265 $ lstopo --memattrs
266
267 If combined with --object-output, the object index is pre‐
268 fixed with its type (e.g. NUMANode:0 instead of 0).
269
270 --sep <sep>
271 Change the field separator in the output. By default, a
272 space is used to separate output objects (for instance when
273 --hierarchical or --largest is given) while a comma is used
274 to separate indexes (for instance when --intersect is given).
275
276 --single Singlify the output to a single CPU.
277
278 --taskset Display CPU set strings in the format recognized by the
279 taskset command-line program instead of hwloc-specific CPU
280 set string format. This option has no impact on the format
of input CPU set strings; both formats are always accepted.
282
283 -q --quiet
Hide non-fatal error messages. These mostly concern locations
that point to non-existent objects.
286
287 -v --verbose
288 Verbose output.
289
290 --version Report version and exit.
291
292 -h --help Display help message and exit.

DESCRIPTION
295 hwloc-calc generates and manipulates CPU mask strings or objects. Both
296 input and output may be either objects (with physical or logical in‐
297 dexes), CPU lists (with physical or logical indexes), or CPU mask
298 strings (always physically indexed). Input location specification is
299 described in hwloc(7).
300
If objects or CPU mask strings are given on the command-line, they are
combined and a single output is printed. If no object or CPU mask
string is given on the command-line, the program reads the standard
input. Objects or CPU mask strings given on the same input line,
separated by spaces, are combined. Different input lines are processed
separately.
307
Command-line arguments and options are processed in order. Topology
configuration options should be given first. Then, for instance,
changing the type of input indexes with --li or changing the input
topology with -i only affects the processing of the following arguments.
312
313 NOTE: It is highly recommended that you read the hwloc(7) overview page
314 before reading this man page. Most of the concepts described in
315 hwloc(7) directly apply to the hwloc-calc utility.

EXAMPLES
hwloc-calc's operation is best described through several examples.
319
320 To display the (physical) CPU mask corresponding to the second package:
321
322 $ hwloc-calc package:1
323 0x000000f0
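
To illustrate how such a mask reads, the following sketch decodes a single-word mask into the list of physical PU indexes whose bits are set. The helper name mask_to_pus is hypothetical and not part of hwloc, and it assumes the mask fits in the shell's signed integer arithmetic (no comma-separated multi-word masks).

```shell
# Hypothetical helper: list the bit positions set in a single-word CPU mask.
mask_to_pus() {
    mask=$(( $1 ))      # accepts hexadecimal input such as 0x000000f0
    bit=0
    out=
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out,}$bit"   # append with a comma unless first
        fi
        mask=$(( mask >> 1 ))
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "$out"
}

mask_to_pus 0x000000f0   # prints 4,5,6,7
```

The indexes printed are physical ones, since CPU mask strings are always physically indexed; they need not match the logical indexes reported by --intersect.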
324
To display the (physical) CPU mask corresponding to the third package,
excluding its even-numbered logical processors:
327
328 $ hwloc-calc package:2 ~PU:even
329 0x00000c00
330
331 To convert a cpu mask to human-readable output, the -H option can be
332 used to emit a space-delimited list of locations:
333
334 $ echo 0x000000f0 | hwloc-calc -H package.core
Package:1.Core:0 Package:1.Core:1 Package:1.Core:2 Package:1.Core:3
336
337 To use some other character (e.g., a comma) instead of spaces in out‐
338 put, use the --sep option:
339
340 $ echo 0x000000f0 | hwloc-calc -H package.core --sep ,
Package:1.Core:0,Package:1.Core:1,Package:1.Core:2,Package:1.Core:3
342
343 To combine two (physical) CPU masks:
344
345 $ hwloc-calc 0x0000ffff 0xff000000
346 0xff00ffff
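
For single-word masks, the combination shown above amounts to a bitwise OR. The sketch below reproduces it with plain shell arithmetic; combine_masks is a hypothetical name, and the approach assumes masks that fit in the shell's integer type.

```shell
# Hypothetical helper: OR together any number of single-word masks.
combine_masks() {
    acc=0
    for m in "$@"; do
        acc=$(( acc | $m ))
    done
    printf '0x%08x\n' "$acc"
}

combine_masks 0x0000ffff 0xff000000   # prints 0xff00ffff
```

hwloc-calc remains the right tool for real masks, since it also handles multi-word masks and object locations.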
347
348 To display the list of logical numbers of processors included in the
349 second package:
350
351 $ hwloc-calc --intersect PU package:1
352 4,5,6,7
353
354 To bind GNU OpenMP threads logically over the whole machine, we need to
355 use physical number output instead:
356
$ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU all`
359 $ echo $GOMP_CPU_AFFINITY
360 0,4,1,5,2,6,3,7
361
362 To display the list of NUMA nodes, by physical indexes, that intersect
363 a given (physical) CPU mask:
364
365 $ hwloc-calc --physical --intersect NUMAnode 0xf0f0f0f0
366 0,2
367
368 To find how many cores are in the second CPU kind (those cores are
369 likely higher-performance and more power-hungry than cores of the first
370 kind):
371
372 $ hwloc-calc --cpukind 1 -N core all
373 4
374
375 To display the list of NUMA nodes, by physical indexes, whose locality
376 is exactly equal to a Package:
377
378 $ hwloc-calc --local-memory-flags 0 --physical-output pack:1
379 4,7
380
381 To display the best-capacity NUMA node, by physical indexes, whose lo‐
382 cality is exactly equal to a Package:
383
$ hwloc-calc --local-memory-flags 0 --best-memattr capacity --physical-output pack:1
386 4
387
388 To find the number of NUMA nodes with subtype "HBM":
389
390 $ hwloc-calc -N "numa[hbm]" all
391 4
392
393 To find the number of NUMA nodes in memory tier 1 (DRAM nodes on a
394 server with HBM and DRAM):
395
396 $ hwloc-calc -N "numa[tier=1]" all
397 4
398
399 To find the NUMA node of subtype MCDRAM (on KNL) near a PU:
400
401 $ hwloc-calc -I "numa[mcdram]" pu:157
402 1
403
404 Converting object logical indexes (default) from/to physical/OS indexes
405 may be performed with --intersect combined with either --physical-out‐
406 put (logical to physical conversion) or --physical-input (physical to
407 logical):
408
409 $ hwloc-calc --physical-output PU:2 --intersect PU
410 3
411 $ hwloc-calc --physical-input PU:3 --intersect PU
412 2
413
414 One should add --nodeset when converting indexes of memory objects to
415 make sure a single NUMA node index is returned on platforms with het‐
416 erogeneous memory:
417
418 $ hwloc-calc --nodeset --physical-output node:2 --intersect node
419 3
420 $ hwloc-calc --nodeset --physical-input node:3 --intersect node
421 2
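
The conversions above can be wrapped in small functions when they are needed repeatedly in scripts. This is a sketch only: the function names and the HWLOC_CALC override variable are hypothetical, while the hwloc-calc invocations are copied from the examples above.

```shell
# HWLOC_CALC is an assumed override (defaults to hwloc-calc on PATH).

# Logical PU index -> physical/OS PU index.
pu_logical_to_physical() {
    "${HWLOC_CALC:-hwloc-calc}" --physical-output "PU:$1" --intersect PU
}

# Physical/OS PU index -> logical PU index.
pu_physical_to_logical() {
    "${HWLOC_CALC:-hwloc-calc}" --physical-input "PU:$1" --intersect PU
}

# Logical NUMA node index -> physical index, using --nodeset so that a
# single node is returned on platforms with heterogeneous memory.
node_logical_to_physical() {
    "${HWLOC_CALC:-hwloc-calc}" --nodeset --physical-output "node:$1" --intersect node
}
```

On the machine used in the examples above, pu_logical_to_physical 2 would print 3; results on other machines depend on their topology.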
422
423 To display the set of CPUs near network interface eth0:
424
425 $ hwloc-calc os=eth0
426 0x00005555
427
428 To display the indexes of packages near PCI device whose bus ID is
429 0000:01:02.0:
430
431 $ hwloc-calc pci=0000:01:02.0 --intersect Package
432 1
433
434 To display the list of per-package cores that intersect the input:
435
436 $ hwloc-calc 0x00003c00 --hierarchical package.core
437 Package:2.Core:1 Package:3.Core:0
438
439 To display the (physical) CPU mask of the entire topology except the
440 third package:
441
$ hwloc-calc all ~package:2
443 0x0000f0ff
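
For single-word masks, excluding a location corresponds to a bitwise AND with the complement of its mask. The sketch below assumes a 16-PU machine whose full mask is 0x0000ffff and whose third package covers 0x00000f00; both values are illustrative assumptions, and exclude_mask is a hypothetical name.

```shell
# Hypothetical helper: remove the bits of the second mask from the first.
exclude_mask() {
    printf '0x%08x\n' $(( $1 & ~$2 ))
}

exclude_mask 0x0000ffff 0x00000f00   # prints 0x0000f0ff
```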
444
445 To combine both physical and logical indexes as input:
446
447 $ hwloc-calc PU:2 --physical-input PU:3
448 0x0000000c
449
To synthesize a set of cores into the largest objects on a 2-node
2-package 2-core machine:
452
453 $ hwloc-calc core:0 --largest
454 Core:0
455 $ hwloc-calc core:0-1 --largest
456 Package:0
457 $ hwloc-calc core:4-7 --largest
458 NUMANode:1
459 $ hwloc-calc core:2-6 --largest
460 Package:1 Package:2 Core:6
461 $ hwloc-calc pack:2 --largest
462 Package:2
463 $ hwloc-calc package:2-3 --largest
464 NUMANode:1
465
466 To get the set of first threads of all cores:
467
468 $ hwloc-calc core:all.pu:0
469 $ hwloc-calc --no-smt all
470
471 This can also be very useful in order to make GNU OpenMP use exactly
472 one thread per core, and in logical core order:
473
474 $ export OMP_NUM_THREADS=`hwloc-calc --number-of core all`
475 $ echo $OMP_NUM_THREADS
476 4
$ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU --no-smt all`
479 $ echo $GOMP_CPU_AFFINITY
480 0,2,1,3
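
The two exports above could be grouped in a function that fails cleanly when hwloc-calc is unavailable. This is a sketch under assumptions: the function name and the HWLOC_CALC override variable are hypothetical, and --no-smt is placed first here to follow the rule that topology options precede other options.

```shell
# Hypothetical helper: export OMP_NUM_THREADS and GOMP_CPU_AFFINITY so
# that GNU OpenMP runs one thread per core, in logical core order.
# HWLOC_CALC is an assumed override (defaults to hwloc-calc on PATH).
omp_one_thread_per_core() {
    calc=${HWLOC_CALC:-hwloc-calc}
    if ! command -v "$calc" >/dev/null 2>&1; then
        echo "$calc not found" >&2
        return 1
    fi
    # Number of cores in the whole machine.
    OMP_NUM_THREADS=$("$calc" --number-of core all) || return 1
    # Physical index of the first PU of each core.
    GOMP_CPU_AFFINITY=$("$calc" --no-smt --physical-output --intersect PU all) || return 1
    export OMP_NUM_THREADS GOMP_CPU_AFFINITY
}
```

Calling omp_one_thread_per_core before launching the OpenMP program sets both variables in the current shell.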
481
To export a bitmask in a format accepted by the resctrl Linux
subsystem (for configuring cache partitioning, etc.), apply a sed
expression to the output of hwloc-calc:
485
486 $ hwloc-calc pack:all.core:7-9.pu:0
487 0x00000380,,0x00000380 <this format cannot be given to resctrl>
$ hwloc-calc pack:all.core:7-9.pu:0 | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
490 00000380,0,00000380
491 # echo 00000380,0,00000380 > /sys/fs/resctrl/test/cpus
492 # cat /sys/fs/resctrl/test/cpus
00000000,00000380,00000000,00000380 <the modified bitmask was correctly parsed by resctrl>
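
The sed expression used above can be kept in a function so that the conversion is applied consistently. The function name hwloc_mask_to_resctrl is hypothetical; the sed commands are the ones from the example. The substitution for empty fields is applied twice because a single pass misses every other gap in a run of consecutive commas.

```shell
# Hypothetical helper: strip the 0x prefixes and fill empty mask words
# with 0 so that the string can be written to a resctrl cpus file.
hwloc_mask_to_resctrl() {
    printf '%s\n' "$1" | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
}

hwloc_mask_to_resctrl '0x00000380,,0x00000380'   # prints 00000380,0,00000380
```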
495
OS devices may also be filtered by subtype. In this example, there are
9 OS devices in the system; 4 of them are near NUMA node #1, and only 2
of these are CoProcessors:
499
500 $ utils/hwloc/hwloc-calc -I osdev all
501 0,1,2,3,4,5,6,7,8
502 $ utils/hwloc/hwloc-calc -I osdev node:1
503 5,6,7,8
504 $ utils/hwloc/hwloc-calc -I coproc node:1
505 7,8

RETURN VALUE
509 Upon successful execution, hwloc-calc displays the (physical) CPU mask
510 string, (physical or logical) object list, or (physical or logical) ob‐
511 ject number list. The return value is 0.
512
513 hwloc-calc will return nonzero if any kind of error occurs, such as
514 (but not limited to): failure to parse the command line.

SEE ALSO
hwloc(7), lstopo(1), hwloc-info(1)

2.10.0 Dec 04, 2023 HWLOC-CALC(1)