HWLOC-CALC(1)                        hwloc                       HWLOC-CALC(1)

NAME

       hwloc-calc - Operate on cpu mask strings and objects

SYNOPSIS

       hwloc-calc [topology options] [options] <location1> [<location2> [...] ]

       Note that hwloc(7) provides a detailed explanation of the hwloc system
       and of valid <location> formats; it should be read before reading this
       man page.

TOPOLOGY OPTIONS

       All topology options must be given before all other options.

       --no-smt, --no-smt=<N>
                 Only keep the first PU per core in the input locations.  If
                 <N> is specified, keep the <N>-th instead, if any.  PUs are
                 ordered by physical index during this filtering.

                 Note that this option is applied after searching locations.
                 Hence --no-smt pu:2-5 will first select the PUs #2 to #5 in
                 the machine before keeping one of them per core.  To instead
                 get PUs #2 to #5 after filtering one per core, you should
                 combine invocations:

                   hwloc-calc --restrict $(hwloc-calc --no-smt all) pu:2-5

       --cpukind <n>, --cpukind <infoname>=<infovalue>
                 Only keep PUs whose CPU kind matches.  Either a single CPU
                 kind is specified as an index, or the info attribute
                 name-value pair selects matching kinds.

                 When specified by index, it corresponds to the hwloc ranking
                 of CPU kinds, which returns energy-efficient cores first and
                 high-performance power-hungry cores last.  The full list of
                 CPU kinds may be seen with lstopo --cpukinds.

                 Note that this option is applied after searching locations.
                 Hence --cpukind 0 core:1 will return the second core of the
                 machine if it is of kind 0, and nothing otherwise.  To instead
                 get the second core among those of kind 0, you should combine
                 invocations:

                   hwloc-calc --restrict $(hwloc-calc --cpukind 0 all) core:1

       --restrict <cpuset>
                 Restrict the topology to the given cpuset.  This removes some
                 PUs and their now-child-less parents.

                 This is useful when combining invocations to filter some
                 objects before selecting among them.

                 Beware that restricting the PUs in a topology may change the
                 logical indexes of many objects, including NUMA nodes.

       --restrict nodeset=<nodeset>
                 Restrict the topology to the given nodeset (unless
                 --restrict-flags specifies something different).  This
                 removes some NUMA nodes and their now-child-less parents.

                 Beware that restricting the NUMA nodes in a topology may
                 change the logical indexes of many objects, including PUs.

       --restrict-flags <flags>
                 Enforce flags when restricting the topology.  Flags may be
                 given as numeric values or as a comma-separated list of flag
                 names that are passed to hwloc_topology_restrict().  Those
                 names may be substrings of actual flag names as long as a
                 single one matches, for instance bynodeset,memless.  The
                 default is 0 (or none).

       --disallowed
                 Include objects disallowed by administrative limitations.

       -i <path>, --input <path>
                 Read the topology from <path> instead of discovering the
                 topology of the local machine.

                 If <path> is a file, it may be an XML file exported by a
                 previous hwloc program.  If <path> is "-", the standard input
                 may be used as an XML file.

                 On Linux, <path> may be a directory containing the topology
                 files gathered from another machine with
                 hwloc-gather-topology.

                 On x86, <path> may be a directory containing a cpuid dump
                 gathered with hwloc-gather-cpuid.

                 When the archivemount program is available, <path> may also
                 be a tarball containing such Linux or x86 topology files.

       -i <specification>, --input <specification>
                 Simulate a fake hierarchy (instead of discovering the
                 topology on the local machine).  If <specification> is
                 "node:2 pu:3", the topology will contain two NUMA nodes with
                 3 processing units in each of them.  The <specification>
                 string must end with a number of PUs.

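                 A synthetic topology is a convenient way to try hwloc-calc
                 invocations without the target machine.  The sketch below
                 assumes hwloc-calc is installed and in PATH; the variable
                 names are illustrative.  It builds the "node:2 pu:3"
                 specification and compares the PU count reported by -N with
                 the count implied by the specification.

```shell
#!/bin/sh
# Illustrative values taken from the "node:2 pu:3" specification above.
nodes=2
pus_per_node=3
# The specification implies nodes * pus_per_node processing units.
expected=$(( nodes * pus_per_node ))

if command -v hwloc-calc >/dev/null 2>&1; then
    # -i is a topology option, so it is given before -N.
    reported=$(hwloc-calc -i "node:$nodes pu:$pus_per_node" -N pu all)
    echo "specification implies $expected PUs, hwloc-calc reports $reported"
else
    echo "hwloc-calc not found; specification implies $expected PUs"
fi
```
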
       --if <format>, --input-format <format>
                 Enforce the input in the given format, among xml, fsroot,
                 cpuid and synthetic.

OPTIONS

       All these options must be given after all topology options above.

       -p --physical
                 Use OS/physical indexes instead of logical indexes for both
                 input and output.

       -l --logical
                 Use logical indexes instead of physical/OS indexes for both
                 input and output (default).

       --pi --physical-input
                 Use OS/physical indexes instead of logical indexes for input.

       --li --logical-input
                 Use logical indexes instead of physical/OS indexes for input
                 (default).

       --po --physical-output
                 Use OS/physical indexes instead of logical indexes for
                 output.

       --lo --logical-output
                 Use logical indexes instead of physical/OS indexes for output
                 (default, except for cpusets which are always physical).

       -n --nodeset
                 Interpret both input and output sets as nodesets instead of
                 CPU sets.  See --nodeset-output and --nodeset-input below for
                 details.

       --no --nodeset-output
                 Report nodesets instead of CPU sets.  This output is more
                 precise than the default CPU set output when memory locality
                 matters because it properly describes CPU-less NUMA nodes, as
                 well as NUMA nodes that are local to multiple CPUs.

       --ni --nodeset-input
                 Interpret input sets as nodesets instead of CPU sets.

       --oo --object-output
                 When reporting object indexes (e.g. with -I or
                 --local-memory), this option prefixes these indexes with
                 types (e.g. Core:0 instead of 0).

       -N --number-of <type|depth>
                 Report the number of objects of the given type or depth that
                 intersect the CPU set.  This is convenient for finding how
                 many cores, NUMA nodes or PUs are available in a machine.

                 When combined with --nodeset or --nodeset-output, the nodeset
                 is considered instead of the CPU set for finding matching
                 objects.  This is useful when reporting the output as a
                 number or set of NUMA nodes.

                 <type> may contain a filter to select specific objects among
                 the type.  For instance -N "numa[hbm]" counts NUMA nodes
                 marked with subtype "HBM", while -N "numa[mcdram]" only
                 counts MCDRAM NUMA nodes on KNL.

                 If an OS device subtype such as gpu is given instead of
                 osdev, only the OS devices of that subtype will be counted.

       -I --intersect <type|depth>
                 Find the list of objects of the given type or depth that
                 intersect the CPU set and report the comma-separated list of
                 their indexes instead of the CPU mask string.  This may be
                 used for determining the list of objects above or below the
                 input objects.

                 When combined with --physical, the list is convenient to pass
                 to external tools such as taskset or numactl --physcpubind or
                 --membind.  This is different from --largest since the latter
                 requires that all reported objects are strictly included
                 inside the input objects.

                 When combined with --nodeset or --nodeset-output, the nodeset
                 is considered instead of the CPU set for finding matching
                 objects.  This is useful when reporting the output as a
                 number or set of NUMA nodes.

                 <type> may contain a filter to select specific objects among
                 the type.  For instance -I "numa[hbm]" lists NUMA nodes
                 marked with subtype "HBM", while -I "numa[mcdram]" only lists
                 MCDRAM NUMA nodes on KNL.

                 If an OS device subtype such as gpu is given instead of
                 osdev, only the OS devices of that subtype will be returned.

                 If combined with --object-output, object indexes are prefixed
                 with types (e.g. Core:0 instead of 0).

       -H --hierarchical <type1>.<type2>...
                 Find the list of objects of type <type2> that intersect the
                 CPU set and report the space-separated list of their
                 hierarchical indexes with respect to <type1>, <type2>, etc.
                 For instance, if package.core is given, the output would be
                 Package:1.Core:2 Package:2.Core:3 if the input contains the
                 third core of the second package and the fourth core of the
                 third package.

                 Only normal CPU-side object types should be used.

                 NUMA nodes may be used but they may cause redundancy in the
                 output on heterogeneous memory platforms.  For instance, on a
                 platform with both DRAM and HBM memory on a package, the
                 first core will be considered both as first core of first
                 NUMA node (DRAM) and as first core of second NUMA node (HBM).

       --largest Report (in a human readable format) the list of largest
                 objects which exactly include all input objects (by looking
                 at their CPU sets).  None of these output objects intersect
                 each other, and the sum of them is exactly equivalent to the
                 input.  No larger object is included in the input.

                 This is different from --intersect where reported objects may
                 not be strictly included in the input.

       --local-memory
                 Report the list of NUMA nodes that are local to the input
                 objects.

                 This option is similar to -I numa but the way nodes are
                 selected is different: The selection performed by
                 --local-memory may be precisely configured with
                 --local-memory-flags, while -I numa just selects all nodes
                 that are somehow local to any of the input objects.

                 If combined with --object-output, object indexes are prefixed
                 with types (e.g. NUMANode:0 instead of 0).

       --local-memory-flags
                 Change the flags used to select local NUMA nodes.  Flags may
                 be given as numeric values or as a comma-separated list of
                 flag names that are passed to
                 hwloc_get_local_numanode_objs().  Those names may be
                 substrings of actual flag names as long as a single one
                 matches.  The default is 3 (or smaller,larger) which means
                 NUMA nodes are displayed if their locality either contains or
                 is contained in the locality of the given object.

                 This option enables --local-memory.

       --best-memattr <name>
                 Enable the listing of local memory nodes with --local-memory,
                 but only display the local node that has the best value for
                 the memory attribute given by <name> (or as an index).

                 If the memory attribute values depend on the initiator, the
                 hwloc-calc input objects are used as the initiator.

                 Standard attribute names are Capacity, Locality, Bandwidth,
                 and Latency.  All existing attributes in the current topology
                 may be listed with

                     $ lstopo --memattrs

                 If combined with --object-output, the object index is
                 prefixed with its type (e.g. NUMANode:0 instead of 0).

       --sep <sep>
                 Change the field separator in the output.  By default, a
                 space is used to separate output objects (for instance when
                 --hierarchical or --largest is given) while a comma is used
                 to separate indexes (for instance when --intersect is given).

       --single  Singlify the output to a single CPU.

       --taskset Display CPU set strings in the format recognized by the
                 taskset command-line program instead of hwloc-specific CPU
                 set string format.  This option has no impact on the format
                 of input CPU set strings; both formats are always accepted.

       -q --quiet
                 Hide non-fatal error messages.  It mostly includes locations
                 pointing to non-existing objects.

       -v --verbose
                 Verbose output.

       --version Report version and exit.

       -h --help Display help message and exit.

DESCRIPTION

       hwloc-calc generates and manipulates CPU mask strings or objects.  Both
       input and output may be either objects (with physical or logical
       indexes), CPU lists (with physical or logical indexes), or CPU mask
       strings (always physically indexed).  Input location specification is
       described in hwloc(7).

       If objects or CPU mask strings are given on the command-line, they are
       combined and a single output is printed.  If no object or CPU mask
       strings are given on the command-line, the program will read the
       standard input.  It will combine multiple objects or CPU mask strings
       that are given on the same line of the standard input, with spaces as
       separators.  Different input lines will be processed separately.

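       The per-line behaviour can be pictured with plain shell arithmetic.
       The sketch below is only a model, not hwloc code: combine_lines is a
       hypothetical helper, and it assumes that every input is a plain CPU
       mask that fits in one machine word and contains no commas.  It merges
       (bitwise OR) the masks found on each input line and prints one result
       per line.

```shell
#!/bin/sh
# combine_lines: hypothetical helper modelling how masks on one input line
# are merged while separate lines stay independent.
combine_lines() {
    while read -r line; do
        acc=0
        # Unquoted on purpose: split the line on spaces into single masks.
        for mask in $line; do
            acc=$(( acc | $mask ))
        done
        printf '0x%08x\n' "$acc"
    done
}

# The two masks on the first line are merged; the second line is handled
# on its own.  Prints 0x000000ff, then 0x00000100.
printf '0x0000000f 0x000000f0\n0x00000100\n' | combine_lines
```
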
       Command-line arguments and options are processed in order.  First
       topology configuration options should be given.  Then, for instance,
       changing the type of input indexes with --li or changing the input
       topology with -i only affects the processing of the following
       arguments.

       NOTE: It is highly recommended that you read the hwloc(7) overview page
       before reading this man page.  Most of the concepts described in
       hwloc(7) directly apply to the hwloc-calc utility.

EXAMPLES

       hwloc-calc's operation is best described through several examples.

       To display the (physical) CPU mask corresponding to the second package:

           $ hwloc-calc package:1
           0x000000f0

       To display the (physical) CPU mask corresponding to the third package,
       excluding its even numbered logical processors:

           $ hwloc-calc package:2 ~PU:even
           0x00000c00

       To convert a CPU mask to human-readable output, the -H option can be
       used to emit a space-delimited list of locations:

           $ echo 0x000000f0 | hwloc-calc -H package.core
           Package:1.Core:0 Package:1.Core:1 Package:1.Core:2 Package:1.Core:3

       To use some other character (e.g., a comma) instead of spaces in
       output, use the --sep option:

           $ echo 0x000000f0 | hwloc-calc -H package.core --sep ,
           Package:1.Core:0,Package:1.Core:1,Package:1.Core:2,Package:1.Core:3

       To combine two (physical) CPU masks:

           $ hwloc-calc 0x0000ffff 0xff000000
           0xff00ffff

       To display the list of logical numbers of processors included in the
       second package:

           $ hwloc-calc --intersect PU package:1
           4,5,6,7

       To bind GNU OpenMP threads logically over the whole machine, we need to
       use physical number output instead:

           $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU all`
           $ echo $GOMP_CPU_AFFINITY
           0,4,1,5,2,6,3,7

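       Since the PUs are listed in logical order, the position of each value
       in the list above is the logical index of a PU and the value itself is
       its physical index.  The following sketch makes that mapping explicit
       using only standard shell features and the sample list from the
       example; print_mapping is a hypothetical helper name.

```shell
#!/bin/sh
# print_mapping: hypothetical helper; $1 is a comma-separated list of
# physical indexes, as printed with --physical-output --intersect PU.
print_mapping() {
    logical=0
    old_ifs=$IFS
    IFS=,
    # Unquoted on purpose: split the list on commas.
    for physical in $1; do
        echo "logical PU $logical -> physical PU $physical"
        logical=$(( logical + 1 ))
    done
    IFS=$old_ifs
}

# Sample list copied from the example above.
print_mapping "0,4,1,5,2,6,3,7"
```
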
       To display the list of NUMA nodes, by physical indexes, that intersect
       a given (physical) CPU mask:

           $ hwloc-calc --physical --intersect NUMAnode 0xf0f0f0f0
           0,2

       To find how many cores are in the second CPU kind (those cores are
       likely higher-performance and more power-hungry than cores of the first
       kind):

           $ hwloc-calc --cpukind 1 -N core all
           4

       To display the list of NUMA nodes, by physical indexes, whose locality
       is exactly equal to a Package:

           $ hwloc-calc --local-memory-flags 0 --physical-output pack:1
           4,7

       To display the best-capacity NUMA node, by physical indexes, whose
       locality is exactly equal to a Package:

           $ hwloc-calc --local-memory-flags 0 --best-memattr capacity --physical-output pack:1
           4

       To find the number of NUMA nodes with subtype "HBM":

           $ hwloc-calc -N "numa[hbm]" all
           4

       To find the number of NUMA nodes in memory tier 1 (DRAM nodes on a
       server with HBM and DRAM):

           $ hwloc-calc -N "numa[tier=1]" all
           4

       To find the NUMA node of subtype MCDRAM (on KNL) near a PU:

           $ hwloc-calc -I "numa[mcdram]" pu:157
           1

       Converting object logical indexes (default) from/to physical/OS indexes
       may be performed with --intersect combined with either
       --physical-output (logical to physical conversion) or --physical-input
       (physical to logical):

           $ hwloc-calc --physical-output PU:2 --intersect PU
           3
           $ hwloc-calc --physical-input PU:3 --intersect PU
           2

       One should add --nodeset when converting indexes of memory objects to
       make sure a single NUMA node index is returned on platforms with
       heterogeneous memory:

           $ hwloc-calc --nodeset --physical-output node:2 --intersect node
           3
           $ hwloc-calc --nodeset --physical-input node:3 --intersect node
           2

       To display the set of CPUs near network interface eth0:

           $ hwloc-calc os=eth0
           0x00005555

       To display the indexes of packages near the PCI device whose bus ID is
       0000:01:02.0:

           $ hwloc-calc pci=0000:01:02.0 --intersect Package
           1

       To display the list of per-package cores that intersect the input:

           $ hwloc-calc 0x00003c00 --hierarchical package.core
           Package:2.Core:1 Package:3.Core:0

       To display the (physical) CPU mask of the entire topology except the
       third package:

           $ hwloc-calc all ~package:2
           0x0000f0ff

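       The ~ prefix removes a location from the set built so far, which on
       masks amounts to clearing bits.  As an illustration only, the result
       above can be reproduced with shell arithmetic, assuming a 16-PU machine
       whose mask is 0x0000ffff and whose third package covers 0x00000f00
       (both values are assumptions inferred from the other examples on this
       page).

```shell
#!/bin/sh
all=0x0000ffff      # assumed mask of the whole machine
third=0x00000f00    # assumed mask of the third package
# Keep the bits of $all that are not set in $third.
printf '0x%08x\n' $(( $all & ~$third ))
```
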
       To combine both physical and logical indexes as input:

           $ hwloc-calc PU:2 --physical-input PU:3
           0x0000000c

       To synthesize a set of cores into largest objects on a 2-node 2-package
       2-core machine:

           $ hwloc-calc core:0 --largest
           Core:0
           $ hwloc-calc core:0-1 --largest
           Package:0
           $ hwloc-calc core:4-7 --largest
           NUMANode:1
           $ hwloc-calc core:2-6 --largest
           Package:1 Package:2 Core:6
           $ hwloc-calc pack:2 --largest
           Package:2
           $ hwloc-calc package:2-3 --largest
           NUMANode:1

       To get the set of first threads of all cores:

           $ hwloc-calc core:all.pu:0
           $ hwloc-calc --no-smt all

       This can also be very useful in order to make GNU OpenMP use exactly
       one thread per core, and in logical core order:

           $ export OMP_NUM_THREADS=`hwloc-calc --number-of core all`
           $ echo $OMP_NUM_THREADS
           4
           $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU --no-smt all`
           $ echo $GOMP_CPU_AFFINITY
           0,2,1,3

       To export a bitmask in a format accepted by the resctrl Linux
       subsystem (for configuring cache partitioning, etc.), apply a sed
       regexp to the output of hwloc-calc:

           $ hwloc-calc pack:all.core:7-9.pu:0
           0x00000380,,0x00000380   <this format cannot be given to resctrl>
           $ hwloc-calc pack:all.core:7-9.pu:0 | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
           00000380,0,00000380
           # echo 00000380,0,00000380 > /sys/fs/resctrl/test/cpus
           # cat /sys/fs/resctrl/test/cpus
           00000000,00000380,00000000,00000380    <the modified bitmask was correctly parsed by resctrl>

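       The sed expression can be wrapped in a small function for reuse.  This
       is a sketch: to_resctrl_mask is a hypothetical name, and the conversion
       is exactly the one shown above.  The ,, substitution is applied twice
       because a single pass does not rewrite overlapping matches, so a run of
       three commas would otherwise keep one empty field.

```shell
#!/bin/sh
# to_resctrl_mask: hypothetical helper; reads hwloc mask strings on stdin,
# strips the 0x prefixes and fills empty fields with 0.
to_resctrl_mask() {
    sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
}

# Sample mask copied from the example above.
echo '0x00000380,,0x00000380' | to_resctrl_mask
```

       It can then be placed at the end of a pipeline, for instance
       hwloc-calc pack:all.core:7-9.pu:0 | to_resctrl_mask.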
       OS devices may also be filtered by subtype.  In this example, there are
       9 OS devices in the system, 4 of them are near NUMA node #1, and only 2
       of these are CoProcessors:

           $ utils/hwloc/hwloc-calc -I osdev all
           0,1,2,3,4,5,6,7,8
           $ utils/hwloc/hwloc-calc -I osdev node:1
           5,6,7,8
           $ utils/hwloc/hwloc-calc -I coproc node:1
           7,8

RETURN VALUE

       Upon successful execution, hwloc-calc displays the (physical) CPU mask
       string, (physical or logical) object list, or (physical or logical)
       object number list.  The return value is 0.

       hwloc-calc will return nonzero if any kind of error occurs, such as
       (but not limited to): failure to parse the command line.

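       In scripts, the return value can be tested directly.  The sketch below
       relies only on the behaviour described above; mask_or_fallback is a
       hypothetical helper that prints the hwloc-calc output on success and a
       caller-supplied fallback otherwise (including when hwloc-calc is not
       installed).

```shell
#!/bin/sh
# mask_or_fallback: hypothetical helper.
#   $1: location passed to hwloc-calc
#   $2: value printed if hwloc-calc returns nonzero or cannot be run
mask_or_fallback() {
    if out=$(hwloc-calc "$1" 2>/dev/null); then
        printf '%s\n' "$out"
    else
        printf '%s\n' "$2"
    fi
}

mask_or_fallback core:0 0x00000001
```
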

SEE ALSO

       hwloc(7), lstopo(1), hwloc-info(1)



2.10.0                           Dec 04, 2023                    HWLOC-CALC(1)