OPENSM(8)                      OpenIB Management                     OPENSM(8)

NAME

       opensm - InfiniBand subnet manager and administration (SM/SA)

SYNOPSIS

       opensm [--version] [-F | --config <file_name>] [-c(reate-config)
       <file_name>] [-g(uid) <GUID in hex>] [-l(mc) <LMC>] [-p(riority)
       <PRIORITY>] [--subnet_prefix <PREFIX in hex>] [--smkey <SM_Key>]
       [--sm_sl <SL number>] [-r(eassign_lids)] [-R <engine name(s)> |
       --routing_engine <engine name(s)>] [--do_mesh_analysis]
       [--lash_start_vl <vl number>] [--nue_max_num_vls <vl number>] [-A |
       --ucast_cache] [-z | --connect_roots] [-M <file name> |
       --lid_matrix_file <file name>] [-U <file name> | --lfts_file <file
       name>] [-S | --sadb_file <file name>] [-a | --root_guid_file <path to
       file>] [-u | --cn_guid_file <path to file>] [-G | --io_guid_file <path
       to file>] [--port-shifting] [--scatter-ports <random seed>] [-H |
       --max_reverse_hops <max reverse hops allowed>] [-X |
       --guid_routing_order_file <path to file>] [-m | --ids_guid_file <path
       to file>] [-o(nce)] [-s(weep) <interval>] [-t(imeout) <milliseconds>]
       [--retries <number>] [--maxsmps <number>] [--console [off | local |
       socket | loopback]] [--console-port <port>] [-i | --ignore_guids
       <equalize-ignore-guids-file>] [-w | --hop_weights_file <path to file>]
       [-O | --port_search_ordering_file <path to file>] [-O |
       --dimn_ports_file <path to file>] (DEPRECATED) [-f <log file path> |
       --log_file <log file path>] [-L | --log_limit <size in MB>]
       [-e(rase_log_file)] [-P(config) <partition config file>] [-N |
       --no_part_enforce] (DEPRECATED) [-Z | --part_enforce [both | in | out |
       off]] [-W | --allow_both_pkeys] [-Q | --qos [-Y | --qos_policy_file
       <file name>]] [--congestion-control] [--cckey <key>] [-y |
       --stay_on_fatal] [-B | --daemon] [-J | --pidfile <file_name>] [-I |
       --inactive] [--perfmgr] [--perfmgr_sweep_time_s <seconds>]
       [--prefix_routes_file <path>] [--consolidate_ipv6_snm_req]
       [--log_prefix <prefix text>] [--torus_config <path to file>]
       [-v(erbose)] [-V] [-D <flags>] [-d(ebug) <number>] [-h(elp)] [-?]

DESCRIPTION

       opensm is an InfiniBand compliant Subnet Manager and Administration,
       and runs on top of OpenIB.

       opensm provides an implementation of an InfiniBand Subnet Manager and
       Administration. Such a software entity must run in order to initialize
       the InfiniBand hardware (at least one per InfiniBand subnet).

       opensm also contains an experimental performance manager.

       opensm defaults were designed to meet the common case usage on clusters
       with up to a few hundred nodes. Thus, in this default mode, opensm will
       scan the IB fabric, initialize it, and sweep occasionally for changes.

       opensm attaches to a specific IB port on the local machine and
       configures only the fabric connected to it. (If the local machine has
       other IB ports, opensm will ignore the fabrics connected to those other
       ports.) If no port is specified, it will select the first "best"
       available port.

       opensm can present the available ports and prompt for a port number to
       attach to (see the -g option).

       By default, the run is logged to two files: /var/log/messages and
       /var/log/opensm.log. The first file registers only major general
       events, whereas the second includes details of reported errors. All
       errors reported in this second file should be treated as indicators of
       IB fabric health issues. (Note that when a fatal and non-recoverable
       error occurs, opensm will exit.) Both log files should include the
       message "SUBNET UP" if opensm was able to set up the subnet correctly.

OPTIONS

       --version
              Prints the OpenSM version and exits.

       -F, --config <config file>
              The name of the OpenSM config file. When not specified,
              /etc/rdma/opensm.conf is used (if it exists).

       -c, --create-config <file name>
              OpenSM will dump its configuration to the specified file and
              exit. This is a way to generate an OpenSM configuration file
              template.

       -g, --guid <GUID in hex>
              This option specifies the local port GUID value with which
              OpenSM should bind. OpenSM may be bound to only one port at a
              time. If the given GUID is 0, OpenSM displays a list of
              possible port GUIDs and waits for user input. Without -g,
              OpenSM tries to use the default port.

       -l, --lmc <LMC value>
              This option specifies the subnet's LMC value. The number of
              LIDs assigned to each port is 2^LMC. The LMC value must be in
              the range 0-7. LMC values > 0 allow multiple paths between
              ports, and should only be used if the subnet topology actually
              provides multiple paths between ports, i.e. multiple
              interconnects between switches. Without -l, OpenSM defaults to
              LMC = 0, which allows one path between any two ports.

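              As a rough illustration of the 2^LMC rule, the Python sketch
              below lists the LIDs a port answers to for a given base LID and
              LMC. It assumes that the base LID is a multiple of 2^LMC (its
              low LMC bits are clear); the function name lid_range is
              hypothetical and not part of OpenSM.

```python
def lid_range(base_lid: int, lmc: int) -> range:
    """Return the LIDs a port answers to, given its base LID and the LMC.

    Assumption for this sketch: the base LID has its low LMC bits clear,
    i.e. it is a multiple of 2^LMC.
    """
    if not 0 <= lmc <= 7:
        raise ValueError("LMC must be in the range 0-7")
    count = 1 << lmc  # 2^LMC LIDs per port
    if base_lid % count != 0:
        raise ValueError("base LID must be a multiple of 2^LMC")
    return range(base_lid, base_lid + count)


# With -l 2, a port whose base LID is 0x10 answers to four LIDs.
print([hex(lid) for lid in lid_range(0x10, 2)])  # ['0x10', '0x11', '0x12', '0x13']
```

              With the default LMC of 0 the range collapses to the single
              base LID, which matches the "one path between any two ports"
              behaviour described above.
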
       -p, --priority <Priority value>
              This option specifies the SM's PRIORITY. This affects handover
              cases, where the master is chosen by priority and GUID. The
              range is from 0 (default and lowest priority) to 15 (highest).

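              The handover rule can be sketched in Python as follows. The
              tie-break used here, where the numerically lower GUID wins when
              priorities are equal, is an assumption based on the usual
              InfiniBand SM election rules rather than something this page
              states; SmCandidate and elect_master are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmCandidate:
    guid: int      # port GUID of the SM
    priority: int  # 0 (lowest) .. 15 (highest), as set with -p


def elect_master(candidates: list[SmCandidate]) -> SmCandidate:
    """Pick the master SM: highest priority wins; on a tie the lower GUID
    wins (tie-break rule assumed for this sketch)."""
    for c in candidates:
        if not 0 <= c.priority <= 15:
            raise ValueError(f"priority out of range: {c.priority}")
    # Sort key: higher priority first, then lower GUID first.
    return min(candidates, key=lambda c: (-c.priority, c.guid))


sms = [SmCandidate(0x0002c90300001234, 0),
       SmCandidate(0x0002c90300005678, 15),
       SmCandidate(0x0002c90300000001, 15)]
print(hex(elect_master(sms).guid))  # 0x2c90300000001
```
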
       --subnet_prefix <PREFIX in hex>
              This option specifies the subnet prefix to use on the fabric.
              The default prefix is 0xfe80000000000000. OpenMPI, in
              particular, requires separate fabrics plugged into different
              ports to have different prefixes; otherwise it will not run.

       --smkey <SM_Key value>
              This option specifies the SM's SM_Key (64 bits). This affects
              SM authentication. Note that OpenSM version 3.2.1 and below
              used the default value '1' in host byte order. This is fixed
              now, but you may need this option to interoperate with an old
              OpenSM running on a little-endian machine.

       --sm_sl <SL number>
              This option sets the SL to use for communication with the
              SM/SA. Defaults to 0.

       -r, --reassign_lids
              This option causes OpenSM to reassign LIDs to all end nodes.
              Specifying -r on a running subnet may disrupt subnet traffic.
              Without -r, OpenSM attempts to preserve existing LID
              assignments, resolving multiple uses of the same LID.

       -R, --routing_engine <Routing engine names>
              This option chooses the routing engine(s) to use instead of the
              Min Hop algorithm (default). Multiple routing engines can be
              specified, separated by commas; they are tried in the given
              order, and a later engine is used only if the earlier ones
              fail. If all configured routing engines fail, OpenSM will
              always attempt to route with Min Hop unless 'no_fallback' is
              included in the list of routing engines. Supported engines:
              minhop, updn, dnup, file, ftree, lash, dor, torus-2QoS, nue,
              dfsssp, sssp.

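              The fallback order described above can be modelled with a short
              Python sketch. The engine callables are stand-ins that merely
              report success or failure; they are not OpenSM interfaces, and
              run_routing is a hypothetical helper.

```python
from typing import Callable, Optional

# Stand-in engine type: returns True if routing succeeded.
Engine = Callable[[], bool]


def run_routing(spec: str, engines: dict[str, Engine]) -> Optional[str]:
    """Try the comma-separated engines in order, as described for -R.

    Falls back to minhop unless 'no_fallback' appears in the list.
    Returns the name of the engine that succeeded, or None.
    """
    names = [n.strip() for n in spec.split(",") if n.strip()]
    fallback = "no_fallback" not in names
    for name in names:
        if name == "no_fallback":
            continue
        if engines[name]():
            return name
    if fallback and engines["minhop"]():
        return "minhop"
    return None


engines = {
    "ftree": lambda: False,   # pretend fat-tree fails on this topology
    "updn": lambda: False,
    "minhop": lambda: True,
}
print(run_routing("ftree,updn", engines))              # minhop
print(run_routing("ftree,updn,no_fallback", engines))  # None
```
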
       --do_mesh_analysis
              This option enables additional analysis for the lash routing
              engine to precondition switch port assignments in regular
              Cartesian meshes, which may reduce the number of SLs required
              to give a deadlock-free routing.

       --lash_start_vl <vl number>
              This option sets the starting VL to use for the lash routing
              algorithm. Defaults to 0.

       --nue_max_num_vls <vl number>
              This option sets the maximum number of VLs to use for the Nue
              routing engine. Any number greater than or equal to 0 is
              allowed, and the default is 1, to enforce deadlock-freedom even
              if QoS is not enabled. If set to 0, Nue routing automatically
              determines and chooses the maximum supported by the fabric. If
              set to any integer >= 1, Nue uses
              min(max_supported, nue_max_num_vls). As a rule of thumb, a
              higher nue_max_num_vls results in better path balancing.

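              The VL selection rule reduces to a few lines. In this Python
              sketch, max_supported stands for whatever the fabric supports;
              OpenSM determines that value itself, and here it is simply
              passed in. The function name is hypothetical.

```python
def nue_effective_vls(nue_max_num_vls: int, max_supported: int) -> int:
    """Number of VLs Nue would use, following the rule described above.

    0 means "use the maximum supported by the fabric"; any value >= 1 is
    capped at what the fabric supports.
    """
    if nue_max_num_vls < 0:
        raise ValueError("nue_max_num_vls must be >= 0")
    if nue_max_num_vls == 0:
        return max_supported
    return min(max_supported, nue_max_num_vls)


print(nue_effective_vls(1, 8))   # 1 (the default)
print(nue_effective_vls(0, 8))   # 8
print(nue_effective_vls(16, 4))  # 4
```
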
       -A, --ucast_cache
              This option enables the unicast routing cache and prevents
              routing recalculation (which is a heavy task in a large
              cluster) when no topology change was detected during the heavy
              sweep, or when the topology change does not require a new
              routing calculation, e.g. when one or more CAs/RTRs/leaf
              switches go down, or one or more of these nodes come back after
              being down. A very common case that is handled by the unicast
              routing cache is a host reboot, which would otherwise cause two
              full routing recalculations: one when the host goes down, and
              the other when the host comes back online.

       -z, --connect_roots
              This option forces routing engines (up/down and fat-tree) to
              provide connectivity between root switches, and in this way to
              be fully IBA compliant. In many cases this can violate a "pure"
              deadlock-free algorithm, so use it carefully.

       -M, --lid_matrix_file <file name>
              This option specifies the name of the LID matrix dump file from
              which switch LID matrices (min hops tables) will be loaded.

       -U, --lfts_file <file name>
              This option specifies the name of the LFTs file from which
              switch forwarding tables will be loaded when using the "file"
              routing engine.

       -S, --sadb_file <file name>
              This option specifies the name of the SA DB dump file from
              which the SA database will be loaded.

       -a, --root_guid_file <file name>
              Set the root nodes for the Up/Down or Fat-Tree routing
              algorithm to the GUIDs provided in the given file (one to a
              line).

       -u, --cn_guid_file <file name>
              Set the compute nodes for the Fat-Tree or DFSSSP/SSSP routing
              algorithms to the port GUIDs provided in the given file (one
              to a line).

       -G, --io_guid_file <file name>
              Set the I/O nodes for the Fat-Tree or DFSSSP/SSSP routing
              algorithms to the port GUIDs provided in the given file (one
              to a line).

              In the case of Fat-Tree routing, I/O nodes are non-CN nodes
              allowed to use up to max_reverse_hops switches the wrong way
              around to improve connectivity.

              In the case of (DF)SSSP routing, providing GUIDs of compute
              and/or I/O nodes ensures that paths towards those nodes are
              separated as much as possible within their node category,
              i.e., I/O traffic will not share the same link if multiple
              links are available.

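              The GUID list files used by -a, -u, -G (and -X) hold one GUID
              per line. A minimal Python reader is sketched below; the
              handling of blank lines and '#' comments is an assumption made
              for this sketch, since this page does not define comments for
              these files, and read_guid_list is a hypothetical name.

```python
def read_guid_list(text: str) -> list[int]:
    """Parse a one-GUID-per-line file such as those given to -a, -u, -G, -X.

    Assumptions for this sketch: GUIDs are written as 0x-prefixed hex, and
    blank lines and text after '#' are skipped.
    """
    guids = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        guid = int(line, 0)  # base 0 accepts the 0x prefix
        if not 0 <= guid < 1 << 64:
            raise ValueError(f"line {lineno}: GUID does not fit in 64 bits")
        guids.append(guid)
    return guids


roots = read_guid_list("""\
# spine switches
0x0008f10400411a08
0x0008f10400411a09
""")
print([hex(g) for g in roots])  # ['0x8f10400411a08', '0x8f10400411a09']
```
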
       --port-shifting
              This option enables a feature called port shifting. In some
              fabrics, particularly cluster environments, routes commonly
              align and congest with other routes due to algorithmically
              unchanging traffic patterns. This routing option will "shift"
              routing around in an attempt to alleviate this problem.

       --scatter-ports <random seed>
              This option is used to randomize port selection in routing
              rather than using a round-robin algorithm (which is the
              default). The value supplied with the option is used as a
              random seed. If the value is 0, which is the default, the
              scatter ports option is disabled.

       -H, --max_reverse_hops <max reverse hops allowed>
              Set the maximum number of reverse hops an I/O node is allowed
              to make. A reverse hop is the use of a switch the wrong way
              around.

       -m, --ids_guid_file <file name>
              Name of the map file with the set of IDs that will be used by
              the Up/Down routing algorithm instead of node GUIDs (format:
              <guid> <id> per line).

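              A reader for the <guid> <id> map format can be sketched in
              Python as follows. Treating both fields as integers (written
              in a form int(..., 0) accepts, such as 0x-prefixed hex) and
              skipping blank lines and '#' comments are assumptions of this
              sketch; read_id_map is a hypothetical name.

```python
def read_id_map(text: str) -> dict[int, int]:
    """Parse an -m / --ids_guid_file map: '<guid> <id>' per line.

    Sketch assumptions: both fields are integers accepted by int(..., 0),
    and blank lines and '#' comments are ignored.
    """
    ids = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected '<guid> <id>'")
        guid, node_id = (int(f, 0) for f in fields)
        ids[guid] = node_id
    return ids


ids = read_id_map("0x0008f10400411a08 1\n0x0008f10400411a09 2\n")
print(ids[0x0008f10400411a09])  # 2
```
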
       -X, --guid_routing_order_file <file name>
              Set the order in which port GUIDs will be routed for the MinHop
              and Up/Down routing algorithms to the GUIDs provided in the
              given file (one to a line).

       -o, --once
              This option causes OpenSM to configure the subnet once, then
              exit. Ports remain in the ACTIVE state.

       -s, --sweep <interval value>
              This option specifies the number of seconds between subnet
              sweeps. Specifying -s 0 disables sweeping. Without -s, OpenSM
              defaults to a sweep interval of 10 seconds.

       -t, --timeout <value>
              This option specifies the time in milliseconds used for
              transaction timeouts. Timeout values should be > 0. Without
              -t, OpenSM defaults to a timeout value of 200 milliseconds.

       --retries <number>
              This option specifies the number of retries used for
              transactions. Without --retries, OpenSM defaults to 3 retries
              for transactions.

       --maxsmps <number>
              This option specifies the number of VL15 SMP MADs allowed on
              the wire at any one time. Specifying --maxsmps 0 allows
              unlimited outstanding SMPs. Without --maxsmps, OpenSM defaults
              to a maximum of 4 outstanding SMPs.

       --console [off | local | loopback | socket]
              This option brings up the OpenSM console (default off). Note
              that loopback and socket open a socket which can be connected
              to WITHOUT CREDENTIALS. Loopback is safer if access to your SM
              host is controlled. tcp_wrappers (hosts.[allow|deny]) is used
              with loopback and socket. loopback and socket will only be
              available if OpenSM was built with --enable-console-loopback
              (default yes) and --enable-console-socket (default no),
              respectively.

       --console-port <port>
              Specify an alternate telnet port for the socket console
              (default 10000). Note that this option only appears if OpenSM
              was built with --enable-console-socket.

       -i, --ignore_guids <equalize-ignore-guids-file>
              This option provides the means to define a set of ports (by
              node GUID and port number) that will be ignored by the link
              load equalization algorithm.

       -w, --hop_weights_file <path to file>
              This option provides weighting factors per port representing a
              hop cost in computing the LID matrix. The file consists of
              lines containing a switch port GUID (specified as a 64 bit hex
              number, with leading 0x), output port number, and weighting
              factor. Any port not listed in the file defaults to a weighting
              factor of 1. Lines starting with # are comments. Weights affect
              only the output route from the port, so many useful
              configurations will require weights to be specified in pairs.

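              The hop weights format can be read with a short Python sketch.
              It follows the rules above (0x-prefixed GUID, port, weight;
              '#' comment lines; default weight 1), and additionally assumes
              integer weights and skips blank lines. read_hop_weights is a
              hypothetical name, not an OpenSM function.

```python
from collections import defaultdict


def read_hop_weights(text: str) -> defaultdict:
    """Parse a -w / --hop_weights_file into {(guid, port): weight}.

    Each non-comment line holds: <switch port GUID (0x...)> <port> <weight>.
    Lines starting with '#' are comments. Unlisted ports get weight 1.
    Integer weights are an assumption of this sketch.
    """
    weights = defaultdict(lambda: 1)  # default weighting factor is 1
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3 or not fields[0].lower().startswith("0x"):
            raise ValueError(f"line {lineno}: expected '0x<guid> <port> <weight>'")
        guid = int(fields[0], 16)  # int() accepts the 0x prefix with base 16
        port, weight = int(fields[1]), int(fields[2])
        weights[(guid, port)] = weight
    return weights


w = read_hop_weights("""\
# make one inter-switch link twice as expensive in both directions,
# since a weight applies only to the output route from a port
0x0008f10400411a08 3 2
0x0008f10400411b10 5 2
""")
print(w[(0x0008f10400411a08, 3)], w[(0x0008f10400411a08, 4)])  # 2 1
```

              The example lists both ends of the link, which is the "pairs"
              case mentioned above; the two GUIDs and port numbers are made
              up for illustration.
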
       -O, --port_search_ordering_file <path to file>
              This option tweaks the routing. It is suitable for two cases:

              1. With the DOR routing algorithm, this option provides a
              mapping between hypercube dimensions and ports on a per switch
              basis. The order of the port numbers on each line is in
              one-to-one correspondence with the dimensions. Ports not
              listed on a line are assigned to the remaining dimensions, in
              port order.

              2. With general routing algorithms, this option provides the
              order of the ports that would be chosen for routing from each
              switch, rather than searching for an appropriate port from
              port 1 to N.

              In both cases, the file consists of lines containing a switch
              node GUID (specified as a 64 bit hex number, with leading 0x)
              followed by a list of non-zero port numbers, separated by
              spaces, one switch per line. Anything after a # is a comment.

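              The file format above, and the DOR rule that unlisted ports
              follow the listed ones in port order, can be sketched in
              Python. parse_port_order and full_port_order are hypothetical
              helpers, and num_ports (the switch port count) is passed in by
              hand here, whereas OpenSM learns it from the fabric.

```python
def parse_port_order(text: str) -> dict[int, list[int]]:
    """Parse a -O / --port_search_ordering_file into {switch GUID: ports}."""
    order = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # anything after '#' is a comment
        if not line:
            continue
        guid_str, *port_strs = line.split()
        if not guid_str.lower().startswith("0x"):
            raise ValueError(f"line {lineno}: GUID must start with 0x")
        ports = [int(p) for p in port_strs]
        if any(p <= 0 for p in ports):
            raise ValueError(f"line {lineno}: port numbers must be non-zero")
        order[int(guid_str, 16)] = ports
    return order


def full_port_order(listed: list[int], num_ports: int) -> list[int]:
    """Listed ports first, then the unlisted ports 1..num_ports in order."""
    rest = [p for p in range(1, num_ports + 1) if p not in listed]
    return listed + rest


order = parse_port_order("0x0008f10400411a08 5 3  # dims 0 and 1\n")
print(full_port_order(order[0x0008f10400411a08], 6))  # [5, 3, 1, 2, 4, 6]
```

              In the DOR case, position i of the resulting list is the port
              used for dimension i.
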
       -O, --dimn_ports_file <path to file> (DEPRECATED)
              This is a deprecated flag. Please use
              --port_search_ordering_file instead. This option provides a
              mapping between hypercube dimensions and ports on a per switch
              basis for the DOR routing engine. The file consists of lines
              containing a switch node GUID (specified as a 64 bit hex
              number, with leading 0x) followed by a list of non-zero port
              numbers, separated by spaces, one switch per line. The order of
              the port numbers is in one-to-one correspondence with the
              dimensions. Ports not listed on a line are assigned to the
              remaining dimensions, in port order. Anything after a # is a
              comment.

       -x, --honor_guid2lid
              This option forces OpenSM to honor the guid2lid file when it
              comes out of Standby state, if such a file exists under
              OSM_CACHE_DIR and is valid. By default, this is FALSE.

       -f, --log_file <file name>
              This option defines the log to be the given file. By default,
              the log goes to /var/log/opensm.log. For the log to go to
              standard output, use -f stdout.

       -L, --log_limit <size in MB>
              This option defines the maximum log file size in MB. When
              specified, the log file will be truncated upon reaching this
              limit.

       -e, --erase_log_file
              This option will cause deletion of the log file (if it
              previously exists). By default, the log file is accumulative.

       -P, --Pconfig <partition config file>
              This option defines the optional partition configuration file.
              The default name is /etc/rdma/partitions.conf.

       --prefix_routes_file <file name>
              Prefix routes control how the SA responds to path record
              queries for off-subnet DGIDs. By default, the SA fails such
              queries. The PREFIX ROUTES section below describes the format
              of the configuration file. The default path is
              /etc/rdma/prefix-routes.conf.

       -Q, --qos
              This option enables QoS setup. It is disabled by default.

       -Y, --qos_policy_file <file name>
              This option defines the optional QoS policy file. The default
              name is /etc/rdma/qos-policy.conf. See
              QoS_management_in_OpenSM.txt in the opensm doc for more
              information on configuring QoS policy via this file.

       --congestion_control
              (EXPERIMENTAL) This option enables congestion control
              configuration. It is disabled by default. See the config file
              for congestion control configuration options.

       --cc_key <key>
              (EXPERIMENTAL) This option configures the CCkey to use when
              configuring congestion control. Note that this option does not
              configure a new CCkey into switches and CAs. Defaults to 0.

       -N, --no_part_enforce (DEPRECATED)
              This is a deprecated flag. Please use --part_enforce instead.
              This option disables partition enforcement on switch external
              ports.

       -Z, --part_enforce [both | in | out | off]
              This option indicates the partition enforcement type (for
              switches). Enforcement type can be inbound only (in), outbound
              only (out), both, or disabled (off). Default is both.

       -W, --allow_both_pkeys
              This option indicates whether both full and limited membership
              on the same partition can be configured in the PKeyTable.
              Default is not to allow both pkeys.

       -y, --stay_on_fatal
              This option causes the SM not to exit on fatal initialization
              issues: if the SM discovers duplicated GUIDs or a badly
              configured 12x link with lane reversal. By default, the SM
              will exit on these errors.

       -B, --daemon
              Run in daemon mode - OpenSM will run in the background.

       -J, --pidfile <file_name>
              Makes the SM write its own PID to the specified file when
              started in daemon mode.

       -I, --inactive
              Start SM in inactive rather than init SM state. This option can
              be used in conjunction with the perfmgr so as to run a
              standalone performance manager without SM/SA. However, this is
              NOT currently implemented in the performance manager.

       --perfmgr
              Enable the perfmgr. Only takes effect if --enable-perfmgr was
              specified at configure time. See performance-manager-HOWTO.txt
              in the opensm doc for more information on running perfmgr.

       --perfmgr_sweep_time_s <seconds>
              Specify the sweep time for the performance manager in seconds
              (default is 180 seconds). Only takes effect if
              --enable-perfmgr was specified at configure time.

       --consolidate_ipv6_snm_req
              Use a shared MLID for IPv6 Solicited Node Multicast groups per
              MGID scope and P_Key.

       --log_prefix <prefix text>
              This option specifies the prefix to the syslog messages from
              OpenSM. A suitable prefix can be used to identify the IB subnet
              in syslog messages when two or more instances of OpenSM run in
              a single node to manage multiple fabrics. For example, in a
              dual-fabric (or dual-rail) IB cluster, the prefix for the first
              fabric could be "mpi" and the other fabric could be "storage".

       --torus_config <path to torus-2QoS config file>
              This option defines the file name for the extra configuration
              information needed for the torus-2QoS routing engine. The
              default name is /etc/rdma/torus-2QoS.conf.

       -v, --verbose
              This option increases the log verbosity level. The -v option
              may be specified multiple times to further increase the
              verbosity level. See the -D option for more information about
              log verbosity.

       -V     This option sets the maximum verbosity level and forces log
              flushing. The -V option is equivalent to '-D 0xFF -d 2'. See
              the -D option for more information about log verbosity.

       -D <value>
              This option sets the log verbosity level. A flags field must
              follow the -D option. A bit set/clear in the flags
              enables/disables a specific log level as follows:

               BIT    LOG LEVEL ENABLED
               ----   -----------------
               0x01 - ERROR (error messages)
               0x02 - INFO (basic messages, low volume)
               0x04 - VERBOSE (interesting stuff, moderate volume)
               0x08 - DEBUG (diagnostic, high volume)
               0x10 - FUNCS (function entry/exit, very high volume)
               0x20 - FRAMES (dumps all SMP and GMP frames)
               0x40 - ROUTING (dump FDB routing information)
               0x80 - SYS (syslog at LOG_INFO level in addition to OpenSM logging)

              Without -D, OpenSM defaults to ERROR + INFO (0x3). Specifying
              -D 0 disables all messages. Specifying -D 0xFF enables all
              messages (see -V). High verbosity levels may require increasing
              the transaction timeout with the -t option.

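              Because the -D value is a plain bitmask, levels combine by
              OR-ing the bits from the table. The Python sketch below decodes
              a flags value into level names; decode_log_flags is a
              hypothetical helper, and only the bit assignments come from
              this page.

```python
LOG_LEVELS = {
    0x01: "ERROR",
    0x02: "INFO",
    0x04: "VERBOSE",
    0x08: "DEBUG",
    0x10: "FUNCS",
    0x20: "FRAMES",
    0x40: "ROUTING",
    0x80: "SYS",
}


def decode_log_flags(flags: int) -> list[str]:
    """Return the log levels enabled by a -D flags value."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in 8 bits")
    return [name for bit, name in LOG_LEVELS.items() if flags & bit]


print(decode_log_flags(0x03))       # ['ERROR', 'INFO']  (the default)
print(decode_log_flags(0x0F))       # ['ERROR', 'INFO', 'VERBOSE', 'DEBUG']
print(len(decode_log_flags(0xFF)))  # 8  (the level -V selects)
```

              For example, -D 0x0F adds VERBOSE and DEBUG output to the
              default ERROR + INFO levels.
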
       -d, --debug <value>
              This option specifies a debug option. These options are not
              normally needed. The number following -d selects the debug
              option to enable as follows:

               OPT   Description
               ---   -----------------
               -d0  - Ignore other SM nodes
               -d1  - Force single threaded dispatching
               -d2  - Force log flushing after each log message
               -d3  - Disable multicast support

       -h, --help
              Display this usage info then exit.

       -?     Display this usage info then exit.

ENVIRONMENT VARIABLES

       The following environment variables control opensm behavior:

       OSM_TMP_DIR - controls the directory in which the temporary files
       generated by opensm are created. These files are: opensm-subnet.lst,
       opensm.fdbs, and opensm.mcfdbs. By default, this directory is /var/log.

       OSM_CACHE_DIR - opensm stores certain data to the disk such that
       subsequent runs are consistent. The default directory used is
       /var/cache/opensm. The following files are included in it:

        guid2lid  - stores the LID range assigned to each GUID
        guid2mkey - stores the MKey previously assigned to each GUID
        neighbors - stores a map of the GUIDs at either end of each link
                    in the fabric

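       A helper script that needs to locate these files (for instance, to
       back up guid2lid) could resolve the directories as in this Python
       sketch of the documented defaults. It treats an unset variable as
       "use the default"; how opensm handles a variable that is set but
       empty is not covered here, and opensm_dirs is a hypothetical name.

```python
import os
from pathlib import Path


def opensm_dirs(env=None) -> tuple[Path, Path]:
    """Resolve (tmp_dir, cache_dir) from the environment, with the
    defaults documented above."""
    env = os.environ if env is None else env
    tmp_dir = Path(env.get("OSM_TMP_DIR", "/var/log"))
    cache_dir = Path(env.get("OSM_CACHE_DIR", "/var/cache/opensm"))
    return tmp_dir, cache_dir


tmp_dir, cache_dir = opensm_dirs({"OSM_CACHE_DIR": "/srv/opensm-cache"})
print(tmp_dir / "opensm-subnet.lst")  # /var/log/opensm-subnet.lst
print(cache_dir / "guid2lid")         # /srv/opensm-cache/guid2lid
```
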

NOTES

       When opensm receives a HUP signal, it starts a new heavy sweep as if a
       trap was received or a topology change was found.

       Also, SIGUSR1 can be used to trigger a reopen of /var/log/opensm.log
       for logrotate purposes.

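       A small POSIX-only Python helper along these lines could deliver these
       signals using the PID file written with -J in daemon mode, for example
       from a logrotate postrotate step. The action names and the pidfile
       path in the comment are hypothetical; only the HUP and USR1 semantics
       come from this page.

```python
import os
import signal
from pathlib import Path


def signal_opensm(pidfile: str, action: str) -> int:
    """Send opensm the signal for `action`, using the PID in `pidfile`.

    'resweep'    -> SIGHUP  (start a new heavy sweep)
    'reopen-log' -> SIGUSR1 (reopen /var/log/opensm.log after rotation)
    Returns the PID that was signalled.
    """
    signals = {"resweep": signal.SIGHUP, "reopen-log": signal.SIGUSR1}
    pid = int(Path(pidfile).read_text().strip())
    os.kill(pid, signals[action])
    return pid


# Example (hypothetical pidfile path):
#   signal_opensm("/var/run/opensm.pid", "reopen-log")
```
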

PARTITION CONFIGURATION

       The default name of the OpenSM partitions configuration file is
       /etc/rdma/partitions.conf. The default may be changed by using the
       --Pconfig (-P) option with OpenSM.

       The default partition will be created by OpenSM unconditionally, even
       when the partition configuration file does not exist or cannot be
       accessed.

       The default partition has P_Key value 0x7fff. OpenSM's port will always
       have full membership in the default partition. All other end ports will
       have full membership if the partition configuration file is not found
       or cannot be accessed, or limited membership if the file exists and can
       be accessed but there is no rule for the Default partition.

       Effectively, this amounts to the same as if one of the following rules
       appeared in the partition configuration file.

       In the case of no rule for the Default partition:

       Default=0x7fff : ALL=limited, SELF=full ;

       In the case of no partition configuration file, or a file that cannot
       be accessed:

       Default=0x7fff : ALL=full ;

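       The membership rules above can be summarised in a short Python sketch.
       It assumes the standard InfiniBand P_Key encoding, in which bit 15
       (0x8000) marks full membership and the low 15 bits carry the
       partition number, so full membership in the Default partition appears
       as 0xffff and limited membership as 0x7fff. default_partition_pkey is
       a hypothetical helper, not an OpenSM function.

```python
from typing import Optional

FULL_MEMBER_BIT = 0x8000  # assumed: high P_Key bit set = full membership
DEFAULT_PKEY = 0x7FFF


def default_partition_pkey(is_sm_port: bool, config_readable: bool,
                           has_default_rule: bool) -> Optional[int]:
    """P_Key a port gets for the Default partition, per the rules above.

    Returns None when the file has a Default rule and the port is not the
    SM's own port, because membership then comes from that rule (not
    modelled in this sketch).
    """
    if is_sm_port or not config_readable:
        return DEFAULT_PKEY | FULL_MEMBER_BIT  # full membership
    if has_default_rule:
        return None
    return DEFAULT_PKEY                        # limited membership


print(hex(default_partition_pkey(False, False, False)))  # 0xffff (no file)
print(hex(default_partition_pkey(False, True, False)))   # 0x7fff (no rule)
print(hex(default_partition_pkey(True, True, False)))    # 0xffff (SM port)
```
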
       File Format

       Comments:

       Any text following a '#' character on a line is a comment and is
       ignored by the parser.

       General file format:

       <Partition Definition>:[<newline>]<Partition Properties>;

            Partition Definition:
              [PartitionName][=PKey][,indx0][,ipoib_bc_flags][,defmember=full|limited]

               PartitionName  - string, will be used with logging. When
                                omitted, an empty string will be used.
               PKey           - P_Key value for this partition. Only the low
                                15 bits will be used. When omitted, it will
                                be autogenerated.
               indx0          - indicates that this pkey should be inserted in
                                block 0 index 0.
               ipoib_bc_flags - used to indicate/specify IPoIB capability of
                                this partition.

               defmember=full|limited|both - specifies default membership for
                                port guid list. Default is limited.

            ipoib_bc_flags:
               ipoib_flag|[mgroup_flag]*

               ipoib_flag:
                   ipoib  - indicates that this partition may be used for
                            IPoIB; as a result, the IPoIB broadcast group will
                            be created with the mgroup_flag flags given,
                            if any.

            Partition Properties:
              [<Port list>|<MCast Group>]* | <Port list>

            Port list:
               <Port Specifier>[,<Port Specifier>]

            Port Specifier:
               <PortGUID>[=[full|limited|both]]

               PortGUID         - GUID of partition member EndPort.
                                  Hexadecimal numbers should start with
                                  0x; decimal numbers are accepted too.
               full, limited,   - indicates full and/or limited membership for
               both               this port.  When omitted (or unrecognized)
                                  limited membership is assumed.  Both
                                  indicates both full and limited membership
                                  for this port.

595            MCast Group:
596               mgid=gid[,mgroup_flag]*<newline>
597
598                                - The specified gid is verified to be a multicast
599                                  address.  IP groups are verified to match
600                                  the rate and mtu of the broadcast group.
601                                  The P_Key bits of the mgid for IP groups
602                                  must either match the P_Key specified in
603                                  the "Partition Definition" or be 0x0000,
604                                  in which case the P_Key is copied into
605                                  those bits.
606
607            mgroup_flag:
608               rate=<val>  - specifies rate for this MC group
609                             (default is 3 (10 Gb/s))
610               mtu=<val>   - specifies MTU for this MC group
611                             (default is 4 (2048))
612               sl=<val>    - specifies SL for this MC group
613                             (default is 0)
614               scope=<val> - specifies scope for this MC group
615                             (default is 2 (link local)).  Multiple scope
616                             settings are permitted for a partition.
617                             NOTE: This overwrites the scope nibble of the
618                                   specified mgid.  Furthermore specifying
619                                   multiple scope settings will result in
620                                   multiple MC groups being created.
621               Q_Key=<val>     - specifies the Q_Key for this MC group
622                                 (default: 0x0b1b for IP groups, 0 for other
623                                  groups)
624                                 WARNING: changing this for the broadcast
625                                          group may break IPoIB on client
626                                          nodes!!
627               TClass=<val>    - specifies tclass for this MC group
628                                 (default is 0)
629               FlowLabel=<val> - specifies FlowLabel for this MC group
630                                 (default is 0)
631            NOTE: All mgroup_flag flags MUST be separated by a comma (,).
632
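       The two MGID rewrites described above (copying the partition P_Key
       into an all-zero P_Key field, and overwriting the scope nibble) can be
       sketched in Python as follows. This is not OpenSM's code: the helper
       name apply_mgroup_flags is hypothetical, it assumes the RFC 4391 IPoIB
       MGID layout (FF1S:401B:PPPP:...) with the P_Key in the third 16-bit
       group, and comparing only the low 15 bits is an assumption. The
       example passes 0xFFFF, the Default partition with the full-membership
       bit set.

```python
import ipaddress
from typing import Optional


def apply_mgroup_flags(mgid: str, pkey: int, scope: Optional[int] = None) -> str:
    """Hypothetical helper: fill the P_Key bits and scope nibble of an IP-group MGID."""
    raw = bytearray(ipaddress.IPv6Address(mgid).packed)
    if raw[0] != 0xFF:
        raise ValueError(f"{mgid} is not a multicast GID")

    # Assumption (RFC 4391 layout): bytes 4-5, the third 16-bit group, hold the P_Key.
    field = int.from_bytes(raw[4:6], "big")
    if field == 0x0000:
        raw[4:6] = pkey.to_bytes(2, "big")          # copy the partition P_Key in
    elif (field & 0x7FFF) != (pkey & 0x7FFF):       # assumption: low 15 bits compared
        raise ValueError("mgid P_Key bits do not match the partition P_Key")

    if scope is not None:
        # The low nibble of byte 1 is the scope (ff12 -> scope 2, link local).
        raw[1] = (raw[1] & 0xF0) | (scope & 0x0F)
    return str(ipaddress.IPv6Address(bytes(raw)))


print(apply_mgroup_flags("ff12:401b::0707", 0xFFFF))           # ff12:401b:ffff::707
print(apply_mgroup_flags("ff12:401b::0707", 0xFFFF, scope=5))  # ff15:401b:ffff::707
```
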
633       Note that values for rate, mtu, and scope, for both partitions and
634       multicast groups, should be specified as defined in the IBTA
635       specification (for example, mtu=4 for 2048).
636
637       There are several useful keywords for PortGUID definition:
638
639        - 'ALL' means all end ports in this subnet.
640        - 'ALL_CAS' means all Channel Adapter end ports in this subnet.
641        - 'ALL_SWITCHES' means all Switch end ports in this subnet.
642        - 'ALL_ROUTERS' means all Router end ports in this subnet.
643        - 'SELF' means the subnet manager's port.
644
645       An empty list means no ports in this partition.
646
647       Notes:
648
649       White space is permitted between delimiters ('=', ',',':',';').
650
651       PartitionName does not need to be unique, but PKey does: if a PKey is
652       repeated, those partition configurations will be merged and the first
653       PartitionName will be used (see also the next note).
654
655       It is possible to split a partition configuration across more than one
656       definition, but then the PKey should be explicitly specified (otherwise
657       different PKey values will be generated for those definitions).
658
659       Examples:
660
661        Default=0x7fff : ALL, SELF=full ;
662        Default=0x7fff : ALL, ALL_SWITCHES=full, SELF=full ;
663
664        NewPartition , ipoib : 0x123456=full, 0x3456789034=limi, 0x2134af2306 ;
666
667        YetAnotherOne = 0x300 : SELF=full ;
668        YetAnotherOne = 0x300 : ALL=limited ;
669
670        ShareIO = 0x80 , defmember=full : 0x123451, 0x123452;
671        # 0x123453, 0x123454 will be limited
672        ShareIO = 0x80 : 0x123453, 0x123454, 0x123455=full;
673        # 0x123456, 0x123457 will be limited
674        ShareIO = 0x80 : defmember=limited : 0x123456, 0x123457, 0x123458=full;
676        ShareIO = 0x80 , defmember=full : 0x123459, 0x12345a;
677        ShareIO = 0x80 , defmember=full : 0x12345b, 0x12345c=limited, 0x12345d;
679
680        # multicast groups added to default
681        Default=0x7fff,ipoib:
682               mgid=ff12:401b::0707,sl=1 # random IPv4 group
683               mgid=ff12:601b::16    # MLDv2-capable routers
684               mgid=ff12:401b::16    # IGMP
685               mgid=ff12:601b::2     # All routers
686               mgid=ff12::1,sl=1,Q_Key=0xDEADBEEF,rate=3,mtu=2 # random group
687               ALL=full;
688
689
690       Note:
691
692       The following rule is equivalent to how OpenSM used to run prior to the
693       partition manager:
694
695        Default=0x7fff,ipoib:ALL=full;
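
       The merge rules described above (repeated PKeys merge, the first
       PartitionName wins, only the low 15 bits of the PKey count, and
       defmember supplies the membership for ports listed without one) can be
       illustrated with a small Python sketch. merge_partitions is a
       hypothetical helper, not OpenSM's parser: it handles only single-line
       "Name=PKey[,flags] : ports ;" definitions with an explicit PKey, treats
       keywords such as ALL as plain names, and ignores multicast group lines
       and the colon-separated defmember form.

```python
from typing import Dict, Iterable

MEMBERSHIPS = ("full", "limited", "both")


def merge_partitions(lines: Iterable[str]) -> Dict[int, dict]:
    """Hypothetical sketch: merge single-line partition definitions by P_Key."""
    parts: Dict[int, dict] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip().rstrip(";").strip()
        if not line:
            continue
        definition, _, props = line.partition(":")
        fields = [f.strip() for f in definition.split(",")]
        name, _, pkey_text = fields[0].partition("=")
        if not pkey_text.strip():
            raise ValueError("this sketch requires an explicit PKey")
        pkey = int(pkey_text.strip(), 0) & 0x7FFF   # only the low 15 bits are used
        defmember = "limited"                         # documented default
        for flag in fields[1:]:
            key, _, value = flag.partition("=")
            if key.strip() == "defmember":
                defmember = value.strip()
        # Repeated PKeys merge into one entry; the first PartitionName is kept.
        entry = parts.setdefault(pkey, {"name": name.strip(), "members": {}})
        for spec in props.split(","):
            guid, eq, member = (s.strip() for s in spec.partition("="))
            if not guid:
                continue
            member = member if eq else defmember
            # Unrecognized membership values fall back to limited.
            entry["members"][guid] = member if member in MEMBERSHIPS else "limited"
    return parts


conf = [
    "ShareIO = 0x80 , defmember=full : 0x123451, 0x123452;",
    "ShareIO = 0x80 : 0x123453, 0x123455=full;",
    "Other = 0x8080 : 0x123456=both ;",   # 0x8080 & 0x7fff == 0x80, so it merges
]
parts = merge_partitions(conf)
print(parts[0x80]["name"])      # ShareIO
print(parts[0x80]["members"])
```
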
696
697

QOS CONFIGURATION

699       There is a set of QoS-related low-level configuration parameters.  All
700       these parameter names are prefixed with the "qos_" string.  Here is a
701       full list of these parameters:
702
703        qos_max_vls    - The maximum number of VLs that will be on the subnet
704        qos_high_limit - The limit of the High Priority component of the VL
705                         Arbitration table (IBA 7.6.9)
706        qos_vlarb_low  - Low priority VL Arbitration table (IBA 7.6.9)
707                         template
708        qos_vlarb_high - High priority VL Arbitration table (IBA 7.6.9)
709                         template
710                         Both VL arbitration templates are lists of
711                         VL:weight pairs
712        qos_sl2vl      - SL2VL Mapping table (IBA 7.6.6) template. It is
713                         a list of VLs corresponding to SLs 0-15 (note
714                         that VL15 used here means drop this SL)
715
716       Typical default values (hard-coded in OpenSM initialization) are:
717
718        qos_max_vls 15
719        qos_high_limit 0
720        qos_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4
722        qos_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0
724        qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7
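
       The templates above are plain comma-separated lists, so they are easy
       to sanity-check before putting them in the config file. The Python
       sketch below is hypothetical (OpenSM does its own parsing); it assumes
       data VLs 0-14 and 8-bit arbitration weights, and it skips empty items
       such as a trailing comma without guessing how OpenSM fills SLs missing
       from a short qos_sl2vl list.

```python
from typing import Dict, List, Tuple

# Default templates quoted from the text above.
DEFAULT_VLARB_HIGH = "0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0"
DEFAULT_SL2VL = "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7"


def parse_vlarb(template: str) -> List[Tuple[int, int]]:
    """Parse a VL arbitration template into (VL, weight) pairs."""
    entries = []
    for pair in template.split(","):
        vl, weight = (int(x) for x in pair.split(":"))
        if not (0 <= vl <= 14 and 0 <= weight <= 255):
            raise ValueError(f"bad VL arbitration entry {pair!r}")
        entries.append((vl, weight))
    return entries


def parse_sl2vl(template: str) -> Dict[int, int]:
    """Map SL -> VL; empty items (e.g. a trailing comma) are skipped."""
    vls = [int(v) for v in template.split(",") if v.strip()]
    if len(vls) > 16:
        raise ValueError("at most 16 SLs (0-15) can be mapped")
    return dict(enumerate(vls))


def dropped_sls(sl2vl: Dict[int, int]) -> List[int]:
    """SLs mapped to VL15, i.e. whose traffic is dropped."""
    return [sl for sl, vl in sl2vl.items() if vl == 15]


high = parse_vlarb(DEFAULT_VLARB_HIGH)
print([(vl, w) for vl, w in high if w > 0])   # [(0, 4)]
print(parse_sl2vl(DEFAULT_SL2VL)[15])          # 7
```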
725
726       The syntax is compatible with the rest of the OpenSM configuration
727       options, and the values may be stored in the OpenSM config file
727       (cached options file).
728
729       In addition to the above, we may define separate QoS configuration
730       parameter sets for various target types. As targets, we currently
731       support CAs, routers, switch external ports, and switch's enhanced
731       port 0.  The names of such specialized parameters are prefixed with
733       the "qos_<type>_" string. Here is a full list of the currently
733       supported sets:
734
735        qos_ca_  - QoS configuration parameters set for CAs.
736        qos_rtr_ - parameters set for routers.
737        qos_sw0_ - parameters set for switches' port 0.
738        qos_swe_ - parameters set for switches' external ports.
739
740       Examples:
741        qos_sw0_max_vls=2
742        qos_ca_sl2vl=0,1,2,3,5,5,5,12,12,0,
743        qos_swe_high_limit=0
744
745

PREFIX ROUTES

747       Prefix routes control how the SA responds to path  record  queries  for
748       off-subnet  DGIDs.   By  default, the SA fails such queries.  Note that
749       IBA does not specify how the SA should obtain  off-subnet  path  record
750       information.   The  prefix  routes configuration is meant as a stop-gap
751       until the specification is completed.
752
753       Each line in the configuration file is a 64-bit prefix  followed  by  a
754       64-bit  GUID,  separated by white space.  The GUID specifies the router
755       port on the local subnet that will handle the prefix.  Blank lines  are
756       ignored,  as is anything between a # character and the end of the line.
757       The prefix and GUID are both  in  hex,  the  leading  0x  is  optional.
758       Either,  or  both, can be wild-carded by specifying an asterisk instead
759       of an explicit prefix or GUID.
760
761       When responding to a path record query for an off-subnet DGID, opensm
762       searches for the first prefix match in the configuration file.
763       Therefore, the order of the lines in the configuration file is
764       important: a wild-carded prefix at the beginning of the configuration
765       file renders all subsequent lines useless.  If there is no match,
766       opensm fails the query.  It is legal to repeat prefixes in the
767       configuration file; opensm will return the path to the first available
768       matching router.  A configuration file with a single line where both
769       prefix and GUID are wild-carded means that a path record query
770       specifying any off-subnet DGID should return a path to the first
771       available router.  This configuration yields the same behavior formerly
772       achieved by compiling opensm with -DROUTER_EXP, which is now obsolete.
773
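       The first-match lookup can be sketched in Python as below. The function
       names, the example prefix and GUIDs, and modelling "first available
       router" as the first entry of an ordered list of reachable router ports
       are all assumptions for illustration, not opensm's implementation.

```python
from typing import List, Optional, Sequence, Tuple

# (prefix, router port GUID); None stands for the '*' wildcard.
Route = Tuple[Optional[int], Optional[int]]


def parse_prefix_routes(text: str) -> List[Route]:
    routes: List[Route] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()       # drop comments
        if not line:
            continue                              # blank lines are ignored
        prefix, guid = line.split()
        # int(..., 16) accepts an optional leading 0x.
        routes.append((None if prefix == "*" else int(prefix, 16),
                       None if guid == "*" else int(guid, 16)))
    return routes


def select_router(routes: List[Route], dgid_prefix: int,
                  available: Sequence[int]) -> Optional[int]:
    """Scan lines in file order; return the first available matching router."""
    for prefix, guid in routes:
        if prefix is not None and prefix != dgid_prefix:
            continue
        if guid is None:
            if available:
                return available[0]               # assumption: "first" = list order
        elif guid in available:
            return guid
    return None                                   # no match: the query fails


CONF = """
# prefix            router port GUID
fec0000000000001    0x0002c90300001234
*                   *                 # catch-all, must come last
"""
routes = parse_prefix_routes(CONF)
print(hex(select_router(routes, 0xfec0000000000001, [0x5555, 0x0002c90300001234])))
```
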
774

MKEY CONFIGURATION

776       OpenSM  supports  configuring  a  single  management key (MKey) for use
777       across the subnet.
778
779       The following configuration options are available:
780
781        m_key                  - the 64-bit MKey to be used on the subnet
782                                 (IBA 14.2.4)
783        m_key_protection_level - the numeric value of the MKey ProtectBits
784                                 (IBA 14.2.4.1)
785        m_key_lease_period     - the number of seconds a CA will wait for a
786                                 response from the SM before resetting the
787                                 protection level to 0 (IBA 14.2.4.2).
788
789       OpenSM will configure all ports  with  the  MKey  specified  by  m_key,
790       defaulting to a value of 0. An m_key value of 0 disables MKey protection
791       on the subnet.  Switches and HCAs with a non-zero MKey will not  accept
792       requests  to change their configuration unless the request includes the
793       proper MKey.
794
795       MKey Protection Levels
796
797       MKey protection levels modify how switches  and  CAs  respond  to  SMPs
798       lacking a valid MKey.  OpenSM will configure each port's ProtectBits to
799       support the level defined by the m_key_protection_level parameter.   If
800       no  parameter  is specified, OpenSM defaults to operating at protection
801       level 0.
802
803       There are currently 4 protection levels defined by the IBA:
804
805        0 - Queries return valid data, including MKey.  Configuration changes
806            are not allowed unless the request contains a valid MKey.
807        1 - Like level 0, but the MKey is reported as 0 in queries,
808            unless the request contains a valid MKey.
809        2 - Neither queries nor configuration changes are allowed, unless the
810            request contains a valid MKey.
811        3 - Identical to 2.  Maintained for backwards compatibility.
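
       The level table above can be summarized as a small decision function.
       This Python sketch is hypothetical and models one SMP arriving at one
       port; it returns whether the SMP is accepted and which MKey value a
       query would report. Treating a port MKey of 0 as "no protection"
       follows the m_key description above.

```python
from typing import Optional, Tuple


def handle_smp(level: int, port_mkey: int, smp_mkey: int,
               is_set: bool) -> Tuple[bool, Optional[int]]:
    """Return (accepted, MKey value reported back) for one SMP at a port."""
    if port_mkey == 0 or smp_mkey == port_mkey:
        return True, port_mkey        # protection off, or the SMP carries the MKey
    # Each (False, None) below is a request that needed a valid MKey; this is
    # when the port sends a Bad M_Key trap (Trap 256) to the SM.
    if is_set:
        return False, None            # no level allows configuration changes
    if level == 0:
        return True, port_mkey        # queries return valid data, MKey included
    if level == 1:
        return True, 0                # queries succeed, but the MKey reads as 0
    return False, None                # levels 2 and 3: queries are refused too


print(handle_smp(1, 0xABC, 0, is_set=False))   # (True, 0)
```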
812
813       MKey Lease Period
814
815       InfiniBand supports an MKey lease timeout, which is intended to allow
816       administrators or a new SM to recover/reset lost MKeys on a fabric.
817
818       If  MKeys  are  enabled  on  the  subnet  and a switch or CA receives a
819       request that requires a valid MKey but does not contain one,  it  warns
820       the  SM  by  sending  a  trap (Bad M_Key, Trap 256).  If the MKey lease
821       period is non-zero, it also starts a countdown timer for the time
822       specified by the lease period.  If an SM (or other agent) responds with the
823       correct MKey, the timer is stopped and reset.  Should the  timer  reach
824       zero,  the  switch  or  CA  will  reset its MKey protection level to 0,
825       exposing the MKey and allowing recovery.
826
827       OpenSM will initialize all ports to use an MKey lease period of the
828       number of seconds specified in the config file.  If no m_key_lease_period
829       is specified, a default of 0 will be used.
830
831       OpenSM normally responds quickly to all Bad_M_Key traps, resetting the
832       lease timers.  Additionally, OpenSM's subnet sweeps will also cancel
833       any running timers.  For maximum protection against accidentally
834       exposed MKeys, the MKey lease time should be a few multiples of the
835       subnet sweep time.  If OpenSM detects at startup that your sweep
836       interval is greater than your MKey lease period, it will reset the lease
837       period to be greater than the sweep interval.  Similarly, if sweeping
838       is disabled at startup, it will be re-enabled with an interval less
839       than the MKey lease period.
840
841       If OpenSM is required to recover a subnet for which it is missing
842       MKeys, it must do so one switch level at a time.  As such, the total
843       time to recover the subnet may be as long as the MKey lease period
844       multiplied by one more than the maximum number of hops between the SM
845       and an endpoint.
846
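       The timing rules above reduce to simple arithmetic. In this
       hypothetical Python sketch, "plus one" is read as applying to the hop
       count, and a lease period of 0 (no lease timer) is assumed to leave
       nothing to adjust at startup.

```python
def worst_case_recovery_s(lease_s: int, max_hops: int) -> int:
    """Upper bound on MKey recovery time: one lease period per switch level."""
    return lease_s * (max_hops + 1)


def startup_adjustment_needed(lease_s: int, sweep_s: int) -> bool:
    """True if OpenSM would adjust the lease period or sweep interval at startup."""
    if lease_s == 0:
        return False              # assumption: no lease timer, nothing to compare
    if sweep_s == 0:
        return True               # sweeping disabled: re-enabled below the lease period
    return sweep_s > lease_s      # sweep longer than lease: the lease period is raised


print(worst_case_recovery_s(lease_s=60, max_hops=3))   # 240
```
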
847       MKey Effects on Diagnostic Utilities
848
849       Setting an MKey may have a detrimental effect on diagnostic software run
850       on  the  subnet,  unless  your  diagnostic software is able to retrieve
851       MKeys from the SA or can be explicitly configured with the proper MKey.
852       This  is particularly true at protection level 2, where CAs will ignore
853       queries for management information that do not contain the proper MKey.
854
855

ROUTING

857       OpenSM now offers ten routing engines:
858
859       1.  Min Hop Algorithm - based on the minimum hops to  each  node  where
860       the path length is optimized.
861
862       2.   UPDN Unicast routing algorithm - also based on the minimum hops to
863       each node, but it is  constrained  to  ranking  rules.  This  algorithm
864       should be chosen if the subnet is not a pure Fat Tree, and deadlock may
865       occur due to a loop in the subnet.
866
867       3. DNUP Unicast routing algorithm - similar to UPDN but allows  routing
868       in  fabrics  which have some CA nodes attached closer to the roots than
869       some switch nodes.
870
871       4.  Fat Tree Unicast routing algorithm - this algorithm optimizes
872       routing for the congestion-free "shift" communication pattern.  It
873       should be chosen if a subnet is a symmetrical or almost symmetrical
874       fat-tree of various types, not just K-ary-N-Trees: non-constant K,
875       not fully populated, any Constant Bisectional Bandwidth (CBB) ratio.
876       Similar to UPDN, Fat Tree routing is constrained to ranking rules.
877
878       5. LASH unicast routing algorithm - uses InfiniBand virtual layers (SL)
879       to provide deadlock-free shortest-path routing while also distributing
880       the paths between layers.  LASH is a deadlock-free, topology-agnostic
881       alternative to the non-minimal UPDN algorithm that avoids the use of a
882       potentially congested root node.
883
884       6.  DOR Unicast routing algorithm - based on the Min Hop algorithm, but
885       avoids port equalization except for redundant links  between  the  same
886       two switches.  This provides deadlock-free routes for hypercubes when
887       the fabric is cabled as a hypercube and for meshes  when  cabled  as  a
888       mesh (see details below).
889
890       7. Torus-2QoS unicast routing algorithm - a DOR-based routing algorithm
891       specialized for 2D/3D torus topologies.  Torus-2QoS provides  deadlock-
892       free  routing while supporting two quality of service (QoS) levels.  In
893       addition it is able to route around multiple failed fabric links  or  a
894       single  failed fabric switch without introducing deadlocks, and without
895       changing path SL values granted before the failure.
896
897       8. DFSSSP unicast routing algorithm -  a  deadlock-free  single-source-
898       shortest-path routing, which uses the SSSP algorithm (see algorithm 9.)
899       as the base to optimize link utilization and  uses  InfiniBand  virtual
900       lanes (SL) to provide deadlock-freedom.
901
902       9. SSSP unicast routing algorithm - a single-source-shortest-path rout‐
903       ing algorithm, which globally balances the number of routes per link to
904       optimize  link  utilization. This routing algorithm has no restrictions
905       in terms of the underlying topology.
906
907       10. Nue unicast routing algorithm - a 100%-applicable and deadlock-free
908       routing  which can be used for any arbitrary or faulty network topology
909       and any number of virtual lanes (this includes the absence of VLs as
910       well). Paths are globally balanced with respect to the number of routes per link,
911       and are kept as short  as  possible  while  enforcing  deadlock-freedom
912       within the VL constraint.
913
914       OpenSM  also supports a file method which can load routes from a table.
915       See 'Modular Routing Engine' for more information on this.
916
917       The basic routing algorithm consists of two stages:
918
919       1. MinHop matrix calculation
920          How many hops are required to get from each port to each LID?
921          The algorithm to fill these tables differs between standard
922          (min hop) and Up/Down routing.
923          For standard routing, a "relaxation" algorithm is used to propagate
924          min hop from every destination LID through neighbor switches.
925          For Up/Down routing, a BFS from every target is used. The BFS tracks
926          link direction (up or down) and avoids steps that would go up after
927          a down step has been taken.
928
929       2. Once MinHop matrices exist, each switch is visited and for each
930          target LID a decision is made as to which port should be used to get
931          to that LID.
932          This step is common to standard and Up/Down routing. Each port has a
933          counter of the number of target LIDs routed through it.
934          When there are multiple alternative ports with the same MinHop to a
935          LID, the one with fewer previously assigned LIDs is selected.
936          If LMC > 0, more checks are added. Within each group of LIDs
937          assigned to the same target port:
938          a. use only ports which have the same MinHop
939          b. first prefer the ones that go to a different systemImageGuid (than
940             the previous LID of the same LMC group)
941          c. if none, prefer those which go through another NodeGuid
942          d. fall back to the number of paths method (if all go to the same node).
943
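       The two stages can be sketched in Python for a small switch-only model.
       This is an illustration, not OpenSM's implementation: a BFS stands in
       for the relaxation algorithm (equivalent for unit-cost links), each LID
       is mapped to the switch its endport hangs off, LMC handling and the
       systemImageGuid/NodeGuid preferences are omitted, and breaking ties by
       the lowest port number is an assumption.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical topology model: switch -> [(port number, neighbor switch), ...].
# Links are listed from both ends, so the graph is undirected.
Topology = Dict[str, List[Tuple[int, str]]]


def min_hops(topo: Topology, dest: str) -> Dict[str, int]:
    """Stage 1: hop count from every switch to `dest` (BFS over unit-cost links)."""
    hops = {dest: 0}
    queue = deque([dest])
    while queue:
        sw = queue.popleft()
        for _, nbr in topo[sw]:
            if nbr not in hops:
                hops[nbr] = hops[sw] + 1
                queue.append(nbr)
    return hops


def build_lfts(topo: Topology, lid_owner: Dict[int, str]) -> Dict[str, Dict[int, int]]:
    """Stage 2: pick an output port per (switch, target LID), balancing port load."""
    hop_tables = {sw: min_hops(topo, sw) for sw in topo}
    lfts: Dict[str, Dict[int, int]] = {sw: {} for sw in topo}
    for sw, ports in topo.items():
        load = {port: 0 for port, _ in ports}        # LIDs already routed per port
        for lid in sorted(lid_owner):
            dest = lid_owner[lid]
            if dest == sw:
                continue                             # delivered locally; not modelled
            best = hop_tables[dest][sw]
            # Only ports whose neighbor is one hop closer keep the path minimal.
            candidates = [p for p, nbr in ports if hop_tables[dest][nbr] == best - 1]
            # Fewest previously assigned LIDs wins; lowest port number breaks ties.
            port = min(candidates, key=lambda p: (load[p], p))
            load[port] += 1
            lfts[sw][lid] = port
    return lfts


TOPO = {
    "A": [(1, "S1"), (2, "S2")],
    "B": [(1, "S1"), (2, "S2")],
    "S1": [(1, "A"), (2, "B")],
    "S2": [(1, "A"), (2, "B")],
}
LIDS = {10: "B", 11: "B", 12: "A"}   # LID -> switch the endport hangs off
lfts = build_lfts(TOPO, LIDS)
print(lfts["A"])                      # {10: 1, 11: 2}: the two LIDs are spread
```

       On leaf A both spines are equally short, so LID 10 and LID 11 end up
       on different uplinks, which is the balancing effect described in
       stage 2.
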
944       Effect of Topology Changes
945
946       OpenSM will preserve existing routing in any case  where  there  is  no
947       change in the fabric switches unless the -r (--reassign_lids) option is
948       specified.
949
950       -r
951       --reassign_lids
952                 This option causes OpenSM to reassign LIDs to all
953                 end nodes. Specifying -r on a running subnet
954                 may disrupt subnet traffic.
955                 Without -r, OpenSM attempts to preserve existing
956                 LID assignments, resolving any duplicate use of a LID.
957
958       If a link is added or removed, OpenSM does not recalculate  the  routes
959       that  do  not  have  to change. A route has to change if the port is no
960       longer UP or no longer the MinHop. When routing changes are  performed,
961       the same algorithm for balancing the routes is invoked.
962
963       In the case of file-based routing, any topology changes are currently
964       ignored.  The 'file' routing engine just loads the LFTs from the file
965       specified, with no reaction to the real topology. Obviously, this
966       will not be able to recheck LIDs (by GUID) for disconnected nodes, and
967       LFTs for non-existent switches will be skipped.  Multicast is not
968       affected by the 'file' routing engine (it uses min hop tables).
969
970
971       Min Hop Algorithm
972
973       The Min Hop algorithm is invoked by default if no routing algorithm  is
974       specified.  It can also be invoked by specifying '-R minhop'.
975
976       The Min Hop algorithm is divided into two stages: computation of
977       min-hop tables on every switch and LFT output port assignment. Link
978       subscription is also equalized, with the ability to override it based
979       on port GUID. The override is supplied by:
980
981       -i <equalize-ignore-guids-file>
982       --ignore_guids <equalize-ignore-guids-file>
983                 This option provides the means to define a set of ports
984                 (by guid) that will be ignored by the link load
985                 equalization algorithm. Note that only endports (CA,
986                 switch port 0, and router ports) and not switch external
987                 ports are supported.
988
989       LMC awareness routes on a (remote) system or switch basis.
990
991
992       Purpose of UPDN Algorithm
993
994       The UPDN algorithm is designed to prevent deadlocks from  occurring  in
995       loops  of  the subnet. A loop-deadlock is a situation in which it is no
996       longer possible to send data between any two  hosts  connected  through
997       the  loop.  As  such,  the UPDN routing algorithm should be used if the
998       subnet is not a pure Fat Tree, and one of its loops  may  experience  a
999       deadlock (due, for example, to high pressure).
1000
1001       The UPDN algorithm is based on the following main stages:
1002
1003       1.  Auto-detect root nodes - based on the CA hop length from any switch
1004       in the subnet, a statistical histogram is built for  each  switch  (hop
1005       num  vs  number  of  occurrences). If the histogram reflects a specific
1006       column (higher than others) for a certain node, then it is marked as  a
1007       root node. Since the algorithm is statistical, it may not find any root
1008       nodes. The list of the root nodes found by this  auto-detect  stage  is
1009       used by the ranking process stage.
1010
1011           Note 1: The user can override the node list manually.
1012           Note 2: If this stage cannot find any root nodes, and the user did
1013                   not specify a guid list file, OpenSM defaults back to the
1014                   Min Hop routing algorithm.
1015
1016       2.   Ranking  process  -  All  root switch nodes (found in stage 1) are
1017       assigned a rank of 0. Using the BFS algorithm, the rest of  the  switch
1018       nodes  in the subnet are ranked incrementally. This ranking aids in the
1019       process of enforcing rules that ensure loop-free paths.
1020
1021       3.  Min Hop Table setting - after ranking is done, a BFS  algorithm  is
1022       run  from  each  (CA  or  switch)  node  in  the subnet. During the BFS
1023       process, the FDB table of each switch node traversed by BFS is updated,
1024       in  reference to the starting node, based on the ranking rules and guid
1025       values.
1026
1027       At the end of the process, the  updated  FDB  tables  ensure  loop-free
1028       paths through the subnet.
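
       The ranking stage and the resulting Up/Down rule can be sketched in
       Python as below. Switches are identified by GUID; treating a hop
       between equal-rank switches as "up" when it goes to the lower GUID is
       an assumed stand-in for the "guid values" tie-break mentioned above,
       and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List


def rank_switches(adj: Dict[int, List[int]], roots: List[int]) -> Dict[int, int]:
    """Roots get rank 0; a BFS ranks the remaining switches incrementally."""
    rank = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        sw = queue.popleft()
        for nbr in adj[sw]:
            if nbr not in rank:
                rank[nbr] = rank[sw] + 1
                queue.append(nbr)
    return rank


def is_up(src: int, dst: int, rank: Dict[int, int]) -> bool:
    """A hop is 'up' when it moves toward the roots."""
    if rank[dst] != rank[src]:
        return rank[dst] < rank[src]
    return dst < src                  # assumption: equal ranks, lower GUID is 'up'


def is_updn_legal(path: List[int], rank: Dict[int, int]) -> bool:
    """Reject any path that goes up after it has already gone down."""
    gone_down = False
    for a, b in zip(path, path[1:]):
        if is_up(a, b, rank):
            if gone_down:
                return False
        else:
            gone_down = True
    return True


# A root (0x01) and two lower switches that are also cabled to each other.
ADJ = {0x01: [0x10, 0x11], 0x10: [0x01, 0x11], 0x11: [0x01, 0x10]}
RANK = rank_switches(ADJ, roots=[0x01])
print(is_updn_legal([0x10, 0x01, 0x11], RANK))   # True: up, then down
print(is_updn_legal([0x10, 0x11, 0x01], RANK))   # False: down, then up
```

       Forbidding the down-then-up turn removes one direction of travel
       around the 0x01/0x10/0x11 loop, which is what keeps the resulting
       paths free of credit loops.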
1029
1030       Note:  Up/Down routing does not allow LID routing communication between
1031       switches that are located inside spine "switch systems".  The reason is
1032       that  there  is  no way to allow a LID route between them that does not
1033       break the Up/Down rule.  One ramification of this is  that  you  cannot
1034       run SM on switches other than the leaf switches of the fabric.
1035
1036
1037       UPDN Algorithm Usage
1038
1039       Activation through OpenSM
1040
1041       Use the '-R updn' option (instead of the old '-u') to activate the UPDN
1042       algorithm.  Use '-a <root_guid_file>' to add an UPDN guid file that
1043       contains the root nodes for ranking.  If the '-a' option is not used,
1044       OpenSM uses its auto-detect root nodes algorithm.
1045
1046       Notes on the guid list file:
1047
1048       1.   A valid guid file specifies one guid in each line. Lines  with  an
1049       invalid format will be discarded.
1050       2.   The user should specify the root switch guids. However, it is also
1051       possible to specify CA guids; OpenSM will use the guid  of  the  switch
1052       (if it exists) that connects the CA to the subnet as a root node.
1053
1054       Purpose of DNUP Algorithm
1055
1056       The DNUP algorithm is designed to serve a similar purpose to UPDN.
1057       However, it is intended to work in network topologies which are
1058       unsuited to UPDN due to nodes being connected closer to the roots than
1059       some of the switches.  An example would be a fabric which contains
1060       nodes and uplinks connected to the same switch. The operation of DNUP
1061       is the same as UPDN with the exception of the ranking process.  In
1062       DNUP, all switch nodes are ranked based solely on their distance from
1063       CA nodes: all switch nodes directly connected to at least one CA are
1064       assigned a rank of 1, and all other switch nodes are assigned a rank of
1065       one more than the minimum rank of all neighbor switch nodes.
1066
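       Because each rank is one more than the minimum neighbor rank, DNUP
       ranking amounts to a multi-source BFS started from the CA-attached
       switches. A hypothetical Python sketch (switch names are made up):

```python
from collections import deque
from typing import Dict, List, Set


def dnup_ranks(adj: Dict[str, List[str]], ca_attached: Set[str]) -> Dict[str, int]:
    """Rank 1 for switches with a CA; otherwise 1 + the minimum neighbor rank."""
    rank = {sw: 1 for sw in sorted(ca_attached)}
    queue = deque(sorted(ca_attached))
    while queue:                       # BFS visits each switch at its minimum distance
        sw = queue.popleft()
        for nbr in adj[sw]:
            if nbr not in rank:
                rank[nbr] = rank[sw] + 1
                queue.append(nbr)
    return rank


ADJ = {
    "leaf1": ["spine"],
    "leaf2": ["spine"],
    "spine": ["leaf1", "leaf2", "core"],
    "core": ["spine"],
}
print(sorted(dnup_ranks(ADJ, {"leaf1", "leaf2"}).items()))
# A CA hanging off the core switch pulls its rank down to 1:
print(sorted(dnup_ranks(ADJ, {"leaf1", "core"}).items()))
```
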
1067       Fat-tree Routing Algorithm
1068
1069       The fat-tree algorithm optimizes routing for the "shift" communication
1070       pattern.  It should be chosen if a subnet is a symmetrical or almost
1071       symmetrical fat-tree of various types.  It supports not just
1072       K-ary-N-Trees: it handles non-constant K, cases where not all leaves
1073       (CAs) are present, and any CBB ratio.  As in UPDN, fat-tree also
1074       prevents credit-loop deadlocks.
1075
1076       If  the  root  guid  file  is  not provided ('-a' or '--root_guid_file'
1077       options), the topology has to be pure fat-tree that complies  with  the
1078       following rules:
1079         - Tree rank should be between two and eight (inclusively)
1080         - Switches of the same rank should have the same number
1081           of UP-going port groups*, unless they are root switches,
1082           in which case they shouldn't have UP-going ports at all.
1083         - Switches of the same rank should have the same number
1084           of DOWN-going port groups, unless they are leaf switches.
1085         - Switches of the same rank should have the same number
1086           of ports in each UP-going port group.
1087         - Switches of the same rank should have the same number
1088           of ports in each DOWN-going port group.
1089         - All the CAs have to be at the same tree level (rank).
1090
1091       If the root guid file is provided, the topology doesn't have to be pure
1092       fat-tree, and it should only comply with the following rules:
1093         - Tree rank should be between two and eight (inclusively)
1094         - All the Compute Nodes** have to be at the same tree level (rank).
1095           Note that non-compute node CAs are allowed here to be at different
1096           tree ranks.
1097
1098       * Ports that are connected to the same remote switch are referred to
1099       as a 'port group'.
1100
1101       ** The list of compute nodes (CNs) can be specified by the '-u' or
1102       '--cn_guid_file' OpenSM options.
1103
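       The port-group definition and the UP-going rules above can be checked
       mechanically. This hypothetical Python sketch takes the ranks as input
       (rank 0 for roots) rather than computing them, reads "same number of
       ports in each group" as identical sorted group sizes, and does not
       cover the DOWN-going or CA-level rules.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical link list: (switch, local port, remote switch), both directions listed.
Link = Tuple[str, int, str]


def port_groups(links: List[Link]) -> Dict[str, Counter]:
    """Group each switch's ports by remote switch: {switch: {remote: #ports}}."""
    groups: Dict[str, Counter] = defaultdict(Counter)
    for sw, _port, remote in links:
        groups[sw][remote] += 1
    return groups


def up_groups_uniform(groups: Dict[str, Counter], rank: Dict[str, int]) -> bool:
    """Roots have no UP-going ports; other switches of one rank share the same
    number of UP-going port groups with the same number of ports in each."""
    shapes: Dict[int, Set[Tuple[int, ...]]] = defaultdict(set)
    for sw, remotes in groups.items():
        up = tuple(sorted(n for remote, n in remotes.items() if rank[remote] < rank[sw]))
        if rank[sw] == 0 and up:
            return False
        shapes[rank[sw]].add(up)
    return all(len(s) == 1 for s in shapes.values())


RANK = {"S1": 0, "S2": 0, "L1": 1, "L2": 1}
LINKS = [(leaf, p, spine) for leaf in ("L1", "L2")
         for p, spine in ((1, "S1"), (2, "S1"), (3, "S2"), (4, "S2"))]
LINKS += [(spine, 0, leaf) for leaf, _, spine in list(LINKS)]
print(up_groups_uniform(port_groups(LINKS), RANK))   # True: every leaf has 2x2 uplinks
```
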
1104       Topologies that do not comply cause a  fallback  to  min  hop  routing.
1105       Note that this can also occur on link failures which cause the topology
1106       to no longer be "pure" fat-tree.
1107
1108       Note that although fat-tree algorithm supports trees  with  non-integer
1109       CBB  ratio,  the  routing will not be as balanced as in case of integer
1110       CBB ratio.  In addition to this, although  the  algorithm  allows  leaf
1111       switches to have any number of CAs, the closer the tree is to being
1112       fully populated, the more effective the "shift" communication pattern will
1113       be.   In  general,  even  if  the root list is provided, the closer the
1114       topology to a pure and symmetrical fat-tree, the more optimal the rout‐
1115       ing will be.
1116
1117       The algorithm also dumps a compute node ordering file
1118       (opensm-ftree-ca-order.dump) in the same directory where the OpenSM log
1119       resides. This ordering file provides the CN order that may be used to
1120       create an efficient communication pattern that matches the routing tables.
1121
1122       Routing between non-CN nodes
1123
1124       The use of the cn_guid_file option allows non-CN nodes to be located on
1125       different levels in the fat tree.  In such a case, it is not guaranteed
1126       that the Fat Tree algorithm will route between two non-CN nodes.  To
1127       solve this problem, a list of non-CN nodes can be specified by the '-G'
1128       or '--io_guid_file' option.  These nodes will be allowed to use switches
1129       the wrong way round a specific number of times (specified by '-H' or
1130       '--max_reverse_hops').  With the proper max_reverse_hops and
1131       io_guid_file values, you can ensure full connectivity in the Fat Tree.
1132
1133       Please note that using max_reverse_hops creates routes that use the
1134       switch in a counter-stream way.  This option should never be used to
1135       connect nodes with high bandwidth traffic between them! It should only
1136       be used to allow connectivity for HA purposes or similar.  Also, having
1137       routes the other way around can in theory cause credit loops.
1138
1139       Use these options with extreme care!
1140
1141       Activation through OpenSM
1142
1143       Use the '-R ftree' option to activate the fat-tree algorithm.  Use
1144       '-a <root_guid_file>' to provide root nodes for ranking. If the '-a'
1145       option is not used, the routing algorithm will detect roots
1146       automatically.  Use '-u <cn_guid_file>' to provide the list of compute
1147       nodes. If the '-u' option is not used, all the CAs are considered as compute nodes.
1148
1149       Note:  LMC  > 0 is not supported by fat-tree routing. If this is speci‐
1150       fied, the default routing algorithm is invoked instead.
1151
1152
1153       LASH Routing Algorithm
1154
1155       LASH is an acronym for LAyered SHortest Path Routing. It is a determin‐
1156       istic  shortest  path  routing algorithm that enables topology agnostic
1157       deadlock-free routing within communication networks.
1158
1159       When computing the routing function, LASH analyzes the network topology
1160       for  the  shortest-path  routes between all pairs of sources / destina‐
1161       tions and groups these paths into virtual layers in such a  way  as  to
1162       avoid deadlock.
1163
1164       Note: LASH analyzes routes and ensures deadlock freedom between switch
1165       pairs. The link between an HCA and a switch does not need virtual
1166       layers, as deadlock will not arise between a switch and an HCA.
1167
1168       In more detail, the algorithm works as follows:
1169
1170       1) LASH determines the shortest path between all pairs of source /
1171       destination switches. Note that LASH ensures the same SL is used for
1172       both the SRC/DST and the DST/SRC direction of a pair, but there is no
1173       guarantee that the return path for a given DST/SRC will be the reverse
1173       of the route SRC/DST.
1174
1175       2) LASH then begins an SL assignment process where a route is  assigned
1176       to  a  layer (SL) if the addition of that route does not cause deadlock
1177       within that layer. This is achieved  by  maintaining  and  analysing  a
1178       channel dependency graph for each layer. Once the potential addition of
1179       a path could lead to deadlock, LASH opens a new layer and continues the
1180       process.
1181
1182       3)  Once  this  stage  has been completed, it is highly likely that the
1183       first layers processed will contain more paths than  the  latter  ones.
1184       To better balance the use of layers, LASH moves paths from one layer to
1185       another so that the number of paths in each layer averages out.
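       The layer assignment in step 2 can be illustrated with a short Python
       sketch. It is a simplified, hypothetical model, not the opensm code.
       Each route is a list of switch names and a channel is a directed link
       (u, v). Each route is placed in the lowest layer whose channel
       dependency graph stays acyclic. The names assign_layers, has_cycle and
       route_dependencies are invented for this example. The balancing of
       step 3 and the SRC/DST symmetry of step 1 are left out.

```python
from collections import defaultdict


def route_dependencies(path):
    """Channel dependencies induced by a switch path [s0, s1, ..., sk].

    A packet holding channel (s0, s1) waits for (s1, s2), and so on.
    """
    channels = list(zip(path, path[1:]))
    return list(zip(channels, channels[1:]))


def has_cycle(cdg):
    """Depth-first search for a cycle in a dict: channel -> set of channels."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for nxt in cdg.get(node, ()):
            if color[nxt] == GREY:
                return True          # back edge: the dependencies form a loop
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(cdg))


def assign_layers(routes, max_layers):
    """Place each route in the lowest layer whose CDG stays acyclic.

    Returns {route_index: layer}. Raises RuntimeError if max_layers
    (the number of usable SLs) is not enough.
    """
    layers = []                      # one CDG per layer
    assignment = {}
    for idx, path in enumerate(routes):
        deps = route_dependencies(path)
        for layer in range(max_layers):
            if layer == len(layers):
                layers.append({})    # open a new layer
            trial = {k: set(v) for k, v in layers[layer].items()}
            for a, b in deps:
                trial.setdefault(a, set()).add(b)
            if not has_cycle(trial):
                layers[layer] = trial
                assignment[idx] = layer
                break
        else:
            raise RuntimeError(f"route {idx} needs more than {max_layers} layers")
    return assignment


# Four switches cabled as a ring; every route goes two hops clockwise.
ring = [["A", "B", "C"], ["B", "C", "D"], ["C", "D", "A"], ["D", "A", "B"]]
print(assign_layers(ring, max_layers=8))  # {0: 0, 1: 0, 2: 0, 3: 1}
```

       On the four-switch ring, the first three routes fit in layer 0. The
       fourth would close the dependency cycle (A,B) -> (B,C) -> (C,D) ->
       (D,A) -> (A,B), so it is placed in a new layer 1.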

       Note: the implementation of LASH in opensm attempts to use as few
       layers as possible. This number can be less than the number of layers
       actually available.

       In general, LASH is a very flexible algorithm. It can, for example,
       reduce to Dimension Order Routing in certain topologies. It is topology
       agnostic, and it fares well in the face of faults.

       It has been shown that, for both regular and irregular topologies, LASH
       outperforms Up/Down. The reason is that LASH distributes the traffic
       more evenly through the network, avoiding the bottleneck issues
       related to a root node, and always routes along shortest paths.

       The algorithm was developed by Simula Research Laboratory.


       Use the '-R lash -Q' option to activate the LASH algorithm.

       Note: QoS support has to be turned on so that the SL/VL mappings are
       used.

       Note: LMC > 0 is not supported by LASH routing. If it is specified, the
       default routing algorithm is invoked instead.

       For open regular Cartesian meshes, the DOR algorithm is the ideal
       routing algorithm. For toroidal meshes, on the other hand, there are
       routing loops that can cause deadlocks, and LASH can be used to route
       these cases. The performance of LASH can be improved by
       preconditioning the mesh when multiple links connect switches and when
       the switches are not cabled consistently. An option exists for LASH to
       do this; to invoke it, use '-R lash -Q --do_mesh_analysis'. This adds a
       phase that analyzes the mesh to try to determine its dimensions and
       size. If it determines that the mesh looks like an open or closed
       Cartesian mesh, it reorders the ports in dimension order before the
       rest of the LASH algorithm runs.

       DOR Routing Algorithm

       The Dimension Order Routing algorithm is based on the Min Hop algorithm
       and so uses shortest paths. Instead of spreading traffic out across
       different paths with the same shortest distance, it chooses among the
       available shortest paths based on an ordering of dimensions. Each port
       must be consistently cabled to represent a hypercube dimension or a
       mesh dimension. Alternatively, the -O option can be used to assign a
       custom mapping between the ports on a given switch and the associated
       dimension. Paths are grown from a destination back to a source using
       the lowest dimension (port) of the available paths at each step. This
       provides the ordering necessary to avoid deadlock. When there are
       multiple links between any two switches, they still represent only one
       dimension, and traffic is balanced across them unless port
       equalization is turned off. For hypercubes, the same port must be used
       throughout the fabric to represent the hypercube dimension and must
       match on both ends of the cable, or the -O option must be used to
       accomplish the alignment. For meshes, a dimension should consistently
       use the same pair of ports (one port on one end of the cable and the
       other port on the other end) continuing along the mesh dimension, or
       the -O option can be used as an override.
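       The per-hop choice described above can be sketched in Python for an
       open (non-wrapping) mesh. This is an illustration under stated
       assumptions, not the opensm implementation. Switches are identified by
       hypothetical integer coordinates, and each dimension corresponds to one
       consistently cabled port pair. The helper names dor_next_hop and
       dor_path are invented. At every switch, a move along any dimension in
       which the current and destination coordinates differ lies on a
       shortest path; DOR always takes the lowest such dimension.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]


def dor_next_hop(current: Coord, dest: Coord) -> Tuple[int, int]:
    """Return (dimension, step) for the next hop; step is +1 or -1.

    Any dimension in which the coordinates differ gives a shortest-path
    move in an open mesh; DOR deterministically picks the lowest one.
    """
    for dim, (c, d) in enumerate(zip(current, dest)):
        if c != d:
            return dim, (1 if d > c else -1)
    raise ValueError("current switch is the destination")


def dor_path(src: Coord, dest: Coord) -> List[Coord]:
    """Apply dor_next_hop switch by switch until dest is reached."""
    path = [src]
    cur = list(src)
    while tuple(cur) != dest:
        dim, step = dor_next_hop(tuple(cur), dest)
        cur[dim] += step
        path.append(tuple(cur))
    return path


print(dor_path((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

       Every route finishes all its hops in dimension 0 before it uses
       dimension 1. As a result, no packet ever waits on a lower dimension
       while holding a higher one, and this ordering rules out cyclic channel
       dependencies in an open mesh. On a torus, the wraparound links
       reintroduce cycles, which is why the LASH section above recommends LASH
       for toroidal meshes.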

       Use the '-R dor' option to activate the DOR algorithm.

       DFSSSP and SSSP Routing Algorithm

       The (Deadlock-Free) Single-Source-Shortest-Path routing algorithm is
       designed to optimize link utilization through global balancing of
       routes, while supporting arbitrary topologies. The DFSSSP routing
       algorithm uses InfiniBand virtual lanes, selected via service levels
       (SLs), to provide deadlock freedom.

       The DFSSSP algorithm consists of five major steps:
       1) It discovers the subnet and models it as a directed multigraph in
       which each node represents a node of the physical network and each edge
       represents one direction of the full-duplex links used to connect the
       nodes.
       2) A loop, which iterates over all CAs and switches of the subnet,
       performs three steps to generate the linear forwarding tables (LFTs)
       for each switch:
       2.1) use Dijkstra's algorithm to find the shortest path from all nodes
       to the currently selected destination;
       2.2) update the edge weights in the graph, i.e., add to each link/edge
       the number of routes that use it to reach the destination;
       2.3) update the LFT of each switch with the outgoing port that was used
       in the current step to route the traffic to the destination node.
       3) After the number of available virtual lanes or layers in the subnet
       is detected and a channel dependency graph (CDG) is initialized for
       each layer, the algorithm puts each possible route of the subnet into
       the first layer.
       4) A loop iterates over all CDGs and performs the following substeps:
       4.1) search for a cycle in the current CDG;
       4.2) when a cycle is found, i.e., a possible deadlock is present, one
       edge is selected and all routes that induced this edge are moved to
       the "next higher" virtual layer (CDG[i+1]);
       4.3) the cycle search is continued until all cycles are broken and
       routes are moved "up".
       5) When the number of layers needed to remove all cycles in all CDGs
       does not exceed the number of available SLs/VLs, the routing is
       deadlock-free, and a relation table is generated that contains the
       assignment of routes from source to destination to an SL.
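       Steps 2.1 to 2.3 can be sketched in Python as follows. This is a
       simplified model under stated assumptions, not the opensm code. Nodes
       are plain names, and the LFT stores the next-hop node rather than a
       port number. Every node is counted as a traffic source, and each link
       starts with weight 1. The layering of steps 3) to 5) is omitted. The
       function name sssp_route is hypothetical.

```python
import heapq
from collections import defaultdict


def sssp_route(links, destinations):
    """Weight-balanced shortest-path routing in the spirit of steps 2.1-2.3.

    links: (u, v) pairs, one per direction of each full-duplex cable.
    Returns (lft, weight): lft[node][dest] is the next hop toward dest, and
    weight[(u, v)] is 1 plus the number of routes that cross link u -> v.
    """
    incoming = defaultdict(list)       # v -> every u with a link u -> v
    weight = {}
    for u, v in links:
        incoming[v].append(u)
        weight[(u, v)] = 1
    nodes = {n for link in weight for n in link}
    lft = defaultdict(dict)

    for dest in destinations:
        # 2.1) Dijkstra on reversed links: cost from every node to dest.
        dist = {dest: 0}
        heap = [(0, dest)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist[v]:
                continue               # stale heap entry
            for u in incoming[v]:
                nd = d + weight[(u, v)]
                if nd < dist.get(u, float("inf")):
                    dist[u] = nd
                    lft[u][dest] = v   # 2.3) u forwards dest traffic to v
                    heapq.heappush(heap, (nd, u))
        # 2.2) charge every link on every node's route to dest, so later
        # destinations prefer less-used links.
        for src in nodes:
            node = src
            while node != dest and dest in lft[node]:
                nxt = lft[node][dest]
                weight[(node, nxt)] += 1
                node = nxt
    return lft, weight


cables = [("X", "A"), ("X", "B"), ("A", "Y"), ("B", "Y"),
          ("Y", "D1"), ("Y", "D2")]
links = cables + [(v, u) for u, v in cables]
lft, _ = sssp_route(links, ["D1", "D2"])
print(lft["X"]["D1"], lft["X"]["D2"])  # A B
```

       At switch X, both destinations have two equal-cost choices. After D1 is
       routed over A, the weights on X -> A and A -> Y have grown, so Dijkstra
       picks the path over B for D2. The two routes are therefore spread over
       both links instead of piling onto one.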

       Note on SSSP:
       This algorithm does not perform steps 3) to 5) and cannot be considered
       deadlock-free for all topologies. On the one hand, you can choose this
       algorithm for really large networks (5,000+ CAs) whose topology is
       deadlock-free by design, to reduce the runtime of the algorithm. On the
       other hand, you might use the SSSP routing algorithm as an alternative
       when all deadlock-free routing algorithms fail to route the network for
       whatever reason. In the latter case, SSSP was designed to deliver equal
       or higher bandwidth than the Min Hop routing algorithm, due to better
       congestion avoidance.

       Notes for usage:
       a) running DFSSSP: '-R dfsssp -Q'
       a.1) QoS has to be configured to spread the load equally across the
       available SLs or virtual lanes
       a.2) applications must perform a path record query to get the path SL
       for each route they will use to transmit packets
       b) running SSSP: '-R sssp'
       c) both algorithms support LMC > 0

       Hints for optimizing I/O traffic:
       Having more nodes (I/O and compute) connected to a switch than incoming
       links can result in a 'bad' routing of the I/O traffic as long as
       (DF)SSSP routing is not aware of the dedicated I/O nodes. For example,
       in the following network configuration, CN1-CN3 might send all I/O
       traffic via Link2 to IO1 and IO2:

            CN1         Link1        IO1
               \       /----\       /
         CN2 -- Switch1      Switch2 -- CN4
               /       \----/       \
            CN3         Link2        IO2

       To prevent this from happening, (DF)SSSP can use both the compute node
       GUID file and the I/O GUID file, specified by the '-u' or
       '--cn_guid_file' and '-G' or '--io_guid_file' options (similar to the
       Fat-Tree routing). This ensures that traffic towards compute nodes and
       I/O nodes is balanced separately and therefore distributed as much as
       possible across the available links. Port GUIDs, as listed by ibstat,
       must be specified (not Node GUIDs).
       The priority for the optimization is as follows:
         compute nodes -> I/O nodes -> other nodes
       Possible use case scenarios:
       a) neither '-u' nor '-G' is specified: all nodes are treated as 'other
       nodes' and are therefore balanced equally;
       b) '-G' is specified: traffic towards I/O nodes will be balanced
       optimally;
       c) the system has three node types, such as login/admin, compute and
       I/O, but the balancing focus should be I/O: then one has to use '-u'
       and '-G' with I/O GUIDs listed in cn_guid_file and compute node GUIDs
       listed in io_guid_file;
       d) ...
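       The priority order above can be made concrete with a small Python
       sketch that sorts destination port GUIDs into the three groups. It only
       illustrates the ordering and does not reproduce how (DF)SSSP balances
       each group internally. Two things are assumptions made for this
       example: the GUID file format (one hexadecimal port GUID per line) and
       the helper names load_guids and order_destinations.

```python
def load_guids(text):
    """Parse one hexadecimal port GUID per line (assumed file format)."""
    return {int(line.strip(), 16) for line in text.splitlines() if line.strip()}


def order_destinations(port_guids, cn_guids, io_guids):
    """Order destinations as compute nodes -> I/O nodes -> other nodes."""
    def group(guid):
        if guid in cn_guids:
            return 0
        if guid in io_guids:
            return 1
        return 2
    return sorted(port_guids, key=lambda g: (group(g), g))


cn = load_guids("0x0002c90300000001\n0x0002c90300000002\n")
io = load_guids("0x0002c90300000010\n")
ports = [0x0002c90300000020, 0x0002c90300000010,
         0x0002c90300000002, 0x0002c90300000001]
print([hex(g) for g in order_destinations(ports, cn, io)])
# ['0x2c90300000001', '0x2c90300000002', '0x2c90300000010', '0x2c90300000020']
```

       Scenario c) corresponds to passing the I/O GUIDs as cn_guids and the
       compute GUIDs as io_guids, which moves the I/O nodes to the front of
       the order.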

       Torus-2QoS Routing Algorithm

       Torus-2QoS is a routing algorithm designed for large-scale 2D/3D torus
       fabrics; see torus-2QoS(8) for full documentation.

       Use '-R torus-2QoS -Q' or '-R torus-2QoS,no_fallback -Q' to activate
       the torus-2QoS algorithm.

       Nue Routing Algorithm

       Use either '-R nue' or '-R nue -Q --nue_max_num_vls <int>' to activate
       Nue.

       Note: if '--nue_max_num_vls' is specified and not equal to 1, then QoS
       support must be turned on, so that SL2VL mappings are valid and
       applications comply with the suggested SLs to avoid credit loops. For
       more details on QoS and Nue, see below.

       The implementation of Nue routing for OpenSM is a 100%-applicable,
       balanced, and deadlock-free unicast routing engine (which also
       configures multicast tables; see 'Note on multicast' below). The key
       points of this algorithm are the following:
         - 100% fault-tolerant, oblivious routing strategy
         - topology-agnostic, i.e., applicable to every topology (no matter
           whether the topology is regular, irregular after faults, or random)
         - 100% deadlock-free routing within the resource limits (i.e., it
           never exceeds the given number of available virtual lanes, and it
           does not necessarily require virtual lanes) for every topology
         - very good path balancing and therefore high throughput (even better
           when using METIS, see notes below)
         - QoS (via SLs/VLs) and deadlock freedom can be combined (since both
           rely on VLs), e.g., using VL0-3 for Nue's deadlock freedom (and the
           first QoS level) and VL4-7 as the second QoS level
         - forwarding tables are fast to calculate: O(n^2 * log n), although
           slightly slower than topology-aware routings, which can exploit
           knowledge of the topology, and
         - the path-to-VL mapping only depends on the destination, which may
           be useful for scalable, efficient path resolution and caching
           mechanisms.
       From a very high-level perspective, Nue routing is similar to DFSSSP
       (see above) in the sense that both use Dijkstra and edge weight updates
       for path balancing, and paths are mapped to virtual layers assuming a
       1:1 mapping of SL2VL tables. However, the fundamental difference is
       that Nue routing doesn't perform the path calculation on the graph
       representing the real fabric; instead, it routes directly within the
       channel dependency graph. This approach allows Nue routing to place
       routing restrictions (to avoid any credit loops) in an on-demand
       manner, which overcomes a problem shared by all other good VL-based
       algorithms: they cannot control or limit their use of VLs, and might
       run out of them and have to give up. On the flip side, Nue may have to
       use detours for a few routes, and hence cannot really be considered
       "shortest-path" routing, because it is impossible to accomplish
       deadlock-free, shortest-path routing with a limited number of
       available virtual lanes for arbitrary network topologies.

       Note on the use of the METIS library with Nue:
       Nue routing may have to separate the LIDs into multiple subsets, one
       for every virtual layer, if multiple layers are used. Nue has two
       options to perform this partitioning (not to be confused with IB
       partitions). The first is a fairly simple semi-random assignment of
       LIDs to layers/subsets. The second uses the METIS library to partition
       the network graph into k approximately equal-sized parts. The latter
       approach has shown better results in terms of path balancing and
       avoidance of fallback paths, and hence it is HIGHLY advised to install
       and use the METIS library with OpenSM (enforced via the
       '--enable-metis' configure flag when building OpenSM). In the rare case
       that METIS isn't packaged with the Linux distribution, METIS 5.1.0 can
       be downloaded and installed manually from the official website:
          http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
       OpenSM's configure script also provides options in case the METIS
       header and library aren't found in the default path.
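       The simple (non-METIS) option can be pictured with the Python sketch
       below. It is an assumption-based stand-in, not Nue's actual code. The
       destination LIDs are shuffled with a fixed seed and dealt round-robin
       into num_layers subsets, so the subset sizes differ by at most one.
       Because the layer is looked up by destination LID only, the sketch also
       illustrates the property that the path-to-VL mapping depends only on
       the destination. The function name partition_lids is hypothetical.

```python
import random
from collections import Counter


def partition_lids(lids, num_layers, seed=0):
    """Map each destination LID to a layer, with nearly equal subset sizes."""
    if num_layers < 1:
        raise ValueError("need at least one layer")
    shuffled = list(lids)
    random.Random(seed).shuffle(shuffled)   # reproducible pseudo-random order
    return {lid: i % num_layers for i, lid in enumerate(shuffled)}


layer_of = partition_lids(range(1, 11), num_layers=3)
print(sorted(Counter(layer_of.values()).items()))  # [(0, 4), (1, 3), (2, 3)]
# Every route towards a given destination LID uses layer_of[dlid].
```

       METIS, by contrast, partitions the network graph itself rather than
       ignoring the topology, which is why the text above recommends it.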

       Runtime options for Nue:
       The behavior of Nue routing can be directly influenced by the following
       osm.conf parameter (which is also available as a command line option):
         - nue_max_num_vls: controls/limits the number of virtual lanes/layers
           which Nue is allowed to use (detailed explanation in the osm.conf
           file).
       Furthermore, Nue supports both TRUE and FALSE settings of
       avoid_throttled_links, use_ucast_cache, and qos (more on this below),
       as well as lmc > 0.

       Notes on Quality of Service (QoS):
       The advantage of Nue is that it works with AND without QoS being
       enabled, i.e., the usage of SLs/VLs for deadlock freedom can be
       avoided. Here are the three possible usage scenarios:
         - neither setting '--nue_max_num_vls <int>' nor '-Q': Nue assumes
           that only 1 virtual layer (identical to the physical network, or
           OperVLs equal to VL0) is usable and all paths are to be calculated
           within this one layer. Hence, there is no need for special SL2VL
           mappings in the network, nor for applications to use specific SLs.
         - setting '-Q' but not '--nue_max_num_vls <int>': This combination
           works like the previous one, meaning the SL returned for path
           record requests is not defined by Nue, since all paths are
           deadlock-free without using VLs. However, any separate QoS settings
           may influence the SL returned to applications.
         - setting '-Q --nue_max_num_vls <int>' with int != 1: In this
           configuration, applications have to query and obey the SL for path
           records as returned by Nue, because otherwise deadlock freedom can
           no longer be guaranteed. Furthermore, errors in the fabric may
           require applications to repath to avoid message deadlocks. Since
           Nue operates on virtual layers, admins should configure the SL2VL
           mapping tables in a homogeneous 1:1 manner across the entire subnet
           to separate the layers.
       As an additional note, using more VLs for Nue usually improves the
       overall network throughput, so there are trade-offs admins may have to
       consider when configuring the subnet manager with Nue routing.
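       The three scenarios can be summarized as a small decision function.
       This Python sketch merely restates the rules above, plus the earlier
       note that a nue_max_num_vls value other than 1 requires QoS. The names
       NueMode and nue_mode are hypothetical and not part of OpenSM.

```python
from typing import NamedTuple, Optional


class NueMode(NamedTuple):
    apps_must_obey_sl: bool   # must applications use the path record SL?
    needs_1to1_sl2vl: bool    # should SL2VL tables map SLs 1:1 onto VLs?


def nue_mode(qos: bool, nue_max_num_vls: Optional[int] = None) -> NueMode:
    """Restate the Nue/QoS usage scenarios described in this section."""
    if nue_max_num_vls is None or nue_max_num_vls == 1:
        # Scenarios 1 and 2: all paths share one layer, so Nue does not
        # dictate the SL (other QoS settings still may, if -Q is given).
        return NueMode(apps_must_obey_sl=False, needs_1to1_sl2vl=False)
    if not qos:
        raise ValueError("--nue_max_num_vls != 1 requires QoS (-Q)")
    # Scenario 3: Nue may spread paths over several virtual layers.
    return NueMode(apps_must_obey_sl=True, needs_1to1_sl2vl=True)


print(nue_mode(qos=True, nue_max_num_vls=4))
# NueMode(apps_must_obey_sl=True, needs_1to1_sl2vl=True)
```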

       Note on multicast:
       The Nue routing engine configures multicast forwarding tables by
       utilizing a spanning tree calculation rooted at a subnet switch
       suggested by OpenSM. This spanning tree for a multicast group will try
       to use the least overloaded links (with respect to the unicast
       paths-per-link metric/weight) in the fabric. However, Nue routing
       currently does not guarantee deadlock freedom for the set of multicast
       routes on all topologies, nor for the combination of deadlock-free
       unicast routes with additional multicast routes. Assuming that, for a
       given topology, the calculated multicast routes are deadlock-free, an
       admin may fix the latter problem by separating the VLs, e.g., using
       VL0-6 for unicast routing by specifying '--nue_max_num_vls 7' and
       utilizing VL7 for multicast.


       Routing References

       To learn more about deadlock-free routing, see the article "Deadlock
       Free Message Routing in Multiprocessor Interconnection Networks" by
       William J. Dally and Charles L. Seitz (1985).

       To learn more about the up/down algorithm, see the article "Effective
       Strategy to Compute Forwarding Tables for InfiniBand Networks" by Jose
       Carlos Sancho, Antonio Robles, and Jose Duato at the Universidad
       Politecnica de Valencia.

       To learn more about LASH and the flexibility behind it, the requirement
       for layers, and performance comparisons to other algorithms, see the
       following articles:

       "Layered Routing in Irregular Networks", Lysne et al., IEEE
       Transactions on Parallel and Distributed Systems, Vol. 16, No. 12,
       December 2005.

       "Routing for the ASI Fabric Manager", Solheim et al., IEEE
       Communications Magazine, Vol. 44, No. 7, July 2006.

       "Layered Shortest Path (LASH) Routing in Irregular System Area
       Networks", Skeie et al., IEEE Computer Society Communication
       Architecture for Clusters 2002.

       To learn more about the DFSSSP and SSSP routing algorithms, see the
       articles:
       J. Domke, T. Hoefler and W. Nagel: "Deadlock-Free Oblivious Routing for
       Arbitrary Topologies", in Proceedings of the 25th IEEE International
       Parallel & Distributed Processing Symposium (IPDPS 2011).
       T. Hoefler, T. Schneider and A. Lumsdaine: "Optimized Routing for
       Large-Scale InfiniBand Networks", in 17th Annual IEEE Symposium on High
       Performance Interconnects (HOTI 2009).

       To learn more about the Nue routing algorithm, see the article "Routing
       on the Dependency Graph: A New Approach to Deadlock-Free
       High-Performance Routing" by J. Domke, T. Hoefler and S. Matsuoka
       (published in HPDC'16).

       Modular Routing Engine

       The modular routing engine structure makes it easy to "plug in" new
       routing modules.

       Currently, only unicast callbacks are supported. Multicast can be added
       later.

       One existing routing module is up-down "updn", which may be activated
       with the '-R updn' option (instead of the old '-u').

       General usage is: $ opensm -R 'module-name'

       There is also a trivial routing module which is able to load LFT tables
       from a file.

       Main features:

        - this will load switch LFTs and/or LID matrices (min hops tables)
        - this will load switch LFTs according to the path entries introduced
          in the file
        - no additional checks will be performed (such as "is port connected",
          etc.)
        - in case the fabric LIDs were changed, this will try to reconstruct
          LFTs correctly if endport GUIDs are represented in the file
          (in order to disable this, GUIDs may be removed from the file
           or zeroed)

       The file format is compatible with the output of the 'ibroute'
       utility, and a file for the whole fabric can be generated with the
       dump_lfts.sh script.

       To activate the file-based routing module, use:

         opensm -R file -U /path/to/lfts_file

       If the lfts_file is not found or is in error, the default routing
       algorithm is utilized.

       The ability to dump switch LID matrices (aka min hops tables) to a file
       and later load them is also supported.

       The usage is similar to loading unicast forwarding tables from an lfts
       file (introduced by the 'file' routing engine), but the LID matrix file
       name is specified with the -M or --lid_matrix_file option. For
       example:

         opensm -R file -M ./opensm-lid-matrix.dump

       The dump file is named 'opensm-lid-matrix.dump' and will be generated
       in the standard opensm dump directory (/var/log by default) when the
       OSM_LOG_ROUTING logging flag is set.

       When the 'file' routing engine is activated but the lfts file is not
       specified or cannot be opened, the default LID matrix algorithm is
       used.

       There is also a switch forwarding tables dumper, which generates a
       file compatible with the dump_lfts.sh output. This file can be used as
       input for forwarding table loading by the 'file' routing engine.
       Either or both of the -U and -M options can be specified together with
       '-R file'.



PER MODULE LOGGING CONFIGURATION

       To enable per module logging, set per_module_logging_file in the
       opensm options file to the name of the per module logging config file.
       To disable it, set per_module_logging_file to (null) there.

       The per module logging config file format is a set of lines with module
       name and logging level as follows:

        <module name><separator><logging level>

        <module name> is the file name including .c
        <separator> is either = , space, or tab
        <logging level> uses the same levels as the coarse/overall
        logging, as follows:

        BIT    LOG LEVEL ENABLED
        ----   -----------------
        0x01 - ERROR (error messages)
        0x02 - INFO (basic messages, low volume)
        0x04 - VERBOSE (interesting stuff, moderate volume)
        0x08 - DEBUG (diagnostic, high volume)
        0x10 - FUNCS (function entry/exit, very high volume)
        0x20 - FRAMES (dumps all SMP and GMP frames)
        0x40 - ROUTING (dump FDB routing information)
        0x80 - SYS (syslog at LOG_INFO level in addition to OpenSM logging)
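       To illustrate this format, the Python sketch below parses such a file
       and decodes a level into the bit names of the table above. It is not
       OpenSM's parser. Accepting both hexadecimal and decimal levels (via
       int(level, 0)) is an assumption. The helper names and the module file
       names in the example are chosen only for illustration.

```python
import re

LEVEL_BITS = {
    0x01: "ERROR", 0x02: "INFO", 0x04: "VERBOSE", 0x08: "DEBUG",
    0x10: "FUNCS", 0x20: "FRAMES", 0x40: "ROUTING", 0x80: "SYS",
}


def parse_module_levels(text):
    """Parse '<module name><separator><logging level>' lines.

    The separator may be '=', spaces or tabs; blank lines are skipped.
    """
    levels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        module, level = re.split(r"[=\s]+", line, maxsplit=1)
        levels[module] = int(level, 0)   # '0x43' -> 67; '3' -> 3
    return levels


def level_names(level):
    """Names of the log-level bits set in level, lowest bit first."""
    return [name for bit, name in sorted(LEVEL_BITS.items()) if level & bit]


cfg = "osm_sa.c=0x03\nosm_ucast_mgr.c\t0x43\n"
levels = parse_module_levels(cfg)
print(levels)                                  # {'osm_sa.c': 3, 'osm_ucast_mgr.c': 67}
print(level_names(levels["osm_ucast_mgr.c"]))  # ['ERROR', 'INFO', 'ROUTING']
```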


FILES

       /etc/rdma/opensm.conf
              default OpenSM config file.


       /etc/rdma/ib-node-name-map
              default node name map file. See ibnetdiscover for more
              information on format.


       /etc/rdma/partitions.conf
              default partition config file


       /etc/rdma/qos-policy.conf
              default QoS policy config file


       /etc/rdma/prefix-routes.conf
              default prefix routes file


       /etc/rdma/per-module-logging.conf
              default per module logging config file


       /etc/rdma/torus-2QoS.conf
              default torus-2QoS config file



AUTHORS

       Hal Rosenstock
              <hal@mellanox.com>

       Sasha Khapyorsky
              <sashak@voltaire.com>

       Eitan Zahavi
              <eitan@mellanox.co.il>

       Yevgeny Kliteynik
              <kliteyn@mellanox.co.il>

       Thomas Sodring
              <tsodring@simula.no>

       Ira Weiny
              <weiny2@llnl.gov>

       Dale Purdy
              <purdy@sgi.com>



SEE ALSO

       torus-2QoS(8), torus-2QoS.conf(5).



OpenIB                           Sept 15, 2014                       OPENSM(8)