OPENSM(8)                      OpenIB Management                     OPENSM(8)

NAME

       opensm - InfiniBand subnet manager and administration (SM/SA)

SYNOPSIS

       opensm [--version] [-F | --config <file_name>] [-c(reate-config)
       <file_name>] [-g(uid) <GUID in hex>] [-l(mc) <LMC>] [-p(riority)
       <PRIORITY>] [--subnet_prefix <PREFIX in hex>] [--smkey <SM_Key>]
       [--sm_sl <SL number>] [-r(eassign_lids)] [-R <engine name(s)> |
       --routing_engine <engine name(s)>] [--do_mesh_analysis]
       [--lash_start_vl <vl number>] [--nue_max_num_vls <vl number>]
       [-A | --ucast_cache] [-z | --connect_roots] [-M <file name> |
       --lid_matrix_file <file name>] [-U <file name> | --lfts_file <file
       name>] [-S | --sadb_file <file name>] [-a | --root_guid_file <path to
       file>] [-u | --cn_guid_file <path to file>] [-G | --io_guid_file <path
       to file>] [--port-shifting] [--scatter-ports <random seed>] [-H |
       --max_reverse_hops <max reverse hops allowed>] [-X |
       --guid_routing_order_file <path to file>] [-m | --ids_guid_file <path
       to file>] [-o(nce)] [-s(weep) <interval>] [-t(imeout) <milliseconds>]
       [--retries <number>] [--maxsmps <number>] [--console [off | local |
       socket | loopback]] [--console-port <port>] [-i | --ignore_guids
       <equalize-ignore-guids-file>] [-w | --hop_weights_file <path to file>]
       [-O | --port_search_ordering_file <path to file>] [-O |
       --dimn_ports_file <path to file>] (DEPRECATED) [--dump_files_dir
       <directory-name>] [-f <log file path> | --log_file <log file path>]
       [-L | --log_limit <size in MB>] [-e(rase_log_file)] [-P(config)
       <partition config file>] [-N | --no_part_enforce] (DEPRECATED) [-Z |
       --part_enforce [both | in | out | off]] [-W | --allow_both_pkeys]
       [-Q | --qos [-Y | --qos_policy_file <file name>]]
       [--congestion_control] [--cc_key <key>] [-y | --stay_on_fatal] [-B |
       --daemon] [-J | --pidfile <file_name>] [-I | --inactive] [--perfmgr]
       [--perfmgr_sweep_time_s <seconds>] [--prefix_routes_file <path>]
       [--consolidate_ipv6_snm_req] [--log_prefix <prefix text>]
       [--torus_config <path to file>] [-v(erbose)] [-V] [-D <flags>]
       [-d(ebug) <number>] [-h(elp)] [-?]

DESCRIPTION

41       opensm is an InfiniBand compliant Subnet  Manager  and  Administration,
42       and runs on top of OpenIB.
43
       opensm provides an implementation of an InfiniBand Subnet Manager and
       Administration. At least one such software entity must run on each
       InfiniBand subnet in order to initialize the InfiniBand hardware.
48
       opensm also contains an experimental version of a performance manager.
51
52       opensm defaults were designed to meet the common case usage on clusters
53       with up to a few hundred nodes. Thus, in this default mode, opensm will
54       scan the IB fabric, initialize it, and sweep occasionally for changes.
55
56       opensm  attaches to a specific IB port on the local machine and config‐
57       ures only the fabric connected to it. (If the local machine  has  other
58       IB  ports,  opensm  will  ignore  the  fabrics connected to those other
59       ports). If no port is specified, it will select the first "best" avail‐
60       able port.
61
62       opensm  can present the available ports and prompt for a port number to
63       attach to.
64
       By default, the run is logged to two files: /var/log/messages and
       /var/log/opensm.log. The first file registers only general major
       events, whereas the second includes details of reported errors. All
       errors reported in this second file should be treated as indicators of
       IB fabric health issues. (Note that when a fatal and non-recoverable
       error occurs, opensm will exit.) Both log files should include the
       message "SUBNET UP" if opensm was able to set up the subnet correctly.

OPTIONS

75       --version
76              Prints OpenSM version and exits.
77
78       -F, --config <config file>
              The name of the OpenSM config file. When not specified,
              /etc/rdma/opensm.conf is used (if it exists).
81
82       -c, --create-config <file name>
              OpenSM will dump its configuration to the specified file and
              exit. This is a way to generate an OpenSM configuration file
              template.
86
87       -g, --guid <GUID in hex>
88              This  option  specifies  the  local  port  GUID value with which
89              OpenSM should bind.  OpenSM may be bound to 1 port  at  a  time.
90              If  GUID  given  is  0,  OpenSM displays a list of possible port
91              GUIDs and waits for user input.  Without -g, OpenSM tries to use
92              the default port.
93
94       -l, --lmc <LMC value>
95              This  option  specifies  the  subnet's LMC value.  The number of
96              LIDs assigned to each port is 2^LMC.  The LMC value must  be  in
97              the  range  0-7.   LMC  values  > 0 allow multiple paths between
98              ports.  LMC values > 0 should only be used if the subnet  topol‐
99              ogy  actually provides multiple paths between ports, i.e. multi‐
100              ple interconnects between switches.  Without -l, OpenSM defaults
101              to LMC = 0, which allows one path between any two ports.
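              The relationship between LMC and the LIDs assigned to a port
              can be sketched in Python as follows. The function names are
              illustrative assumptions and are not part of OpenSM; the
              sketch also assumes the usual InfiniBand convention that a
              port's base LID is aligned to a multiple of 2^LMC.

```python
def lids_per_port(lmc: int) -> int:
    """Number of LIDs assigned to each port for a given LMC (2^LMC)."""
    if not 0 <= lmc <= 7:
        raise ValueError("LMC must be in the range 0-7")
    return 1 << lmc


def lid_range(base_lid: int, lmc: int) -> range:
    """LIDs covered by one port, assuming base_lid is aligned to 2^LMC."""
    count = lids_per_port(lmc)
    if base_lid % count != 0:
        raise ValueError("base LID is not aligned to 2^LMC")
    return range(base_lid, base_lid + count)
```

              For example, with LMC = 2 a port whose base LID is 8 would
              answer to LIDs 8 through 11.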
102
103       -p, --priority <Priority value>
              This option specifies the SM's PRIORITY. This will affect the
              handover cases, where the master is chosen by priority and
              GUID. The range goes from 0 (default and lowest priority) to
              15 (highest).
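              A minimal Python sketch of choosing a master by priority and
              GUID is given here. It assumes that the highest priority wins
              and that ties are broken in favor of the lowest GUID; the text
              above only says that both values are used, so the tie-break
              rule, the SmCandidate type and the elect_master name are
              assumptions made for illustration.

```python
from typing import Iterable, NamedTuple


class SmCandidate(NamedTuple):
    priority: int  # 0 (lowest) .. 15 (highest)
    guid: int      # port GUID of the SM


def elect_master(candidates: Iterable[SmCandidate]) -> SmCandidate:
    """Pick the highest priority; among equals, the lowest GUID (assumed)."""
    return min(candidates, key=lambda c: (-c.priority, c.guid))
```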
107
108       --subnet_prefix <PREFIX in hex>
              This option specifies the subnet prefix to use on the fabric.
              The default prefix is 0xfe80000000000000.
111
112       --smkey <SM_Key value>
              This option specifies the SM's SM_Key (64 bits). This will
              affect SM authentication. Note that OpenSM version 3.2.1 and
              below used the default value '1' in host byte order. This is
              fixed now, but you may need this option to interoperate with
              an old OpenSM running on a little endian machine.
118
119       --sm_sl <SL number>
120              This option sets the SL to use for communication with the SM/SA.
121              Defaults to 0.
122
123       -r, --reassign_lids
              This option causes OpenSM to reassign LIDs to all end nodes.
              Specifying -r on a running subnet may disrupt subnet traffic.
              Without -r, OpenSM attempts to preserve existing LID
              assignments, resolving any multiple use of the same LID.
128
129       -R, --routing_engine <Routing engine names>
130              This option chooses routing engine(s) to use instead of Min  Hop
131              algorithm  (default).  Multiple routing engines can be specified
132              separated by commas so that specific ordering of  routing  algo‐
133              rithms  will  be  tried if earlier routing engines fail.  If all
134              configured routing engines fail, OpenSM will always  attempt  to
135              route  with Min Hop unless 'no_fallback' is included in the list
136              of routing engines.   Supported  engines:  minhop,  updn,  dnup,
137              file, ftree, lash, dor, torus-2QoS, nue, dfsssp, sssp.
138
139       --do_mesh_analysis
140              This  option  enables  additional  analysis for the lash routing
141              engine to precondition switch port assignments in regular carte‐
142              sian  meshes which may reduce the number of SLs required to give
143              a deadlock free routing.
144
145       --lash_start_vl <vl number>
146              This option sets the starting VL to use  for  the  lash  routing
147              algorithm.  Defaults to 0.
148
149       --nue_max_num_vls <vl number>
150              This  option  sets  the maximum number of VLs to use for the Nue
151              routing engine.  Every number greater or equal to 0 is  allowed,
152              and  the default is 1 to enforce deadlock-freedom even if QoS is
153              not enabled. If set to 0, then Nue  routing  will  automatically
154              determine and choose maximum supported by the fabric. And if set
155              to   any   integer   >=   1,   then   Nue   uses    min(max_sup‐
156              ported,nue_max_num_vls).     Rule    of    thumb    is:   higher
157              nue_max_num_vls results in better path balancing.
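              The selection rule described above can be written as a short
              Python sketch. The function name effective_nue_vls and the
              max_supported parameter (the maximum number of VLs supported
              by the fabric) are illustrative assumptions.

```python
def effective_nue_vls(nue_max_num_vls: int, max_supported: int) -> int:
    """Number of VLs Nue would use under the rule described above."""
    if nue_max_num_vls < 0:
        raise ValueError("nue_max_num_vls must be >= 0")
    if nue_max_num_vls == 0:
        # 0 means: use the maximum supported by the fabric.
        return max_supported
    return min(max_supported, nue_max_num_vls)
```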
158
159       -A, --ucast_cache
              This option enables the unicast routing cache and prevents
              routing recalculation (which is a heavy task in a large
              cluster) when no topology change was detected during the heavy
              sweep, or when the topology change does not require a new
              routing calculation, e.g. when one or more CAs/RTRs/leaf
              switches go down, or one or more of these nodes come back
              after being down. A very common case that is handled by the
              unicast routing cache is host reboot, which otherwise would
              cause two full routing recalculations: one when the host goes
              down, and the other when the host comes back online.
170
171       -z, --connect_roots
              This option forces the routing engines (up/down and fat-tree)
              to provide connectivity between root switches and in this way
              to be fully IBA compliant. In many cases this can violate the
              "pure" deadlock free algorithm, so use it carefully.
176
177       -M, --lid_matrix_file <file name>
178              This option specifies the name of the lid matrix dump file  from
179              where switch lid matrices (min hops tables) will be loaded.
180
181       -U, --lfts_file <file name>
182              This  option  specifies  the  name  of  the LFTs file from where
183              switch forwarding tables will be loaded when using "file"  rout‐
184              ing engine.
185
186       -S, --sadb_file <file name>
187              This option specifies the name of the SA DB dump file from where
188              SA database will be loaded.
189
190       -a, --root_guid_file <file name>
191              Set the root nodes for the Up/Down or Fat-Tree routing algorithm
192              to the guids provided in the given file (one to a line).
193
194       -u, --cn_guid_file <file name>
195              Set  the  compute  nodes for the Fat-Tree or DFSSSP/SSSP routing
196              algorithms to the port GUIDs provided in the given file (one  to
197              a line).
198
199       -G, --io_guid_file <file name>
200              Set  the I/O nodes for the Fat-Tree or DFSSSP/SSSP routing algo‐
201              rithms to the port GUIDs provided in the given file  (one  to  a
202              line).
203              In the case of Fat-Tree routing:
204              I/O nodes are non-CN nodes allowed to use up to max_reverse_hops
205              switches the wrong way around to improve connectivity.
206              In the case of (DF)SSSP routing:
207              Providing guids of compute and/or I/O  nodes  will  ensure  that
208              paths  towards  those  nodes  are  as much separated as possible
209              within their node category, i.e., I/O traffic will not share the
210              same link if multiple links are available.
211
212       --port-shifting
213              This  option  enables  a  feature called port shifting.  In some
214              fabrics,  particularly  cluster  environments,  routes  commonly
215              align  and  congest  with  other  routes  due to algorithmically
216              unchanging traffic patterns.  This routing option  will  "shift"
217              routing around in an attempt to alleviate this problem.
218
219       --scatter-ports <random seed>
220              This  option  is  used  to  randomize  port selection in routing
221              rather  than  using  a  round-robin  algorithm  (which  is   the
222              default).  Value  supplied with option is used as a random seed.
223              If value is 0, which is the default, the scatter ports option is
224              disabled.
225
226       -H, --max_reverse_hops <max reverse hops allowed>
227              Set the maximum number of reverse hops an I/O node is allowed to
228              make. A reverse hop is the use of a switch the wrong way around.
229
230       -m, --ids_guid_file <file name>
231              Name of the map file with set of the IDs which will be  used  by
232              Up/Down  routing algorithm instead of node GUIDs (format: <guid>
233              <id> per line).
234
235       -X, --guid_routing_order_file <file name>
236              Set the order port guids will  be  routed  for  the  MinHop  and
237              Up/Down  routing  algorithms  to the guids provided in the given
238              file (one to a line).
239
240       -o, --once
241              This option causes OpenSM to configure  the  subnet  once,  then
242              exit.  Ports remain in the ACTIVE state.
243
244       -s, --sweep <interval value>
245              This  option  specifies  the  number  of  seconds between subnet
246              sweeps.  Specifying -s 0 disables sweeping.  Without -s,  OpenSM
247              defaults to a sweep interval of 10 seconds.
248
249       -t, --timeout <value>
250              This option specifies the time in milliseconds used for transac‐
251              tion timeouts.  Timeout values  should  be  >  0.   Without  -t,
252              OpenSM defaults to a timeout value of 200 milliseconds.
253
254       --retries <number>
255              This  option  specifies  the number of retries used for transac‐
256              tions.  Without --retries, OpenSM  defaults  to  3  retries  for
257              transactions.
258
259       --maxsmps <number>
260              This option specifies the number of VL15 SMP MADs allowed on the
261              wire at any one time.  Specifying --maxsmps 0  allows  unlimited
262              outstanding SMPs.  Without --maxsmps, OpenSM defaults to a maxi‐
263              mum of 4 outstanding SMPs.
264
265       --console [off | local | loopback | socket]
266              This option brings up the OpenSM console (default  off).   Note,
267              loopback  and  socket  open  a  socket which can be connected to
268              WITHOUT CREDENTIALS.  Loopback is safer if  access  to  your  SM
269              host  is  controlled.  tcp_wrappers (hosts.[allow|deny]) is used
270              with loopback and socket.  loopback  and  socket  will  only  be
271              available  if  OpenSM  was  built with --enable-console-loopback
272              (default yes) and --enable-console-socket (default  no)  respec‐
273              tively.
274
275       --console-port <port>
276              Specify an alternate telnet port for the socket console (default
277              10000).  Note that this option only appears if OpenSM was  built
278              with --enable-console-socket.
279
280       -i, --ignore_guids <equalize-ignore-guids-file>
281              This option provides the means to define a set of ports (by node
282              guid and port number) that will be  ignored  by  the  link  load
283              equalization algorithm.
284
285       -w, --hop_weights_file <path to file>
286              This  option  provides weighting factors per port representing a
287              hop cost in computing the lid  matrix.   The  file  consists  of
288              lines  containing  a switch port GUID (specified as a 64 bit hex
289              number, with leading 0x), output port number, and weighting fac‐
290              tor.   Any  port  not listed in the file defaults to a weighting
291              factor of 1.  Lines  starting  with  #  are  comments.   Weights
292              affect  only the output route from the port, so many useful con‐
293              figurations will require weights to be specified in pairs.
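              The file format described above can be read with a small
              Python sketch like the following. It assumes
              whitespace-separated fields and decimal port numbers and
              weights; the GUIDs and values in SAMPLE are hypothetical, and
              the function names are not part of OpenSM.

```python
def parse_hop_weights(text: str) -> dict:
    """Map (switch port GUID, output port) -> weighting factor."""
    weights = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped
        guid_str, port_str, weight_str = line.split()
        weights[(int(guid_str, 16), int(port_str))] = int(weight_str)
    return weights


def hop_weight(weights: dict, guid: int, port: int) -> int:
    """Ports not listed in the file default to a weighting factor of 1."""
    return weights.get((guid, port), 1)


# Hypothetical sample content, for illustration only.
SAMPLE = """\
# switch port GUID, output port, weight
0x0002c90200001111 1 4
0x0002c90200002222 3 4
"""
```

              Because a weight affects only the output route from the listed
              port, the sample lists one entry for each end of a link.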
294
295       -O, --port_search_ordering_file <path to file>
              This option tweaks the routing. It is suitable for two cases:

              1. While using the DOR routing algorithm. This option provides
              a mapping between hypercube dimensions and ports on a per
              switch basis for the DOR routing engine. The file consists of
              lines containing a switch node GUID (specified as a 64 bit hex
              number, with leading 0x) followed by a list of non-zero port
              numbers, separated by spaces, one switch per line. The order
              of the port numbers is in one to one correspondence to the
              dimensions. Ports not listed on a line are assigned to the
              remaining dimensions, in port order. Anything after a # is a
              comment.

              2. While using a general routing algorithm. This option
              provides the order of the ports that would be chosen for
              routing from each switch, rather than searching for an
              appropriate port from port 1 to N. The file has the same
              format as in case 1: lines containing a switch node GUID
              (specified as a 64 bit hex number, with leading 0x) followed
              by a list of non-zero port numbers, separated by spaces, one
              switch per line. Anything after a # is a comment.
316
317       -O, --dimn_ports_file <path to file> (DEPRECATED)
318              This  is  a  deprecated  flag.  Please  use --port_search_order‐
319              ing_file instead.  This option provides a mapping between hyper‐
320              cube  dimensions  and  ports  on  a per switch basis for the DOR
321              routing engine.  The file consists of lines containing a  switch
322              node  GUID  (specified  as a 64 bit hex number, with leading 0x)
323              followed by a list of non-zero port numbers, separated  by  spa‐
324              ces,  one switch per line.  The order for the port numbers is in
325              one to one correspondence to the dimensions.  Ports  not  listed
326              on  a  line  are  assigned  to the remaining dimensions, in port
327              order.  Anything after a # is a comment.
328
329       -x, --honor_guid2lid
330              This option forces OpenSM to honor the guid2lid  file,  when  it
331              comes   out   of  Standby  state,  if  such  file  exists  under
332              OSM_CACHE_DIR, and is valid.  By default, this is FALSE.
333
334       --dump_files_dir <directory name>
335              This option will set the directory to hold the file dumps.
336
337       -f, --log_file <file name>
338              This option defines the log to be the given file.   By  default,
339              the log goes to /var/log/opensm.log.  For the log to go to stan‐
340              dard output use -f stdout.
341
342       -L, --log_limit <size in MB>
343              This option defines maximal log file size in MB. When  specified
344              the log file will be truncated upon reaching this limit.
345
346       -e, --erase_log_file
              This option causes deletion of the log file (if it already
              exists). By default, the log file is cumulative.
349
350       -P, --Pconfig <partition config file>
351              This option defines the optional partition  configuration  file.
352              The default name is /etc/rdma/partitions.conf.
353
354       --prefix_routes_file <file name>
355              Prefix routes control how the SA responds to path record queries
356              for off-subnet DGIDs.  By default, the SA  fails  such  queries.
357              The PREFIX ROUTES section below describes the format of the con‐
358              figuration      file.       The      default       path       is
359              /etc/rdma/prefix-routes.conf.
360
361       -Q, --qos
362              This option enables QoS setup. It is disabled by default.
363
364       -Y, --qos_policy_file <file name>
365              This  option  defines  the optional QoS policy file. The default
366              name    is    /etc/rdma/qos-policy.conf.     See     QoS_manage‐
367              ment_in_OpenSM.txt in opensm doc for more information on config‐
368              uring QoS policy via this file.
369
370       --congestion_control
              (EXPERIMENTAL) This option enables congestion control
              configuration. It is disabled by default. See the config file
              for congestion control configuration options.

       --cc_key <key>
              (EXPERIMENTAL) This option configures the CCkey to use when
              configuring congestion control. Note that this option does not
              configure a new CCkey into switches and CAs. Defaults to 0.
377
378       -N, --no_part_enforce (DEPRECATED)
379              This  is  a  deprecated flag. Please use --part_enforce instead.
380              This option disables partition enforcement  on  switch  external
381              ports.
382
383       -Z, --part_enforce [both | in | out | off]
384              This  option  indicates  the  partition  enforcement  type  (for
385              switches).  Enforcement type can be inbound only (in),  outbound
386              only (out), both or disabled (off). Default is both.
387
388       -W, --allow_both_pkeys
389              This  option  indicates whether both full and limited membership
390              on the same  partition  can  be  configured  in  the  PKeyTable.
391              Default is not to allow both pkeys.
392
393       -y, --stay_on_fatal
394              This  option  will  cause SM not to exit on fatal initialization
395              issues: if SM discovers duplicated guids or a 12x link with lane
396              reversal  badly  configured.   By  default,  the SM will exit on
397              these errors.
398
399       -B, --daemon
400              Run in daemon mode - OpenSM will run in the background.
401
402       -J, --pidfile <file_name>
403              Makes the SM write its  own  PID  to  the  specified  file  when
404              started in daemon mode.
405
406       -I, --inactive
407              Start SM in inactive rather than init SM state.  This option can
408              be used in conjunction with the perfmgr so as to  run  a  stand‐
409              alone  performance  manager without SM/SA.  However, this is NOT
410              currently implemented in the performance manager.
411
412       --perfmgr
413              Enable the perfmgr.  Only takes effect if  --enable-perfmgr  was
414              specified  at configure time.  See performance-manager-HOWTO.txt
415              in opensm doc for more information on running perfmgr.
416
417       --perfmgr_sweep_time_s <seconds>
418              Specify the sweep time for the performance  manager  in  seconds
419              (default is 180 seconds).  Only takes effect if --enable-perfmgr
420              was specified at configure time.
421
422       --consolidate_ipv6_snm_req
423              Use shared MLID for IPv6 Solicited  Node  Multicast  groups  per
424              MGID scope and P_Key.
425
426       --log_prefix <prefix text>
427              This  option  specifies  the  prefix to the syslog messages from
428              OpenSM.  A suitable prefix can be used to identify the IB subnet
429              in syslog messages when two or more instances of OpenSM run in a
430              single node to manage multiple fabrics. For example, in a  dual-
431              fabric  (or dual-rail) IB cluster, the prefix for the first fab‐
432              ric could be "mpi" and the other fabric could be "storage".
433
434       --torus_config <path to torus-2QoS config file>
435              This option defines the file name for  the  extra  configuration
436              information  needed  for  the  torus-2QoS  routing engine.   The
437              default name is /etc/rdma/torus-2QoS.conf
438
439       -v, --verbose
440              This option increases the log verbosity level.   The  -v  option
441              may  be  specified  multiple  times to further increase the ver‐
442              bosity level.  See the -D option for more information about  log
443              verbosity.
444
       -V     This option sets the maximum verbosity level and forces log
              flushing. The -V option is equivalent to '-D 0xFF -d 2'. See
              the -D option for more information about log verbosity.
448
449       -D <value>
450              This  option  sets  the log verbosity level.  A flags field must
451              follow the -D option.  A bit set/clear in the flags enables/dis‐
452              ables a specific log level as follows:
453
               BIT    LOG LEVEL ENABLED
               ----   -----------------
               0x01 - ERROR (error messages)
               0x02 - INFO (basic messages, low volume)
               0x04 - VERBOSE (interesting stuff, moderate volume)
               0x08 - DEBUG (diagnostic, high volume)
               0x10 - FUNCS (function entry/exit, very high volume)
               0x20 - FRAMES (dumps all SMP and GMP frames)
               0x40 - ROUTING (dump FDB routing information)
               0x80 - SYS (syslog at LOG_INFO level in addition to
                      OpenSM logging)
465
466              Without -D, OpenSM defaults to ERROR + INFO  (0x3).   Specifying
467              -D 0 disables all messages.  Specifying -D 0xFF enables all mes‐
468              sages (see -V).  High verbosity levels  may  require  increasing
469              the transaction timeout with the -t option.
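              The flags value passed to -D is a bitwise OR of the levels in
              the table above. The following Python sketch shows how such a
              value can be composed and decoded; the helper names are
              illustrative and not part of OpenSM.

```python
# Bit values taken from the table above.
LOG_LEVELS = {
    "ERROR": 0x01,
    "INFO": 0x02,
    "VERBOSE": 0x04,
    "DEBUG": 0x08,
    "FUNCS": 0x10,
    "FRAMES": 0x20,
    "ROUTING": 0x40,
    "SYS": 0x80,
}


def flags_for(*levels: str) -> int:
    """Combine named log levels into a -D flags value."""
    mask = 0
    for name in levels:
        mask |= LOG_LEVELS[name]
    return mask


def levels_in(mask: int) -> list:
    """List the log levels enabled by a -D flags value."""
    return [name for name, bit in LOG_LEVELS.items() if mask & bit]
```

              For instance, flags_for("ERROR", "INFO") gives the default
              value 0x3, and enabling every level gives 0xFF.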
470
471       -d, --debug <value>
472              This  option  specifies  a  debug option.  These options are not
473              normally needed.  The number  following  -d  selects  the  debug
474              option to enable as follows:
475
               OPT    Description
               ---    -----------
               -d0  - Ignore other SM nodes
               -d1  - Force single threaded dispatching
               -d2  - Force log flushing after each log message
               -d3  - Disable multicast support
482
483       -h, --help
484              Display this usage info then exit.
485
486       -?     Display this usage info then exit.
487
488

ENVIRONMENT VARIABLES

490       The following environment variables control opensm behavior:
491
       OSM_TMP_DIR - controls the directory in which the temporary files
       generated by opensm are created. These files are: opensm-subnet.lst,
       opensm.fdbs, and opensm.mcfdbs. By default, this directory is /var/log.
       Note that the --dump_files_dir command line option or the
       dump_files_dir option in the config file takes precedence over this
       environment variable.
497
       OSM_CACHE_DIR - opensm stores certain data on disk so that subsequent
       runs are consistent. The default directory used is /var/cache/opensm.
       The following files are included in it:
501
502        guid2lid  - stores the LID range assigned to each GUID
503        guid2mkey - stores the MKey previously assigned to each GUID
504        neighbors - stores a map of the GUIDs at either end of each link
505                    in the fabric
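       The precedence rules above can be summarized in a short Python sketch.
       The function names and the dump_files_dir parameter (standing in for
       the value given on the command line or in the config file) are
       illustrative assumptions, not part of OpenSM.

```python
import os
from typing import Mapping, Optional


def dump_dir(environ: Mapping[str, str],
             dump_files_dir: Optional[str] = None) -> str:
    """Directory for dump files: the option wins, then OSM_TMP_DIR."""
    if dump_files_dir:
        return dump_files_dir
    return environ.get("OSM_TMP_DIR", "/var/log")


def cache_dir(environ: Mapping[str, str]) -> str:
    """Directory holding guid2lid, guid2mkey and neighbors."""
    return environ.get("OSM_CACHE_DIR", "/var/cache/opensm")


# Typical use with the real process environment.
current_dump_dir = dump_dir(os.environ)
```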
506
507

NOTES

509       When  opensm receives a HUP signal, it starts a new heavy sweep as if a
510       trap was received or a topology change was found.
511
512       Also, SIGUSR1 can be used to trigger a  reopen  of  /var/log/opensm.log
513       for logrotate purposes.
514
515

PARTITION CONFIGURATION

517       The   default   name   of   OpenSM  partitions  configuration  file  is
518       /etc/rdma/partitions.conf. The default may  be  changed  by  using  the
519       --Pconfig (-P) option with OpenSM.
520
       The default partition will be created by OpenSM unconditionally, even
       when the partition configuration file does not exist or cannot be
       accessed.

       The default partition has P_Key value 0x7fff. OpenSM's port will always
       have full membership in the default partition. All other end ports will
       have full membership if the partition configuration file is not found
       or cannot be accessed, or limited membership if the file exists and can
       be accessed but there is no rule for the Default partition.
529
       Effectively, this amounts to the same as if one of the following rules
       appeared in the partition configuration file.
532
533       In the case of no rule for the Default partition:
534
535       Default=0x7fff : ALL=limited, SELF=full ;
536
537       In  the  case  of  no  partition  configuration  file or file cannot be
538       accessed:
539
540       Default=0x7fff : ALL=full ;
541
542
543       File Format
544
545       Comments:
546
       Any content following a '#' character on a line is a comment and is
       ignored by the parser.
549
550       General file format:
551
552       <Partition Definition>:[<newline>]<Partition Properties>;
553
554            Partition Definition:
              [PartitionName][=PKey][,indx0][,ipoib_bc_flags][,defmember=full|limited|both]
556
557               PartitionName  - string, will be used with logging. When
558                                omitted, empty string will be used.
559               PKey           - P_Key value for this partition. Only low 15
560                                bits will be used. When omitted will be
561                                autogenerated.
562               indx0          - indicates that this pkey should be inserted in
563                                block 0 index 0.
564               ipoib_bc_flags - used to indicate/specify IPoIB capability of
565                                this partition.
566
567               defmember=full|limited|both - specifies default membership for
568                                port guid list. Default is limited.
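       The statement that only the low 15 bits of PKey are used can be
       illustrated with the following Python sketch. It assumes the standard
       InfiniBand convention that the most significant bit of the 16-bit
       P_Key encodes full (1) or limited (0) membership, which is not stated
       in the text above; the function names are illustrative.

```python
FULL_MEMBER_BIT = 0x8000  # assumed: top bit of the 16-bit P_Key
PKEY_BASE_MASK = 0x7FFF   # low 15 bits identify the partition


def pkey_base(value: int) -> int:
    """Keep only the low 15 bits, as the partition definition does."""
    return value & PKEY_BASE_MASK


def pkey_with_membership(value: int, full: bool) -> int:
    """P_Key as it would appear for a full or limited member (assumed)."""
    base = pkey_base(value)
    return base | FULL_MEMBER_BIT if full else base
```

       Under this assumption, a full member of the default partition 0x7fff
       carries the P_Key 0xffff.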
569
570            ipoib_bc_flags:
571               ipoib_flag|[mgroup_flag]*
572
573               ipoib_flag:
574                   ipoib  - indicates that this partition may be used for
575                            IPoIB, as a result the IPoIB broadcast group will
576                            be created with the mgroup_flag flags given,
577                            if any.
578
579            Partition Properties:
580              [<Port list>|<MCast Group>]* | <Port list>
581
582            Port list:
583               <Port Specifier>[,<Port Specifier>]
584
585            Port Specifier:
586               <PortGUID>[=[full|limited|both]]
587
               PortGUID         - GUID of partition member EndPort.
                                  Hexadecimal numbers should start with
                                  0x; decimal numbers are accepted too.
591               full, limited,   - indicates full and/or limited membership for
592               both               this port.  When omitted (or unrecognized)
593                                  limited membership is assumed.  Both
594                                  indicates both full and limited membership
595                                  for this port.
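
As a rough illustration of how the low 15 bits of the PKey combine with the
membership keyword, the sketch below assumes the usual InfiniBand convention
that the top bit of the 16-bit P_Key marks full membership.  The helper name
pkey_table_values is hypothetical and is not part of OpenSM.

```python
PKEY_BASE_MASK = 0x7FFF  # only the low 15 bits identify the partition
PKEY_FULL_BIT = 0x8000   # assumption: top bit set means full membership


def pkey_table_values(pkey, membership="limited"):
    """Return the 16-bit P_Key values a port would carry for this partition."""
    base = pkey & PKEY_BASE_MASK
    if membership == "full":
        return [base | PKEY_FULL_BIT]
    if membership == "both":
        return [base | PKEY_FULL_BIT, base]
    # Omitted or unrecognized membership falls back to limited.
    return [base]
```

For example, pkey_table_values(0x80, "both") yields both the full and the
limited form of the same partition key.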
596
597            MCast Group:
598               mgid=gid[,mgroup_flag]*<newline>
599
600                                - gid specified is verified to be a Multicast
601                                  address.  IP groups are verified to match
602                                  the rate and mtu of the broadcast group.
603                                  The P_Key bits of the mgid for IP groups are
604                                  verified to either match the P_Key specified
605                                  by the "Partition Definition" or, if they are
606                                  0x0000, the P_Key will be copied into those
607                                  bits.
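
The P_Key check described above can be sketched as follows.  This is an
illustration only: it assumes the IPoIB MGID layout in which the P_Key
occupies the third 16-bit group of the GID, it takes the 16-bit P_Key value
as given (whether the membership bit is included in the comparison is not
stated here), and the function name fill_mgid_pkey is hypothetical.

```python
import ipaddress

PKEY_SHIFT = 80  # third 16-bit group, counted from the most significant end


def fill_mgid_pkey(mgid, pkey):
    """Copy pkey into an all-zero P_Key field, or verify an existing one."""
    value = int(ipaddress.IPv6Address(mgid))
    if (value >> 120) != 0xFF:
        raise ValueError("not a multicast GID: %s" % mgid)
    current = (value >> PKEY_SHIFT) & 0xFFFF
    if current == 0:
        # Field is 0x0000: copy the partition's P_Key into it.
        value |= (pkey & 0xFFFF) << PKEY_SHIFT
    elif current != (pkey & 0xFFFF):
        raise ValueError("mgid P_Key 0x%04x does not match 0x%04x"
                         % (current, pkey & 0xFFFF))
    return str(ipaddress.IPv6Address(value))
```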
608
609            mgroup_flag:
610               rate=<val>  - specifies rate for this MC group
611                             (default is 3 (10 Gbps))
612               mtu=<val>   - specifies MTU for this MC group
613                             (default is 4 (2048))
614               sl=<val>    - specifies SL for this MC group
615                             (default is 0)
616               scope=<val> - specifies scope for this MC group
617                             (default is 2 (link local)).  Multiple scope
618                             settings are permitted for a partition.
619                             NOTE: This overwrites the scope nibble of the
620                                   specified mgid.  Furthermore specifying
621                                   multiple scope settings will result in
622                                   multiple MC groups being created.
623               Q_Key=<val>     - specifies the Q_Key for this MC group
624                                 (default: 0x0b1b for IP groups, 0 for other
625                                  groups)
626                                 WARNING: changing this for the broadcast
627                                          group may break IPoIB on client
628                                          nodes!!
629               TClass=<val>    - specifies tclass for this MC group
630                                 (default is 0)
631               FlowLabel=<val> - specifies FlowLabel for this MC group
632                                 (default is 0)
633       NOTE: All mgroup_flag flags MUST be separated by a comma (,).
634
635       Note that values for rate, mtu, and scope, for both partitions and mul‐
636       ticast groups, should be specified as defined in the IBTA specification
637       (for example, mtu=4 for 2048).
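
For convenience, the encodings can be kept in small lookup tables.  The
values below are the commonly cited IBTA encodings as best understood here;
they should be checked against the specification before being relied on.
The table and function names are illustrative only.

```python
# Encoded value -> MTU in bytes (assumed IBTA encoding).
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}

# Encoded value -> link rate in Gbps (assumed IBTA encoding).
IB_RATE_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}


def mtu_code(num_bytes):
    """Return the encoded value to write after 'mtu=' for a size in bytes."""
    for code, size in IB_MTU_BYTES.items():
        if size == num_bytes:
            return code
    raise ValueError("no MTU encoding for %d bytes" % num_bytes)
```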
638
639       There are several useful keywords for PortGUID definition:
640
641        - 'ALL' means all end ports in this subnet.
642        - 'ALL_CAS' means all Channel Adapter end ports in this subnet.
643        - 'ALL_SWITCHES' means all Switch end ports in this subnet.
644        - 'ALL_ROUTERS' means all Router end ports in this subnet.
645        - 'SELF' means subnet manager's port.
646
647       Empty list means no ports in this partition.
648
649       Notes:
650
651       White space is permitted between delimiters ('=', ',',':',';').
652
653       PartitionName does not need to be unique; PKey does need to be  unique.
654       If  PKey is repeated then those partition configurations will be merged
655       and first PartitionName will be used (see also next note).
656
657       It is possible to split partition configuration in more than one  defi‐
658       nition, but then PKey should be explicitly specified (otherwise differ‐
659       ent PKey values will be generated for those definitions).
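
The merge rule for repeated PKeys can be pictured with the following sketch.
It is not OpenSM code: the data layout is invented for illustration, and the
choice to let a later membership setting for the same port replace an
earlier one is an assumption, since the behavior is not described here.

```python
def merge_partitions(definitions):
    """Merge (name, pkey, members) tuples that share a PKey.

    members maps a port specifier (GUID or keyword) to its membership.
    The first PartitionName seen for a PKey is kept.
    """
    merged = {}
    for name, pkey, members in definitions:
        key = pkey & 0x7FFF  # only the low 15 bits are used
        if key not in merged:
            merged[key] = {"name": name, "members": dict(members)}
        else:
            # Assumption: later entries override earlier ones per port.
            merged[key]["members"].update(members)
    return merged
```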
660
661       Examples:
662
663        Default=0x7fff : ALL, SELF=full ;
664        Default=0x7fff : ALL, ALL_SWITCHES=full, SELF=full ;
665
666        NewPartition , ipoib : 0x123456=full, 0x3456789034=limi,  0x2134af2306
667       ;
668
669        YetAnotherOne = 0x300 : SELF=full ;
670        YetAnotherOne = 0x300 : ALL=limited ;
671
672        ShareIO = 0x80 , defmember=full : 0x123451, 0x123452;
673        # 0x123453, 0x123454 will be limited
674        ShareIO = 0x80 : 0x123453, 0x123454, 0x123455=full;
675        # 0x123456, 0x123457 will be limited
676        ShareIO   =   0x80   :   defmember=limited   :   0x123456,   0x123457,
677       0x123458=full;
678        ShareIO = 0x80 , defmember=full : 0x123459, 0x12345a;
679        ShareIO  =  0x80  ,  defmember=full  :   0x12345b,   0x12345c=limited,
680       0x12345d;
681
682        # multicast groups added to default
683        Default=0x7fff,ipoib:
684               mgid=ff12:401b::0707,sl=1 # random IPv4 group
685               mgid=ff12:601b::16    # MLDv2-capable routers
686               mgid=ff12:401b::16    # IGMP
687               mgid=ff12:601b::2     # All routers
688               mgid=ff12::1,sl=1,Q_Key=0xDEADBEEF,rate=3,mtu=2 # random group
689               ALL=full;
690
691
692       Note:
693
694       The following rule is equivalent to how OpenSM used to run prior to the
695       partition manager:
696
697        Default=0x7fff,ipoib:ALL=full;
698
699

QOS CONFIGURATION

701       There is a set of QoS-related low-level configuration parameters.  All
702       these parameter names are prefixed by the "qos_" string.  Here is a full
703       list of these parameters:
704
705        qos_max_vls    - The maximum number of VLs that will be on the subnet
706        qos_high_limit - The limit of High Priority component of VL
707                         Arbitration table (IBA 7.6.9)
708        qos_vlarb_low  - Low priority VL Arbitration table (IBA 7.6.9)
709                         template
710        qos_vlarb_high - High priority VL Arbitration table (IBA 7.6.9)
711                         template
712                         Both VL arbitration templates are pairs of
713                         VL and weight
714        qos_sl2vl      - SL2VL Mapping table (IBA 7.6.6) template. It is
715                         a list of VLs corresponding to SLs 0-15 (Note
716                         that VL15 used here means drop this SL)
717
718       Typical default values (hard-coded in OpenSM initialization) are:
719
720        qos_max_vls 15
721        qos_high_limit 0
722        qos_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4
723        qos_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0
724        qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7
725
726       The syntax is compatible with the rest of the OpenSM configuration options,
727       and values may be stored in the OpenSM config file (cached options file).
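
To make the value formats concrete, the following sketch parses the VL
arbitration and SL2VL templates shown above into Python structures.  The
function names are hypothetical; only the "VL:weight" pair syntax and the
comma-separated SL2VL list come from this page.

```python
def parse_vlarb(text):
    """Parse 'vl:weight,vl:weight,...' into a list of (vl, weight) pairs."""
    pairs = []
    for item in text.split(","):
        vl, weight = item.split(":")
        pairs.append((int(vl), int(weight)))
    return pairs


def parse_sl2vl(text):
    """Parse a comma-separated VL list; index i is the VL for SL i."""
    return [int(v) for v in text.split(",") if v.strip()]


def dropped_sls(sl2vl):
    """Return the SLs mapped to VL15, which here means 'drop this SL'."""
    return [sl for sl, vl in enumerate(sl2vl) if vl == 15]
```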
728
729       In  addition  to  the  above,  we may define separate QoS configuration
730       parameters sets for various target types. As targets, we currently sup‐
731       port CAs, routers, switch external ports, and switch's enhanced port 0.
732       The names of such specialized parameters are prefixed by  "qos_<type>_"
733       string. Here is a full list of the currently supported sets:
734
735        qos_ca_  - QoS configuration parameters set for CAs.
736        qos_rtr_ - parameters set for routers.
737        qos_sw0_ - parameters set for switches' port 0.
738        qos_swe_ - parameters set for switches' external ports.
739
740       Examples:
741        qos_sw0_max_vls=2
742        qos_ca_sl2vl=0,1,2,3,5,5,5,12,12,0,
743        qos_swe_high_limit=0
744
745

PREFIX ROUTES

747       Prefix  routes  control  how the SA responds to path record queries for
748       off-subnet DGIDs.  By default, the SA fails such  queries.   Note  that
749       IBA  does  not  specify how the SA should obtain off-subnet path record
750       information.  The prefix routes configuration is meant  as  a  stop-gap
751       until the specification is completed.
752
753       Each  line  in  the configuration file is a 64-bit prefix followed by a
754       64-bit GUID, separated by white space.  The GUID specifies  the  router
755       port  on the local subnet that will handle the prefix.  Blank lines are
756       ignored, as is anything between a # character and the end of the  line.
757       The  prefix  and  GUID  are  both  in  hex; the leading 0x is optional.
758       Either, or both, can be wild-carded by specifying an  asterisk  instead
759       of an explicit prefix or GUID.
760
761       When  responding  to a path record query for an off-subnet DGID, opensm
762       searches for the first prefix match in the configuration file.   There‐
763       fore,  the order of the lines in the configuration file is important: a
764       wild-carded prefix at the beginning of the configuration  file  renders
765       all  subsequent lines useless.  If there is no match, then opensm fails
766       the query.  It is legal to repeat prefixes in the  configuration  file;
767       opensm  will return the path to the first available matching router.  A
768       configuration file with a single line where both prefix  and  GUID  are
769       wild-carded  means  that  a path record query specifying any off-subnet
770       DGID should return a path to the first available router.  This configu‐
771       ration  yields  the same behavior formerly achieved by compiling opensm
772       with -DROUTER_EXP which has been obsoleted.
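
The file format and the first-match rule can be sketched as below.  This is
an illustration under stated assumptions, not the OpenSM implementation: the
function names are invented, router availability is modeled as a simple
ordered list of GUIDs, and the sketch assumes that a matching line whose
router is unavailable is skipped in favor of the next matching line.

```python
def _field(token):
    """'*' is a wildcard (None); anything else is hex, 0x optional."""
    return None if token == "*" else int(token, 16)


def parse_prefix_routes(text):
    """Return an ordered list of (prefix, guid) pairs from file contents."""
    routes = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue                          # blank lines are ignored
        prefix, guid = line.split()
        routes.append((_field(prefix), _field(guid)))
    return routes


def pick_router(routes, dgid_prefix, available):
    """Return the router GUID for dgid_prefix, or None to fail the query."""
    for prefix, guid in routes:
        if prefix is not None and prefix != dgid_prefix:
            continue
        if guid is None:
            if available:
                return available[0]           # first available router
        elif guid in available:
            return guid
    return None
```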
773
774

MKEY CONFIGURATION

776       OpenSM supports configuring a single  management  key  (MKey)  for  use
777       across the subnet.
778
779       The following configuration options are available:
780
781        m_key                  - the 64-bit MKey to be used on the subnet
782                                 (IBA 14.2.4)
783        m_key_protection_level - the numeric value of the MKey ProtectBits
784                                 (IBA 14.2.4.1)
785        m_key_lease_period     - the number of seconds a CA will wait for a
786                                 response from the SM before resetting the
787                                 protection level to 0 (IBA 14.2.4.2).
788
789       OpenSM  will  configure  all  ports  with  the MKey specified by m_key,
790       defaulting to a value of 0. A m_key value of 0 disables MKey protection
791       on  the subnet.  Switches and HCAs with a non-zero MKey will not accept
792       requests to change their configuration unless the request includes  the
793       proper MKey.
794
795       MKey Protection Levels
796
797       MKey  protection  levels  modify  how  switches and CAs respond to SMPs
798       lacking a valid MKey.  OpenSM will configure each port's ProtectBits to
799       support  the level defined by the m_key_protection_level parameter.  If
800       no parameter is specified, OpenSM defaults to operating  at  protection
801       level 0.
802
803       There are currently 4 protection levels defined by the IBA:
804
805        0 - Queries return valid data, including MKey.  Configuration changes
806            are not allowed unless the request contains a valid MKey.
807        1 - Like level 0, but the MKey is set to 0 (0x00000000) in queries,
808            unless the request contains a valid MKey.
809        2 - Neither queries nor configuration changes are allowed, unless the
810            request contains a valid MKey.
811        3 - Identical to 2.  Maintained for backwards compatibility.
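
The four levels can be summarized as a small decision function.  This is a
reading of the list above rather than code from OpenSM; it assumes the port
has a non-zero MKey, and the function name and return shape are invented
for illustration.

```python
def smp_response(level, method, mkey_valid):
    """Return (request_allowed, mkey_visible_in_reply).

    level      - MKey protection level, 0 to 3
    method     - "Get" for a query, "Set" for a configuration change
    mkey_valid - True if the request carries the correct MKey
    """
    if mkey_valid:
        return (True, True)
    if method == "Set":
        # Configuration changes always require a valid MKey.
        return (False, False)
    if level == 0:
        return (True, True)    # query answered, MKey included
    if level == 1:
        return (True, False)   # query answered, MKey reported as 0
    return (False, False)      # levels 2 and 3: query refused
```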
812
813       MKey Lease Period
814
815       InfiniBand  supports  a  MKey lease timeout, which is intended to allow
816       administrators or a new SM to recover/reset lost MKeys on a fabric.
817
818       If MKeys are enabled on the subnet  and  a  switch  or  CA  receives  a
819       request  that  requires a valid MKey but does not contain one, it warns
820       the SM by sending a trap (Bad M_Key, Trap  256).   If  the  MKey  lease
821       period is non-zero, it also starts a countdown timer for the time spec‐
822       ified by the lease period.  If a SM (or other agent) responds with  the
823       correct  MKey,  the timer is stopped and reset.  Should the timer reach
824       zero, the switch or CA will reset  its  MKey  protection  level  to  0,
825       exposing the MKey and allowing recovery.
826
827       OpenSM will initialize all ports to use an MKey lease period of the num‐
828       ber of seconds specified in the config file.  If no m_key_lease_period
829       is specified, a default of 0 will be used.
830
831       OpenSM  normally quickly responds to all Bad_M_Key traps, resetting the
832       lease timers.  Additionally, OpenSM's subnet sweeps  will  also  cancel
833       any  running  timers.   For  maximum  protection  against accidentally-
834       exposed MKeys, the MKey lease time should be a  few  multiples  of  the
835       subnet sweep time.  If OpenSM detects at startup that your sweep inter‐
836       val is greater than your MKey lease period, it  will  reset  the  lease
837       period  to  be greater than the sweep interval.  Similarly, if sweeping
838       is disabled at startup, it will be re-enabled  with  an  interval  less
839       than the Mkey lease period.
840
841       If  OpenSM  is  required  to  recover  a subnet for which it is missing
842       mkeys, it must do so one switch level at a time.  As  such,  the  total
843       time to recover the subnet may be as long as the mkey lease period mul‐
844       tiplied by the maximum number of hops between the SM and  an  endpoint,
845       plus one.
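
As a back-of-the-envelope aid, the recovery bound can be computed as shown
below.  The sketch reads "multiplied by the maximum number of hops ..., plus
one" as lease period times (hops + 1), which is an interpretation; the
function name is hypothetical.

```python
def worst_case_recovery_seconds(lease_period, max_hops):
    """Upper bound on recovery time when MKeys must expire level by level.

    Interpretation (assumption): one lease period per level, with
    max_hops + 1 levels between the SM and the farthest endpoint.
    """
    return lease_period * (max_hops + 1)
```

With a 60 second lease period and at most 3 hops to the farthest endpoint,
this reading gives a bound of 240 seconds.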
846
847       MKey Effects on Diagnostic Utilities
848
849       Setting a MKey may have a detrimental effect on diagnostic software run
850       on the subnet, unless your diagnostic  software  is  able  to  retrieve
851       MKeys from the SA or can be explicitly configured with the proper MKey.
852       This is particularly true at protection level 2, where CAs will  ignore
853       queries for management information that do not contain the proper MKey.
854
855

ROUTING

857       OpenSM now offers ten routing engines:
858
859       1.   Min  Hop  Algorithm - based on the minimum hops to each node where
860       the path length is optimized.
861
862       2.  UPDN Unicast routing algorithm - also based on the minimum hops  to
863       each  node,  but  it  is  constrained  to ranking rules. This algorithm
864       should be chosen if the subnet is not a pure Fat Tree, and deadlock may
865       occur due to a loop in the subnet.
866
867       3.  DNUP Unicast routing algorithm - similar to UPDN but allows routing
868       in fabrics which have some CA nodes attached closer to the  roots  than
869       some switch nodes.
870
871       4.  Fat Tree Unicast routing algorithm - this algorithm optimizes rout‐
872       ing for congestion-free "shift" communication pattern.   It  should  be
873       chosen  if  a subnet is a symmetrical or almost symmetrical fat-tree of
874       various types,  not  just  K-ary-N-Trees:  non-constant  K,  not  fully
875       staffed,  any  Constant  Bisectional Bandwidth (CBB) ratio.  Similar to
876       UPDN, Fat Tree routing is constrained to ranking rules.
877
878       5. LASH unicast routing algorithm - uses InfiniBand virtual layers (SL)
879       to  provide deadlock-free shortest-path routing while also distributing
880       the paths between layers. LASH is an alternative  deadlock-free  topol‐
881       ogy-agnostic routing algorithm to the non-minimal UPDN algorithm avoid‐
882       ing the use of a potentially congested root node.
883
884       6. DOR Unicast routing algorithm - based on the Min Hop algorithm,  but
885       avoids  port  equalization  except for redundant links between the same
886       two switches.  This provides deadlock free routes for  hypercubes  when
887       the  fabric  is  cabled  as a hypercube and for meshes when cabled as a
888       mesh (see details below).
889
890       7. Torus-2QoS unicast routing algorithm - a DOR-based routing algorithm
891       specialized  for 2D/3D torus topologies.  Torus-2QoS provides deadlock-
892       free routing while supporting two quality of service (QoS) levels.   In
893       addition  it  is able to route around multiple failed fabric links or a
894       single failed fabric switch without introducing deadlocks, and  without
895       changing path SL values granted before the failure.
896
897       8.  DFSSSP  unicast  routing algorithm - a deadlock-free single-source-
898       shortest-path routing, which uses the SSSP algorithm (see algorithm 9.)
899       as  the  base  to optimize link utilization and uses InfiniBand virtual
900       lanes (SL) to provide deadlock-freedom.
901
902       9. SSSP unicast routing algorithm - a single-source-shortest-path rout‐
903       ing algorithm, which globally balances the number of routes per link to
904       optimize link utilization. This routing algorithm has  no  restrictions
905       in terms of the underlying topology.
906
907       10. Nue unicast routing algorithm - a 100%-applicable and deadlock-free
908       routing which can be used for any arbitrary or faulty network  topology
909       and  any  number  of virtual lanes (this includes the absence of VLs as
910       well). Paths are globally balanced w.r.t the number of routes per link,
911       and  are  kept  as  short  as possible while enforcing deadlock-freedom
912       within the VL constraint.
913
914       OpenSM also supports a file method which can load routes from a  table.
915       See ´Modular Routing Engine´ for more information on this.
916
917       The basic routing algorithm is comprised of two stages:
918
919       1. MinHop matrix calculation
920          How many hops are required to get from each port to each LID ?
921          The  algorithm to fill these tables is different if you run standard
922       (min hop) or Up/Down.
923          For standard routing, a "relaxation" algorithm is used to  propagate
924       min hop from every destination LID through neighbor switches.
925          For Up/Down routing, a BFS from every target is used. The BFS tracks
926       link direction (up or down) and avoids steps that would go up after
927       a down step was used.
928
929       2. Once MinHop matrices exist, each switch is visited and for each tar‐
930       get LID a decision is made as to what port should be  used  to  get  to
931       that LID.
932          This step is common to standard and Up/Down routing. Each port has a
933       counter counting the number of target LIDs going through it.
934          When there are multiple alternative ports with same MinHop to a LID,
935       the one with fewer previously assigned LIDs is selected.
936          If  LMC  >  0,  more  checks  are  added:  Within each group of LIDs
937       assigned to same target port,
938          a. use only ports which have same MinHop
939          b. first prefer the ones that go to different systemImageGuid  (than
940       the previous LID of the same LMC group)
941          c. if none - prefer those which go through another NodeGuid
942          d. fall back to the number of paths method (if all go to same node).
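
The basic port choice in stage 2 (LMC = 0) can be sketched as follows.  This
is a simplified illustration, not the OpenSM code: the function name and the
dictionary layout are invented, ties are broken by lowest port number as an
assumption, and the extra LMC > 0 checks are left out.

```python
def pick_port(min_hops_by_port, lids_per_port):
    """Choose the output port for one target LID on one switch.

    min_hops_by_port - {port: hops to the target LID through that port}
    lids_per_port    - {port: number of LIDs already routed through it};
                       updated in place with the new assignment
    """
    best = min(min_hops_by_port.values())
    candidates = [p for p, hops in min_hops_by_port.items() if hops == best]
    # Among equal-MinHop ports, take the one with the fewest assigned LIDs.
    port = min(candidates, key=lambda p: (lids_per_port.get(p, 0), p))
    lids_per_port[port] = lids_per_port.get(port, 0) + 1
    return port
```

Calling it repeatedly with the same hop table spreads successive LIDs over
the ports that share the minimum hop count.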
943
944       Effect of Topology Changes
945
946       OpenSM  will  preserve  existing  routing in any case where there is no
947       change in the fabric switches unless the -r (--reassign_lids) option is
948       specified.
949
950       -r
951       --reassign_lids
952                 This option causes OpenSM to reassign LIDs to all
953                 end nodes. Specifying -r on a running subnet
954                 may disrupt subnet traffic.
955                 Without -r, OpenSM attempts to preserve existing
956                 LID assignments resolving multiple use of same LID.
957
958       If  a  link is added or removed, OpenSM does not recalculate the routes
959       that do not have to change. A route has to change if  the  port  is  no
960       longer  UP or no longer the MinHop. When routing changes are performed,
961       the same algorithm for balancing the routes is invoked.
962
963       In the case of using the file based routing, any topology  changes  are
964       currently  ignored.  The 'file' routing engine just loads the LFTs from
965       the file specified, with no reaction to real topology. Obviously,  this
966       will  not be able to recheck LIDs (by GUID) for disconnected nodes, and
967       LFTs for non-existent  switches  will  be  skipped.  Multicast  is  not
968       affected by 'file' routing engine (this uses min hop tables).
969
970
971       Min Hop Algorithm
972
973       The  Min Hop algorithm is invoked by default if no routing algorithm is
974       specified.  It can also be invoked by specifying '-R minhop'.
975
976       The Min Hop algorithm is divided into two stages: computation  of  min-
977       hop  tables  on  every switch and LFT output port assignment. Link sub‐
978       scription is also equalized with the ability to override based on  port
979       GUID. The latter is supplied by:
980
981       -i <equalize-ignore-guids-file>
982       --ignore_guids <equalize-ignore-guids-file>
983                 This option provides the means to define a set of ports
984                 (by guid) that will be ignored by the link load
985                 equalization algorithm. Note that only endports (CA,
986                 switch port 0, and router ports) and not switch external
987                 ports are supported.
988
989       LMC-aware routing is done on a (remote) system or switch basis.
990
991
992       Purpose of UPDN Algorithm
993
994       The  UPDN  algorithm is designed to prevent deadlocks from occurring in
995       loops of the subnet. A loop-deadlock is a situation in which it  is  no
996       longer  possible  to  send data between any two hosts connected through
997       the loop. As such, the UPDN routing algorithm should  be  used  if  the
998       subnet  is  not  a pure Fat Tree, and one of its loops may experience a
999       deadlock (due, for example, to high pressure).
1000
1001       The UPDN algorithm is based on the following main stages:
1002
1003       1.  Auto-detect root nodes - based on the CA hop length from any switch
1004       in  the  subnet,  a statistical histogram is built for each switch (hop
1005       num vs number of occurrences). If the  histogram  reflects  a  specific
1006       column  (higher than others) for a certain node, then it is marked as a
1007       root node. Since the algorithm is statistical, it may not find any root
1008       nodes.  The  list  of the root nodes found by this auto-detect stage is
1009       used by the ranking process stage.
1010
1011           Note 1: The user can override the node list manually.
1012           Note 2: If this stage cannot find any root nodes, and the user did
1013                   not specify a guid list file, OpenSM defaults back to the
1014                   Min Hop routing algorithm.
1015
1016       2.  Ranking process - All root switch nodes  (found  in  stage  1)  are
1017       assigned  a  rank of 0. Using the BFS algorithm, the rest of the switch
1018       nodes in the subnet are ranked incrementally. This ranking aids in  the
1019       process of enforcing rules that ensure loop-free paths.
1020
1021       3.   Min  Hop Table setting - after ranking is done, a BFS algorithm is
1022       run from each (CA or  switch)  node  in  the  subnet.  During  the  BFS
1023       process, the FDB table of each switch node traversed by BFS is updated,
1024       in reference to the starting node, based on the ranking rules and  guid
1025       values.
1026
1027       At  the  end  of  the  process, the updated FDB tables ensure loop-free
1028       paths through the subnet.
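
The ranking stage and the Up/Down rule can be illustrated with the sketch
below.  It is a toy model under stated assumptions: the topology is a plain
adjacency dictionary of switches, the search runs from a source rather than
from each target, links between equally ranked switches are ordered by node
name (standing in for the guid comparison), and all names are invented.

```python
from collections import deque


def rank_switches(adj, roots):
    """BFS ranking: roots get rank 0, every other switch its BFS depth."""
    rank = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in rank:
                rank[nbr] = rank[node] + 1
                queue.append(nbr)
    return rank


def updn_distances(adj, rank, source):
    """Shortest hop counts from source using only legal Up/Down paths."""
    def is_up(u, v):
        # Assumption: ties in rank are ordered by node name.
        return (rank[v], v) < (rank[u], u)

    dist = {(source, False): 0}          # state: (node, already went down)
    queue = deque([(source, False)])
    while queue:
        node, went_down = queue.popleft()
        for nbr in adj[node]:
            up = is_up(node, nbr)
            if up and went_down:
                continue                 # going up after down is forbidden
            state = (nbr, went_down or not up)
            if state not in dist:
                dist[state] = dist[(node, went_down)] + 1
                queue.append(state)
    best = {}
    for (node, _), hops in dist.items():
        best[node] = min(hops, best.get(node, hops))
    return best
```

In a topology where the shortest path would go down and then up again, the
function reports the longer legal path instead.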
1029
1030       Note: Up/Down routing does not allow LID routing communication  between
1031       switches that are located inside spine "switch systems".  The reason is
1032       that there is no way to allow a LID route between them  that  does  not
1033       break  the  Up/Down  rule.  One ramification of this is that you cannot
1034       run SM on switches other than the leaf switches of the fabric.
1035
1036
1037       UPDN Algorithm Usage
1038
1039       Activation through OpenSM
1040
1041       Use '-R updn' option (instead of old '-u') to activate the  UPDN  algo‐
1042       rithm.   Use  '-a  <root_guid_file>'  for adding an UPDN guid file that
1043       contains the root nodes for ranking.  If the `-a' option is  not  used,
1044       OpenSM uses its auto-detect root nodes algorithm.
1045
1046       Notes on the guid list file:
1047
1048       1.    A  valid guid file specifies one guid in each line. Lines with an
1049       invalid format will be discarded.
1050       2.   The user should specify the root switch guids. However, it is also
1051       possible  to  specify  CA guids; OpenSM will use the guid of the switch
1052       (if it exists) that connects the CA to the subnet as a root node.
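
A guid list file of this kind could be read as sketched below.  The exact
syntax OpenSM accepts is not spelled out here, so the sketch assumes one
hexadecimal GUID per line with an optional 0x prefix and silently discards
anything else; the function name is hypothetical.

```python
def read_guid_list(text):
    """Return the GUIDs found in the file contents, one per valid line."""
    guids = []
    for line in text.splitlines():
        token = line.strip()
        try:
            guids.append(int(token, 16))
        except ValueError:
            # Lines with an invalid format (including blank lines)
            # are discarded.
            continue
    return guids
```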
1053
1054       Purpose of DNUP Algorithm
1055
1056       The DNUP algorithm is designed to serve a similar purpose to UPDN. How‐
1057       ever it is intended to work in network topologies which are unsuited to
1058       UPDN due to nodes being connected closer to the roots than some of  the
1059       switches.   An  example  would  be  a  fabric  which contains nodes and
1060       uplinks connected to the same switch. The operation of DNUP is the same
1061       as  UPDN with the exception of the ranking process.  In DNUP all switch
1062       nodes are ranked based solely on their distance from CA nodes: all
1063       switch nodes directly connected to at least one CA are assigned a value
1064       of 1; all other switch nodes are assigned a value of one more than the
1065       minimum rank of all neighbor switch nodes.
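
The DNUP ranking rule amounts to a multi-source breadth-first search, as in
this sketch.  It is an illustration only; the adjacency dictionary and the
function name are assumptions, not OpenSM data structures.

```python
from collections import deque


def dnup_rank(switch_adj, switches_with_ca):
    """Rank switches by distance from the nearest CA-attached switch.

    switch_adj       - {switch: [neighbor switches]}
    switches_with_ca - switches directly connected to at least one CA
    """
    rank = {s: 1 for s in switches_with_ca}
    queue = deque(switches_with_ca)
    while queue:
        node = queue.popleft()
        for nbr in switch_adj[node]:
            if nbr not in rank:
                # One more than the minimum rank among its neighbors,
                # which BFS order guarantees is rank[node].
                rank[nbr] = rank[node] + 1
                queue.append(nbr)
    return rank
```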
1066
1067       Fat-tree Routing Algorithm
1068
1069       The fat-tree algorithm optimizes routing for the "shift" communication
1070       pattern.  It should be chosen if a subnet is a symmetrical or almost
1071       symmetrical fat-tree of various types.  It supports more than just
1072       K-ary-N-Trees: it handles non-constant K, cases where not all leaves
1073       (CAs) are present, and any CBB ratio.  As in UPDN, fat-tree routing
1074       also prevents credit-loop deadlocks.
1075
1076       If the root guid file  is  not  provided  ('-a'  or  '--root_guid_file'
1077       options),  the  topology has to be pure fat-tree that complies with the
1078       following rules:
1079         - Tree rank should be between two and eight (inclusively)
1080         - Switches of the same rank should have the same number
1081           of UP-going port groups*, unless they are root switches,
1082           in which case they shouldn't have UP-going ports at all.
1083         - Switches of the same rank should have the same number
1084           of DOWN-going port groups, unless they are leaf switches.
1085         - Switches of the same rank should have the same number
1086           of ports in each UP-going port group.
1087         - Switches of the same rank should have the same number
1088           of ports in each DOWN-going port group.
1089         - All the CAs have to be at the same tree level (rank).
1090
1091       If the root guid file is provided, the topology doesn't have to be pure
1092       fat-tree, and it should only comply with the following rules:
1093         - Tree rank should be between two and eight (inclusively)
1094         - All the Compute Nodes** have to be at the same tree level (rank).
1095           Note that non-compute node CAs are allowed here to be at different
1096           tree ranks.
1097
1098       *  ports that are connected to the same remote switch are referenced as
1099       ´port group´.
1100
1101       **  list  of  compute  nodes  (CNs)  can  be  specified  by   ´-u´   or
1102       ´--cn_guid_file´ OpenSM options.
1103
1104       Topologies  that  do  not  comply  cause a fallback to min hop routing.
1105       Note that this can also occur on link failures which cause the topology
1106       to no longer be "pure" fat-tree.
1107
1108       Note  that  although fat-tree algorithm supports trees with non-integer
1109       CBB ratio, the routing will not be as balanced as in  case  of  integer
1110       CBB  ratio.   In  addition  to this, although the algorithm allows leaf
1111       switches to have any number of CAs, the closer the tree is to being fully
1112       populated,  the  more  effective the "shift" communication pattern will
1113       be.  In general, even if the root list  is  provided,  the  closer  the
1114       topology to a pure and symmetrical fat-tree, the more optimal the rout‐
1115       ing will be.
1116
1117       The algorithm also dumps a compute node ordering file (opensm-ftree-ca-
1118       order.dump)  in  the  same directory where the OpenSM log resides. This
1119       ordering file provides the CN order that may be used to create an effi‐
1120       cient communication pattern that matches the routing tables.
1121
1122       Routing between non-CN nodes
1123
1124       The use of the cn_guid_file option allows non-CN nodes to be located on
1125       different levels in the fat tree.  In such case, it is  not  guaranteed
1126       that  the  Fat  Tree algorithm will route between two non-CN nodes.  To
1127       solve this problem, a list of non-CN nodes can be specified by ´-G´  or
1128       ´--io_guid_file´  option.  These nodes will be allowed to use switches
1129       the wrong way round a specific number of times (specified  by  ´-H´  or
1130       ´--max_reverse_hops´).   With   the   proper   max_reverse_hops   and
1131       io_guid_file values, you can ensure full connectivity in the Fat Tree.
1132
1133       Please note that using max_reverse_hops creates  routes  that  use  the
1134       switch  in  a  counter-stream way.  This option should never be used to
1135       connect nodes with high bandwidth traffic between them ! It should only
1136       be  used to allow connectivity for HA purposes or similar.  Also having
1137       routes the other way around can in theory cause credit loops.
1138
1139       Use these options with extreme care !
1140
1141       Activation through OpenSM
1142
1143       Use '-R ftree' option to activate  the  fat-tree  algorithm.   Use  '-a
1144       <root_guid_file>' to provide root nodes for ranking. If the `-a' option
1145       is not used, routing algorithm will detect  roots  automatically.   Use
1146       '-u  <root_cn_file>'  to provide the list of compute nodes. If the `-u'
1147       option is not used, all the CAs are considered as compute nodes.
1148
1149       Note: LMC > 0 is not supported by fat-tree routing. If this  is  speci‐
1150       fied, the default routing algorithm is invoked instead.
1151
1152
1153       LASH Routing Algorithm
1154
1155       LASH is an acronym for LAyered SHortest Path Routing. It is a determin‐
1156       istic shortest path routing algorithm that  enables  topology  agnostic
1157       deadlock-free routing within communication networks.
1158
1159       When computing the routing function, LASH analyzes the network topology
1160       for the shortest-path routes between all pairs of  sources  /  destina‐
1161       tions  and  groups  these paths into virtual layers in such a way as to
1162       avoid deadlock.
1163
1164       Note LASH analyzes routes and ensures deadlock freedom  between  switch
1165       pairs.  The link between an HCA and a switch does not need virtual lay‐
1166       ers as deadlock will not arise between switch and HCA.
1167
1168       In more detail, the algorithm works as follows:
1169
1170       1) LASH determines the shortest-path between all pairs of source / des‐
1171       tination  switches.  Note,  LASH  ensures  the  same SL is used for all
1172       SRC/DST - DST/SRC pairs and there is no guarantee that the return  path
1173       for a given DST/SRC will be the reverse of the route SRC/DST.
1174
1175       2)  LASH then begins an SL assignment process where a route is assigned
1176       to a layer (SL) if the addition of that route does not  cause  deadlock
1177       within  that  layer.  This  is  achieved by maintaining and analysing a
1178       channel dependency graph for each layer. Once the potential addition of
1179       a path could lead to deadlock, LASH opens a new layer and continues the
1180       process.
1181
1182       3) Once this stage has been completed, it is  highly  likely  that  the
1183       first  layers  processed  will contain more paths than the latter ones.
1184       To better balance the use of layers, LASH moves paths from one layer to
1185       another so that the number of paths in each layer averages out.
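
       The layer assignment of step 2) can be sketched as follows. This is a
       minimal illustration under stated assumptions, not the OpenSM
       implementation: routes are given as lists of switch names, a channel
       is a directed switch-to-switch link, layers are tried first-fit, and
       the helper names (route_dependencies, has_cycle, assign_layers) are
       hypothetical. The balancing of step 3) is omitted.

```python
from collections import defaultdict


def route_dependencies(route):
    """Return the channel dependencies of a route (a list of switches).

    A channel is a directed link (a, b). A route that uses channel (a, b)
    and then channel (b, c) makes (b, c) depend on (a, b).
    """
    channels = list(zip(route, route[1:]))
    return list(zip(channels, channels[1:]))


def has_cycle(edges):
    """Detect a cycle in a directed graph given as a set of (u, v) edges."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True  # back edge: the dependency graph has a cycle
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))


def assign_layers(routes):
    """Place each route in the first layer whose channel dependency graph
    stays acyclic; open a new layer when no existing layer fits."""
    layers = []      # one set of dependency edges (a CDG) per layer
    assignment = []  # layer index chosen for each route
    for route in routes:
        deps = set(route_dependencies(route))
        for index, cdg in enumerate(layers):
            if not has_cycle(cdg | deps):
                cdg |= deps
                assignment.append(index)
                break
        else:
            layers.append(deps)
            assignment.append(len(layers) - 1)
    return assignment


# Four clockwise two-hop routes on a ring of four switches.
ring_routes = [
    ["A", "B", "C"],
    ["B", "C", "D"],
    ["C", "D", "A"],
    ["D", "A", "B"],
]
print(assign_layers(ring_routes))
```

       For the ring above, the first three routes fit into layer 0, while the
       fourth would close a dependency cycle and is therefore placed in
       layer 1, so the result is [0, 0, 0, 1].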
1186
1187       Note,  the implementation of LASH in opensm attempts to use as few lay‐
1188       ers as possible. This number can be less than the number of actual lay‐
1189       ers available.
1190
       In general, LASH is a very flexible algorithm. It can, for example,
       reduce to Dimension Order Routing in certain topologies. It is also
       topology agnostic and fares well in the face of faults.
1194
1195       It  has been shown that for both regular and irregular topologies, LASH
1196       outperforms Up/Down. The reason for this is that LASH  distributes  the
1197       traffic  more  evenly through a network, avoiding the bottleneck issues
1198       related to a root node and always routes shortest-path.
1199
1200       The algorithm was developed by Simula Research Laboratory.
1201
1202
       Use the '-R lash -Q' option to activate the LASH algorithm.
1204
1205       Note: QoS support has to be turned on in order that SL/VL mappings  are
1206       used.
1207
1208       Note:  LMC  > 0 is not supported by the LASH routing. If this is speci‐
1209       fied, the default routing algorithm is invoked instead.
1210
1211       For open regular cartesian meshes the DOR algorithm is the ideal  rout‐
1212       ing  algorithm. For toroidal meshes on the other hand there are routing
1213       loops that can cause deadlocks. LASH can be used to route these  cases.
1214       The  performance of LASH can be improved by preconditioning the mesh in
1215       cases where there are multiple links connecting switches  and  also  in
1216       cases  where the switches are not cabled consistently. An option exists
1217       for LASH to do this. To invoke this use '-R  lash  -Q  --do_mesh_analy‐
1218       sis'.  This  will add an additional phase that analyses the mesh to try
1219       to determine the dimension and size of a mesh. If  it  determines  that
1220       the  mesh  looks  like an open or closed cartesian mesh it reorders the
1221       ports in dimension order before the rest of the LASH algorithm runs.
1222
1223       DOR Routing Algorithm
1224
1225       The Dimension Order Routing algorithm is based on the Min Hop algorithm
1226       and  so  uses  shortest paths.  Instead of spreading traffic out across
1227       different paths with the same shortest distance, it chooses  among  the
1228       available shortest paths based on an ordering of dimensions.  Each port
1229       must be consistently cabled to represent a  hypercube  dimension  or  a
1230       mesh  dimension.   Alternatively, the -O option can be used to assign a
1231       custom mapping between the ports on a given switch, and the  associated
1232       dimension.   Paths  are grown from a destination back to a source using
1233       the lowest dimension (port) of available paths at each step.  This pro‐
1234       vides  the ordering necessary to avoid deadlock.  When there are multi‐
1235       ple links between any two  switches,  they  still  represent  only  one
1236       dimension  and traffic is balanced across them unless port equalization
1237       is turned off.  In the case of hypercubes, the same port must  be  used
1238       throughout the fabric to represent the hypercube dimension and match on
1239       both ends of the cable, or the -O option used to accomplish the  align‐
1240       ment.  In the case of meshes, the dimension should consistently use the
1241       same pair of ports, one port on one end of the  cable,  and  the  other
1242       port  on  the other end, continuing along the mesh dimension, or the -O
1243       option used as an override.
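
       The idea of ordering dimensions can be illustrated with the textbook
       form of dimension-order routing on an open Cartesian mesh. This is
       only a sketch: it assumes switches are identified by coordinate
       tuples, whereas OpenSM works with switch ports and grows paths from
       the destination back to the source. The function names are
       hypothetical.

```python
def dor_next_hop(current, dest):
    """Return the neighbour of `current` on the dimension-order path to `dest`.

    Both arguments are coordinate tuples in an open Cartesian mesh. Every
    dimension in which they differ offers a shortest-path step; the
    lowest-numbered such dimension is always corrected first.
    """
    for dim, (c, d) in enumerate(zip(current, dest)):
        if c != d:
            step = 1 if d > c else -1
            return current[:dim] + (c + step,) + current[dim + 1:]
    return current  # already at the destination


def dor_path(src, dest):
    """Follow dor_next_hop from src until dest is reached."""
    path = [src]
    while path[-1] != dest:
        path.append(dor_next_hop(path[-1], dest))
    return path


print(dor_path((0, 0), (2, 1)))
```

       The printed path first finishes dimension 0 and only then moves in
       dimension 1: [(0, 0), (1, 0), (2, 0), (2, 1)]. Because every route
       uses the dimensions in the same order, no cyclic channel dependency
       can form in an open mesh.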
1244
1245       Use '-R dor' option to activate the DOR algorithm.
1246
1247       DFSSSP and SSSP Routing Algorithm
1248
       The (Deadlock-Free) Single-Source-Shortest-Path routing algorithm is
       designed to optimize link utilization through global balancing of
       routes, while supporting arbitrary topologies.  The DFSSSP routing
       algorithm uses InfiniBand virtual lanes (SL) to provide deadlock-freedom.
1253
1254       The DFSSSP algorithm consists of five major steps:
1255       1)  It  discovers the subnet and models the subnet as a directed multi‐
1256       graph in which each node represents a node of the physical network  and
1257       each  edge  represents  one  direction of the full-duplex links used to
1258       connect the nodes.
       2) A loop, which iterates over all CAs and switches of the subnet,
       performs three steps to generate the linear forwarding tables for each
       switch:
1262       2.1) use Dijkstra's algorithm to find the shortest path from all  nodes
1263       to the current selected destination;
1264       2.2)  update  the  edge  weights  in  the graph, i.e. add the number of
1265       routes, which use a link to reach the destination, to the link/edge;
1266       2.3) update the LFT of each switch with the  outgoing  port  which  was
1267       used in the current step to route the traffic to the destination node.
1268       3)  After the number of available virtual lanes or layers in the subnet
1269       is detected and a channel dependency  graph  is  initialized  for  each
1270       layer,  the  algorithm  will put each possible route of the subnet into
1271       the first layer.
1272       4) A loop iterates over all channel dependency graphs  (CDG)  and  per‐
1273       forms the following substeps:
1274       4.1) search for a cycle in the current CDG;
1275       4.2)  when  a  cycle is found, i.e. a possible deadlock is present, one
1276       edge is selected and all routes, which induced this edge, are moved  to
1277       the "next higher" virtual layer (CDG[i+1]);
1278       4.3)  the  cycle  search  is  continued until all cycles are broken and
1279       routes are moved "up".
       5) When the number of layers needed to remove all cycles in all CDGs
       does not exceed the number of available SL/VL, the routing is
       deadlock-free and a relation table is generated, which contains the
       assignment of routes from source to destination to an SL.
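
       Steps 2.1) to 2.3) can be sketched as follows. This is a simplified
       illustration, not the OpenSM code: it assumes a simple graph (no
       parallel links), an initial weight of 1 per directed edge, and counts
       one route per source node; the function name balanced_routes and the
       node names are hypothetical. Steps 3) to 5) are omitted.

```python
import heapq
from collections import defaultdict


def balanced_routes(links, destinations):
    """Route toward each destination with Dijkstra, then add the number of
    routes crossing each link to that link's weight.

    `links` is an iterable of undirected (a, b) pairs; each becomes two
    directed edges with initial weight 1. Returns (next_hop, weight), where
    next_hop[dest][node] is the neighbour `node` forwards to for `dest`.
    """
    weight = {}
    neighbours = defaultdict(list)
    for a, b in links:
        weight[(a, b)] = weight[(b, a)] = 1
        neighbours[a].append(b)
        neighbours[b].append(a)

    next_hop = {}
    for dest in destinations:
        # 2.1) shortest path from every node to dest. The search runs
        # backwards from dest, so edge (v, u) is the hop taken from v to u.
        dist = {dest: 0}
        hop = {}
        heap = [(0, dest)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v in neighbours[u]:
                nd = d + weight[(v, u)]
                if v not in dist or nd < dist[v]:
                    dist[v] = nd
                    hop[v] = u
                    heapq.heappush(heap, (nd, v))
        # 2.2) add the number of routes using each link to its weight.
        for node in hop:
            cur = node
            while cur != dest:
                weight[(cur, hop[cur])] += 1
                cur = hop[cur]
        # 2.3) record the forwarding entries for this destination.
        next_hop[dest] = hop
    return next_hop, weight


links = [("A", "S1"), ("S1", "S2"), ("S1", "S3"), ("S2", "S4"),
         ("S3", "S4"), ("S4", "D1"), ("S4", "D2")]
next_hop, _ = balanced_routes(links, ["D1", "D2"])
print(next_hop["D1"]["S1"], next_hop["D2"]["S1"])
```

       In this example S1 has two equally short paths to S4. After the
       routes toward D1 have raised the weights on the path via S2, the
       routes toward D2 are steered via S3, which is the balancing effect
       the weight update is meant to achieve.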
1284
1285       Note on SSSP:
1286       This  algorithm does not perform the steps 3)-5) and can not be consid‐
1287       ered to be deadlock-free for all topologies. But on the one  hand,  you
1288       can  choose  this  algorithm  for really large networks (5,000+ CAs and
1289       deadlock-free by design) to reduce the runtime of the algorithm. On the
1290       other hand, you might use the SSSP routing algorithm as an alternative,
1291       when all deadlock-free routing algorithms fail to route the network for
1292       whatever  reason.   In  the  last case, SSSP was designed to deliver an
1293       equal or higher bandwidth due to better congestion avoidance  than  the
1294       Min Hop routing algorithm.
1295
1296       Notes for usage:
1297       a) running DFSSSP: '-R dfsssp -Q'
1298       a.1)  QoS has to be configured to equally spread the load on the avail‐
1299       able SL or virtual lanes
       a.2) applications must perform a path record query to get the path SL
       for each route that the application will use to transmit packets
1302       b) running SSSP:   '-R sssp'
1303       c) both algorithms support LMC > 0
1304
1305       Hints for optimizing I/O traffic:
1306       Having more nodes (I/O and compute) connected to a switch than incoming
1307       links can result in a 'bad' routing of  the  I/O  traffic  as  long  as
1308       (DF)SSSP  routing is not aware of the dedicated I/O nodes, i.e., in the
1309       following network configuration CN1-CN3 might send all I/O traffic  via
1310       Link2 to IO1,IO2:
1311
1312            CN1         Link1        IO1
1313               \       /----\       /
1314         CN2 -- Switch1      Switch2 -- CN4
1315               /       \----/       \
1316            CN3         Link2        IO2
1317
1318       To  prevent  this from happening (DF)SSSP can use both the compute node
1319       guid  file  and  the  I/O  guid  file  specified   by   the   ´-u´   or
1320       ´--cn_guid_file´  and  ´-G´ or ´--io_guid_file´ options (similar to the
1321       Fat-Tree routing).  This ensures that traffic towards compute nodes and
1322       I/O  nodes  is balanced separately and therefore distributed as much as
1323       possible across the available links. Port GUIDs, as listed  by  ibstat,
1324       must be specified (not Node GUIDs).
1325       The priority for the optimization is as follows:
1326         compute nodes -> I/O nodes -> other nodes
1327       Possible use case scenarios:
       a) neither ´-u´ nor ´-G´ is specified: all nodes are treated as ´other
       nodes´ and therefore balanced equally;
1330       b) ´-G´ is specified: traffic towards I/O nodes will be balanced  opti‐
1331       mally;
1332       c)  the  system  has three node types, such as login/admin, compute and
1333       I/O, but the balancing focus should be I/O, then one has  to  use  ´-u´
1334       and  ´-G´  with I/O guids listed in cn_guid_file and compute node guids
1335       listed in io_guid_file;
1336       d) ...
1337
1338       Torus-2QoS Routing Algorithm
1339
       Torus-2QoS is a routing algorithm designed for large-scale 2D/3D torus
       fabrics; see torus-2QoS(8) for full documentation.
1342
1343       Use  '-R  torus-2QoS  -Q' or '-R torus-2QoS,no_fallback -Q' to activate
1344       the torus-2QoS algorithm.
1345
1346       Nue Routing Algorithm
1347
1348       Use either `-R nue' or `-R nue -Q --nue_max_num_vls <int>' to  activate
1349       Nue.
1350
1351       Note:  if  `--nue_max_num_vls'  is specified and unequal to 1, then QoS
1352       support must be turned on, so that SL2VL mappings are valid and  appli‐
1353       cations  comply  with  suggested  SLs  to  avoid credit-loops. For more
1354       details on QoS and Nue see below.
1355
1356       The implementation of Nue routing for OpenSM is a 100%-applicable, bal‐
1357       anced,  and deadlock-free unicast routing engine (which also configures
1358       multicast tables, see 'Note on multicast' below).  The  key  points  of
1359       this algorithm are the following:
         - 100% fault-tolerant, oblivious routing strategy
         - topology-agnostic, i.e., applicable to every topology (no matter
           if topology is regular, irregular after faults, or random)
         - 100% deadlock-free routing within the resource limits (i.e., it
           never exceeds the given number of available virtual lanes, and it
           does not necessarily require virtual lanes) for every topology
         - very good path balancing and therefore high throughput (even
           better when using METIS, see notes below)
         - QoS (via SLs/VLs) + deadlock-freedom can be combined (since both
           rely on VLs), e.g., using VL0-3 for Nue's deadlock-freedom (and
           1. QoS level) and VL4-7 as second QoS level
         - forwarding tables are fast to calculate: O(n^2 * log n), however
           slightly slower compared to topology-aware routings (for obvious
           reasons), and
         - the path-to-VL mapping only depends on the destination, which may
           be useful for scalable, efficient path resolution and caching
           mechanisms.
1384       From a very high level perspective, Nue routing is  similar  to  DFSSSP
1385       (see above) in the sense that both use Dijkstra and edge weight updates
1386       for path balancing, and paths are mapped to virtual layers  assuming  a
1387       1:1  mapping  of  SL2VL tables.  However, the fundamental difference is
1388       that Nue routing doesn't perform the path calculation on the graph rep‐
1389       resenting the real fabric, and instead routes directly within the chan‐
1390       nel dependency graph. This approach allows Nue routing to place routing
1391       restrictions  (to avoid any credit-loops) in an on-demand manner, which
1392       overcomes the problem of all other good VL-based algorithms.   Meaning,
1393       the  competitors  cannot control or limit the use of VLs, and might run
1394       out of them and have to give up. On the flip side, Nue may have to  use
       detours for a few routes, and hence cannot really be considered
       "shortest-path" routing, because it is impossible to accomplish
       deadlock-free, shortest-path routing with a limited number of available
       virtual lanes for arbitrary network topologies.
1399
1400       Note on the use of METIS library with Nue:
       Nue routing may have to separate the LIDs into multiple subsets, one for
1402       every  virtual  layer, if multiple layers are used. Nue has two options
1403       to perform this partitioning (not to be confused with  IB  partitions);
1404       the  first  is  a  fairly simple semi-random assignment of LIDs to lay‐
1405       ers/subsets, and the second partitioning uses the METIS library to par‐
1406       tition  the  network  graph into k approximately equal sized parts. The
1407       latter approach has shown better results in terms of path balancing and
1408       avoidance  of  using  fallback paths, and hence it is HIGHLY advised to
       install/use the METIS library with OpenSM (enforced via the
       `--enable-metis' configure flag when building OpenSM). In the rare case
       that METIS isn't packaged with the Linux distro, here is a link to the
       official website to download and install METIS 5.1.0 manually:
1413          http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
1414       OpenSM's  configure  script  also provides options in case METIS header
1415       and library aren't found in the default path.
1416
1417       Runtime options for Nue:
1418       The behavior of Nue routing can be directly influenced by the  osm.conf
1419       parameter (which is also available as command line option):
         - nue_max_num_vls: controls/limits the number of virtual
           lanes/layers which Nue is allowed to use (detailed explanation
           in osm.conf file).
1423       Furthermore, Nue supports  TRUE  and  FALSE  settings  of  avoid_throt‐
1424       tled_links,  use_ucast_cache, and qos (more on this hereafter); and lmc
1425       > 0.
1426
1427       Notes on Quality of Service (QoS):
1428       The advantage of Nue is that  it  works  with  AND  without  QoS  being
1429       enabled,  i.e.,  the  usage  of  SLs/VLs  for  deadlock-freedom  can be
1430       avoided. Here are the three possible usage scenarios:
         - neither setting `--nue_max_num_vls <int>' nor `-Q': Nue assumes
           that only 1 virtual layer (identical to physical network; or
           OperVLs equal to VL0) is usable and all paths are to be calculated
           within this one layer. Hence, there is no need for special SL2VL
           mappings in the network or for the use of specific SLs by
           applications.
         - setting `-Q' but not `--nue_max_num_vls <int>': This combination
           works like the previous one, meaning the SL returned for path
           record requests is not defined by Nue, since all paths are
           deadlock-free without using VLs. However, any separate QoS
           settings may influence the SL returned to applications.
         - setting `-Q --nue_max_num_vls <int>' with int != 1: In this
           configuration, applications have to query and obey the SL for
           path records as returned by Nue because otherwise
           deadlock-freedom cannot be guaranteed anymore. Furthermore, errors
           in the fabric may require applications to repath to avoid message
           deadlocks. Since Nue operates on virtual layers, admins should
           configure the SL2VL mapping tables in a homogeneous 1:1 manner
           across the entire subnet to separate the layers.
1462       As an additional note, using more VLs  for  Nue  usually  improves  the
1463       overall  network throughput, so there are trade offs admins may have to
1464       consider when configuring the subnet manager with Nue routing.
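
       The scenarios above can be summarized as a small decision helper.
       This is only a sketch of the rules stated in this section: the
       function name and the returned strings are hypothetical, and the
       ValueError is merely this sketch's way of flagging the combination
       that the earlier note rules out; how OpenSM itself reacts to it is
       not described here.

```python
def nue_sl_requirement(qos_enabled, nue_max_num_vls=None):
    """Classify a Nue configuration following the scenarios listed above.

    `qos_enabled` mirrors `-Q`; `nue_max_num_vls=None` stands for
    "--nue_max_num_vls not given".
    """
    if nue_max_num_vls is None or nue_max_num_vls == 1:
        if qos_enabled:
            return "single layer; SL comes from other QoS settings, if any"
        return "single layer; no special SL2VL mapping or SL needed"
    if not qos_enabled:
        # The note at the start of this section requires QoS whenever
        # --nue_max_num_vls is specified and unequal to 1.
        raise ValueError("--nue_max_num_vls != 1 requires QoS (-Q)")
    return "multiple layers; applications must use the SL from path records"


print(nue_sl_requirement(qos_enabled=False))
print(nue_sl_requirement(qos_enabled=True))
print(nue_sl_requirement(qos_enabled=True, nue_max_num_vls=4))
```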
1465
1466       Note on multicast:
       The Nue routing engine configures multicast forwarding tables by
       utilizing a spanning tree calculation rooted at a subnet switch
       suggested by OpenSM. This spanning tree for a mcast group will try to
       use the least overloaded links (w.r.t. the ucast paths-per-link
       metric/weight) in the fabric. However, Nue routing currently does not
       guarantee deadlock-freedom for the set of multicast routes on all
       topologies, nor for the combination of deadlock-free unicast routes
       with additional multicast routes. Assuming that the calculated mcast
       routes are deadlock-free for a given topology, an admin may fix the
       latter problem by separating the VLs, e.g., using VL0-6 for unicast
       routing by specifying `--nue_max_num_vls 7' and utilizing VL7 for
       multicast.
1478
1479
1480       Routing References
1481
1482       To learn more about deadlock-free routing, see  the  article  "Deadlock
1483       Free  Message  Routing  in  Multiprocessor Interconnection Networks" by
1484       William J Dally and Charles L Seitz (1985).
1485
1486       To learn more about the up/down algorithm, see the  article  "Effective
1487       Strategy  to Compute Forwarding Tables for InfiniBand Networks" by Jose
1488       Carlos Sancho, Antonio  Robles,  and  Jose  Duato  at  the  Universidad
1489       Politecnica de Valencia.
1490
       To learn more about LASH and the flexibility behind it, the requirement
       for layers, and performance comparisons to other algorithms, see the
       following articles:
1494
       "Layered Routing in Irregular Networks", Lysne et al., IEEE
       Transactions on Parallel and Distributed Systems, Vol. 16, No. 12,
       December 2005.

       "Routing for the ASI Fabric Manager", Solheim et al., IEEE
       Communications Magazine, Vol. 44, No. 7, July 2006.

       "Layered Shortest Path (LASH) Routing in Irregular System Area
       Networks", Skeie et al., IEEE Computer Society Communication
       Architecture for Clusters 2002.
1504
1505       To  learn  more  about  the  DFSSSP and SSSP routing algorithm, see the
1506       articles:
1507       J. Domke, T. Hoefler and W. Nagel: Deadlock-Free Oblivious Routing  for
1508       Arbitrary  Topologies,  In  Proceedings  of the 25th IEEE International
1509       Parallel & Distributed Processing Symposium (IPDPS 2011)
1510       T. Hoefler, T. Schneider and A. Lumsdaine: Optimized Routing for Large-
1511       Scale  InfiniBand  Networks, In 17th Annual IEEE Symposium on High Per‐
1512       formance Interconnects (HOTI 2009)
1513
1514       To learn more about the Nue routing algorithm, see the article "Routing
1515       on  the  Dependency Graph: A New Approach to Deadlock-Free High-Perfor‐
1516       mance Routing" by J. Domke, T. Hoefler and S.  Matsuoka  (published  in
1517       HPDC'16).
1518
       Modular Routing Engine
1520
       The modular routing engine structure makes it easy to "plug in" new
       routing modules.
1523
1524       Currently, only unicast callbacks are supported. Multicast can be added
1525       later.
1526
1527       One  existing  routing module is up-down "updn", which may be activated
1528       with '-R updn' option (instead of old '-u').
1529
1530       General usage is: $ opensm -R 'module-name'
1531
1532       There is also a trivial routing module which is able to load LFT tables
1533       from a file.
1534
1535       Main features:
1536
1537        - this will load switch LFTs and/or LID matrices (min hops tables)
1538        - this will load switch LFTs according to the path entries introduced
1539          in the file
1540        - no additional checks will be performed (such as "is port connected",
1541          etc.)
1542        - in case when fabric LIDs were changed this will try to reconstruct
1543          LFTs correctly if endport GUIDs are represented in the file
1544          (in order to disable this, GUIDs may be removed from the file
1545           or zeroed)
1546
       The file format is compatible with the output of the 'ibroute' utility
       and, for the whole fabric, can be generated with the dump_lfts.sh
       script.
1549
1550       To activate file based routing module, use:
1551
1552         opensm -R file -U /path/to/lfts_file
1553
1554       If the lfts_file is not found or is in error, the default routing algo‐
1555       rithm is utilized.
1556
1557       The  ability  to dump switch lid matrices (aka min hops tables) to file
1558       and later to load these is also supported.
1559
1560       The usage is similar to unicast forwarding tables loading from  a  lfts
1561       file  (introduced  by  'file'  routing engine), but new lid matrix file
1562       name should be specified by -M or --lid_matrix_file option.  For  exam‐
1563       ple:
1564
1565         opensm -R file -M ./opensm-lid-matrix.dump
1566
1567       The  dump  file is named ´opensm-lid-matrix.dump´ and will be generated
1568       in  standard  opensm  dump  directory  (/var/log   by   default)   when
1569       OSM_LOG_ROUTING logging flag is set.
1570
       When the 'file' routing engine is activated but the lfts file is not
       specified or cannot be opened, the default lid matrix algorithm will
       be used.
1573
       There is also a switch forwarding tables dumper which generates a file
       compatible with dump_lfts.sh output. This file can be used as input for
       forwarding tables loading by the 'file' routing engine.  Either or both
       of the -U and -M options can be specified together with ´-R file´.
1578
1579

PER MODULE LOGGING CONFIGURATION

1581       To  enable per module logging, configure per_module_logging_file to the
1582       per module logging config file name in the opensm options file. To dis‐
1583       able, configure per_module_logging_file to (null) there.
1584
1585       The per module logging config file format is a set of lines with module
1586       name and logging level as follows:
1587
1588        <module name><separator><logging level>
1589
1590        <module name> is the file name including .c
1591        <separator> is either = , space, or tab
        <logging level> uses the same levels as the coarse/overall
        logging, as follows:
1594
1595        BIT    LOG LEVEL ENABLED
1596        ----   -----------------
1597        0x01 - ERROR (error messages)
1598        0x02 - INFO (basic messages, low volume)
1599        0x04 - VERBOSE (interesting stuff, moderate volume)
1600        0x08 - DEBUG (diagnostic, high volume)
1601        0x10 - FUNCS (function entry/exit, very high volume)
1602        0x20 - FRAMES (dumps all SMP and GMP frames)
1603        0x40 - ROUTING (dump FDB routing information)
1604        0x80 - SYS (syslog at LOG_INFO level in addition to OpenSM logging)
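
       As an illustration of the format and the bit values above, the
       following sketch parses a per module logging config and decodes each
       level into flag names. It is not part of OpenSM: the function names
       are hypothetical, the sample module names are only examples, and it
       assumes the level is written as a number in C-style notation (such as
       0x03).

```python
import re

# Bit values taken from the table above.
LOG_FLAGS = {
    0x01: "ERROR", 0x02: "INFO", 0x04: "VERBOSE", 0x08: "DEBUG",
    0x10: "FUNCS", 0x20: "FRAMES", 0x40: "ROUTING", 0x80: "SYS",
}


def decode_level(level):
    """Return the names of all log flags set in `level`, lowest bit first."""
    return [name for bit, name in sorted(LOG_FLAGS.items()) if level & bit]


def parse_per_module_logging(text):
    """Parse '<module name><separator><logging level>' lines.

    The separator is '=', a space, or a tab. Blank lines are skipped.
    Assumption: the level uses C-style notation, so int(x, 0) can read it.
    """
    levels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        module, level = re.split(r"[= \t]+", line, maxsplit=1)
        levels[module] = int(level, 0)
    return levels


sample = "osm_sm.c=0x03\nosm_ucast_mgr.c\t0x43\n"
for module, level in parse_per_module_logging(sample).items():
    print(module, decode_level(level))
```

       With the sample input, osm_sm.c is decoded as ERROR and INFO, and
       osm_ucast_mgr.c as ERROR, INFO, and ROUTING.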
1605
1606

FILES

1608       /etc/rdma/opensm.conf
1609              default OpenSM config file.
1610
1611
1612       /etc/rdma/ib-node-name-map
1613              default node name map file.  See ibnetdiscover for more informa‐
1614              tion on format.
1615
1616
1617       /etc/rdma/partitions.conf
1618              default partition config file
1619
1620
1621       /etc/rdma/qos-policy.conf
1622              default QOS policy config file
1623
1624
1625       /etc/rdma/prefix-routes.conf
1626              default prefix routes file
1627
1628
1629       /etc/rdma/per-module-logging.conf
1630              default per module logging config file
1631
1632
1633       /etc/rdma/torus-2QoS.conf
1634              default torus-2QoS config file
1635
1636

AUTHORS

1638       Hal Rosenstock
1639              <hal@mellanox.com>
1640
1641       Sasha Khapyorsky
1642              <sashak@voltaire.com>
1643
1644       Eitan Zahavi
1645              <eitan@mellanox.co.il>
1646
1647       Yevgeny Kliteynik
1648              <kliteyn@mellanox.co.il>
1649
1650       Thomas Sodring
1651              <tsodring@simula.no>
1652
1653       Ira Weiny
1654              <weiny2@llnl.gov>
1655
1656       Dale Purdy
1657              <purdy@sgi.com>
1658
1659

SEE ALSO

1661       torus-2QoS(8), torus-2QoS.conf(5).
1662
1663
1664
1665OpenIB                           Sept 15, 2014                       OPENSM(8)