fi_opx(7)                      Libfabric v1.18.1                     fi_opx(7)

NAME

       fi_opx - The Omni-Path Express Fabric Provider

OVERVIEW

       The opx provider is a native libfabric provider suitable for use with
       Omni-Path fabrics.  OPX is designed for scalability and performance
       when running libfabric-enabled message layers.  Building OPX requires
       three additional external development libraries: libuuid, libnuma, and
       the Linux kernel headers.

SUPPORTED FEATURES

       The OPX provider supports most features defined for the libfabric API.

       Key features include:

       Endpoint types
              The Omni-Path HFI hardware is connectionless and reliable.  The
              OPX provider supports only the FI_EP_RDM endpoint type.

       Capabilities
              Supported capabilities include FI_MSG, FI_RMA, FI_TAGGED,
              FI_ATOMIC, FI_NAMED_RX_CTX, FI_SOURCE, FI_SEND, FI_RECV,
              FI_MULTI_RECV, and FI_DIRECTED_RECV.

       Notes on the FI_DIRECTED_RECV capability: The immediate data sent with
       the “senddata” call to support FI_DIRECTED_RECV for OPX must be exactly
       4 bytes.  OPX uses this data to completely identify the source address,
       up to an exascale-level number of ranks, for tag matching on the
       receive side, and it can be managed within the MU packet.  Therefore
       the domain attribute “cq_data_size” is set to 4, which is the OFI
       standard minimum.

       Modes  Two modes are defined: FI_CONTEXT2 and FI_ASYNC_IOV.  The OPX
              provider requires FI_CONTEXT2.

       Additional features
              Supported additional features include FABRIC_DIRECT, scalable
              endpoints, and counters.

       Progress
              FI_PROGRESS_MANUAL and FI_PROGRESS_AUTO are both supported.  For
              best performance, use FI_PROGRESS_MANUAL when possible.
              FI_PROGRESS_AUTO spawns one thread per CQ.

       Address vector
              FI_AV_MAP and FI_AV_TABLE are both supported.  FI_AV_MAP is the
              default.

       Memory registration modes
              Only FI_MR_SCALABLE is supported.

UNSUPPORTED FEATURES

       Endpoint types
              Unsupported endpoint types include FI_EP_DGRAM and FI_EP_MSG.

       Capabilities
              The OPX provider does not support the FI_RMA_EVENT and FI_TRIGGER
              capabilities.

LIMITATIONS

       OPX supports the following MPI versions:

       -  Intel MPI from Parallel Studio 2020, update 4.
       -  Intel MPI from OneAPI 2021, update 3.
       -  Open MPI 4.1.2a1 (older versions of Open MPI will not work).
       -  MPICH 3.4.2 and later.

       Usage:

       When using Open MPI 4.1.x, disable the UCX and openib transports.  OPX
       is not compatible with the Open MPI 4.1.x PML/BTL.

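       One way to apply the Open MPI advice above is through environment
       variables set before launching.  The fragment below is a sketch under
       assumptions this page does not state: it relies on Open MPI's
       OMPI_MCA_<name> environment-variable convention with the "^" exclusion
       prefix, and on libfabric's FI_PROVIDER variable.  Check the component
       names against your own Open MPI build.

```shell
# Hypothetical launcher fragment; adjust component names to your build.
export OMPI_MCA_pml='^ucx'      # exclude the UCX PML
export OMPI_MCA_btl='^openib'   # exclude the openib BTL
export FI_PROVIDER=opx          # restrict libfabric to the opx provider
echo "pml=${OMPI_MCA_pml} btl=${OMPI_MCA_btl} provider=${FI_PROVIDER}"
```

       The same settings can usually be passed on the launcher command line
       instead; the environment form is shown only because it is easy to place
       in a job script.
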

CONFIGURATION OPTIONS

       OPX_AV OPX supports setting the AV mode to use in a build.  Three
              settings are supported:

              -  table
              -  map
              -  runtime

       Using table or map restricts OPX to FI_AV_TABLE or FI_AV_MAP,
       respectively.  Using runtime allows OPX to use either AV mode,
       depending on what the application requests.  Specifying map or table
       may, however, lead to a slight performance improvement depending on the
       application.

       To change OPX_AV, add OPX_AV=table, OPX_AV=map, or OPX_AV=runtime to
       the configure command.  For example, to create a new build with
       OPX_AV=table:

              OPX_AV=table ./configure
              make install

       OPX_AV cannot be changed after it is set.  If OPX_AV is not set at
       configure time, the default value is runtime.

RUNTIME PARAMETERS

       FI_OPX_UUID
              OPX requires a unique ID for each job.  For all processes in a
              job to communicate with each other, they must use the same UUID.
              This variable can be set with FI_OPX_UUID=${RANDOM}.  The default
              UUID is 00112233445566778899aabbccddeeff.

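       As a minimal sketch of the advice above, a job script can pick one
       value before launching and export it so that every process inherits the
       same ID.  The fallback to the shell's process ID is an assumption added
       for shells that lack $RANDOM, and how the variable reaches remote ranks
       depends on your launcher.

```shell
# Choose the job-wide ID once, on the launch node, before starting any rank.
# An ID already present in the environment is kept as-is.
# $RANDOM is a bash/ksh feature; the PID fallback is illustrative only.
FI_OPX_UUID=${FI_OPX_UUID:-${RANDOM:-$$}}
export FI_OPX_UUID
echo "FI_OPX_UUID=${FI_OPX_UUID}"
```
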
       FI_OPX_RELIABILITY_SERVICE_USEC_MAX
              Controls how frequently the reliability/replay function issues
              PING requests to a remote connection.  Reducing this value may
              improve performance at the expense of increased traffic on the
              OPX fabric.  The default is 500.

       FI_OPX_RELIABILITY_SERVICE_PRE_ACK_RATE
              Controls how frequently a receiving rank sends ACKs for packets
              it has received without being prompted by a PING request.  A
              non-zero value N tells the receiving rank to send an ACK for the
              last N packets every Nth packet.  Using this setting together
              with an increased value of FI_OPX_RELIABILITY_SERVICE_USEC_MAX
              may improve performance.

              Valid values are 0 (disabled) and powers of 2 in the range
              1-32,768, inclusive.

              The default is 64.

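       The value rule for FI_OPX_RELIABILITY_SERVICE_PRE_ACK_RATE (0, or a
       power of 2 up to 32,768) can be checked in a job script before the
       variable is exported.  The helper name is_valid_pre_ack_rate is
       hypothetical and is not part of OPX; this is only a sketch of the rule
       as stated above.

```shell
# Return 0 if $1 is 0 or a power of 2 in the range 1..32768, else non-zero.
# is_valid_pre_ack_rate is a hypothetical helper, not an OPX interface.
is_valid_pre_ack_rate() {
    case "$1" in
        ''|*[!0-9]*|0?*) return 1 ;;   # empty, non-numeric, or leading zero
    esac
    [ "$1" -eq 0 ] && return 0         # 0 disables pre-ACKs
    [ "$1" -le 32768 ] || return 1
    # A power of 2 has exactly one bit set, so n & (n - 1) is 0.
    [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

rate=128
if is_valid_pre_ack_rate "$rate"; then
    export FI_OPX_RELIABILITY_SERVICE_PRE_ACK_RATE="$rate"
fi
```
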
       FI_OPX_PROG_AFFINITY
              Sets the affinity to be used for any progress threads.  Set as a
              colon-separated triplet, start:end:stride, where stride controls
              the interval between selected cores.  For example, 1:5:2 makes
              cores 1, 3, and 5 valid cores for progress threads.  The default
              is 1:4:1.

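       To see which cores a start:end:stride triplet selects, the expansion
       can be sketched in a few lines of shell.  The helper name
       expand_affinity is hypothetical; it follows the inclusive-end behavior
       shown in the 1:5:2 example above and does not call into OPX.

```shell
# Print the cores selected by a start:end:stride triplet, space-separated.
# expand_affinity is a hypothetical helper, not an OPX interface.
expand_affinity() {
    old_ifs=$IFS
    IFS=:
    set -- $1                      # $1=start, $2=end, $3=stride
    IFS=$old_ifs
    [ "$3" -gt 0 ] || return 1     # a zero stride would never terminate
    core=$1
    cores=
    while [ "$core" -le "$2" ]; do
        cores="${cores}${cores:+ }${core}"
        core=$(( core + $3 ))
    done
    echo "$cores"
}

expand_affinity 1:5:2   # the example triplet from the text
expand_affinity 1:4:1   # the default triplet
```
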
       FI_OPX_AUTO_PROGRESS_INTERVAL_USEC
              Controls the time (in usecs) between polls for auto progress
              threads.  The default is 1.

       FI_OPX_HFI_SELECT
              Controls how OPX chooses which HFI to use when opening a
              context.  Has two forms:

              -  <hfi-unit>
                 Force the OPX provider to use hfi-unit.
              -  <selector1>[,<selector2>[,...,<selectorN>]]
                 Select the HFI based on the first matching selector.

       Where selector is one of the following forms:

       -  default to use the default logic
       -  fixed:<hfi-unit> to fix to one hfi-unit
       -  <selector-type>:<hfi-unit>:<selector-data>

       The above fields have the following meaning:

       -  selector-type: The selector criteria the caller opening the context
          is evaluated against.
       -  hfi-unit: The HFI to use if the caller matches the selector.
       -  selector-data: Data the caller must match (e.g. NUMA node ID).

       Where selector-type is one of the following:

       -  numa: True when the caller is local to the NUMA node ID given by
          selector-data.
       -  core: True when the caller is local to the CPU core given by
          selector-data.

       And selector-data is one of the following:

       -  value: The specific value to match.
       -  <range-start>-<range-end>: Matches any value in that range.

       In the second form, when opening a context, OPX uses the hfi-unit of
       the first matching selector.  Selectors are evaluated left to right.
       OPX returns an error if the caller does not match any selector.

       In either form, it is an error if the specified or selected HFI is not
       in the Active state.  In this case, OPX returns an error and execution
       does not continue.

       With this option, it is possible to cause OPX to try to open more
       contexts on an HFI than there are free contexts on that HFI.  In this
       case, one or more of the context-opening calls will fail and OPX will
       return an error.  For the second form, because the selected HFI depends
       on properties of the caller, deterministic HFI selection requires
       deterministic caller properties.  For example, with the numa selector,
       if the caller can migrate between NUMA domains, then HFI selection will
       not be deterministic.

       The first valid selector in a selector list always determines the logic
       used.  For example, default and fixed match all callers, so if either
       is at the beginning of a selector list, only fixed or default is used
       regardless of any selectors that follow.

       Examples:

       -  FI_OPX_HFI_SELECT=0
          All callers open contexts on HFI 0.
       -  FI_OPX_HFI_SELECT=1
          All callers open contexts on HFI 1.
       -  FI_OPX_HFI_SELECT=numa:0:0,numa:1:1,numa:0:2,numa:1:3
          Callers local to NUMA nodes 0 and 2 use HFI 0; callers local to NUMA
          nodes 1 and 3 use HFI 1.
       -  FI_OPX_HFI_SELECT=numa:0:0-3,default
          Callers local to NUMA nodes 0 through 3 (inclusive) use HFI 0, and
          all others use the default selection logic.
       -  FI_OPX_HFI_SELECT=core:1:0,fixed:0
          Callers local to CPU core 0 use HFI 1, and all others use HFI 0.
       -  FI_OPX_HFI_SELECT=default,core:1:0
          All callers use the default HFI selection logic.

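       The first-match rule for FI_OPX_HFI_SELECT can be explored offline with
       a small shell model.  This is a sketch of the behavior as documented
       above, not OPX code: the helper name select_hfi and its arguments (the
       caller's NUMA node and CPU core) are hypothetical, and it does not
       check HFI state or free contexts.

```shell
# Usage: select_hfi SELECTORS CALLER_NUMA_NODE CALLER_CPU_CORE
# Prints the chosen hfi-unit, or "default" for the default logic.
# Returns non-zero when no selector matches (OPX would report an error).
select_hfi() {
    caller_numa=$2
    caller_core=$3
    old_ifs=$IFS
    IFS=,
    set -- $1                      # one positional parameter per selector
    IFS=$old_ifs
    for sel in "$@"; do            # evaluated left to right
        case "$sel" in
            default)     echo default; return 0 ;;
            fixed:*)     echo "${sel#fixed:}"; return 0 ;;
            numa:*:*)    have=$caller_numa ;;
            core:*:*)    have=$caller_core ;;
            ''|*[!0-9]*) return 1 ;;                # unknown selector form
            *)           echo "$sel"; return 0 ;;   # first form: <hfi-unit>
        esac
        rest=${sel#*:}             # <hfi-unit>:<selector-data>
        unit=${rest%%:*}
        data=${rest#*:}
        lo=${data%%-*}             # a single value acts as a one-value range
        hi=${data##*-}
        if [ "$have" -ge "$lo" ] && [ "$have" -le "$hi" ]; then
            echo "$unit"
            return 0
        fi
    done
    return 1
}

# A caller on NUMA node 2, core 7, against two of the example settings.
select_hfi numa:0:0-3,default 2 7
select_hfi core:1:0,fixed:0 2 7
```

       Running the model against a planned selector list before a job starts
       is a cheap way to confirm that an early default or fixed entry is not
       hiding later selectors.
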

SEE ALSO

       fabric(7), fi_provider(7), fi_getinfo(3)

AUTHORS

       OpenFabrics.

Libfabric Programmer’s Manual     2023-02-15                         fi_opx(7)