fi_efa(7)                      Libfabric v1.18.1                     fi_efa(7)

NAME

       fi_efa - The Amazon Elastic Fabric Adapter (EFA) Provider

OVERVIEW

       The EFA provider supports the Elastic Fabric Adapter (EFA) device on
       Amazon EC2.  EFA provides reliable and unreliable datagram send/receive
       with direct hardware access from userspace (OS bypass).

SUPPORTED FEATURES

       The following features are supported:

       Endpoint types
              The provider supports the FI_EP_DGRAM endpoint type, and the
              FI_EP_RDM endpoint type on a new Scalable (unordered) Reliable
              Datagram protocol (SRD).  SRD provides support for reliable
              datagrams and more complete error handling than typically seen
              with other Reliable Datagram (RD) implementations.  The EFA
              provider performs segmentation and reassembly of out-of-order
              packets to provide send-after-send ordering guarantees to
              applications via its FI_EP_RDM endpoint.

       RDM Endpoint capabilities
              The following data transfer interfaces are supported via the
              FI_EP_RDM endpoint: FI_MSG, FI_TAGGED, and FI_RMA.  The FI_SEND,
              FI_RECV, FI_DIRECTED_RECV, FI_MULTI_RECV, and FI_SOURCE
              capabilities are supported.  The endpoint provides
              send-after-send guarantees for data operations.  The FI_EP_RDM
              endpoint does not have a maximum message size.

       DGRAM Endpoint capabilities
              The DGRAM endpoint supports only the FI_MSG capability, with a
              maximum message size equal to the MTU of the underlying
              hardware (approximately 8 KiB).

       Address vectors
              The provider supports the FI_AV_TABLE and FI_AV_MAP address
              vector types.  FI_EVENT is unsupported.

       Completion events
              The provider supports FI_CQ_FORMAT_CONTEXT, FI_CQ_FORMAT_MSG,
              and FI_CQ_FORMAT_DATA.  FI_CQ_FORMAT_TAGGED is supported on the
              RDM endpoint.  Wait objects are not currently supported.

       Modes  The provider requires the use of FI_MSG_PREFIX when running over
              the DGRAM endpoint, and requires FI_MR_LOCAL for all memory
              registrations on the DGRAM endpoint.

       Memory registration modes
              The RDM endpoint does not require memory registration for send
              and receive operations, i.e. it does not require FI_MR_LOCAL.
              Applications may specify FI_MR_LOCAL in the MR mode flags in
              order to use descriptors provided by the application.  The
              FI_EP_DGRAM endpoint only supports FI_MR_LOCAL.

       Progress
              RDM and DGRAM endpoints support FI_PROGRESS_MANUAL.  EFA
              erroneously claims support for FI_PROGRESS_AUTO, despite not
              properly supporting automatic progress.  Unfortunately, some
              Libfabric consumers also ask for FI_PROGRESS_AUTO when they only
              require FI_PROGRESS_MANUAL, and fixing this bug would break
              those applications.  This will be fixed in a future version of
              the EFA provider by adding proper support for FI_PROGRESS_AUTO.

       Threading
              The RDM endpoint supports FI_THREAD_SAFE.  The DGRAM endpoint
              supports FI_THREAD_DOMAIN, i.e. the provider is not thread safe
              when using the DGRAM endpoint.

LIMITATIONS

       The DGRAM endpoint does not support FI_ATOMIC interfaces.  For RMA
       operations, completion events for RMA targets (FI_RMA_EVENT) are not
       supported.  The DGRAM endpoint does not fully protect against resource
       overruns, so resource management is disabled for this endpoint
       (FI_RM_DISABLED).

       No support for selective completions.

       No support for counters for the DGRAM endpoint.

       No support for inject.

       When using FI_HMEM for AWS Neuron or Habana SynapseAI buffers, the
       provider requires peer-to-peer transaction support between the EFA
       device and the FI_HMEM device.  Therefore, the FI_HMEM_P2P_DISABLED
       option is not supported by the EFA provider for AWS Neuron or Habana
       SynapseAI.

PROVIDER SPECIFIC ENDPOINT LEVEL OPTION

       FI_OPT_EFA_RNR_RETRY
              Defines the number of RNR (receiver not ready) retries.  The
              application can use it to reset the RNR retry counter via a
              call to fi_setopt.  Note that this option must be set before
              the endpoint is enabled; otherwise, the call will fail.  Also
              note that this option only applies to the RDM endpoint.

       FI_OPT_EFA_EMULATED_READ, FI_OPT_EFA_EMULATED_WRITE,
       FI_OPT_EFA_EMULATED_ATOMICS - bool
              These options only apply to the fi_getopt() call.  They are used
              to query the EFA provider to determine if the endpoint is
              emulating Read, Write, and Atomic operations (return value is
              true), or if these operations are assisted by hardware support
              (return value is false).

       FI_OPT_EFA_USE_DEVICE_RDMA - bool
              Only available if the application selects a libfabric API
              version >= 1.18.  This option allows an application to change
              libfabric’s behavior with respect to RDMA transfers.  Note that
              there is also an environment variable, FI_EFA_USE_DEVICE_RDMA,
              which the user may set as well.  If the environment variable and
              the argument provided with this option are in conflict, then
              fi_setopt will return -FI_EINVAL, and the environment variable
              will be respected.  If the hardware does not support RDMA and
              the argument is true, then fi_setopt will return -FI_EOPNOTSUPP.
              If the application uses API version < 1.18, the argument is
              ignored and fi_setopt returns -FI_ENOPROTOOPT.  The default
              behavior for RDMA transfers depends on the API version.  For
              API >= 1.18, RDMA is enabled by default on any hardware which
              supports it.  For API < 1.18, RDMA is enabled by default only on
              certain newer hardware revisions.
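
       The FI_OPT_EFA_USE_DEVICE_RDMA return values described above can be
       summarized as a small decision function.  This is only a sketch, not
       provider code: the function name, the enum, and the order in which the
       checks are applied are assumptions, and the <errno.h> constants stand
       in for FI_ENOPROTOOPT, FI_EINVAL, and FI_EOPNOTSUPP.

```c
#include <errno.h>
#include <stdbool.h>

/* State of the FI_EFA_USE_DEVICE_RDMA environment variable (hypothetical). */
enum env_rdma { ENV_RDMA_UNSET, ENV_RDMA_OFF, ENV_RDMA_ON };

/*
 * Hypothetical model of the documented fi_setopt() outcomes for
 * FI_OPT_EFA_USE_DEVICE_RDMA. Returns 0 when the request would be accepted.
 * The precedence of the three checks is an assumption.
 */
static int model_set_use_device_rdma(int api_major, int api_minor,
                                     enum env_rdma env, bool hw_has_rdma,
                                     bool requested)
{
    /* API version < 1.18: the argument is ignored. */
    if (api_major < 1 || (api_major == 1 && api_minor < 18))
        return -ENOPROTOOPT;

    /* The environment variable conflicts with the argument and wins. */
    if ((env == ENV_RDMA_ON && !requested) ||
        (env == ENV_RDMA_OFF && requested))
        return -EINVAL;

    /* RDMA requested on hardware that does not support it. */
    if (requested && !hw_has_rdma)
        return -EOPNOTSUPP;

    return 0;
}
```

       For example, under these assumptions a request for true with API 1.18,
       no environment variable, and no hardware RDMA support maps to the
       -FI_EOPNOTSUPP case.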

       FI_OPT_EFA_SENDRECV_IN_ORDER_ALIGNED_128_BYTES - bool
              Forces the endpoint to use in-order send/recv operations for
              each 128-byte aligned block.  Enabling the option guarantees
              that data inside each 128-byte aligned block is sent and
              received in order, and that data is delivered to the receive
              buffer only once.  If the endpoint is not able to support this
              feature, the call to fi_setopt() returns -FI_EOPNOTSUPP.

       FI_OPT_EFA_WRITE_IN_ORDER_ALIGNED_128_BYTES - bool
              Sets the endpoint to use in-order RDMA write operations for
              each 128-byte aligned block.  Enabling the option guarantees
              that data inside each 128-byte aligned block is written in
              order, and that data is delivered to the target buffer only
              once.  If the endpoint is not able to support this feature, the
              call to fi_setopt() returns -FI_EOPNOTSUPP.
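
       The two in-order options above give their guarantee per 128-byte
       aligned block.  The helpers below sketch how an application might work
       out which blocks a buffer touches.  The names are hypothetical, and
       the assumption that alignment is measured on the buffer address is not
       stated in this page.

```c
#include <stddef.h>
#include <stdint.h>

#define EFA_ORDER_BLOCK 128u  /* block size named by the options above */

/* Index of the 128-byte aligned block that contains addr. */
static uintptr_t block_index(uintptr_t addr)
{
    return addr / EFA_ORDER_BLOCK;
}

/* Number of 128-byte aligned blocks touched by [addr, addr + len). */
static size_t blocks_spanned(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    return (size_t)(block_index(addr + len - 1) - block_index(addr)) + 1;
}
```

       A 128-byte buffer that starts at an aligned address falls in one
       block, while the same buffer starting 64 bytes later straddles two.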

RUNTIME PARAMETERS

       FI_EFA_TX_SIZE
              Maximum number of transmit operations before the provider
              returns -FI_EAGAIN.  For the RDM endpoint only, this parameter
              will cause transmit operations to be queued when this value is
              set higher than the default and the transmit queue is full.

       FI_EFA_RX_SIZE
              Maximum number of receive operations before the provider returns
              -FI_EAGAIN.

       FI_EFA_TX_IOV_LIMIT
              Maximum number of IOVs for a transmit operation.

       FI_EFA_RX_IOV_LIMIT
              Maximum number of IOVs for a receive operation.
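
       Runtime parameters are read from the process environment.  Besides
       exporting them in the shell, an application can set them itself, as in
       the sketch below.  The function name and the values are placeholders
       rather than recommendations, and it is assumed that the variables must
       be set before the provider is initialized (for example, before the
       first call to fi_getinfo).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#endif
#include <stdlib.h>

/*
 * Set two EFA runtime parameters from inside the process.
 * The values are arbitrary illustrations. Returns 0 on success, -1 on error.
 */
static int set_efa_runtime_params(void)
{
    /* Third argument 1 = overwrite a value already present. */
    if (setenv("FI_EFA_TX_SIZE", "1024", 1) != 0)
        return -1;
    if (setenv("FI_EFA_RX_SIZE", "1024", 1) != 0)
        return -1;
    return 0;
}
```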

RUNTIME PARAMETERS SPECIFIC TO RDM ENDPOINT

       These OFI runtime parameters apply only to the RDM endpoint.

       FI_EFA_RX_WINDOW_SIZE
              Maximum number of MTU-sized messages that can be in flight from
              any single endpoint as part of a long message data transfer.

       FI_EFA_TX_QUEUE_SIZE
              Depth of the transmit queue opened with the NIC.  This may not
              be set to a value greater than what the NIC supports.

       FI_EFA_RECVWIN_SIZE
              Size of the out-of-order reorder buffer (in messages).  Messages
              received outside this window will result in an error.

       FI_EFA_CQ_SIZE
              Size of any CQ created, in number of entries.

       FI_EFA_MR_CACHE_ENABLE
              Enables using the MR cache and in-line registration instead of a
              bounce buffer for IOVs larger than max_memcpy_size.  Defaults
              to true.  When disabled, only a bounce buffer is used.

       FI_EFA_MR_MAX_CACHED_COUNT
              Sets the maximum number of memory registrations that can be
              cached at any time.

       FI_EFA_MR_MAX_CACHED_SIZE
              Sets the maximum amount of memory that cached memory
              registrations can hold onto at any time.

       FI_EFA_MAX_MEMCPY_SIZE
              Threshold size for switching between a memory copy into a
              pre-registered bounce buffer and memory registration of the
              user buffer.

       FI_EFA_MTU_SIZE
              Overrides the default MTU size of the device.

       FI_EFA_RX_COPY_UNEXP
              Enables the use of a separate pool of bounce buffers to copy
              unexpected messages out of the pre-posted receive buffers.

       FI_EFA_RX_COPY_OOO
              Enables the use of a separate pool of bounce buffers to copy
              out-of-order RTS packets out of the pre-posted receive buffers.

       FI_EFA_MAX_TIMEOUT
              Maximum timeout (us) for backoff to a peer after a receiver not
              ready error.

       FI_EFA_TIMEOUT_INTERVAL
              Time interval (us) for the base timeout to use for exponential
              backoff to a peer after a receiver not ready error.
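
       FI_EFA_TIMEOUT_INTERVAL and FI_EFA_MAX_TIMEOUT together describe a
       capped exponential backoff.  The sketch below shows one way such a
       schedule could be computed.  The function name, the doubling factor,
       and the absence of random jitter are assumptions; the provider's
       actual schedule may differ.

```c
#include <stdint.h>

/*
 * Hypothetical capped exponential backoff.
 * base_us models FI_EFA_TIMEOUT_INTERVAL, max_us models FI_EFA_MAX_TIMEOUT,
 * rnr_count is the number of consecutive receiver-not-ready errors so far.
 * The wait doubles per error and never exceeds max_us.
 */
static uint64_t rnr_backoff_us(uint64_t base_us, uint64_t max_us,
                               unsigned rnr_count)
{
    uint64_t wait = base_us;

    for (unsigned i = 0; i < rnr_count && wait < max_us; i++) {
        if (wait > max_us / 2) {   /* doubling would pass the cap */
            wait = max_us;
            break;
        }
        wait *= 2;
    }
    return wait < max_us ? wait : max_us;
}
```

       With a base of 100 us and a cap of 1000 us, this model yields waits of
       100, 200, 400, and 800 us before settling at 1000 us.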

       FI_EFA_ENABLE_SHM_TRANSFER
              Enables the SHM provider to handle communication between all
              intra-node processes.  SHM transfer will be disabled when
              ptrace protection is turned on.  You can turn ptrace protection
              off to enable SHM transfer.

       FI_EFA_SHM_AV_SIZE
              Defines the maximum number of entries in the SHM provider’s
              address vector.

       FI_EFA_SHM_MAX_MEDIUM_SIZE
              Defines the switch point between small/medium messages and large
              messages.  Messages larger than this switch point will be
              transferred with the large message protocol.  NOTE: This
              parameter is now deprecated.

       FI_EFA_INTER_MAX_MEDIUM_MESSAGE_SIZE
              The maximum size for inter-EFA messages to be sent using the
              medium message protocol.  Messages which fit in one packet will
              be sent as eager messages.  Messages whose sizes are smaller
              than this value will be sent using the medium message protocol.
              Other messages will be sent using the CTS-based long message
              protocol.
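
       The selection rule described for FI_EFA_INTER_MAX_MEDIUM_MESSAGE_SIZE
       can be modeled as follows.  This is a simplified sketch with
       hypothetical names: max_eager_size stands for the payload that fits in
       one packet, and the read-based protocols controlled by other
       parameters are deliberately left out.

```c
#include <stddef.h>

enum msg_protocol { PROTO_EAGER, PROTO_MEDIUM, PROTO_LONG_CTS };

/*
 * Hypothetical model of the eager/medium/long selection.
 * max_medium_size models FI_EFA_INTER_MAX_MEDIUM_MESSAGE_SIZE.
 */
static enum msg_protocol pick_protocol(size_t msg_size,
                                       size_t max_eager_size,
                                       size_t max_medium_size)
{
    if (msg_size <= max_eager_size)
        return PROTO_EAGER;      /* fits in a single packet */
    if (msg_size < max_medium_size)
        return PROTO_MEDIUM;     /* smaller than the medium threshold */
    return PROTO_LONG_CTS;       /* everything else */
}
```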

       FI_EFA_FORK_SAFE
              Enables fork() support.  This may have a small performance
              impact and should only be set when required.  Applications that
              need to register regions backed by huge pages and also require
              fork support are not supported.

       FI_EFA_RUNT_SIZE
              The maximum number of bytes that will be eagerly sent by
              in-flight messages using the runting read message protocol
              (Default: 307200).

       FI_EFA_SET_CUDA_SYNC_MEMOPS
              Sets CU_POINTER_ATTRIBUTE_SYNC_MEMOPS for CUDA pointers
              (Default: 1).

       FI_EFA_INTER_MIN_READ_MESSAGE_SIZE
              The minimum message size in bytes for the inter-EFA read
              message protocol.  If the instance supports RDMA read, messages
              whose size is larger than this value will be sent using the
              read message protocol (Default: 1048576).

       FI_EFA_INTER_MIN_READ_WRITE_SIZE
              The minimum message size for an inter-EFA write to use the read
              write protocol.  If the firmware supports RDMA read and
              FI_EFA_USE_DEVICE_RDMA is 1, write requests whose size is
              larger than this value will use the read write protocol
              (Default: 65536).

       FI_EFA_USE_DEVICE_RDMA
              Specifies whether to require or ignore the RDMA features of the
              EFA device.

              - When set to 1/true/yes/on, all RDMA features of the EFA
                device are used.  If the EFA device does not support RDMA,
                the user’s application is aborted and a warning message is
                printed.

              - When set to 0/false/no/off, libfabric will emulate all fi_rma
                operations instead of offloading them to the EFA network
                device.  Libfabric will not use device RDMA to implement
                send/receive operations.

              - If not set, RDMA operations will occur when available, based
                on the RDMA device ID/version.
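
       The three cases listed for FI_EFA_USE_DEVICE_RDMA can be mirrored by a
       small parser.  The sketch is illustrative only: the names are
       hypothetical, the exact lowercase matching is an assumption, and the
       RDMA_INVALID result for unrecognized values is not something this page
       defines.

```c
#include <stddef.h>
#include <string.h>

enum rdma_pref { RDMA_DEFAULT, RDMA_OFF, RDMA_ON, RDMA_INVALID };

/*
 * Map the value of FI_EFA_USE_DEVICE_RDMA to a preference.
 * val is the result of getenv(); NULL means the variable is not set.
 */
static enum rdma_pref parse_use_device_rdma(const char *val)
{
    static const char *const on[]  = { "1", "true", "yes", "on" };
    static const char *const off[] = { "0", "false", "no", "off" };

    if (val == NULL)
        return RDMA_DEFAULT;   /* decided by device ID/version */

    for (size_t i = 0; i < sizeof(on) / sizeof(on[0]); i++) {
        if (strcmp(val, on[i]) == 0)
            return RDMA_ON;
        if (strcmp(val, off[i]) == 0)
            return RDMA_OFF;
    }
    return RDMA_INVALID;       /* assumption: unrecognized spelling */
}
```

       A caller would typically pass getenv("FI_EFA_USE_DEVICE_RDMA")
       directly to this function.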

SEE ALSO

       fabric(7), fi_provider(7), fi_getinfo(3)

AUTHORS

       OpenFabrics.

Libfabric Programmer’s Manual     2023-03-31                         fi_efa(7)