fi_efa(7)                      Libfabric v1.17.0                     fi_efa(7)

NAME

       fi_efa - The Amazon Elastic Fabric Adapter (EFA) Provider

OVERVIEW

       The EFA provider supports the Elastic Fabric Adapter (EFA) device on
       Amazon EC2.  EFA provides reliable and unreliable datagram send/receive
       with direct hardware access from userspace (OS bypass).

SUPPORTED FEATURES

       The following features are supported:

       Endpoint types
              The provider supports the FI_EP_DGRAM endpoint type, and the
              FI_EP_RDM endpoint type on top of a new Scalable (unordered)
              Reliable Datagram (SRD) protocol.  SRD provides support for
              reliable datagrams and more complete error handling than
              typically seen with other Reliable Datagram (RD)
              implementations.  The EFA provider performs segmentation and
              reassembly of out-of-order packets to provide send-after-send
              ordering guarantees to applications via its FI_EP_RDM endpoint.

       RDM Endpoint capabilities
              The following data transfer interfaces are supported via the
              FI_EP_RDM endpoint: FI_MSG, FI_TAGGED, and FI_RMA.  FI_SEND,
              FI_RECV, FI_DIRECTED_RECV, FI_MULTI_RECV, and FI_SOURCE
              capabilities are supported.  The endpoint provides
              send-after-send guarantees for data operations.  The FI_EP_RDM
              endpoint does not have a maximum message size.

       DGRAM Endpoint capabilities
              The DGRAM endpoint supports only the FI_MSG capability, with a
              maximum message size equal to the MTU of the underlying
              hardware (approximately 8 KiB).

       Address vectors
              The provider supports FI_AV_TABLE and FI_AV_MAP address vector
              types.  FI_EVENT is unsupported.

       Completion events
              The provider supports FI_CQ_FORMAT_CONTEXT, FI_CQ_FORMAT_MSG,
              and FI_CQ_FORMAT_DATA.  FI_CQ_FORMAT_TAGGED is supported on the
              RDM endpoint.  Wait objects are not currently supported.

       Modes  The provider requires the use of FI_MSG_PREFIX when running over
              the DGRAM endpoint, and requires FI_MR_LOCAL for all memory
              registrations on the DGRAM endpoint.

       Memory registration modes
              The RDM endpoint does not require memory registration for send
              and receive operations, i.e. it does not require FI_MR_LOCAL.
              Applications may specify FI_MR_LOCAL in the MR mode flags in
              order to use descriptors provided by the application.  The
              FI_EP_DGRAM endpoint only supports FI_MR_LOCAL.

       Progress
              RDM and DGRAM endpoints support FI_PROGRESS_MANUAL.  EFA
              erroneously claims support for FI_PROGRESS_AUTO, despite not
              properly supporting automatic progress.  Unfortunately, some
              Libfabric consumers also ask for FI_PROGRESS_AUTO when they only
              require FI_PROGRESS_MANUAL, and fixing this bug would break
              those applications.  This will be fixed in a future version of
              the EFA provider by adding proper support for FI_PROGRESS_AUTO.

       Threading
              The RDM endpoint supports FI_THREAD_SAFE; the DGRAM endpoint
              supports FI_THREAD_DOMAIN, i.e. the provider is not thread safe
              when using the DGRAM endpoint.

LIMITATIONS

       The DGRAM endpoint does not support FI_ATOMIC interfaces.  For RMA
       operations, completion events for RMA targets (FI_RMA_EVENT) are not
       supported.  The DGRAM endpoint does not fully protect against resource
       overruns, so resource management is disabled for this endpoint
       (FI_RM_DISABLED).

       No support for selective completions.

       No support for counters for the DGRAM endpoint.

       No support for inject.

       When using FI_HMEM for either CUDA or Neuron buffers, the provider
       requires peer to peer transaction support between the EFA and the
       FI_HMEM device.  Therefore, the FI_HMEM_P2P_DISABLED option is not
       supported by the EFA provider.

PROVIDER SPECIFIC ENDPOINT LEVEL OPTION

       FI_OPT_EFA_RNR_RETRY
              Defines the number of RNR (receiver not ready) retries.  The
              application can use it to reset the RNR retry counter via a
              call to fi_setopt.  Note that this option must be set before
              the endpoint is enabled; otherwise, the call will fail.  Also
              note that this option applies only to the RDM endpoint.

RUNTIME PARAMETERS

       FI_EFA_TX_SIZE
              Maximum number of transmit operations before the provider
              returns -FI_EAGAIN.  For the RDM endpoint only, this parameter
              will cause transmit operations to be queued when this value is
              set higher than the default and the transmit queue is full.

       FI_EFA_RX_SIZE
              Maximum number of receive operations before the provider returns
              -FI_EAGAIN.

       FI_EFA_TX_IOV_LIMIT
              Maximum number of IOVs for a transmit operation.

       FI_EFA_RX_IOV_LIMIT
              Maximum number of IOVs for a receive operation.
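       As a minimal sketch, and assuming the FI_EFA_* runtime parameters are
       read from the process environment when the provider initializes, an
       application could export them before its first libfabric call.  The
       helper name set_efa_param is hypothetical and not part of libfabric.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libfabric API): export one EFA runtime
 * parameter as an environment variable.  Assumes the provider reads
 * FI_EFA_* variables at initialization, so this has to run before the
 * first libfabric call.  Returns 0 on success, -1 on failure. */
int set_efa_param(const char *name, long value)
{
    char buf[32];

    /* Accept only names in the FI_EFA_ namespace. */
    if (name == NULL || strncmp(name, "FI_EFA_", 7) != 0)
        return -1;
    /* The parameters sketched here are counts or sizes, so reject
     * negative values. */
    if (value < 0)
        return -1;

    snprintf(buf, sizeof(buf), "%ld", value);
    /* Third argument 1: overwrite any value already in the environment. */
    return setenv(name, buf, 1) == 0 ? 0 : -1;
}
```

       Under that assumption, set_efa_param("FI_EFA_TX_SIZE", 512) would need
       to run before fi_getinfo is called; the value 512 is illustrative only.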

RUNTIME PARAMETERS SPECIFIC TO RDM ENDPOINT

       These OFI runtime parameters apply only to the RDM endpoint.

       FI_EFA_RX_WINDOW_SIZE
              Maximum number of MTU-sized messages that can be in flight from
              any single endpoint as part of long message data transfer.

       FI_EFA_TX_QUEUE_SIZE
              Depth of transmit queue opened with the NIC.  This may not be
              set to a value greater than what the NIC supports.

       FI_EFA_RECVWIN_SIZE
              Size of the out-of-order reorder buffer (in messages).  Messages
              received outside of this window will result in an error.

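       The reorder window described for FI_EFA_RECVWIN_SIZE can be pictured
       with the sketch below.  It is an illustration under stated
       assumptions, not the provider's code: the 32-bit message ID, the
       function name in_reorder_window, and the half-open window starting at
       the next expected ID are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if msg_id falls inside the assumed reorder window
 * [next_expected, next_expected + window_size).  The unsigned
 * subtraction keeps the comparison correct when the 32-bit ID counter
 * wraps around. */
bool in_reorder_window(uint32_t msg_id, uint32_t next_expected,
                       uint32_t window_size)
{
    return (uint32_t)(msg_id - next_expected) < window_size;
}
```

       A receiver following this model would buffer a message for which the
       check succeeds and report an error otherwise.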
       FI_EFA_CQ_SIZE
              Size of any CQ created, in number of entries.

       FI_EFA_MR_CACHE_ENABLE
              Enables using the MR cache and in-line registration instead of a
              bounce buffer for IOVs larger than max_memcpy_size.  Defaults
              to true.  When disabled, only a bounce buffer is used.

       FI_EFA_MR_MAX_CACHED_COUNT
              Sets the maximum number of memory registrations that can be
              cached at any time.

       FI_EFA_MR_MAX_CACHED_SIZE
              Sets the maximum amount of memory that cached memory
              registrations can hold onto at any time.

       FI_EFA_MAX_MEMCPY_SIZE
              Threshold size for switching between memory copy into a
              pre-registered bounce buffer and memory registration on the
              user buffer.

       FI_EFA_MTU_SIZE
              Overrides the default MTU size of the device.

       FI_EFA_RX_COPY_UNEXP
              Enables the use of a separate pool of bounce-buffers to copy
              unexpected messages out of the pre-posted receive buffers.

       FI_EFA_RX_COPY_OOO
              Enables the use of a separate pool of bounce-buffers to copy
              out-of-order RTS packets out of the pre-posted receive buffers.

       FI_EFA_MAX_TIMEOUT
              Maximum timeout (us) for backoff to a peer after a receiver not
              ready error.

       FI_EFA_TIMEOUT_INTERVAL
              Time interval (us) for the base timeout to use for exponential
              backoff to a peer after a receiver not ready error.

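       To illustrate how FI_EFA_TIMEOUT_INTERVAL and FI_EFA_MAX_TIMEOUT could
       interact, the sketch below doubles a base timeout on each receiver not
       ready error and clamps it to a maximum.  Doubling is one common form
       of exponential backoff and is an assumption here; the provider's
       actual formula is not stated in this page, and the function name
       rnr_backoff_us is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Timeout in microseconds for a given retry attempt (0-based):
 * base_us is doubled once per attempt and clamped to max_us.
 * base_us stands in for FI_EFA_TIMEOUT_INTERVAL and max_us for
 * FI_EFA_MAX_TIMEOUT. */
uint64_t rnr_backoff_us(uint64_t base_us, uint64_t max_us, unsigned attempt)
{
    uint64_t timeout = base_us;

    while (attempt-- > 0 && timeout < max_us) {
        /* Doubling would exceed the cap (or overflow), so stop here. */
        if (timeout > max_us / 2)
            return max_us;
        timeout *= 2;
    }
    return timeout < max_us ? timeout : max_us;
}
```

       With a base of 8 us and a cap of 1000000 us (both illustrative values),
       this yields 8, 16, 32, 64 us for the first four attempts and settles at
       the cap once doubling would exceed it.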
       FI_EFA_ENABLE_SHM_TRANSFER
              Enables the SHM provider to handle communication between all
              intra-node processes.  SHM transfer will be disabled when
              ptrace protection is turned on; turn ptrace protection off to
              enable SHM transfer.

       FI_EFA_SHM_AV_SIZE
              Defines the maximum number of entries in the SHM provider’s
              address vector.

       FI_EFA_SHM_MAX_MEDIUM_SIZE
              Defines the switch point between small/medium messages and
              large messages.  Messages larger than this switch point will be
              transferred with the large message protocol.  NOTE: This
              parameter is now deprecated.

       FI_EFA_INTER_MAX_MEDIUM_MESSAGE_SIZE
              The maximum size for inter EFA messages to be sent using the
              medium message protocol.  Messages that fit in one packet will
              be sent as eager messages.  Messages whose sizes are smaller
              than this value will be sent using the medium message protocol.
              Other messages will be sent using the CTS based long message
              protocol.

       FI_EFA_FORK_SAFE
              Enable fork() support.  This may have a small performance impact
              and should only be set when required.  Applications that need to
              register regions backed by huge pages and also require fork
              support are not supported.

       FI_EFA_RUNT_SIZE
              The maximum number of bytes that will be eagerly sent by
              in-flight messages that use the runting read message protocol
              (Default: 307200).

       FI_EFA_SET_CUDA_SYNC_MEMOPS
              Set CU_POINTER_ATTRIBUTE_SYNC_MEMOPS for CUDA pointers.
              (Default: 1)

       FI_EFA_INTER_MIN_READ_MESSAGE_SIZE
              The minimum message size in bytes for the inter EFA read message
              protocol.  If the instance supports RDMA read, messages whose
              size is larger than this value will be sent using the read
              message protocol (Default: 1048576).

       FI_EFA_INTER_MIN_READ_WRITE_SIZE
              The minimum message size for an inter EFA write to use the read
              write protocol.  If the firmware supports RDMA read and
              FI_EFA_USE_DEVICE_RDMA is 1, write requests whose size is
              larger than this value will use the read write protocol
              (Default: 65536).
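       The size thresholds above can be read as a decision procedure for
       choosing a send protocol.  The sketch below is one possible reading,
       not the provider's implementation: the type and function names are
       hypothetical, the order in which overlapping thresholds are checked is
       an assumption, and the runting read protocol is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum efa_proto { PROTO_EAGER, PROTO_MEDIUM, PROTO_LONG_CTS, PROTO_READ };

/* Hypothetical container for the limits that drive the choice. */
struct proto_limits {
    size_t max_eager_size;  /* assumed payload that fits in one packet */
    size_t max_medium_size; /* FI_EFA_INTER_MAX_MEDIUM_MESSAGE_SIZE */
    size_t min_read_size;   /* FI_EFA_INTER_MIN_READ_MESSAGE_SIZE */
    bool rdma_read;         /* whether the instance supports RDMA read */
};

/* Pick a protocol for a message of len bytes. */
enum efa_proto choose_send_proto(size_t len, const struct proto_limits *lim)
{
    /* Messages that fit in one packet are sent eagerly. */
    if (len <= lim->max_eager_size)
        return PROTO_EAGER;
    /* Strictly larger than the read threshold, and RDMA read available. */
    if (lim->rdma_read && len > lim->min_read_size)
        return PROTO_READ;
    /* Strictly smaller than the medium threshold. */
    if (len < lim->max_medium_size)
        return PROTO_MEDIUM;
    /* Everything else falls back to the CTS based long protocol. */
    return PROTO_LONG_CTS;
}
```

       Checking the read threshold before the medium threshold only matters
       if the two ranges overlap, which this page does not specify.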

SEE ALSO

       fabric(7), fi_provider(7), fi_getinfo(3)

AUTHORS

       OpenFabrics.

Libfabric Programmer’s Manual     2022-12-11                         fi_efa(7)