fi_efa(7)                  Libfabric v1.15.1                  fi_efa(7)

fi_efa - The Amazon Elastic Fabric Adapter (EFA) Provider

The EFA provider supports the Elastic Fabric Adapter (EFA) device on
Amazon EC2. EFA provides reliable and unreliable datagram send/receive
with direct hardware access from userspace (OS bypass).

The following features are supported:

Endpoint types
    The provider supports the FI_EP_DGRAM endpoint type, and the
    FI_EP_RDM endpoint type on top of a new Scalable (unordered)
    Reliable Datagram (SRD) protocol. SRD provides support for
    reliable datagrams and more complete error handling than is
    typically seen with other Reliable Datagram (RD) implementations.
    The EFA provider performs segmentation and reassembly of
    out-of-order packets to provide send-after-send ordering
    guarantees to applications via its FI_EP_RDM endpoint.

RDM Endpoint capabilities
    The following data transfer interfaces are supported via the
    FI_EP_RDM endpoint: FI_MSG, FI_TAGGED, and FI_RMA. FI_SEND,
    FI_RECV, FI_DIRECTED_RECV, FI_MULTI_RECV, and FI_SOURCE
    capabilities are supported. The endpoint provides send-after-send
    guarantees for data operations. The FI_EP_RDM endpoint does not
    have a maximum message size.

DGRAM Endpoint capabilities
    The DGRAM endpoint supports only the FI_MSG capability, with a
    maximum message size equal to the MTU of the underlying hardware
    (approximately 8 KiB).

Address vectors
    The provider supports FI_AV_TABLE and FI_AV_MAP address vector
    types. FI_EVENT is unsupported.

Completion events
    The provider supports FI_CQ_FORMAT_CONTEXT, FI_CQ_FORMAT_MSG,
    and FI_CQ_FORMAT_DATA. FI_CQ_FORMAT_TAGGED is supported on the
    RDM endpoint. Wait objects are not currently supported.

Modes
    The provider requires the use of FI_MSG_PREFIX when running over
    the DGRAM endpoint, and requires FI_MR_LOCAL for all memory
    registrations on the DGRAM endpoint.

Memory registration modes
    The RDM endpoint does not require memory registration for send
    and receive operations, i.e. it does not require FI_MR_LOCAL.
    Applications may specify FI_MR_LOCAL in the MR mode flags in
    order to use descriptors provided by the application. The
    FI_EP_DGRAM endpoint only supports FI_MR_LOCAL.

Progress
    RDM and DGRAM endpoints support FI_PROGRESS_MANUAL. The EFA
    provider erroneously claims support for FI_PROGRESS_AUTO, despite
    not properly supporting automatic progress. Unfortunately, some
    Libfabric consumers also ask for FI_PROGRESS_AUTO when they only
    require FI_PROGRESS_MANUAL, and fixing this bug would break
    those applications. This will be fixed in a future version of
    the EFA provider by adding proper support for FI_PROGRESS_AUTO.

Threading
    The RDM endpoint supports FI_THREAD_SAFE. The DGRAM endpoint
    supports FI_THREAD_DOMAIN, i.e. the provider is not thread safe
    when using the DGRAM endpoint.

The DGRAM endpoint does not support FI_ATOMIC interfaces. For RMA
operations, completion events for RMA targets (FI_RMA_EVENT) are not
supported. The DGRAM endpoint does not fully protect against resource
overruns, so resource management is disabled for this endpoint
(FI_RM_DISABLED).

No support for selective completions.

No support for counters for the DGRAM endpoint.

No support for inject.

When using FI_HMEM for either CUDA or Neuron buffers, the provider
requires peer-to-peer transaction support between the EFA and the
FI_HMEM device. Therefore, the FI_HMEM_P2P_DISABLED option is not
supported by the EFA provider.

FI_OPT_EFA_RNR_RETRY
    Defines the number of RNR (receiver not ready) retries. The
    application can use it to reset the RNR retry counter via a call
    to fi_setopt. Note that this option must be set before the
    endpoint is enabled. Otherwise, the call will fail. Also note
    that this option only applies to the RDM endpoint.

FI_EFA_TX_SIZE
    Maximum number of transmit operations before the provider
    returns -FI_EAGAIN. For the RDM endpoint only, this parameter
    will cause transmit operations to be queued when this value is
    set higher than the default and the transmit queue is full.

FI_EFA_RX_SIZE
    Maximum number of receive operations before the provider returns
    -FI_EAGAIN.

FI_EFA_TX_IOV_LIMIT
    Maximum number of IOVs for a transmit operation.

FI_EFA_RX_IOV_LIMIT
    Maximum number of IOVs for a receive operation.

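As a minimal sketch of how an application might supply these
parameters itself, the C fragment below sets two of them with
setenv(3). It assumes, as is usual for Libfabric runtime parameters,
that the provider reads them from the environment when it initializes,
so the helper would have to run before the first Libfabric call. The
helper names and the value 1024 are hypothetical placeholders, not
provider defaults or recommendations.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <assert.h>
#include <stdlib.h>

/*
 * Set one runtime parameter unless it is already present in the
 * environment. Returns 0 on success, -1 on failure (see setenv(3)).
 */
int efa_env_default(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    /* overwrite == 0: a value exported by the user takes precedence */
    return setenv(name, value, 0);
}

/*
 * Apply hypothetical application-level defaults. The numbers are
 * illustrative only (assumption, not taken from the provider).
 */
int efa_apply_env_defaults(void)
{
    if (efa_env_default("FI_EFA_TX_SIZE", "1024") != 0)
        return -1;
    if (efa_env_default("FI_EFA_RX_SIZE", "1024") != 0)
        return -1;
    return 0;
}
```

Passing 0 as the last argument of setenv() keeps any value the user
exported in the shell, so the application only fills in what is
missing.
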
These OFI runtime parameters apply only to the RDM endpoint.

FI_EFA_RX_WINDOW_SIZE
    Maximum number of MTU-sized messages that can be in flight from
    any single endpoint as part of long message data transfer.

FI_EFA_TX_QUEUE_SIZE
    Depth of transmit queue opened with the NIC. This may not be
    set to a value greater than what the NIC supports.

FI_EFA_RECVWIN_SIZE
    Size of the out-of-order reorder buffer (in messages). Messages
    received outside this window will result in an error.

FI_EFA_CQ_SIZE
    Size of any CQ created, in number of entries.

FI_EFA_MR_CACHE_ENABLE
    Enables use of the MR cache and in-line registration instead of
    a bounce buffer for IOVs larger than max_memcpy_size. Defaults
    to true. When disabled, only a bounce buffer is used.

FI_EFA_MR_MAX_CACHED_COUNT
    Sets the maximum number of memory registrations that can be
    cached at any time.

FI_EFA_MR_MAX_CACHED_SIZE
    Sets the maximum amount of memory that cached memory
    registrations can hold onto at any time.

FI_EFA_MAX_MEMCPY_SIZE
    Threshold size for switching between a memory copy into a
    pre-registered bounce buffer and memory registration of the
    user buffer.

FI_EFA_MTU_SIZE
    Overrides the default MTU size of the device.

FI_EFA_RX_COPY_UNEXP
    Enables the use of a separate pool of bounce-buffers to copy
    unexpected messages out of the pre-posted receive buffers.

FI_EFA_RX_COPY_OOO
    Enables the use of a separate pool of bounce-buffers to copy
    out-of-order RTS packets out of the pre-posted receive buffers.

FI_EFA_MAX_TIMEOUT
    Maximum timeout (us) for backoff to a peer after a receiver not
    ready error.

FI_EFA_TIMEOUT_INTERVAL
    Time interval (us) for the base timeout to use for exponential
    backoff to a peer after a receiver not ready error.

FI_EFA_ENABLE_SHM_TRANSFER
    Enables the SHM provider to handle communication between all
    intra-node processes. SHM transfer is disabled when ptrace
    protection is turned on. Turn ptrace protection off to enable
    SHM transfer.

FI_EFA_SHM_AV_SIZE
    Defines the maximum number of entries in the SHM provider's
    address vector.

FI_EFA_SHM_MAX_MEDIUM_SIZE
    Defines the switch point between small/medium messages and large
    messages. Messages larger than this switch point are transferred
    with the large message protocol. NOTE: This parameter is now
    deprecated.

FI_EFA_INTER_MAX_MEDIUM_MESSAGE_SIZE
    The maximum size for inter EFA messages to be sent using the
    medium message protocol. Messages that fit in one packet are
    sent as eager messages. Messages whose sizes are smaller than
    this value are sent using the medium message protocol. Other
    messages are sent using the CTS-based long message protocol.

FI_EFA_FORK_SAFE
    Enable fork() support. This may have a small performance impact
    and should only be set when required. Applications that need to
    register regions backed by huge pages and also need fork support
    are not supported.

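Two of the RDM parameters above describe simple numeric rules, which
the following C sketch spells out. The first function is a generic
capped exponential backoff in the spirit of FI_EFA_TIMEOUT_INTERVAL
and FI_EFA_MAX_TIMEOUT. The second picks a message protocol by size as
described for FI_EFA_INTER_MAX_MEDIUM_MESSAGE_SIZE. All names are
hypothetical, the provider's actual implementation may differ (for
example by adding randomization), and the one-packet payload size is
assumed to be supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Capped exponential backoff: base_us doubled once per retry,
 * never exceeding max_us. Illustrative only.
 */
uint64_t efa_sketch_backoff_us(uint64_t base_us, uint64_t max_us,
                               unsigned retries)
{
    assert(base_us > 0 && base_us <= max_us);
    uint64_t t = base_us;
    while (retries-- > 0 && t < max_us) {
        /* Clamp to the ceiling instead of overflowing on the doubling. */
        t = (t > max_us / 2) ? max_us : t * 2;
    }
    return t;
}

enum efa_sketch_proto {
    EFA_SKETCH_EAGER,    /* message fits in one packet */
    EFA_SKETCH_MEDIUM,   /* medium message protocol */
    EFA_SKETCH_LONG_CTS  /* CTS-based long message protocol */
};

/*
 * Choose a protocol from the message size. max_eager_size is the
 * payload that fits in one packet (hypothetical, caller supplied).
 * Assumption: the medium threshold is treated as inclusive here.
 */
enum efa_sketch_proto efa_sketch_pick_proto(size_t msg_size,
                                            size_t max_eager_size,
                                            size_t max_medium_size)
{
    if (msg_size <= max_eager_size)
        return EFA_SKETCH_EAGER;
    if (msg_size <= max_medium_size)
        return EFA_SKETCH_MEDIUM;
    return EFA_SKETCH_LONG_CTS;
}
```

With a base interval of 50 us and a ceiling of 1000 us, for instance,
the third retry would wait 400 us and every retry from the fifth
onward would wait the full 1000 us.
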
fabric(7), fi_provider(7), fi_getinfo(3)

OpenFabrics.

Libfabric Programmer's Manual         2022-01-27         fi_efa(7)