fi_msg(3)                   Libfabric v1.10.0                   fi_msg(3)

NAME
fi_msg - Message data transfer operations

fi_recv / fi_recvv / fi_recvmsg
        Post a buffer to receive an incoming message

fi_send / fi_sendv / fi_sendmsg / fi_inject / fi_senddata / fi_injectdata
        Initiate an operation to send a message

SYNOPSIS
    #include <rdma/fi_endpoint.h>

    ssize_t fi_recv(struct fid_ep *ep, void *buf, size_t len,
        void *desc, fi_addr_t src_addr, void *context);

    ssize_t fi_recvv(struct fid_ep *ep, const struct iovec *iov, void **desc,
        size_t count, fi_addr_t src_addr, void *context);

    ssize_t fi_recvmsg(struct fid_ep *ep, const struct fi_msg *msg,
        uint64_t flags);

    ssize_t fi_send(struct fid_ep *ep, const void *buf, size_t len,
        void *desc, fi_addr_t dest_addr, void *context);

    ssize_t fi_sendv(struct fid_ep *ep, const struct iovec *iov,
        void **desc, size_t count, fi_addr_t dest_addr, void *context);

    ssize_t fi_sendmsg(struct fid_ep *ep, const struct fi_msg *msg,
        uint64_t flags);

    ssize_t fi_inject(struct fid_ep *ep, const void *buf, size_t len,
        fi_addr_t dest_addr);

    ssize_t fi_senddata(struct fid_ep *ep, const void *buf, size_t len,
        void *desc, uint64_t data, fi_addr_t dest_addr, void *context);

    ssize_t fi_injectdata(struct fid_ep *ep, const void *buf, size_t len,
        uint64_t data, fi_addr_t dest_addr);

ARGUMENTS
ep      Fabric endpoint on which to initiate send or post receive
        buffer.

buf     Data buffer to send or receive.

len     Length of data buffer to send or receive, specified in bytes.
        Valid transfers are from 0 bytes up to the endpoint's
        max_msg_size.

iov     Vectored data buffer.

count   Count of vectored data entries.

desc    Descriptor associated with the data buffer. See fi_mr(3).

data    Remote CQ data to transfer with the sent message.

dest_addr
        Destination address for connectionless transfers. Ignored for
        connected endpoints.

src_addr
        Source address to receive from for connectionless transfers.
        Applies only to connectionless endpoints with the
        FI_DIRECTED_RECV capability enabled; otherwise this field is
        ignored. If set to FI_ADDR_UNSPEC, any source address may match.

msg     Message descriptor for send and receive operations.

flags   Additional flags to apply for the send or receive operation.

context
        User-specified pointer to associate with the operation. This
        parameter is ignored if the operation will not generate a
        successful completion, unless an op flag specifies that the
        context parameter be used for required input.

DESCRIPTION
The send functions -- fi_send, fi_sendv, fi_sendmsg, fi_inject,
fi_senddata, and fi_injectdata -- are used to transmit a message from
one endpoint to another endpoint. The main difference between the send
functions is the number and type of parameters that they accept as
input. Otherwise, they perform the same general function. Messages sent
using fi_msg operations are received by a remote endpoint into a buffer
posted to receive such messages.

The receive functions -- fi_recv, fi_recvv, fi_recvmsg -- post a data
buffer to an endpoint to receive inbound messages. Similar to the send
operations, receive operations operate asynchronously. Users should not
touch the posted data buffer(s) until the receive operation has
completed.

An endpoint must be enabled before an application can post send or
receive operations to it. For connected endpoints, receive buffers may
be posted prior to connect or accept being called on the endpoint. This
ensures that buffers are available to receive incoming data immediately
after the connection has been established.

Completed message operations are reported to the user through one or
more event collectors associated with the endpoint. Users provide a
context that is associated with each operation and is returned to the
user as part of the completion event. See fi_cq(3) for completion event
details.

fi_send
The fi_send call transfers the data contained in the user-specified
data buffer to a remote endpoint, with message boundaries being
maintained. For connection-based endpoints (FI_EP_MSG), the local
endpoint must be connected to a remote endpoint or destination before
fi_send is called. Unless the endpoint has been configured differently,
the data buffer passed into fi_send must not be touched by the
application until the fi_send call completes asynchronously.

fi_sendv
The fi_sendv call adds support for a scatter-gather list to fi_send.
The fi_sendv call transfers the set of data buffers referenced by the
iov parameter to a remote endpoint as a single message.
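
As an illustration of the scatter-gather form, the sketch below builds a
two-entry iovec array (a header followed by a payload) and sums the entry
lengths so that the total can be compared with the endpoint's
max_msg_size before posting. The names wire_hdr, build_msg_iov, and
iov_total_len are hypothetical and not part of libfabric; only struct
iovec comes from the system headers. The fi_sendv call itself appears
only in a comment because it needs an enabled endpoint.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical application header sent ahead of the payload. */
struct wire_hdr {
    uint32_t type;
    uint32_t payload_len;
};

/* Sum the lengths of all iov entries; fi_sendv sends them as one message. */
static size_t iov_total_len(const struct iovec *iov, size_t count)
{
    size_t total = 0;

    for (size_t i = 0; i < count; i++)
        total += iov[i].iov_len;
    return total;
}

/* Fill a two-entry iov: entry 0 is the header, entry 1 is the payload. */
static void build_msg_iov(struct iovec iov[2], struct wire_hdr *hdr,
                          void *payload, size_t payload_len)
{
    hdr->payload_len = (uint32_t)payload_len;
    iov[0].iov_base = hdr;
    iov[0].iov_len = sizeof(*hdr);
    iov[1].iov_base = payload;
    iov[1].iov_len = payload_len;
}

/*
 * With an enabled endpoint ep, the array could then be posted roughly as:
 *
 *     if (iov_total_len(iov, 2) <= max_msg_size)
 *         ret = fi_sendv(ep, iov, NULL, 2, dest_addr, context);
 *
 * desc is NULL here on the assumption that the provider does not require
 * local memory descriptors (see fi_mr(3)).
 */
```

The receiver sees a single message of sizeof(struct wire_hdr) plus
payload_len bytes; the iov boundaries are not visible on the remote side.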

fi_sendmsg
The fi_sendmsg call supports data transfers over both connected and
unconnected endpoints, with the ability to control the send operation
per call through the use of flags. The fi_sendmsg function takes a
struct fi_msg as input.

    struct fi_msg {
        const struct iovec *msg_iov;   /* scatter-gather array */
        void               **desc;     /* local request descriptors */
        size_t             iov_count;  /* # elements in iov */
        fi_addr_t          addr;       /* optional endpoint address */
        void               *context;   /* user-defined context */
        uint64_t           data;       /* optional message data */
    };

fi_inject
The send inject call is an optimized version of fi_send with the
following characteristics: the data buffer is available for reuse
immediately on return from the call, and no CQ entry will be written if
the transfer completes successfully.

Conceptually, this means that the fi_inject function behaves as if the
FI_INJECT transfer flag were set, selective completions are enabled,
and the FI_COMPLETION flag is not specified. Note that the CQ entry
will be suppressed even if the default behavior of the endpoint is to
write CQ entries for all successful completions. See the flags
discussion below for more details. The requested message size that can
be used with fi_inject is limited by inject_size.

fi_senddata
The send data call is similar to fi_send, but allows for the sending of
remote CQ data (see FI_REMOTE_CQ_DATA flag) as part of the transfer.

fi_injectdata
The inject data call is similar to fi_inject, but allows for the
sending of remote CQ data (see FI_REMOTE_CQ_DATA flag) as part of the
transfer.

fi_recv
The fi_recv call posts a data buffer to the receive queue of the
corresponding endpoint. Posted receives are searched in the order in
which they were posted in order to match sends. Message boundaries are
maintained. The order in which the receives complete is dependent on
the endpoint type and protocol. For unconnected endpoints, the src_addr
parameter can be used to indicate that a buffer should be posted to
receive incoming data from a specific remote endpoint.

fi_recvv
The fi_recvv call adds support for a scatter-gather list to fi_recv.
The fi_recvv call posts the set of data buffers referenced by the iov
parameter to receive incoming data.

fi_recvmsg
The fi_recvmsg call supports posting buffers over both connected and
unconnected endpoints, with the ability to control the receive
operation per call through the use of flags. The fi_recvmsg function
takes a struct fi_msg as input.

FLAGS
The fi_recvmsg and fi_sendmsg calls allow the user to specify flags
which can change the default message handling of the endpoint. Flags
specified with fi_recvmsg / fi_sendmsg override most flags previously
configured with the endpoint, except where noted (see fi_endpoint(3)).
The following flags are usable with fi_recvmsg and/or fi_sendmsg.

FI_REMOTE_CQ_DATA
        Applies to fi_sendmsg and fi_senddata. Indicates that remote CQ
        data is available and should be sent as part of the request.
        See fi_getinfo(3) for additional details on FI_REMOTE_CQ_DATA.

FI_CLAIM
        Applies to posted receive operations for endpoints configured
        for FI_BUFFERED_RECV or FI_VARIABLE_MSG. This flag is used to
        retrieve a message that was buffered by the provider. See the
        Buffered Receives section for details.

FI_COMPLETION
        Indicates that a completion entry should be generated for the
        specified operation. The endpoint must be bound to a completion
        queue with FI_SELECTIVE_COMPLETION that corresponds to the
        specified operation, or this flag is ignored.

FI_DISCARD
        Applies to posted receive operations for endpoints configured
        for FI_BUFFERED_RECV or FI_VARIABLE_MSG. This flag is used to
        free a message that was buffered by the provider. See the
        Buffered Receives section for details.

FI_MORE
        Indicates that the user has additional requests that will
        immediately be posted after the current call returns. Use of
        this flag may improve performance by enabling the provider to
        optimize its access to the fabric hardware.

FI_INJECT
        Applies to fi_sendmsg. Indicates that the outbound data buffer
        should be returned to the user immediately after the send call
        returns, even if the operation is handled asynchronously. This
        may require that the underlying provider implementation copy
        the data into a local buffer and transfer out of that buffer.
        This flag can only be used with messages smaller than
        inject_size.

FI_MULTI_RECV
        Applies to posted receive operations. This flag allows the user
        to post a single buffer that will receive multiple incoming
        messages. Received messages will be packed into the receive
        buffer until the buffer has been consumed. Use of this flag may
        cause a single posted receive operation to generate multiple
        events as messages are placed into the buffer. The placement of
        received data into the buffer may be subject to
        provider-specific alignment restrictions.

        The buffer will be released by the provider when the available
        buffer space falls below the specified minimum (see
        FI_OPT_MIN_MULTI_RECV). Note that an entry to the associated
        receive completion queue will always be generated when the
        buffer has been consumed, even if other receive completions
        have been suppressed (i.e. the Rx context has been configured
        for FI_SELECTIVE_COMPLETION). See the FI_MULTI_RECV completion
        flag in fi_cq(3).
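
The release rule can be illustrated with a small model that makes no
fabric calls. The sketch below packs a sequence of message lengths into
a buffer of buf_len bytes and stops once the remaining space drops below
min_multi_recv, the value an application would configure through
FI_OPT_MIN_MULTI_RECV. The function name multi_recv_pack is
hypothetical. The model ignores provider-specific alignment and simply
stops at a message that does not fit, which real providers may handle
differently.

```c
#include <stddef.h>

/*
 * Toy model of FI_MULTI_RECV buffer consumption (not provider code).
 * Returns the number of messages placed before the buffer is released
 * and stores the number of bytes consumed in *bytes_used.
 */
static size_t multi_recv_pack(size_t buf_len, size_t min_multi_recv,
                              const size_t *msg_len, size_t nmsgs,
                              size_t *bytes_used)
{
    size_t used = 0;
    size_t placed = 0;

    while (placed < nmsgs) {
        size_t avail = buf_len - used;

        if (avail < min_multi_recv)
            break;              /* below the minimum: buffer is released */
        if (msg_len[placed] > avail)
            break;              /* simplification: stop instead of truncating */

        used += msg_len[placed];
        placed++;
    }

    *bytes_used = used;
    return placed;
}
```

For example, a 1024-byte buffer with a 256-byte minimum accepts three
300-byte messages; the remaining 124 bytes are below the minimum, so the
buffer is released before a fourth message is placed.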

FI_INJECT_COMPLETE
        Applies to fi_sendmsg. Indicates that a completion should be
        generated when the source buffer(s) may be reused.

FI_TRANSMIT_COMPLETE
        Applies to fi_sendmsg. Indicates that a completion should not
        be generated until the operation has been successfully
        transmitted and is no longer being tracked by the provider.

FI_DELIVERY_COMPLETE
        Applies to fi_sendmsg. Indicates that a completion should be
        generated when the operation has been processed by the
        destination.

FI_FENCE
        Applies to transmits. Indicates that the requested operation,
        also known as the fenced operation, and any operation posted
        after the fenced operation will be deferred until all previous
        operations targeting the same peer endpoint have completed.
        Operations posted after the fencing will see and/or replace the
        results of any operations initiated prior to the fenced
        operation.

        The ordering of operations starting at the posting of the
        fenced operation (inclusive) to the posting of a subsequent
        fenced operation (exclusive) is controlled by the endpoint's
        ordering semantics.

FI_MULTICAST
        Applies to transmits. This flag indicates that the address
        specified as the data transfer destination is a multicast
        address. This flag must be used in all multicast transfers, in
        conjunction with a multicast fi_addr_t.

Buffered Receives
Buffered receives indicate that the networking layer allocates and
manages the data buffers used to receive network data transfers. As a
result, received messages must be copied from the network buffers into
application buffers for processing. However, applications can avoid
this copy if they are able to process the message in place (directly
from the networking buffers).

Handling buffered receives differs based on the size of the message
being sent. In general, smaller messages are passed directly to the
application for processing. However, for large messages, an application
will only receive the start of the message and must claim the rest. The
details for how small messages are reported and large messages may be
claimed are described below.

When a provider receives a message, it will write an entry to the
completion queue associated with the receiving endpoint. For discussion
purposes, the completion queue is assumed to be configured for
FI_CQ_FORMAT_DATA. Since buffered receives are not associated with
application posted buffers, the CQ entry op_context will point to a
struct fi_recv_context.

    struct fi_recv_context {
        struct fid_ep *ep;
        void *context;
    };

The 'ep' field will point to the receiving endpoint or Rx context, and
'context' will be NULL. The CQ entry's 'buf' will point to a
provider-managed buffer where the start of the received message is
located, and 'len' will be set to the total size of the message.

The maximum sized message that a provider can buffer is limited by
FI_OPT_BUFFERED_LIMIT. This threshold can be obtained and may be
adjusted by the application using the fi_getopt and fi_setopt calls,
respectively. Any adjustments must be made prior to enabling the
endpoint. The CQ entry 'buf' will point to a buffer of received data.
If the sent message is larger than the buffered amount, the CQ entry
'flags' will have the FI_MORE bit set. When the FI_MORE bit is set,
'buf' will reference at least FI_OPT_BUFFERED_MIN bytes of data (see
fi_endpoint(3) for more info).

After being notified that a buffered receive has arrived, applications
must either claim or discard the message. Typically, small messages are
processed and discarded, while large messages are claimed. However, an
application is free to claim or discard any message regardless of
message size.

To claim a message, an application must post a receive operation with
the FI_CLAIM flag set. The struct fi_recv_context returned as part of
the notification must be provided as the receive operation's context.
The struct fi_recv_context contains a 'context' field. Applications may
modify this field prior to claiming the message. When the claim
operation completes, a standard receive completion entry will be
generated on the completion queue. The 'context' of the associated CQ
entry will be set to the 'context' value passed in through the
fi_recv_context structure, and the CQ entry flags will have the
FI_CLAIM bit set.

Buffered receives that are not claimed must be discarded by the
application when it is done processing the CQ entry data. To discard a
message, an application must post a receive operation with the
FI_DISCARD flag set. The struct fi_recv_context returned as part of the
notification must be provided as the receive operation's context. When
the FI_DISCARD flag is set for a receive operation, the receive input
buffer(s) and length parameters are ignored.

IMPORTANT: Buffered receives must be claimed or discarded in a timely
manner. Failure to do so may result in increased memory usage for
network buffering or communication stalls. Once a buffered receive has
been claimed or discarded, the original CQ entry 'buf' or struct
fi_recv_context data may no longer be accessed by the application.

The use of the FI_CLAIM and FI_DISCARD operation flags is also
described with respect to tagged message transfers in fi_tagged(3).
Buffered receives of tagged messages will include the message tag as
part of the CQ entry, if available.

The handling of buffered receives follows all message ordering
restrictions assigned to an endpoint. For example, completions may
indicate the order in which received messages arrived at the receiver
based on the endpoint attributes.

Variable Length Messages
Variable length messages, or simply variable messages, are transfers
where the recipient does not know the size of a message before it
arrives. They are most commonly used when the size of message transfers
varies greatly, with very large messages interspersed with much smaller
messages, making receive side message buffering difficult to manage.
Variable messages are not subject to max message length restrictions
(i.e. struct fi_ep_attr::max_msg_size limits), and may be up to the
maximum value of size_t (SIZE_MAX) in length.

Variable length messages support requests that the provider allocate
and manage the network message buffers. As a result, the application
requirements and provider behavior are identical to those defined for
supporting the FI_BUFFERED_RECV mode bit. See the Buffered Receives
section above for details. The main difference is that buffered
receives are limited by the fi_ep_attr::max_msg_size threshold, whereas
variable length messages are not.

Support for variable messages is indicated through the FI_VARIABLE_MSG
capability bit.

NOTES
If an endpoint has been configured with FI_MSG_PREFIX, the application
must include buffer space of size msg_prefix_size, as specified by the
endpoint attributes. The prefix buffer must occur at the start of the
data referenced by the buf parameter, or be referenced by the first IO
vector. Message prefix space cannot be split between multiple IO
vectors. The size of the prefix buffer should be included as part of
the total buffer length.
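
The buffer layout described above can be sketched as follows. The helper
type prefixed_buf and its functions are hypothetical; the only value
taken from the endpoint is msg_prefix_size, passed in as a plain size_t.
The sketch reserves the prefix at the start of a single allocation,
places the application payload after it, and records the combined length
that would be passed as len.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: one allocation holding prefix + payload. */
struct prefixed_buf {
    void *base;        /* start of the buffer: pass this as buf */
    void *payload;     /* application data begins after the prefix */
    size_t total_len;  /* pass this as len: prefix + payload */
};

/* Allocate prefix space followed by a copy of the payload. */
static int prefixed_buf_init(struct prefixed_buf *pb, size_t msg_prefix_size,
                             const void *data, size_t data_len)
{
    pb->total_len = msg_prefix_size + data_len;
    pb->base = calloc(1, pb->total_len ? pb->total_len : 1);
    if (!pb->base)
        return -1;

    /* The prefix occupies the first msg_prefix_size bytes, contiguously. */
    pb->payload = (char *)pb->base + msg_prefix_size;
    if (data_len)
        memcpy(pb->payload, data, data_len);
    return 0;
}

static void prefixed_buf_free(struct prefixed_buf *pb)
{
    free(pb->base);
    pb->base = NULL;
    pb->payload = NULL;
    pb->total_len = 0;
}

/*
 * A send would then look roughly like:
 *     fi_send(ep, pb.base, pb.total_len, desc, dest_addr, context);
 */
```

Keeping the prefix and payload in one allocation is one way to satisfy
the rule that prefix space is not split across IO vectors.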

RETURN VALUE
Returns 0 on success. On error, a negative value corresponding to
fabric errno is returned. Fabric errno values are defined in
rdma/fi_errno.h.

See the discussion below for details on handling FI_EAGAIN.

ERRORS
-FI_EAGAIN
        Indicates that the underlying provider currently lacks the
        resources needed to initiate the requested operation. The
        reasons for a provider returning FI_EAGAIN are varied. However,
        common reasons include insufficient internal buffering or full
        processing queues.

        Insufficient internal buffering is often associated with
        operations that use FI_INJECT. In such cases, additional
        buffering may become available as posted operations complete.

        Full processing queues may be a temporary state related to
        local processing (for example, a large message is being
        transferred), or may be the result of flow control. In the
        latter case, the queues may remain blocked until additional
        resources are made available at the remote side of the
        transfer.

        In all cases, the operation may be retried after additional
        resources become available. It is strongly recommended that
        applications check for transmit and receive completions after
        receiving FI_EAGAIN as a return value, independent of the
        operation which failed. This is particularly important in cases
        where manual progress is employed, as acknowledgements or flow
        control messages may need to be processed in order to resume
        execution.
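
The recommended pattern of checking completions between retries can be
sketched without a fabric by passing the post and progress steps in as
callbacks. Everything named here (post_with_retry, the callback types,
and the fake_queue stand-in) is hypothetical. In a real application the
post callback would wrap a call such as fi_send and the progress
callback would poll the bound completion queues (see fi_cq(3)). The
sketch assumes FI_EAGAIN carries the same value as the system EAGAIN.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Assumption: FI_EAGAIN has the same value as the system EAGAIN. */
#define SKETCH_EAGAIN EAGAIN

/* Hypothetical callback types, not part of libfabric. */
typedef ssize_t (*post_fn)(void *arg);      /* e.g. wraps fi_send */
typedef void (*progress_fn)(void *arg);     /* e.g. polls the CQs */

/* Retry a post while it returns -EAGAIN, driving progress in between. */
static ssize_t post_with_retry(post_fn post, void *post_arg,
                               progress_fn progress, void *progress_arg,
                               int max_attempts)
{
    ssize_t ret = -SKETCH_EAGAIN;

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        ret = post(post_arg);
        if (ret != -SKETCH_EAGAIN)
            return ret;          /* success or a non-retryable error */
        progress(progress_arg);  /* check completions before retrying */
    }
    return ret;                  /* still -EAGAIN after max_attempts */
}

/* Stand-in for a busy provider, used only to exercise the loop. */
struct fake_queue {
    int busy_rounds;  /* posts that will still fail with -EAGAIN */
    int polls;        /* number of times completions were checked */
};

static ssize_t fake_post(void *arg)
{
    struct fake_queue *q = arg;

    return q->busy_rounds > 0 ? -SKETCH_EAGAIN : 0;
}

static void fake_poll(void *arg)
{
    struct fake_queue *q = arg;

    q->polls++;
    if (q->busy_rounds > 0)
        q->busy_rounds--;  /* a processed completion frees a resource */
}
```

Bounding the number of attempts is a design choice of this sketch; an
application may instead retry indefinitely or back off, as long as it
keeps processing completions between attempts.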

SEE ALSO
fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cq(3)

AUTHORS
OpenFabrics.

Libfabric Programmer's Manual        2019-09-27        fi_msg(3)