fi_msg(3)                      Libfabric v1.10.0                     fi_msg(3)

NAME

       fi_msg - Message data transfer operations

       fi_recv / fi_recvv / fi_recvmsg
              Post a buffer to receive an incoming message

       fi_send / fi_sendv / fi_sendmsg / fi_inject / fi_senddata / fi_injectdata
              Initiate an operation to send a message

SYNOPSIS

              #include <rdma/fi_endpoint.h>

              ssize_t fi_recv(struct fid_ep *ep, void *buf, size_t len,
                  void *desc, fi_addr_t src_addr, void *context);

              ssize_t fi_recvv(struct fid_ep *ep, const struct iovec *iov, void **desc,
                  size_t count, fi_addr_t src_addr, void *context);

              ssize_t fi_recvmsg(struct fid_ep *ep, const struct fi_msg *msg,
                  uint64_t flags);

              ssize_t fi_send(struct fid_ep *ep, const void *buf, size_t len,
                  void *desc, fi_addr_t dest_addr, void *context);

              ssize_t fi_sendv(struct fid_ep *ep, const struct iovec *iov,
                  void **desc, size_t count, fi_addr_t dest_addr, void *context);

              ssize_t fi_sendmsg(struct fid_ep *ep, const struct fi_msg *msg,
                  uint64_t flags);

              ssize_t fi_inject(struct fid_ep *ep, const void *buf, size_t len,
                  fi_addr_t dest_addr);

              ssize_t fi_senddata(struct fid_ep *ep, const void *buf, size_t len,
                  void *desc, uint64_t data, fi_addr_t dest_addr, void *context);

              ssize_t fi_injectdata(struct fid_ep *ep, const void *buf, size_t len,
                  uint64_t data, fi_addr_t dest_addr);

ARGUMENTS

       ep     Fabric endpoint on which to initiate send or post receive
              buffer.

       buf    Data buffer to send or receive.

       len    Length of data buffer to send or receive, specified in bytes.
              Valid transfers are from 0 bytes up to the endpoint's
              max_msg_size.

       iov    Vectored data buffer.

       count  Count of vectored data entries.

       desc   Descriptor associated with the data buffer. See fi_mr(3).

       data   Remote CQ data to transfer with the sent message.

       dest_addr
              Destination address for connectionless transfers. Ignored for
              connected endpoints.

       src_addr
              Source address to receive from for connectionless transfers.
              Applies only to connectionless endpoints with the
              FI_DIRECTED_RECV capability enabled; otherwise this field is
              ignored. If set to FI_ADDR_UNSPEC, any source address may match.

       msg    Message descriptor for send and receive operations.

       flags  Additional flags to apply for the send or receive operation.

       context
              User-specified pointer to associate with the operation. This
              parameter is ignored if the operation will not generate a
              successful completion, unless an op flag specifies that the
              context parameter be used for required input.

DESCRIPTION

       The send functions -- fi_send, fi_sendv, fi_sendmsg, fi_inject,
       fi_senddata, and fi_injectdata -- are used to transmit a message from
       one endpoint to another endpoint. The main difference between the send
       functions is the number and type of parameters that they accept as
       input. Otherwise, they perform the same general function. Messages
       sent using fi_msg operations are received by a remote endpoint into a
       buffer posted to receive such messages.

       The receive functions -- fi_recv, fi_recvv, fi_recvmsg -- post a data
       buffer to an endpoint to receive inbound messages. Similar to the send
       operations, receive operations operate asynchronously. Users should
       not touch the posted data buffer(s) until the receive operation has
       completed.

       An endpoint must be enabled before an application can post send or
       receive operations to it. For connected endpoints, receive buffers may
       be posted prior to connect or accept being called on the endpoint.
       This ensures that buffers are available to receive incoming data
       immediately after the connection has been established.

       Completed message operations are reported to the user through one or
       more event collectors associated with the endpoint. Users provide a
       context with each operation, which is returned to the user as part of
       the completion event. See fi_cq(3) for completion event details.

   fi_send
       The call fi_send transfers the data contained in the user-specified
       data buffer to a remote endpoint, with message boundaries being
       maintained. For connection-based endpoints (FI_EP_MSG), the local
       endpoint must be connected to a remote endpoint or destination before
       fi_send is called. Unless the endpoint has been configured
       differently, the data buffer passed into fi_send must not be touched
       by the application until the fi_send call completes asynchronously.

   fi_sendv
       The fi_sendv call adds support for a scatter-gather list to fi_send.
       fi_sendv transfers the set of data buffers referenced by the iov
       parameter to a remote endpoint as a single message.
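
       As a sketch of what "a single message" means for the scatter-gather
       form, the helpers below total the iov_len fields of an iovec array
       and compare the sum against an endpoint's max_msg_size. They use only
       <sys/uio.h>; the names iov_total_len and iov_fits are hypothetical
       and are not part of the libfabric API, and in real code max_msg_size
       would be read from the endpoint attributes.

```c
              #include <stddef.h>
              #include <stdint.h>
              #include <sys/uio.h>

              /* Hypothetical helper, not a libfabric call: sums the iov_len
               * fields of a scatter-gather list. Returns 0 and stores the
               * total in *out, or -1 if the sum would overflow size_t. */
              int iov_total_len(const struct iovec *iov, size_t count,
                                size_t *out)
              {
                  size_t total = 0;
                  for (size_t i = 0; i < count; i++) {
                      if (iov[i].iov_len > SIZE_MAX - total)
                          return -1;    /* sum does not fit in size_t */
                      total += iov[i].iov_len;
                  }
                  *out = total;
                  return 0;
              }

              /* Returns 1 if the combined message fits within max_msg_size,
               * 0 otherwise (including on overflow). */
              int iov_fits(const struct iovec *iov, size_t count,
                           size_t max_msg_size)
              {
                  size_t total;
                  if (iov_total_len(iov, count, &total) != 0)
                      return 0;
                  return total <= max_msg_size;
              }
```

       Under these assumptions, two buffers of 16 and 32 bytes form one
       48-byte message, which fits an endpoint whose max_msg_size is 48 but
       not one whose limit is 47.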

   fi_sendmsg
       The fi_sendmsg call supports data transfers over both connected and
       unconnected endpoints, with the ability to control the send operation
       per call through the use of flags. The fi_sendmsg function takes a
       struct fi_msg as input.

              struct fi_msg {
                  const struct iovec *msg_iov; /* scatter-gather array */
                  void               **desc;   /* local request descriptors */
                  size_t             iov_count;/* # elements in iov */
                  fi_addr_t          addr;     /* optional endpoint address */
                  void               *context; /* user-defined context */
                  uint64_t           data;     /* optional message data */
              };

   fi_inject
       The send inject call is an optimized version of fi_send with the
       following characteristics. The data buffer is available for reuse
       immediately on return from the call, and no CQ entry will be written
       if the transfer completes successfully.

       Conceptually, this means that the fi_inject function behaves as if the
       FI_INJECT transfer flag were set, selective completions are enabled,
       and the FI_COMPLETION flag is not specified. Note that the CQ entry
       will be suppressed even if the default behavior of the endpoint is to
       write CQ entries for all successful completions. See the flags
       discussion below for more details. The requested message size that
       can be used with fi_inject is limited by inject_size.
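
       The inject_size limit suggests a simple dispatch rule: a message that
       fits within inject_size can use fi_inject, while a larger one, up to
       max_msg_size, needs fi_send and a completion before its buffer is
       reused. The sketch below encodes only that decision. The enum and
       function names are hypothetical, and treating inject_size as the
       largest size fi_inject accepts is an assumption of this sketch; in
       real code inject_size would come from the transmit attributes and
       max_msg_size from the endpoint attributes.

```c
              #include <stddef.h>

              /* Hypothetical classification of how len bytes could be sent. */
              enum send_path {
                  SEND_PATH_INJECT,   /* fi_inject: buffer reusable on return */
                  SEND_PATH_SEND,     /* fi_send: reuse only after completion */
                  SEND_PATH_TOO_BIG   /* exceeds max_msg_size */
              };

              /* inject_size is treated as an inclusive upper bound here
               * (an assumption of this sketch). */
              enum send_path choose_send_path(size_t len, size_t inject_size,
                                              size_t max_msg_size)
              {
                  if (len > max_msg_size)
                      return SEND_PATH_TOO_BIG;
                  if (len <= inject_size)
                      return SEND_PATH_INJECT;
                  return SEND_PATH_SEND;
              }
```

       For example, with inject_size 64 and max_msg_size 4096, a 64-byte
       message takes the fi_inject path, a 65-byte message takes fi_send, and
       a 4097-byte message is rejected.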

   fi_senddata
       The send data call is similar to fi_send, but allows for the sending
       of remote CQ data (see FI_REMOTE_CQ_DATA flag) as part of the
       transfer.

   fi_injectdata
       The inject data call is similar to fi_inject, but allows for the
       sending of remote CQ data (see FI_REMOTE_CQ_DATA flag) as part of the
       transfer.

   fi_recv
       The fi_recv call posts a data buffer to the receive queue of the
       corresponding endpoint. Posted receives are matched against incoming
       sends in the order in which they were posted. Message boundaries are
       maintained. The order in which the receives complete is dependent on
       the endpoint type and protocol. For unconnected endpoints, the
       src_addr parameter can be used to indicate that a buffer should be
       posted to receive incoming data from a specific remote endpoint.
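
       The matching rule above can be modeled in a few lines. The sketch
       below is a self-contained illustration, not provider code: it scans
       posted receives in post order and takes the first one whose source
       address matches. The constant ADDR_ANY_SKETCH is a hypothetical
       stand-in for FI_ADDR_UNSPEC, sketch_addr_t stands in for fi_addr_t,
       and the directed argument models whether FI_DIRECTED_RECV is enabled.
       Completion order, which depends on endpoint type and protocol, is not
       modeled.

```c
              #include <stddef.h>
              #include <stdint.h>

              typedef uint64_t sketch_addr_t;     /* stand-in for fi_addr_t */
              #define ADDR_ANY_SKETCH UINT64_MAX  /* stand-in for FI_ADDR_UNSPEC */

              /* One posted receive; only the fields matching needs. */
              struct posted_recv {
                  sketch_addr_t src_addr;   /* ADDR_ANY_SKETCH accepts any sender */
                  int in_use;               /* 1 while still posted */
              };

              /* Returns the index of the first still-posted receive, in post
               * order, that accepts a message from src, or -1 if none does.
               * If directed is 0 (no FI_DIRECTED_RECV), src_addr is ignored. */
              long match_recv(struct posted_recv *q, size_t n,
                              sketch_addr_t src, int directed)
              {
                  for (size_t i = 0; i < n; i++) {
                      if (!q[i].in_use)
                          continue;
                      if (!directed || q[i].src_addr == ADDR_ANY_SKETCH ||
                          q[i].src_addr == src) {
                          q[i].in_use = 0;  /* buffer consumed by this message */
                          return (long)i;
                      }
                  }
                  return -1;
              }
```

       In this model, with receives posted for address 7, for any address,
       and for address 9 (in that order), a message from address 9 consumes
       the wildcard receive first, because it was posted before the receive
       that names address 9 explicitly.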

   fi_recvv
       The fi_recvv call adds support for a scatter-gather list to fi_recv.
       fi_recvv posts the set of data buffers referenced by the iov parameter
       to receive incoming data.

   fi_recvmsg
       The fi_recvmsg call supports posting buffers over both connected and
       unconnected endpoints, with the ability to control the receive
       operation per call through the use of flags. The fi_recvmsg function
       takes a struct fi_msg as input.

FLAGS

       The fi_recvmsg and fi_sendmsg calls allow the user to specify flags
       which can change the default message handling of the endpoint. Flags
       specified with fi_recvmsg / fi_sendmsg override most flags previously
       configured with the endpoint, except where noted (see fi_endpoint(3)).
       The following flags are usable with fi_recvmsg and/or fi_sendmsg.

       FI_REMOTE_CQ_DATA
              Applies to fi_sendmsg and fi_senddata. Indicates that remote CQ
              data is available and should be sent as part of the request.
              See fi_getinfo(3) for additional details on FI_REMOTE_CQ_DATA.

       FI_CLAIM
              Applies to posted receive operations for endpoints configured
              for FI_BUFFERED_RECV or FI_VARIABLE_MSG. This flag is used to
              retrieve a message that was buffered by the provider. See the
              Buffered Receives section for details.

       FI_COMPLETION
              Indicates that a completion entry should be generated for the
              specified operation. The endpoint must be bound to a completion
              queue with FI_SELECTIVE_COMPLETION that corresponds to the
              specified operation, or this flag is ignored.

       FI_DISCARD
              Applies to posted receive operations for endpoints configured
              for FI_BUFFERED_RECV or FI_VARIABLE_MSG. This flag is used to
              free a message that was buffered by the provider. See the
              Buffered Receives section for details.

       FI_MORE
              Indicates that the user has additional requests that will
              immediately be posted after the current call returns. Use of
              this flag may improve performance by enabling the provider to
              optimize its access to the fabric hardware.

       FI_INJECT
              Applies to fi_sendmsg. Indicates that the outbound data buffer
              should be returned to the user immediately after the send call
              returns, even if the operation is handled asynchronously. This
              may require that the underlying provider implementation copy the
              data into a local buffer and transfer out of that buffer. This
              flag can only be used with messages no larger than inject_size.

       FI_MULTI_RECV
              Applies to posted receive operations. This flag allows the user
              to post a single buffer that will receive multiple incoming
              messages. Received messages will be packed into the receive
              buffer until the buffer has been consumed. Use of this flag may
              cause a single posted receive operation to generate multiple
              events as messages are placed into the buffer. The placement of
              received data into the buffer may be subject to
              provider-specific alignment restrictions.

       The buffer will be released by the provider when the available buffer
       space falls below the specified minimum (see FI_OPT_MIN_MULTI_RECV).
       Note that an entry to the associated receive completion queue will
       always be generated when the buffer has been consumed, even if other
       receive completions have been suppressed (i.e. the Rx context has been
       configured for FI_SELECTIVE_COMPLETION). See the FI_MULTI_RECV
       completion flag in fi_cq(3).
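
       The release rule can be illustrated with a small model of one
       FI_MULTI_RECV buffer. This is a self-contained sketch, not provider
       code: the struct and function names are hypothetical, min_free plays
       the role of FI_OPT_MIN_MULTI_RECV, and provider-specific alignment
       padding is deliberately ignored.

```c
              #include <stddef.h>

              /* Hypothetical model of one FI_MULTI_RECV buffer. */
              struct multi_recv_buf {
                  size_t size;      /* total bytes posted */
                  size_t offset;    /* bytes consumed so far */
                  size_t min_free;  /* models FI_OPT_MIN_MULTI_RECV */
                  int released;     /* set once the buffer would be released */
              };

              /* Places a message of len bytes. Returns the offset where it
               * was placed, or -1 if the buffer is released or lacks room.
               * Alignment padding is not modeled. */
              long multi_recv_place(struct multi_recv_buf *b, size_t len)
              {
                  if (b->released || len > b->size - b->offset)
                      return -1;
                  long at = (long)b->offset;
                  b->offset += len;
                  /* Release once remaining space drops below the minimum;
                   * a provider reports this with an FI_MULTI_RECV completion. */
                  if (b->size - b->offset < b->min_free)
                      b->released = 1;
                  return at;
              }
```

       With a 1024-byte buffer and a 256-byte minimum, messages of 300, 400,
       and 100 bytes land at offsets 0, 300, and 700. The last one leaves 224
       bytes free, below the minimum, so the model marks the buffer released,
       which corresponds to the point at which a provider would report the
       FI_MULTI_RECV completion.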

       FI_INJECT_COMPLETE
              Applies to fi_sendmsg. Indicates that a completion should be
              generated when the source buffer(s) may be reused.

       FI_TRANSMIT_COMPLETE
              Applies to fi_sendmsg. Indicates that a completion should not
              be generated until the operation has been successfully
              transmitted and is no longer being tracked by the provider.

       FI_DELIVERY_COMPLETE
              Applies to fi_sendmsg. Indicates that a completion should be
              generated when the operation has been processed by the
              destination.

       FI_FENCE
              Applies to transmits. Indicates that the requested operation,
              also known as the fenced operation, and any operation posted
              after the fenced operation will be deferred until all previous
              operations targeting the same peer endpoint have completed.
              Operations posted after the fencing will see and/or replace the
              results of any operations initiated prior to the fenced
              operation.

       The ordering of operations starting at the posting of the fenced
       operation (inclusive) to the posting of a subsequent fenced operation
       (exclusive) is controlled by the endpoint's ordering semantics.

       FI_MULTICAST
              Applies to transmits. This flag indicates that the address
              specified as the data transfer destination is a multicast
              address. This flag must be used in all multicast transfers, in
              conjunction with a multicast fi_addr_t.

Buffered Receives

       Buffered receives indicate that the networking layer allocates and
       manages the data buffers used to receive network data transfers. As a
       result, received messages must be copied from the network buffers into
       application buffers for processing. However, applications can avoid
       this copy if they are able to process the message in place (directly
       from the networking buffers).

       Handling buffered receives differs based on the size of the message
       being sent. In general, smaller messages are passed directly to the
       application for processing. However, for large messages, an
       application will only receive the start of the message and must claim
       the rest. The details for how small messages are reported and large
       messages may be claimed are described below.

       When a provider receives a message, it will write an entry to the
       completion queue associated with the receiving endpoint. For
       discussion purposes, the completion queue is assumed to be configured
       for FI_CQ_FORMAT_DATA. Since buffered receives are not associated with
       application-posted buffers, the CQ entry op_context will point to a
       struct fi_recv_context.

              struct fi_recv_context {
                  struct fid_ep *ep;
                  void *context;
              };

       The 'ep' field will point to the receiving endpoint or Rx context, and
       'context' will be NULL. The CQ entry's 'buf' will point to a
       provider-managed buffer where the start of the received message is
       located, and 'len' will be set to the total size of the message.

       The maximum message size that a provider can buffer is limited by the
       FI_OPT_BUFFERED_LIMIT option. This threshold can be obtained and may
       be adjusted by the application using the fi_getopt and fi_setopt
       calls, respectively. Any adjustments must be made prior to enabling
       the endpoint. The CQ entry 'buf' will point to a buffer of received
       data. If the sent message is larger than the buffered amount, the CQ
       entry 'flags' will have the FI_MORE bit set. When the FI_MORE bit is
       set, 'buf' will reference at least FI_OPT_BUFFERED_MIN bytes of data
       (see fi_endpoint(3) for more info).

       After being notified that a buffered receive has arrived, applications
       must either claim or discard the message. Typically, small messages
       are processed and discarded, while large messages are claimed.
       However, an application is free to claim or discard any message
       regardless of message size.

       To claim a message, an application must post a receive operation with
       the FI_CLAIM flag set. The struct fi_recv_context returned as part of
       the notification must be provided as the receive operation's context.
       The struct fi_recv_context contains a 'context' field. Applications
       may modify this field prior to claiming the message. When the claim
       operation completes, a standard receive completion entry will be
       generated on the completion queue. The 'context' of the associated CQ
       entry will be set to the 'context' value passed in through the
       fi_recv_context structure, and the CQ entry flags will have the
       FI_CLAIM bit set.

       Buffered receives that are not claimed must be discarded by the
       application when it is done processing the CQ entry data. To discard
       a message, an application must post a receive operation with the
       FI_DISCARD flag set. The struct fi_recv_context returned as part of
       the notification must be provided as the receive operation's context.
       When the FI_DISCARD flag is set for a receive operation, the receive
       input buffer(s) and length parameters are ignored.

       IMPORTANT: Buffered receives must be claimed or discarded in a timely
       manner. Failure to do so may result in increased memory usage for
       network buffering or communication stalls. Once a buffered receive
       has been claimed or discarded, the original CQ entry 'buf' or struct
       fi_recv_context data may no longer be accessed by the application.

       The use of the FI_CLAIM and FI_DISCARD operation flags is also
       described with respect to tagged message transfers in fi_tagged(3).
       Buffered receives of tagged messages will include the message tag as
       part of the CQ entry, if available.

       The handling of buffered receives follows all message ordering
       restrictions assigned to an endpoint. For example, completions may
       indicate the order in which received messages arrived at the receiver
       based on the endpoint attributes.

Variable Length Messages

       Variable length messages, or simply variable messages, are transfers
       where the size of the message is unknown to the receiver prior to the
       message being sent. That is, the recipient of a message does not know
       the amount of data to expect prior to the message arriving. Variable
       messages are most commonly used when the size of message transfers
       varies greatly, with very large messages interspersed with much
       smaller messages, making receive-side message buffering difficult to
       manage. Variable messages are not subject to max message length
       restrictions (i.e. struct fi_ep_attr::max_msg_size limits), and may be
       up to the maximum value of size_t (i.e. SIZE_MAX) in length.

       Variable length messages support requests that the provider allocate
       and manage the network message buffers. As a result, the application
       requirements and provider behavior are identical to those defined for
       supporting the FI_BUFFERED_RECV mode bit. See the Buffered Receives
       section above for details. The main difference is that buffered
       receives are limited by the fi_ep_attr::max_msg_size threshold,
       whereas variable length messages are not.

       Support for variable messages is indicated through the FI_VARIABLE_MSG
       capability bit.

NOTES

       If an endpoint has been configured with FI_MSG_PREFIX, the application
       must include buffer space of size msg_prefix_size, as specified by the
       endpoint attributes. The prefix buffer must occur at the start of the
       data referenced by the buf parameter, or be referenced by the first IO
       vector. Message prefix space cannot be split between multiple IO
       vectors. The size of the prefix buffer should be included as part of
       the total buffer length.
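
       The layout rule amounts to reserving msg_prefix_size bytes at the
       front of the buffer and counting them in len. The sketch below assumes
       a single contiguous allocation; the struct and function names are
       hypothetical, and prefix_size would be taken from the endpoint's
       msg_prefix_size attribute in real code. When an iovec array is used
       instead, the prefix bytes would have to sit entirely within the first
       entry.

```c
              #include <stddef.h>
              #include <stdint.h>
              #include <stdlib.h>

              /* Hypothetical buffer layout for an FI_MSG_PREFIX endpoint. */
              struct prefixed_buf {
                  char  *base;         /* pass as buf (or iov[0].iov_base) */
                  size_t total;        /* pass as len: prefix + payload */
                  char  *payload;      /* application data starts here */
                  size_t payload_len;
              };

              /* Allocates room for the prefix followed by payload_len bytes.
               * Returns 0 on success, -1 on overflow or allocation failure. */
              int prefixed_buf_alloc(struct prefixed_buf *pb,
                                     size_t prefix_size, size_t payload_len)
              {
                  if (payload_len > SIZE_MAX - prefix_size)
                      return -1;
                  pb->total = prefix_size + payload_len;
                  pb->base = malloc(pb->total ? pb->total : 1);
                  if (!pb->base)
                      return -1;
                  pb->payload = pb->base + prefix_size;
                  pb->payload_len = payload_len;
                  return 0;
              }

              void prefixed_buf_free(struct prefixed_buf *pb)
              {
                  free(pb->base);
                  pb->base = pb->payload = NULL;
              }
```

       With a 64-byte prefix and 100 bytes of application data, the call
       would receive a 164-byte length, and the application would write its
       data starting 64 bytes into the buffer.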

RETURN VALUE

       Returns 0 on success. On error, a negative value corresponding to
       fabric errno is returned. Fabric errno values are defined in
       rdma/fi_errno.h.

       See the discussion below for details on handling FI_EAGAIN.

ERRORS

       -FI_EAGAIN
              Indicates that the underlying provider currently lacks the
              resources needed to initiate the requested operation. The
              reasons for a provider returning FI_EAGAIN are varied. However,
              common reasons include insufficient internal buffering or full
              processing queues.

       Insufficient internal buffering is often associated with operations
       that use FI_INJECT. In such cases, additional buffering may become
       available as posted operations complete.

       Full processing queues may be a temporary state related to local
       processing (for example, a large message is being transferred), or
       may be the result of flow control. In the latter case, the queues may
       remain blocked until additional resources are made available at the
       remote side of the transfer.

       In all cases, the operation may be retried after additional resources
       become available. It is strongly recommended that applications check
       for transmit and receive completions after receiving FI_EAGAIN as a
       return value, independent of the operation which failed. This is
       particularly important in cases where manual progress is employed, as
       acknowledgements or flow control messages may need to be processed in
       order to resume execution.
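
       The recommendation to check completions after FI_EAGAIN is commonly
       implemented as a retry loop. The sketch below uses function pointers
       so that it compiles without libfabric: post stands in for a call such
       as fi_send, progress stands in for draining the transmit and receive
       CQs (for example with fi_cq_read), and EAGAIN from <errno.h> stands in
       for FI_EAGAIN. The names post_with_retry, fake_ep, fake_post, and
       fake_progress are hypothetical.

```c
              #include <errno.h>
              #include <sys/types.h>

              typedef ssize_t (*post_fn)(void *arg);   /* e.g. wraps fi_send */
              typedef void (*progress_fn)(void *arg);  /* e.g. reads the CQs */

              /* Hypothetical retry wrapper: returns the first result that is
               * not -EAGAIN, or -EAGAIN after max_tries attempts. */
              ssize_t post_with_retry(post_fn post, progress_fn progress,
                                      void *arg, unsigned max_tries)
              {
                  ssize_t ret = -EAGAIN;
                  for (unsigned i = 0; i < max_tries; i++) {
                      ret = post(arg);
                      if (ret != -EAGAIN)
                          return ret;      /* success or a hard error */
                      progress(arg);       /* reap completions, free resources */
                  }
                  return ret;
              }

              /* Example stand-in endpoint: posts fail until progress() reaps
               * a completion and frees a slot. */
              struct fake_ep {
                  int free_slots;   /* posts accepted while > 0 */
                  int pending;      /* posted operations awaiting completion */
              };

              ssize_t fake_post(void *arg)
              {
                  struct fake_ep *ep = arg;
                  if (ep->free_slots == 0)
                      return -EAGAIN;
                  ep->free_slots--;
                  ep->pending++;
                  return 0;
              }

              void fake_progress(void *arg)
              {
                  struct fake_ep *ep = arg;
                  if (ep->pending > 0) {   /* one completion per call */
                      ep->pending--;
                      ep->free_slots++;
                  }
              }
```

       With the fake endpoint starting full, the first post fails with
       -EAGAIN, one call to fake_progress reaps a completion and frees a
       slot, and the retry then succeeds. Bounding the retries is a choice of
       this sketch; an application could instead keep looping while it
       continues to reap completions.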

SEE ALSO

       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cq(3)

AUTHORS

       OpenFabrics.

Libfabric Programmer's Manual     2019-09-27                         fi_msg(3)