fi_msg(3)                    Libfabric v1.12.0rc1                    fi_msg(3)

NAME

       fi_msg - Message data transfer operations

       fi_recv / fi_recvv / fi_recvmsg
              Post a buffer to receive an incoming message

       fi_send / fi_sendv / fi_sendmsg / fi_inject / fi_senddata
              Initiate an operation to send a message

SYNOPSIS

              #include <rdma/fi_endpoint.h>

              ssize_t fi_recv(struct fid_ep *ep, void *buf, size_t len,
                  void *desc, fi_addr_t src_addr, void *context);

              ssize_t fi_recvv(struct fid_ep *ep, const struct iovec *iov, void **desc,
                  size_t count, fi_addr_t src_addr, void *context);

              ssize_t fi_recvmsg(struct fid_ep *ep, const struct fi_msg *msg,
                  uint64_t flags);

              ssize_t fi_send(struct fid_ep *ep, const void *buf, size_t len,
                  void *desc, fi_addr_t dest_addr, void *context);

              ssize_t fi_sendv(struct fid_ep *ep, const struct iovec *iov,
                  void **desc, size_t count, fi_addr_t dest_addr, void *context);

              ssize_t fi_sendmsg(struct fid_ep *ep, const struct fi_msg *msg,
                  uint64_t flags);

              ssize_t fi_inject(struct fid_ep *ep, const void *buf, size_t len,
                  fi_addr_t dest_addr);

              ssize_t fi_senddata(struct fid_ep *ep, const void *buf, size_t len,
                  void *desc, uint64_t data, fi_addr_t dest_addr, void *context);

              ssize_t fi_injectdata(struct fid_ep *ep, const void *buf, size_t len,
                  uint64_t data, fi_addr_t dest_addr);

ARGUMENTS

       ep     Fabric endpoint on which to initiate a send or post a receive
              buffer.

       buf    Data buffer to send or receive.

       len    Length of the data buffer to send or receive, specified in
              bytes.  Valid transfers are from 0 bytes up to the endpoint's
              max_msg_size.

       iov    Vectored data buffer.

       count  Count of vectored data entries.

       desc   Descriptor associated with the data buffer.  See fi_mr(3).

       data   Remote CQ data to transfer with the sent message.

       dest_addr
              Destination address for connectionless transfers.  Ignored for
              connected endpoints.

       src_addr
              Source address to receive from for connectionless transfers.
              Applies only to connectionless endpoints with the
              FI_DIRECTED_RECV capability enabled; otherwise this field is
              ignored.  If set to FI_ADDR_UNSPEC, any source address may
              match.

       msg    Message descriptor for send and receive operations.

       flags  Additional flags to apply for the send or receive operation.

       context
              User-specified pointer to associate with the operation.  This
              parameter is ignored if the operation will not generate a
              successful completion, unless an op flag specifies that the
              context parameter be used for required input.

DESCRIPTION

       The send functions -- fi_send, fi_sendv, fi_sendmsg, fi_inject, and
       fi_senddata -- are used to transmit a message from one endpoint to
       another endpoint.  The main difference between the send functions is
       the number and type of parameters that they accept as input.
       Otherwise, they perform the same general function.  Messages sent
       using fi_msg operations are received by a remote endpoint into a
       buffer posted to receive such messages.

       The receive functions -- fi_recv, fi_recvv, fi_recvmsg -- post a data
       buffer to an endpoint to receive inbound messages.  Like the send
       operations, receive operations operate asynchronously.  Users should
       not touch the posted data buffer(s) until the receive operation has
       completed.

       An endpoint must be enabled before an application can post send or
       receive operations to it.  For connected endpoints, receive buffers
       may be posted prior to connect or accept being called on the
       endpoint.  This ensures that buffers are available to receive
       incoming data immediately after the connection has been established.

       Completed message operations are reported to the user through one or
       more event collectors associated with the endpoint.  Users provide a
       context that is associated with each operation and is returned to the
       user as part of the event completion.  See fi_cq(3) for completion
       event details.

   fi_send
       The fi_send call transfers the data contained in the user-specified
       data buffer to a remote endpoint, with message boundaries being
       maintained.

   fi_sendv
       The fi_sendv call adds support for a scatter-gather list to fi_send.
       The fi_sendv call transfers the set of data buffers referenced by the
       iov parameter to a remote endpoint as a single message.

   fi_sendmsg
       The fi_sendmsg call supports data transfers over both connected and
       connectionless endpoints, with the ability to control the send
       operation per call through the use of flags.  The fi_sendmsg function
       takes a struct fi_msg as input.

              struct fi_msg {
                  const struct iovec *msg_iov; /* scatter-gather array */
                  void               **desc;   /* local request descriptors */
                  size_t             iov_count;/* # elements in iov */
                  fi_addr_t          addr;     /* optional endpoint address */
                  void               *context; /* user-defined context */
                  uint64_t           data;     /* optional message data */
              };

   fi_inject
       The send inject call is an optimized version of fi_send with the
       following characteristics: the data buffer is available for reuse
       immediately on return from the call, and no CQ entry will be written
       if the transfer completes successfully.

       Conceptually, this means that the fi_inject function behaves as if
       the FI_INJECT transfer flag were set, selective completions are
       enabled, and the FI_COMPLETION flag is not specified.  Note that the
       CQ entry will be suppressed even if the default behavior of the
       endpoint is to write CQ entries for all successful completions.  See
       the flags discussion below for more details.  The message size that
       can be used with fi_inject is limited by inject_size.

   fi_senddata
       The send data call is similar to fi_send, but allows for the sending
       of remote CQ data (see FI_REMOTE_CQ_DATA flag) as part of the
       transfer.

   fi_injectdata
       The inject data call is similar to fi_inject, but allows for the
       sending of remote CQ data (see FI_REMOTE_CQ_DATA flag) as part of the
       transfer.

   fi_recv
       The fi_recv call posts a data buffer to the receive queue of the
       corresponding endpoint.  Posted receives are searched in the order in
       which they were posted in order to match sends.  Message boundaries
       are maintained.  The order in which the receives complete is
       dependent on the endpoint type and protocol.  For connectionless
       endpoints, the src_addr parameter can be used to indicate that a
       buffer should be posted to receive incoming data from a specific
       remote endpoint.

   fi_recvv
       The fi_recvv call adds support for a scatter-gather list to fi_recv.
       The fi_recvv call posts the set of data buffers referenced by the iov
       parameter to receive incoming data.

   fi_recvmsg
       The fi_recvmsg call supports posting buffers over both connected and
       connectionless endpoints, with the ability to control the receive
       operation per call through the use of flags.  The fi_recvmsg function
       takes a struct fi_msg as input.
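
       Both fi_sendmsg and fi_recvmsg take a struct fi_msg, so it can help
       to see one being filled in.  The following is a minimal sketch, not
       libfabric code: so that it compiles without libfabric installed, it
       declares local stand-ins (demo_addr_t, struct demo_msg) whose fields
       mirror the struct fi_msg definition shown above.  The helper names
       demo_build_msg and demo_msg_len are hypothetical, and setting desc to
       NULL assumes that no local memory descriptors are required.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>   /* struct iovec */

/* Stand-ins so the sketch builds without libfabric.  Real code would
 * include <rdma/fi_endpoint.h> and use fi_addr_t and struct fi_msg. */
typedef uint64_t demo_addr_t;

struct demo_msg {
    const struct iovec *msg_iov;   /* scatter-gather array */
    void              **desc;      /* local request descriptors */
    size_t              iov_count; /* # elements in msg_iov */
    demo_addr_t         addr;      /* optional endpoint address */
    void               *context;   /* user-defined context */
    uint64_t            data;      /* optional message data */
};

/* Describe a header and a payload as one two-element message.
 * The caller owns iov[], which must outlive the posted operation. */
static struct demo_msg
demo_build_msg(struct iovec iov[2], void *hdr, size_t hdr_len,
               void *payload, size_t payload_len,
               demo_addr_t dest, void *context)
{
    struct demo_msg msg = {0};

    assert(iov != NULL);
    iov[0].iov_base = hdr;
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = payload;
    iov[1].iov_len  = payload_len;

    msg.msg_iov   = iov;
    msg.desc      = NULL;    /* assumption: no descriptors needed */
    msg.iov_count = 2;
    msg.addr      = dest;    /* ignored for connected endpoints */
    msg.context   = context;
    msg.data      = 0;       /* only used with FI_REMOTE_CQ_DATA */
    return msg;
}

/* Total bytes covered by the descriptor, e.g. to compare against the
 * endpoint's max_msg_size before posting. */
static size_t demo_msg_len(const struct demo_msg *msg)
{
    size_t total = 0;

    for (size_t i = 0; i < msg->iov_count; i++)
        total += msg->msg_iov[i].iov_len;
    return total;
}
```

       With libfabric itself, the equivalent struct fi_msg would be passed
       as the msg argument of fi_sendmsg or fi_recvmsg together with the
       desired flags.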

FLAGS

       The fi_recvmsg and fi_sendmsg calls allow the user to specify flags
       which can change the default message handling of the endpoint.  Flags
       specified with fi_recvmsg / fi_sendmsg override most flags previously
       configured with the endpoint, except where noted (see
       fi_endpoint(3)).  The following flags are usable with fi_recvmsg
       and/or fi_sendmsg.

       FI_REMOTE_CQ_DATA
              Applies to fi_sendmsg and fi_senddata.  Indicates that remote
              CQ data is available and should be sent as part of the
              request.  See fi_getinfo(3) for additional details on
              FI_REMOTE_CQ_DATA.

       FI_CLAIM
              Applies to posted receive operations for endpoints configured
              for FI_BUFFERED_RECV or FI_VARIABLE_MSG.  This flag is used to
              retrieve a message that was buffered by the provider.  See the
              Buffered Receives section for details.

       FI_COMPLETION
              Indicates that a completion entry should be generated for the
              specified operation.  The endpoint must be bound to a
              completion queue with FI_SELECTIVE_COMPLETION that corresponds
              to the specified operation, or this flag is ignored.

       FI_DISCARD
              Applies to posted receive operations for endpoints configured
              for FI_BUFFERED_RECV or FI_VARIABLE_MSG.  This flag is used to
              free a message that was buffered by the provider.  See the
              Buffered Receives section for details.

       FI_MORE
              Indicates that the user has additional requests that will
              immediately be posted after the current call returns.  Use of
              this flag may improve performance by enabling the provider to
              optimize its access to the fabric hardware.

       FI_INJECT
              Applies to fi_sendmsg.  Indicates that the outbound data
              buffer should be returned to the user immediately after the
              send call returns, even if the operation is handled
              asynchronously.  This may require that the underlying provider
              implementation copy the data into a local buffer and transfer
              out of that buffer.  This flag can only be used with messages
              smaller than inject_size.

       FI_MULTI_RECV
              Applies to posted receive operations.  This flag allows the
              user to post a single buffer that will receive multiple
              incoming messages.  Received messages will be packed into the
              receive buffer until the buffer has been consumed.  Use of
              this flag may cause a single posted receive operation to
              generate multiple events as messages are placed into the
              buffer.  The placement of received data into the buffer may be
              subject to provider-specific alignment restrictions.

              The buffer will be released by the provider when the available
              buffer space falls below the specified minimum (see
              FI_OPT_MIN_MULTI_RECV).  Note that an entry to the associated
              receive completion queue will always be generated when the
              buffer has been consumed, even if other receive completions
              have been suppressed (i.e., the Rx context has been configured
              for FI_SELECTIVE_COMPLETION).  See the FI_MULTI_RECV
              completion flag in fi_cq(3).

       FI_INJECT_COMPLETE
              Applies to fi_sendmsg.  Indicates that a completion should be
              generated when the source buffer(s) may be reused.

       FI_TRANSMIT_COMPLETE
              Applies to fi_sendmsg.  Indicates that a completion should not
              be generated until the operation has been successfully
              transmitted and is no longer being tracked by the provider.

       FI_DELIVERY_COMPLETE
              Applies to fi_sendmsg.  Indicates that a completion should be
              generated when the operation has been processed by the
              destination.

       FI_FENCE
              Applies to transmits.  Indicates that the requested operation,
              also known as the fenced operation, and any operation posted
              after the fenced operation will be deferred until all previous
              operations targeting the same peer endpoint have completed.
              Operations posted after the fencing will see and/or replace
              the results of any operations initiated prior to the fenced
              operation.

              The ordering of operations starting at the posting of the
              fenced operation (inclusive) to the posting of a subsequent
              fenced operation (exclusive) is controlled by the endpoint's
              ordering semantics.

       FI_MULTICAST
              Applies to transmits.  This flag indicates that the address
              specified as the data transfer destination is a multicast
              address.  This flag must be used in all multicast transfers,
              in conjunction with a multicast fi_addr_t.
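
       The FI_MULTI_RECV release rule described above (the buffer is
       released once its free space drops below the FI_OPT_MIN_MULTI_RECV
       value) can be pictured with a small model.  This is only an
       illustration of the bookkeeping, not how any provider is implemented:
       all names are hypothetical, alignment padding is ignored, and every
       message is assumed to fit in the space that remains.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a multi-receive buffer.  min_free plays the role of the
 * FI_OPT_MIN_MULTI_RECV setting. */
struct demo_multi_recv {
    size_t size;      /* total bytes posted */
    size_t used;      /* bytes consumed by packed messages */
    size_t min_free;  /* release threshold */
    int    released;  /* nonzero once the buffer has been released */
};

/* Pack one message into the buffer.  Returns the offset at which the
 * message was placed, or (size_t)-1 if the buffer was already released.
 * Assumes msg_len fits in the remaining space and no alignment padding. */
static size_t demo_place(struct demo_multi_recv *b, size_t msg_len)
{
    size_t offset;

    if (b->released)
        return (size_t)-1;
    assert(msg_len <= b->size - b->used);

    offset = b->used;
    b->used += msg_len;

    /* Free space fell below the minimum: the buffer is released, which
     * is the point at which the final completion is always reported. */
    if (b->size - b->used < b->min_free)
        b->released = 1;
    return offset;
}
```

       In this model a 1024-byte buffer with a 256-byte minimum accepts two
       400-byte messages and is then released, because only 224 bytes
       remain.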

Buffered Receives

       Buffered receives indicate that the networking layer allocates and
       manages the data buffers used to receive network data transfers.  As
       a result, received messages must be copied from the network buffers
       into application buffers for processing.  However, applications can
       avoid this copy if they are able to process the message in place
       (directly from the networking buffers).

       Handling buffered receives differs based on the size of the message
       being sent.  In general, smaller messages are passed directly to the
       application for processing.  However, for large messages, an
       application will only receive the start of the message and must claim
       the rest.  The details for how small messages are reported and large
       messages may be claimed are described below.

       When a provider receives a message, it will write an entry to the
       completion queue associated with the receiving endpoint.  For
       discussion purposes, the completion queue is assumed to be configured
       for FI_CQ_FORMAT_DATA.  Since buffered receives are not associated
       with application posted buffers, the CQ entry op_context will point
       to a struct fi_recv_context.

              struct fi_recv_context {
                  struct fid_ep *ep;
                  void *context;
              };

       The 'ep' field will point to the receiving endpoint or Rx context,
       and 'context' will be NULL.  The CQ entry's 'buf' will point to a
       provider managed buffer where the start of the received message is
       located, and 'len' will be set to the total size of the message.

       The maximum size of a message that a provider can buffer is limited
       by FI_OPT_BUFFERED_LIMIT.  This threshold can be obtained and may be
       adjusted by the application using the fi_getopt and fi_setopt calls,
       respectively.  Any adjustments must be made prior to enabling the
       endpoint.  The CQ entry 'buf' will point to a buffer of received
       data.  If the sent message is larger than the buffered amount, the CQ
       entry 'flags' will have the FI_MORE bit set.  When the FI_MORE bit is
       set, 'buf' will reference at least FI_OPT_BUFFERED_MIN bytes of data
       (see fi_endpoint(3) for more info).

       After being notified that a buffered receive has arrived,
       applications must either claim or discard the message.  Typically,
       small messages are processed and discarded, while large messages are
       claimed.  However, an application is free to claim or discard any
       message regardless of message size.

       To claim a message, an application must post a receive operation with
       the FI_CLAIM flag set.  The struct fi_recv_context returned as part
       of the notification must be provided as the receive operation's
       context.  The struct fi_recv_context contains a 'context' field.
       Applications may modify this field prior to claiming the message.
       When the claim operation completes, a standard receive completion
       entry will be generated on the completion queue.  The 'context' of
       the associated CQ entry will be set to the 'context' value passed in
       through the fi_recv_context structure, and the CQ entry flags will
       have the FI_CLAIM bit set.

       Buffered receives that are not claimed must be discarded by the
       application when it is done processing the CQ entry data.  To discard
       a message, an application must post a receive operation with the
       FI_DISCARD flag set.  The struct fi_recv_context returned as part of
       the notification must be provided as the receive operation's context.
       When the FI_DISCARD flag is set for a receive operation, the receive
       input buffer(s) and length parameters are ignored.

       IMPORTANT: Buffered receives must be claimed or discarded in a timely
       manner.  Failure to do so may result in increased memory usage for
       network buffering or communication stalls.  Once a buffered receive
       has been claimed or discarded, the original CQ entry 'buf' or struct
       fi_recv_context data may no longer be accessed by the application.

       The use of the FI_CLAIM and FI_DISCARD operation flags is also
       described with respect to tagged message transfers in fi_tagged(3).
       Buffered receives of tagged messages will include the message tag as
       part of the CQ entry, if available.

       The handling of buffered receives follows all message ordering
       restrictions assigned to an endpoint.  For example, completions may
       indicate the order in which received messages arrived at the receiver
       based on the endpoint attributes.
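
       The claim-or-discard decision described in this section can be
       sketched as a small policy function.  This is one possible policy
       under stated assumptions, not a required one: it claims a message
       only when the CQ entry reports that more data exists than was
       buffered, and discards it otherwise.  The DEMO_* bit values and
       struct demo_cq_entry are placeholders for illustration; real code
       must use FI_MORE, FI_CLAIM, and FI_DISCARD from the libfabric headers
       and the CQ entry format configured for the completion queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for illustration only.  They are NOT the
 * values libfabric assigns to FI_MORE, FI_CLAIM, or FI_DISCARD. */
#define DEMO_MORE     (UINT64_C(1) << 0)
#define DEMO_CLAIM    (UINT64_C(1) << 1)
#define DEMO_DISCARD  (UINT64_C(1) << 2)

/* The fields of a buffered-receive CQ entry that the policy looks at. */
struct demo_cq_entry {
    uint64_t flags;  /* DEMO_MORE set if the message exceeds 'buf' */
    size_t   len;    /* total size of the message */
};

/* Choose the flag for the follow-up receive operation.
 * Policy (an assumption of this sketch): if only the start of the
 * message was delivered, claim the rest; otherwise the data can be
 * processed in place and the provider buffer discarded. */
static uint64_t demo_followup_flag(const struct demo_cq_entry *entry)
{
    assert(entry != NULL);
    return (entry->flags & DEMO_MORE) ? DEMO_CLAIM : DEMO_DISCARD;
}
```

       In either case the follow-up receive would be posted with the struct
       fi_recv_context from the notification as its context.  When
       discarding, the buffer and length arguments are ignored, as noted
       above.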

Variable Length Messages

       Variable length messages, or simply variable messages, are transfers
       where the size of the message is unknown to the receiver prior to the
       message being sent.  That is, the recipient of a message does not
       know the amount of data to expect prior to the message arriving.
       This feature is most commonly used when the size of message transfers
       varies greatly, with very large messages interspersed with much
       smaller messages, making receive side message buffering difficult to
       manage.  Variable messages are not subject to max message length
       restrictions (i.e., struct fi_ep_attr::max_msg_size limits), and may
       be up to the maximum value of size_t (i.e., SIZE_MAX) in length.

       Support for variable length messages requests that the provider
       allocate and manage the network message buffers.  As a result, the
       application requirements and provider behavior are identical to those
       defined for supporting the FI_BUFFERED_RECV mode bit.  See the
       Buffered Receives section above for details.  The main difference is
       that buffered receives are limited by the fi_ep_attr::max_msg_size
       threshold, whereas variable length messages are not.

       Support for variable messages is indicated through the
       FI_VARIABLE_MSG capability bit.

NOTES

       If an endpoint has been configured with FI_MSG_PREFIX, the
       application must include buffer space of size msg_prefix_size, as
       specified by the endpoint attributes.  The prefix buffer must occur
       at the start of the data referenced by the buf parameter, or be
       referenced by the first IO vector.  Message prefix space cannot be
       split between multiple IO vectors.  The size of the prefix buffer
       should be included as part of the total buffer length.
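
       The buffer layout required by FI_MSG_PREFIX can be sketched as
       follows.  This is a minimal illustration under stated assumptions:
       the structure and function names are hypothetical, and prefix_size
       stands for the msg_prefix_size value reported in the endpoint
       attributes.

```c
#include <assert.h>
#include <stddef.h>

/* How one contiguous allocation is divided when prefix space is
 * required at the start of the buffer. */
struct demo_prefixed_buf {
    void  *buf;      /* start of the prefix; pass this as buf */
    size_t len;      /* prefix plus payload; pass this as len */
    void  *payload;  /* where the application's data begins */
};

/* Compute the layout for a payload of payload_len bytes.  The prefix
 * and payload must both fit inside the storage provided. */
static struct demo_prefixed_buf
demo_layout(void *storage, size_t storage_len,
            size_t prefix_size, size_t payload_len)
{
    struct demo_prefixed_buf view;

    assert(storage != NULL);
    assert(prefix_size <= storage_len);
    assert(payload_len <= storage_len - prefix_size);

    view.buf     = storage;
    view.len     = prefix_size + payload_len;  /* prefix counts toward len */
    view.payload = (char *)storage + prefix_size;
    return view;
}
```

       For a vectored transfer, the same rule would mean that the first IO
       vector entry starts with the whole prefix region.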

RETURN VALUE

       Returns 0 on success.  On error, a negative value corresponding to
       fabric errno is returned.  Fabric errno values are defined in
       rdma/fi_errno.h.

       See the discussion below for details on handling FI_EAGAIN.

ERRORS

       -FI_EAGAIN
              Indicates that the underlying provider currently lacks the
              resources needed to initiate the requested operation.  The
              reasons for a provider returning FI_EAGAIN are varied.
              However, common reasons include insufficient internal
              buffering or full processing queues.

              Insufficient internal buffering is often associated with
              operations that use FI_INJECT.  In such cases, additional
              buffering may become available as posted operations complete.

              Full processing queues may be a temporary state related to
              local processing (for example, a large message is being
              transferred), or may be the result of flow control.  In the
              latter case, the queues may remain blocked until additional
              resources are made available at the remote side of the
              transfer.

              In all cases, the operation may be retried after additional
              resources become available.  It is strongly recommended that
              applications check for transmit and receive completions after
              receiving FI_EAGAIN as a return value, independent of the
              operation which failed.  This is particularly important in
              cases where manual progress is employed, as acknowledgements
              or flow control messages may need to be processed in order to
              resume execution.
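
       The recommendation to check for completions after FI_EAGAIN can be
       sketched as a retry loop.  This is a minimal sketch under stated
       assumptions, not libfabric code: DEMO_EAGAIN stands in for FI_EAGAIN,
       the callback types are hypothetical wrappers (for example around
       fi_send and a routine that reads the endpoint's completion queues),
       and the demo_fake_* functions are a toy provider that exists only to
       exercise the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Stand-in for FI_EAGAIN; real code uses the value from rdma/fi_errno.h. */
#define DEMO_EAGAIN EAGAIN

typedef ssize_t (*demo_post_fn)(void *arg);      /* posts one operation */
typedef void    (*demo_progress_fn)(void *arg);  /* reaps completions */

/* Post an operation, reaping completions between attempts whenever the
 * post reports -DEMO_EAGAIN.  Gives up after max_tries attempts. */
static ssize_t demo_post_with_retry(demo_post_fn post,
                                    demo_progress_fn progress,
                                    void *arg, int max_tries)
{
    ssize_t ret = -DEMO_EAGAIN;

    assert(post != NULL && progress != NULL);
    for (int i = 0; i < max_tries; i++) {
        ret = post(arg);
        if (ret != -DEMO_EAGAIN)
            break;        /* success, or an error that retrying won't fix */
        progress(arg);    /* check transmit and receive completions */
    }
    return ret;
}

/* Toy provider: refuses to post until one completion has been reaped. */
struct demo_state {
    int queue_full;
    int reaped;
};

static ssize_t demo_fake_post(void *arg)
{
    struct demo_state *s = arg;
    return s->queue_full ? -DEMO_EAGAIN : 0;
}

static void demo_fake_progress(void *arg)
{
    struct demo_state *s = arg;
    s->reaped++;
    s->queue_full = 0;    /* reaping a completion frees a queue slot */
}
```

       Bounding the number of attempts is a design choice of this sketch; an
       application might instead retry until a deadline or interleave other
       work between attempts.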

SEE ALSO

       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cq(3)

AUTHORS

       OpenFabrics.

Libfabric Programmer's Manual     2020-10-14                         fi_msg(3)