fi_msg(3)                      Libfabric v1.7.0                      fi_msg(3)

NAME

       fi_msg - Message data transfer operations

       fi_recv / fi_recvv / fi_recvmsg
              Post a buffer to receive an incoming message

       fi_send / fi_sendv / fi_sendmsg
       fi_inject / fi_senddata / fi_injectdata
              Initiate an operation to send a message

SYNOPSIS

              #include <rdma/fi_endpoint.h>

              ssize_t fi_recv(struct fid_ep *ep, void *buf, size_t len,
                  void *desc, fi_addr_t src_addr, void *context);

              ssize_t fi_recvv(struct fid_ep *ep, const struct iovec *iov,
                  void **desc, size_t count, fi_addr_t src_addr, void *context);

              ssize_t fi_recvmsg(struct fid_ep *ep, const struct fi_msg *msg,
                  uint64_t flags);

              ssize_t fi_send(struct fid_ep *ep, const void *buf, size_t len,
                  void *desc, fi_addr_t dest_addr, void *context);

              ssize_t fi_sendv(struct fid_ep *ep, const struct iovec *iov,
                  void **desc, size_t count, fi_addr_t dest_addr, void *context);

              ssize_t fi_sendmsg(struct fid_ep *ep, const struct fi_msg *msg,
                  uint64_t flags);

              ssize_t fi_inject(struct fid_ep *ep, const void *buf, size_t len,
                  fi_addr_t dest_addr);

              ssize_t fi_senddata(struct fid_ep *ep, const void *buf, size_t len,
                  void *desc, uint64_t data, fi_addr_t dest_addr, void *context);

              ssize_t fi_injectdata(struct fid_ep *ep, const void *buf, size_t len,
                  uint64_t data, fi_addr_t dest_addr);

ARGUMENTS

       ep     Fabric endpoint on which to initiate a send or post a receive
              buffer.

       buf    Data buffer to send or receive.

       len    Length of the data buffer to send or receive, specified in bytes.
              Valid transfers are from 0 bytes up to the endpoint's
              max_msg_size.

       iov    Vectored data buffer.

       count  Count of vectored data entries.

       desc   Descriptor associated with the data buffer.

       data   Remote CQ data to transfer with the sent message.

       dest_addr
              Destination address for connectionless transfers. Ignored for
              connected endpoints.

       src_addr
              Source address to receive from for connectionless transfers.
              Applies only to connectionless endpoints with the
              FI_DIRECTED_RECV capability enabled; otherwise this field is
              ignored. If set to FI_ADDR_UNSPEC, any source address may match.

       msg    Message descriptor for send and receive operations.

       flags  Additional flags to apply for the send or receive operation.

       context
              User-specified pointer to associate with the operation. This
              parameter is ignored if the operation will not generate a
              successful completion, unless an op flag specifies that the
              context parameter be used for required input.

DESCRIPTION

       The send functions (fi_send, fi_sendv, fi_sendmsg, fi_inject,
       fi_senddata, and fi_injectdata) are used to transmit a message from one
       endpoint to another endpoint. The main difference between the send
       functions is the number and type of parameters that they accept as
       input. Otherwise, they perform the same general function. Messages sent
       using fi_msg operations are received by a remote endpoint into a buffer
       posted to receive such messages.

       The receive functions (fi_recv, fi_recvv, and fi_recvmsg) post a data
       buffer to an endpoint to receive inbound messages. Like the send
       operations, receive operations operate asynchronously. Users should not
       touch the posted data buffer(s) until the receive operation has
       completed.

       An endpoint must be enabled before an application can post send or
       receive operations to it. For connected endpoints, receive buffers may
       be posted prior to connect or accept being called on the endpoint. This
       ensures that buffers are available to receive incoming data immediately
       after the connection has been established.

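       The posting rules above can be summarized as two predicates. This is a
       hypothetical sketch: the enum and helper names are invented for
       illustration and are not part of libfabric. In real code, EP_ENABLED
       corresponds to a successful fi_enable() call. EP_CONNECTED corresponds
       to an established connection after fi_connect() or fi_accept() on an
       FI_EP_MSG endpoint. The send rule also reflects the fi_send
       requirement, described below, that connected endpoints be connected
       first.

```c
#include <stdbool.h>

/* Hypothetical endpoint states, for illustration only; libfabric does not
 * define this enum. */
enum ep_state_sketch {
    EP_CREATED,   /* opened and bound, fi_enable() not yet called */
    EP_ENABLED,   /* fi_enable() succeeded */
    EP_CONNECTED  /* connection established (FI_EP_MSG endpoints) */
};

/* Receives may be posted once the endpoint is enabled, which includes the
 * window before fi_connect()/fi_accept() on a connected endpoint. */
static bool can_post_recv(enum ep_state_sketch s)
{
    return s >= EP_ENABLED;
}

/* Sends also require an enabled endpoint; a connection-oriented endpoint
 * must additionally be connected before fi_send() is called. */
static bool can_post_send(enum ep_state_sketch s, bool connection_oriented)
{
    if (s < EP_ENABLED)
        return false;
    return !connection_oriented || s == EP_CONNECTED;
}
```
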
       Completed message operations are reported to the user through one or
       more event collectors associated with the endpoint. Users provide a
       context that is associated with each operation and returned to the user
       as part of the completion event. See fi_cq(3) for completion event
       details.

   fi_send
       The call fi_send transfers the data contained in the user-specified
       data buffer to a remote endpoint, with message boundaries being
       maintained. For connection-based endpoints (FI_EP_MSG) the local
       endpoint must be connected to a remote endpoint or destination before
       fi_send is called. Unless the endpoint has been configured differently,
       the data buffer passed into fi_send must not be touched by the
       application until the fi_send call completes asynchronously.

   fi_sendv
       The fi_sendv call adds support for a scatter-gather list to fi_send.
       fi_sendv transfers the set of data buffers referenced by the iov
       parameter to a remote endpoint as a single message.

   fi_sendmsg
       The fi_sendmsg call supports data transfers over both connected and
       unconnected endpoints, with the ability to control the send operation
       per call through the use of flags. The fi_sendmsg function takes a
       struct fi_msg as input.

              struct fi_msg {
                  const struct iovec *msg_iov; /* scatter-gather array */
                  void               **desc;   /* local request descriptors */
                  size_t             iov_count;/* # elements in iov */
                  fi_addr_t          addr;     /* optional endpoint address */
                  void               *context; /* user-defined context */
                  uint64_t           data;     /* optional message data */
              };

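       The sketch below shows how the fields relate by describing a header
       and a payload as one two-entry message. It uses a local stand-in
       struct (msg_sketch) with the same fields as struct fi_msg, so it
       compiles without the libfabric headers. The helper names are
       hypothetical. With libfabric, the same values would be stored in a
       struct fi_msg and passed to fi_sendmsg(). Alternatively, the iov,
       desc, and count values would be passed directly to fi_sendv().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Local stand-in mirroring the struct fi_msg fields shown above, so this
 * sketch builds without <rdma/fi_endpoint.h>. Not a libfabric type. */
struct msg_sketch {
    const struct iovec *msg_iov;   /* scatter-gather array */
    void              **desc;      /* local descriptors, one per iov entry */
    size_t              iov_count; /* number of entries in msg_iov */
    uint64_t            addr;      /* destination; ignored when connected */
    void               *context;   /* returned with the completion */
    uint64_t            data;      /* remote CQ data (FI_REMOTE_CQ_DATA) */
};

/* Hypothetical helper: describe hdr + payload as one two-entry message,
 * the same layout fi_sendv() accepts through iov/desc/count. */
static void fill_two_part_msg(struct msg_sketch *msg, struct iovec iov[2],
                              void *desc[2], void *hdr, size_t hdr_len,
                              void *payload, size_t payload_len,
                              uint64_t dest, void *ctx)
{
    iov[0].iov_base = hdr;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = payload;
    iov[1].iov_len = payload_len;

    memset(msg, 0, sizeof(*msg));   /* data = 0: no remote CQ data */
    msg->msg_iov = iov;
    msg->desc = desc;
    msg->iov_count = 2;
    msg->addr = dest;
    msg->context = ctx;
}

/* Total number of bytes the receiver sees as a single message. */
static size_t msg_total_len(const struct msg_sketch *msg)
{
    size_t total = 0;

    for (size_t i = 0; i < msg->iov_count; i++)
        total += msg->msg_iov[i].iov_len;
    return total;
}
```

       Both entries are delivered as one message, so the receiver sees a
       single message whose length is the sum of the iov_len values.
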
   fi_inject
       The send inject call is an optimized version of fi_send. The fi_inject
       function behaves as if the FI_INJECT transfer flag were set, and
       FI_COMPLETION were not. That is, the data buffer is available for reuse
       immediately on returning from fi_inject, and no completion event will
       be generated for this send. The completion event will be suppressed
       even if the CQ was bound without FI_SELECTIVE_COMPLETION or the
       endpoint's op_flags contain FI_COMPLETION. See the flags discussion
       below for more details. The size of a message sent with fi_inject may
       not exceed the endpoint's inject_size.

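       The sketch below captures the two rules above. It assumes the caller
       has already read inject_size from the endpoint's transmit attributes
       (struct fi_tx_attr). A payload qualifies for fi_inject only if it is
       no larger than inject_size. Also, fi_inject never produces a local
       completion, regardless of how the CQ was bound. The helper names are
       hypothetical. For other send calls, the last argument stands for
       whether FI_COMPLETION is in effect, either from the endpoint's
       op_flags or from the fi_sendmsg flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: may a payload of len bytes be sent with fi_inject()?
 * inject_size is assumed to come from info->tx_attr->inject_size. */
static bool fits_inject(size_t len, size_t inject_size)
{
    return len <= inject_size;
}

/* Hypothetical model of whether a local send completion is written to the
 * CQ, following the fi_inject and FI_COMPLETION descriptions. */
static bool send_completion_expected(bool via_fi_inject, bool cq_selective,
                                     bool completion_flag)
{
    if (via_fi_inject)
        return false;          /* always suppressed for fi_inject */
    if (!cq_selective)
        return true;           /* CQ not bound with FI_SELECTIVE_COMPLETION */
    return completion_flag;    /* selective: only if FI_COMPLETION applies */
}
```
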
   fi_senddata
       The send data call is similar to fi_send, but allows for the sending of
       remote CQ data (see FI_REMOTE_CQ_DATA flag) as part of the transfer.

   fi_injectdata
       The inject data call is similar to fi_inject, but allows for the
       sending of remote CQ data (see FI_REMOTE_CQ_DATA flag) as part of the
       transfer.

   fi_recv
       The fi_recv call posts a data buffer to the receive queue of the
       corresponding endpoint. Posted receives are searched in the order in
       which they were posted to match incoming sends. Message boundaries are
       maintained. The order in which the receives complete is dependent on
       the endpoint type and protocol. For unconnected endpoints, the src_addr
       parameter can be used to indicate that a buffer should be posted to
       receive incoming data from a specific remote endpoint.

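       The matching order described above can be modeled as a first-fit scan
       over the posted receives. This hypothetical sketch ignores message
       sizes and completion ordering. It uses a local constant in place of
       FI_ADDR_UNSPEC. The directed argument stands for whether the endpoint
       has the FI_DIRECTED_RECV capability; without it, src_addr is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for FI_ADDR_UNSPEC ("match any source"). */
#define ADDR_UNSPEC_SKETCH ((uint64_t)-1)

/* Hypothetical record of one posted receive. */
struct posted_recv_sketch {
    uint64_t src_addr;   /* src_addr passed to fi_recv() */
    bool     consumed;   /* already matched to an incoming message */
};

/* Return the index of the first unconsumed receive, in posting order,
 * that accepts a message from src, or -1 if none matches. */
static int match_recv(struct posted_recv_sketch *q, size_t n,
                      uint64_t src, bool directed)
{
    for (size_t i = 0; i < n; i++) {
        if (q[i].consumed)
            continue;
        if (!directed || q[i].src_addr == ADDR_UNSPEC_SKETCH ||
            q[i].src_addr == src) {
            q[i].consumed = true;
            return (int)i;
        }
    }
    return -1;
}
```

       For example, with receives posted for sources 5, any, and 9, a
       directed message from source 9 first matches the "any" entry (index
       1). The next such message matches index 2. A third finds no receive.
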
   fi_recvv
       The fi_recvv call adds support for a scatter-gather list to fi_recv.
       fi_recvv posts the set of data buffers referenced by the iov parameter
       to receive incoming data.

   fi_recvmsg
       The fi_recvmsg call supports posting buffers over both connected and
       unconnected endpoints, with the ability to control the receive
       operation per call through the use of flags. The fi_recvmsg function
       takes a struct fi_msg as input.

FLAGS

       The fi_recvmsg and fi_sendmsg calls allow the user to specify flags
       which can change the default message handling of the endpoint. Flags
       specified with fi_recvmsg / fi_sendmsg override most flags previously
       configured with the endpoint, except where noted (see fi_endpoint(3)).
       The following flags are usable with fi_recvmsg and/or fi_sendmsg.

       FI_REMOTE_CQ_DATA
              Applies to fi_sendmsg and fi_senddata. Indicates that remote CQ
              data is available and should be sent as part of the request.
              See fi_getinfo(3) for additional details on FI_REMOTE_CQ_DATA.

       FI_CLAIM
              Applies to posted receive operations for endpoints configured
              for FI_BUFFERED_RECV or FI_VARIABLE_MSG. This flag is used to
              retrieve a message that was buffered by the provider. See the
              Buffered Receives section for details.

       FI_COMPLETION
              Indicates that a completion entry should be generated for the
              specified operation. The endpoint must be bound to a completion
              queue with FI_SELECTIVE_COMPLETION that corresponds to the
              specified operation, or this flag is ignored.

       FI_DISCARD
              Applies to posted receive operations for endpoints configured
              for FI_BUFFERED_RECV or FI_VARIABLE_MSG. This flag is used to
              free a message that was buffered by the provider. See the
              Buffered Receives section for details.

       FI_MORE
              Indicates that the user has additional requests that will
              immediately be posted after the current call returns. Use of
              this flag may improve performance by enabling the provider to
              optimize its access to the fabric hardware.

       FI_INJECT
              Applies to fi_sendmsg. Indicates that the outbound data buffer
              should be returned to the user immediately after the send call
              returns, even if the operation is handled asynchronously. This
              may require that the underlying provider implementation copy the
              data into a local buffer and transfer out of that buffer. This
              flag can only be used with messages no larger than inject_size.

       FI_MULTI_RECV
              Applies to posted receive operations. This flag allows the user
              to post a single buffer that will receive multiple incoming
              messages. Received messages will be packed into the receive
              buffer until the buffer has been consumed. Use of this flag may
              cause a single posted receive operation to generate multiple
              events as messages are placed into the buffer. The placement of
              received data into the buffer may be subject to
              provider-specific alignment restrictions.

       The buffer will be released by the provider when the available buffer
       space falls below the specified minimum (see FI_OPT_MIN_MULTI_RECV).
       Note that an entry to the associated receive completion queue will
       always be generated when the buffer has been consumed, even if other
       receive completions have been suppressed (i.e. the Rx context has been
       configured for FI_SELECTIVE_COMPLETION). See the FI_MULTI_RECV
       completion flag in fi_cq(3).

       FI_INJECT_COMPLETE
              Applies to fi_sendmsg. Indicates that a completion should be
              generated when the source buffer(s) may be reused.

       FI_TRANSMIT_COMPLETE
              Applies to fi_sendmsg. Indicates that a completion should not
              be generated until the operation has been successfully
              transmitted and is no longer being tracked by the provider.

       FI_DELIVERY_COMPLETE
              Applies to fi_sendmsg. Indicates that a completion should be
              generated when the operation has been processed by the
              destination.

       FI_FENCE
              Applies to transmits. Indicates that the requested operation,
              also known as the fenced operation, and any operation posted
              after the fenced operation will be deferred until all previous
              operations targeting the same peer endpoint have completed.
              Operations posted after the fencing will see and/or replace the
              results of any operations initiated prior to the fenced
              operation.

       The ordering of operations starting at the posting of the fenced
       operation (inclusive) to the posting of a subsequent fenced operation
       (exclusive) is controlled by the endpoint's ordering semantics.

       FI_MULTICAST
              Applies to transmits. This flag indicates that the address
              specified as the data transfer destination is a multicast
              address. This flag must be used in all multicast transfers, in
              conjunction with a multicast fi_addr_t.

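       The FI_MULTI_RECV behavior can be illustrated with a simple model of
       one posted buffer. In this sketch, the min_free field plays the role
       of the FI_OPT_MIN_MULTI_RECV threshold. In real code that threshold is
       set with fi_setopt() at the FI_OPT_ENDPOINT level. The align field is
       an assumed placement alignment standing in for whatever restriction a
       provider may impose. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one buffer posted with FI_MULTI_RECV. */
struct multi_recv_sketch {
    size_t size;      /* total bytes posted */
    size_t used;      /* bytes consumed so far */
    size_t min_free;  /* stands in for FI_OPT_MIN_MULTI_RECV */
    size_t align;     /* assumed placement alignment, a power of two */
    bool   released;  /* provider has handed the buffer back */
};

/* Place a len-byte message and return its offset, or (size_t)-1 if the
 * buffer was already released or, in this model, the message does not
 * fit. Once free space drops below min_free the buffer is released;
 * that is when the FI_MULTI_RECV completion flag would be reported. */
static size_t multi_recv_place(struct multi_recv_sketch *b, size_t len)
{
    size_t off;

    if (b->released)
        return (size_t)-1;
    off = (b->used + b->align - 1) & ~(b->align - 1);  /* round up */
    if (off > b->size || len > b->size - off)
        return (size_t)-1;
    b->used = off + len;
    if (b->size - b->used < b->min_free)
        b->released = true;
    return off;
}
```

       For example, take a 256-byte buffer with min_free of 64 and align of
       8. Messages of 10, 20, 150, and 3 bytes land at offsets 0, 16, 40, and
       192. After the fourth message only 61 bytes remain, so the buffer is
       released.
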

Buffered Receives

       Buffered receives indicate that the networking layer allocates and
       manages the data buffers used to receive network data transfers. As a
       result, received messages must be copied from the network buffers into
       application buffers for processing. However, applications can avoid
       this copy if they are able to process the message in place (directly
       from the networking buffers).

       Handling buffered receives differs based on the size of the message
       being sent. In general, smaller messages are passed directly to the
       application for processing. However, for large messages, an application
       will only receive the start of the message and must claim the rest.
       The details for how small messages are reported and large messages may
       be claimed are described below.

       When a provider receives a message, it will write an entry to the
       completion queue associated with the receiving endpoint. For discussion
       purposes, the completion queue is assumed to be configured for
       FI_CQ_FORMAT_DATA. Since buffered receives are not associated with
       application-posted buffers, the CQ entry op_context will point to a
       struct fi_recv_context.

              struct fi_recv_context {
                  struct fid_ep *ep;
                  void *context;
              };

       The 'ep' field will point to the receiving endpoint or Rx context, and
       'context' will be NULL. The CQ entry's 'buf' will point to a
       provider-managed buffer where the start of the received message is
       located, and 'len' will be set to the total size of the message.

       The maximum sized message that a provider can buffer is limited by the
       FI_OPT_BUFFERED_LIMIT endpoint option. This threshold can be obtained
       and may be adjusted by the application using the fi_getopt and
       fi_setopt calls, respectively. Any adjustments must be made prior to
       enabling the endpoint. The CQ entry 'buf' will point to a buffer of
       received data. If the sent message is larger than the buffered amount,
       the CQ entry 'flags' will have the FI_MORE bit set. When the FI_MORE
       bit is set, 'buf' will reference at least FI_OPT_BUFFERED_MIN bytes of
       data (see fi_endpoint(3) for more information).

       After being notified that a buffered receive has arrived, applications
       must either claim or discard the message. Typically, small messages
       are processed and discarded, while large messages are claimed.
       However, an application is free to claim or discard any message
       regardless of message size.

       To claim a message, an application must post a receive operation with
       the FI_CLAIM flag set. The struct fi_recv_context returned as part of
       the notification must be provided as the receive operation's context.
       The struct fi_recv_context contains a 'context' field. Applications
       may modify this field prior to claiming the message. When the claim
       operation completes, a standard receive completion entry will be
       generated on the completion queue. The 'context' of the associated CQ
       entry will be set to the 'context' value passed in through the
       fi_recv_context structure, and the CQ entry flags will have the
       FI_CLAIM bit set.

       Buffered receives that are not claimed must be discarded by the
       application when it is done processing the CQ entry data. To discard a
       message, an application must post a receive operation with the
       FI_DISCARD flag set. The struct fi_recv_context returned as part of the
       notification must be provided as the receive operation's context. When
       the FI_DISCARD flag is set for a receive operation, the receive input
       buffer(s) and length parameters are ignored.

       IMPORTANT: Buffered receives must be claimed or discarded in a timely
       manner. Failure to do so may result in increased memory usage for
       network buffering or communication stalls. Once a buffered receive has
       been claimed or discarded, the original CQ entry 'buf' or struct
       fi_recv_context data may no longer be accessed by the application.

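       The claim-or-discard obligation can be sketched as a small state
       machine. The names below are hypothetical. In real code, resolving a
       message means posting a receive with fi_recvmsg() and the FI_CLAIM or
       FI_DISCARD flag. The struct fi_recv_context from the CQ entry is
       supplied as the message context. The policy in should_claim() follows
       the typical pattern described above: claim when FI_MORE indicates
       that only part of the message was buffered. The policy is not
       mandatory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lifecycle of one buffered receive notification. */
enum buffered_state_sketch { BUF_PENDING, BUF_CLAIMED, BUF_DISCARDED };

struct buffered_msg_sketch {
    size_t len;                        /* CQ entry 'len': total size */
    bool   more;                       /* FI_MORE set in CQ entry 'flags' */
    enum buffered_state_sketch state;
};

/* Typical policy: claim partially buffered (large) messages; process
 * small ones in place and then discard them. */
static bool should_claim(const struct buffered_msg_sketch *m)
{
    return m->more;
}

/* Each notification is resolved exactly once, by FI_CLAIM or FI_DISCARD.
 * Returns false if it was already resolved, since its 'buf' and
 * fi_recv_context may no longer be accessed at that point. */
static bool resolve(struct buffered_msg_sketch *m, bool claim)
{
    if (m->state != BUF_PENDING)
        return false;
    m->state = claim ? BUF_CLAIMED : BUF_DISCARDED;
    return true;
}
```
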
       The use of the FI_CLAIM and FI_DISCARD operation flags is also
       described with respect to tagged message transfers in fi_tagged(3).
       Buffered receives of tagged messages will include the message tag as
       part of the CQ entry, if available.

       The handling of buffered receives follows all message ordering
       restrictions assigned to an endpoint. For example, completions may
       indicate the order in which received messages arrived at the receiver
       based on the endpoint attributes.

Variable Length Messages

       Variable length messages, or simply variable messages, are transfers
       where the size of the message is unknown to the receiver prior to the
       message being sent: the recipient does not know the amount of data to
       expect until the message arrives. They are most commonly used when the
       size of message transfers varies greatly, with very large messages
       interspersed with much smaller messages, making receive-side message
       buffering difficult to manage. Variable messages are not subject to
       max message length restrictions (i.e. struct fi_ep_attr::max_msg_size
       limits), and may be up to the maximum value of size_t (i.e. SIZE_MAX)
       in length.

       Variable length message support requires that the provider allocate
       and manage the network message buffers. As a result, the application
       requirements and provider behavior are identical to those defined for
       supporting the FI_BUFFERED_RECV mode bit. See the Buffered Receives
       section above for details. The main difference is that buffered
       receives are limited by the fi_ep_attr::max_msg_size threshold,
       whereas variable length messages are not.

       Support for variable messages is indicated through the FI_VARIABLE_MSG
       capability bit.

NOTES

       If an endpoint has been configured with FI_MSG_PREFIX, the application
       must include buffer space of size msg_prefix_size, as specified by the
       endpoint attributes. The prefix buffer must occur at the start of the
       data referenced by the buf parameter, or be referenced by the first IO
       vector. Message prefix space cannot be split between multiple IO
       vectors. The size of the prefix buffer should be included as part of
       the total buffer length.

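       A sketch of the buffer arithmetic described above. It assumes
       prefix_size holds the msg_prefix_size endpoint attribute (struct
       fi_ep_attr). The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/* Value to pass as 'len': prefix space plus application payload. */
static size_t prefixed_len(size_t payload_len, size_t prefix_size)
{
    return prefix_size + payload_len;
}

/* The application payload starts right after the prefix space. */
static void *payload_ptr(void *buf, size_t prefix_size)
{
    return (char *)buf + prefix_size;
}

/* For vectored calls the whole prefix must fit in iov[0]; it may not be
 * split across entries. */
static bool prefix_iov_ok(const struct iovec *iov, size_t count,
                          size_t prefix_size)
{
    return count > 0 && iov[0].iov_len >= prefix_size;
}
```
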

RETURN VALUE

       Returns 0 on success. On error, a negative value corresponding to
       fabric errno is returned. Fabric errno values are defined in
       rdma/fi_errno.h.

       See the discussion below for details on handling FI_EAGAIN.

ERRORS

       -FI_EAGAIN
              Indicates that the underlying provider currently lacks the
              resources needed to initiate the requested operation. The
              reasons for a provider returning FI_EAGAIN are varied. However,
              common reasons include insufficient internal buffering or full
              processing queues.

       Insufficient internal buffering is often associated with operations
       that use FI_INJECT. In such cases, additional buffering may become
       available as posted operations complete.

       Full processing queues may be a temporary state related to local
       processing (for example, a large message is being transferred), or may
       be the result of flow control. In the latter case, the queues may
       remain blocked until additional resources are made available at the
       remote side of the transfer.

       In all cases, the operation may be retried after additional resources
       become available. It is strongly recommended that applications check
       for transmit and receive completions after receiving FI_EAGAIN as a
       return value, independent of the operation which failed. This is
       particularly important in cases where manual progress is employed, as
       acknowledgements or flow control messages may need to be processed in
       order to resume execution.

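       The retry advice above can be sketched as a loop that drives progress
       between attempts. The post and progress callbacks are hypothetical
       stand-ins. In an application, post would wrap a call such as
       fi_send(), and progress would read both the transmit and receive
       completion queues, for example with fi_cq_read(). The sketch relies
       on rdma/fi_errno.h defining FI_EAGAIN as EAGAIN, so a retryable
       failure is -EAGAIN. The sim_* functions only simulate a queue that
       frees one slot each time progress is made.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

typedef ssize_t (*post_fn_sketch)(void *arg);      /* e.g. wraps fi_send() */
typedef void    (*progress_fn_sketch)(void *arg);  /* e.g. drains Tx/Rx CQs */

/* Retry while the post reports -FI_EAGAIN (== -EAGAIN), making progress
 * between attempts so acknowledgements and flow-control messages can be
 * processed. Returns the last result after at most max_tries attempts. */
static ssize_t post_with_retry(post_fn_sketch post, progress_fn_sketch progress,
                               void *arg, size_t max_tries)
{
    ssize_t ret = -EAGAIN;

    for (size_t i = 0; i < max_tries; i++) {
        ret = post(arg);
        if (ret != -EAGAIN)
            break;
        progress(arg);   /* check transmit and receive completions */
    }
    return ret;
}

/* Simulation only: a queue that stays full until progress has been
 * made 'busy' times. */
struct sim_queue { int busy; int posted; };

static ssize_t sim_post(void *arg)
{
    struct sim_queue *q = arg;

    if (q->busy > 0)
        return -EAGAIN;
    q->posted++;
    return 0;
}

static void sim_progress(void *arg)
{
    struct sim_queue *q = arg;

    if (q->busy > 0)
        q->busy--;   /* a completion freed a slot */
}
```
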

AUTHORS

       OpenFabrics.

Libfabric Programmer's Manual     2018-11-28                         fi_msg(3)