fi_cq(3) - f31

fi_cq(3)                       Libfabric v1.8.0                       fi_cq(3)

NAME

6       fi_cq - Completion queue operations
7
8       fi_cq_open / fi_close
9              Open/close a completion queue
10
11       fi_control
12              Control CQ operation or attributes.
13
14       fi_cq_read / fi_cq_readfrom / fi_cq_readerr
15              Read a completion from a completion queue
16
17       fi_cq_sread / fi_cq_sreadfrom
18              A  synchronous (blocking) read that waits until a specified con‐
19              dition has been met before reading a completion from  a  comple‐
20              tion queue.
21
22       fi_cq_signal
23              Unblock any thread waiting in fi_cq_sread or fi_cq_sreadfrom.
24
25       fi_cq_strerror
26              Converts  provider  specific  error information into a printable
27              string
28

SYNOPSIS

              #include <rdma/fi_domain.h>

              int fi_cq_open(struct fid_domain *domain, struct fi_cq_attr *attr,
                  struct fid_cq **cq, void *context);

              int fi_close(struct fid *cq);

              int fi_control(struct fid *cq, int command, void *arg);

              ssize_t fi_cq_read(struct fid_cq *cq, void *buf, size_t count);

              ssize_t fi_cq_readfrom(struct fid_cq *cq, void *buf, size_t count,
                  fi_addr_t *src_addr);

              ssize_t fi_cq_readerr(struct fid_cq *cq, struct fi_cq_err_entry *buf,
                  uint64_t flags);

              ssize_t fi_cq_sread(struct fid_cq *cq, void *buf, size_t count,
                  const void *cond, int timeout);

              ssize_t fi_cq_sreadfrom(struct fid_cq *cq, void *buf, size_t count,
                  fi_addr_t *src_addr, const void *cond, int timeout);

              int fi_cq_signal(struct fid_cq *cq);

              const char *fi_cq_strerror(struct fid_cq *cq, int prov_errno,
                  const void *err_data, char *buf, size_t len);

ARGUMENTS

59       domain Open resource domain
60
61       cq     Completion queue
62
63       attr   Completion queue attributes
64
65       context
66              User specified context associated with the completion queue.
67
68       buf    For read calls, the data buffer to write completions into.   For
69              write  calls,  a completion to insert into the completion queue.
70              For fi_cq_strerror, an optional buffer that  receives  printable
71              error information.
72
73       count  Number of CQ entries.
74
75       len    Length of data buffer
76
77       src_addr
78              Source address of a completed receive operation
79
80       flags  Additional flags to apply to the operation
81
82       command
83              Command of control operation to perform on CQ.
84
85       arg    Optional control argument
86
87       cond   Condition that must be met before a completion is generated
88
89       timeout
90              Time  in milliseconds to wait.  A negative value indicates infi‐
91              nite timeout.
92
93       prov_errno
94              Provider specific error value
95
96       err_data
97              Provider specific error data related to a completion
98

DESCRIPTION

100       Completion queues are used to report events associated with data trans‐
101       fers.   They are associated with message sends and receives, RMA, atom‐
102       ic, tagged messages, and triggered events.  Reported events are usually
103       associated with a fabric endpoint, but may also refer to memory regions
104       used as the target of an RMA or atomic operation.
105
106   fi_cq_open
107       fi_cq_open allocates a new completion queue.  Unlike event queues, com‐
108       pletion  queues  are  associated with a resource domain and may be off‐
109       loaded entirely in provider hardware.
110
111       The properties and behavior  of  a  completion  queue  are  defined  by
112       struct fi_cq_attr.
113
              struct fi_cq_attr {
                  size_t               size;      /* # entries for CQ */
                  uint64_t             flags;     /* operation flags */
                  enum fi_cq_format    format;    /* completion format */
                  enum fi_wait_obj     wait_obj;  /* requested wait object */
                  int                  signaling_vector; /* interrupt affinity */
                  enum fi_cq_wait_cond wait_cond; /* wait condition format */
                  struct fid_wait     *wait_set;  /* optional wait set */
              };
123
124       size   Specifies  the minimum size of a completion queue.  A value of 0
125              indicates that the provider may choose a default value.
126
127       flags  Flags that control the configuration of the CQ.
128
129       - FI_AFFINITY
130              Indicates that the signaling_vector field (see below) is valid.
131
       format Completion queues allow the application to select the amount of
              detail that it must store and report.  The format attribute
              allows the application to select one of several completion
              formats, indicating the structure of the data that the completion
              queue should return when read.  Supported formats and the
              structures that correspond to each are listed below.  The
              meanings of the CQ entry fields are defined in the Completion
              Fields section.
140
141       - FI_CQ_FORMAT_UNSPEC
142              If  an  unspecified  format is requested, then the CQ will use a
143              provider selected default format.
144
145       - FI_CQ_FORMAT_CONTEXT
146              Provides only user specified context that  was  associated  with
147              the completion.
148
              struct fi_cq_entry {
                  void     *op_context; /* operation context */
              };

       - FI_CQ_FORMAT_MSG
155              Provides  minimal data for processing completions, with expanded
156              support for reporting information about received messages.
157
              struct fi_cq_msg_entry {
                  void     *op_context; /* operation context */
                  uint64_t flags;       /* completion flags */
                  size_t   len;         /* size of received data */
              };

       - FI_CQ_FORMAT_DATA
166              Provides data associated with a  completion.   Includes  support
167              for  received  message length, remote CQ data, and multi-receive
168              buffers.
169
              struct fi_cq_data_entry {
                  void     *op_context; /* operation context */
                  uint64_t flags;       /* completion flags */
                  size_t   len;         /* size of received data */
                  void     *buf;        /* receive data buffer */
                  uint64_t data;        /* completion data */
              };

       - FI_CQ_FORMAT_TAGGED
180              Expands completion data to include support for the  tagged  mes‐
181              sage interfaces.
182
              struct fi_cq_tagged_entry {
                  void     *op_context; /* operation context */
                  uint64_t flags;       /* completion flags */
                  size_t   len;         /* size of received data */
                  void     *buf;        /* receive data buffer */
                  uint64_t data;        /* completion data */
                  uint64_t tag;         /* received tag */
              };
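
       As an illustration of how the selected format determines the layout
       of a read buffer, the following minimal sketch maps each format to
       the size of one entry.  The sketch_ types and enum are hypothetical
       local mirrors of the structures listed above, used only so that the
       example compiles on its own; an application would use the
       definitions provided by <rdma/fi_domain.h> instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirrors of the entry layouts listed above. */
struct sketch_cq_entry {
    void *op_context;
};

struct sketch_cq_msg_entry {
    void *op_context;
    uint64_t flags;
    size_t len;
};

struct sketch_cq_data_entry {
    void *op_context;
    uint64_t flags;
    size_t len;
    void *buf;
    uint64_t data;
};

struct sketch_cq_tagged_entry {
    void *op_context;
    uint64_t flags;
    size_t len;
    void *buf;
    uint64_t data;
    uint64_t tag;
};

/* Hypothetical stand-in for the FI_CQ_FORMAT_* values. */
enum sketch_cq_format {
    SKETCH_FORMAT_CONTEXT,
    SKETCH_FORMAT_MSG,
    SKETCH_FORMAT_DATA,
    SKETCH_FORMAT_TAGGED
};

/* Size in bytes of one completion entry for the given format. */
size_t sketch_entry_size(enum sketch_cq_format format)
{
    switch (format) {
    case SKETCH_FORMAT_CONTEXT: return sizeof(struct sketch_cq_entry);
    case SKETCH_FORMAT_MSG:     return sizeof(struct sketch_cq_msg_entry);
    case SKETCH_FORMAT_DATA:    return sizeof(struct sketch_cq_data_entry);
    case SKETCH_FORMAT_TAGGED:  return sizeof(struct sketch_cq_tagged_entry);
    }
    assert(!"unknown completion format");
    return 0;
}

/* Bytes needed for a buffer that can hold count entries of that format. */
size_t sketch_read_buf_bytes(enum sketch_cq_format format, size_t count)
{
    return count * sketch_entry_size(format);
}
```

       A caller would size the buffer passed to a read call with the format
       that was requested when the CQ was opened, since the read calls
       accept an untyped buffer and an entry count.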
191
192       wait_obj
              CQs may be associated with a specific wait object.  Wait
              objects allow applications to block until the wait object is
              signaled, indicating that a completion is available to be read.
196              Users may use fi_control to retrieve the underlying wait  object
197              associated  with a CQ, in order to use it in other system calls.
198              The following values may be used to specify the type of wait ob‐
199              ject   associated   with  a  CQ:  FI_WAIT_NONE,  FI_WAIT_UNSPEC,
200              FI_WAIT_SET, FI_WAIT_FD, and FI_WAIT_MUTEX_COND.  The default is
201              FI_WAIT_NONE.
202
203       - FI_WAIT_NONE
204              Used to indicate that the user will not block (wait) for comple‐
205              tions on the CQ.  When FI_WAIT_NONE is specified,  the  applica‐
206              tion may not call fi_cq_sread or fi_cq_sreadfrom.
207
208       - FI_WAIT_UNSPEC
209              Specifies  that  the  user will only wait on the CQ using fabric
210              interface calls, such as  fi_cq_sread  or  fi_cq_sreadfrom.   In
211              this case, the underlying provider may select the most appropri‐
212              ate or highest performing wait object available, including  cus‐
213              tom  wait  mechanisms.   Applications that select FI_WAIT_UNSPEC
214              are not guaranteed to retrieve the underlying wait object.
215
216       - FI_WAIT_SET
217              Indicates that the completion queue should use a wait set object
218              to  wait for completions.  If specified, the wait_set field must
219              reference an existing wait set object.
220
221       - FI_WAIT_FD
222              Indicates that the CQ should use a file descriptor as  its  wait
223              mechanism.   A file descriptor wait object must be usable in se‐
224              lect, poll, and epoll routines.  However, a provider may  signal
225              an  FD  wait object by marking it as readable, writable, or with
226              an error.
227
228       - FI_WAIT_MUTEX_COND
229              Specifies that the CQ should use a pthread mutex and cond  vari‐
230              able as a wait object.
231
232       - FI_WAIT_CRITSEC_COND
233              Windows  specific.   Specifies that the CQ should use a critical
234              section and condition variable as a wait object.
235
236       signaling_vector
237              If the FI_AFFINITY flag is set, this indicates the  logical  cpu
238              number  (0..max  cpu - 1) that interrupts associated with the CQ
239              should target.  This field should be treated as a  hint  to  the
240              provider and may be ignored if the provider does not support in‐
241              terrupt affinity.
242
243       wait_cond
244              By default, when a completion is inserted into a  CQ  that  sup‐
245              ports  blocking  reads (fi_cq_sread/fi_cq_sreadfrom), the corre‐
246              sponding wait object is signaled.  Users may specify a condition
247              that must first be met before the wait is satisfied.  This field
248              indicates how the provider  should  interpret  the  cond  field,
249              which describes the condition needed to signal the wait object.
250
251       A  wait  condition should be treated as an optimization.  Providers are
252       not required to meet the requirements of the condition before signaling
253       the  wait object.  Applications should not rely on the condition neces‐
254       sarily being true when a blocking read call returns.
255
       If wait_cond is set to FI_CQ_COND_NONE, then no additional conditions
       are applied to the signaling of the CQ wait object, and the insertion
       of any new entry will trigger the wait condition.  If wait_cond is set
       to FI_CQ_COND_THRESHOLD, then the cond field is interpreted as a size_t
       threshold value.  The threshold indicates the number of entries that
       must be queued at the CQ before the wait is satisfied.
262
263       This field is ignored if wait_obj is set to FI_WAIT_NONE.
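
       The two wait conditions can be modeled with a small predicate.  This
       is only a sketch of the semantics described above, not provider
       code: the enum and function names are hypothetical, and the
       threshold is passed directly as a size_t rather than through the
       cond argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for FI_CQ_COND_NONE and FI_CQ_COND_THRESHOLD. */
enum sketch_wait_cond {
    SKETCH_COND_NONE,
    SKETCH_COND_THRESHOLD
};

/*
 * Returns true when the modeled condition would satisfy the wait, given
 * the number of entries currently queued at the CQ.
 */
bool sketch_wait_satisfied(enum sketch_wait_cond wait_cond,
                           size_t threshold, size_t queued)
{
    assert(wait_cond == SKETCH_COND_NONE ||
           wait_cond == SKETCH_COND_THRESHOLD);

    if (queued == 0)
        return false;               /* nothing has been inserted yet */
    if (wait_cond == SKETCH_COND_NONE)
        return true;                /* any new entry triggers the wait */
    return queued >= threshold;     /* threshold number of entries queued */
}
```

       Because providers may signal the wait object before the condition
       holds, an application should still check how many entries a blocking
       read actually returned rather than assume the threshold was reached.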
264
265       wait_set
266              If  wait_obj is FI_WAIT_SET, this field references a wait object
267              to which the completion queue should attach.  When an  event  is
268              inserted  into  the completion queue, the corresponding wait set
269              will be signaled if all necessary conditions are met.   The  use
270              of  a wait_set enables an optimized method of waiting for events
271              across multiple event and completion queues.  This field is  ig‐
272              nored if wait_obj is not FI_WAIT_SET.
273
274   fi_close
275       The  fi_close  call releases all resources associated with a completion
276       queue.  Any completions which remain on the CQ when it  is  closed  are
277       lost.
278
279       When  closing  the CQ, there must be no opened endpoints, transmit con‐
280       texts, or receive contexts associated with the CQ.   If  resources  are
281       still  associated  with  the CQ when attempting to close, the call will
282       return -FI_EBUSY.
283
284   fi_control
285       The fi_control call is used to access provider or  implementation  spe‐
286       cific  details of the completion queue.  Access to the CQ should be se‐
287       rialized across all calls when fi_control is invoked, as it  may  redi‐
288       rect  the  implementation of CQ operations.  The following control com‐
289       mands are usable with a CQ.
290
291       FI_GETWAIT (void **)
292              This command allows the user to retrieve the low-level wait  ob‐
293              ject  associated  with the CQ.  The format of the wait-object is
294              specified during CQ creation, through the  CQ  attributes.   The
295              fi_control arg parameter should be an address where a pointer to
              the returned wait object will be written.  See fi_eq.3 for
              additional details on using fi_control with FI_GETWAIT.
298
299   fi_cq_read
300       The fi_cq_read operation performs a non-blocking read of completion da‐
301       ta from the CQ.  The format of the completion event is determined using
302       the  fi_cq_format  option  that  was  specified when the CQ was opened.
303       Multiple completions may be retrieved from a CQ in a single call.   The
304       maximum  number  of entries to return is limited to the specified count
305       parameter, with the number of entries successfully read from the CQ re‐
306       turned by the call.  (See return values section below.)
307
308       CQs are optimized to report operations which have completed successful‐
309       ly.  Operations which fail are reported 'out of band'.  Such operations
310       are retrieved using the fi_cq_readerr function.  When an operation that
311       has completed with an unexpected error is encountered, it is placed in‐
312       to a temporary error queue.  Attempting to read from a CQ while an item
313       is in the error queue results in fi_cq_read failing with a return  code
314       of -FI_EAVAIL.  Applications may use this return code to determine when
315       to call fi_cq_readerr.
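
       A caller typically branches on the value returned by fi_cq_read.
       The sketch below shows one way to classify that value.  It makes two
       assumptions that are not stated in this section: that an empty CQ is
       reported as -FI_EAGAIN, with FI_EAGAIN equal to the system EAGAIN,
       and that SKETCH_EAVAIL, a stand-in with an illustrative value, takes
       the place of FI_EAVAIL from the libfabric headers.  All sketch_
       names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Stand-in for FI_EAVAIL; the value is illustrative only. */
#define SKETCH_EAVAIL 259

enum sketch_read_action {
    SKETCH_PROCESS_ENTRIES,  /* ret entries were written to the buffer */
    SKETCH_RETRY_LATER,      /* nothing to read right now */
    SKETCH_CALL_READERR,     /* an entry is waiting in the error queue */
    SKETCH_FAIL              /* any other negative return value */
};

/* Decide what to do with the return value of a non-blocking CQ read. */
enum sketch_read_action sketch_classify_read(ssize_t ret, size_t count)
{
    if (ret >= 0) {
        /* The call never reports more entries than were requested. */
        assert((size_t)ret <= count);
        return SKETCH_PROCESS_ENTRIES;
    }
    if (ret == -EAGAIN)
        return SKETCH_RETRY_LATER;
    if (ret == -SKETCH_EAVAIL)
        return SKETCH_CALL_READERR;
    return SKETCH_FAIL;
}
```

       In a polling loop, SKETCH_CALL_READERR is the point at which the
       application would call fi_cq_readerr to drain the error entry before
       reading from the CQ again.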
316
317   fi_cq_readfrom
       The fi_cq_readfrom call behaves identically to fi_cq_read, except that
       it allows the CQ to return source address information to the user for
       any received data.  Source address data is only available for those
       endpoints configured with the FI_SOURCE capability.  If fi_cq_readfrom
       is called on an endpoint for which source addressing data is not
       available, the source address will be set to FI_ADDR_NOTAVAIL.  The
       number of input src_addr entries must be the same as the count
       parameter.
326
327       Returned  source  addressing  data is converted from the native address
328       used by the underlying fabric into an fi_addr_t, which may be  used  in
329       transmit operations.  Under most circumstances, returning fi_addr_t re‐
330       quires that the source address already have been inserted into the  ad‐
331       dress  vector associated with the receiving endpoint.  This is true for
332       address  vectors  of  type  FI_AV_TABLE.   In  select  providers   when
333       FI_AV_MAP  is  used,  source addresses may be converted algorithmically
334       into a usable fi_addr_t, even though the source address  has  not  been
335       inserted  into the address vector.  This is permitted by the API, as it
336       allows the provider to avoid address look-up as part of receive message
337       processing.   In no case do providers insert addresses into an AV sepa‐
338       rate from an application calling fi_av_insert or similar call.
339
       For endpoints allocated using the FI_SOURCE_ERR capability, if the
       source address cannot be converted into a valid fi_addr_t value,
       fi_cq_readfrom will return -FI_EAVAIL, even if the data were received
       successfully.  The completion will then be reported through
       fi_cq_readerr with error code FI_EADDRNOTAVAIL.  See fi_cq_readerr for
       details.

       If FI_SOURCE is specified without FI_SOURCE_ERR, source addresses which
       cannot be mapped to a usable fi_addr_t will be reported as
       FI_ADDR_NOTAVAIL.
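
       The interaction between FI_SOURCE, FI_SOURCE_ERR, and address
       resolution can be summarized as a small decision function.  This is
       a sketch of the rules above under stated assumptions: the capability
       bits and outcome names are hypothetical stand-ins, and the sketch
       treats FI_SOURCE_ERR as meaningful only when FI_SOURCE is also set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the FI_SOURCE and FI_SOURCE_ERR capabilities. */
#define SKETCH_CAP_SOURCE     (1u << 0)
#define SKETCH_CAP_SOURCE_ERR (1u << 1)

enum sketch_src_outcome {
    SKETCH_SRC_VALID,        /* src_addr holds a usable fi_addr_t */
    SKETCH_SRC_NOTAVAIL,     /* src_addr is set to FI_ADDR_NOTAVAIL */
    SKETCH_SRC_ERROR_EAVAIL  /* read returns -FI_EAVAIL; use fi_cq_readerr */
};

/*
 * caps:       capability bits the endpoint was configured with
 * resolvable: whether the source address maps to a usable fi_addr_t
 */
enum sketch_src_outcome sketch_source_outcome(unsigned caps, bool resolvable)
{
    assert((caps & ~(SKETCH_CAP_SOURCE | SKETCH_CAP_SOURCE_ERR)) == 0);

    if (!(caps & SKETCH_CAP_SOURCE))
        return SKETCH_SRC_NOTAVAIL;      /* no source addressing data */
    if (resolvable)
        return SKETCH_SRC_VALID;
    if (caps & SKETCH_CAP_SOURCE_ERR)
        return SKETCH_SRC_ERROR_EAVAIL;  /* reported as an error completion */
    return SKETCH_SRC_NOTAVAIL;
}
```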
349
350   fi_cq_sread / fi_cq_sreadfrom
       The fi_cq_sread and fi_cq_sreadfrom calls are the blocking equivalents
       of fi_cq_read and fi_cq_readfrom.  They behave like the non-blocking
       calls, except that they do not return until a completion has been read
       from the CQ, an error occurs, or the timeout expires.
356
357       Threads blocking in this function will return to the caller if they are
358       signaled by some external source.  This is true even if the timeout has
359       not occurred or was specified as infinite.
360
361       It is invalid for applications to call these functions if  the  CQ  has
362       been configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.
363
364   fi_cq_readerr
365       The read error function, fi_cq_readerr, retrieves information regarding
366       any asynchronous operation which has completed with an  unexpected  er‐
367       ror.   fi_cq_readerr  is  a  non-blocking  call,  returning immediately
368       whether an error completion was found or not.
369
       Error information is reported to the user through struct
       fi_cq_err_entry.  The format of this structure is defined below.
372
              struct fi_cq_err_entry {
                  void     *op_context; /* operation context */
                  uint64_t flags;       /* completion flags */
                  size_t   len;         /* size of received data */
                  void     *buf;        /* receive data buffer */
                  uint64_t data;        /* completion data */
                  uint64_t tag;         /* message tag */
                  size_t   olen;        /* overflow length */
                  int      err;         /* positive error code */
                  int      prov_errno;  /* provider error code */
                  void    *err_data;    /* error data */
                  size_t   err_data_size; /* size of err_data */
              };
386
387       The  general  reason  for  the error is provided through the err field.
388       Provider specific error information may also be available  through  the
389       prov_errno  and err_data fields.  Users may call fi_cq_strerror to con‐
390       vert provider specific error information into a  printable  string  for
391       debugging  purposes.   See  field details below for more information on
392       the use of err_data and err_data_size.
393
       Note that error completions are generated for all operations, including
       those for which a completion was not requested (e.g. an endpoint is
       configured with FI_SELECTIVE_COMPLETION, but the request did not have
       the FI_COMPLETION flag set).  In such cases, providers will return as
       much information about the failure as the underlying software and
       hardware make available; other fields will be set to NULL or 0.  This
       includes the op_context value, which may not have been provided or was
       ignored on input as part of the transfer.
402
403       Notable completion error codes are given below.
404
405       FI_EADDRNOTAVAIL
406              This  error code is used by CQs configured with FI_SOURCE_ERR to
407              report completions for which a usable fi_addr_t  source  address
408              could not be found.  An error code of FI_EADDRNOTAVAIL indicates
409              that the data transfer was successfully received and  processed,
410              with the fi_cq_err_entry fields containing information about the
411              completion.  The err_data field will be set to  the  source  ad‐
412              dress  data.   The  source address will be in the same format as
413              specified through the fi_info addr_format field for  the  opened
414              domain.   This  may be passed directly into an fi_av_insert call
415              to add the source address to the address vector.
416
417   fi_cq_signal
       The fi_cq_signal call will unblock any thread waiting in fi_cq_sread or
       fi_cq_sreadfrom.  This may be used to wake up a thread that is blocked
       waiting to read a completion.  The fi_cq_signal operation is only
       available if the CQ was configured with a wait object.
422

COMPLETION FIELDS

424       The  CQ entry data structures share many of the same fields.  The mean‐
425       ings of these fields are the same for all CQ entry structure formats.
426
427       op_context
428              The operation context is the application specified context value
429              that  was  provided with an asynchronous operation.  The op_con‐
430              text field is valid for all completions that are associated with
431              an asynchronous operation.
432
433       For  completion events that are not associated with a posted operation,
434       this field will be set to NULL.  This includes completions generated at
435       the  target  in  response  to  RMA  write operations that carry CQ data
436       (FI_REMOTE_WRITE | FI_REMOTE_CQ_DATA flags set), when the FI_RX_CQ_DATA
437       mode bit is not required.
438
439       flags  This  specifies  flags  associated with the completed operation.
440              The Completion Flags section  below  lists  valid  flag  values.
441              Flags are set for all relevant completions.
442
443       len    This  len  field  only  applies  to completed receive operations
444              (e.g.  fi_recv, fi_trecv, etc.).  It indicates the size  of  re‐
445              ceived message data -- i.e.  how many data bytes were placed in‐
446              to  the   associated   receive   buffer   by   a   corresponding
447              fi_send/fi_tsend/et al call.  If an endpoint has been configured
448              with the FI_MSG_PREFIX mode, the len also reflects the  size  of
449              the prefix buffer.
450
451       buf    The  buf  field  is only valid for completed receive operations,
452              and only applies when the receive buffer  was  posted  with  the
453              FI_MULTI_RECV  flag.   In  this case, buf points to the starting
454              location where the receive data was placed.
455
       data   The data field is only valid if the FI_REMOTE_CQ_DATA completion
              flag is set, and only applies to receive completions.  If
              FI_REMOTE_CQ_DATA is set, this field will contain the completion
              data provided by the peer as part of its transmit request.  The
              completion data will be given in host byte order.
461
462       tag    A tag applies only to received messages  that  occur  using  the
463              tagged interfaces.  This field contains the tag that was includ‐
464              ed with the received message.  The tag will be in host byte  or‐
465              der.
466
467       olen   The  olen field applies to received messages.  It is used to in‐
468              dicate that a received message has overrun the available  buffer
469              space  and has been truncated.  The olen specifies the amount of
470              data that did not fit into the available receive buffer and  was
471              discarded.
472
473       err    This  err code is a positive fabric errno associated with a com‐
474              pletion.  The err value indicates the general reason for an  er‐
475              ror, if one occurred.  See fi_errno.3 for a list of possible er‐
476              ror codes.
477
478       prov_errno
              On an error, prov_errno may contain a provider specific error
              code.  The use and meaning of this field are provider specific.
              It is intended to be used as a debugging aid.  See
              fi_cq_strerror for additional details on converting this error
              value into a human readable string.
484
485       err_data
              On an error, err_data may reference a provider specific amount
              of data associated with an error.  The use and meaning of this
              field are provider specific.  It is intended to be used as a
              debugging aid.  See fi_cq_strerror for additional details on
              converting this error data into a human readable string.
491
492       err_data_size
493              On input, err_data_size indicates the size of the err_data  buf‐
494              fer  in bytes.  On output, err_data_size will be set to the num‐
495              ber of bytes copied to the err_data buffer.  The err_data infor‐
496              mation  is typically used with fi_cq_strerror to provide details
497              about the type of error that occurred.
498
499       For compatibility purposes, if err_data_size is 0 on input, or the fab‐
500       ric  was opened with release < 1.5, err_data will be set to a data buf‐
501       fer owned by the provider.  The contents  of  the  buffer  will  remain
502       valid  until  a subsequent read call against the CQ.  Applications must
503       serialize access to the CQ when processing errors to  ensure  that  the
504       buffer referenced by err_data does not change.
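
       The compatibility rule above determines who owns the buffer that
       err_data references after a call to fi_cq_readerr.  The following
       sketch expresses that rule as a predicate.  The function name and
       the way the API version is passed (as separate major and minor
       numbers) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when err_data will point at a provider-owned buffer, which
 * stays valid only until the next read call against the CQ.  Returns false
 * when the provider copies error data into the caller's own buffer.
 *
 * err_data_size_in: value of err_data_size on input to the read
 * api_major/minor:  API version the fabric was opened with
 */
bool sketch_err_data_provider_owned(size_t err_data_size_in,
                                    unsigned api_major, unsigned api_minor)
{
    assert(api_major >= 1);   /* the sketch assumes versions start at 1.0 */

    bool pre_1_5 = (api_major == 1 && api_minor < 5);
    return err_data_size_in == 0 || pre_1_5;
}
```

       When the predicate is true, the application would copy or consume
       the error data before issuing another read on the same CQ, and
       serialize access to the CQ while doing so.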
505

COMPLETION FLAGS

507       Completion flags provide additional details regarding the completed op‐
508       eration.  The following completion flags are defined.
509
510       FI_SEND
511              Indicates that the completion was for a  send  operation.   This
512              flag may be combined with an FI_MSG or FI_TAGGED flag.
513
514       FI_RECV
515              Indicates that the completion was for a receive operation.  This
516              flag may be combined with an FI_MSG or FI_TAGGED flag.
517
       FI_RMA Indicates that an RMA operation completed.  This flag may be
              combined with an FI_READ, FI_WRITE, FI_REMOTE_READ, or
              FI_REMOTE_WRITE flag.

       FI_ATOMIC
              Indicates that an atomic operation completed.  This flag may be
              combined with an FI_READ, FI_WRITE, FI_REMOTE_READ, or
              FI_REMOTE_WRITE flag.
526
527       FI_MSG Indicates that a message-based operation completed.   This  flag
528              may be combined with an FI_SEND or FI_RECV flag.
529
530       FI_TAGGED
531              Indicates  that a tagged message operation completed.  This flag
532              may be combined with an FI_SEND or FI_RECV flag.
533
534       FI_MULTICAST
535              Indicates that a multicast operation completed.  This  flag  may
536              be  combined  with FI_MSG and relevant flags.  This flag is only
537              guaranteed to be valid for received messages if the endpoint has
538              been configured with FI_SOURCE.
539
540       FI_READ
541              Indicates  that a locally initiated RMA or atomic read operation
542              has completed.  This flag may be  combined  with  an  FI_RMA  or
543              FI_ATOMIC flag.
544
545       FI_WRITE
546              Indicates that a locally initiated RMA or atomic write operation
547              has completed.  This flag may be  combined  with  an  FI_RMA  or
548              FI_ATOMIC flag.
549
550       FI_REMOTE_READ
551              Indicates that a remotely initiated RMA or atomic read operation
552              has completed.  This flag may be  combined  with  an  FI_RMA  or
553              FI_ATOMIC flag.
554
555       FI_REMOTE_WRITE
556              Indicates  that  a remotely initiated RMA or atomic write opera‐
557              tion has completed.  This flag may be combined with an FI_RMA or
558              FI_ATOMIC flag.
559
560       FI_REMOTE_CQ_DATA
561              This  indicates  that remote CQ data is available as part of the
562              completion.
563
564       FI_MULTI_RECV
565              This flag applies to receive buffers that were posted  with  the
566              FI_MULTI_RECV flag set.  This completion flag indicates that the
567              original receive buffer referenced by the  completion  has  been
568              consumed  and  was  released by the provider.  Providers may set
569              this flag on the last message that is received into  the  multi-
570              recv  buffer,  or  may generate a separate completion that indi‐
571              cates that the buffer has been released.
572
573       Applications can distinguish between these two cases by  examining  the
574       completion  entry  flags  field.  If additional flags, such as FI_RECV,
575       are set, the completion is associated with a received message.  In this
576       case, the buf field will reference the location where the received mes‐
577       sage was placed into the multi-recv buffer.  Other fields in  the  com‐
578       pletion  entry  will  be  determined based on the received message.  If
579       other flag bits are zero, the provider is reporting that the multi-recv
580       buffer  has  been  released, and the completion entry is not associated
581       with a received message.
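
       The distinction between the two FI_MULTI_RECV cases can be made by
       inspecting the flags field alone.  The sketch below is one possible
       way to do so.  The SKETCH_ bit values and the reduced entry
       structure are hypothetical stand-ins for FI_RECV, FI_MULTI_RECV, and
       the CQ entry structures from the libfabric headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in flag bits; the values are illustrative only. */
#define SKETCH_RECV       (1ULL << 10)
#define SKETCH_MULTI_RECV (1ULL << 16)

/* Reduced stand-in for a CQ entry: only the fields this sketch needs. */
struct sketch_entry {
    uint64_t flags;
    void *buf;
};

enum sketch_mr_kind {
    SKETCH_MR_IN_USE,        /* multi-recv buffer has not been released */
    SKETCH_MR_LAST_MESSAGE,  /* a message was received and buffer released */
    SKETCH_MR_RELEASE_ONLY   /* separate completion: buffer released only */
};

/* Classify a receive completion with respect to its multi-recv buffer. */
enum sketch_mr_kind sketch_multi_recv_kind(const struct sketch_entry *entry)
{
    assert(entry != NULL);

    if (!(entry->flags & SKETCH_MULTI_RECV))
        return SKETCH_MR_IN_USE;
    if (entry->flags & ~SKETCH_MULTI_RECV)
        return SKETCH_MR_LAST_MESSAGE;  /* other bits set, e.g. SKETCH_RECV */
    return SKETCH_MR_RELEASE_ONLY;      /* all other flag bits are zero */
}
```

       Only in the SKETCH_MR_LAST_MESSAGE and SKETCH_MR_IN_USE cases would
       the buf field refer to received message data; in the
       SKETCH_MR_RELEASE_ONLY case the entry carries no message.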
582
583       FI_MORE
584              See the 'Buffered Receives' section in fi_msg(3)  for  more  de‐
585              tails.  This flag is associated with receive completions on end‐
586              points that have FI_BUFFERED_RECV mode  enabled.   When  set  to
587              one,  it  indicates that the buffer referenced by the completion
588              is limited by the FI_OPT_BUFFERED_LIMIT threshold, and addition‐
589              al  message  data  must be retrieved by the application using an
590              FI_CLAIM operation.
591
592       FI_CLAIM
593              See the 'Buffered Receives' section in fi_msg(3)  for  more  de‐
594              tails.   This flag is set on completions associated with receive
595              operations that claim buffered receive  data.   Note  that  this
596              flag   only   applies   to   endpoints   configured   with   the
597              FI_BUFFERED_RECV mode bit.
598

COMPLETION EVENT SEMANTICS

600       Libfabric defines several completion 'levels', identified using  opera‐
601       tional  flags.  Each flag indicates the soonest that a completion event
602       may be generated by a provider, and the assumptions that an application
603       may  make  upon processing a completion.  The operational flags are de‐
604       fined below, along with an example of how a  provider  might  implement
605       the semantic.  Note that a provider is required only to meet the semantic;
606       the example implementation is not mandated.  Providers may implement stronger
607       completion semantics than necessary for a given operation, but only the
608       behavior defined by the completion level is guaranteed.
609
610       To help understand the conceptual  differences  in  completion  levels,
611       consider  mailing  a letter.  Placing the letter into the local mailbox
612       for pick-up is similar to 'inject complete'.  Having the letter  picked
613       up  and dropped off at the destination mailbox is equivalent to 'trans‐
614       mit complete'.  The 'delivery complete' semantic is a stronger  guaran‐
615       tee, with a person at the destination signing for the letter.  However,
616       the person who signed for the letter is not  necessarily  the  intended
617       recipient.   The  'match  complete'  option is similar to delivery com‐
618       plete, but requires the intended recipient to sign for the letter.
619
620       The 'commit complete' level has different semantics than the previously
621       mentioned levels.  Commit complete would be closer to the letter arriv‐
622       ing at the destination and being placed into a fire proof safe.
623
624       The operational flags for the described completion levels  are  defined
625       below.
626
627       FI_INJECT_COMPLETE
628              Indicates  that a completion should be generated when the source
629              buffer(s) may be reused.  A completion guarantees that the  buf‐
630              fers will not be read from again and the application may reclaim
631              them.  No other guarantees are made with respect to the state of
632              the operation.
633
634       Example:  A  provider  may generate this completion event after copying
635       the source buffer into a network buffer, either in host  memory  or  on
636       the NIC.  An inject completion does not indicate that the data has been
637       transmitted onto the network, and a local error could occur  after  the
638       completion  event  has  been generated that could prevent it from being
639       transmitted.
640
641       Inject complete allows  for  the  fastest  completion  reporting  (and,
642       hence,  buffer reuse), but provides the weakest guarantees against net‐
643       work errors.
644
645       Note: This flag is used to control when a completion entry is  inserted
646       into  a  completion queue.  It does not apply to operations that do not
647       generate a completion queue entry, such as the fi_inject operation, and
648       is not subject to the inject_size message limit restriction.
649
650       FI_TRANSMIT_COMPLETE
651              Indicates  that a completion should be generated when the trans‐
652              mit operation has completed relative to the local provider.  The
653              exact behavior is dependent on the endpoint type.
654
655       For reliable endpoints:
656
657       Indicates  that a completion should be generated when the operation has
658       been delivered to the peer endpoint.  A completion guarantees that  the
659       operation is no longer dependent on the fabric or local resources.  The
660       state of the operation at the peer endpoint is not defined.
661
662       Example: A provider may generate a transmit complete event upon receiv‐
663       ing  an  ack  from  the peer endpoint.  The state of the message at the
664       peer is unknown and may be buffered in the target NIC at the  time  the
665       ack has been generated.
666
667       For unreliable endpoints:
668
669       Indicates  that a completion should be generated when the operation has
670       been delivered to the fabric.  A completion guarantees that the  opera‐
671       tion is no longer dependent on local resources.  The state of the oper‐
672       ation within the fabric is not defined.
673
674       FI_DELIVERY_COMPLETE
675              Indicates that a completion should not be generated until an op‐
676              eration  has  been  processed by the destination endpoint(s).  A
677              completion guarantees that the result of the operation is avail‐
678              able; however, additional steps may need to be taken at the des‐
679              tination to retrieve the results.  For example, an application
680              may need to provide receive buffers in order to retrieve mes‐
681              sages that were buffered by the provider.
682
683       Delivery complete indicates that the message has been processed by  the
684       peer.  If an application buffer was ready to receive the results of the
685       message when it arrived, then delivery complete indicates that the data
686       was placed into the application's buffer.
687
688       This  completion  mode  applies only to reliable endpoints.  For opera‐
689       tions that return data to the initiator, such  as  RMA  read  or  atom‐
690       ic-fetch,  the  source  endpoint  is also considered a destination end‐
691       point.  This is the default completion mode for such operations.
692
693       FI_MATCH_COMPLETE
694              Indicates that a completion should be generated only  after  the
695              operation has been matched with an application specified buffer.
696              Operations using this completion semantic are dependent  on  the
697              application at the target claiming the message or results.  As a
698              result, match complete may involve additional provider level ac‐
699              knowledgements or lengthy delays.  However, this completion mod‐
700              el enables peer applications to synchronize their execution.
701
702       FI_COMMIT_COMPLETE
703              Indicates that a completion should not be generated (locally  or
704              at the peer) until the result of an operation has been made
705              persistent.  A completion guarantees that  the  result  is  both
706              available and durable, in the case of power failure.
707
708       This  completion mode applies only to operations that target persistent
709       memory regions over reliable endpoints.  This completion mode is exper‐
710       imental.
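
As a rough aid to choosing among the completion levels above, the sketch below maps what an application needs to know at completion time onto the weakest flag that provides it. The enum app_need and the function completion_flag_for are hypothetical, and the bit values are placeholders so the sketch compiles standalone; real code would use the definitions in rdma/fabric.h.

```c
#include <stdint.h>

/* Placeholder bit values (assumption); real values are in <rdma/fabric.h>. */
#ifndef FI_INJECT_COMPLETE
#define FI_INJECT_COMPLETE   (1ULL << 26)
#endif
#ifndef FI_TRANSMIT_COMPLETE
#define FI_TRANSMIT_COMPLETE (1ULL << 27)
#endif
#ifndef FI_DELIVERY_COMPLETE
#define FI_DELIVERY_COMPLETE (1ULL << 28)
#endif
#ifndef FI_COMMIT_COMPLETE
#define FI_COMMIT_COMPLETE   (1ULL << 30)
#endif
#ifndef FI_MATCH_COMPLETE
#define FI_MATCH_COMPLETE    (1ULL << 31)
#endif

/* Hypothetical description of what the application must be able to assume. */
enum app_need {
    NEED_BUFFER_REUSE,       /* source buffer(s) may be reused */
    NEED_LEFT_LOCAL,         /* no longer dependent on local resources */
    NEED_PROCESSED_AT_PEER,  /* processed by the destination endpoint */
    NEED_MATCHED_AT_PEER,    /* matched with an application buffer */
    NEED_DURABLE             /* result persistent across power failure */
};

/* Return the completion-level flag whose documented guarantee covers need. */
static uint64_t completion_flag_for(enum app_need need)
{
    switch (need) {
    case NEED_BUFFER_REUSE:      return FI_INJECT_COMPLETE;
    case NEED_LEFT_LOCAL:        return FI_TRANSMIT_COMPLETE;
    case NEED_PROCESSED_AT_PEER: return FI_DELIVERY_COMPLETE;
    case NEED_MATCHED_AT_PEER:   return FI_MATCH_COMPLETE;
    case NEED_DURABLE:           return FI_COMMIT_COMPLETE;
    }
    return 0; /* unknown need: request no specific level */
}
```

The returned flag would typically be OR-ed into the flags argument of a data transfer call such as fi_sendmsg. Delivery, match, and commit complete apply only to reliable endpoints, so a real application would also need to check the endpoint type.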
711

NOTES

713       A completion queue must be bound to at least one enabled endpoint be‐
714       fore any operation, such as fi_cq_read, fi_cq_readfrom, fi_cq_sread,
715       or fi_cq_sreadfrom, can be called on it.
716
717       Completion flags may be suppressed if the FI_NOTIFY_FLAGS_ONLY mode bit
718       has been set.  When enabled, only the following flags are guaranteed to
719       be  set  in  completion  data  when  they are valid: FI_REMOTE_READ and
720       FI_REMOTE_WRITE (when FI_RMA_EVENT capability bit has been set), FI_RE‐
721       MOTE_CQ_DATA, and FI_MULTI_RECV.
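
One way to act on the rule above is to mask the completion flags down to the bits that remain guaranteed under FI_NOTIFY_FLAGS_ONLY. The helper names below are hypothetical and the bit values are placeholders so the sketch compiles standalone; real code would take the flag definitions from rdma/fabric.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values (assumption); real values are in <rdma/fabric.h>. */
#ifndef FI_RECV
#define FI_RECV           (1ULL << 10)
#endif
#ifndef FI_REMOTE_READ
#define FI_REMOTE_READ    (1ULL << 12)
#endif
#ifndef FI_REMOTE_WRITE
#define FI_REMOTE_WRITE   (1ULL << 13)
#endif
#ifndef FI_MULTI_RECV
#define FI_MULTI_RECV     (1ULL << 16)
#endif
#ifndef FI_REMOTE_CQ_DATA
#define FI_REMOTE_CQ_DATA (1ULL << 17)
#endif

/* Bits still guaranteed when FI_NOTIFY_FLAGS_ONLY is in effect.
 * rma_event_cap says whether the FI_RMA_EVENT capability was set. */
static uint64_t notify_only_guaranteed_mask(bool rma_event_cap)
{
    uint64_t mask = FI_REMOTE_CQ_DATA | FI_MULTI_RECV;

    if (rma_event_cap)
        mask |= FI_REMOTE_READ | FI_REMOTE_WRITE;
    return mask;
}

/* Drop every flag bit the application must not rely on in this mode. */
static uint64_t reliable_flags(uint64_t flags, bool rma_event_cap)
{
    return flags & notify_only_guaranteed_mask(rma_event_cap);
}
```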
722
723       If  a  completion  queue  has  been  overrun, it will be placed into an
724       'overrun' state.  Read operations will continue to  return  any  valid,
725       non-corrupted  completions,  if available.  After all valid completions
726       have been retrieved, any attempt to read the CQ will result in  it  re‐
727       turning an FI_EOVERRUN error event.  Overrun completion queues are con‐
728       sidered fatal and may not be used to report additional completions once
729       the overrun occurs.
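
Because an overrun is fatal for the CQ, an application may want to remember that it happened and stop polling. The sketch below assumes the overrun surfaces as a -FI_EOVERRUN return value from the read call. The struct cq_tracker and cq_tracker_update are hypothetical, and the FI_EOVERRUN value is a placeholder; real code would include rdma/fi_errno.h.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>   /* ssize_t */

/* Stand-in definitions so the sketch builds standalone (assumption);
 * real code gets these from <rdma/fi_errno.h>. */
#ifndef FI_EAGAIN
#define FI_EAGAIN   EAGAIN
#endif
#ifndef FI_EOVERRUN
#define FI_EOVERRUN 269   /* placeholder value */
#endif

/* Hypothetical per-CQ bookkeeping kept by the application. */
struct cq_tracker {
    bool overrun;   /* set once, never cleared */
};

/* Feed in each return value of a CQ read call.
 * Returns true while the CQ may still be polled. */
static bool cq_tracker_update(struct cq_tracker *t, ssize_t ret)
{
    if (ret == -FI_EOVERRUN)
        t->overrun = true;
    return !t->overrun;
}
```

Valid completions returned before the overrun error are still processed normally; only after the tracker reports false would the application tear down and replace the CQ.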
730

RETURN VALUES

732       fi_cq_open / fi_cq_signal
733              Returns  0 on success.  On error, a negative value corresponding
734              to fabric errno is returned.
735
736       fi_cq_read / fi_cq_readfrom / fi_cq_readerr / fi_cq_sread / fi_cq_sreadfrom
737              On success, returns the number of completion events retrieved
738              from the completion queue.  On error, a negative value corre‐
739              sponding to fabric errno is returned.  If no completions are
740              available to return from the CQ, -FI_EAGAIN is returned.
741
742       fi_cq_sread / fi_cq_sreadfrom
743              On success, returns the number of  completion  events  retrieved
744              from  the  completion  queue.  On error, a negative value corre‐
745              sponding to fabric errno is returned.  If the timeout expires or
746              the  calling  thread  is signaled and no data is available to be
747              read from the completion queue, -FI_EAGAIN is returned.
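
The return conventions for the read calls can be condensed into a small dispatcher. This is a sketch under stated assumptions: the enum cq_read_outcome and the function interpret_cq_read are hypothetical, and FI_EAGAIN is defined here only so the block compiles without rdma/fi_errno.h.

```c
#include <errno.h>
#include <sys/types.h>   /* ssize_t */

/* Stand-in so the sketch builds standalone (assumption);
 * real code gets FI_EAGAIN from <rdma/fi_errno.h>. */
#ifndef FI_EAGAIN
#define FI_EAGAIN EAGAIN
#endif

/* Hypothetical summary of a CQ read call's return value. */
enum cq_read_outcome {
    CQ_GOT_ENTRIES,    /* ret completion entries are valid in the buffer */
    CQ_NOTHING_READY,  /* empty CQ, timeout expired, or thread signaled */
    CQ_FAILED          /* -ret is a fabric errno */
};

static enum cq_read_outcome interpret_cq_read(ssize_t ret)
{
    if (ret > 0)
        return CQ_GOT_ENTRIES;
    if (ret == 0 || ret == -FI_EAGAIN)
        return CQ_NOTHING_READY;   /* zero entries: nothing to process */
    return CQ_FAILED;
}
```

A polling loop would call the read function, pass its result through interpret_cq_read, and retry or back off on CQ_NOTHING_READY.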
748
749       fi_cq_strerror
750              Returns a character string interpretation of the  provider  spe‐
751              cific error returned with a completion.
752
753       Fabric errno values are defined in rdma/fi_errno.h.
754

AUTHORS

760       OpenFabrics.
761
762
763
764Libfabric Programmer's Manual     2019-02-27                          fi_cq(3)