fi_cq(3)                      Libfabric v1.8.0                      fi_cq(3)

NAME
fi_cq - Completion queue operations

fi_cq_open / fi_close
    Open/close a completion queue.

fi_control
    Control CQ operation or attributes.

fi_cq_read / fi_cq_readfrom / fi_cq_readerr
    Read a completion from a completion queue.

fi_cq_sread / fi_cq_sreadfrom
    A synchronous (blocking) read that waits until a specified condition
    has been met before reading a completion from a completion queue.

fi_cq_signal
    Unblock any thread waiting in fi_cq_sread or fi_cq_sreadfrom.

fi_cq_strerror
    Convert provider-specific error information into a printable string.

SYNOPSIS
    #include <rdma/fi_domain.h>

    int fi_cq_open(struct fid_domain *domain, struct fi_cq_attr *attr,
        struct fid_cq **cq, void *context);

    int fi_close(struct fid *cq);

    int fi_control(struct fid *cq, int command, void *arg);

    ssize_t fi_cq_read(struct fid_cq *cq, void *buf, size_t count);

    ssize_t fi_cq_readfrom(struct fid_cq *cq, void *buf, size_t count,
        fi_addr_t *src_addr);

    ssize_t fi_cq_readerr(struct fid_cq *cq, struct fi_cq_err_entry *buf,
        uint64_t flags);

    ssize_t fi_cq_sread(struct fid_cq *cq, void *buf, size_t count,
        const void *cond, int timeout);

    ssize_t fi_cq_sreadfrom(struct fid_cq *cq, void *buf, size_t count,
        fi_addr_t *src_addr, const void *cond, int timeout);

    int fi_cq_signal(struct fid_cq *cq);

    const char * fi_cq_strerror(struct fid_cq *cq, int prov_errno,
        const void *err_data, char *buf, size_t len);

ARGUMENTS
domain
    Open resource domain.

cq
    Completion queue.

attr
    Completion queue attributes.

context
    User specified context associated with the completion queue.

buf
    For read calls, the data buffer to write completions into. For write
    calls, a completion to insert into the completion queue. For
    fi_cq_strerror, an optional buffer that receives printable error
    information.

count
    Number of CQ entries.

len
    Length of the data buffer.

src_addr
    Source address of a completed receive operation.

flags
    Additional flags to apply to the operation.

command
    Command of control operation to perform on CQ.

arg
    Optional control argument.

cond
    Condition that must be met before the wait is satisfied and a
    blocking read returns (see wait_cond).

timeout
    Time in milliseconds to wait. A negative value indicates an infinite
    timeout.

prov_errno
    Provider specific error value.

err_data
    Provider specific error data related to a completion.

DESCRIPTION
Completion queues are used to report events associated with data
transfers. They are associated with message sends and receives, RMA,
atomic, tagged messages, and triggered events. Reported events are
usually associated with a fabric endpoint, but may also refer to memory
regions used as the target of an RMA or atomic operation.

fi_cq_open
fi_cq_open allocates a new completion queue. Unlike event queues,
completion queues are associated with a resource domain and may be
offloaded entirely in provider hardware.

The properties and behavior of a completion queue are defined by
struct fi_cq_attr.

    struct fi_cq_attr {
        size_t               size;             /* # entries for CQ */
        uint64_t             flags;            /* operation flags */
        enum fi_cq_format    format;           /* completion format */
        enum fi_wait_obj     wait_obj;         /* requested wait object */
        int                  signaling_vector; /* interrupt affinity */
        enum fi_cq_wait_cond wait_cond;        /* wait condition format */
        struct fid_wait     *wait_set;         /* optional wait set */
    };

size
    Specifies the minimum size of a completion queue. A value of 0
    indicates that the provider may choose a default value.

flags
    Flags that control the configuration of the CQ.

    - FI_AFFINITY
        Indicates that the signaling_vector field (see below) is valid.

format
    Completion queues allow the application to select the amount of
    detail that it must store and report. The format attribute allows
    the application to select one of several completion formats,
    indicating the structure of the data that the completion queue
    should return when read. Supported formats and the structures that
    correspond to each are listed below. The meaning of the CQ entry
    fields is defined in the Completion Fields section.

    - FI_CQ_FORMAT_UNSPEC
        If an unspecified format is requested, then the CQ will use a
        provider selected default format.

    - FI_CQ_FORMAT_CONTEXT
        Provides only the user specified context that was associated
        with the completion.

            struct fi_cq_entry {
                void     *op_context; /* operation context */
            };

    - FI_CQ_FORMAT_MSG
        Provides minimal data for processing completions, with expanded
        support for reporting information about received messages.

            struct fi_cq_msg_entry {
                void     *op_context; /* operation context */
                uint64_t flags;       /* completion flags */
                size_t   len;         /* size of received data */
            };

    - FI_CQ_FORMAT_DATA
        Provides data associated with a completion. Includes support for
        received message length, remote CQ data, and multi-receive
        buffers.

            struct fi_cq_data_entry {
                void     *op_context; /* operation context */
                uint64_t flags;       /* completion flags */
                size_t   len;         /* size of received data */
                void     *buf;        /* receive data buffer */
                uint64_t data;        /* completion data */
            };

    - FI_CQ_FORMAT_TAGGED
        Expands completion data to include support for the tagged
        message interfaces.

            struct fi_cq_tagged_entry {
                void     *op_context; /* operation context */
                uint64_t flags;       /* completion flags */
                size_t   len;         /* size of received data */
                void     *buf;        /* receive data buffer */
                uint64_t data;        /* completion data */
                uint64_t tag;         /* received tag */
            };

wait_obj
    CQs may be associated with a specific wait object. Wait objects
    allow applications to block until the wait object is signaled,
    indicating that a completion is available to be read. Users may use
    fi_control to retrieve the underlying wait object associated with a
    CQ, in order to use it in other system calls. The following values
    may be used to specify the type of wait object associated with a CQ:
    FI_WAIT_NONE, FI_WAIT_UNSPEC, FI_WAIT_SET, FI_WAIT_FD,
    FI_WAIT_MUTEX_COND, and (on Windows) FI_WAIT_CRITSEC_COND. The
    default is FI_WAIT_NONE.

    - FI_WAIT_NONE
        Used to indicate that the user will not block (wait) for
        completions on the CQ. When FI_WAIT_NONE is specified, the
        application may not call fi_cq_sread or fi_cq_sreadfrom.

    - FI_WAIT_UNSPEC
        Specifies that the user will only wait on the CQ using fabric
        interface calls, such as fi_cq_sread or fi_cq_sreadfrom. In this
        case, the underlying provider may select the most appropriate or
        highest performing wait object available, including custom wait
        mechanisms. Applications that select FI_WAIT_UNSPEC are not
        guaranteed to retrieve the underlying wait object.

    - FI_WAIT_SET
        Indicates that the completion queue should use a wait set object
        to wait for completions. If specified, the wait_set field must
        reference an existing wait set object.

    - FI_WAIT_FD
        Indicates that the CQ should use a file descriptor as its wait
        mechanism. A file descriptor wait object must be usable in
        select, poll, and epoll routines. However, a provider may signal
        an FD wait object by marking it as readable, writable, or with
        an error.

    - FI_WAIT_MUTEX_COND
        Specifies that the CQ should use a pthread mutex and condition
        variable as a wait object.

    - FI_WAIT_CRITSEC_COND
        Windows specific. Specifies that the CQ should use a critical
        section and condition variable as a wait object.

signaling_vector
    If the FI_AFFINITY flag is set, this indicates the logical CPU
    number (0 to max CPU - 1) that interrupts associated with the CQ
    should target. This field should be treated as a hint to the
    provider and may be ignored if the provider does not support
    interrupt affinity.

wait_cond
    By default, when a completion is inserted into a CQ that supports
    blocking reads (fi_cq_sread/fi_cq_sreadfrom), the corresponding wait
    object is signaled. Users may specify a condition that must first be
    met before the wait is satisfied. This field indicates how the
    provider should interpret the cond field, which describes the
    condition needed to signal the wait object.

    A wait condition should be treated as an optimization. Providers are
    not required to meet the requirements of the condition before
    signaling the wait object. Applications should not rely on the
    condition necessarily being true when a blocking read call returns.

    If wait_cond is set to FI_CQ_COND_NONE, then no additional conditions
    are applied to the signaling of the CQ wait object, and the insertion
    of any new entry will trigger the wait condition. If wait_cond is set
    to FI_CQ_COND_THRESHOLD, then the cond field is interpreted as a
    size_t threshold value. The threshold indicates the number of entries
    that must be queued at the CQ before the wait is satisfied.

    This field is ignored if wait_obj is set to FI_WAIT_NONE.

wait_set
    If wait_obj is FI_WAIT_SET, this field references a wait set object
    to which the completion queue should attach. When a completion is
    inserted into the completion queue, the corresponding wait set will
    be signaled if all necessary conditions are met. The use of a
    wait_set enables an optimized method of waiting for events across
    multiple event and completion queues. This field is ignored if
    wait_obj is not FI_WAIT_SET.
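
Because the format attribute fixes the layout of every entry that a read call writes, the buf argument of fi_cq_read has to be walked with the stride of the selected format. The sketch below illustrates this with hypothetical demo_ mirror structures that copy the field lists above; they are not the libfabric types. Each larger format begins with the fields of the smaller ones, so op_context sits at offset 0 in all of them, but the position of entry i still depends on the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirrors of the four entry layouts listed above. Real code
 * should use the libfabric structures, not these copies. */
enum demo_cq_format {
    DEMO_FORMAT_CONTEXT,
    DEMO_FORMAT_MSG,
    DEMO_FORMAT_DATA,
    DEMO_FORMAT_TAGGED
};

struct demo_cq_entry {
    void *op_context;
};

struct demo_cq_msg_entry {
    void     *op_context;
    uint64_t flags;
    size_t   len;
};

struct demo_cq_data_entry {
    void     *op_context;
    uint64_t flags;
    size_t   len;
    void     *buf;
    uint64_t data;
};

struct demo_cq_tagged_entry {
    void     *op_context;
    uint64_t flags;
    size_t   len;
    void     *buf;
    uint64_t data;
    uint64_t tag;
};

/* Size of one entry in the selected format, i.e. the stride of the
 * buffer that a read call fills with `count` entries. */
static size_t demo_entry_size(enum demo_cq_format format)
{
    switch (format) {
    case DEMO_FORMAT_CONTEXT: return sizeof(struct demo_cq_entry);
    case DEMO_FORMAT_MSG:     return sizeof(struct demo_cq_msg_entry);
    case DEMO_FORMAT_DATA:    return sizeof(struct demo_cq_data_entry);
    case DEMO_FORMAT_TAGGED:  return sizeof(struct demo_cq_tagged_entry);
    }
    return 0;
}

/* op_context of entry i. Every format starts with op_context, so the
 * same offset-0 read works for all of them once the stride is right. */
static void *demo_entry_context(const void *buf, enum demo_cq_format format,
                                size_t i)
{
    const unsigned char *entry;
    void *ctx;

    assert(buf != NULL);
    entry = (const unsigned char *)buf + i * demo_entry_size(format);
    memcpy(&ctx, entry, sizeof ctx);
    return ctx;
}
```

With the real types, an array such as struct fi_cq_tagged_entry entries[16] would be passed as fi_cq_read(cq, entries, 16) only for a CQ opened with FI_CQ_FORMAT_TAGGED; reading a CQ of another format into that array would misinterpret every entry after the first.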

fi_close
The fi_close call releases all resources associated with a completion
queue. Any completions which remain on the CQ when it is closed are
lost.

When closing the CQ, there must be no opened endpoints, transmit
contexts, or receive contexts associated with the CQ. If resources are
still associated with the CQ when attempting to close, the call will
return -FI_EBUSY.
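
The close rule can be modeled with a reference count. This is a minimal sketch, not provider code: demo_cq is a hypothetical object that only counts the endpoints and contexts bound to it, and EBUSY from <errno.h> stands in for FI_EBUSY. It shows why endpoints are closed before the CQ they report to.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical CQ model: tracks only what is still bound to it. */
struct demo_cq {
    int bound;    /* endpoints, transmit or receive contexts using the CQ */
    int closed;
};

/* Binding an endpoint or context to the CQ takes a reference. */
static void demo_cq_bind(struct demo_cq *cq)
{
    cq->bound++;
}

/* Closing that endpoint or context drops the reference. */
static void demo_cq_unbind(struct demo_cq *cq)
{
    assert(cq->bound > 0);
    cq->bound--;
}

/* Same rule as fi_close on a CQ: refuse while anything is still bound. */
static int demo_cq_close(struct demo_cq *cq)
{
    if (cq->bound > 0)
        return -EBUSY;
    cq->closed = 1;   /* completions still queued at this point are lost */
    return 0;
}
```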

fi_control
The fi_control call is used to access provider or implementation
specific details of the completion queue. Access to the CQ should be
serialized across all calls when fi_control is invoked, as it may
redirect the implementation of CQ operations. The following control
commands are usable with a CQ.

FI_GETWAIT (void **)
    This command allows the user to retrieve the low-level wait object
    associated with the CQ. The format of the wait object is specified
    during CQ creation, through the CQ attributes. The fi_control arg
    parameter should be an address where a pointer to the returned wait
    object will be written. See fi_eq(3) for additional details on using
    fi_control with FI_GETWAIT.
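
For a CQ opened with FI_WAIT_FD, the descriptor retrieved through FI_GETWAIT can be handed to poll(). Since a provider may signal the descriptor by making it readable, writable, or by raising an error, a wait loop should treat any of those as a wakeup. A minimal sketch using POSIX poll(); demo_wait_fd is a hypothetical helper, and retrieving the descriptor is left out.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>   /* pipe(), write(): used in the usage example below */

/* Wait up to timeout_ms (negative = forever) for a CQ wait fd to be
 * signaled. A provider may signal by making the fd readable, writable,
 * or by raising an error, so all of these count as "signaled".
 * POLLERR and POLLHUP are always reported, even if not requested. */
static bool demo_wait_fd(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT };
    int ret;

    assert(fd >= 0);
    ret = poll(&pfd, 1, timeout_ms);
    if (ret <= 0)
        return false;   /* timeout (0) or poll failure (-1) */
    return (pfd.revents & (POLLIN | POLLOUT | POLLERR | POLLHUP)) != 0;
}
```

For example, with a pipe standing in for the provider's descriptor, demo_wait_fd(fds[0], 0) returns false until a byte is written to fds[1], and true afterwards. Because the wakeup itself carries no count, the sketch assumes the caller then reads the CQ until fi_cq_read returns -FI_EAGAIN.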

fi_cq_read
The fi_cq_read operation performs a non-blocking read of completion
data from the CQ. The format of the completion event is determined by
the format attribute (enum fi_cq_format) that was specified when the CQ
was opened. Multiple completions may be retrieved from a CQ in a single
call. The maximum number of entries to return is limited to the
specified count parameter, with the number of entries successfully read
from the CQ returned by the call. (See the return values section below.)

CQs are optimized to report operations which have completed
successfully. Operations which fail are reported 'out of band'. Such
operations are retrieved using the fi_cq_readerr function. When an
operation that has completed with an unexpected error is encountered, it
is placed into a temporary error queue. Attempting to read from a CQ
while an item is in the error queue results in fi_cq_read failing with a
return code of -FI_EAVAIL. Applications may use this return code to
determine when to call fi_cq_readerr.
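
The read pattern described above (read in batches until -FI_EAGAIN, and switch to fi_cq_readerr whenever -FI_EAVAIL is returned) can be sketched against a small in-memory model. Everything here is hypothetical: demo_cq stands in for a CQ opened with FI_CQ_FORMAT_CONTEXT, its queues do not wrap around, EAGAIN stands in for FI_EAGAIN, and DEMO_EAVAIL is an arbitrary local value standing in for FI_EAVAIL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

#define DEMO_EAVAIL 1000 /* arbitrary stand-in for FI_EAVAIL */
#define DEMO_MAX    16

/* Hypothetical CQ model: successful completions plus an error queue. */
struct demo_cq {
    void  *ok[DEMO_MAX];      /* op_context of successful completions */
    size_t ok_head, ok_tail;
    void  *err[DEMO_MAX];     /* op_context of failed operations */
    size_t err_head, err_tail;
};

/* Like fi_cq_read on an FI_CQ_FORMAT_CONTEXT CQ: -DEMO_EAVAIL while an
 * error entry is pending, -EAGAIN when nothing is queued, otherwise the
 * number of contexts copied into buf. */
static ssize_t demo_cq_read(struct demo_cq *cq, void **buf, size_t count)
{
    size_t n = 0;

    assert(cq != NULL && buf != NULL);
    if (cq->err_head != cq->err_tail)
        return -DEMO_EAVAIL;
    while (n < count && cq->ok_head != cq->ok_tail)
        buf[n++] = cq->ok[cq->ok_head++];
    return n ? (ssize_t)n : -EAGAIN;
}

/* Like fi_cq_readerr: pops one error entry, -EAGAIN if there is none. */
static ssize_t demo_cq_readerr(struct demo_cq *cq, void **ctx)
{
    if (cq->err_head == cq->err_tail)
        return -EAGAIN;
    *ctx = cq->err[cq->err_head++];
    return 1;
}

/* Drain the CQ, counting successful and failed completions. */
static void demo_drain(struct demo_cq *cq, size_t *ok, size_t *failed)
{
    void *batch[4];
    ssize_t ret;

    *ok = *failed = 0;
    for (;;) {
        ret = demo_cq_read(cq, batch, 4);
        if (ret > 0) {
            *ok += (size_t)ret;        /* process batch[0..ret-1] here */
        } else if (ret == -DEMO_EAVAIL) {
            void *ctx;
            if (demo_cq_readerr(cq, &ctx) == 1)
                (*failed)++;           /* inspect the error entry here */
        } else {
            break;                     /* -EAGAIN: the CQ is empty */
        }
    }
}
```

With real libfabric calls the loop keeps the same shape, with fi_cq_read and fi_cq_readerr in place of the demo_ functions.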

fi_cq_readfrom
The fi_cq_readfrom call behaves identically to fi_cq_read, with the
exception that it allows the CQ to return source address information to
the user for any received data. Source address data is only available
for those endpoints configured with the FI_SOURCE capability. If
fi_cq_readfrom is used to read completions for an endpoint for which
source addressing data is not available, the source address will be set
to FI_ADDR_NOTAVAIL. The number of entries in the src_addr array must
equal the count parameter.

Returned source addressing data is converted from the native address
used by the underlying fabric into an fi_addr_t, which may be used in
transmit operations. Under most circumstances, returning an fi_addr_t
requires that the source address already have been inserted into the
address vector associated with the receiving endpoint. This is true for
address vectors of type FI_AV_TABLE. In select providers when FI_AV_MAP
is used, source addresses may be converted algorithmically into a usable
fi_addr_t, even though the source address has not been inserted into the
address vector. This is permitted by the API, as it allows the provider
to avoid address look-up as part of receive message processing.
Providers never insert addresses into an AV on their own; addresses are
added only when the application calls fi_av_insert or a similar call.

For endpoints allocated using the FI_SOURCE_ERR capability, if the
source address cannot be converted into a valid fi_addr_t value,
fi_cq_readfrom will return -FI_EAVAIL, even if the data were received
successfully. The completion will then be reported through
fi_cq_readerr with error code FI_EADDRNOTAVAIL. See fi_cq_readerr for
details.

If FI_SOURCE is specified without FI_SOURCE_ERR, source addresses which
cannot be mapped to a usable fi_addr_t will be reported as
FI_ADDR_NOTAVAIL.
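
The FI_SOURCE and FI_SOURCE_ERR rules above can be modeled as a lookup that either succeeds, reports an unavailable address, or fails so that the completion moves to the error path. This is a sketch under stated assumptions: demo_av is a hypothetical table-style address vector keyed by a 32-bit native address, DEMO_ADDR_NOTAVAIL stands in for FI_ADDR_NOTAVAIL, and EADDRNOTAVAIL from <errno.h> stands in for FI_EADDRNOTAVAIL. The function never inserts the unknown address itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ADDR_NOTAVAIL UINT64_MAX   /* stand-in for FI_ADDR_NOTAVAIL */
#define DEMO_AV_MAX        8

/* Hypothetical table AV: entry i holds the native address for index i. */
struct demo_av {
    uint32_t native[DEMO_AV_MAX];
    size_t   count;
};

/* Resolve the native source of a received message into *src_addr.
 * source_err == true models FI_SOURCE_ERR: an unknown source makes the
 * call fail with -EADDRNOTAVAIL so the completion takes the error path.
 * source_err == false models plain FI_SOURCE: the completion succeeds and
 * the address is reported as DEMO_ADDR_NOTAVAIL. */
static int demo_resolve_src(const struct demo_av *av, uint32_t native,
                            bool source_err, uint64_t *src_addr)
{
    assert(av != NULL && src_addr != NULL);
    for (size_t i = 0; i < av->count; i++) {
        if (av->native[i] == native) {
            *src_addr = i;
            return 0;
        }
    }
    if (source_err)
        return -EADDRNOTAVAIL;
    *src_addr = DEMO_ADDR_NOTAVAIL;
    return 0;
}
```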

fi_cq_sread / fi_cq_sreadfrom
The fi_cq_sread and fi_cq_sreadfrom calls are the blocking equivalents
of fi_cq_read and fi_cq_readfrom. Their behavior is similar to the
non-blocking calls, with the exception that the calls will not return
until either a completion has been read from the CQ or an error or
timeout occurs.

Threads blocking in these functions will return to the caller if they
are signaled by some external source. This is true even if the timeout
has not occurred or was specified as infinite.

It is invalid for applications to call these functions if the CQ has
been configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.
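
One way a provider could implement blocking reads on an FI_WAIT_MUTEX_COND CQ is with pthread_cond_timedwait(), which takes an absolute deadline. In that case the relative millisecond timeout of fi_cq_sread has to be converted, and a negative value means no deadline at all. The sketch below shows only that conversion and takes the current time as a parameter so it is deterministic; demo_deadline is hypothetical, and in practice now would come from clock_gettime(CLOCK_REALTIME, &now), the clock pthread_cond_timedwait() uses unless the condition variable was configured otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Convert an fi_cq_sread-style timeout (milliseconds, negative means
 * infinite) into an absolute deadline relative to `now`. Returns false
 * for an infinite timeout, in which case the caller would block with
 * pthread_cond_wait() instead of pthread_cond_timedwait(). */
static bool demo_deadline(const struct timespec *now, int timeout_ms,
                          struct timespec *deadline)
{
    assert(now->tv_nsec >= 0 && now->tv_nsec < 1000000000L);
    if (timeout_ms < 0)
        return false;

    deadline->tv_sec  = now->tv_sec + timeout_ms / 1000;
    deadline->tv_nsec = now->tv_nsec + (long)(timeout_ms % 1000) * 1000000L;
    if (deadline->tv_nsec >= 1000000000L) {   /* carry into seconds */
        deadline->tv_sec  += 1;
        deadline->tv_nsec -= 1000000000L;
    }
    return true;
}
```

For example, with now at 10.9 s, a 250 ms timeout yields a deadline of tv_sec 11, tv_nsec 150000000, and a timeout of -1 returns false so the caller waits without a deadline.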

fi_cq_readerr
The read error function, fi_cq_readerr, retrieves information regarding
any asynchronous operation which has completed with an unexpected
error. fi_cq_readerr is a non-blocking call, returning immediately
whether an error completion was found or not.

Error information is reported to the user through struct
fi_cq_err_entry. The format of this structure is defined below.

    struct fi_cq_err_entry {
        void     *op_context;    /* operation context */
        uint64_t flags;          /* completion flags */
        size_t   len;            /* size of received data */
        void     *buf;           /* receive data buffer */
        uint64_t data;           /* completion data */
        uint64_t tag;            /* message tag */
        size_t   olen;           /* overflow length */
        int      err;            /* positive error code */
        int      prov_errno;     /* provider error code */
        void     *err_data;      /* error data */
        size_t   err_data_size;  /* size of err_data */
    };

The general reason for the error is provided through the err field.
Provider specific error information may also be available through the
prov_errno and err_data fields. Users may call fi_cq_strerror to convert
provider specific error information into a printable string for
debugging purposes. See field details below for more information on the
use of err_data and err_data_size.

Note that error completions are generated for all operations, including
those for which a completion was not requested (e.g. an endpoint is
configured with FI_SELECTIVE_COMPLETION, but the request did not have
the FI_COMPLETION flag set). In such cases, providers will return as
much information about the failure as the underlying software and
hardware make available; other fields will be set to NULL or 0. This
includes the op_context value, which may not have been provided or was
ignored on input as part of the transfer.

Notable completion error codes are given below.

FI_EADDRNOTAVAIL
    This error code is used by CQs configured with FI_SOURCE_ERR to
    report completions for which a usable fi_addr_t source address could
    not be found. An error code of FI_EADDRNOTAVAIL indicates that the
    data transfer was successfully received and processed, with the
    fi_cq_err_entry fields containing information about the completion.
    The err_data field will be set to the source address data. The
    source address will be in the same format as specified through the
    fi_info addr_format field for the opened domain. This may be passed
    directly into an fi_av_insert call to add the source address to the
    address vector.
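
Because an FI_EADDRNOTAVAIL entry describes data that arrived intact, an error handler can branch on err before treating the entry as a failure, and can use err_data to add the unknown peer to the address vector. The sketch below models that branch. demo_err_entry is a hypothetical subset of struct fi_cq_err_entry, demo_av_insert() is a hypothetical stand-in for fi_av_insert (which has a different signature), and EADDRNOTAVAIL, EIO and ENOSPC from <errno.h> stand in for fabric error codes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_AV_MAX   8
#define DEMO_ADDR_LEN 16   /* hypothetical maximum raw address size */

/* Hypothetical subset of struct fi_cq_err_entry. */
struct demo_err_entry {
    size_t      len;            /* bytes received */
    int         err;            /* positive error code */
    const void *err_data;       /* for EADDRNOTAVAIL: raw source address */
    size_t      err_data_size;
};

/* Hypothetical AV that stores raw addresses; the index is the handle. */
struct demo_av {
    unsigned char addr[DEMO_AV_MAX][DEMO_ADDR_LEN];
    size_t        count;
};

/* Stand-in for fi_av_insert: returns the new index, or -1 if it fails. */
static int64_t demo_av_insert(struct demo_av *av, const void *addr, size_t len)
{
    if (av->count == DEMO_AV_MAX || len > DEMO_ADDR_LEN)
        return -1;
    memset(av->addr[av->count], 0, DEMO_ADDR_LEN);
    memcpy(av->addr[av->count], addr, len);
    return (int64_t)av->count++;
}

/* Returns the number of bytes received when the entry only reports an
 * unknown source (after adding that source to the AV), or -err for a
 * real failure. */
static long demo_handle_err(struct demo_av *av, const struct demo_err_entry *e)
{
    assert(e->err > 0);
    if (e->err == EADDRNOTAVAIL) {
        /* The data arrived intact; err_data holds the source address. */
        if (demo_av_insert(av, e->err_data, e->err_data_size) < 0)
            return -ENOSPC;
        return (long)e->len;
    }
    return -(long)e->err;
}
```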

fi_cq_signal
The fi_cq_signal call will unblock any thread waiting in fi_cq_sread or
fi_cq_sreadfrom. This may be used to wake up a thread that is blocked
waiting to read a completion operation. The fi_cq_signal operation is
only available if the CQ was configured with a wait object.

COMPLETION FIELDS
The CQ entry data structures share many of the same fields. The
meanings of these fields are the same for all CQ entry structure
formats.

op_context
    The operation context is the application specified context value
    that was provided with an asynchronous operation. The op_context
    field is valid for all completions that are associated with an
    asynchronous operation.

    For completion events that are not associated with a posted
    operation, this field will be set to NULL. This includes completions
    generated at the target in response to RMA write operations that
    carry CQ data (FI_REMOTE_WRITE | FI_REMOTE_CQ_DATA flags set), when
    the FI_RX_CQ_DATA mode bit is not required.

flags
    This specifies flags associated with the completed operation. The
    Completion Flags section below lists valid flag values. Flags are
    set for all relevant completions.

len
    The len field only applies to completed receive operations (e.g.
    fi_recv, fi_trecv, etc.). It indicates the size of received message
    data, i.e. how many data bytes were placed into the associated
    receive buffer by a corresponding fi_send/fi_tsend/et al. call. If an
    endpoint has been configured with the FI_MSG_PREFIX mode, the len
    also reflects the size of the prefix buffer.

buf
    The buf field is only valid for completed receive operations, and
    only applies when the receive buffer was posted with the
    FI_MULTI_RECV flag. In this case, buf points to the starting
    location where the receive data was placed.

data
    The data field is only valid if the FI_REMOTE_CQ_DATA completion
    flag is set, and only applies to receive completions. If
    FI_REMOTE_CQ_DATA is set, this field will contain the completion
    data provided by the peer as part of their transmit request. The
    completion data will be given in host byte order.

tag
    A tag applies only to received messages that occur using the tagged
    interfaces. This field contains the tag that was included with the
    received message. The tag will be in host byte order.

olen
    The olen field applies to received messages. It is used to indicate
    that a received message has overrun the available buffer space and
    has been truncated. The olen specifies the amount of data that did
    not fit into the available receive buffer and was discarded.

err
    The err code is a positive fabric errno associated with a
    completion. The err value indicates the general reason for an error,
    if one occurred. See fi_errno(3) for a list of possible error codes.

prov_errno
    On an error, prov_errno may contain a provider specific error code.
    The use of this field and its meaning is provider specific. It is
    intended to be used as a debugging aid. See fi_cq_strerror for
    additional details on converting this error value into a human
    readable string.

err_data
    On an error, err_data may reference a provider specific amount of
    data associated with an error. The use of this field and its meaning
    is provider specific. It is intended to be used as a debugging aid.
    See fi_cq_strerror for additional details on converting this error
    data into a human readable string.

err_data_size
    On input, err_data_size indicates the size of the err_data buffer in
    bytes. On output, err_data_size will be set to the number of bytes
    copied to the err_data buffer. The err_data information is typically
    used with fi_cq_strerror to provide details about the type of error
    that occurred.

    For compatibility purposes, if err_data_size is 0 on input, or the
    fabric was opened with release < 1.5, err_data will be set to a data
    buffer owned by the provider. The contents of the buffer will remain
    valid until a subsequent read call against the CQ. Applications must
    serialize access to the CQ when processing errors to ensure that the
    buffer referenced by err_data does not change.
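
The err_data_size contract above can be sketched from the provider side. In this hypothetical helper, a non-zero input size bounds the copy into the application's buffer and is then overwritten with the number of bytes copied, while an input size of 0 selects the compatibility path, where err_data is pointed at provider-owned storage. Copying min(input size, available bytes) and reporting the provider buffer's size on the compatibility path are assumptions made for illustration; the text above does not specify either.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical provider-side fill of err_data / err_data_size.
 * prov_data, prov_len: error details owned by the provider.
 * *err_data, *err_data_size: the application's fields on input. */
static void demo_fill_err_data(const void *prov_data, size_t prov_len,
                               void **err_data, size_t *err_data_size)
{
    assert(err_data != NULL && err_data_size != NULL);

    if (*err_data_size == 0) {
        /* Compatibility path: hand out the provider's own buffer, which
         * stays valid only until the next read call on the CQ. */
        *err_data = (void *)prov_data;
        *err_data_size = prov_len;   /* assumption: report its size */
        return;
    }
    if (*err_data_size > prov_len)
        *err_data_size = prov_len;
    memcpy(*err_data, prov_data, *err_data_size);   /* bytes copied */
}
```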

COMPLETION FLAGS
Completion flags provide additional details regarding the completed
operation. The following completion flags are defined.

FI_SEND
    Indicates that the completion was for a send operation. This flag
    may be combined with an FI_MSG or FI_TAGGED flag.

FI_RECV
    Indicates that the completion was for a receive operation. This flag
    may be combined with an FI_MSG or FI_TAGGED flag.

FI_RMA
    Indicates that an RMA operation completed. This flag may be combined
    with an FI_READ, FI_WRITE, FI_REMOTE_READ, or FI_REMOTE_WRITE flag.

FI_ATOMIC
    Indicates that an atomic operation completed. This flag may be
    combined with an FI_READ, FI_WRITE, FI_REMOTE_READ, or
    FI_REMOTE_WRITE flag.

FI_MSG
    Indicates that a message-based operation completed. This flag may be
    combined with an FI_SEND or FI_RECV flag.

FI_TAGGED
    Indicates that a tagged message operation completed. This flag may
    be combined with an FI_SEND or FI_RECV flag.

FI_MULTICAST
    Indicates that a multicast operation completed. This flag may be
    combined with FI_MSG and relevant flags. This flag is only
    guaranteed to be valid for received messages if the endpoint has
    been configured with FI_SOURCE.

FI_READ
    Indicates that a locally initiated RMA or atomic read operation has
    completed. This flag may be combined with an FI_RMA or FI_ATOMIC
    flag.

FI_WRITE
    Indicates that a locally initiated RMA or atomic write operation has
    completed. This flag may be combined with an FI_RMA or FI_ATOMIC
    flag.

FI_REMOTE_READ
    Indicates that a remotely initiated RMA or atomic read operation has
    completed. This flag may be combined with an FI_RMA or FI_ATOMIC
    flag.

FI_REMOTE_WRITE
    Indicates that a remotely initiated RMA or atomic write operation
    has completed. This flag may be combined with an FI_RMA or FI_ATOMIC
    flag.

FI_REMOTE_CQ_DATA
    This indicates that remote CQ data is available as part of the
    completion.

FI_MULTI_RECV
    This flag applies to receive buffers that were posted with the
    FI_MULTI_RECV flag set. This completion flag indicates that the
    original receive buffer referenced by the completion has been
    consumed and was released by the provider. Providers may set this
    flag on the last message that is received into the multi-recv
    buffer, or may generate a separate completion that indicates that
    the buffer has been released.

    Applications can distinguish between these two cases by examining
    the completion entry flags field. If additional flags, such as
    FI_RECV, are set, the completion is associated with a received
    message. In this case, the buf field will reference the location
    where the received message was placed into the multi-recv buffer.
    Other fields in the completion entry will be determined based on the
    received message. If other flag bits are zero, the provider is
    reporting that the multi-recv buffer has been released, and the
    completion entry is not associated with a received message.

FI_MORE
    See the 'Buffered Receives' section in fi_msg(3) for more details.
    This flag is associated with receive completions on endpoints that
    have FI_BUFFERED_RECV mode enabled. When set to one, it indicates
    that the buffer referenced by the completion is limited by the
    FI_OPT_BUFFERED_LIMIT threshold, and additional message data must be
    retrieved by the application using an FI_CLAIM operation.

FI_CLAIM
    See the 'Buffered Receives' section in fi_msg(3) for more details.
    This flag is set on completions associated with receive operations
    that claim buffered receive data. Note that this flag only applies
    to endpoints configured with the FI_BUFFERED_RECV mode bit.
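
The FI_MULTI_RECV rule above reduces to a check on the remaining flag bits. A minimal sketch, assuming the completion belongs to a buffer posted with FI_MULTI_RECV and that FI_NOTIFY_FLAGS_ONLY is not in effect (under that mode FI_RECV is not guaranteed to be set, so the check could be ambiguous). The DEMO_ bit values are hypothetical placeholders for FI_RECV, FI_MSG and FI_MULTI_RECV.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; real code tests the FI_* constants. */
#define DEMO_RECV       (1ULL << 0)
#define DEMO_MSG        (1ULL << 1)
#define DEMO_MULTI_RECV (1ULL << 2)

enum demo_mrecv_event {
    DEMO_MSG_ONLY,          /* message received, buffer still in use */
    DEMO_MSG_AND_RELEASE,   /* last message; the buffer is released too */
    DEMO_RELEASE_ONLY       /* no message: only the buffer release */
};

/* Classify a completion on a buffer posted with FI_MULTI_RECV. */
static enum demo_mrecv_event demo_classify(uint64_t flags)
{
    uint64_t others = flags & ~DEMO_MULTI_RECV;

    if (!(flags & DEMO_MULTI_RECV))
        return DEMO_MSG_ONLY;
    return others ? DEMO_MSG_AND_RELEASE : DEMO_RELEASE_ONLY;
}
```

In the first two cases buf points into the multi-recv buffer at the received message; only after a release event (alone or combined with the last message) can the application repost the buffer.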

COMPLETION EVENT SEMANTICS
Libfabric defines several completion 'levels', identified using
operational flags. Each flag indicates the soonest that a completion
event may be generated by a provider, and the assumptions that an
application may make upon processing a completion. The operational
flags are defined below, along with an example of how a provider might
implement the semantic. Note that a provider is required only to meet
the semantic, not to follow the example implementation. Providers may
implement stronger completion semantics than necessary for a given
operation, but only the behavior defined by the completion level is
guaranteed.

To help understand the conceptual differences in completion levels,
consider mailing a letter. Placing the letter into the local mailbox
for pick-up is similar to 'inject complete'. Having the letter picked
up and dropped off at the destination mailbox is equivalent to
'transmit complete'. The 'delivery complete' semantic is a stronger
guarantee, with a person at the destination signing for the letter.
However, the person who signed for the letter is not necessarily the
intended recipient. The 'match complete' option is similar to delivery
complete, but requires the intended recipient to sign for the letter.

The 'commit complete' level has different semantics than the
previously mentioned levels. Commit complete would be closer to the
letter arriving at the destination and being placed into a fireproof
safe.

The operational flags for the described completion levels are defined
below.

FI_INJECT_COMPLETE
    Indicates that a completion should be generated when the source
    buffer(s) may be reused. A completion guarantees that the buffers
    will not be read from again and the application may reclaim them. No
    other guarantees are made with respect to the state of the
    operation.

    Example: A provider may generate this completion event after copying
    the source buffer into a network buffer, either in host memory or on
    the NIC. An inject completion does not indicate that the data has
    been transmitted onto the network, and a local error could occur
    after the completion event has been generated that could prevent it
    from being transmitted.

    Inject complete allows for the fastest completion reporting (and,
    hence, buffer reuse), but provides the weakest guarantees against
    network errors.

    Note: This flag is used to control when a completion entry is
    inserted into a completion queue. It does not apply to operations
    that do not generate a completion queue entry, such as the fi_inject
    operation, and is not subject to the inject_size message limit
    restriction.

FI_TRANSMIT_COMPLETE
    Indicates that a completion should be generated when the transmit
    operation has completed relative to the local provider. The exact
    behavior is dependent on the endpoint type.

    For reliable endpoints:

    Indicates that a completion should be generated when the operation
    has been delivered to the peer endpoint. A completion guarantees
    that the operation is no longer dependent on the fabric or local
    resources. The state of the operation at the peer endpoint is not
    defined.

    Example: A provider may generate a transmit complete event upon
    receiving an ack from the peer endpoint. The state of the message at
    the peer is unknown and may be buffered in the target NIC at the
    time the ack has been generated.

    For unreliable endpoints:

    Indicates that a completion should be generated when the operation
    has been delivered to the fabric. A completion guarantees that the
    operation is no longer dependent on local resources. The state of
    the operation within the fabric is not defined.

FI_DELIVERY_COMPLETE
    Indicates that a completion should not be generated until an
    operation has been processed by the destination endpoint(s). A
    completion guarantees that the result of the operation is available;
    however, additional steps may need to be taken at the destination to
    retrieve the results. For example, an application may need to
    provide a receive buffer in order to retrieve messages that were
    buffered by the provider.

    Delivery complete indicates that the message has been processed by
    the peer. If an application buffer was ready to receive the results
    of the message when it arrived, then delivery complete indicates
    that the data was placed into the application's buffer.

    This completion mode applies only to reliable endpoints. For
    operations that return data to the initiator, such as RMA read or
    atomic-fetch, the source endpoint is also considered a destination
    endpoint. This is the default completion mode for such operations.

FI_MATCH_COMPLETE
    Indicates that a completion should be generated only after the
    operation has been matched with an application specified buffer.
    Operations using this completion semantic are dependent on the
    application at the target claiming the message or results. As a
    result, match complete may involve additional provider-level
    acknowledgements or lengthy delays. However, this completion model
    enables peer applications to synchronize their execution.

FI_COMMIT_COMPLETE
    Indicates that a completion should not be generated (locally or at
    the peer) until the result of an operation has been made persistent.
    A completion guarantees that the result is both available and
    durable, even in the case of power failure.

    This completion mode applies only to operations that target
    persistent memory regions over reliable endpoints. This completion
    mode is experimental.
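
The mailing analogy suggests treating inject, transmit, delivery and match complete as a chain in which a stronger level also meets every weaker guarantee, which is what allows a provider to deliver stronger semantics than requested. The sketch below encodes that reading. The enum names are hypothetical, endpoint type is ignored (although delivery complete applies only to reliable endpoints), and because commit complete is described as having different semantics, it is kept out of the ranking and only satisfies itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranking of the chained completion levels, weakest first. */
enum demo_level {
    DEMO_INJECT_COMPLETE,
    DEMO_TRANSMIT_COMPLETE,
    DEMO_DELIVERY_COMPLETE,
    DEMO_MATCH_COMPLETE,
    DEMO_COMMIT_COMPLETE    /* different semantics; not ranked */
};

/* True if a completion generated at `provided` meets the guarantee the
 * application asked for with `requested`. */
static bool demo_level_satisfies(enum demo_level provided,
                                 enum demo_level requested)
{
    if (provided == DEMO_COMMIT_COMPLETE || requested == DEMO_COMMIT_COMPLETE)
        return provided == requested;   /* not modeled: exact match only */
    return provided >= requested;
}
```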

NOTES
A completion queue must be bound to at least one enabled endpoint
before any operation such as fi_cq_read, fi_cq_readfrom, fi_cq_sread,
fi_cq_sreadfrom etc. can be called on it.

Completion flags may be suppressed if the FI_NOTIFY_FLAGS_ONLY mode bit
has been set. When enabled, only the following flags are guaranteed to
be set in completion data when they are valid: FI_REMOTE_READ and
FI_REMOTE_WRITE (when the FI_RMA_EVENT capability bit has been set),
FI_REMOTE_CQ_DATA, and FI_MULTI_RECV.

If a completion queue has been overrun, it will be placed into an
'overrun' state. Read operations will continue to return any valid,
non-corrupted completions, if available. After all valid completions
have been retrieved, any attempt to read the CQ will result in it
returning an FI_EOVERRUN error event. Overrun completion queues are
considered fatal and may not be used to report additional completions
once the overrun occurs.
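
The overrun rules above can be modeled with a fixed-capacity queue that latches an overrun flag instead of overwriting entries. In this hypothetical demo_ring, completions queued before the overrun are still returned, after which every read fails; EOVERFLOW from <errno.h> stands in for FI_EOVERRUN and EAGAIN for FI_EAGAIN. How a real provider detects an overrun, and how many entries survive it, is provider specific.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_RING_SIZE 4

/* Hypothetical fixed-size CQ that latches an overrun instead of wrapping. */
struct demo_ring {
    int    entries[DEMO_RING_SIZE];
    size_t head, count;
    bool   overrun;
};

/* Provider side: queue a completion, or mark the CQ overrun when full.
 * Once overrun, the CQ accepts nothing more, even if space frees up. */
static void demo_ring_push(struct demo_ring *r, int value)
{
    if (r->overrun || r->count == DEMO_RING_SIZE) {
        r->overrun = true;   /* this completion is lost; CQ is now fatal */
        return;
    }
    r->entries[(r->head + r->count) % DEMO_RING_SIZE] = value;
    r->count++;
}

/* Application side: valid entries first, then -EOVERFLOW on every read if
 * the CQ overran, or -EAGAIN if it is simply empty. */
static int demo_ring_read(struct demo_ring *r, int *value)
{
    assert(value != NULL);
    if (r->count > 0) {
        *value = r->entries[r->head];
        r->head = (r->head + 1) % DEMO_RING_SIZE;
        r->count--;
        return 1;
    }
    return r->overrun ? -EOVERFLOW : -EAGAIN;
}
```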

RETURN VALUES
fi_cq_open / fi_cq_signal
    Returns 0 on success. On error, a negative value corresponding to
    fabric errno is returned.

fi_cq_read / fi_cq_readfrom / fi_cq_readerr
    On success, returns the number of completion events retrieved from
    the completion queue. On error, a negative value corresponding to
    fabric errno is returned. If no completions are available to return
    from the CQ, -FI_EAGAIN will be returned.

fi_cq_sread / fi_cq_sreadfrom
    On success, returns the number of completion events retrieved from
    the completion queue. On error, a negative value corresponding to
    fabric errno is returned. If the timeout expires or the calling
    thread is signaled and no data is available to be read from the
    completion queue, -FI_EAGAIN is returned.

fi_cq_strerror
    Returns a character string interpretation of the provider specific
    error returned with a completion.

Fabric errno values are defined in rdma/fi_errno.h.

SEE ALSO
fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_eq(3), fi_cntr(3),
fi_poll(3)

AUTHORS
OpenFabrics.

Libfabric Programmer's Manual            2019-02-27            fi_cq(3)