fi_cq(3) Libfabric v1.6.1 fi_cq(3)
2
3
4
6 fi_cq - Completion queue operations
7
8 fi_cq_open / fi_close : Open/close a completion queue
9
10 fi_control : Control CQ operation or attributes.
11
12 fi_cq_read / fi_cq_readfrom / fi_cq_readerr : Read a completion from a
13 completion queue
14
15 fi_cq_sread / fi_cq_sreadfrom : A synchronous (blocking) read that
16 waits until a specified condition has been met before reading a comple‐
17 tion from a completion queue.
18
19 fi_cq_signal : Unblock any thread waiting in fi_cq_sread or
20 fi_cq_sreadfrom.
21
22 fi_cq_strerror : Converts provider specific error information into a
23 printable string
24
    #include <rdma/fi_domain.h>

    int fi_cq_open(struct fid_domain *domain, struct fi_cq_attr *attr,
        struct fid_cq **cq, void *context);

    int fi_close(struct fid *cq);

    int fi_control(struct fid *cq, int command, void *arg);

    ssize_t fi_cq_read(struct fid_cq *cq, void *buf, size_t count);

    ssize_t fi_cq_readfrom(struct fid_cq *cq, void *buf, size_t count,
        fi_addr_t *src_addr);

    ssize_t fi_cq_readerr(struct fid_cq *cq, struct fi_cq_err_entry *buf,
        uint64_t flags);

    ssize_t fi_cq_sread(struct fid_cq *cq, void *buf, size_t count,
        const void *cond, int timeout);

    ssize_t fi_cq_sreadfrom(struct fid_cq *cq, void *buf, size_t count,
        fi_addr_t *src_addr, const void *cond, int timeout);

    int fi_cq_signal(struct fid_cq *cq);

    const char * fi_cq_strerror(struct fid_cq *cq, int prov_errno,
        const void *err_data, char *buf, size_t len);
53
55 domain : Open resource domain
56
57 cq : Completion queue
58
59 attr : Completion queue attributes
60
61 context : User specified context associated with the completion queue.
62
63 buf : For read calls, the data buffer to write completions into. For
64 write calls, a completion to insert into the completion queue. For
65 fi_cq_strerror, an optional buffer that receives printable error infor‐
66 mation.
67
68 count : Number of CQ entries.
69
70 len : Length of data buffer
71
72 src_addr : Source address of a completed receive operation
73
74 flags : Additional flags to apply to the operation
75
76 command : Command of control operation to perform on CQ.
77
78 arg : Optional control argument
79
80 cond : Condition that must be met before a completion is generated
81
82 timeout : Time in milliseconds to wait. A negative value indicates
83 infinite timeout.
84
85 prov_errno : Provider specific error value
86
87 err_data : Provider specific error data related to a completion
88
90 Completion queues are used to report events associated with data trans‐
91 fers. They are associated with message sends and receives, RMA,
92 atomic, tagged messages, and triggered events. Reported events are
93 usually associated with a fabric endpoint, but may also refer to memory
94 regions used as the target of an RMA or atomic operation.
95
96 fi_cq_open
97 fi_cq_open allocates a new completion queue. Unlike event queues, com‐
98 pletion queues are associated with a resource domain and may be off‐
99 loaded entirely in provider hardware.
100
101 The properties and behavior of a completion queue are defined by
102 struct fi_cq_attr.
103
    struct fi_cq_attr {
        size_t               size;             /* # entries for CQ */
        uint64_t             flags;            /* operation flags */
        enum fi_cq_format    format;           /* completion format */
        enum fi_wait_obj     wait_obj;         /* requested wait object */
        int                  signaling_vector; /* interrupt affinity */
        enum fi_cq_wait_cond wait_cond;        /* wait condition format */
        struct fid_wait     *wait_set;         /* optional wait set */
    };
113
114 size : Specifies the minimum size of a completion queue. A value of 0
115 indicates that the provider may choose a default value.
116
117 flags : Flags that control the configuration of the CQ.
118
119 · FI_AFFINITY : Indicates that the signaling_vector field (see below)
120 is valid.
121
122 format : Completion queues allow the application to select the amount
123 of detail that it must store and report. The format attribute allows
124 the application to select one of several completion formats, indicating
125 the structure of the data that the completion queue should return when
126 read. Supported formats and the structures that correspond to each are
listed below. The meanings of the CQ entry fields are defined in the
128 Completion Fields section.
129
130 · FI_CQ_FORMAT_UNSPEC : If an unspecified format is requested, then the
131 CQ will use a provider selected default format.
132
133 · FI_CQ_FORMAT_CONTEXT : Provides only user specified context that was
134 associated with the completion.
135
      struct fi_cq_entry {
          void *op_context; /* operation context */
      };
139
140 · FI_CQ_FORMAT_MSG : Provides minimal data for processing completions,
141 with expanded support for reporting information about received mes‐
142 sages.
143
      struct fi_cq_msg_entry {
          void     *op_context; /* operation context */
          uint64_t  flags;      /* completion flags */
          size_t    len;        /* size of received data */
      };
149
150 · FI_CQ_FORMAT_DATA : Provides data associated with a completion.
151 Includes support for received message length, remote CQ data, and
152 multi-receive buffers.
153
      struct fi_cq_data_entry {
          void     *op_context; /* operation context */
          uint64_t  flags;      /* completion flags */
          size_t    len;        /* size of received data */
          void     *buf;        /* receive data buffer */
          uint64_t  data;       /* completion data */
      };
161
162 · FI_CQ_FORMAT_TAGGED : Expands completion data to include support for
163 the tagged message interfaces.
164
      struct fi_cq_tagged_entry {
          void     *op_context; /* operation context */
          uint64_t  flags;      /* completion flags */
          size_t    len;        /* size of received data */
          void     *buf;        /* receive data buffer */
          uint64_t  data;       /* completion data */
          uint64_t  tag;        /* received tag */
      };
173
wait_obj : CQs may be associated with a specific wait object. Wait
objects allow applications to block until the wait object is signaled,
indicating that a completion is available to be read. Users may use
fi_control to retrieve the underlying wait object associated with a CQ,
in order to use it in other system calls. The following values may be
used to specify the type of wait object associated with a CQ:
FI_WAIT_NONE, FI_WAIT_UNSPEC, FI_WAIT_SET, FI_WAIT_FD,
FI_WAIT_MUTEX_COND, and FI_WAIT_CRITSEC_COND. The default is
FI_WAIT_NONE.
182
183 · FI_WAIT_NONE : Used to indicate that the user will not block (wait)
184 for completions on the CQ. When FI_WAIT_NONE is specified, the
185 application may not call fi_cq_sread or fi_cq_sreadfrom.
186
187 · FI_WAIT_UNSPEC : Specifies that the user will only wait on the CQ
188 using fabric interface calls, such as fi_cq_sread or fi_cq_sreadfrom.
189 In this case, the underlying provider may select the most appropriate
190 or highest performing wait object available, including custom wait
191 mechanisms. Applications that select FI_WAIT_UNSPEC are not guaran‐
192 teed to retrieve the underlying wait object.
193
194 · FI_WAIT_SET : Indicates that the completion queue should use a wait
195 set object to wait for completions. If specified, the wait_set field
196 must reference an existing wait set object.
197
198 · FI_WAIT_FD : Indicates that the CQ should use a file descriptor as
199 its wait mechanism. A file descriptor wait object must be usable in
200 select, poll, and epoll routines. However, a provider may signal an
201 FD wait object by marking it as readable, writable, or with an error.
202
203 · FI_WAIT_MUTEX_COND : Specifies that the CQ should use a pthread mutex
204 and cond variable as a wait object.
205
206 · FI_WAIT_CRITSEC_COND : Windows specific. Specifies that the CQ
207 should use a critical section and condition variable as a wait
208 object.
209
210 signaling_vector : If the FI_AFFINITY flag is set, this indicates the
211 logical cpu number (0..max cpu - 1) that interrupts associated with the
212 CQ should target. This field should be treated as a hint to the
213 provider and may be ignored if the provider does not support interrupt
214 affinity.
215
216 wait_cond : By default, when a completion is inserted into a CQ that
217 supports blocking reads (fi_cq_sread/fi_cq_sreadfrom), the correspond‐
218 ing wait object is signaled. Users may specify a condition that must
219 first be met before the wait is satisfied. This field indicates how
220 the provider should interpret the cond field, which describes the con‐
221 dition needed to signal the wait object.
222
223 A wait condition should be treated as an optimization. Providers are
224 not required to meet the requirements of the condition before signaling
225 the wait object. Applications should not rely on the condition neces‐
226 sarily being true when a blocking read call returns.
227
228 If wait_cond is set to FI_CQ_COND_NONE, then no additional conditions
229 are applied to the signaling of the CQ wait object, and the insertion
230 of any new entry will trigger the wait condition. If wait_cond is set
231 to FI_CQ_COND_THRESHOLD, then the cond field is interpreted as a size_t
threshold value. The threshold indicates the number of entries that
must be queued at the CQ before the wait is satisfied.
234
235 This field is ignored if wait_obj is set to FI_WAIT_NONE.
236
237 wait_set : If wait_obj is FI_WAIT_SET, this field references a wait
238 object to which the completion queue should attach. When an event is
239 inserted into the completion queue, the corresponding wait set will be
240 signaled if all necessary conditions are met. The use of a wait_set
241 enables an optimized method of waiting for events across multiple event
242 and completion queues. This field is ignored if wait_obj is not
243 FI_WAIT_SET.
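
The rules above on how the attribute fields depend on each other can be
checked before calling fi_cq_open. The following is a minimal sketch,
not part of libfabric: the struct, the SK_ constants, and the function
sketch_attr_consistent are hypothetical stand-ins that mirror a subset
of struct fi_cq_attr, so that the example compiles without the library
headers. Their numeric values are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the libfabric definitions. The numeric
 * values are placeholders and are not taken from the library. */
enum sketch_wait_obj {
    SK_WAIT_NONE, SK_WAIT_UNSPEC, SK_WAIT_SET, SK_WAIT_FD
};
#define SK_AFFINITY (1ULL << 0)

/* Mirrors the fields of struct fi_cq_attr that the checks below use. */
struct sketch_cq_attr {
    size_t               size;
    uint64_t             flags;
    enum sketch_wait_obj wait_obj;
    int                  signaling_vector;
    void                *wait_set;
};

/* Returns true when the fields agree with the rules described above. */
bool sketch_attr_consistent(const struct sketch_cq_attr *attr, int ncpus)
{
    assert(attr != NULL);

    /* A wait object of type "wait set" needs an existing wait set. */
    if (attr->wait_obj == SK_WAIT_SET && attr->wait_set == NULL)
        return false;

    /* signaling_vector is only read when the affinity flag is set, and
     * then names a logical CPU in the range 0..ncpus-1. */
    if ((attr->flags & SK_AFFINITY) &&
        (attr->signaling_vector < 0 || attr->signaling_vector >= ncpus))
        return false;

    return true;
}
```

A real application would fill in struct fi_cq_attr itself and pass it
to fi_cq_open; the provider remains the final authority on which
combinations it accepts.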
244
245 fi_close
246 The fi_close call releases all resources associated with a completion
247 queue. Any completions which remain on the CQ when it is closed are
248 lost.
249
250 When closing the CQ, there must be no opened endpoints, transmit con‐
251 texts, or receive contexts associated with the CQ. If resources are
252 still associated with the CQ when attempting to close, the call will
253 return -FI_EBUSY.
254
255 fi_control
256 The fi_control call is used to access provider or implementation spe‐
257 cific details of the completion queue. Access to the CQ should be
258 serialized across all calls when fi_control is invoked, as it may redi‐
259 rect the implementation of CQ operations. The following control com‐
260 mands are usable with a CQ.
261
262 FI_GETWAIT (void **) : This command allows the user to retrieve the
263 low-level wait object associated with the CQ. The format of the
264 wait-object is specified during CQ creation, through the CQ attributes.
265 The fi_control arg parameter should be an address where a pointer to
the returned wait object will be written. See fi_eq(3) for additional
details on using fi_control with FI_GETWAIT.
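
For a CQ opened with FI_WAIT_FD, the retrieved wait object can be passed
to poll. The sketch below assumes a POSIX system and that the file
descriptor was already obtained, for instance with
fi_control(&cq->fid, FI_GETWAIT, &fd); treating the argument as a
pointer to int is an assumption based on the description above. The
function name wait_on_cq_fd is hypothetical.

```c
#include <assert.h>
#include <poll.h>

/* Waits up to timeout_ms for activity on a wait-object descriptor.
 * Returns 1 if the descriptor was signaled, 0 on timeout, and -1 if
 * poll() itself failed. */
int wait_on_cq_fd(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int ret;

    assert(fd >= 0);

    pfd.fd = fd;
    /* A provider may signal the descriptor as readable or writable.
     * Error conditions are reported in revents without being requested. */
    pfd.events = POLLIN | POLLOUT;
    pfd.revents = 0;

    ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;

    return (pfd.revents & (POLLIN | POLLOUT | POLLERR | POLLHUP)) ? 1 : 0;
}
```

Being woken only indicates that the CQ may have something to report;
the completions themselves are still retrieved with fi_cq_read or
fi_cq_readerr.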
268
269 fi_cq_read
270 The fi_cq_read operation performs a non-blocking read of completion
271 data from the CQ. The format of the completion event is determined
272 using the fi_cq_format option that was specified when the CQ was
273 opened. Multiple completions may be retrieved from a CQ in a single
274 call. The maximum number of entries to return is limited to the speci‐
275 fied count parameter, with the number of entries successfully read from
276 the CQ returned by the call. (See return values section below.)
277
278 CQs are optimized to report operations which have completed success‐
279 fully. Operations which fail are reported 'out of band'. Such opera‐
280 tions are retrieved using the fi_cq_readerr function. When an opera‐
281 tion that has completed with an unexpected error is encountered, it is
282 placed into a temporary error queue. Attempting to read from a CQ
283 while an item is in the error queue results in fi_cq_read failing with
284 a return code of -FI_EAVAIL. Applications may use this return code to
285 determine when to call fi_cq_readerr.
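
The return-code handling described above can be sketched as a small
dispatch function. To keep the example compilable without libfabric,
the read call is passed in as a function pointer; in a real program it
would be a thin wrapper around fi_cq_read. The sk_ names are
hypothetical, and the fallback values for FI_EAGAIN and FI_EAVAIL are
placeholders that only apply when rdma/fi_errno.h is not included.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/* Placeholder values so the sketch compiles alone. A real program
 * takes these from rdma/fi_errno.h. */
#ifndef FI_EAGAIN
#define FI_EAGAIN 11
#define FI_EAVAIL 259
#endif

/* Same shape as fi_cq_read, with the CQ passed as an opaque pointer. */
typedef ssize_t (*sk_read_fn)(void *cq, void *buf, size_t count);

enum sk_poll_result {
    SK_GOT_ENTRIES,   /* *nread entries were written to buf */
    SK_EMPTY,         /* nothing to report right now */
    SK_ERROR_PENDING, /* an error entry is waiting: call fi_cq_readerr */
    SK_FAILED         /* any other negative return value */
};

enum sk_poll_result sk_poll_once(sk_read_fn read_cq, void *cq, void *buf,
                                 size_t count, size_t *nread)
{
    ssize_t ret;

    assert(read_cq != NULL && buf != NULL && nread != NULL && count > 0);

    *nread = 0;
    ret = read_cq(cq, buf, count);
    if (ret > 0) {
        *nread = (size_t)ret;
        return SK_GOT_ENTRIES;
    }
    if (ret == 0 || ret == -FI_EAGAIN)
        return SK_EMPTY;
    if (ret == -FI_EAVAIL)
        return SK_ERROR_PENDING;
    return SK_FAILED;
}

/* Test double standing in for a wrapper around fi_cq_read: it returns
 * the value stored at *cq and writes nothing to buf. */
ssize_t sk_fake_read(void *cq, void *buf, size_t count)
{
    (void)buf;
    (void)count;
    return *(ssize_t *)cq;
}
```

The buffer passed to the read call must be an array of the entry
structure that matches the format selected when the CQ was opened.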
286
287 fi_cq_readfrom
The fi_cq_readfrom call behaves identically to fi_cq_read, with the
289 exception that it allows the CQ to return source address information to
290 the user for any received data. Source address data is only available
291 for those endpoints configured with FI_SOURCE capability. If
292 fi_cq_readfrom is called on an endpoint for which source addressing
293 data is not available, the source address will be set to
294 FI_ADDR_NOTAVAIL. The number of input src_addr entries must be the
295 same as the count parameter.
296
297 Returned source addressing data is converted from the native address
298 used by the underlying fabric into an fi_addr_t, which may be used in
299 transmit operations. Typically, returning fi_addr_t requires that the
300 source address be inserted into the address vector associated with the
301 receiving endpoint. For endpoints allocated using the FI_SOURCE_ERR
302 capability, if the source address has not been inserted into the
303 address vector, fi_cq_readfrom will return -FI_EAVAIL. The completion
will then be reported through fi_cq_readerr with the err field set to
FI_EADDRNOTAVAIL. See fi_cq_readerr for details.
306
307 If FI_SOURCE is specified without FI_SOURCE_ERR, source addresses which
308 cannot be mapped to a local fi_addr_t will be reported as
309 FI_ADDR_NOTAVAIL. The behavior is dependent on the type of address
310 vector in use. For AVs of type FI_AV_MAP, source addresses may be
mapped directly to an fi_addr_t value, even if the source address was
not inserted into the AV. This allows the provider to optimize the
313 reporting of the source fi_addr_t without the overhead of verifying
314 whether the address is in the AV. If full address validation is neces‐
315 sary, FI_SOURCE_ERR must be used.
316
317 fi_cq_sread / fi_cq_sreadfrom
318 The fi_cq_sread and fi_cq_sreadfrom calls are the blocking equivalent
319 operations to fi_cq_read and fi_cq_readfrom. Their behavior is similar
320 to the non-blocking calls, with the exception that the calls will not
321 return until either a completion has been read from the CQ or an error
322 or timeout occurs.
323
324 It is invalid for applications to call these functions if the CQ has
325 been configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.
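
Because a blocking read may return before the requested number of
completions is available, an application that needs several entries
typically loops. The sketch below illustrates this with the blocking
call injected as a function pointer, so that it compiles without
libfabric; a real program would wrap fi_cq_sread. The sk_ names are
hypothetical, the fallback value for FI_EAGAIN is a placeholder, and
the assumption that a timeout is reported as -FI_EAGAIN follows the
return values described on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/* Placeholder so the sketch compiles alone; see rdma/fi_errno.h. */
#ifndef FI_EAGAIN
#define FI_EAGAIN 11
#endif

/* Same shape as fi_cq_sread, with the CQ passed as an opaque pointer. */
typedef ssize_t (*sk_sread_fn)(void *cq, void *buf, size_t count,
                               const void *cond, int timeout_ms);

/* Blocks until `want` entries were read, or until a call times out or
 * fails. Returns the number of entries collected; *last_ret holds the
 * value returned by the final blocking call. */
size_t sk_collect(sk_sread_fn sread_cq, void *cq, void *buf,
                  size_t entry_size, size_t want, int timeout_ms,
                  ssize_t *last_ret)
{
    char *out = buf;
    size_t got = 0;

    assert(sread_cq != NULL && buf != NULL && last_ret != NULL);
    assert(entry_size > 0);

    *last_ret = 0;
    while (got < want) {
        /* No wait condition is passed: the loop does not rely on one. */
        *last_ret = sread_cq(cq, out + got * entry_size, want - got,
                             NULL, timeout_ms);
        if (*last_ret <= 0)
            break; /* timeout or error: the caller inspects *last_ret */
        got += (size_t)*last_ret;
    }
    return got;
}

/* Test double: hands out one entry per call until the counter at *cq
 * reaches zero, then reports -FI_EAGAIN as a timeout would. */
ssize_t sk_fake_sread(void *cq, void *buf, size_t count,
                      const void *cond, int timeout_ms)
{
    int *remaining = cq;

    (void)buf;
    (void)count;
    (void)cond;
    (void)timeout_ms;
    if (*remaining == 0)
        return -FI_EAGAIN;
    (*remaining)--;
    return 1;
}
```

If the final return value is -FI_EAVAIL, the caller would go on to
fi_cq_readerr, as described for fi_cq_read.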
326
327 fi_cq_readerr
328 The read error function, fi_cq_readerr, retrieves information regarding
329 any asynchronous operation which has completed with an unexpected
330 error. fi_cq_readerr is a non-blocking call, returning immediately
331 whether an error completion was found or not.
332
333 Error information is reported to the user through
334 struct fi_cq_err_entry. The format of this structure is defined below.
335
    struct fi_cq_err_entry {
        void     *op_context;    /* operation context */
        uint64_t  flags;         /* completion flags */
        size_t    len;           /* size of received data */
        void     *buf;           /* receive data buffer */
        uint64_t  data;          /* completion data */
        uint64_t  tag;           /* message tag */
        size_t    olen;          /* overflow length */
        int       err;           /* positive error code */
        int       prov_errno;    /* provider error code */
        void     *err_data;      /* error data */
        size_t    err_data_size; /* size of err_data */
    };
349
350 The general reason for the error is provided through the err field.
351 Provider specific error information may also be available through the
352 prov_errno and err_data fields. Users may call fi_cq_strerror to con‐
353 vert provider specific error information into a printable string for
354 debugging purposes. See field details below for more information on
355 the use of err_data and err_data_size.
356
357 Notable completion error codes are given below.
358
359 FI_EADDRNOTAVAIL : This error code is used by CQs configured with
360 FI_SOURCE_ERR to report completions for which a matching fi_addr_t
361 source address could not be found. An error code of FI_EADDRNOTAVAIL
362 indicates that the data transfer was successfully received and pro‐
363 cessed, with the fi_cq_err_entry fields containing information about
364 the completion. The err_data field will be set to the source address
365 data. The source address will be in the same format as specified
366 through the fi_info addr_format field for the opened domain. This may
367 be passed directly into an fi_av_insert call to add the source address
368 to the address vector.
369
370 fi_cq_signal
371 The fi_cq_signal call will unblock any thread waiting in fi_cq_sread or
fi_cq_sreadfrom. This may be used to wake up a thread that is blocked
373 waiting to read a completion operation. The fi_cq_signal operation is
374 only available if the CQ was configured with a wait object.
375
COMPLETION FIELDS
377 The CQ entry data structures share many of the same fields. The mean‐
378 ings of these fields are the same for all CQ entry structure formats.
379
380 op_context : The operation context is the application specified context
381 value that was provided with an asynchronous operation. The op_context
382 field is valid for all completions that are associated with an asyn‐
383 chronous operation.
384
385 For completion events that are not associated with a posted operation,
386 this field will be set to NULL. This includes completions generated at
387 the target in response to RMA write operations that carry CQ data
388 (FI_REMOTE_WRITE | FI_REMOTE_CQ_DATA flags set), when the FI_RX_CQ_DATA
389 mode bit is not required.
390
391 flags : This specifies flags associated with the completed operation.
392 The Completion Flags section below lists valid flag values. Flags are
393 set for all relevant completions.
394
395 len : This len field only applies to completed receive operations (e.g.
396 fi_recv, fi_trecv, etc.). It indicates the size of received message
397 data -- i.e. how many data bytes were placed into the associated
398 receive buffer by a corresponding fi_send/fi_tsend/et al call. If an
399 endpoint has been configured with the FI_MSG_PREFIX mode, the len also
400 reflects the size of the prefix buffer.
401
402 buf : The buf field is only valid for completed receive operations, and
403 only applies when the receive buffer was posted with the FI_MULTI_RECV
404 flag. In this case, buf points to the starting location where the
405 receive data was placed.
406
407 data : The data field is only valid if the FI_REMOTE_CQ_DATA completion
408 flag is set, and only applies to receive completions. If
409 FI_REMOTE_CQ_DATA is set, this field will contain the completion data
410 provided by the peer as part of their transmit request. The completion
411 data will be given in host byte order.
412
413 tag : A tag applies only to received messages that occur using the
414 tagged interfaces. This field contains the tag that was included with
415 the received message. The tag will be in host byte order.
416
417 olen : The olen field applies to received messages. It is used to
418 indicate that a received message has overrun the available buffer space
419 and has been truncated. The olen specifies the amount of data that did
420 not fit into the available receive buffer and was discarded.
421
422 err : This err code is a positive fabric errno associated with a com‐
423 pletion. The err value indicates the general reason for an error, if
424 one occurred. See fi_errno.3 for a list of possible error codes.
425
prov_errno : On an error, prov_errno may contain a provider specific
error code. The use of this field and its meaning are provider
specific. It is intended to be used as a debugging aid. See
fi_cq_strerror for additional details on converting this error value
into a human readable string.
431
err_data : On an error, err_data may reference a provider specific
amount of data associated with an error. The use of this field and its
meaning are provider specific. It is intended to be used as a debugging
aid. See fi_cq_strerror for additional details on converting this
error data into a human readable string.
437
438 err_data_size : On input, err_data_size indicates the size of the
439 err_data buffer in bytes. On output, err_data_size will be set to the
440 number of bytes copied to the err_data buffer. The err_data informa‐
441 tion is typically used with fi_cq_strerror to provide details about the
442 type of error that occurred.
443
444 For compatibility purposes, if err_data_size is 0 on input, or the fab‐
445 ric was opened with release < 1.5, err_data will be set to a data buf‐
446 fer owned by the provider. The contents of the buffer will remain
447 valid until a subsequent read call against the CQ. Applications must
448 serialize access to the CQ when processing errors to ensure that the
449 buffer referenced by err_data does not change.
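
The input and output roles of err_data and err_data_size, and the
meaning of olen, can be illustrated with a short sketch. The struct
below is a local mirror of struct fi_cq_err_entry as listed on this
page, and the sk_ names are hypothetical helpers, so that the example
compiles without libfabric. Treating len + olen as the size of the
original message is an inference from the field descriptions above,
not something a provider is documented here to guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct fi_cq_err_entry as listed on this page. */
struct sk_err_entry {
    void     *op_context;
    uint64_t  flags;
    size_t    len;
    void     *buf;
    uint64_t  data;
    uint64_t  tag;
    size_t    olen;
    int       err;
    int       prov_errno;
    void     *err_data;
    size_t    err_data_size;
};

/* Prepares an entry for reading an error completion: the application
 * supplies its own buffer for provider error data and states its size
 * as the input value of err_data_size. */
void sk_prepare_err_entry(struct sk_err_entry *entry, void *scratch,
                          size_t scratch_len)
{
    assert(entry != NULL);

    memset(entry, 0, sizeof(*entry));
    entry->err_data = scratch;
    entry->err_data_size = scratch_len;
}

/* For a truncated receive: bytes placed in the buffer plus bytes that
 * were discarded. */
size_t sk_untruncated_size(const struct sk_err_entry *entry)
{
    assert(entry != NULL);

    return entry->len + entry->olen;
}
```

With the real structure, the prepared entry would be passed to
fi_cq_readerr, after which err_data_size holds the number of bytes the
provider copied into the buffer.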
450
COMPLETION FLAGS
452 Completion flags provide additional details regarding the completed
453 operation. The following completion flags are defined.
454
455 FI_SEND : Indicates that the completion was for a send operation. This
456 flag may be combined with an FI_MSG or FI_TAGGED flag.
457
458 FI_RECV : Indicates that the completion was for a receive operation.
459 This flag may be combined with an FI_MSG or FI_TAGGED flag.
460
461 FI_RMA : Indicates that an RMA operation completed. This flag may be
462 combined with an FI_READ, FI_WRITE, FI_REMOTE_READ, or FI_REMOTE_WRITE
463 flag.
464
465 FI_ATOMIC : Indicates that an atomic operation completed. This flag
466 may be combined with an FI_READ, FI_WRITE, FI_REMOTE_READ, or
467 FI_REMOTE_WRITE flag.
468
469 FI_MSG : Indicates that a message-based operation completed. This flag
470 may be combined with an FI_SEND or FI_RECV flag.
471
472 FI_TAGGED : Indicates that a tagged message operation completed. This
473 flag may be combined with an FI_SEND or FI_RECV flag.
474
475 FI_MULTICAST : Indicates that a multicast operation completed. This
476 flag may be combined with FI_MSG and relevant flags. This flag is only
477 guaranteed to be valid for received messages if the endpoint has been
478 configured with FI_SOURCE.
479
480 FI_READ : Indicates that a locally initiated RMA or atomic read opera‐
481 tion has completed. This flag may be combined with an FI_RMA or
482 FI_ATOMIC flag.
483
484 FI_WRITE : Indicates that a locally initiated RMA or atomic write oper‐
485 ation has completed. This flag may be combined with an FI_RMA or
486 FI_ATOMIC flag.
487
488 FI_REMOTE_READ : Indicates that a remotely initiated RMA or atomic read
489 operation has completed. This flag may be combined with an FI_RMA or
490 FI_ATOMIC flag.
491
492 FI_REMOTE_WRITE : Indicates that a remotely initiated RMA or atomic
493 write operation has completed. This flag may be combined with an
494 FI_RMA or FI_ATOMIC flag.
495
496 FI_REMOTE_CQ_DATA : This indicates that remote CQ data is available as
497 part of the completion.
498
499 FI_MULTI_RECV : This flag applies to receive buffers that were posted
500 with the FI_MULTI_RECV flag set. This completion flag indicates that
the original receive buffer referenced by the completion has been
consumed and was released by the provider. Providers may set this flag
on the last message that is received into the multi-recv buffer, or may
504 generate a separate completion that indicates that the buffer has been
505 released.
506
507 Applications can distinguish between these two cases by examining the
508 completion entry flags field. If additional flags, such as FI_RECV,
509 are set, the completion is associated with a received message. In this
510 case, the buf field will reference the location where the received mes‐
511 sage was placed into the multi-recv buffer. Other fields in the com‐
512 pletion entry will be determined based on the received message. If
513 other flag bits are zero, the provider is reporting that the multi-recv
514 buffer has been released, and the completion entry is not associated
515 with a received message.
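
The two FI_MULTI_RECV cases can be told apart with a simple test on the
flags field. In this sketch the enum and the function sk_classify are
hypothetical, and the fallback flag values are placeholders used only
when the libfabric headers are not included.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values so the sketch compiles alone. A real program
 * takes these from rdma/fabric.h. */
#ifndef FI_RECV
#define FI_RECV       (1ULL << 10)
#define FI_MULTI_RECV (1ULL << 16)
#endif

enum sk_mrecv_kind {
    SK_MESSAGE,         /* ordinary completion, buffer still in use */
    SK_MESSAGE_LAST,    /* received message, and the buffer was released */
    SK_BUFFER_RELEASED  /* separate completion: buffer released, no message */
};

enum sk_mrecv_kind sk_classify(uint64_t flags)
{
    if (!(flags & FI_MULTI_RECV))
        return SK_MESSAGE;

    /* Any other bit, such as FI_RECV, ties the completion to a message. */
    if (flags & ~(uint64_t)FI_MULTI_RECV)
        return SK_MESSAGE_LAST;

    return SK_BUFFER_RELEASED;
}
```

Only in the first two cases do buf, len, and the other entry fields
describe a received message.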
516
518 A completion queue must be bound to at least one enabled endpoint
519 before any operation such as fi_cq_read, fi_cq_readfrom, fi_cq_sread,
520 fi_cq_sreadfrom etc. can be called on it.
521
522 Completion flags may be suppressed if the FI_NOTIFY_FLAGS_ONLY mode bit
523 has been set. When enabled, only the following flags are guaranteed to
524 be set in completion data when they are valid: FI_REMOTE_READ and
525 FI_REMOTE_WRITE (when FI_RMA_EVENT capability bit has been set),
526 FI_REMOTE_CQ_DATA, and FI_MULTI_RECV.
527
528 If a completion queue has been overrun, it will be placed into an
529 'overrun' state. Read operations will continue to return any valid,
530 non-corrupted completions, if available. After all valid completions
531 have been retrieved, any attempt to read the CQ will result in it
532 returning an FI_EOVERRUN error event. Overrun completion queues are
533 considered fatal and may not be used to report additional completions
534 once the overrun occurs.
535
RETURN VALUES
537 fi_cq_open / fi_cq_signal : Returns 0 on success. On error, a negative
538 value corresponding to fabric errno is returned.
539
fi_cq_read / fi_cq_readfrom / fi_cq_readerr / fi_cq_sread /
fi_cq_sreadfrom : On success, returns the number of completion events
retrieved from the completion queue. On error, a negative value
corresponding to fabric errno is returned. If no completions are
available to return from the CQ, -FI_EAGAIN will be returned.
545
546 fi_cq_strerror : Returns a character string interpretation of the
547 provider specific error returned with a completion.
548
549 Fabric errno values are defined in rdma/fi_errno.h.
550
552 fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_eq(3), fi_cntr(3),
553 fi_poll(3)
554
556 OpenFabrics.
557
558
559
Libfabric Programmer's Manual 2017-12-06 fi_cq(3)