fi_eq(3)                      Libfabric v1.12.1                      fi_eq(3)

NAME
       fi_eq - Event queue operations

       fi_eq_open / fi_close
              Open/close an event queue

       fi_control
              Control operation of EQ

       fi_eq_read / fi_eq_readerr
              Read an event from an event queue

       fi_eq_write
              Writes an event to an event queue

       fi_eq_sread
              A synchronous (blocking) read of an event queue

       fi_eq_strerror
              Converts provider specific error information into a printable
              string

SYNOPSIS
           #include <rdma/fi_domain.h>

           int fi_eq_open(struct fid_fabric *fabric, struct fi_eq_attr *attr,
               struct fid_eq **eq, void *context);

           int fi_close(struct fid *eq);

           int fi_control(struct fid *eq, int command, void *arg);

           ssize_t fi_eq_read(struct fid_eq *eq, uint32_t *event,
               void *buf, size_t len, uint64_t flags);

           ssize_t fi_eq_readerr(struct fid_eq *eq, struct fi_eq_err_entry *buf,
               uint64_t flags);

           ssize_t fi_eq_write(struct fid_eq *eq, uint32_t event,
               const void *buf, size_t len, uint64_t flags);

           ssize_t fi_eq_sread(struct fid_eq *eq, uint32_t *event,
               void *buf, size_t len, int timeout, uint64_t flags);

           const char * fi_eq_strerror(struct fid_eq *eq, int prov_errno,
               const void *err_data, char *buf, size_t len);

ARGUMENTS
       fabric Opened fabric descriptor

       eq     Event queue

       attr   Event queue attributes

       context
              User specified context associated with the event queue.

       event  Reported event

       buf    For read calls, the data buffer to write events into. For write
              calls, an event to insert into the event queue. For
              fi_eq_strerror, an optional buffer that receives printable
              error information.

       len    Length of data buffer

       flags  Additional flags to apply to the operation

       command
              Command of control operation to perform on EQ.

       arg    Optional control argument

       prov_errno
              Provider specific error value

       err_data
              Provider specific error data related to a completion

       timeout
              Timeout specified in milliseconds

DESCRIPTION
       Event queues are used to report events associated with control
       operations. They are associated with memory registration, address
       vectors, connection management, and fabric and domain level events.
       Reported events are either associated with a requested operation or
       affiliated with a call that registers for specific types of events,
       such as listening for connection requests.

   fi_eq_open
       fi_eq_open allocates a new event queue.

       The properties and behavior of an event queue are defined by
       struct fi_eq_attr.

           struct fi_eq_attr {
               size_t            size;             /* # entries for EQ */
               uint64_t          flags;            /* operation flags */
               enum fi_wait_obj  wait_obj;         /* requested wait object */
               int               signaling_vector; /* interrupt affinity */
               struct fid_wait  *wait_set;         /* optional wait set */
           };

       size   Specifies the minimum size of an event queue.

       flags  Flags that control the configuration of the EQ.

       - FI_WRITE
              Indicates that the application requires support for inserting
              user events into the EQ. If this flag is set, then the
              fi_eq_write operation must be supported by the provider. If the
              FI_WRITE flag is not set, then the application may not invoke
              fi_eq_write.

       - FI_AFFINITY
              Indicates that the signaling_vector field (see below) is valid.

       wait_obj
              EQs may be associated with a specific wait object. Wait objects
              allow applications to block until the wait object is signaled,
              indicating that an event is available to be read. Users may use
              fi_control to retrieve the underlying wait object associated
              with an EQ, in order to use it in other system calls. The
              following values may be used to specify the type of wait object
              associated with an EQ:

       - FI_WAIT_NONE
              Used to indicate that the user will not block (wait) for events
              on the EQ. When FI_WAIT_NONE is specified, the application may
              not call fi_eq_sread. This is the default if no wait object is
              specified.

       - FI_WAIT_UNSPEC
              Specifies that the user will only wait on the EQ using fabric
              interface calls, such as fi_eq_sread. In this case, the
              underlying provider may select the most appropriate or highest
              performing wait object available, including custom wait
              mechanisms. Applications that select FI_WAIT_UNSPEC are not
              guaranteed to retrieve the underlying wait object.

       - FI_WAIT_SET
              Indicates that the event queue should use a wait set object to
              wait for events. If specified, the wait_set field must
              reference an existing wait set object.

       - FI_WAIT_FD
              Indicates that the EQ should use a file descriptor as its wait
              mechanism. A file descriptor wait object must be usable in
              select, poll, and epoll routines. However, a provider may
              signal an FD wait object by marking it as readable or with an
              error.

       - FI_WAIT_MUTEX_COND
              Specifies that the EQ should use a pthread mutex and cond
              variable as a wait object.

       - FI_WAIT_YIELD
              Indicates that the EQ will wait without a wait object but
              instead yield on every wait. Allows usage of fi_eq_sread
              through a spin.

       signaling_vector
              If the FI_AFFINITY flag is set, this indicates the logical cpu
              number (0..max cpu - 1) that interrupts associated with the EQ
              should target. This field should be treated as a hint to the
              provider and may be ignored if the provider does not support
              interrupt affinity.

       wait_set
              If wait_obj is FI_WAIT_SET, this field references a wait object
              to which the event queue should attach. When an event is
              inserted into the event queue, the corresponding wait set will
              be signaled if all necessary conditions are met. The use of a
              wait_set enables an optimized method of waiting for events
              across multiple event queues. This field is ignored if wait_obj
              is not FI_WAIT_SET.

   fi_close
       The fi_close call releases all resources associated with an event
       queue. Any events which remain on the EQ when it is closed are lost.

       The EQ must not be bound to any other objects prior to being closed,
       otherwise the call will return -FI_EBUSY.

   fi_control
       The fi_control call is used to access provider or implementation
       specific details of the event queue. Access to the EQ should be
       serialized across all calls when fi_control is invoked, as it may
       redirect the implementation of EQ operations. The following control
       commands are usable with an EQ.

       FI_GETWAIT (void **)
              This command allows the user to retrieve the low-level wait
              object associated with the EQ. The format of the wait-object is
              specified during EQ creation, through the EQ attributes. The
              fi_control arg parameter should be an address where a pointer
              to the returned wait object will be written. This should be an
              'int *' for FI_WAIT_FD, or 'struct fi_mutex_cond' for
              FI_WAIT_MUTEX_COND.

           struct fi_mutex_cond {
               pthread_mutex_t  *mutex;
               pthread_cond_t   *cond;
           };
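
       As an illustration of using a retrieved FI_WAIT_FD wait object in a
       system call, the following minimal sketch polls a descriptor for
       readiness. The helper name ex_wait_on_fd is hypothetical, and fd is
       assumed to have been filled in by fi_control with FI_GETWAIT. The
       sketch uses only POSIX poll, so it can be tried with any descriptor.

```c
#include <poll.h>

/*
 * Wait until fd is signaled or timeout_ms elapses (negative = no timeout).
 * Returns 1 if signaled, 0 on timeout, -1 if poll() failed or fd is invalid.
 *
 * Assumption: fd is the wait object of an EQ opened with FI_WAIT_FD,
 * obtained through fi_control(&eq->fid, FI_GETWAIT, &fd).
 */
static int ex_wait_on_fd(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;                  /* errno was set by poll() */
    if (n == 0)
        return 0;                   /* timed out, nothing signaled */
    if (pfd.revents & POLLNVAL)
        return -1;                  /* fd is not an open descriptor */

    /* The provider may signal with readability or with an error. */
    return (pfd.revents & (POLLIN | POLLERR | POLLHUP)) ? 1 : 0;
}
```

       A return value of 1 only says that the wait object was signaled. The
       event itself would still have to be retrieved with fi_eq_read.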

   fi_eq_read
       The fi_eq_read operation performs a non-blocking read of event data
       from the EQ. The format of the event data is based on the type of
       event retrieved from the EQ, with all events starting with a struct
       fi_eq_entry header. At most one event will be returned per EQ read
       operation. The number of bytes successfully read from the EQ is
       returned from the read. The FI_PEEK flag may be used to indicate that
       event data should be read from the EQ without being consumed. A
       subsequent read without the FI_PEEK flag would then remove the event
       from the EQ.

       The following types of events may be reported to an EQ, along with
       information regarding the format associated with each event.

       Asynchronous Control Operations
              Asynchronous control operations are basic requests that simply
              need to generate an event to indicate that they have completed.
              These include the following types of events: memory
              registration, address vector resolution, and multicast joins.

       Control requests report their completion by inserting a
       struct fi_eq_entry into the EQ. The format of this structure is:

           struct fi_eq_entry {
               fid_t     fid;      /* fid associated with request */
               void     *context;  /* operation context */
               uint64_t  data;     /* completion-specific data */
           };

       For the completion of basic asynchronous control operations, the
       returned event will indicate the operation that has completed, and the
       fid will reference the fabric descriptor associated with the event.
       For memory registration, this will be an FI_MR_COMPLETE event and the
       fid_mr. Address resolution will reference an FI_AV_COMPLETE event and
       fid_av. Multicast joins will report an FI_JOIN_COMPLETE and fid_mc.
       The context field will be set to the context specified as part of the
       operation, if available, otherwise the context will be associated with
       the fabric descriptor. The data field will be set as described in the
       man page for the corresponding object type (e.g., see fi_av(3) for a
       description of how asynchronous address vector insertions are
       completed).

       Connection Notification
              Connection notifications are connection management
              notifications used to set up or tear down connections between
              endpoints. There are three connection notification events:
              FI_CONNREQ, FI_CONNECTED, and FI_SHUTDOWN. Connection
              notifications are reported using struct fi_eq_cm_entry:

           struct fi_eq_cm_entry {
               fid_t            fid;     /* fid associated with request */
               struct fi_info  *info;    /* endpoint information */
               uint8_t          data[];  /* app connection data */
           };

       A connection request (FI_CONNREQ) event indicates that a remote
       endpoint wishes to establish a new connection to a listening, or
       passive, endpoint. The fid is the passive endpoint. Information
       regarding the requested, active endpoint's capabilities and attributes
       is available from the info field. The application is responsible for
       freeing this structure by calling fi_freeinfo when it is no longer
       needed. The fi_info connreq field will reference the connection
       request associated with this event. To accept a connection, an
       endpoint must first be created by passing an fi_info structure
       referencing this connreq field to fi_endpoint(). This endpoint is then
       passed to fi_accept() to complete the acceptance of the connection
       attempt. Creating the endpoint is most easily accomplished by passing
       the fi_info returned as part of the CM event into fi_endpoint(). If
       the connection is to be rejected, the connreq is passed to
       fi_reject().

       Any application data exchanged as part of the connection request is
       placed beyond the fi_eq_cm_entry structure. The amount of data
       available is application dependent and limited to the buffer space
       provided by the application when fi_eq_read is called. The amount of
       returned data may be calculated using the return value to fi_eq_read.
       Note that the amount of returned data is limited by the underlying
       connection protocol, and the length of any data returned may include
       protocol padding. As a result, the returned length may be larger than
       that specified by the connecting peer.
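
       The length calculation described above can be sketched as follows.
       The struct ex_eq_cm_entry type is a local stand-in that mirrors the
       layout of struct fi_eq_cm_entry so that the example compiles without
       the libfabric headers, and ex_cm_data_len is a hypothetical helper
       name. The ret argument is assumed to be the value returned by
       fi_eq_read or fi_eq_sread for a connection event.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Local stand-in mirroring struct fi_eq_cm_entry; real code would use the
 * definition from the libfabric headers instead. */
struct ex_eq_cm_entry {
    void    *fid;     /* fid_t in libfabric */
    void    *info;    /* struct fi_info * in libfabric */
    uint8_t  data[];  /* application connection data */
};

/*
 * Number of bytes of connection data that follow the fixed part of the
 * entry, derived from the return value of a read call.
 * Returns 0 for error returns and for reads shorter than the fixed part.
 */
static size_t ex_cm_data_len(ssize_t ret)
{
    if (ret < 0 || (size_t)ret < sizeof(struct ex_eq_cm_entry))
        return 0;
    return (size_t)ret - sizeof(struct ex_eq_cm_entry);
}
```

       Since the result may include protocol padding, it is best treated as
       an upper bound on the number of bytes the peer actually supplied.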

       If a connection request has been accepted, an FI_CONNECTED event will
       be generated on both sides of the connection. The active side -- one
       that called fi_connect() -- may receive user data as part of the
       FI_CONNECTED event. The user data is passed to the connection manager
       on the passive side through the fi_accept call. User data is not
       provided with an FI_CONNECTED event on the listening side of the
       connection.

       Notification that a remote peer has disconnected from an active
       endpoint is done through the FI_SHUTDOWN event. Shutdown notification
       uses struct fi_eq_cm_entry as declared above. The fid field for a
       shutdown notification refers to the active endpoint's fid_ep.

       Asynchronous Error Notification
              Asynchronous errors are used to report problems with fabric
              resources. Reported errors may be fatal or transient, based on
              the error, and result in the resource becoming disabled.
              Disabled resources will fail operations submitted against them
              until they are explicitly re-enabled by the application.

       Asynchronous errors may be reported for completion queues and
       endpoints of all types. CQ errors can result when resource management
       has been disabled, and the provider has detected a queue overrun.
       Endpoint errors may be the result of numerous actions, but are often
       associated with a failed operation. Operations may fail because of
       buffer overruns, invalid permissions, incorrect memory access keys,
       network routing failures, network reachability issues, etc.

       Asynchronous errors are reported using struct fi_eq_err_entry, as
       defined below. The fabric descriptor (fid) associated with the error
       is provided as part of the error data. An error code is also available
       to determine the cause of the error.

   fi_eq_sread
       The fi_eq_sread call is the blocking (or synchronous) equivalent to
       fi_eq_read. It behaves similarly to the non-blocking call, with the
       exception that the call will not return until either an event has been
       read from the EQ or an error or timeout occurs. Specifying a negative
       timeout means an infinite timeout.

       Threads blocking in this function will return to the caller if they
       are signaled by some external source. This is true even if the timeout
       has not occurred or was specified as infinite.
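
       Because an early wake-up and an expired timeout are both reported as
       -FI_EAGAIN (see RETURN VALUES), a caller that wants to wait for a full
       time budget has to track elapsed time itself and retry. The helper
       below is one possible way to compute the timeout for the next call.
       The name ex_remaining_timeout is hypothetical, and the elapsed time is
       assumed to be measured by the caller, for example with clock_gettime.

```c
#include <stdint.h>

/*
 * Timeout, in milliseconds, to pass to the next blocking read after an
 * early wake-up.
 *   timeout_ms: the original budget; negative means wait forever.
 *   elapsed_ms: time already spent waiting, as measured by the caller.
 * Returns -1 for an infinite wait, 0 when the budget is used up.
 */
static int ex_remaining_timeout(int timeout_ms, int64_t elapsed_ms)
{
    if (timeout_ms < 0)
        return -1;                      /* infinite stays infinite */
    if (elapsed_ms < 0)
        elapsed_ms = 0;                 /* guard against clock quirks */
    if (elapsed_ms >= timeout_ms)
        return 0;                       /* nothing left to wait for */
    return timeout_ms - (int)elapsed_ms;
}
```

       A retry loop would call the blocking read again with this value until
       an event arrives, a real error is returned, or the result reaches 0.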

       It is invalid for applications to call this function if the EQ has
       been configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.

   fi_eq_readerr
       The read error function, fi_eq_readerr, retrieves information
       regarding any asynchronous operation which has completed with an
       unexpected error. fi_eq_readerr is a non-blocking call, returning
       immediately whether an error completion was found or not.

       EQs are optimized to report operations which have completed
       successfully. Operations which fail are reported 'out of band'. Such
       operations are retrieved using the fi_eq_readerr function. When an
       operation that completes with an unexpected error is inserted into an
       EQ, it is placed into a temporary error queue. Attempting to read from
       an EQ while an item is in the error queue results in an FI_EAVAIL
       failure. Applications may use this return code to determine when to
       call fi_eq_readerr.
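
       The return-code handling described above might be organized as shown
       below. This is only a sketch: ex_next_action and the enum are
       hypothetical names, EX_EAVAIL is a local placeholder for FI_EAVAIL
       from rdma/fi_errno.h (its numeric value here is illustrative and
       should not be relied on), and FI_EAGAIN is assumed to equal the
       system EAGAIN.

```c
#include <errno.h>
#include <sys/types.h>

/* Placeholder for FI_EAVAIL; real code should include <rdma/fi_errno.h>
 * and use the constant defined there. The value is illustrative only. */
#define EX_EAVAIL 259

enum ex_eq_action {
    EX_HANDLE_EVENT,    /* ret bytes of event data are in the buffer */
    EX_RETRY_LATER,     /* queue is empty, try again later */
    EX_READ_ERROR,      /* an error entry is pending, call fi_eq_readerr */
    EX_FAIL             /* any other failure */
};

/* Decide what to do with the return value of a non-blocking EQ read. */
static enum ex_eq_action ex_next_action(ssize_t ret)
{
    if (ret >= 0)
        return EX_HANDLE_EVENT;
    if (ret == -EAGAIN)             /* assumed equal to -FI_EAGAIN */
        return EX_RETRY_LATER;
    if (ret == -EX_EAVAIL)          /* stand-in for -FI_EAVAIL */
        return EX_READ_ERROR;
    return EX_FAIL;
}
```

       Keeping the decision in one place makes it harder to forget the error
       queue, which would otherwise block every later read with the same
       failure until fi_eq_readerr is called.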

       Error information is reported to the user through
       struct fi_eq_err_entry. The format of this structure is defined below.

           struct fi_eq_err_entry {
               fid_t     fid;            /* fid associated with error */
               void     *context;        /* operation context */
               uint64_t  data;           /* completion-specific data */
               int       err;            /* positive error code */
               int       prov_errno;     /* provider error code */
               void     *err_data;       /* additional error data */
               size_t    err_data_size;  /* size of err_data */
           };

       The fid will reference the fabric descriptor associated with the
       event. For memory registration, this will be the fid_mr, address
       resolution will reference a fid_av, and CM events will refer to a
       fid_ep. The context field will be set to the context specified as part
       of the operation.

       The data field will be set as described in the man page for the
       corresponding object type (e.g., see fi_av(3) for a description of how
       asynchronous address vector insertions are completed).

       The general reason for the error is provided through the err field.
       Provider or operational specific error information may also be
       available through the prov_errno and err_data fields. Users may call
       fi_eq_strerror to convert provider specific error information into a
       printable string for debugging purposes.

       On input, err_data_size indicates the size of the err_data buffer in
       bytes. On output, err_data_size will be set to the number of bytes
       copied to the err_data buffer. The err_data information is typically
       used with fi_eq_strerror to provide details about the type of error
       that occurred.

       For compatibility purposes, if err_data_size is 0 on input, or the
       fabric was opened with release < 1.5, err_data will be set to a data
       buffer owned by the provider. The contents of the buffer will remain
       valid until a subsequent read call against the EQ. Applications must
       serialize access to the EQ when processing errors to ensure that the
       buffer referenced by err_data does not change.
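
       The input convention for err_data and err_data_size can be sketched as
       follows. The struct ex_eq_err_entry type is a local stand-in that
       mirrors struct fi_eq_err_entry so that the example compiles without
       the libfabric headers, and ex_err_entry_init is a hypothetical helper
       name. The prepared entry is assumed to be passed to fi_eq_readerr
       afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in mirroring struct fi_eq_err_entry; real code would use
 * the definition from the libfabric headers instead. */
struct ex_eq_err_entry {
    void     *fid;            /* fid_t in libfabric */
    void     *context;
    uint64_t  data;
    int       err;
    int       prov_errno;
    void     *err_data;
    size_t    err_data_size;
};

/*
 * Prepare an entry so that provider error data is copied into memory
 * owned by the caller. On input err_data_size is the buffer capacity;
 * after the read it holds the number of bytes actually copied.
 */
static void ex_err_entry_init(struct ex_eq_err_entry *entry,
                              void *buf, size_t len)
{
    memset(entry, 0, sizeof(*entry));
    entry->err_data = buf;
    entry->err_data_size = len;
}
```

       Passing a length of 0 selects the compatibility behavior described
       above, in which err_data refers to a buffer owned by the provider.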

EVENT FIELDS
       The EQ entry data structures share many of the same fields. The
       meanings are the same or similar for all EQ structure formats, with
       specific details described below.

       fid    This corresponds to the fabric descriptor associated with the
              event. The type of fid depends on the event being reported.
              For FI_CONNREQ this will be the fid of the passive endpoint.
              FI_CONNECTED and FI_SHUTDOWN will reference the active
              endpoint. FI_MR_COMPLETE and FI_AV_COMPLETE will refer to the
              MR or AV fabric descriptor, respectively. FI_JOIN_COMPLETE will
              point to the multicast descriptor returned as part of the join
              operation. Applications can use the fid->context value to
              retrieve the context associated with the fabric descriptor.

       context
              The context value is set to the context parameter specified
              with the operation that generated the event. If no context
              parameter is associated with the operation, this field will be
              NULL.

       data   Data is an operation specific value or set of bytes. For
              connection events, data is application data exchanged as part
              of the connection protocol.

       err    This err code is a positive fabric errno associated with an
              event. The err value indicates the general reason for an
              error, if one occurred. See fi_errno(3) for a list of possible
              error codes.

       prov_errno
              On an error, prov_errno may contain a provider specific error
              code. The use of this field and its meaning is provider
              specific. It is intended to be used as a debugging aid. See
              fi_eq_strerror for additional details on converting this error
              value into a human readable string.

       err_data
              On an error, err_data may reference a provider specific amount
              of data associated with an error. The use of this field and its
              meaning is provider specific. It is intended to be used as a
              debugging aid. See fi_eq_strerror for additional details on
              converting this error data into a human readable string.

       err_data_size
              On input, err_data_size indicates the size of the err_data
              buffer in bytes. On output, err_data_size will be set to the
              number of bytes copied to the err_data buffer. The err_data
              information is typically used with fi_eq_strerror to provide
              details about the type of error that occurred.

       For compatibility purposes, if err_data_size is 0 on input, or the
       fabric was opened with release < 1.5, err_data will be set to a data
       buffer owned by the provider. The contents of the buffer will remain
       valid until a subsequent read call against the EQ. Applications must
       serialize access to the EQ when processing errors to ensure that the
       buffer referenced by err_data does not change.

NOTES
       If an event queue has been overrun, it will be placed into an
       'overrun' state. Write operations against an overrun EQ will fail with
       -FI_EOVERRUN. Read operations will continue to return any valid,
       non-corrupted events, if available. After all valid events have been
       retrieved, any attempt to read the EQ will result in it returning an
       FI_EOVERRUN error event. Overrun event queues are considered fatal and
       may not be used to report additional events once the overrun occurs.

RETURN VALUES
       fi_eq_open
              Returns 0 on success. On error, a negative value corresponding
              to fabric errno is returned.

       fi_eq_read / fi_eq_readerr
              On success, returns the number of bytes read from the event
              queue. On error, a negative value corresponding to fabric errno
              is returned. If no data is available to be read from the event
              queue, -FI_EAGAIN is returned.

       fi_eq_sread
              On success, returns the number of bytes read from the event
              queue. On error, a negative value corresponding to fabric errno
              is returned. If the timeout expires or the calling thread is
              signaled and no data is available to be read from the event
              queue, -FI_EAGAIN is returned.

       fi_eq_write
              On success, returns the number of bytes written to the event
              queue. On error, a negative value corresponding to fabric errno
              is returned.

       fi_eq_strerror
              Returns a character string interpretation of the provider
              specific error returned with a completion.

       Fabric errno values are defined in rdma/fi_errno.h.

SEE ALSO
       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cntr(3), fi_poll(3)

AUTHORS
       OpenFabrics.

Libfabric Programmer's Manual        2019-12-13                      fi_eq(3)