fi_eq(3)                      Libfabric v1.17.0                      fi_eq(3)

NAME
fi_eq - Event queue operations

fi_eq_open / fi_close
    Open/close an event queue

fi_control
    Control operation of EQ

fi_eq_read / fi_eq_readerr
    Read an event from an event queue

fi_eq_write
    Writes an event to an event queue

fi_eq_sread
    A synchronous (blocking) read of an event queue

fi_eq_strerror
    Converts provider specific error information into a printable
    string

SYNOPSIS
    #include <rdma/fi_domain.h>

    int fi_eq_open(struct fid_fabric *fabric, struct fi_eq_attr *attr,
        struct fid_eq **eq, void *context);

    int fi_close(struct fid *eq);

    int fi_control(struct fid *eq, int command, void *arg);

    ssize_t fi_eq_read(struct fid_eq *eq, uint32_t *event,
        void *buf, size_t len, uint64_t flags);

    ssize_t fi_eq_readerr(struct fid_eq *eq, struct fi_eq_err_entry *buf,
        uint64_t flags);

    ssize_t fi_eq_write(struct fid_eq *eq, uint32_t event,
        const void *buf, size_t len, uint64_t flags);

    ssize_t fi_eq_sread(struct fid_eq *eq, uint32_t *event,
        void *buf, size_t len, int timeout, uint64_t flags);

    const char * fi_eq_strerror(struct fid_eq *eq, int prov_errno,
        const void *err_data, char *buf, size_t len);

ARGUMENTS
fabric  Opened fabric descriptor

eq      Event queue

attr    Event queue attributes

context
        User specified context associated with the event queue.

event   Reported event

buf     For read calls, the data buffer to write events into. For write
        calls, an event to insert into the event queue. For
        fi_eq_strerror, an optional buffer that receives printable error
        information.

len     Length of data buffer

flags   Additional flags to apply to the operation

command
        Command of control operation to perform on EQ.

arg     Optional control argument

prov_errno
        Provider specific error value

err_data
        Provider specific error data related to a completion

timeout
        Timeout specified in milliseconds

DESCRIPTION
Event queues are used to report events associated with control
operations. They are associated with memory registration, address
vectors, connection management, and fabric and domain level events.
Reported events are either associated with a requested operation or
affiliated with a call that registers for specific types of events,
such as listening for connection requests.

fi_eq_open
fi_eq_open allocates a new event queue.

The properties and behavior of an event queue are defined by struct
fi_eq_attr.

    struct fi_eq_attr {
        size_t               size;             /* # entries for EQ */
        uint64_t             flags;            /* operation flags */
        enum fi_wait_obj     wait_obj;         /* requested wait object */
        int                  signaling_vector; /* interrupt affinity */
        struct fid_wait     *wait_set;         /* optional wait set */
    };

size
    Specifies the minimum size of an event queue.

flags
    Flags that control the configuration of the EQ.

- FI_WRITE
    Indicates that the application requires support for inserting
    user events into the EQ. If this flag is set, then the
    fi_eq_write operation must be supported by the provider. If the
    FI_WRITE flag is not set, then the application may not invoke
    fi_eq_write.

- FI_AFFINITY
    Indicates that the signaling_vector field (see below) is valid.

wait_obj
    EQs may be associated with a specific wait object. Wait objects
    allow applications to block until the wait object is signaled,
    indicating that an event is available to be read. Users may use
    fi_control to retrieve the underlying wait object associated
    with an EQ, in order to use it in other system calls. The
    following values may be used to specify the type of wait object
    associated with an EQ:

- FI_WAIT_NONE
    Used to indicate that the user will not block (wait) for events
    on the EQ. When FI_WAIT_NONE is specified, the application may
    not call fi_eq_sread. This is the default if no wait object is
    specified.

- FI_WAIT_UNSPEC
    Specifies that the user will only wait on the EQ using fabric
    interface calls, such as fi_eq_sread. In this case, the
    underlying provider may select the most appropriate or highest
    performing wait object available, including custom wait
    mechanisms. Applications that select FI_WAIT_UNSPEC are not
    guaranteed to retrieve the underlying wait object.

- FI_WAIT_SET
    Indicates that the event queue should use a wait set object to
    wait for events. If specified, the wait_set field must reference
    an existing wait set object.

- FI_WAIT_FD
    Indicates that the EQ should use a file descriptor as its wait
    mechanism. A file descriptor wait object must be usable in
    select, poll, and epoll routines. However, a provider may signal
    an FD wait object by marking it as readable or with an error.

- FI_WAIT_MUTEX_COND
    Specifies that the EQ should use a pthread mutex and cond
    variable as a wait object.

- FI_WAIT_YIELD
    Indicates that the EQ will wait without a wait object but
    instead yield on every wait. Allows usage of fi_eq_sread through
    a spin.

signaling_vector
    If the FI_AFFINITY flag is set, this indicates the logical CPU
    number (0..max CPU - 1) that interrupts associated with the EQ
    should target. This field should be treated as a hint to the
    provider and may be ignored if the provider does not support
    interrupt affinity.

wait_set
    If wait_obj is FI_WAIT_SET, this field references a wait object
    to which the event queue should attach. When an event is
    inserted into the event queue, the corresponding wait set will be
    signaled if all necessary conditions are met. The use of a
    wait_set enables an optimized method of waiting for events
    across multiple event queues. This field is ignored if wait_obj
    is not FI_WAIT_SET.

fi_close
The fi_close call releases all resources associated with an event
queue. Any events which remain on the EQ when it is closed are lost.

The EQ must not be bound to any other objects prior to being closed,
otherwise the call will return -FI_EBUSY.

fi_control
The fi_control call is used to access provider or implementation
specific details of the event queue. Access to the EQ should be
serialized across all calls when fi_control is invoked, as it may
redirect the implementation of EQ operations. The following control
commands are usable with an EQ.

FI_GETWAIT (void **)
    This command allows the user to retrieve the low-level wait
    object associated with the EQ. The format of the wait object is
    specified during EQ creation, through the EQ attributes. The
    fi_control arg parameter should be an address where a pointer to
    the returned wait object will be written. This should be an
    'int *' for FI_WAIT_FD, or 'struct fi_mutex_cond' for
    FI_WAIT_MUTEX_COND.

    struct fi_mutex_cond {
        pthread_mutex_t     *mutex;
        pthread_cond_t      *cond;
    };

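As a rough illustration of using an FI_WAIT_FD wait object in a system
call, the sketch below waits on a descriptor with poll(). It treats both
"readable" and "error" as a signal, since a provider may use either. The
helper names wait_fd_signaled and demo_signaled_after_write are
hypothetical, and a pipe stands in for the EQ's descriptor so that the
example builds without libfabric.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Block until a file-descriptor wait object is signaled.
 * Returns 1 if signaled, 0 on timeout, -1 if poll() failed or the
 * descriptor is invalid. A negative timeout_ms blocks indefinitely.
 */
static int wait_fd_signaled(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n;

    assert(fd >= 0);
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0 || (pfd.revents & POLLNVAL))
        return -1;
    if (n == 0)
        return 0;
    /* Readable or error both count as "signaled": go read the EQ. */
    return (pfd.revents & (POLLIN | POLLERR | POLLHUP)) ? 1 : 0;
}

/*
 * Demonstration with a pipe standing in for the EQ's descriptor
 * (an assumption of this sketch; no libfabric calls are made).
 * Returns the wait result observed after one byte has been written,
 * or -1 if the pipe was unexpectedly signaled before the write.
 */
static int demo_signaled_after_write(void)
{
    int p[2], before, after;

    if (pipe(p) != 0)
        return -1;
    before = wait_fd_signaled(p[0], 0);   /* nothing written yet */
    if (write(p[1], "x", 1) != 1)
        before = -1;
    after = wait_fd_signaled(p[0], 0);    /* now readable */
    close(p[0]);
    close(p[1]);
    return (before == 0) ? after : -1;
}
```

In an application, the descriptor would instead come from
fi_control(&eq->fid, FI_GETWAIT, &fd) on an EQ opened with wait_obj set
to FI_WAIT_FD, and a result of 1 would be followed by a call to
fi_eq_read.
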
fi_eq_read
The fi_eq_read operation performs a non-blocking read of event data
from the EQ. The format of the event data is based on the type of
event retrieved from the EQ, with all events starting with a struct
fi_eq_entry header. At most one event will be returned per EQ read
operation. The number of bytes successfully read from the EQ is
returned from the read. The FI_PEEK flag may be used to indicate that
event data should be read from the EQ without being consumed. A
subsequent read without the FI_PEEK flag would then remove the event
from the EQ.

The following types of events may be reported to an EQ, along with
information regarding the format associated with each event.

Asynchronous Control Operations
    Asynchronous control operations are basic requests that simply
    need to generate an event to indicate that they have completed.
    These include the following types of events: memory
    registration, address vector resolution, and multicast joins.

Control requests report their completion by inserting a struct
fi_eq_entry into the EQ. The format of this structure is:

    struct fi_eq_entry {
        fid_t            fid;        /* fid associated with request */
        void            *context;    /* operation context */
        uint64_t         data;       /* completion-specific data */
    };

For the completion of basic asynchronous control operations, the
returned event will indicate the operation that has completed, and the
fid will reference the fabric descriptor associated with the event.
For memory registration, this will be an FI_MR_COMPLETE event and the
fid_mr. Address resolution will reference an FI_AV_COMPLETE event and
fid_av. Multicast joins will report an FI_JOIN_COMPLETE and fid_mc.
The context field will be set to the context specified as part of the
operation, if available, otherwise the context will be associated with
the fabric descriptor. The data field will be set as described in the
man page for the corresponding object type (e.g., see fi_av(3) for a
description of how asynchronous address vector insertions are
completed).

Connection Notification
    Connection notifications are connection management notifications
    used to set up or tear down connections between endpoints. There
    are three connection notification events: FI_CONNREQ,
    FI_CONNECTED, and FI_SHUTDOWN. Connection notifications are
    reported using struct fi_eq_cm_entry:

    struct fi_eq_cm_entry {
        fid_t            fid;        /* fid associated with request */
        struct fi_info  *info;       /* endpoint information */
        uint8_t          data[];     /* app connection data */
    };

A connection request (FI_CONNREQ) event indicates that a remote
endpoint wishes to establish a new connection to a listening, or
passive, endpoint. The fid is the passive endpoint. Information
regarding the requested, active endpoint's capabilities and attributes
is available from the info field. The application is responsible for
freeing this structure by calling fi_freeinfo when it is no longer
needed. The fi_info connreq field will reference the connection
request associated with this event. To accept a connection, an
endpoint must first be created by passing an fi_info structure
referencing this connreq field to fi_endpoint(). This endpoint is then
passed to fi_accept() to complete the acceptance of the connection
attempt. Creating the endpoint is most easily accomplished by passing
the fi_info returned as part of the CM event into fi_endpoint(). If
the connection is to be rejected, the connreq is passed to
fi_reject().

Any application data exchanged as part of the connection request is
placed beyond the fi_eq_cm_entry structure. The amount of data
available is application dependent and limited to the buffer space
provided by the application when fi_eq_read is called. The amount of
returned data may be calculated using the return value to fi_eq_read.
Note that the amount of returned data is limited by the underlying
connection protocol, and the length of any data returned may include
protocol padding. As a result, the returned length may be larger than
that specified by the connecting peer.

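The length calculation described above can be sketched as follows. The
struct demo_cm_entry type is a local mirror of struct fi_eq_cm_entry,
included only so the example builds without libfabric, and the helper
names cm_buf_size and cm_data_len are hypothetical. Because the result
may include protocol padding, it is an upper bound on what the peer
actually sent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Local mirror of struct fi_eq_cm_entry as printed above, so the sketch
 * builds without libfabric. Real code should include the libfabric
 * headers and use the library's own definition.
 */
struct demo_cm_entry {
    void    *fid;     /* fid_t in libfabric */
    void    *info;    /* struct fi_info * in libfabric */
    uint8_t  data[];  /* app connection data */
};

/* Buffer size needed to receive a CM event with up to max_data bytes. */
static size_t cm_buf_size(size_t max_data)
{
    assert(max_data <= SIZE_MAX - sizeof(struct demo_cm_entry));
    return sizeof(struct demo_cm_entry) + max_data;
}

/*
 * Bytes of connection data implied by the return value of a read.
 * Returns 0 for error returns and for reads that carried no data.
 */
static size_t cm_data_len(ssize_t ret)
{
    if (ret < 0 || (size_t)ret < sizeof(struct demo_cm_entry))
        return 0;
    return (size_t)ret - sizeof(struct demo_cm_entry);
}
```

A caller would typically allocate cm_buf_size(n) bytes, pass that buffer
and length to fi_eq_read or fi_eq_sread, and then apply cm_data_len to
the value returned.
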
If a connection request has been accepted, an FI_CONNECTED event will
be generated on both sides of the connection. The active side (the one
that called fi_connect()) may receive user data as part of the
FI_CONNECTED event. The user data is passed to the connection manager
on the passive side through the fi_accept call. User data is not
provided with an FI_CONNECTED event on the listening side of the
connection.

Notification that a remote peer has disconnected from an active
endpoint is done through the FI_SHUTDOWN event. Shutdown notification
uses struct fi_eq_cm_entry as declared above. The fid field for a
shutdown notification refers to the active endpoint's fid_ep.

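The pairing between event type and the kind of fabric descriptor found
in the fid field, as described on this page, can be summarized in a
small lookup table. This is only a sketch: the demo_event and fid_kind
enumerations are hypothetical names local to the example, and real code
would switch on FI_CONNREQ, FI_MR_COMPLETE, and so on from the libfabric
headers.

```c
#include <assert.h>

/* Illustrative event names; not the libfabric constants. */
enum demo_event {
    DEMO_CONNREQ,
    DEMO_CONNECTED,
    DEMO_SHUTDOWN,
    DEMO_MR_COMPLETE,
    DEMO_AV_COMPLETE,
    DEMO_JOIN_COMPLETE,
    DEMO_EVENT_COUNT
};

/* Kind of fabric descriptor the fid field refers to. */
enum fid_kind {
    KIND_PASSIVE_EP,  /* listening endpoint */
    KIND_ACTIVE_EP,   /* fid_ep */
    KIND_MR,          /* fid_mr */
    KIND_AV,          /* fid_av */
    KIND_MC           /* fid_mc */
};

/* Map an event to the descriptor kind its fid is documented to carry. */
static enum fid_kind fid_kind_for_event(enum demo_event ev)
{
    static const enum fid_kind kinds[DEMO_EVENT_COUNT] = {
        [DEMO_CONNREQ]       = KIND_PASSIVE_EP,
        [DEMO_CONNECTED]     = KIND_ACTIVE_EP,
        [DEMO_SHUTDOWN]      = KIND_ACTIVE_EP,
        [DEMO_MR_COMPLETE]   = KIND_MR,
        [DEMO_AV_COMPLETE]   = KIND_AV,
        [DEMO_JOIN_COMPLETE] = KIND_MC,
    };

    assert((unsigned)ev < DEMO_EVENT_COUNT);
    return kinds[ev];
}
```

A dispatcher can use such a mapping to decide which container type to
recover from the fid before handling the event.
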
Asynchronous Error Notification
    Asynchronous errors are used to report problems with fabric
    resources. Reported errors may be fatal or transient, based on
    the error, and result in the resource becoming disabled.
    Disabled resources will fail operations submitted against them
    until they are explicitly re-enabled by the application.

Asynchronous errors may be reported for completion queues and
endpoints of all types. CQ errors can result when resource management
has been disabled, and the provider has detected a queue overrun.
Endpoint errors may be the result of numerous actions, but are often
associated with a failed operation. Operations may fail because of
buffer overruns, invalid permissions, incorrect memory access keys,
network routing failures, network reachability issues, etc.

Asynchronous errors are reported using struct fi_eq_err_entry, as
defined below. The fabric descriptor (fid) associated with the error
is provided as part of the error data. An error code is also available
to determine the cause of the error.

fi_eq_sread
The fi_eq_sread call is the blocking (or synchronous) equivalent to
fi_eq_read. It behaves similarly to the non-blocking call, except that
the call does not return until either an event has been read from the
EQ or an error or timeout occurs. Specifying a negative timeout means
an infinite timeout.

Threads blocking in this function will return to the caller if they
are signaled by some external source. This is true even if the timeout
has not occurred or was specified as infinite.

It is invalid for applications to call this function if the EQ has
been configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.

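Because a blocked fi_eq_sread can return early when the thread is
signaled, a caller that wants to honor an overall deadline may need to
recompute the timeout before retrying. The helper below is a minimal
sketch of that arithmetic; remaining_timeout_ms is a hypothetical name,
and measuring the elapsed time is left to the caller.

```c
#include <assert.h>

/*
 * Compute the timeout to pass to the next blocking read.
 * timeout_ms is the caller's original timeout (negative = infinite);
 * elapsed_ms is the time already spent waiting.
 * Returns -1 to keep waiting forever, otherwise the milliseconds left
 * (0 once the deadline has passed).
 */
static int remaining_timeout_ms(int timeout_ms, long elapsed_ms)
{
    assert(elapsed_ms >= 0);
    if (timeout_ms < 0)
        return -1;
    if (elapsed_ms >= timeout_ms)
        return 0;
    return timeout_ms - (int)elapsed_ms;
}
```

A retry loop would call fi_eq_sread, and on an early return with no
event available, pass the value computed here as the next timeout until
it reaches 0.
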
fi_eq_readerr
The read error function, fi_eq_readerr, retrieves information
regarding any asynchronous operation which has completed with an
unexpected error. fi_eq_readerr is a non-blocking call, returning
immediately whether an error completion was found or not.

EQs are optimized to report operations which have completed
successfully. Operations which fail are reported 'out of band'. Such
operations are retrieved using the fi_eq_readerr function. When an
operation that completes with an unexpected error is inserted into an
EQ, it is placed into a temporary error queue. Attempting to read from
an EQ while an item is in the error queue results in an FI_EAVAIL
failure. Applications may use this return code to determine when to
call fi_eq_readerr.

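One way to act on the return value of a read, following the behavior
described above, is sketched below. The DEMO_EAGAIN and DEMO_EAVAIL
macros are stand-ins so the example builds without libfabric, and the
numeric value given for DEMO_EAVAIL is an assumption for illustration.
Real code should use FI_EAGAIN and FI_EAVAIL from rdma/fi_errno.h. The
eq_action names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Stand-ins for fabric errno codes; the EAVAIL value is assumed. */
#define DEMO_EAGAIN  EAGAIN
#define DEMO_EAVAIL  259

enum eq_action {
    EQ_HANDLE_EVENT,   /* an event was read into the buffer */
    EQ_RETRY_LATER,    /* queue is empty right now */
    EQ_CALL_READERR,   /* an error entry is waiting; call fi_eq_readerr */
    EQ_FATAL           /* any other failure */
};

/* Decide what to do next from the value a read call returned. */
static enum eq_action eq_next_action(ssize_t ret)
{
    if (ret > 0)
        return EQ_HANDLE_EVENT;
    if (ret == 0 || ret == -DEMO_EAGAIN)
        return EQ_RETRY_LATER;   /* ret == 0 is handled defensively */
    if (ret == -DEMO_EAVAIL)
        return EQ_CALL_READERR;
    return EQ_FATAL;
}

/* Printable name for an action, for logging. */
static const char *eq_action_name(enum eq_action a)
{
    static const char *const names[] = {
        "handle-event", "retry-later", "call-readerr", "fatal"
    };

    assert((unsigned)a < sizeof(names) / sizeof(names[0]));
    return names[a];
}
```

A polling loop would pass the result of fi_eq_read to eq_next_action and
branch on the action, calling fi_eq_readerr only when EQ_CALL_READERR is
returned.
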
Error information is reported to the user through struct
fi_eq_err_entry. The format of this structure is defined below.

    struct fi_eq_err_entry {
        fid_t            fid;        /* fid associated with error */
        void            *context;    /* operation context */
        uint64_t         data;       /* completion-specific data */
        int              err;        /* positive error code */
        int              prov_errno; /* provider error code */
        void            *err_data;   /* additional error data */
        size_t           err_data_size; /* size of err_data */
    };

The fid will reference the fabric descriptor associated with the
event. For memory registration, this will be the fid_mr, address
resolution will reference a fid_av, and CM events will refer to a
fid_ep. The context field will be set to the context specified as part
of the operation.

The data field will be set as described in the man page for the
corresponding object type (e.g., see fi_av(3) for a description of how
asynchronous address vector insertions are completed).

The general reason for the error is provided through the err field.
Provider or operational specific error information may also be
available through the prov_errno and err_data fields. Users may call
fi_eq_strerror to convert provider specific error information into a
printable string for debugging purposes.

On input, err_data_size indicates the size of the err_data buffer in
bytes. On output, err_data_size will be set to the number of bytes
copied to the err_data buffer. The err_data information is typically
used with fi_eq_strerror to provide details about the type of error
that occurred.

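The input side of the err_data_size convention can be sketched as
follows: before reading an error, the application points err_data at its
own buffer and records the buffer's size. The struct demo_eq_err_entry
type is a local mirror of struct fi_eq_err_entry so that the example
builds without libfabric, and prepare_err_entry is a hypothetical
helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct fi_eq_err_entry as printed above. */
struct demo_eq_err_entry {
    void     *fid;            /* fid_t in libfabric */
    void     *context;
    uint64_t  data;
    int       err;
    int       prov_errno;
    void     *err_data;
    size_t    err_data_size;
};

/*
 * Clear the entry and attach an application-owned buffer for provider
 * error data. After the read, err_data_size holds the bytes copied.
 */
static void prepare_err_entry(struct demo_eq_err_entry *e,
                              void *buf, size_t len)
{
    assert(e != NULL);
    assert(buf != NULL || len == 0);
    memset(e, 0, sizeof(*e));
    e->err_data = buf;
    e->err_data_size = len;
}
```

With the real structure, the prepared entry would be passed to
fi_eq_readerr, and the resulting prov_errno and err_data could then be
handed to fi_eq_strerror.
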
For compatibility purposes, if err_data_size is 0 on input, or the
fabric was opened with release < 1.5, err_data will be set to a data
buffer owned by the provider. The contents of the buffer will remain
valid until a subsequent read call against the EQ. Applications must
serialize access to the EQ when processing errors to ensure that the
buffer referenced by err_data does not change.

EVENT FIELDS
The EQ entry data structures share many of the same fields. The
meanings are the same or similar for all EQ structure formats, with
specific details described below.

fid
    This corresponds to the fabric descriptor associated with the
    event. The type of fid depends on the event being reported.
    For FI_CONNREQ this will be the fid of the passive endpoint.
    FI_CONNECTED and FI_SHUTDOWN will reference the active endpoint.
    FI_MR_COMPLETE and FI_AV_COMPLETE will refer to the MR or AV
    fabric descriptor, respectively. FI_JOIN_COMPLETE will point to
    the multicast descriptor returned as part of the join operation.
    Applications can use the fid->context value to retrieve the
    context associated with the fabric descriptor.

context
    The context value is set to the context parameter specified with
    the operation that generated the event. If no context parameter
    is associated with the operation, this field will be NULL.

data
    Data is an operation specific value or set of bytes. For
    connection events, data is application data exchanged as part of
    the connection protocol.

err
    This err code is a positive fabric errno associated with an
    event. The err value indicates the general reason for an error,
    if one occurred. See fi_errno.3 for a list of possible error
    codes.

prov_errno
    On an error, prov_errno may contain a provider specific error
    code. The use of this field and its meaning is provider
    specific. It is intended to be used as a debugging aid. See
    fi_eq_strerror for additional details on converting this error
    value into a human readable string.

err_data
    On an error, err_data may reference a provider specific amount
    of data associated with an error. The use of this field and its
    meaning is provider specific. It is intended to be used as a
    debugging aid. See fi_eq_strerror for additional details on
    converting this error data into a human readable string.

err_data_size
    On input, err_data_size indicates the size of the err_data
    buffer in bytes. On output, err_data_size will be set to the
    number of bytes copied to the err_data buffer. The err_data
    information is typically used with fi_eq_strerror to provide
    details about the type of error that occurred.

For compatibility purposes, if err_data_size is 0 on input, or the
fabric was opened with release < 1.5, err_data will be set to a data
buffer owned by the provider. The contents of the buffer will remain
valid until a subsequent read call against the EQ. Applications must
serialize access to the EQ when processing errors to ensure that the
buffer referenced by err_data does not change.

NOTES
If an event queue has been overrun, it will be placed into an
'overrun' state. Write operations against an overrun EQ will fail with
-FI_EOVERRUN. Read operations will continue to return any valid,
non-corrupted events, if available. After all valid events have been
retrieved, any attempt to read the EQ will result in it returning an
FI_EOVERRUN error event. Overrun event queues are considered fatal and
may not be used to report additional events once the overrun occurs.

RETURN VALUES
fi_eq_open
    Returns 0 on success. On error, a negative value corresponding
    to fabric errno is returned.

fi_eq_read / fi_eq_readerr
    On success, returns the number of bytes read from the event
    queue. On error, a negative value corresponding to fabric errno
    is returned. If no data is available to be read from the event
    queue, -FI_EAGAIN is returned.

fi_eq_sread
    On success, returns the number of bytes read from the event
    queue. On error, a negative value corresponding to fabric errno
    is returned. If the timeout expires or the calling thread is
    signaled and no data is available to be read from the event
    queue, -FI_EAGAIN is returned.

fi_eq_write
    On success, returns the number of bytes written to the event
    queue. On error, a negative value corresponding to fabric errno
    is returned.

fi_eq_strerror
    Returns a character string interpretation of the provider
    specific error returned with a completion.

Fabric errno values are defined in rdma/fi_errno.h.

SEE ALSO
fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cntr(3), fi_poll(3)

AUTHORS
OpenFabrics.

Libfabric Programmer’s Manual        2022-12-11        fi_eq(3)