fi_eq(3) Libfabric v1.8.0 fi_eq(3)

NAME
fi_eq - Event queue operations
7
8 fi_eq_open / fi_close
9 Open/close an event queue
10
11 fi_control
12 Control operation of EQ
13
14 fi_eq_read / fi_eq_readerr
15 Read an event from an event queue
16
17 fi_eq_write
18 Writes an event to an event queue
19
20 fi_eq_sread
21 A synchronous (blocking) read of an event queue
22
23 fi_eq_strerror
24 Converts provider specific error information into a printable
25 string
26
SYNOPSIS
#include <rdma/fi_domain.h>

int fi_eq_open(struct fid_fabric *fabric, struct fi_eq_attr *attr,
    struct fid_eq **eq, void *context);

int fi_close(struct fid *eq);

int fi_control(struct fid *eq, int command, void *arg);

ssize_t fi_eq_read(struct fid_eq *eq, uint32_t *event,
    void *buf, size_t len, uint64_t flags);

ssize_t fi_eq_readerr(struct fid_eq *eq, struct fi_eq_err_entry *buf,
    uint64_t flags);

ssize_t fi_eq_write(struct fid_eq *eq, uint32_t event,
    const void *buf, size_t len, uint64_t flags);

ssize_t fi_eq_sread(struct fid_eq *eq, uint32_t *event,
    void *buf, size_t len, int timeout, uint64_t flags);

const char * fi_eq_strerror(struct fid_eq *eq, int prov_errno,
    const void *err_data, char *buf, size_t len);
51
ARGUMENTS
fabric Opened fabric descriptor
54
55 eq Event queue
56
57 attr Event queue attributes
58
59 context
60 User specified context associated with the event queue.
61
62 event Reported event
63
buf For read calls, the data buffer to write events into. For write
calls, an event to insert into the event queue. For fi_eq_strerror,
an optional buffer that receives printable error information.
68
69 len Length of data buffer
70
71 flags Additional flags to apply to the operation
72
73 command
74 Command of control operation to perform on EQ.
75
76 arg Optional control argument
77
78 prov_errno
79 Provider specific error value
80
81 err_data
82 Provider specific error data related to a completion
83
84 timeout
85 Timeout specified in milliseconds
86
DESCRIPTION
Event queues are used to report events associated with control
operations. They are associated with memory registration, address
vectors, connection management, and fabric and domain level events.
Reported events are either associated with a requested operation or
affiliated with a call that registers for specific types of events,
such as listening for connection requests.
94
95 fi_eq_open
96 fi_eq_open allocates a new event queue.
97
98 The properties and behavior of an event queue are defined by
99 struct fi_eq_attr.
100
struct fi_eq_attr {
    size_t             size;             /* # entries for EQ */
    uint64_t           flags;            /* operation flags */
    enum fi_wait_obj   wait_obj;         /* requested wait object */
    int                signaling_vector; /* interrupt affinity */
    struct fid_wait   *wait_set;         /* optional wait set */
};
108
109 size Specifies the minimum size of an event queue.
110
111 flags Flags that control the configuration of the EQ.
112
113 - FI_WRITE
114 Indicates that the application requires support for inserting
115 user events into the EQ. If this flag is set, then the
116 fi_eq_write operation must be supported by the provider. If the
117 FI_WRITE flag is not set, then the application may not invoke
118 fi_eq_write.
119
120 - FI_AFFINITY
121 Indicates that the signaling_vector field (see below) is valid.
122
wait_obj
EQs may be associated with a specific wait object. Wait objects
allow applications to block until the wait object is signaled,
indicating that an event is available to be read. Users may use
fi_control to retrieve the underlying wait object associated with
an EQ, in order to use it in other system calls. The following
values may be used to specify the type of wait object associated
with an EQ:

- FI_WAIT_NONE
Used to indicate that the user will not block (wait) for events
on the EQ. When FI_WAIT_NONE is specified, the application may
not call fi_eq_sread. This is the default if no wait object is
specified.
137
138 - FI_WAIT_UNSPEC
139 Specifies that the user will only wait on the EQ using fabric
140 interface calls, such as fi_eq_sread. In this case, the under‐
141 lying provider may select the most appropriate or highest per‐
142 forming wait object available, including custom wait mechanisms.
143 Applications that select FI_WAIT_UNSPEC are not guaranteed to
144 retrieve the underlying wait object.
145
146 - FI_WAIT_SET
147 Indicates that the event queue should use a wait set object to
148 wait for events. If specified, the wait_set field must refer‐
149 ence an existing wait set object.
150
151 - FI_WAIT_FD
152 Indicates that the EQ should use a file descriptor as its wait
153 mechanism. A file descriptor wait object must be usable in se‐
154 lect, poll, and epoll routines. However, a provider may signal
155 an FD wait object by marking it as readable or with an error.
156
157 - FI_WAIT_MUTEX_COND
158 Specifies that the EQ should use a pthread mutex and cond vari‐
159 able as a wait object.
160
161 - FI_WAIT_CRITSEC_COND
162 Windows specific. Specifies that the EQ should use a critical
163 section and condition variable as a wait object.
164
165 signaling_vector
166 If the FI_AFFINITY flag is set, this indicates the logical cpu
167 number (0..max cpu - 1) that interrupts associated with the EQ
168 should target. This field should be treated as a hint to the
169 provider and may be ignored if the provider does not support in‐
170 terrupt affinity.
171
172 wait_set
173 If wait_obj is FI_WAIT_SET, this field references a wait object
174 to which the event queue should attach. When an event is in‐
175 serted into the event queue, the corresponding wait set will be
176 signaled if all necessary conditions are met. The use of a
177 wait_set enables an optimized method of waiting for events
178 across multiple event queues. This field is ignored if wait_obj
179 is not FI_WAIT_SET.
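
As a minimal sketch of how these fields fit together, the following fills in an attribute block for an EQ that is waited on through a file descriptor, and checks the one hard rule stated above (FI_WAIT_SET requires a wait set). The struct and enum are local stand-ins that mirror the declarations in this page so that the fragment compiles on its own; a real program would include <rdma/fi_eq.h> instead. The enumerator order (FI_WAIT_NONE as zero) is an assumption, and the helper names eq_attr_for_fd and eq_attr_valid are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins mirroring this page; real code gets them from <rdma/fi_eq.h>. */
struct fid_wait;
enum fi_wait_obj {
    FI_WAIT_NONE,        /* assumed to be 0, the default */
    FI_WAIT_UNSPEC,
    FI_WAIT_SET,
    FI_WAIT_FD,
    FI_WAIT_MUTEX_COND
};
struct fi_eq_attr {
    size_t            size;
    uint64_t          flags;
    enum fi_wait_obj  wait_obj;
    int               signaling_vector;
    struct fid_wait  *wait_set;
};

/* Hypothetical helper: attributes for an EQ waited on through an fd. */
static struct fi_eq_attr eq_attr_for_fd(size_t min_entries)
{
    struct fi_eq_attr attr = {0};   /* flags = 0, wait_set = NULL */

    attr.size = min_entries;        /* minimum number of entries */
    attr.wait_obj = FI_WAIT_FD;
    return attr;
}

/* Hypothetical check: FI_WAIT_SET is only valid with a wait set attached. */
static int eq_attr_valid(const struct fi_eq_attr *attr)
{
    if (attr->wait_obj == FI_WAIT_SET && attr->wait_set == NULL)
        return 0;
    return 1;
}
```

The resulting structure would then be passed as the attr argument of fi_eq_open. Leaving flags at zero means FI_WRITE and FI_AFFINITY are not requested, so signaling_vector is not consulted.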
180
181 fi_close
182 The fi_close call releases all resources associated with an event
183 queue. Any events which remain on the EQ when it is closed are lost.
184
185 The EQ must not be bound to any other objects prior to being closed,
186 otherwise the call will return -FI_EBUSY.
187
188 fi_control
189 The fi_control call is used to access provider or implementation spe‐
190 cific details of the event queue. Access to the EQ should be serial‐
191 ized across all calls when fi_control is invoked, as it may redirect
192 the implementation of EQ operations. The following control commands
193 are usable with an EQ.
194
195 FI_GETWAIT (void **)
196 This command allows the user to retrieve the low-level wait ob‐
197 ject associated with the EQ. The format of the wait-object is
198 specified during EQ creation, through the EQ attributes. The
199 fi_control arg parameter should be an address where a pointer to
200 the returned wait object will be written. This should be an
201 'int *' for FI_WAIT_FD, or 'struct fi_mutex_cond' for
202 FI_WAIT_MUTEX_COND.
203
struct fi_mutex_cond {
    pthread_mutex_t *mutex;
    pthread_cond_t  *cond;
};
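
For an EQ opened with FI_WAIT_FD, the descriptor retrieved through FI_GETWAIT can be handed to the usual system calls. The sketch below assumes fd was already filled in by fi_control and only shows the waiting step with poll(2); the helper name wait_fd_signaled is hypothetical. It treats both readability and an error condition as a signal, since the provider may use either.

```c
#include <poll.h>

/*
 * Hypothetical helper: wait until a wait-object fd is signaled.
 * Returns 1 if signaled, 0 on timeout, -1 if poll() itself failed.
 */
static int wait_fd_signaled(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    if (n == 0)
        return 0;
    /* The provider may signal by readability or by an error condition. */
    return (pfd.revents & (POLLIN | POLLERR)) ? 1 : 0;
}
```

After the helper reports a signal, the application would read the EQ with fi_eq_read, which may still find the queue empty.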
208
fi_eq_read
The fi_eq_read operation performs a non-blocking read of event data
from the EQ. The format of the event data is based on the type of
event retrieved from the EQ, with all events starting with a struct
fi_eq_entry header. At most one event will be returned per EQ read
operation. The number of bytes successfully read from the EQ is
returned from the read. The FI_PEEK flag may be used to indicate that
event data should be read from the EQ without being consumed. A
subsequent read without the FI_PEEK flag would then remove the event
from the EQ.
218
219 The following types of events may be reported to an EQ, along with in‐
220 formation regarding the format associated with each event.
221
222 Asynchronous Control Operations
223 Asynchronous control operations are basic requests that simply
224 need to generate an event to indicate that they have completed.
225 These include the following types of events: memory registra‐
226 tion, address vector resolution, and multicast joins.
227
228 Control requests report their completion by inserting a
229 struct fi_eq_entry into the EQ. The format of this structure is:
230
struct fi_eq_entry {
    fid_t     fid;      /* fid associated with request */
    void     *context;  /* operation context */
    uint64_t  data;     /* completion-specific data */
};
236
237 For the completion of basic asynchronous control operations, the re‐
238 turned event will indicate the operation that has completed, and the
239 fid will reference the fabric descriptor associated with the event.
240 For memory registration, this will be an FI_MR_COMPLETE event and the
241 fid_mr. Address resolution will reference an FI_AV_COMPLETE event and
242 fid_av. Multicast joins will report an FI_JOIN_COMPLETE and fid_mc.
243 The context field will be set to the context specified as part of the
244 operation, if available, otherwise the context will be associated with
245 the fabric descriptor. The data field will be set as described in the
246 man page for the corresponding object type (e.g., see fi_av(3) for a
247 description of how asynchronous address vector insertions are complet‐
248 ed).
249
Connection Notification
Connection notifications are connection management notifications
used to set up or tear down connections between endpoints. There
are three connection notification events: FI_CONNREQ, FI_CONNECTED,
and FI_SHUTDOWN. Connection notifications are reported using
struct fi_eq_cm_entry:
256
struct fi_eq_cm_entry {
    fid_t            fid;     /* fid associated with request */
    struct fi_info  *info;    /* endpoint information */
    uint8_t          data[];  /* app connection data */
};
262
A connection request (FI_CONNREQ) event indicates that a remote
endpoint wishes to establish a new connection to a listening, or
passive, endpoint. The fid is the passive endpoint. Information
regarding the capabilities and attributes of the requesting, active
endpoint is available from the info field. The application is
responsible for freeing this structure by calling fi_freeinfo when it
is no longer needed. The fi_info connreq field will reference the
connection request associated with this event. To accept a
connection, an endpoint must first be created by passing an fi_info
structure referencing this connreq field to fi_endpoint(). This
endpoint is then passed to fi_accept() to complete the acceptance of
the connection attempt. Creating the endpoint is most easily
accomplished by passing the fi_info returned as part of the CM event
into fi_endpoint(). If the connection is to be rejected, the connreq
is passed to fi_reject().
277
278 Any application data exchanged as part of the connection request is
279 placed beyond the fi_eq_cm_entry structure. The amount of data avail‐
280 able is application dependent and limited to the buffer space provided
281 by the application when fi_eq_read is called. The amount of returned
282 data may be calculated using the return value to fi_eq_read. Note that
283 the amount of returned data is limited by the underlying connection
284 protocol, and the length of any data returned may include protocol pad‐
285 ding. As a result, the returned length may be larger than that speci‐
286 fied by the connecting peer.
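
The length calculation described above can be sketched as follows. The struct is a local stand-in that mirrors the declaration in this page so that the fragment compiles on its own (real code would include <rdma/fi_eq.h>), and the helper name cm_data_len is hypothetical. Because data[] is a flexible array member, sizeof(struct fi_eq_cm_entry) covers only the header, so the remainder of the return value is application data, possibly including protocol padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>   /* ssize_t */

/* Stand-ins mirroring this page; real code includes <rdma/fi_eq.h>. */
struct fid;
struct fi_info;
typedef struct fid *fid_t;
struct fi_eq_cm_entry {
    fid_t            fid;
    struct fi_info  *info;
    uint8_t          data[];
};

/*
 * Hypothetical helper: bytes of application data that follow the
 * fi_eq_cm_entry header, given the return value of fi_eq_read.
 * Returns 0 for error returns and for reads shorter than the header.
 */
static size_t cm_data_len(ssize_t nread)
{
    if (nread < 0 || (size_t)nread < sizeof(struct fi_eq_cm_entry))
        return 0;
    return (size_t)nread - sizeof(struct fi_eq_cm_entry);
}
```

A caller that expects connection data would size its read buffer as sizeof(struct fi_eq_cm_entry) plus the largest payload it is prepared to accept.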
287
288 If a connection request has been accepted, an FI_CONNECTED event will
289 be generated on both sides of the connection. The active side -- one
290 that called fi_connect() -- may receive user data as part of the
291 FI_CONNECTED event. The user data is passed to the connection manager
292 on the passive side through the fi_accept call. User data is not pro‐
293 vided with an FI_CONNECTED event on the listening side of the connec‐
294 tion.
295
296 Notification that a remote peer has disconnected from an active end‐
297 point is done through the FI_SHUTDOWN event. Shutdown notification us‐
298 es struct fi_eq_cm_entry as declared above. The fid field for a shut‐
299 down notification refers to the active endpoint's fid_ep.
300
301 Asynchronous Error Notification
302 Asynchronous errors are used to report problems with fabric re‐
303 sources. Reported errors may be fatal or transient, based on
304 the error, and result in the resource becoming disabled. Dis‐
305 abled resources will fail operations submitted against them un‐
306 til they are explicitly re-enabled by the application.
307
Asynchronous errors may be reported for completion queues and
endpoints of all types. CQ errors can result when resource management
has been disabled, and the provider has detected a queue overrun.
Endpoint errors may be the result of numerous actions, but are often
associated with a failed operation. Operations may fail because of
buffer overruns, invalid permissions, incorrect memory access keys,
network routing failures, network reachability issues, etc.
315
316 Asynchronous errors are reported using struct fi_eq_err_entry, as de‐
317 fined below. The fabric descriptor (fid) associated with the error is
318 provided as part of the error data. An error code is also available to
319 determine the cause of the error.
320
321 fi_eq_sread
The fi_eq_sread call is the blocking (or synchronous) equivalent of
fi_eq_read. It behaves like the non-blocking call, except that it
does not return until an event has been read from the EQ, an error
occurs, or the timeout expires. Specifying a negative timeout means
an infinite timeout.
327
328 Threads blocking in this function will return to the caller if they are
329 signaled by some external source. This is true even if the timeout has
330 not occurred or was specified as infinite.
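
Because an external signal can cut the wait short, a caller that wants to wait for the full period has to call fi_eq_sread again with whatever time is left. The arithmetic for that is sketched below; the helper name remaining_timeout_ms is hypothetical, and measuring the elapsed time is left to the caller.

```c
/*
 * Hypothetical helper: timeout to pass when re-issuing fi_eq_sread after
 * an early wakeup. A negative (infinite) timeout is passed through
 * unchanged; otherwise the elapsed time is subtracted, never going
 * below zero.
 */
static int remaining_timeout_ms(int timeout_ms, long elapsed_ms)
{
    if (timeout_ms < 0)
        return timeout_ms;
    if (elapsed_ms < 0)
        elapsed_ms = 0;
    if (elapsed_ms >= timeout_ms)
        return 0;
    return timeout_ms - (int)elapsed_ms;
}
```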
331
332 It is invalid for applications to call this function if the EQ has been
333 configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.
334
335 fi_eq_readerr
336 The read error function, fi_eq_readerr, retrieves information regarding
337 any asynchronous operation which has completed with an unexpected er‐
338 ror. fi_eq_readerr is a non-blocking call, returning immediately
339 whether an error completion was found or not.
340
EQs are optimized to report operations which have completed
successfully. Operations which fail are reported 'out of band'. Such
operations are retrieved using the fi_eq_readerr function. When an
operation that completes with an unexpected error is inserted into an
EQ, it is placed into a temporary error queue. Attempting to read
from an EQ while an item is in the error queue results in a
-FI_EAVAIL failure. Applications may use this return code to
determine when to call fi_eq_readerr.
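
One way to act on the value returned by a read is sketched below. The macro definitions are stand-ins so that the fragment compiles on its own; a real program would take FI_EAGAIN and FI_EAVAIL from rdma/fi_errno.h, and the numeric value given for FI_EAVAIL here is an assumption. The enum and the helper name classify_eq_read are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>   /* ssize_t */

/*
 * Stand-ins so the fragment compiles alone; real code includes
 * <rdma/fi_errno.h>. The FI_EAVAIL value is an assumption.
 */
#ifndef FI_EAGAIN
#define FI_EAGAIN EAGAIN
#endif
#ifndef FI_EAVAIL
#define FI_EAVAIL 259
#endif

enum eq_next_step {
    EQ_HANDLE_EVENT,   /* ret > 0: an event of ret bytes is in the buffer */
    EQ_TRY_LATER,      /* nothing was read, e.g. -FI_EAGAIN */
    EQ_CALL_READERR,   /* -FI_EAVAIL: an error entry is waiting */
    EQ_GIVE_UP         /* any other negative fabric errno */
};

/* Hypothetical helper: map a fi_eq_read return value to the next step. */
static enum eq_next_step classify_eq_read(ssize_t ret)
{
    if (ret > 0)
        return EQ_HANDLE_EVENT;
    if (ret == 0 || ret == -FI_EAGAIN)
        return EQ_TRY_LATER;
    if (ret == -FI_EAVAIL)
        return EQ_CALL_READERR;
    return EQ_GIVE_UP;
}
```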
348
349 Error information is reported to the user through struct fi_eq_err_en‐
350 try. The format of this structure is defined below.
351
struct fi_eq_err_entry {
    fid_t     fid;            /* fid associated with error */
    void     *context;        /* operation context */
    uint64_t  data;           /* completion-specific data */
    int       err;            /* positive error code */
    int       prov_errno;     /* provider error code */
    void     *err_data;       /* additional error data */
    size_t    err_data_size;  /* size of err_data */
};
361
362 The fid will reference the fabric descriptor associated with the event.
363 For memory registration, this will be the fid_mr, address resolution
364 will reference a fid_av, and CM events will refer to a fid_ep. The
365 context field will be set to the context specified as part of the oper‐
366 ation.
367
368 The data field will be set as described in the man page for the corre‐
369 sponding object type (e.g., see fi_av(3) for a description of how asyn‐
370 chronous address vector insertions are completed).
371
372 The general reason for the error is provided through the err field.
373 Provider or operational specific error information may also be avail‐
374 able through the prov_errno and err_data fields. Users may call
375 fi_eq_strerror to convert provider specific error information into a
376 printable string for debugging purposes.
377
378 On input, err_data_size indicates the size of the err_data buffer in
379 bytes. On output, err_data_size will be set to the number of bytes
380 copied to the err_data buffer. The err_data information is typically
381 used with fi_eq_strerror to provide details about the type of error
382 that occurred.
383
384 For compatibility purposes, if err_data_size is 0 on input, or the fab‐
385 ric was opened with release < 1.5, err_data will be set to a data buf‐
386 fer owned by the provider. The contents of the buffer will remain
387 valid until a subsequent read call against the EQ. Applications must
388 serialize access to the EQ when processing errors to ensure that the
389 buffer referenced by err_data does not change.
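
To have error data copied into storage owned by the application, rather than referencing a provider-owned buffer, the entry can be prepared before fi_eq_readerr is called. This is a minimal sketch: the struct is a local stand-in that mirrors the declaration above so that the fragment compiles on its own (real code would include <rdma/fi_eq.h>), and the helper name err_entry_prepare is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins mirroring this page; real code includes <rdma/fi_eq.h>. */
struct fid;
typedef struct fid *fid_t;
struct fi_eq_err_entry {
    fid_t     fid;
    void     *context;
    uint64_t  data;
    int       err;
    int       prov_errno;
    void     *err_data;
    size_t    err_data_size;
};

/*
 * Hypothetical helper: clear the entry and point err_data at caller-owned
 * storage, so that err_data_size is non-zero on input to fi_eq_readerr.
 */
static void err_entry_prepare(struct fi_eq_err_entry *entry,
                              void *storage, size_t storage_len)
{
    memset(entry, 0, sizeof(*entry));
    entry->err_data = storage;
    entry->err_data_size = storage_len;
}
```

After fi_eq_readerr returns, err_data_size holds the number of bytes actually copied, and prov_errno and err_data can be passed to fi_eq_strerror.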
390
EVENT FIELDS
The EQ entry data structures share many of the same fields. The
meanings are the same or similar for all EQ structure formats, with
specific details described below.
395
fid This corresponds to the fabric descriptor associated with the
event. The type of fid depends on the event being reported.
For FI_CONNREQ this will be the fid of the passive endpoint.
FI_CONNECTED and FI_SHUTDOWN will reference the active endpoint.
FI_MR_COMPLETE and FI_AV_COMPLETE will refer to the MR or AV
fabric descriptor, respectively. FI_JOIN_COMPLETE will point to
the multicast descriptor returned as part of the join operation.
Applications can use the fid->context value to retrieve the
context associated with the fabric descriptor.
405
406 context
407 The context value is set to the context parameter specified with
408 the operation that generated the event. If no context parameter
409 is associated with the operation, this field will be NULL.
410
411 data Data is an operation specific value or set of bytes. For con‐
412 nection events, data is application data exchanged as part of
413 the connection protocol.
414
err The err code is a positive fabric errno associated with an
event. The err value indicates the general reason for an error,
if one occurred. See fi_errno(3) for a list of possible error
codes.
419
420 prov_errno
421 On an error, prov_errno may contain a provider specific error
422 code. The use of this field and its meaning is provider specif‐
423 ic. It is intended to be used as a debugging aid. See
424 fi_eq_strerror for additional details on converting this error
425 value into a human readable string.
426
427 err_data
428 On an error, err_data may reference a provider specific amount
429 of data associated with an error. The use of this field and its
430 meaning is provider specific. It is intended to be used as a
431 debugging aid. See fi_eq_strerror for additional details on
432 converting this error data into a human readable string.
433
434 err_data_size
435 On input, err_data_size indicates the size of the err_data buf‐
436 fer in bytes. On output, err_data_size will be set to the num‐
437 ber of bytes copied to the err_data buffer. The err_data infor‐
438 mation is typically used with fi_eq_strerror to provide details
439 about the type of error that occurred.
440
For compatibility purposes, if err_data_size is 0 on input, or the
fabric was opened with release < 1.5, err_data will be set to a data
buffer owned by the provider. The contents of the buffer will remain
valid until a subsequent read call against the EQ. Applications must
serialize access to the EQ when processing errors to ensure that the
buffer referenced by err_data does not change.
447
NOTES
If an event queue has been overrun, it will be placed into an
'overrun' state. Write operations against an overrun EQ will fail
with -FI_EOVERRUN. Read operations will continue to return any valid,
non-corrupted events, if available. After all valid events have been
retrieved, any attempt to read the EQ will result in it returning an
FI_EOVERRUN error event. An overrun is considered fatal: the event
queue may not be used to report additional events once the overrun
occurs.
456
RETURN VALUES
fi_eq_open
Returns 0 on success. On error, a negative value corresponding
to fabric errno is returned.
461
462 fi_eq_read / fi_eq_readerr
463 On success, returns the number of bytes read from the event
464 queue. On error, a negative value corresponding to fabric errno
465 is returned. If no data is available to be read from the event
466 queue, -FI_EAGAIN is returned.
467
468 fi_eq_sread
469 On success, returns the number of bytes read from the event
470 queue. On error, a negative value corresponding to fabric errno
471 is returned. If the timeout expires or the calling thread is
472 signaled and no data is available to be read from the event
473 queue, -FI_EAGAIN is returned.
474
475 fi_eq_write
476 On success, returns the number of bytes written to the event
477 queue. On error, a negative value corresponding to fabric errno
478 is returned.
479
480 fi_eq_strerror
481 Returns a character string interpretation of the provider spe‐
482 cific error returned with a completion.
483
484 Fabric errno values are defined in rdma/fi_errno.h.
485
SEE ALSO
fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cntr(3), fi_poll(3)

AUTHORS
OpenFabrics.

Libfabric Programmer's Manual 2019-02-19 fi_eq(3)