fi_eq(3)                       Libfabric v1.7.0                       fi_eq(3)

NAME
       fi_eq - Event queue operations

       fi_eq_open / fi_close
              Open/close an event queue

       fi_control
              Control operation of EQ

       fi_eq_read / fi_eq_readerr
              Read an event from an event queue

       fi_eq_write
              Writes an event to an event queue

       fi_eq_sread
              A synchronous (blocking) read of an event queue

       fi_eq_strerror
              Converts provider specific error information into a printable
              string

SYNOPSIS
       #include <rdma/fi_domain.h>

       int fi_eq_open(struct fid_fabric *fabric, struct fi_eq_attr *attr,
           struct fid_eq **eq, void *context);

       int fi_close(struct fid *eq);

       int fi_control(struct fid *eq, int command, void *arg);

       ssize_t fi_eq_read(struct fid_eq *eq, uint32_t *event,
           void *buf, size_t len, uint64_t flags);

       ssize_t fi_eq_readerr(struct fid_eq *eq, struct fi_eq_err_entry *buf,
           uint64_t flags);

       ssize_t fi_eq_write(struct fid_eq *eq, uint32_t event,
           const void *buf, size_t len, uint64_t flags);

       ssize_t fi_eq_sread(struct fid_eq *eq, uint32_t *event,
           void *buf, size_t len, int timeout, uint64_t flags);

       const char * fi_eq_strerror(struct fid_eq *eq, int prov_errno,
             const void *err_data, char *buf, size_t len);

ARGUMENTS
       fabric Opened fabric descriptor

       eq     Event queue

       attr   Event queue attributes

       context
              User specified context associated with the event queue.

       event  Reported event

       buf    For read calls, the data buffer to write events into. For write
              calls, an event to insert into the event queue. For
              fi_eq_strerror, an optional buffer that receives printable error
              information.

       len    Length of data buffer

       flags  Additional flags to apply to the operation

       command
              Command of control operation to perform on EQ.

       arg    Optional control argument

       prov_errno
              Provider specific error value

       err_data
              Provider specific error data related to a completion

       timeout
              Timeout specified in milliseconds

DESCRIPTION
       Event queues are used to report events associated with control
       operations. They are associated with memory registration, address
       vectors, connection management, and fabric and domain level events.
       Reported events are either associated with a requested operation or
       affiliated with a call that registers for specific types of events,
       such as listening for connection requests.

   fi_eq_open
       fi_eq_open allocates a new event queue.

       The properties and behavior of an event queue are defined by struct
       fi_eq_attr.

           struct fi_eq_attr {
               size_t               size;      /* # entries for EQ */
               uint64_t             flags;     /* operation flags */
               enum fi_wait_obj     wait_obj;  /* requested wait object */
               int                  signaling_vector; /* interrupt affinity */
               struct fid_wait     *wait_set;  /* optional wait set */
           };

       size   Specifies the minimum size of an event queue.

       flags  Flags that control the configuration of the EQ.

       - FI_WRITE
              Indicates that the application requires support for inserting
              user events into the EQ. If this flag is set, then the
              fi_eq_write operation must be supported by the provider. If the
              FI_WRITE flag is not set, then the application may not invoke
              fi_eq_write.

       - FI_AFFINITY
              Indicates that the signaling_vector field (see below) is valid.

       wait_obj
              EQs may be associated with a specific wait object. Wait objects
              allow applications to block until the wait object is signaled,
              indicating that an event is available to be read. Users may use
              fi_control to retrieve the underlying wait object associated
              with an EQ, in order to use it in other system calls. The
              following values may be used to specify the type of wait object
              associated with an EQ:

       - FI_WAIT_NONE
              Used to indicate that the user will not block (wait) for events
              on the EQ. When FI_WAIT_NONE is specified, the application may
              not call fi_eq_sread. This is the default if no wait object is
              specified.

       - FI_WAIT_UNSPEC
              Specifies that the user will only wait on the EQ using fabric
              interface calls, such as fi_eq_sread. In this case, the
              underlying provider may select the most appropriate or highest
              performing wait object available, including custom wait
              mechanisms. Applications that select FI_WAIT_UNSPEC are not
              guaranteed to retrieve the underlying wait object.

       - FI_WAIT_SET
              Indicates that the event queue should use a wait set object to
              wait for events. If specified, the wait_set field must reference
              an existing wait set object.

       - FI_WAIT_FD
              Indicates that the EQ should use a file descriptor as its wait
              mechanism. A file descriptor wait object must be usable in
              select, poll, and epoll routines. Note that a provider may
              signal an FD wait object either by marking it as readable or by
              placing it in an error state.

       - FI_WAIT_MUTEX_COND
              Specifies that the EQ should use a pthread mutex and cond
              variable as a wait object.

       - FI_WAIT_CRITSEC_COND
              Windows specific. Specifies that the EQ should use a critical
              section and condition variable as a wait object.
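
       Because an FD wait object must work with select, poll, and epoll, an
       application can multiplex it with its own descriptors. The following
       is a minimal sketch using poll(). It assumes the descriptor has
       already been obtained (for example through fi_control with
       FI_GETWAIT), and because it is meant to compile without libfabric, a
       pipe stands in for the provider's descriptor in the checks. The helper
       name wait_fd_signaled is hypothetical. It treats both readability and
       an error condition as a signal, as permitted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait until an FD wait object is signaled or timeout_ms expires.
 * Returns 1 if the descriptor is readable or in an error state, 0 on
 * timeout, and -1 if poll() failed or the descriptor is invalid.
 * A negative timeout_ms blocks indefinitely (poll() semantics). */
static int wait_fd_signaled(int fd, int timeout_ms)
{
    /* poll() silently ignores negative descriptors, so reject them. */
    assert(fd >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;                      /* poll() failed; errno is set */
    if (n == 0)
        return 0;                       /* timed out, nothing signaled */
    if (pfd.revents & POLLNVAL)
        return -1;                      /* fd is not an open descriptor */
    /* POLLERR and POLLHUP are always reported even though only POLLIN
     * was requested, so an error-style signal is detected as well. */
    return (pfd.revents & (POLLIN | POLLERR | POLLHUP)) ? 1 : 0;
}
```

       Once wait_fd_signaled() reports a signal, the application would read
       the EQ with fi_eq_read as usual.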

       signaling_vector
              If the FI_AFFINITY flag is set, this indicates the logical CPU
              number (0..max cpu - 1) that interrupts associated with the EQ
              should target. This field should be treated as a hint to the
              provider and may be ignored if the provider does not support
              interrupt affinity.

       wait_set
              If wait_obj is FI_WAIT_SET, this field references a wait set to
              which the event queue should attach. When an event is inserted
              into the event queue, the corresponding wait set will be
              signaled if all necessary conditions are met. The use of a
              wait_set enables an optimized method of waiting for events
              across multiple event queues. This field is ignored if wait_obj
              is not FI_WAIT_SET.
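
       The attribute rules above reduce to a few checks. This sketch mirrors
       struct fi_eq_attr with a hypothetical struct demo_eq_attr and DEMO_*
       constants so that it compiles without libfabric; the real FI_* values
       differ, and the functions only restate the rules described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for FI_WRITE / FI_AFFINITY and enum fi_wait_obj.
 * The real definitions live in the libfabric headers and differ. */
#define DEMO_WRITE    (1ULL << 0)
#define DEMO_AFFINITY (1ULL << 1)

enum demo_wait_obj {
    DEMO_WAIT_NONE,   /* zero, so a zero-initialized attr requests no wait */
    DEMO_WAIT_UNSPEC,
    DEMO_WAIT_SET,
    DEMO_WAIT_FD,
    DEMO_WAIT_MUTEX_COND
};

/* Field-for-field mirror of struct fi_eq_attr; wait_set is opaque here. */
struct demo_eq_attr {
    size_t size;
    uint64_t flags;
    enum demo_wait_obj wait_obj;
    int signaling_vector;
    void *wait_set;
};

/* FI_WAIT_SET requires an existing wait set; otherwise wait_set is ignored. */
static bool demo_attr_consistent(const struct demo_eq_attr *attr)
{
    assert(attr != NULL);
    return attr->wait_obj != DEMO_WAIT_SET || attr->wait_set != NULL;
}

/* fi_eq_write may only be invoked if FI_WRITE was requested. */
static bool demo_write_allowed(const struct demo_eq_attr *attr)
{
    return (attr->flags & DEMO_WRITE) != 0;
}

/* signaling_vector is valid only with FI_AFFINITY, and even then it is a
 * hint. Returns the requested CPU, or -1 when no hint was given. */
static int demo_cpu_hint(const struct demo_eq_attr *attr)
{
    return (attr->flags & DEMO_AFFINITY) ? attr->signaling_vector : -1;
}
```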

   fi_close
       The fi_close call releases all resources associated with an event
       queue. Any events which remain on the EQ when it is closed are lost.

       The EQ must not be bound to any other objects prior to being closed,
       otherwise the call will return -FI_EBUSY.

   fi_control
       The fi_control call is used to access provider or implementation
       specific details of the event queue. Access to the EQ should be
       serialized across all calls when fi_control is invoked, as it may
       redirect the implementation of EQ operations. The following control
       commands are usable with an EQ.

       FI_GETWAIT (void **)
              This command allows the user to retrieve the low-level wait
              object associated with the EQ. The format of the wait object is
              specified during EQ creation, through the EQ attributes. The
              fi_control arg parameter should be an address where the returned
              wait object will be written: an 'int *' for FI_WAIT_FD, or a
              'struct fi_mutex_cond *' for FI_WAIT_MUTEX_COND.

                  struct fi_mutex_cond {
                      pthread_mutex_t     *mutex;
                      pthread_cond_t      *cond;
                  };

   fi_eq_read
       The fi_eq_read operation performs a non-blocking read of event data
       from the EQ. The format of the event data is based on the type of
       event retrieved from the EQ, with all events starting with a struct
       fi_eq_entry header. At most one event will be returned per EQ read
       operation. The number of bytes successfully read from the EQ is
       returned from the read. The FI_PEEK flag may be used to indicate that
       event data should be read from the EQ without being consumed. A
       subsequent read without the FI_PEEK flag would then remove the event
       from the EQ.

       The following types of events may be reported to an EQ, along with
       information regarding the format associated with each event.

       Asynchronous Control Operations
              Asynchronous control operations are basic requests that simply
              need to generate an event to indicate that they have completed.
              These include the following types of events: memory
              registration, address vector resolution, and multicast joins.

              Control requests report their completion by inserting a struct
              fi_eq_entry into the EQ. The format of this structure is:

                  struct fi_eq_entry {
                      fid_t            fid;        /* fid associated with request */
                      void            *context;    /* operation context */
                      uint64_t         data;       /* completion-specific data */
                  };

       For the completion of basic asynchronous control operations, the
       returned event will indicate the operation that has completed, and the
       fid will reference the fabric descriptor associated with the event.
       Memory registration reports an FI_MR_COMPLETE event with the fid_mr,
       address resolution reports an FI_AV_COMPLETE event with the fid_av,
       and multicast joins report an FI_JOIN_COMPLETE event with the fid_mc.
       The context field will be set to the context specified as part of the
       operation, if available; otherwise, the context will be associated
       with the fabric descriptor. The data field will be set as described in
       the man page for the corresponding object type (e.g., see fi_av(3) for
       a description of how asynchronous address vector insertions are
       completed).

       Connection Notification
              Connection notifications are connection management
              notifications used to set up or tear down connections between
              endpoints. There are three connection notification events:
              FI_CONNREQ, FI_CONNECTED, and FI_SHUTDOWN. Connection
              notifications are reported using struct fi_eq_cm_entry:

                  struct fi_eq_cm_entry {
                      fid_t            fid;        /* fid associated with request */
                      struct fi_info  *info;       /* endpoint information */
                      uint8_t          data[];     /* app connection data */
                  };

       A connection request (FI_CONNREQ) event indicates that a remote
       endpoint wishes to establish a new connection to a listening, or
       passive, endpoint. The fid is the passive endpoint. Information
       regarding the requested, active endpoint's capabilities and attributes
       is available from the info field. The application is responsible for
       freeing this structure by calling fi_freeinfo when it is no longer
       needed. The fi_info connreq field will reference the connection
       request associated with this event. To accept a connection, an
       endpoint must first be created by passing an fi_info structure
       referencing this connreq field to fi_endpoint(). This endpoint is then
       passed to fi_accept() to complete the acceptance of the connection
       attempt. Creating the endpoint is most easily accomplished by passing
       the fi_info returned as part of the CM event into fi_endpoint(). If
       the connection is to be rejected, the connreq is passed to
       fi_reject().

       Any application data exchanged as part of the connection request is
       placed beyond the fi_eq_cm_entry structure. The amount of data
       available is application dependent and limited to the buffer space
       provided by the application when fi_eq_read is called. The amount of
       returned data may be calculated using the return value of fi_eq_read.
       Note that the amount of returned data is limited by the underlying
       connection protocol, and the length of any data returned may include
       protocol padding. As a result, the returned length may be larger than
       that specified by the connecting peer.
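
       The length calculation described above amounts to subtracting the
       header size from the byte count returned by fi_eq_read. In this
       sketch, struct demo_cm_entry is a hypothetical layout mirror of struct
       fi_eq_cm_entry (fid_t and struct fi_info * are both pointers, so
       opaque pointers keep the header the same size) and demo_cm_data_len
       is a hypothetical helper; with the real headers one would use
       sizeof(struct fi_eq_cm_entry) directly.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* Layout mirror of struct fi_eq_cm_entry using opaque pointers. */
struct demo_cm_entry {
    void *fid;
    void *info;
    uint8_t data[];   /* connection data, if any, follows the header */
};

/* Given the byte count returned by a successful CM event read, return the
 * number of connection-data bytes after the header, or -1 if the count
 * cannot even cover the header. The result may include protocol padding,
 * so it can exceed the amount the peer actually sent. */
static ssize_t demo_cm_data_len(ssize_t bytes_read)
{
    const ssize_t hdr = (ssize_t) sizeof(struct demo_cm_entry);

    if (bytes_read < hdr)
        return -1;
    return bytes_read - hdr;
}
```

       Correspondingly, a read buffer sized as the header plus the largest
       connection data the application expects leaves room for the data
       that follows the header.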

       If a connection request has been accepted, an FI_CONNECTED event will
       be generated on both sides of the connection. The active side (the one
       that called fi_connect()) may receive user data as part of the
       FI_CONNECTED event. The user data is passed to the connection manager
       on the passive side through the fi_accept call. User data is not
       provided with an FI_CONNECTED event on the listening side of the
       connection.

       Notification that a remote peer has disconnected from an active
       endpoint is done through the FI_SHUTDOWN event. Shutdown notification
       uses struct fi_eq_cm_entry as declared above. The fid field for a
       shutdown notification refers to the active endpoint's fid_ep.

       Asynchronous Error Notification
              Asynchronous errors are used to report problems with fabric
              resources. Reported errors may be fatal or transient, depending
              on the error, and result in the resource becoming disabled.
              Disabled resources will fail operations submitted against them
              until they are explicitly re-enabled by the application.

       Asynchronous errors may be reported for completion queues and
       endpoints of all types. CQ errors can result when resource management
       has been disabled and the provider has detected a queue overrun.
       Endpoint errors may be the result of numerous actions, but are often
       associated with a failed operation. Operations may fail because of
       buffer overruns, invalid permissions, incorrect memory access keys,
       network routing failures, network reachability issues, etc.

       Asynchronous errors are reported using struct fi_eq_err_entry, as
       defined below. The fabric descriptor (fid) associated with the error
       is provided as part of the error data. An error code is also available
       to determine the cause of the error.
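
       Since the layout of buf depends on the event code, a reader typically
       switches on that code before casting. The sketch below encodes the
       mapping described above; the demo_* enums are hypothetical stand-ins
       for the FI_* event codes, whose actual values are defined by
       libfabric.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the FI_* event codes named above. */
enum demo_event {
    DEMO_MR_COMPLETE,
    DEMO_AV_COMPLETE,
    DEMO_JOIN_COMPLETE,
    DEMO_CONNREQ,
    DEMO_CONNECTED,
    DEMO_SHUTDOWN
};

enum demo_layout {
    DEMO_LAYOUT_ENTRY,      /* interpret buf as struct fi_eq_entry */
    DEMO_LAYOUT_CM_ENTRY,   /* interpret buf as struct fi_eq_cm_entry */
    DEMO_LAYOUT_UNKNOWN     /* an event code this sketch does not model */
};

/* Choose the structure to cast buf to, from the event code returned by a
 * successful read. Error entries never take this path: a pending error
 * makes the read fail with FI_EAVAIL, and the details are fetched into a
 * struct fi_eq_err_entry with fi_eq_readerr. */
static enum demo_layout demo_layout_for(uint32_t event)
{
    switch (event) {
    case DEMO_MR_COMPLETE:
    case DEMO_AV_COMPLETE:
    case DEMO_JOIN_COMPLETE:
        return DEMO_LAYOUT_ENTRY;
    case DEMO_CONNREQ:
    case DEMO_CONNECTED:
    case DEMO_SHUTDOWN:
        return DEMO_LAYOUT_CM_ENTRY;
    default:
        return DEMO_LAYOUT_UNKNOWN;
    }
}
```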

   fi_eq_sread
       The fi_eq_sread call is the blocking (or synchronous) equivalent to
       fi_eq_read. It behaves similarly to the non-blocking call, with the
       exception that the call will not return until either an event has been
       read from the EQ or an error or timeout occurs. Specifying a negative
       timeout means an infinite timeout.

       It is invalid for applications to call this function if the EQ has
       been configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.
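
       An application waiting for one particular event may call fi_eq_sread
       repeatedly while keeping an overall deadline. The following is a
       minimal sketch of that timeout bookkeeping, assuming the caller
       measures elapsed time itself; demo_remaining_timeout is a hypothetical
       helper that preserves the negative (infinite) timeout convention
       described above.

```c
#include <assert.h>

/* Timeout to pass to the next blocking read, given the caller's overall
 * timeout and the milliseconds already spent. A negative timeout means
 * "wait forever" and is passed through unchanged. */
static int demo_remaining_timeout(int timeout_ms, long elapsed_ms)
{
    assert(elapsed_ms >= 0);
    if (timeout_ms < 0)
        return -1;                   /* infinite stays infinite */
    if (elapsed_ms >= timeout_ms)
        return 0;                    /* budget exhausted */
    return (int) (timeout_ms - elapsed_ms);
}
```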

   fi_eq_readerr
       The read error function, fi_eq_readerr, retrieves information
       regarding any asynchronous operation which has completed with an
       unexpected error. fi_eq_readerr is a non-blocking call, returning
       immediately whether an error completion was found or not.

       EQs are optimized to report operations which have completed
       successfully. Operations which fail are reported 'out of band'. Such
       operations are retrieved using the fi_eq_readerr function. When an
       operation that completes with an unexpected error is inserted into an
       EQ, it is placed into a temporary error queue. Attempting to read from
       an EQ while an item is in the error queue results in an FI_EAVAIL
       failure. Applications may use this return code to determine when to
       call fi_eq_readerr.

       Error information is reported to the user through struct
       fi_eq_err_entry. The format of this structure is defined below.

           struct fi_eq_err_entry {
               fid_t            fid;        /* fid associated with error */
               void            *context;    /* operation context */
               uint64_t         data;       /* completion-specific data */
               int              err;        /* positive error code */
               int              prov_errno; /* provider error code */
               void            *err_data;   /* additional error data */
               size_t           err_data_size; /* size of err_data */
           };

       The fid will reference the fabric descriptor associated with the
       event. For memory registration, this will be the fid_mr, address
       resolution will reference a fid_av, and CM events will refer to a
       fid_ep. The context field will be set to the context specified as
       part of the operation.

       The data field will be set as described in the man page for the
       corresponding object type (e.g., see fi_av(3) for a description of how
       asynchronous address vector insertions are completed).

       The general reason for the error is provided through the err field.
       Provider- or operation-specific error information may also be
       available through the prov_errno and err_data fields. Users may call
       fi_eq_strerror to convert provider specific error information into a
       printable string for debugging purposes.

       On input, err_data_size indicates the size of the err_data buffer in
       bytes. On output, err_data_size will be set to the number of bytes
       copied to the err_data buffer. The err_data information is typically
       used with fi_eq_strerror to provide details about the type of error
       that occurred.

       For compatibility purposes, if err_data_size is 0 on input, or the
       fabric was opened with release < 1.5, err_data will be set to a data
       buffer owned by the provider. The contents of the buffer will remain
       valid until a subsequent read call against the EQ. Applications must
       serialize access to the EQ when processing errors to ensure that the
       buffer referenced by err_data does not change.
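
       The err_data_size rules can be illustrated with a hypothetical model
       of the provider side. struct demo_err_entry mirrors only the err_data
       and err_data_size fields of struct fi_eq_err_entry, and
       demo_fill_err_data is not a libfabric function. What the sketch
       reports as err_data_size in the provider-owned case is an assumption;
       the text above does not specify it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors only the err_data fields of struct fi_eq_err_entry. */
struct demo_err_entry {
    void *err_data;
    size_t err_data_size;
};

/* Hypothetical provider-side model of the rules above:
 *  - err_data_size > 0 on input: copy at most that many bytes into the
 *    caller's err_data buffer and report how many were copied;
 *  - err_data_size == 0 on input: point err_data at the provider-owned
 *    buffer (valid only until the next read on the EQ).
 * Reporting prov_len in the second case is an assumption of this sketch. */
static void demo_fill_err_data(struct demo_err_entry *e,
                               const void *prov_buf, size_t prov_len)
{
    assert(e != NULL);
    if (e->err_data_size == 0) {
        e->err_data = (void *) prov_buf;
        e->err_data_size = prov_len;
        return;
    }
    size_t n = prov_len < e->err_data_size ? prov_len : e->err_data_size;
    memcpy(e->err_data, prov_buf, n);
    e->err_data_size = n;
}
```

       On the application side, the input rule means initializing the entry
       before calling fi_eq_readerr: point err_data at a local buffer and set
       err_data_size to that buffer's size. Leaving err_data_size at 0
       selects the provider-owned buffer, with the lifetime limits described
       above.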

EVENT FIELDS
       The EQ entry data structures share many of the same fields. The
       meanings are the same or similar for all EQ structure formats, with
       specific details described below.

       fid    This corresponds to the fabric descriptor associated with the
              event. The type of fid depends on the event being reported. For
              FI_CONNREQ this will be the fid of the passive endpoint.
              FI_CONNECTED and FI_SHUTDOWN will reference the active endpoint.
              FI_MR_COMPLETE and FI_AV_COMPLETE will refer to the MR or AV
              fabric descriptor, respectively. FI_JOIN_COMPLETE will point to
              the multicast descriptor returned as part of the join operation.
              Applications can use the fid->context value to retrieve the
              context associated with the fabric descriptor.

       context
              The context value is set to the context parameter specified with
              the operation that generated the event. If no context parameter
              is associated with the operation, this field will be NULL.

       data   Data is an operation specific value or set of bytes. For
              connection events, data is application data exchanged as part
              of the connection protocol.

       err    The err code is a positive fabric errno associated with an
              event. The err value indicates the general reason for an error,
              if one occurred. See fi_errno(3) for a list of possible error
              codes.

       prov_errno
              On an error, prov_errno may contain a provider specific error
              code. The use of this field and its meaning is provider
              specific. It is intended to be used as a debugging aid. See
              fi_eq_strerror for additional details on converting this error
              value into a human readable string.

       err_data
              On an error, err_data may reference a provider specific amount
              of data associated with an error. The use of this field and its
              meaning is provider specific. It is intended to be used as a
              debugging aid. See fi_eq_strerror for additional details on
              converting this error data into a human readable string.

       err_data_size
              On input, err_data_size indicates the size of the err_data
              buffer in bytes. On output, err_data_size will be set to the
              number of bytes copied to the err_data buffer. The err_data
              information is typically used with fi_eq_strerror to provide
              details about the type of error that occurred.

       For compatibility purposes, if err_data_size is 0 on input, or the
       fabric was opened with release < 1.5, err_data will be set to a data
       buffer owned by the provider. The contents of the buffer will remain
       valid until a subsequent read call against the EQ. Applications must
       serialize access to the EQ when processing errors to ensure that the
       buffer referenced by err_data does not change.

NOTES
       If an event queue has been overrun, it will be placed into an
       'overrun' state. Write operations against an overrun EQ will fail with
       -FI_EOVERRUN. Read operations will continue to return any valid,
       non-corrupted events, if available. After all valid events have been
       retrieved, any attempt to read the EQ will result in it returning an
       FI_EOVERRUN error event. Overrun event queues are considered fatal and
       may not be used to report additional events once the overrun occurs.

RETURN VALUES
       fi_eq_open
              Returns 0 on success. On error, a negative value corresponding
              to fabric errno is returned.

       fi_eq_read / fi_eq_readerr / fi_eq_sread
              On success, returns the number of bytes read from the event
              queue. On error, a negative value corresponding to fabric errno
              is returned. If no data is available to be read from the event
              queue, -FI_EAGAIN is returned.

       fi_eq_write
              On success, returns the number of bytes written to the event
              queue. On error, a negative value corresponding to fabric errno
              is returned.

       fi_eq_strerror
              Returns a character string interpretation of the provider
              specific error returned with a completion.

       Fabric errno values are defined in rdma/fi_errno.h.

SEE ALSO
       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cntr(3), fi_poll(3)

AUTHORS
       OpenFabrics.

Libfabric Programmer's Manual          2018-10-05                   fi_eq(3)