fi_eq(3)                     Libfabric v1.6.1                     fi_eq(3)

fi_eq - Event queue operations

fi_eq_open / fi_close : Open/close an event queue

fi_control : Control operation of EQ

fi_eq_read / fi_eq_readerr : Read an event from an event queue

fi_eq_write : Writes an event to an event queue

fi_eq_sread : A synchronous (blocking) read of an event queue

fi_eq_strerror : Converts provider specific error information into a
printable string

    #include <rdma/fi_domain.h>

    int fi_eq_open(struct fid_fabric *fabric, struct fi_eq_attr *attr,
        struct fid_eq **eq, void *context);

    int fi_close(struct fid *eq);

    int fi_control(struct fid *eq, int command, void *arg);

    ssize_t fi_eq_read(struct fid_eq *eq, uint32_t *event,
        void *buf, size_t len, uint64_t flags);

    ssize_t fi_eq_readerr(struct fid_eq *eq, struct fi_eq_err_entry *buf,
        uint64_t flags);

    ssize_t fi_eq_write(struct fid_eq *eq, uint32_t event,
        const void *buf, size_t len, uint64_t flags);

    ssize_t fi_eq_sread(struct fid_eq *eq, uint32_t *event,
        void *buf, size_t len, int timeout, uint64_t flags);

    const char * fi_eq_strerror(struct fid_eq *eq, int prov_errno,
        const void *err_data, char *buf, size_t len);

fabric : Opened fabric descriptor

eq : Event queue

attr : Event queue attributes

context : User specified context associated with the event queue.

event : Reported event

buf : For read calls, the data buffer to write events into. For write
calls, an event to insert into the event queue. For fi_eq_strerror, an
optional buffer that receives printable error information.

len : Length of data buffer

flags : Additional flags to apply to the operation

command : Command of control operation to perform on EQ.

arg : Optional control argument

prov_errno : Provider specific error value

err_data : Provider specific error data related to a completion

timeout : Timeout specified in milliseconds

Event queues are used to report events associated with control
operations. They are associated with memory registration, address
vectors, connection management, and fabric and domain level events.
Reported events are either associated with a requested operation or
affiliated with a call that registers for specific types of events,
such as listening for connection requests.

fi_eq_open
fi_eq_open allocates a new event queue.

The properties and behavior of an event queue are defined by
struct fi_eq_attr.

    struct fi_eq_attr {
        size_t               size;             /* # entries for EQ */
        uint64_t             flags;            /* operation flags */
        enum fi_wait_obj     wait_obj;         /* requested wait object */
        int                  signaling_vector; /* interrupt affinity */
        struct fid_wait     *wait_set;         /* optional wait set */
    };

size : Specifies the minimum size of an event queue.

flags : Flags that control the configuration of the EQ.

· FI_WRITE : Indicates that the application requires support for
  inserting user events into the EQ. If this flag is set, then the
  fi_eq_write operation must be supported by the provider. If the
  FI_WRITE flag is not set, then the application may not invoke
  fi_eq_write.

· FI_AFFINITY : Indicates that the signaling_vector field (see below)
  is valid.

wait_obj : EQs may be associated with a specific wait object. Wait
objects allow applications to block until the wait object is signaled,
indicating that an event is available to be read. Users may use
fi_control to retrieve the underlying wait object associated with an
EQ, in order to use it in other system calls. The following values may
be used to specify the type of wait object associated with an EQ:

· FI_WAIT_NONE : Used to indicate that the user will not block (wait)
  for events on the EQ. When FI_WAIT_NONE is specified, the
  application may not call fi_eq_sread. This is the default if no wait
  object is specified.

· FI_WAIT_UNSPEC : Specifies that the user will only wait on the EQ
  using fabric interface calls, such as fi_eq_sread. In this case, the
  underlying provider may select the most appropriate or highest
  performing wait object available, including custom wait mechanisms.
  Applications that select FI_WAIT_UNSPEC are not guaranteed to
  retrieve the underlying wait object.

· FI_WAIT_SET : Indicates that the event queue should use a wait set
  object to wait for events. If specified, the wait_set field must
  reference an existing wait set object.

· FI_WAIT_FD : Indicates that the EQ should use a file descriptor as
  its wait mechanism. A file descriptor wait object must be usable in
  select, poll, and epoll routines. However, a provider may signal an
  FD wait object by marking it as readable or with an error.

· FI_WAIT_MUTEX_COND : Specifies that the EQ should use a pthread mutex
  and condition variable as a wait object.

· FI_WAIT_CRITSEC_COND : Windows specific. Specifies that the EQ
  should use a critical section and condition variable as a wait
  object.

signaling_vector : If the FI_AFFINITY flag is set, this indicates the
logical CPU number (0..max CPU - 1) that interrupts associated with the
EQ should target. This field should be treated as a hint to the
provider and may be ignored if the provider does not support interrupt
affinity.

wait_set : If wait_obj is FI_WAIT_SET, this field references the wait
set to which the event queue should attach. When an event is inserted
into the event queue, the corresponding wait set will be signaled if
all necessary conditions are met. The use of a wait_set enables an
optimized method of waiting for events across multiple event queues.
This field is ignored if wait_obj is not FI_WAIT_SET.

fi_close
The fi_close call releases all resources associated with an event
queue. Any events which remain on the EQ when it is closed are lost.

The EQ must not be bound to any other objects prior to being closed,
otherwise the call will return -FI_EBUSY.

fi_control
The fi_control call is used to access provider or implementation
specific details of the event queue. Access to the EQ should be
serialized across all calls when fi_control is invoked, as it may
redirect the implementation of EQ operations. The following control
commands are usable with an EQ.

FI_GETWAIT (void **) : This command allows the user to retrieve the
low-level wait object associated with the EQ. The format of the
wait-object is specified during EQ creation, through the EQ attributes.
The fi_control arg parameter should be an address where a pointer to
the returned wait object will be written. This should be an 'int *'
for FI_WAIT_FD, or 'struct fi_mutex_cond' for FI_WAIT_MUTEX_COND.

    struct fi_mutex_cond {
        pthread_mutex_t     *mutex;
        pthread_cond_t      *cond;
    };

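As an illustration of how an FI_WAIT_FD wait object might be consumed,
the sketch below polls a file descriptor until it is signaled. It
assumes the descriptor was obtained with
fi_control(&eq->fid, FI_GETWAIT, &fd) on an EQ opened with FI_WAIT_FD.
The helper names wait_fd_signaled and demo_with_pipe are hypothetical,
and a pipe stands in for the provider's descriptor so that the code
builds without libfabric.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms milliseconds (negative = wait forever) for fd
 * to be signaled. Returns 1 if signaled, 0 on timeout, -1 on failure.
 * A provider may signal the fd by marking it readable or with an
 * error, so both conditions count as "signaled".
 */
int wait_fd_signaled(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    assert(fd >= 0);
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0 || (pfd.revents & POLLNVAL))
        return -1;
    if (n == 0)
        return 0;
    return (pfd.revents & (POLLIN | POLLERR | POLLHUP)) ? 1 : 0;
}

/* Stand-in demo: a pipe plays the role of the EQ's wait descriptor. */
int demo_with_pipe(void)
{
    int p[2];
    int idle, signaled;

    if (pipe(p) != 0)
        return -1;
    idle = wait_fd_signaled(p[0], 0);       /* nothing written yet */
    if (write(p[1], "x", 1) != 1)
        idle = -1;
    signaled = wait_fd_signaled(p[0], 0);   /* now readable */
    close(p[0]);
    close(p[1]);
    return (idle == 0 && signaled == 1) ? 1 : 0;
}
```

In an application, a return value of 1 would typically be followed by a
call to fi_eq_read to retrieve the pending event.
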
fi_eq_read
The fi_eq_read operation performs a non-blocking read of event data
from the EQ. The format of the event data is based on the type of
event retrieved from the EQ, with all events starting with a struct
fi_eq_entry header. At most one event will be returned per EQ read
operation. The number of bytes successfully read from the EQ is
returned from the read. The FI_PEEK flag may be used to indicate that
event data should be read from the EQ without being consumed. A
subsequent read without the FI_PEEK flag would then remove the event
from the EQ.

The following types of events may be reported to an EQ, along with
information regarding the format associated with each event.

Asynchronous Control Operations : Asynchronous control operations are
basic requests that simply need to generate an event to indicate that
they have completed. These include the following types of events:
memory registration, address vector resolution, and multicast joins.

Control requests report their completion by inserting a
struct fi_eq_entry into the EQ. The format of this structure is:

    struct fi_eq_entry {
        fid_t            fid;        /* fid associated with request */
        void            *context;    /* operation context */
        uint64_t         data;       /* completion-specific data */
    };

For the completion of basic asynchronous control operations, the
returned event will indicate the operation that has completed, and the
fid will reference the fabric descriptor associated with the event.
For memory registration, this will be an FI_MR_COMPLETE event and the
fid_mr. Address resolution will reference an FI_AV_COMPLETE event and
fid_av. Multicast joins will report an FI_JOIN_COMPLETE and fid_mc.
The context field will be set to the context specified as part of the
operation, if available, otherwise the context will be associated with
the fabric descriptor. The data field will be set as described in the
man page for the corresponding object type (e.g., see fi_av(3) for a
description of how asynchronous address vector insertions are
completed).

Connection Notification : Connection notifications are connection
management notifications used to set up or tear down connections
between endpoints. There are three connection notification events:
FI_CONNREQ, FI_CONNECTED, and FI_SHUTDOWN. Connection notifications
are reported using struct fi_eq_cm_entry:

    struct fi_eq_cm_entry {
        fid_t            fid;        /* fid associated with request */
        struct fi_info  *info;       /* endpoint information */
        uint8_t          data[];     /* app connection data */
    };

A connection request (FI_CONNREQ) event indicates that a remote
endpoint wishes to establish a new connection to a listening, or
passive, endpoint. The fid is the passive endpoint. Information
regarding the requesting active endpoint's capabilities and attributes
is available from the info field. The application is responsible for
freeing this structure by calling fi_freeinfo when it is no longer
needed. The fi_info connreq field will reference the connection
request associated with this event. To accept a connection, an
endpoint must first be created by passing an fi_info structure
referencing this connreq field to fi_endpoint(). This endpoint is then
passed to fi_accept() to complete the acceptance of the connection
attempt. Creating the endpoint is most easily accomplished by passing
the fi_info returned as part of the CM event into fi_endpoint(). If
the connection is to be rejected, the connreq is passed to fi_reject().

Any application data exchanged as part of the connection request is
placed beyond the fi_eq_cm_entry structure. The amount of data
available is application dependent and limited to the buffer space
provided by the application when fi_eq_read is called. The amount of
returned data may be calculated using the return value to fi_eq_read.
Note that the amount of returned data is limited by the underlying
connection protocol, and the length of any data returned may include
protocol padding. As a result, the returned length may be larger than
that specified by the connecting peer.

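The length calculation described above can be sketched as follows.
This is a minimal illustration, not libfabric code: struct
cm_entry_mirror is a hypothetical local copy of the fi_eq_cm_entry
layout shown earlier (with fid_t and struct fi_info * replaced by
void *) so that the example builds without libfabric, and cm_data_len
is a hypothetical helper name. Real code would use
sizeof(struct fi_eq_cm_entry) directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical mirror of struct fi_eq_cm_entry for illustration only. */
struct cm_entry_mirror {
    void    *fid;     /* stands in for fid_t */
    void    *info;    /* stands in for struct fi_info * */
    uint8_t  data[];  /* app connection data follows the header */
};

/*
 * Given the value returned by a successful read of a CM event, return
 * the number of bytes of connection data placed after the header.
 * Returns -1 if nread is an error code or shorter than the header.
 * The result may include protocol padding, so it can exceed what the
 * connecting peer actually sent.
 */
ssize_t cm_data_len(ssize_t nread)
{
    const ssize_t hdr = (ssize_t)sizeof(struct cm_entry_mirror);

    if (nread < hdr)
        return -1;
    return nread - hdr;
}
```

Because of the possible padding, an application that needs the exact
payload length would have to carry it inside its own connection data.
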
If a connection request has been accepted, an FI_CONNECTED event will
be generated on both sides of the connection. The active side (the one
that called fi_connect()) may receive user data as part of the
FI_CONNECTED event. The user data is passed to the connection manager
on the passive side through the fi_accept call. User data is not
provided with an FI_CONNECTED event on the listening side of the
connection.

Notification that a remote peer has disconnected from an active
endpoint is done through the FI_SHUTDOWN event. Shutdown notification
uses struct fi_eq_cm_entry as declared above. The fid field for a
shutdown notification refers to the active endpoint's fid_ep.

Asynchronous Error Notification : Asynchronous errors are used to
report problems with fabric resources. Reported errors may be fatal or
transient, based on the error, and result in the resource becoming
disabled. Disabled resources will fail operations submitted against
them until they are explicitly re-enabled by the application.

Asynchronous errors may be reported for completion queues and endpoints
of all types. CQ errors can result when resource management has been
disabled, and the provider has detected a queue overrun. Endpoint
errors may be the result of numerous actions, but are often associated
with a failed operation. Operations may fail because of buffer
overruns, invalid permissions, incorrect memory access keys, network
routing failures, network reachability issues, etc.

Asynchronous errors are reported using struct fi_eq_err_entry, as
defined below. The fabric descriptor (fid) associated with the error
is provided as part of the error data. An error code is also available
to determine the cause of the error.

fi_eq_sread
The fi_eq_sread call is the blocking (or synchronous) equivalent to
fi_eq_read. It behaves similarly to the non-blocking call, except that
the call will not return until either an event has been read from the
EQ or an error or timeout occurs. Specifying a negative timeout means
an infinite timeout.

It is invalid for applications to call this function if the EQ has been
configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.

fi_eq_readerr
The read error function, fi_eq_readerr, retrieves information regarding
any asynchronous operation which has completed with an unexpected
error. fi_eq_readerr is a non-blocking call, returning immediately
whether an error completion was found or not.

EQs are optimized to report operations which have completed
successfully. Operations which fail are reported 'out of band'. Such
operations are retrieved using the fi_eq_readerr function. When an
operation that completes with an unexpected error is inserted into an
EQ, it is placed into a temporary error queue. Attempting to read from
an EQ while an item is in the error queue results in an FI_EAVAIL
failure. Applications may use this return code to determine when to
call fi_eq_readerr.

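One way an application might act on the read return codes described
above is sketched below. The enum eq_action and the helper
eq_next_action are hypothetical names introduced for illustration. To
keep the example buildable without libfabric, the positive error codes
are passed in as parameters; a real caller would pass FI_EAGAIN and
FI_EAVAIL from rdma/fi_errno.h.

```c
#include <assert.h>
#include <sys/types.h>

enum eq_action {
    EQ_HANDLE_EVENT,  /* ret >= 0: the read returned ret bytes */
    EQ_RETRY_LATER,   /* -FI_EAGAIN: nothing to read right now */
    EQ_READ_ERROR,    /* -FI_EAVAIL: call fi_eq_readerr next */
    EQ_FAIL           /* any other negative value */
};

/*
 * Map the return value of fi_eq_read or fi_eq_sread to the next step.
 * fi_eagain and fi_eavail are the positive fabric errno values; the
 * read calls report them negated.
 */
enum eq_action eq_next_action(ssize_t ret, int fi_eagain, int fi_eavail)
{
    assert(fi_eagain > 0 && fi_eavail > 0);

    if (ret >= 0)
        return EQ_HANDLE_EVENT;
    if (ret == -(ssize_t)fi_eagain)
        return EQ_RETRY_LATER;
    if (ret == -(ssize_t)fi_eavail)
        return EQ_READ_ERROR;
    return EQ_FAIL;
}
```

Keeping this decision in one place makes it harder to forget the
fi_eq_readerr step, since normal reads keep failing while an entry
sits in the error queue.
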
Error information is reported to the user through struct
fi_eq_err_entry. The format of this structure is defined below.

    struct fi_eq_err_entry {
        fid_t            fid;           /* fid associated with error */
        void            *context;       /* operation context */
        uint64_t         data;          /* completion-specific data */
        int              err;           /* positive error code */
        int              prov_errno;    /* provider error code */
        void            *err_data;      /* additional error data */
        size_t           err_data_size; /* size of err_data */
    };

The fid will reference the fabric descriptor associated with the event.
For memory registration, this will be the fid_mr, address resolution
will reference a fid_av, and CM events will refer to a fid_ep. The
context field will be set to the context specified as part of the
operation.

The data field will be set as described in the man page for the
corresponding object type (e.g., see fi_av(3) for a description of how
asynchronous address vector insertions are completed).

The general reason for the error is provided through the err field.
Provider or operational specific error information may also be
available through the prov_errno and err_data fields. Users may call
fi_eq_strerror to convert provider specific error information into a
printable string for debugging purposes.

On input, err_data_size indicates the size of the err_data buffer in
bytes. On output, err_data_size will be set to the number of bytes
copied to the err_data buffer. The err_data information is typically
used with fi_eq_strerror to provide details about the type of error
that occurred.

For compatibility purposes, if err_data_size is 0 on input, or the
fabric was opened with release < 1.5, err_data will be set to a data
buffer owned by the provider. The contents of the buffer will remain
valid until a subsequent read call against the EQ. Applications must
serialize access to the EQ when processing errors to ensure that the
buffer referenced by err_data does not change.

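The input side of the err_data_size convention can be sketched as
below: the application points err_data at a buffer it owns and records
that buffer's size before calling fi_eq_readerr. This is an
illustration under stated assumptions: struct eq_err_entry_mirror is a
hypothetical local copy of the fi_eq_err_entry layout shown above (with
fid_t replaced by void *) so that the example builds without libfabric,
and prepare_err_entry is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct fi_eq_err_entry for illustration only. */
struct eq_err_entry_mirror {
    void     *fid;            /* stands in for fid_t */
    void     *context;
    uint64_t  data;
    int       err;
    int       prov_errno;
    void     *err_data;
    size_t    err_data_size;
};

/*
 * Clear the entry and attach an application-owned buffer, so that the
 * provider copies error data into buf instead of handing back a
 * provider-owned buffer. After the read, err_data_size would hold the
 * number of bytes actually copied.
 */
void prepare_err_entry(struct eq_err_entry_mirror *e, void *buf, size_t len)
{
    assert(e != NULL);
    memset(e, 0, sizeof(*e));
    e->err_data = buf;
    e->err_data_size = len;
}
```

Passing a length of 0 would instead select the compatibility behavior
described above, in which err_data refers to provider-owned memory.
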
The EQ entry data structures share many of the same fields. The
meanings are the same or similar for all EQ structure formats, with
specific details described below.

fid : This corresponds to the fabric descriptor associated with the
event. The type of fid depends on the event being reported. For
FI_CONNREQ this will be the fid of the passive endpoint. FI_CONNECTED
and FI_SHUTDOWN will reference the active endpoint. FI_MR_COMPLETE and
FI_AV_COMPLETE will refer to the MR or AV fabric descriptor,
respectively. FI_JOIN_COMPLETE will point to the multicast descriptor
returned as part of the join operation. Applications can use the
fid->context value to retrieve the context associated with the fabric
descriptor.

context : The context value is set to the context parameter specified
with the operation that generated the event. If no context parameter
is associated with the operation, this field will be NULL.

data : Data is an operation specific value or set of bytes. For
connection events, data is application data exchanged as part of the
connection protocol.

err : This err code is a positive fabric errno associated with an
event. The err value indicates the general reason for an error, if one
occurred. See fi_errno(3) for a list of possible error codes.

prov_errno : On an error, prov_errno may contain a provider specific
error code. The use of this field and its meaning is provider
specific. It is intended to be used as a debugging aid. See
fi_eq_strerror for additional details on converting this error value
into a human readable string.

err_data : On an error, err_data may reference a provider specific
amount of data associated with an error. The use of this field and its
meaning is provider specific. It is intended to be used as a debugging
aid. See fi_eq_strerror for additional details on converting this
error data into a human readable string.

err_data_size : On input, err_data_size indicates the size of the
err_data buffer in bytes. On output, err_data_size will be set to the
number of bytes copied to the err_data buffer. The err_data
information is typically used with fi_eq_strerror to provide details
about the type of error that occurred.

For compatibility purposes, if err_data_size is 0 on input, or the
fabric was opened with release < 1.5, err_data will be set to a data
buffer owned by the provider. The contents of the buffer will remain
valid until a subsequent read call against the EQ. Applications must
serialize access to the EQ when processing errors to ensure that the
buffer referenced by err_data does not change.

If an event queue has been overrun, it will be placed into an 'overrun'
state. Write operations against an overrun EQ will fail with
-FI_EOVERRUN. Read operations will continue to return any valid,
non-corrupted events, if available. After all valid events have been
retrieved, any attempt to read the EQ will result in it returning an
FI_EOVERRUN error event. Overrun event queues are considered fatal and
may not be used to report additional events once the overrun occurs.

fi_eq_open : Returns 0 on success. On error, a negative value
corresponding to fabric errno is returned.

fi_eq_read / fi_eq_readerr / fi_eq_sread : On success, returns the
number of bytes read from the event queue. On error, a negative value
corresponding to fabric errno is returned. If no data is available to
be read from the event queue, -FI_EAGAIN is returned.

fi_eq_write : On success, returns the number of bytes written to the
event queue. On error, a negative value corresponding to fabric errno
is returned.

fi_eq_strerror : Returns a character string interpretation of the
provider specific error returned with a completion.

Fabric errno values are defined in rdma/fi_errno.h.

fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cntr(3), fi_poll(3)

OpenFabrics.

Libfabric Programmer's Manual        2017-12-01        fi_eq(3)