fi_eq(3)                     Libfabric v1.12.0rc1                     fi_eq(3)

NAME

       fi_eq - Event queue operations

       fi_eq_open / fi_close
              Open/close an event queue

       fi_control
              Control the operation of an EQ

       fi_eq_read / fi_eq_readerr
              Read an event from an event queue

       fi_eq_write
              Write an event to an event queue

       fi_eq_sread
              A synchronous (blocking) read of an event queue

       fi_eq_strerror
              Convert provider specific error information into a printable
              string

SYNOPSIS

              #include <rdma/fi_domain.h>

              int fi_eq_open(struct fid_fabric *fabric, struct fi_eq_attr *attr,
                  struct fid_eq **eq, void *context);

              int fi_close(struct fid *eq);

              int fi_control(struct fid *eq, int command, void *arg);

              ssize_t fi_eq_read(struct fid_eq *eq, uint32_t *event,
                  void *buf, size_t len, uint64_t flags);

              ssize_t fi_eq_readerr(struct fid_eq *eq, struct fi_eq_err_entry *buf,
                  uint64_t flags);

              ssize_t fi_eq_write(struct fid_eq *eq, uint32_t event,
                  const void *buf, size_t len, uint64_t flags);

              ssize_t fi_eq_sread(struct fid_eq *eq, uint32_t *event,
                  void *buf, size_t len, int timeout, uint64_t flags);

              const char *fi_eq_strerror(struct fid_eq *eq, int prov_errno,
                  const void *err_data, char *buf, size_t len);

ARGUMENTS

       fabric Opened fabric descriptor

       eq     Event queue

       attr   Event queue attributes

       context
              User specified context associated with the event queue.

       event  Reported event

       buf    For read calls, the data buffer to write events into. For write
              calls, an event to insert into the event queue. For
              fi_eq_strerror, an optional buffer that receives printable error
              information.

       len    Length of data buffer

       flags  Additional flags to apply to the operation

       command
              Command of control operation to perform on EQ.

       arg    Optional control argument

       prov_errno
              Provider specific error value

       err_data
              Provider specific error data related to a completion

       timeout
              Timeout specified in milliseconds

DESCRIPTION

       Event queues are used to report events associated with control
       operations. They are associated with memory registration, address
       vectors, connection management, and fabric and domain level events.
       Reported events are either associated with a requested operation or
       affiliated with a call that registers for specific types of events,
       such as listening for connection requests.

   fi_eq_open
       fi_eq_open allocates a new event queue.

       The properties and behavior of an event queue are defined by
       struct fi_eq_attr.

              struct fi_eq_attr {
                  size_t               size;      /* # entries for EQ */
                  uint64_t             flags;     /* operation flags */
                  enum fi_wait_obj     wait_obj;  /* requested wait object */
                  int                  signaling_vector; /* interrupt affinity */
                  struct fid_wait     *wait_set;  /* optional wait set */
              };

       size   Specifies the minimum size of an event queue.

       flags  Flags that control the configuration of the EQ.

       - FI_WRITE
              Indicates that the application requires support for inserting
              user events into the EQ. If this flag is set, then the
              fi_eq_write operation must be supported by the provider. If the
              FI_WRITE flag is not set, then the application may not invoke
              fi_eq_write.

       - FI_AFFINITY
              Indicates that the signaling_vector field (see below) is valid.

       wait_obj
              EQs may be associated with a specific wait object. Wait objects
              allow applications to block until the wait object is signaled,
              indicating that an event is available to be read. Users may use
              fi_control to retrieve the underlying wait object associated
              with an EQ, in order to use it in other system calls. The
              following values may be used to specify the type of wait object
              associated with an EQ:

       - FI_WAIT_NONE
              Used to indicate that the user will not block (wait) for events
              on the EQ. When FI_WAIT_NONE is specified, the application may
              not call fi_eq_sread. This is the default if no wait object is
              specified.

       - FI_WAIT_UNSPEC
              Specifies that the user will only wait on the EQ using fabric
              interface calls, such as fi_eq_sread. In this case, the
              underlying provider may select the most appropriate or highest
              performing wait object available, including custom wait
              mechanisms. Applications that select FI_WAIT_UNSPEC are not
              guaranteed to be able to retrieve the underlying wait object.

       - FI_WAIT_SET
              Indicates that the event queue should use a wait set object to
              wait for events. If specified, the wait_set field must
              reference an existing wait set object.

       - FI_WAIT_FD
              Indicates that the EQ should use a file descriptor as its wait
              mechanism. A file descriptor wait object must be usable in
              select, poll, and epoll routines. However, a provider may signal
              an FD wait object by marking it as readable or with an error.

       - FI_WAIT_MUTEX_COND
              Specifies that the EQ should use a pthread mutex and cond
              variable as a wait object.

       - FI_WAIT_YIELD
              Indicates that the EQ will wait without a wait object but
              instead yield on every wait. This allows fi_eq_sread to be used
              by spinning.

       signaling_vector
              If the FI_AFFINITY flag is set, this indicates the logical CPU
              number (0..max cpu - 1) that interrupts associated with the EQ
              should target. This field should be treated as a hint to the
              provider and may be ignored if the provider does not support
              interrupt affinity.

       wait_set
              If wait_obj is FI_WAIT_SET, this field references a wait set
              object to which the event queue should attach. When an event is
              inserted into the event queue, the corresponding wait set will
              be signaled if all necessary conditions are met. The use of a
              wait_set enables an optimized method of waiting for events
              across multiple event queues. This field is ignored if wait_obj
              is not FI_WAIT_SET.

   fi_close
       The fi_close call releases all resources associated with an event
       queue. Any events which remain on the EQ when it is closed are lost.

       The EQ must not be bound to any other objects prior to being closed,
       otherwise the call will return -FI_EBUSY.

   fi_control
       The fi_control call is used to access provider or implementation
       specific details of the event queue. Access to the EQ should be
       serialized across all calls when fi_control is invoked, as it may
       redirect the implementation of EQ operations. The following control
       commands are usable with an EQ.

       FI_GETWAIT (void **)
              This command allows the user to retrieve the low-level wait
              object associated with the EQ. The format of the wait object is
              specified during EQ creation, through the EQ attributes. The
              fi_control arg parameter should be an address where a pointer to
              the returned wait object will be written. This should be an
              'int *' for FI_WAIT_FD, or 'struct fi_mutex_cond *' for
              FI_WAIT_MUTEX_COND.

              struct fi_mutex_cond {
                  pthread_mutex_t     *mutex;
                  pthread_cond_t      *cond;
              };

   fi_eq_read
       The fi_eq_read operation performs a non-blocking read of event data
       from the EQ. The format of the event data is based on the type of
       event retrieved from the EQ, with all events starting with a struct
       fi_eq_entry header. At most one event will be returned per EQ read
       operation. The number of bytes successfully read from the EQ is
       returned from the read. The FI_PEEK flag may be used to indicate that
       event data should be read from the EQ without being consumed. A
       subsequent read without the FI_PEEK flag would then remove the event
       from the EQ.

       The following types of events may be reported to an EQ, along with
       information regarding the format associated with each event.

       Asynchronous Control Operations
              Asynchronous control operations are basic requests that simply
              need to generate an event to indicate that they have completed.
              These include the following types of events: memory
              registration, address vector resolution, and multicast joins.

       Control requests report their completion by inserting a
       struct fi_eq_entry into the EQ. The format of this structure is:

              struct fi_eq_entry {
                  fid_t            fid;        /* fid associated with request */
                  void            *context;    /* operation context */
                  uint64_t         data;       /* completion-specific data */
              };

       For the completion of basic asynchronous control operations, the
       returned event will indicate the operation that has completed, and
       the fid will reference the fabric descriptor associated with the
       event. Memory registration reports an FI_MR_COMPLETE event with the
       fid_mr, address resolution reports an FI_AV_COMPLETE event with the
       fid_av, and multicast joins report an FI_JOIN_COMPLETE event with the
       fid_mc. The context field will be set to the context specified as part
       of the operation, if available; otherwise the context will be
       associated with the fabric descriptor. The data field will be set as
       described in the man page for the corresponding object type (e.g., see
       fi_av(3) for a description of how asynchronous address vector
       insertions are completed).

       Connection Notification
              Connection notifications are connection management
              notifications used to set up or tear down connections between
              endpoints. There are three connection notification events:
              FI_CONNREQ, FI_CONNECTED, and FI_SHUTDOWN. Connection
              notifications are reported using struct fi_eq_cm_entry:

              struct fi_eq_cm_entry {
                  fid_t            fid;        /* fid associated with request */
                  struct fi_info  *info;       /* endpoint information */
                  uint8_t          data[];     /* app connection data */
              };

       A connection request (FI_CONNREQ) event indicates that a remote
       endpoint wishes to establish a new connection to a listening, or
       passive, endpoint. The fid is the passive endpoint. Information
       regarding the requested, active endpoint's capabilities and
       attributes is available from the info field. The application is
       responsible for freeing this structure by calling fi_freeinfo when it
       is no longer needed. The fi_info connreq field will reference the
       connection request associated with this event. To accept a
       connection, an endpoint must first be created by passing an fi_info
       structure referencing this connreq field to fi_endpoint(). This
       endpoint is then passed to fi_accept() to complete the acceptance of
       the connection attempt. Creating the endpoint is most easily
       accomplished by passing the fi_info returned as part of the CM event
       into fi_endpoint(). If the connection is to be rejected, the connreq
       is passed to fi_reject().

       Any application data exchanged as part of the connection request is
       placed beyond the fi_eq_cm_entry structure. The amount of data
       available is application dependent and limited to the buffer space
       provided by the application when fi_eq_read is called. The amount of
       returned data may be calculated using the return value of fi_eq_read.
       Note that the amount of returned data is limited by the underlying
       connection protocol, and the length of any data returned may include
       protocol padding. As a result, the returned length may be larger than
       that specified by the connecting peer.

       If a connection request has been accepted, an FI_CONNECTED event will
       be generated on both sides of the connection. The active side (the
       one that called fi_connect()) may receive user data as part of the
       FI_CONNECTED event. The user data is passed to the connection manager
       on the passive side through the fi_accept call. User data is not
       provided with an FI_CONNECTED event on the listening side of the
       connection.

       Notification that a remote peer has disconnected from an active
       endpoint is done through the FI_SHUTDOWN event. Shutdown notification
       uses struct fi_eq_cm_entry as declared above. The fid field for a
       shutdown notification refers to the active endpoint's fid_ep.

       Asynchronous Error Notification
              Asynchronous errors are used to report problems with fabric
              resources. Reported errors may be fatal or transient, depending
              on the error, and result in the resource becoming disabled.
              Disabled resources will fail operations submitted against them
              until they are explicitly re-enabled by the application.

       Asynchronous errors may be reported for completion queues and
       endpoints of all types. CQ errors can result when resource management
       has been disabled and the provider has detected a queue overrun.
       Endpoint errors may be the result of numerous actions, but are often
       associated with a failed operation. Operations may fail because of
       buffer overruns, invalid permissions, incorrect memory access keys,
       network routing failures, network reachability issues, etc.

       Asynchronous errors are reported using struct fi_eq_err_entry, as
       defined below. The fabric descriptor (fid) associated with the error
       is provided as part of the error data. An error code is also
       available to determine the cause of the error.

   fi_eq_sread
       The fi_eq_sread call is the blocking (or synchronous) equivalent of
       fi_eq_read. It behaves similarly to the non-blocking call, with the
       exception that the call will not return until either an event has
       been read from the EQ or an error or timeout occurs. Specifying a
       negative timeout means an infinite timeout.

       Threads blocking in this function will return to the caller if they
       are signaled by some external source. This is true even if the
       timeout has not occurred or was specified as infinite.

       It is invalid for applications to call this function if the EQ has
       been configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.

   fi_eq_readerr
       The read error function, fi_eq_readerr, retrieves information
       regarding any asynchronous operation which has completed with an
       unexpected error. fi_eq_readerr is a non-blocking call, returning
       immediately whether an error completion was found or not.

       EQs are optimized to report operations which have completed
       successfully. Operations which fail are reported 'out of band'. Such
       operations are retrieved using the fi_eq_readerr function. When an
       operation that completes with an unexpected error is inserted into an
       EQ, it is placed into a temporary error queue. Attempting to read from
       an EQ while an item is in the error queue results in an FI_EAVAIL
       failure. Applications may use this return code to determine when to
       call fi_eq_readerr.

       Error information is reported to the user through
       struct fi_eq_err_entry. The format of this structure is defined below.

              struct fi_eq_err_entry {
                  fid_t            fid;        /* fid associated with error */
                  void            *context;    /* operation context */
                  uint64_t         data;       /* completion-specific data */
                  int              err;        /* positive error code */
                  int              prov_errno; /* provider error code */
                  void            *err_data;   /* additional error data */
                  size_t           err_data_size; /* size of err_data */
              };

       The fid will reference the fabric descriptor associated with the
       event. For memory registration, this will be the fid_mr, address
       resolution will reference a fid_av, and CM events will refer to a
       fid_ep. The context field will be set to the context specified as
       part of the operation.

       The data field will be set as described in the man page for the
       corresponding object type (e.g., see fi_av(3) for a description of
       how asynchronous address vector insertions are completed).

       The general reason for the error is provided through the err field.
       Provider or operational specific error information may also be
       available through the prov_errno and err_data fields. Users may call
       fi_eq_strerror to convert provider specific error information into a
       printable string for debugging purposes.

       On input, err_data_size indicates the size of the err_data buffer in
       bytes. On output, err_data_size will be set to the number of bytes
       copied to the err_data buffer. The err_data information is typically
       used with fi_eq_strerror to provide details about the type of error
       that occurred.

       For compatibility purposes, if err_data_size is 0 on input, or the
       fabric was opened with release < 1.5, err_data will be set to a data
       buffer owned by the provider. The contents of the buffer will remain
       valid until a subsequent read call against the EQ. Applications must
       serialize access to the EQ when processing errors to ensure that the
       buffer referenced by err_data does not change.

EVENT FIELDS

       The EQ entry data structures share many of the same fields. The
       meanings are the same or similar for all EQ structure formats, with
       specific details described below.

       fid    This corresponds to the fabric descriptor associated with the
              event. The type of fid depends on the event being reported.
              For FI_CONNREQ this will be the fid of the passive endpoint.
              FI_CONNECTED and FI_SHUTDOWN will reference the active endpoint.
              FI_MR_COMPLETE and FI_AV_COMPLETE will refer to the MR or AV
              fabric descriptor, respectively. FI_JOIN_COMPLETE will point to
              the multicast descriptor returned as part of the join operation.
              Applications can use the fid->context value to retrieve the
              context associated with the fabric descriptor.

       context
              The context value is set to the context parameter specified with
              the operation that generated the event. If no context parameter
              is associated with the operation, this field will be NULL.

       data   Data is an operation specific value or set of bytes. For
              connection events, data is application data exchanged as part of
              the connection protocol.

       err    This err code is a positive fabric errno associated with an
              event. The err value indicates the general reason for an error,
              if one occurred. See fi_errno(3) for a list of possible error
              codes.

       prov_errno
              On an error, prov_errno may contain a provider specific error
              code. The use of this field and its meaning is provider
              specific. It is intended to be used as a debugging aid. See
              fi_eq_strerror for additional details on converting this error
              value into a human-readable string.

       err_data
              On an error, err_data may reference a provider specific amount
              of data associated with an error. The use of this field and its
              meaning is provider specific. It is intended to be used as a
              debugging aid. See fi_eq_strerror for additional details on
              converting this error data into a human-readable string.

       err_data_size
              On input, err_data_size indicates the size of the err_data
              buffer in bytes. On output, err_data_size will be set to the
              number of bytes copied to the err_data buffer. The err_data
              information is typically used with fi_eq_strerror to provide
              details about the type of error that occurred.

       For compatibility purposes, if err_data_size is 0 on input, or the
       fabric was opened with release < 1.5, err_data will be set to a data
       buffer owned by the provider. The contents of the buffer will remain
       valid until a subsequent read call against the EQ. Applications must
       serialize access to the EQ when processing errors to ensure that the
       buffer referenced by err_data does not change.

NOTES

       If an event queue has been overrun, it will be placed into an
       'overrun' state. Write operations against an overrun EQ will fail
       with -FI_EOVERRUN. Read operations will continue to return any valid,
       non-corrupted events, if available. After all valid events have been
       retrieved, any attempt to read the EQ will result in it returning an
       FI_EOVERRUN error event. Overrun event queues are considered fatal
       and may not be used to report additional events once the overrun
       occurs.

RETURN VALUES

       fi_eq_open
              Returns 0 on success. On error, a negative value corresponding
              to fabric errno is returned.

       fi_eq_read / fi_eq_readerr
              On success, returns the number of bytes read from the event
              queue. On error, a negative value corresponding to fabric errno
              is returned. If no data is available to be read from the event
              queue, -FI_EAGAIN is returned.

       fi_eq_sread
              On success, returns the number of bytes read from the event
              queue. On error, a negative value corresponding to fabric errno
              is returned. If the timeout expires or the calling thread is
              signaled and no data is available to be read from the event
              queue, -FI_EAGAIN is returned.

       fi_eq_write
              On success, returns the number of bytes written to the event
              queue. On error, a negative value corresponding to fabric errno
              is returned.

       fi_eq_strerror
              Returns a character string interpretation of the provider
              specific error returned with a completion.

       Fabric errno values are defined in rdma/fi_errno.h.

SEE ALSO

       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cntr(3), fi_poll(3)

AUTHORS

       OpenFabrics.

Libfabric Programmer's Manual     2019-12-13                          fi_eq(3)