fi_eq(3)                       Libfabric v1.6.1                       fi_eq(3)

NAME

       fi_eq - Event queue operations

       fi_eq_open / fi_close : Open/close an event queue

       fi_control : Control operation of EQ

       fi_eq_read / fi_eq_readerr : Read an event from an event queue

       fi_eq_write : Writes an event to an event queue

       fi_eq_sread : A synchronous (blocking) read of an event queue

       fi_eq_strerror : Converts provider specific error information into a
       printable string

SYNOPSIS

              #include <rdma/fi_domain.h>

              int fi_eq_open(struct fid_fabric *fabric, struct fi_eq_attr *attr,
                  struct fid_eq **eq, void *context);

              int fi_close(struct fid *eq);

              int fi_control(struct fid *eq, int command, void *arg);

              ssize_t fi_eq_read(struct fid_eq *eq, uint32_t *event,
                  void *buf, size_t len, uint64_t flags);

              ssize_t fi_eq_readerr(struct fid_eq *eq, struct fi_eq_err_entry *buf,
                  uint64_t flags);

              ssize_t fi_eq_write(struct fid_eq *eq, uint32_t event,
                  const void *buf, size_t len, uint64_t flags);

              ssize_t fi_eq_sread(struct fid_eq *eq, uint32_t *event,
                  void *buf, size_t len, int timeout, uint64_t flags);

              const char * fi_eq_strerror(struct fid_eq *eq, int prov_errno,
                    const void *err_data, char *buf, size_t len);

ARGUMENTS

       fabric : Opened fabric descriptor

       eq : Event queue

       attr : Event queue attributes

       context : User specified context associated with the event queue.

       event : Reported event

       buf : For read calls, the data buffer to write events into.  For write
       calls, an event to insert into the event queue.  For fi_eq_strerror, an
       optional buffer that receives printable error information.

       len : Length of data buffer

       flags : Additional flags to apply to the operation

       command : Command of control operation to perform on EQ.

       arg : Optional control argument

       prov_errno : Provider specific error value

       err_data : Provider specific error data related to a completion

       timeout : Timeout specified in milliseconds

DESCRIPTION

       Event queues are used to report events associated with control
       operations.  They are associated with memory registration, address
       vectors, connection management, and fabric and domain level events.
       Reported events are either associated with a requested operation or
       affiliated with a call that registers for specific types of events,
       such as listening for connection requests.

   fi_eq_open
       fi_eq_open allocates a new event queue.

       The properties and behavior of an event queue are defined by
       struct fi_eq_attr.

              struct fi_eq_attr {
                  size_t               size;      /* # entries for EQ */
                  uint64_t             flags;     /* operation flags */
                  enum fi_wait_obj     wait_obj;  /* requested wait object */
                  int                  signaling_vector; /* interrupt affinity */
                  struct fid_wait     *wait_set;  /* optional wait set */
              };

       size : Specifies the minimum size of an event queue.

       flags : Flags that control the configuration of the EQ.

       · FI_WRITE : Indicates that the application requires support for
         inserting user events into the EQ.  If this flag is set, then the
         fi_eq_write operation must be supported by the provider.  If the
         FI_WRITE flag is not set, then the application may not invoke
         fi_eq_write.

       · FI_AFFINITY : Indicates that the signaling_vector field (see below)
         is valid.

       wait_obj : EQs may be associated with a specific wait object.  Wait
       objects allow applications to block until the wait object is signaled,
       indicating that an event is available to be read.  Users may use
       fi_control to retrieve the underlying wait object associated with an
       EQ, in order to use it in other system calls.  The following values may
       be used to specify the type of wait object associated with an EQ:

       · FI_WAIT_NONE : Used to indicate that the user will not block (wait)
         for events on the EQ.  When FI_WAIT_NONE is specified, the
         application may not call fi_eq_sread.  This is the default if no
         wait object is specified.

       · FI_WAIT_UNSPEC : Specifies that the user will only wait on the EQ
         using fabric interface calls, such as fi_eq_sread.  In this case, the
         underlying provider may select the most appropriate or highest
         performing wait object available, including custom wait mechanisms.
         Applications that select FI_WAIT_UNSPEC are not guaranteed to
         retrieve the underlying wait object.

       · FI_WAIT_SET : Indicates that the event queue should use a wait set
         object to wait for events.  If specified, the wait_set field must
         reference an existing wait set object.

       · FI_WAIT_FD : Indicates that the EQ should use a file descriptor as
         its wait mechanism.  A file descriptor wait object must be usable in
         select, poll, and epoll routines.  However, a provider may signal an
         FD wait object by marking it as readable or with an error.

       · FI_WAIT_MUTEX_COND : Specifies that the EQ should use a pthread mutex
         and condition variable as a wait object.

       · FI_WAIT_CRITSEC_COND : Windows specific.  Specifies that the EQ
         should use a critical section and condition variable as a wait
         object.

       signaling_vector : If the FI_AFFINITY flag is set, this indicates the
       logical CPU number (0..max CPU - 1) that interrupts associated with the
       EQ should target.  This field should be treated as a hint to the
       provider and may be ignored if the provider does not support interrupt
       affinity.

       wait_set : If wait_obj is FI_WAIT_SET, this field references a wait
       object to which the event queue should attach.  When an event is
       inserted into the event queue, the corresponding wait set will be
       signaled if all necessary conditions are met.  The use of a wait_set
       enables an optimized method of waiting for events across multiple event
       queues.  This field is ignored if wait_obj is not FI_WAIT_SET.

   fi_close
       The fi_close call releases all resources associated with an event
       queue.  Any events which remain on the EQ when it is closed are lost.

       The EQ must not be bound to any other objects prior to being closed,
       otherwise the call will return -FI_EBUSY.

   fi_control
       The fi_control call is used to access provider or implementation
       specific details of the event queue.  Access to the EQ should be
       serialized across all calls when fi_control is invoked, as it may
       redirect the implementation of EQ operations.  The following control
       commands are usable with an EQ.

       FI_GETWAIT (void **) : This command allows the user to retrieve the
       low-level wait object associated with the EQ.  The format of the
       wait-object is specified during EQ creation, through the EQ attributes.
       The fi_control arg parameter should be an address where a pointer to
       the returned wait object will be written.  This should be an 'int *'
       for FI_WAIT_FD, or 'struct fi_mutex_cond' for FI_WAIT_MUTEX_COND.

              struct fi_mutex_cond {
                  pthread_mutex_t     *mutex;
                  pthread_cond_t      *cond;
              };

   fi_eq_read
       The fi_eq_read operation performs a non-blocking read of event data
       from the EQ.  The format of the event data is based on the type of
       event retrieved from the EQ, with all events starting with a struct
       fi_eq_entry header.  At most one event will be returned per EQ read
       operation.  The number of bytes successfully read from the EQ is
       returned from the read.  The FI_PEEK flag may be used to indicate that
       event data should be read from the EQ without being consumed.  A
       subsequent read without the FI_PEEK flag would then remove the event
       from the EQ.

       The following types of events may be reported to an EQ, along with
       information regarding the format associated with each event.

       Asynchronous Control Operations : Asynchronous control operations are
       basic requests that simply need to generate an event to indicate that
       they have completed.  These include the following types of events:
       memory registration, address vector resolution, and multicast joins.

       Control requests report their completion by inserting a
       struct fi_eq_entry into the EQ.  The format of this structure is:

              struct fi_eq_entry {
                  fid_t            fid;        /* fid associated with request */
                  void            *context;    /* operation context */
                  uint64_t         data;       /* completion-specific data */
              };

       For the completion of basic asynchronous control operations, the
       returned event will indicate the operation that has completed, and the
       fid will reference the fabric descriptor associated with the event.
       For memory registration, this will be an FI_MR_COMPLETE event and the
       fid_mr.  Address resolution will reference an FI_AV_COMPLETE event and
       fid_av.  Multicast joins will report an FI_JOIN_COMPLETE and fid_mc.
       The context field will be set to the context specified as part of the
       operation, if available, otherwise the context will be associated with
       the fabric descriptor.  The data field will be set as described in the
       man page for the corresponding object type (e.g., see fi_av(3) for a
       description of how asynchronous address vector insertions are
       completed).

       Connection Notification : Connection notifications are connection
       management notifications used to set up or tear down connections
       between endpoints.  There are three connection notification events:
       FI_CONNREQ, FI_CONNECTED, and FI_SHUTDOWN.  Connection notifications
       are reported using struct fi_eq_cm_entry:

              struct fi_eq_cm_entry {
                  fid_t            fid;        /* fid associated with request */
                  struct fi_info  *info;       /* endpoint information */
                  uint8_t          data[];     /* app connection data */
              };

       A connection request (FI_CONNREQ) event indicates that a remote
       endpoint wishes to establish a new connection to a listening, or
       passive, endpoint.  The fid is the passive endpoint.  Information
       regarding the requesting, active endpoint's capabilities and attributes
       is available from the info field.  The application is responsible for
       freeing this structure by calling fi_freeinfo when it is no longer
       needed.  The fi_info connreq field will reference the connection
       request associated with this event.  To accept a connection, an
       endpoint must first be created by passing an fi_info structure
       referencing this connreq field to fi_endpoint().  This endpoint is then
       passed to fi_accept() to complete the acceptance of the connection
       attempt.  Creating the endpoint is most easily accomplished by passing
       the fi_info returned as part of the CM event into fi_endpoint().  If
       the connection is to be rejected, the connreq is passed to fi_reject().

       Any application data exchanged as part of the connection request is
       placed beyond the fi_eq_cm_entry structure.  The amount of data
       available is application dependent and limited to the buffer space
       provided by the application when fi_eq_read is called.  The amount of
       returned data may be calculated using the return value to fi_eq_read.
       Note that the amount of returned data is limited by the underlying
       connection protocol, and the length of any data returned may include
       protocol padding.  As a result, the returned length may be larger than
       that specified by the connecting peer.
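
       As a minimal sketch of that calculation, the helper below derives the
       number of connection data bytes from the value returned by fi_eq_read.
       The name cm_data_len is hypothetical, and the structure is a local
       stand-in copied from the declaration above so that the fragment
       compiles without libfabric; real code would include <rdma/fi_eq.h>
       instead.  The result may include protocol padding, as noted above.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Local stand-ins (assumption): mirror the declarations shown in this page. */
struct fid;
struct fi_info;
typedef struct fid *fid_t;

struct fi_eq_cm_entry {
    fid_t            fid;     /* fid associated with request */
    struct fi_info  *info;    /* endpoint information */
    uint8_t          data[];  /* app connection data */
};

/*
 * Hypothetical helper: given the return value of a successful EQ read of a
 * connection event, return how many bytes of application data follow the
 * fixed part of the entry. Negative or short values yield 0.
 */
static size_t cm_data_len(ssize_t nread)
{
    if (nread < 0 || (size_t)nread < sizeof(struct fi_eq_cm_entry))
        return 0;
    return (size_t)nread - sizeof(struct fi_eq_cm_entry);
}
```

       A caller would typically read into a buffer sized as
       sizeof(struct fi_eq_cm_entry) plus the largest amount of connection
       data it expects, then pass the return value to the helper.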

       If a connection request has been accepted, an FI_CONNECTED event will
       be generated on both sides of the connection.  The active side -- one
       that called fi_connect() -- may receive user data as part of the
       FI_CONNECTED event.  The user data is passed to the connection manager
       on the passive side through the fi_accept call.  User data is not
       provided with an FI_CONNECTED event on the listening side of the
       connection.

       Notification that a remote peer has disconnected from an active
       endpoint is done through the FI_SHUTDOWN event.  Shutdown notification
       uses struct fi_eq_cm_entry as declared above.  The fid field for a
       shutdown notification refers to the active endpoint's fid_ep.

       Asynchronous Error Notification : Asynchronous errors are used to
       report problems with fabric resources.  Reported errors may be fatal or
       transient, based on the error, and result in the resource becoming
       disabled.  Disabled resources will fail operations submitted against
       them until they are explicitly re-enabled by the application.

       Asynchronous errors may be reported for completion queues and endpoints
       of all types.  CQ errors can result when resource management has been
       disabled, and the provider has detected a queue overrun.  Endpoint
       errors may be the result of numerous actions, but are often associated
       with a failed operation.  Operations may fail because of buffer
       overruns, invalid permissions, incorrect memory access keys, network
       routing failures, network reachability issues, etc.

       Asynchronous errors are reported using struct fi_eq_err_entry, as
       defined below.  The fabric descriptor (fid) associated with the error
       is provided as part of the error data.  An error code is also available
       to determine the cause of the error.

   fi_eq_sread
       The fi_eq_sread call is the blocking (or synchronous) equivalent to
       fi_eq_read.  It behaves similarly to the non-blocking call, with the
       exception that the call will not return until either an event has been
       read from the EQ or an error or timeout occurs.  Specifying a negative
       timeout means an infinite timeout.

       It is invalid for applications to call this function if the EQ has been
       configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.

   fi_eq_readerr
       The read error function, fi_eq_readerr, retrieves information regarding
       any asynchronous operation which has completed with an unexpected
       error.  fi_eq_readerr is a non-blocking call, returning immediately
       whether an error completion was found or not.

       EQs are optimized to report operations which have completed
       successfully.  Operations which fail are reported 'out of band'.  Such
       operations are retrieved using the fi_eq_readerr function.  When an
       operation that completes with an unexpected error is inserted into an
       EQ, it is placed into a temporary error queue.  Attempting to read from
       an EQ while an item is in the error queue results in an FI_EAVAIL
       failure.  Applications may use this return code to determine when to
       call fi_eq_readerr.
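
       The dispatch on the read return code described above might be sketched
       as follows.  The enum eq_read_status and the function classify_eq_read
       are hypothetical names introduced for illustration.  The fallback
       definitions of FI_EAGAIN and FI_EAVAIL are stand-ins so that the
       fragment compiles without libfabric; in particular, the numeric value
       given for FI_EAVAIL is an assumption, and real code should take both
       constants from rdma/fi_errno.h.

```c
#include <errno.h>
#include <sys/types.h>

/* Stand-ins (assumption): real code gets these from <rdma/fi_errno.h>. */
#ifndef FI_EAGAIN
#define FI_EAGAIN EAGAIN
#endif
#ifndef FI_EAVAIL
#define FI_EAVAIL 259   /* assumed value, for illustration only */
#endif

/* Hypothetical classification of an fi_eq_read / fi_eq_sread return value. */
enum eq_read_status {
    EQ_EVENT,          /* an event was read; ret is its size in bytes */
    EQ_EMPTY,          /* nothing to read right now */
    EQ_ERROR_PENDING,  /* error queue is non-empty: call fi_eq_readerr */
    EQ_FAILED          /* any other negative fabric errno */
};

static enum eq_read_status classify_eq_read(ssize_t ret)
{
    if (ret > 0)
        return EQ_EVENT;
    if (ret == 0 || ret == -FI_EAGAIN)
        return EQ_EMPTY;
    if (ret == -FI_EAVAIL)
        return EQ_ERROR_PENDING;
    return EQ_FAILED;
}
```

       An application loop would call fi_eq_read, classify the result, and
       call fi_eq_readerr only in the EQ_ERROR_PENDING case.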

       Error information is reported to the user through struct
       fi_eq_err_entry.  The format of this structure is defined below.

              struct fi_eq_err_entry {
                  fid_t            fid;        /* fid associated with error */
                  void            *context;    /* operation context */
                  uint64_t         data;       /* completion-specific data */
                  int              err;        /* positive error code */
                  int              prov_errno; /* provider error code */
                  void            *err_data;   /* additional error data */
                  size_t           err_data_size; /* size of err_data */
              };

       The fid will reference the fabric descriptor associated with the event.
       For memory registration, this will be the fid_mr, address resolution
       will reference a fid_av, and CM events will refer to a fid_ep.  The
       context field will be set to the context specified as part of the
       operation.

       The data field will be set as described in the man page for the
       corresponding object type (e.g., see fi_av(3) for a description of how
       asynchronous address vector insertions are completed).

       The general reason for the error is provided through the err field.
       Provider or operational specific error information may also be
       available through the prov_errno and err_data fields.  Users may call
       fi_eq_strerror to convert provider specific error information into a
       printable string for debugging purposes.

       On input, err_data_size indicates the size of the err_data buffer in
       bytes.  On output, err_data_size will be set to the number of bytes
       copied to the err_data buffer.  The err_data information is typically
       used with fi_eq_strerror to provide details about the type of error
       that occurred.

       For compatibility purposes, if err_data_size is 0 on input, or the
       fabric was opened with release < 1.5, err_data will be set to a data
       buffer owned by the provider.  The contents of the buffer will remain
       valid until a subsequent read call against the EQ.  Applications must
       serialize access to the EQ when processing errors to ensure that the
       buffer referenced by err_data does not change.
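
       The input side of that contract can be sketched as below: the
       application points err_data at a buffer it owns and sets err_data_size
       to that buffer's length before the entry is handed to fi_eq_readerr.
       The helper name prepare_err_entry is hypothetical, and the structure is
       a local stand-in copied from the declaration above so that the fragment
       compiles without libfabric; real code would include <rdma/fi_eq.h>
       instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins (assumption): mirror the declarations shown in this page. */
struct fid;
typedef struct fid *fid_t;

struct fi_eq_err_entry {
    fid_t            fid;           /* fid associated with error */
    void            *context;       /* operation context */
    uint64_t         data;          /* completion-specific data */
    int              err;           /* positive error code */
    int              prov_errno;    /* provider error code */
    void            *err_data;      /* additional error data */
    size_t           err_data_size; /* size of err_data */
};

/*
 * Hypothetical helper: clear the entry and attach an application-owned
 * buffer for provider error data. Passing len == 0 leaves err_data_size at
 * 0, which selects the provider-owned buffer behavior described above.
 */
static void prepare_err_entry(struct fi_eq_err_entry *entry,
                              void *buf, size_t len)
{
    memset(entry, 0, sizeof(*entry));
    entry->err_data = buf;
    entry->err_data_size = len;
}
```

       After fi_eq_readerr returns, err_data_size would hold the number of
       bytes the provider copied into the buffer, and prov_errno and err_data
       could be passed to fi_eq_strerror.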

EVENT FIELDS

       The EQ entry data structures share many of the same fields.  The
       meanings are the same or similar for all EQ structure formats, with
       specific details described below.

       fid : This corresponds to the fabric descriptor associated with the
       event.  The type of fid depends on the event being reported.  For
       FI_CONNREQ this will be the fid of the passive endpoint.  FI_CONNECTED
       and FI_SHUTDOWN will reference the active endpoint.  FI_MR_COMPLETE and
       FI_AV_COMPLETE will refer to the MR or AV fabric descriptor,
       respectively.  FI_JOIN_COMPLETE will point to the multicast descriptor
       returned as part of the join operation.  Applications can use the
       fid->context value to retrieve the context associated with the fabric
       descriptor.

       context : The context value is set to the context parameter specified
       with the operation that generated the event.  If no context parameter
       is associated with the operation, this field will be NULL.

       data : Data is an operation specific value or set of bytes.  For
       connection events, data is application data exchanged as part of the
       connection protocol.

       err : This err code is a positive fabric errno associated with an
       event.  The err value indicates the general reason for an error, if one
       occurred.  See fi_errno(3) for a list of possible error codes.

       prov_errno : On an error, prov_errno may contain a provider specific
       error code.  The use of this field and its meaning is provider
       specific.  It is intended to be used as a debugging aid.  See
       fi_eq_strerror for additional details on converting this error value
       into a human readable string.

       err_data : On an error, err_data may reference a provider specific
       amount of data associated with an error.  The use of this field and its
       meaning is provider specific.  It is intended to be used as a debugging
       aid.  See fi_eq_strerror for additional details on converting this
       error data into a human readable string.

       err_data_size : On input, err_data_size indicates the size of the
       err_data buffer in bytes.  On output, err_data_size will be set to the
       number of bytes copied to the err_data buffer.  The err_data
       information is typically used with fi_eq_strerror to provide details
       about the type of error that occurred.

       For compatibility purposes, if err_data_size is 0 on input, or the
       fabric was opened with release < 1.5, err_data will be set to a data
       buffer owned by the provider.  The contents of the buffer will remain
       valid until a subsequent read call against the EQ.  Applications must
       serialize access to the EQ when processing errors to ensure that the
       buffer referenced by err_data does not change.

NOTES

       If an event queue has been overrun, it will be placed into an 'overrun'
       state.  Write operations against an overrun EQ will fail with
       -FI_EOVERRUN.  Read operations will continue to return any valid,
       non-corrupted events, if available.  After all valid events have been
       retrieved, any attempt to read the EQ will result in it returning an
       FI_EOVERRUN error event.  An overrun is considered fatal: the event
       queue may not be used to report additional events once the overrun
       occurs.

RETURN VALUES

       fi_eq_open : Returns 0 on success.  On error, a negative value
       corresponding to fabric errno is returned.

       fi_eq_read / fi_eq_readerr / fi_eq_sread : On success, returns the
       number of bytes read from the event queue.  On error, a negative value
       corresponding to fabric errno is returned.  If no data is available to
       be read from the event queue, -FI_EAGAIN is returned.

       fi_eq_write : On success, returns the number of bytes written to the
       event queue.  On error, a negative value corresponding to fabric errno
       is returned.

       fi_eq_strerror : Returns a character string interpretation of the
       provider specific error returned with a completion.

       Fabric errno values are defined in rdma/fi_errno.h.

SEE ALSO

       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cntr(3), fi_poll(3)

AUTHORS

       OpenFabrics.



Libfabric Programmer's Manual     2017-12-01                          fi_eq(3)