fi_eq(3)                       Libfabric v1.8.0                       fi_eq(3)

NAME

       fi_eq - Event queue operations

       fi_eq_open / fi_close
              Open/close an event queue

       fi_control
              Control operation of EQ

       fi_eq_read / fi_eq_readerr
              Read an event from an event queue

       fi_eq_write
              Writes an event to an event queue

       fi_eq_sread
              A synchronous (blocking) read of an event queue

       fi_eq_strerror
              Converts provider specific error information into a printable
              string

SYNOPSIS

              #include <rdma/fi_domain.h>

              int fi_eq_open(struct fid_fabric *fabric, struct fi_eq_attr *attr,
                  struct fid_eq **eq, void *context);

              int fi_close(struct fid *eq);

              int fi_control(struct fid *eq, int command, void *arg);

              ssize_t fi_eq_read(struct fid_eq *eq, uint32_t *event,
                  void *buf, size_t len, uint64_t flags);

              ssize_t fi_eq_readerr(struct fid_eq *eq, struct fi_eq_err_entry *buf,
                  uint64_t flags);

              ssize_t fi_eq_write(struct fid_eq *eq, uint32_t event,
                  const void *buf, size_t len, uint64_t flags);

              ssize_t fi_eq_sread(struct fid_eq *eq, uint32_t *event,
                  void *buf, size_t len, int timeout, uint64_t flags);

              const char * fi_eq_strerror(struct fid_eq *eq, int prov_errno,
                    const void *err_data, char *buf, size_t len);

ARGUMENTS

       fabric Opened fabric descriptor

       eq     Event queue

       attr   Event queue attributes

       context
              User specified context associated with the event queue.

       event  Reported event

       buf    For read calls, the data buffer to write events into.  For write
              calls, an event to insert into the event queue.  For
              fi_eq_strerror, an optional buffer that receives printable error
              information.

       len    Length of data buffer

       flags  Additional flags to apply to the operation

       command
              Command of control operation to perform on EQ.

       arg    Optional control argument

       prov_errno
              Provider specific error value

       err_data
              Provider specific error data related to a completion

       timeout
              Timeout specified in milliseconds

DESCRIPTION

       Event queues are used to report events associated with control
       operations.  They are associated with memory registration, address
       vectors, connection management, and fabric and domain level events.
       Reported events are either associated with a requested operation or
       affiliated with a call that registers for specific types of events,
       such as listening for connection requests.

   fi_eq_open
       fi_eq_open allocates a new event queue.

       The properties and behavior of an event queue are defined by
       struct fi_eq_attr.

              struct fi_eq_attr {
                  size_t               size;      /* # entries for EQ */
                  uint64_t             flags;     /* operation flags */
                  enum fi_wait_obj     wait_obj;  /* requested wait object */
                  int                  signaling_vector; /* interrupt affinity */
                  struct fid_wait     *wait_set;  /* optional wait set */
              };

       size   Specifies the minimum size of an event queue.

       flags  Flags that control the configuration of the EQ.

       - FI_WRITE
              Indicates that the application requires support for inserting
              user events into the EQ.  If this flag is set, then the
              fi_eq_write operation must be supported by the provider.  If the
              FI_WRITE flag is not set, then the application must not invoke
              fi_eq_write.

       - FI_AFFINITY
              Indicates that the signaling_vector field (see below) is valid.

       wait_obj
              EQs may be associated with a specific wait object.  Wait objects
              allow applications to block until the wait object is signaled,
              indicating that an event is available to be read.  Users may use
              fi_control to retrieve the underlying wait object associated
              with an EQ, in order to use it in other system calls.  The
              following values may be used to specify the type of wait object
              associated with an EQ:

       - FI_WAIT_NONE
              Used to indicate that the user will not block (wait) for events
              on the EQ.  When FI_WAIT_NONE is specified, the application must
              not call fi_eq_sread.  This is the default if no wait object is
              specified.

       - FI_WAIT_UNSPEC
              Specifies that the user will only wait on the EQ using fabric
              interface calls, such as fi_eq_sread.  In this case, the
              underlying provider may select the most appropriate or highest
              performing wait object available, including custom wait
              mechanisms.  Applications that select FI_WAIT_UNSPEC are not
              guaranteed to be able to retrieve the underlying wait object.

       - FI_WAIT_SET
              Indicates that the event queue should use a wait set object to
              wait for events.  If specified, the wait_set field must
              reference an existing wait set object.

       - FI_WAIT_FD
              Indicates that the EQ should use a file descriptor as its wait
              mechanism.  A file descriptor wait object must be usable in
              select, poll, and epoll routines.  However, a provider may signal
              an FD wait object by marking it as readable or with an error.

       - FI_WAIT_MUTEX_COND
              Specifies that the EQ should use a pthread mutex and condition
              variable as a wait object.

       - FI_WAIT_CRITSEC_COND
              Windows specific.  Specifies that the EQ should use a critical
              section and condition variable as a wait object.

       signaling_vector
              If the FI_AFFINITY flag is set, this indicates the logical CPU
              number (0..max CPU - 1) that interrupts associated with the EQ
              should target.  This field should be treated as a hint to the
              provider and may be ignored if the provider does not support
              interrupt affinity.

       wait_set
              If wait_obj is FI_WAIT_SET, this field references a wait object
              to which the event queue should attach.  When an event is
              inserted into the event queue, the corresponding wait set will be
              signaled if all necessary conditions are met.  The use of a
              wait_set enables an optimized method of waiting for events
              across multiple event queues.  This field is ignored if wait_obj
              is not FI_WAIT_SET.

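       The relationship between wait_obj and wait_set described above can be
       checked before calling fi_eq_open.  The following is a minimal sketch
       only: the type declarations are local stand-ins that mirror this page
       so the fragment compiles without libfabric, the enumerator values are
       placeholders, and eq_attr_wait_ok is a hypothetical helper, not a
       libfabric call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in declarations so the sketch compiles without libfabric.
 * Real programs get these from the libfabric headers; the enumerator
 * values below are placeholders, not the library's values. */
enum fi_wait_obj {
    FI_WAIT_NONE,
    FI_WAIT_UNSPEC,
    FI_WAIT_SET,
    FI_WAIT_FD,
    FI_WAIT_MUTEX_COND
};

struct fid_wait;                        /* opaque wait set */

struct fi_eq_attr {
    size_t            size;             /* # entries for EQ */
    uint64_t          flags;            /* operation flags */
    enum fi_wait_obj  wait_obj;         /* requested wait object */
    int               signaling_vector; /* interrupt affinity */
    struct fid_wait  *wait_set;         /* optional wait set */
};

/* Hypothetical helper: returns true when wait_obj and wait_set agree. */
bool eq_attr_wait_ok(const struct fi_eq_attr *attr)
{
    if (attr->wait_obj == FI_WAIT_SET)
        return attr->wait_set != NULL;  /* must reference a wait set */
    return true;                        /* wait_set is ignored otherwise */
}
```

       An application might run such a check on its attributes and only then
       pass them to fi_eq_open.
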
   fi_close
       The fi_close call releases all resources associated with an event
       queue.  Any events which remain on the EQ when it is closed are lost.

       The EQ must not be bound to any other objects prior to being closed,
       otherwise the call will return -FI_EBUSY.

   fi_control
       The fi_control call is used to access provider or implementation
       specific details of the event queue.  Access to the EQ should be
       serialized across all calls when fi_control is invoked, as it may
       redirect the implementation of EQ operations.  The following control
       commands are usable with an EQ.

       FI_GETWAIT (void **)
              This command allows the user to retrieve the low-level wait
              object associated with the EQ.  The format of the wait-object is
              specified during EQ creation, through the EQ attributes.  The
              fi_control arg parameter should be an address where a pointer to
              the returned wait object will be written.  This should be an
              'int *' for FI_WAIT_FD, or 'struct fi_mutex_cond' for
              FI_WAIT_MUTEX_COND.

              struct fi_mutex_cond {
                  pthread_mutex_t     *mutex;
                  pthread_cond_t      *cond;
              };

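       For an EQ opened with FI_WAIT_FD, the descriptor retrieved through
       FI_GETWAIT can be handed to poll.  The sketch below shows only the
       polling step, using standard POSIX calls; eq_wait_fd is a hypothetical
       helper name, and obtaining wait_fd (for example through fi_control
       with FI_GETWAIT and an 'int *' argument, as described above) is left
       to the caller.

```c
#include <poll.h>

/* Hypothetical helper: block until wait_fd is signaled.
 * Returns 1 if signaled, 0 on timeout, -1 if poll() itself failed.
 * A negative timeout_ms makes poll() wait indefinitely. */
int eq_wait_fd(int wait_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = wait_fd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    /* The provider may mark the fd as readable or flag an error on it. */
    return (pfd.revents & (POLLIN | POLLERR | POLLHUP)) ? 1 : 0;
}
```

       After the helper reports that the descriptor was signaled, the
       application would typically call fi_eq_read to retrieve the event.
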
   fi_eq_read
       The fi_eq_read operation performs a non-blocking read of event data
       from the EQ.  The format of the event data is based on the type of
       event retrieved from the EQ, with all events starting with a struct
       fi_eq_entry header.  At most one event will be returned per EQ read
       operation.  The number of bytes successfully read from the EQ is
       returned from the read.  The FI_PEEK flag may be used to indicate that
       event data should be read from the EQ without being consumed.  A
       subsequent read without the FI_PEEK flag would then remove the event
       from the EQ.

       The following types of events may be reported to an EQ, along with
       information regarding the format associated with each event.

       Asynchronous Control Operations
              Asynchronous control operations are basic requests that simply
              need to generate an event to indicate that they have completed.
              These include the following types of events: memory
              registration, address vector resolution, and multicast joins.

       Control requests report their completion by inserting a
       struct fi_eq_entry into the EQ.  The format of this structure is:

              struct fi_eq_entry {
                  fid_t            fid;        /* fid associated with request */
                  void            *context;    /* operation context */
                  uint64_t         data;       /* completion-specific data */
              };

       For the completion of basic asynchronous control operations, the
       returned event will indicate the operation that has completed, and the
       fid will reference the fabric descriptor associated with the event.
       For memory registration, this will be an FI_MR_COMPLETE event and the
       fid_mr.  Address resolution will reference an FI_AV_COMPLETE event and
       fid_av.  Multicast joins will report an FI_JOIN_COMPLETE and fid_mc.
       The context field will be set to the context specified as part of the
       operation, if available, otherwise the context will be associated with
       the fabric descriptor.  The data field will be set as described in the
       man page for the corresponding object type (e.g., see fi_av(3) for a
       description of how asynchronous address vector insertions are
       completed).

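       The pairing of completion events and descriptor types listed above can
       be written as a small dispatch table.  This is an illustrative sketch:
       the event constants are local stand-ins with placeholder values (real
       programs use the definitions from the libfabric headers), and
       eq_event_fid_type is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in event codes with placeholder values; the real constants are
 * defined by the libfabric headers. */
enum {
    FI_MR_COMPLETE = 1,
    FI_AV_COMPLETE,
    FI_JOIN_COMPLETE
};

/* Hypothetical helper: name the descriptor type that the fid field of a
 * struct fi_eq_entry refers to for a given control completion event.
 * Returns NULL for events this sketch does not cover. */
const char *eq_event_fid_type(uint32_t event)
{
    switch (event) {
    case FI_MR_COMPLETE:   return "fid_mr";   /* memory registration */
    case FI_AV_COMPLETE:   return "fid_av";   /* address resolution */
    case FI_JOIN_COMPLETE: return "fid_mc";   /* multicast join */
    default:               return NULL;
    }
}
```

       In an application, each case would cast the fid to the matching
       descriptor type instead of returning a name.
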
       Connection Notification
              Connection notifications are connection management notifications
              used to set up or tear down connections between endpoints.
              There are three connection notification events: FI_CONNREQ,
              FI_CONNECTED, and FI_SHUTDOWN.  Connection notifications are
              reported using struct fi_eq_cm_entry:

              struct fi_eq_cm_entry {
                  fid_t            fid;        /* fid associated with request */
                  struct fi_info  *info;       /* endpoint information */
                  uint8_t          data[];     /* app connection data */
              };

       A connection request (FI_CONNREQ) event indicates that a remote
       endpoint wishes to establish a new connection to a listening, or
       passive, endpoint.  The fid is the passive endpoint.  Information
       regarding the requested, active endpoint's capabilities and attributes
       is available from the info field.  The application is responsible for
       freeing this structure by calling fi_freeinfo when it is no longer
       needed.  The fi_info connreq field will reference the connection
       request associated with this event.  To accept a connection, an
       endpoint must first be created by passing an fi_info structure
       referencing this connreq field to fi_endpoint().  This endpoint is
       then passed to fi_accept() to complete the acceptance of the
       connection attempt.  Creating the endpoint is most easily accomplished
       by passing the fi_info returned as part of the CM event into
       fi_endpoint().  If the connection is to be rejected, the connreq is
       passed to fi_reject().

       Any application data exchanged as part of the connection request is
       placed beyond the fi_eq_cm_entry structure.  The amount of data
       available is application dependent and limited to the buffer space
       provided by the application when fi_eq_read is called.  The amount of
       returned data may be calculated using the return value to fi_eq_read.
       Note that the amount of returned data is limited by the underlying
       connection protocol, and the length of any data returned may include
       protocol padding.  As a result, the returned length may be larger than
       that specified by the connecting peer.

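       The calculation mentioned above amounts to subtracting the size of the
       fixed part of struct fi_eq_cm_entry from the value returned by
       fi_eq_read.  A minimal sketch follows; the structure is redeclared
       locally as a stand-in so the fragment compiles without libfabric, and
       eq_cm_data_len is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>                  /* ssize_t */

/* Stand-in declarations mirroring this page; real programs use the
 * definitions from the libfabric headers. */
struct fid;
typedef struct fid *fid_t;
struct fi_info;

struct fi_eq_cm_entry {
    fid_t            fid;               /* fid associated with request */
    struct fi_info  *info;              /* endpoint information */
    uint8_t          data[];            /* app connection data */
};

/* Hypothetical helper: number of connection data bytes that follow the
 * fixed header, given the return value of a successful read.
 * Returns 0 for error codes or reads shorter than the header. */
size_t eq_cm_data_len(ssize_t nread)
{
    if (nread < 0 || (size_t)nread < sizeof(struct fi_eq_cm_entry))
        return 0;
    return (size_t)nread - sizeof(struct fi_eq_cm_entry);
}
```

       Because the returned length may include protocol padding, peers that
       need an exact length would have to carry it inside the connection data
       themselves.
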
       If a connection request has been accepted, an FI_CONNECTED event will
       be generated on both sides of the connection.  The active side (the
       one that called fi_connect()) may receive user data as part of the
       FI_CONNECTED event.  The user data is passed to the connection manager
       on the passive side through the fi_accept call.  User data is not
       provided with an FI_CONNECTED event on the listening side of the
       connection.

       Notification that a remote peer has disconnected from an active
       endpoint is done through the FI_SHUTDOWN event.  Shutdown notification
       uses struct fi_eq_cm_entry as declared above.  The fid field for a
       shutdown notification refers to the active endpoint's fid_ep.

       Asynchronous Error Notification
              Asynchronous errors are used to report problems with fabric
              resources.  Reported errors may be fatal or transient, based on
              the error, and result in the resource becoming disabled.
              Disabled resources will fail operations submitted against them
              until they are explicitly re-enabled by the application.

       Asynchronous errors may be reported for completion queues and endpoints
       of all types.  CQ errors can result when resource management has been
       disabled, and the provider has detected a queue overrun.  Endpoint
       errors may be the result of numerous actions, but are often associated
       with a failed operation.  Operations may fail because of buffer
       overruns, invalid permissions, incorrect memory access keys, network
       routing failures, network reachability issues, etc.

       Asynchronous errors are reported using struct fi_eq_err_entry, as
       defined below.  The fabric descriptor (fid) associated with the error
       is provided as part of the error data.  An error code is also
       available to determine the cause of the error.

   fi_eq_sread
       The fi_eq_sread call is the blocking (or synchronous) equivalent to
       fi_eq_read.  It behaves similarly to the non-blocking call, with the
       exception that the call will not return until either an event has been
       read from the EQ or an error or timeout occurs.  Specifying a negative
       timeout means an infinite timeout.

       Threads blocking in this function will return to the caller if they are
       signaled by some external source.  This is true even if the timeout has
       not occurred or was specified as infinite.

       It is invalid for applications to call this function if the EQ has been
       configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.

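       Two rules from this section lend themselves to small guard functions:
       which wait objects permit fi_eq_sread, and how much timeout remains
       when a blocked thread is woken early and wants to retry.  The sketch
       below is illustrative only; the enum is a local stand-in with
       placeholder values, and both helper names are hypothetical.

```c
#include <stdbool.h>

/* Stand-in enum with placeholder values; the real definition comes from
 * the libfabric headers. */
enum fi_wait_obj {
    FI_WAIT_NONE,
    FI_WAIT_UNSPEC,
    FI_WAIT_SET,
    FI_WAIT_FD,
    FI_WAIT_MUTEX_COND
};

/* Hypothetical helper: fi_eq_sread is invalid for FI_WAIT_NONE and
 * FI_WAIT_SET, and allowed for the other wait objects listed here. */
bool eq_may_sread(enum fi_wait_obj wait_obj)
{
    return wait_obj != FI_WAIT_NONE && wait_obj != FI_WAIT_SET;
}

/* Hypothetical helper: timeout (in milliseconds) to pass when retrying
 * after an early wake-up that consumed elapsed_ms of the budget. */
int eq_remaining_timeout(int timeout_ms, int elapsed_ms)
{
    if (timeout_ms < 0)
        return timeout_ms;              /* negative means wait forever */
    return elapsed_ms >= timeout_ms ? 0 : timeout_ms - elapsed_ms;
}
```

       Measuring the elapsed time (for instance with a monotonic clock) is
       left to the caller.
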
   fi_eq_readerr
       The read error function, fi_eq_readerr, retrieves information regarding
       any asynchronous operation which has completed with an unexpected
       error.  fi_eq_readerr is a non-blocking call, returning immediately
       whether an error completion was found or not.

       EQs are optimized to report operations which have completed
       successfully.  Operations which fail are reported 'out of band'.  Such
       operations are retrieved using the fi_eq_readerr function.  When an
       operation that completes with an unexpected error is inserted into an
       EQ, it is placed into a temporary error queue.  Attempting to read from
       an EQ while an item is in the error queue results in an FI_EAVAIL
       failure.  Applications may use this return code to determine when to
       call fi_eq_readerr.

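       The return-code handling described above can be sketched as a small
       classifier applied to the result of fi_eq_read or fi_eq_sread.  The
       error constants below are local stand-ins with placeholder values
       (real programs take them from rdma/fi_errno.h), and the enum and
       function names are hypothetical.

```c
#include <sys/types.h>                  /* ssize_t */

/* Stand-in error codes with placeholder values chosen for this sketch. */
#define FI_EAGAIN 11
#define FI_EAVAIL 259

/* Hypothetical set of follow-up actions for a read result. */
enum eq_next_step {
    EQ_HANDLE_EVENT,                    /* an event was read */
    EQ_RETRY_LATER,                     /* nothing available yet */
    EQ_CALL_READERR,                    /* an error entry is pending */
    EQ_FAIL                             /* any other failure */
};

/* Hypothetical helper: map a read return value to the next step. */
enum eq_next_step eq_classify_read(ssize_t ret)
{
    if (ret >= 0)
        return EQ_HANDLE_EVENT;         /* ret is the number of bytes read */
    if (ret == -FI_EAGAIN)
        return EQ_RETRY_LATER;
    if (ret == -FI_EAVAIL)
        return EQ_CALL_READERR;         /* fetch it with fi_eq_readerr */
    return EQ_FAIL;
}
```

       A polling loop would call the read function, pass the result through
       such a classifier, and invoke fi_eq_readerr only in the
       EQ_CALL_READERR case.
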
       Error information is reported to the user through struct
       fi_eq_err_entry.  The format of this structure is defined below.

              struct fi_eq_err_entry {
                  fid_t            fid;        /* fid associated with error */
                  void            *context;    /* operation context */
                  uint64_t         data;       /* completion-specific data */
                  int              err;        /* positive error code */
                  int              prov_errno; /* provider error code */
                  void            *err_data;   /* additional error data */
                  size_t           err_data_size; /* size of err_data */
              };

       The fid will reference the fabric descriptor associated with the event.
       For memory registration, this will be the fid_mr, address resolution
       will reference a fid_av, and CM events will refer to a fid_ep.  The
       context field will be set to the context specified as part of the
       operation.

       The data field will be set as described in the man page for the
       corresponding object type (e.g., see fi_av(3) for a description of how
       asynchronous address vector insertions are completed).

       The general reason for the error is provided through the err field.
       Provider or operational specific error information may also be
       available through the prov_errno and err_data fields.  Users may call
       fi_eq_strerror to convert provider specific error information into a
       printable string for debugging purposes.

       On input, err_data_size indicates the size of the err_data buffer in
       bytes.  On output, err_data_size will be set to the number of bytes
       copied to the err_data buffer.  The err_data information is typically
       used with fi_eq_strerror to provide details about the type of error
       that occurred.

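       The input side of the err_data_size convention can be sketched as a
       helper that points an error entry at a caller-owned buffer before the
       entry is passed to fi_eq_readerr.  The structure is redeclared locally
       as a stand-in so the fragment compiles without libfabric, and
       eq_err_entry_prepare is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in declarations mirroring this page; real programs use the
 * definitions from the libfabric headers. */
struct fid;
typedef struct fid *fid_t;

struct fi_eq_err_entry {
    fid_t            fid;           /* fid associated with error */
    void            *context;       /* operation context */
    uint64_t         data;          /* completion-specific data */
    int              err;           /* positive error code */
    int              prov_errno;    /* provider error code */
    void            *err_data;      /* additional error data */
    size_t           err_data_size; /* size of err_data */
};

/* Hypothetical helper: clear the entry and attach a caller-owned buffer
 * of len bytes for the provider to copy error data into. */
void eq_err_entry_prepare(struct fi_eq_err_entry *entry,
                          void *buf, size_t len)
{
    memset(entry, 0, sizeof(*entry));
    entry->err_data = buf;
    entry->err_data_size = len;     /* input: capacity of buf in bytes */
}
```

       After fi_eq_readerr returns successfully, err_data_size holds the
       number of bytes actually copied, and prov_errno and err_data can be
       passed to fi_eq_strerror.
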
       For compatibility purposes, if err_data_size is 0 on input, or the
       fabric was opened with release < 1.5, err_data will be set to a data
       buffer owned by the provider.  The contents of the buffer will remain
       valid until a subsequent read call against the EQ.  Applications must
       serialize access to the EQ when processing errors to ensure that the
       buffer referenced by err_data does not change.

EVENT FIELDS

       The EQ entry data structures share many of the same fields.  The
       meanings are the same or similar for all EQ structure formats, with
       specific details described below.

       fid    This corresponds to the fabric descriptor associated with the
              event.  The type of fid depends on the event being reported.
              For FI_CONNREQ this will be the fid of the passive endpoint.
              FI_CONNECTED and FI_SHUTDOWN will reference the active endpoint.
              FI_MR_COMPLETE and FI_AV_COMPLETE will refer to the MR or AV
              fabric descriptor, respectively.  FI_JOIN_COMPLETE will point to
              the multicast descriptor returned as part of the join operation.
              Applications can use the fid->context value to retrieve the
              context associated with the fabric descriptor.

       context
              The context value is set to the context parameter specified with
              the operation that generated the event.  If no context parameter
              is associated with the operation, this field will be NULL.

       data   Data is an operation specific value or set of bytes.  For
              connection events, data is application data exchanged as part of
              the connection protocol.

       err    This err code is a positive fabric errno associated with an
              event.  The err value indicates the general reason for an error,
              if one occurred.  See fi_errno(3) for a list of possible error
              codes.

       prov_errno
              On an error, prov_errno may contain a provider specific error
              code.  The use of this field and its meaning are provider
              specific.  It is intended to be used as a debugging aid.  See
              fi_eq_strerror for additional details on converting this error
              value into a human readable string.

       err_data
              On an error, err_data may reference a provider specific amount
              of data associated with an error.  The use of this field and its
              meaning are provider specific.  It is intended to be used as a
              debugging aid.  See fi_eq_strerror for additional details on
              converting this error data into a human readable string.

       err_data_size
              On input, err_data_size indicates the size of the err_data
              buffer in bytes.  On output, err_data_size will be set to the
              number of bytes copied to the err_data buffer.  The err_data
              information is typically used with fi_eq_strerror to provide
              details about the type of error that occurred.

       For compatibility purposes, if err_data_size is 0 on input, or the
       fabric was opened with release < 1.5, err_data will be set to a data
       buffer owned by the provider.  The contents of the buffer will remain
       valid until a subsequent read call against the EQ.  Applications must
       serialize access to the EQ when processing errors to ensure that the
       buffer referenced by err_data does not change.

NOTES

       If an event queue has been overrun, it will be placed into an 'overrun'
       state.  Write operations against an overrun EQ will fail with
       -FI_EOVERRUN.  Read operations will continue to return any valid,
       non-corrupted events, if available.  After all valid events have been
       retrieved, any attempt to read the EQ will result in it returning an
       FI_EOVERRUN error event.  Overrun event queues are considered fatal and
       may not be used to report additional events once the overrun occurs.

RETURN VALUES

       fi_eq_open
              Returns 0 on success.  On error, a negative value corresponding
              to fabric errno is returned.

       fi_eq_read / fi_eq_readerr
              On success, returns the number of bytes read from the event
              queue.  On error, a negative value corresponding to fabric errno
              is returned.  If no data is available to be read from the event
              queue, -FI_EAGAIN is returned.

       fi_eq_sread
              On success, returns the number of bytes read from the event
              queue.  On error, a negative value corresponding to fabric errno
              is returned.  If the timeout expires or the calling thread is
              signaled and no data is available to be read from the event
              queue, -FI_EAGAIN is returned.

       fi_eq_write
              On success, returns the number of bytes written to the event
              queue.  On error, a negative value corresponding to fabric errno
              is returned.

       fi_eq_strerror
              Returns a character string interpretation of the provider
              specific error returned with a completion.

       Fabric errno values are defined in rdma/fi_errno.h.

SEE ALSO

       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cntr(3), fi_poll(3)

AUTHORS

       OpenFabrics.



Libfabric Programmer's Manual     2019-02-19                          fi_eq(3)