fi_eq(3)                       Libfabric v1.17.0                      fi_eq(3)

NAME

       fi_eq - Event queue operations

       fi_eq_open / fi_close
              Open/close an event queue

       fi_control
              Control operation of EQ

       fi_eq_read / fi_eq_readerr
              Read an event from an event queue

       fi_eq_write
              Writes an event to an event queue

       fi_eq_sread
              A synchronous (blocking) read of an event queue

       fi_eq_strerror
              Converts provider specific error information into a printable
              string

SYNOPSIS

              #include <rdma/fi_domain.h>

              int fi_eq_open(struct fid_fabric *fabric, struct fi_eq_attr *attr,
                  struct fid_eq **eq, void *context);

              int fi_close(struct fid *eq);

              int fi_control(struct fid *eq, int command, void *arg);

              ssize_t fi_eq_read(struct fid_eq *eq, uint32_t *event,
                  void *buf, size_t len, uint64_t flags);

              ssize_t fi_eq_readerr(struct fid_eq *eq, struct fi_eq_err_entry *buf,
                  uint64_t flags);

              ssize_t fi_eq_write(struct fid_eq *eq, uint32_t event,
                  const void *buf, size_t len, uint64_t flags);

              ssize_t fi_eq_sread(struct fid_eq *eq, uint32_t *event,
                  void *buf, size_t len, int timeout, uint64_t flags);

              const char * fi_eq_strerror(struct fid_eq *eq, int prov_errno,
                    const void *err_data, char *buf, size_t len);

ARGUMENTS

       fabric Opened fabric descriptor

       eq     Event queue

       attr   Event queue attributes

       context
              User specified context associated with the event queue.

       event  Reported event

       buf    For read calls, the data buffer to write events into.  For write
              calls, an event to insert into the event queue.  For
              fi_eq_strerror, an optional buffer that receives printable error
              information.

       len    Length of data buffer

       flags  Additional flags to apply to the operation

       command
              Command of control operation to perform on EQ.

       arg    Optional control argument

       prov_errno
              Provider specific error value

       err_data
              Provider specific error data related to a completion

       timeout
              Timeout specified in milliseconds

DESCRIPTION

       Event queues are used to report events associated with control
       operations.  They are associated with memory registration, address
       vectors, connection management, and fabric and domain level events.
       Reported events are either associated with a requested operation or
       affiliated with a call that registers for specific types of events,
       such as listening for connection requests.

   fi_eq_open
       fi_eq_open allocates a new event queue.

       The properties and behavior of an event queue are defined by struct
       fi_eq_attr.

              struct fi_eq_attr {
                  size_t               size;      /* # entries for EQ */
                  uint64_t             flags;     /* operation flags */
                  enum fi_wait_obj     wait_obj;  /* requested wait object */
                  int                  signaling_vector; /* interrupt affinity */
                  struct fid_wait     *wait_set;  /* optional wait set */
              };

       size   Specifies the minimum size of an event queue.

       flags  Flags that control the configuration of the EQ.

       - FI_WRITE
              Indicates that the application requires support for inserting
              user events into the EQ.  If this flag is set, then the
              fi_eq_write operation must be supported by the provider.  If the
              FI_WRITE flag is not set, then the application may not invoke
              fi_eq_write.

       - FI_AFFINITY
              Indicates that the signaling_vector field (see below) is valid.

       wait_obj
              EQs may be associated with a specific wait object.  Wait
              objects allow applications to block until the wait object is
              signaled, indicating that an event is available to be read.
              Users may use fi_control to retrieve the underlying wait object
              associated with an EQ, in order to use it in other system calls.
              The following values may be used to specify the type of wait
              object associated with an EQ:

       - FI_WAIT_NONE
              Used to indicate that the user will not block (wait) for events
              on the EQ.  When FI_WAIT_NONE is specified, the application may
              not call fi_eq_sread.  This is the default if no wait object is
              specified.

       - FI_WAIT_UNSPEC
              Specifies that the user will only wait on the EQ using fabric
              interface calls, such as fi_eq_sread.  In this case, the
              underlying provider may select the most appropriate or highest
              performing wait object available, including custom wait
              mechanisms.  Applications that select FI_WAIT_UNSPEC are not
              guaranteed to be able to retrieve the underlying wait object.

       - FI_WAIT_SET
              Indicates that the event queue should use a wait set object to
              wait for events.  If specified, the wait_set field must
              reference an existing wait set object.

       - FI_WAIT_FD
              Indicates that the EQ should use a file descriptor as its wait
              mechanism.  A file descriptor wait object must be usable in
              select, poll, and epoll routines.  However, a provider may signal
              an FD wait object by marking it as readable or with an error.

       - FI_WAIT_MUTEX_COND
              Specifies that the EQ should use a pthread mutex and cond
              variable as a wait object.

       - FI_WAIT_YIELD
              Indicates that the EQ will wait without a wait object but
              instead yield on every wait.  Allows usage of fi_eq_sread through
              a spin.

       signaling_vector
              If the FI_AFFINITY flag is set, this indicates the logical cpu
              number (0..max cpu - 1) that interrupts associated with the EQ
              should target.  This field should be treated as a hint to the
              provider and may be ignored if the provider does not support
              interrupt affinity.

       wait_set
              If wait_obj is FI_WAIT_SET, this field references a wait object
              to which the event queue should attach.  When an event is
              inserted into the event queue, the corresponding wait set will be
              signaled if all necessary conditions are met.  The use of a
              wait_set enables an optimized method of waiting for events
              across multiple event queues.  This field is ignored if wait_obj
              is not FI_WAIT_SET.

   fi_close
       The fi_close call releases all resources associated with an event
       queue.  Any events which remain on the EQ when it is closed are lost.

       The EQ must not be bound to any other objects prior to being closed,
       otherwise the call will return -FI_EBUSY.

   fi_control
       The fi_control call is used to access provider or implementation
       specific details of the event queue.  Access to the EQ should be
       serialized across all calls when fi_control is invoked, as it may
       redirect the implementation of EQ operations.  The following control
       commands are usable with an EQ.

       FI_GETWAIT (void **)
              This command allows the user to retrieve the low-level wait
              object associated with the EQ.  The format of the wait object is
              specified during EQ creation, through the EQ attributes.  The
              fi_control arg parameter should be an address where a pointer to
              the returned wait object will be written.  This should be an
              'int *' for FI_WAIT_FD, or 'struct fi_mutex_cond' for
              FI_WAIT_MUTEX_COND.

              struct fi_mutex_cond {
                  pthread_mutex_t     *mutex;
                  pthread_cond_t      *cond;
              };

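       As a rough illustration of the FI_WAIT_FD case, the sketch below polls
       a file descriptor that is assumed to have been retrieved with
       FI_GETWAIT.  The helper name eq_fd_wait is hypothetical and not part of
       libfabric; only poll(2) is used.  Because a provider may signal the
       descriptor either as readable or with an error, both are treated as a
       wakeup.

```c
#include <assert.h>
#include <poll.h>

/*
 * Wait until an EQ's file descriptor wait object is signaled.
 * `fd` is assumed to be the descriptor obtained through FI_GETWAIT.
 * Returns 1 if signaled, 0 on timeout, -1 on failure.
 * A negative timeout_ms blocks indefinitely (poll semantics).
 */
int eq_fd_wait(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;          /* poll itself failed; errno is set */
    if (n == 0)
        return 0;           /* timed out, nothing signaled */

    assert(n == 1);         /* exactly one descriptor was polled */
    if (pfd.revents & POLLNVAL)
        return -1;          /* fd does not refer to an open descriptor */

    /* Readable or error condition: either one counts as a wakeup. */
    return 1;
}
```

       A return of 1 only reports that the wait object was signaled; the
       event itself is still retrieved with fi_eq_read.
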
   fi_eq_read
       The fi_eq_read operation performs a non-blocking read of event data
       from the EQ.  The format of the event data is based on the type of
       event retrieved from the EQ, with all events starting with a struct
       fi_eq_entry header.  At most one event will be returned per EQ read
       operation.  The number of bytes successfully read from the EQ is
       returned from the read.  The FI_PEEK flag may be used to indicate that
       event data should be read from the EQ without being consumed.  A
       subsequent read without the FI_PEEK flag would then remove the event
       from the EQ.

       The following types of events may be reported to an EQ, along with
       information regarding the format associated with each event.

       Asynchronous Control Operations
              Asynchronous control operations are basic requests that simply
              need to generate an event to indicate that they have completed.
              These include the following types of events: memory
              registration, address vector resolution, and multicast joins.

       Control requests report their completion by inserting a struct
       fi_eq_entry into the EQ.  The format of this structure is:

              struct fi_eq_entry {
                  fid_t            fid;        /* fid associated with request */
                  void            *context;    /* operation context */
                  uint64_t         data;       /* completion-specific data */
              };

       For the completion of basic asynchronous control operations, the
       returned event will indicate the operation that has completed, and the
       fid will reference the fabric descriptor associated with the event.
       For memory registration, this will be an FI_MR_COMPLETE event and the
       fid_mr.  Address resolution will report an FI_AV_COMPLETE event and
       fid_av.  Multicast joins will report an FI_JOIN_COMPLETE event and
       fid_mc.  The context field will be set to the context specified as part
       of the operation, if available, otherwise the context will be
       associated with the fabric descriptor.  The data field will be set as
       described in the man page for the corresponding object type (e.g., see
       fi_av(3) for a description of how asynchronous address vector
       insertions are completed).

       Connection Notification
              Connection notifications are connection management notifications
              used to set up or tear down connections between endpoints.  There
              are three connection notification events: FI_CONNREQ,
              FI_CONNECTED, and FI_SHUTDOWN.  Connection notifications are
              reported using struct fi_eq_cm_entry:

              struct fi_eq_cm_entry {
                  fid_t            fid;        /* fid associated with request */
                  struct fi_info  *info;       /* endpoint information */
                  uint8_t         data[];     /* app connection data */
              };

       A connection request (FI_CONNREQ) event indicates that a remote
       endpoint wishes to establish a new connection to a listening, or
       passive, endpoint.  The fid is the passive endpoint.  Information
       regarding the requested, active endpoint’s capabilities and attributes
       is available from the info field.  The application is responsible for
       freeing this structure by calling fi_freeinfo when it is no longer
       needed.  The fi_info connreq field will reference the connection
       request associated with this event.  To accept a connection, an
       endpoint must first be created by passing an fi_info structure
       referencing this connreq field to fi_endpoint().  This endpoint is then
       passed to fi_accept() to complete the acceptance of the connection
       attempt.  Creating the endpoint is most easily accomplished by passing
       the fi_info returned as part of the CM event into fi_endpoint().  If
       the connection is to be rejected, the connreq is passed to fi_reject().

       Any application data exchanged as part of the connection request is
       placed beyond the fi_eq_cm_entry structure.  The amount of data
       available is application dependent and limited to the buffer space
       provided by the application when fi_eq_read is called.  The amount of
       returned data may be calculated from the return value of fi_eq_read.
       Note that the amount of returned data is limited by the underlying
       connection protocol, and the length of any data returned may include
       protocol padding.  As a result, the returned length may be larger than
       that specified by the connecting peer.

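       The length calculation described above can be sketched as follows.
       The struct is a local stand-in with the same layout as struct
       fi_eq_cm_entry so that the fragment compiles without the libfabric
       headers, and cm_data_len is a hypothetical helper name.  Real code
       would use the library's own declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Stand-in mirroring the fi_eq_cm_entry layout shown in this page. */
struct cm_entry_standin {
    void    *fid;       /* fid associated with request */
    void    *info;      /* endpoint information */
    uint8_t  data[];    /* app connection data */
};

/*
 * Given the positive byte count returned when a CM event was read,
 * return how many bytes of connection data follow the fixed part of
 * the entry.  The result may include protocol padding.
 */
size_t cm_data_len(ssize_t nread)
{
    /* A valid CM event is at least as large as the fixed header. */
    assert(nread >= (ssize_t) sizeof(struct cm_entry_standin));
    return (size_t) nread - sizeof(struct cm_entry_standin);
}
```

       Since the returned length can exceed what the peer sent, applications
       that need an exact length would have to carry it inside their own
       connection data.
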
       If a connection request has been accepted, an FI_CONNECTED event will
       be generated on both sides of the connection.  The active side (the
       one that called fi_connect()) may receive user data as part of the
       FI_CONNECTED event.  The user data is passed to the connection manager
       on the passive side through the fi_accept call.  User data is not
       provided with an FI_CONNECTED event on the listening side of the
       connection.

       Notification that a remote peer has disconnected from an active
       endpoint is done through the FI_SHUTDOWN event.  Shutdown notification
       uses struct fi_eq_cm_entry as declared above.  The fid field for a
       shutdown notification refers to the active endpoint’s fid_ep.

       Asynchronous Error Notification
              Asynchronous errors are used to report problems with fabric
              resources.  Reported errors may be fatal or transient, based on
              the error, and result in the resource becoming disabled.
              Disabled resources will fail operations submitted against them
              until they are explicitly re-enabled by the application.

       Asynchronous errors may be reported for completion queues and endpoints
       of all types.  CQ errors can result when resource management has been
       disabled, and the provider has detected a queue overrun.  Endpoint
       errors may be the result of numerous actions, but are often associated
       with a failed operation.  Operations may fail because of buffer
       overruns, invalid permissions, incorrect memory access keys, network
       routing failures, network reachability issues, etc.

       Asynchronous errors are reported using struct fi_eq_err_entry, as
       defined below.  The fabric descriptor (fid) associated with the error
       is provided as part of the error data.  An error code is also available
       to determine the cause of the error.

   fi_eq_sread
       The fi_eq_sread call is the blocking (or synchronous) equivalent to
       fi_eq_read.  It behaves similarly to the non-blocking call, with the
       exception that the call will not return until either an event has been
       read from the EQ or an error or timeout occurs.  Specifying a negative
       timeout means an infinite timeout.

       Threads blocking in this function will return to the caller if they are
       signaled by some external source.  This is true even if the timeout has
       not occurred or was specified as infinite.

       It is invalid for applications to call this function if the EQ has been
       configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.

   fi_eq_readerr
       The read error function, fi_eq_readerr, retrieves information regarding
       any asynchronous operation which has completed with an unexpected
       error.  fi_eq_readerr is a non-blocking call, returning immediately
       whether an error completion was found or not.

       EQs are optimized to report operations which have completed
       successfully.  Operations which fail are reported 'out of band'.  Such
       operations are retrieved using the fi_eq_readerr function.  When an
       operation that completes with an unexpected error is inserted into an
       EQ, it is placed into a temporary error queue.  Attempting to read from
       an EQ while an item is in the error queue results in an FI_EAVAIL
       failure.  Applications may use this return code to determine when to
       call fi_eq_readerr.

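       One way an application might act on these return codes is sketched
       below.  The fragment is deliberately independent of the libfabric
       headers: EX_EAVAIL is a placeholder standing in for FI_EAVAIL (real
       code takes the constant from rdma/fi_errno.h), EAGAIN from errno.h
       stands in for FI_EAGAIN, and the enum and function names are
       hypothetical.  It assumes failures are returned as negative fabric
       errno values, as described under RETURN VALUES.

```c
#include <errno.h>
#include <sys/types.h>

/* Placeholder for FI_EAVAIL; use rdma/fi_errno.h in real code. */
#define EX_EAVAIL 259

enum eq_read_status {
    EQ_EVENT,        /* an event was read into the caller's buffer */
    EQ_EMPTY,        /* nothing to read right now; try again later */
    EQ_ERR_PENDING,  /* error queue is non-empty; call fi_eq_readerr */
    EQ_FAILED        /* any other failure */
};

/* Map the return value of a non-blocking EQ read to a next step. */
enum eq_read_status classify_eq_read(ssize_t ret)
{
    if (ret > 0)
        return EQ_EVENT;
    if (ret == -EAGAIN)
        return EQ_EMPTY;
    if (ret == -EX_EAVAIL)
        return EQ_ERR_PENDING;
    /* Assumption: a zero-byte read is not expected and is treated as
     * a failure here. */
    return EQ_FAILED;
}
```

       A polling loop would typically retry on EQ_EMPTY and switch to
       fi_eq_readerr on EQ_ERR_PENDING before reading further events.
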
       Error information is reported to the user through struct
       fi_eq_err_entry.  The format of this structure is defined below.

              struct fi_eq_err_entry {
                  fid_t            fid;        /* fid associated with error */
                  void            *context;    /* operation context */
                  uint64_t         data;       /* completion-specific data */
                  int              err;        /* positive error code */
                  int              prov_errno; /* provider error code */
                  void            *err_data;   /* additional error data */
                  size_t           err_data_size; /* size of err_data */
              };

       The fid will reference the fabric descriptor associated with the event.
       For memory registration, this will be the fid_mr, address resolution
       will reference a fid_av, and CM events will refer to a fid_ep.  The
       context field will be set to the context specified as part of the
       operation.

       The data field will be set as described in the man page for the
       corresponding object type (e.g., see fi_av(3) for a description of how
       asynchronous address vector insertions are completed).

       The general reason for the error is provided through the err field.
       Provider or operational specific error information may also be
       available through the prov_errno and err_data fields.  Users may call
       fi_eq_strerror to convert provider specific error information into a
       printable string for debugging purposes.

       On input, err_data_size indicates the size of the err_data buffer in
       bytes.  On output, err_data_size will be set to the number of bytes
       copied to the err_data buffer.  The err_data information is typically
       used with fi_eq_strerror to provide details about the type of error
       that occurred.

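       Because err_data_size is both an input and an output, an entry that is
       reused across calls needs its buffer and size restored before each
       fi_eq_readerr call.  The sketch below shows one way to do that.  The
       struct is a local stand-in mirroring struct fi_eq_err_entry so the
       fragment compiles without the libfabric headers, and err_entry_prepare
       is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in mirroring the fi_eq_err_entry layout shown in this page. */
struct err_entry_standin {
    void    *fid;
    void    *context;
    uint64_t data;
    int      err;
    int      prov_errno;
    void    *err_data;
    size_t   err_data_size;
};

/*
 * Reset an error entry before it is handed to the read-error call.
 * With a caller-owned buffer, err_data_size carries the buffer size
 * as input.  With buf == NULL, the size is set to 0.
 */
void err_entry_prepare(struct err_entry_standin *e, void *buf, size_t len)
{
    assert(e != NULL);
    memset(e, 0, sizeof(*e));
    e->err_data = buf;
    e->err_data_size = buf ? len : 0;
}
```

       After the read, err_data_size holds the number of bytes copied, which
       is why the helper has to be called again before the entry is reused.
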
       For compatibility purposes, if err_data_size is 0 on input, or the
       fabric was opened with release < 1.5, err_data will be set to a data
       buffer owned by the provider.  The contents of the buffer will remain
       valid until a subsequent read call against the EQ.  Applications must
       serialize access to the EQ when processing errors to ensure that the
       buffer referenced by err_data does not change.

EVENT FIELDS

       The EQ entry data structures share many of the same fields.  The
       meanings are the same or similar for all EQ structure formats, with
       specific details described below.

       fid    This corresponds to the fabric descriptor associated with the
              event.  The type of fid depends on the event being reported.
              For FI_CONNREQ this will be the fid of the passive endpoint.
              FI_CONNECTED and FI_SHUTDOWN will reference the active endpoint.
              FI_MR_COMPLETE and FI_AV_COMPLETE will refer to the MR or AV
              fabric descriptor, respectively.  FI_JOIN_COMPLETE will point to
              the multicast descriptor returned as part of the join operation.
              Applications can use the fid->context value to retrieve the
              context associated with the fabric descriptor.

       context
              The context value is set to the context parameter specified with
              the operation that generated the event.  If no context parameter
              is associated with the operation, this field will be NULL.

       data   Data is an operation specific value or set of bytes.  For
              connection events, data is application data exchanged as part of
              the connection protocol.

       err    This err code is a positive fabric errno associated with an
              event.  The err value indicates the general reason for an error,
              if one occurred.  See fi_errno(3) for a list of possible error
              codes.

       prov_errno
              On an error, prov_errno may contain a provider specific error
              code.  The use of this field and its meaning are provider
              specific.  It is intended to be used as a debugging aid.  See
              fi_eq_strerror for additional details on converting this error
              value into a human readable string.

       err_data
              On an error, err_data may reference a provider specific amount
              of data associated with an error.  The use of this field and its
              meaning are provider specific.  It is intended to be used as a
              debugging aid.  See fi_eq_strerror for additional details on
              converting this error data into a human readable string.

       err_data_size
              On input, err_data_size indicates the size of the err_data
              buffer in bytes.  On output, err_data_size will be set to the
              number of bytes copied to the err_data buffer.  The err_data
              information is typically used with fi_eq_strerror to provide
              details about the type of error that occurred.

       For compatibility purposes, if err_data_size is 0 on input, or the
       fabric was opened with release < 1.5, err_data will be set to a data
       buffer owned by the provider.  The contents of the buffer will remain
       valid until a subsequent read call against the EQ.  Applications must
       serialize access to the EQ when processing errors to ensure that the
       buffer referenced by err_data does not change.

NOTES

       If an event queue has been overrun, it will be placed into an 'overrun'
       state.  Write operations against an overrun EQ will fail with
       -FI_EOVERRUN.  Read operations will continue to return any valid,
       non-corrupted events, if available.  After all valid events have been
       retrieved, any attempt to read the EQ will result in it returning an
       FI_EOVERRUN error event.  Overrun event queues are considered fatal and
       may not be used to report additional events once the overrun occurs.

RETURN VALUES

       fi_eq_open
              Returns 0 on success.  On error, a negative value corresponding
              to fabric errno is returned.

       fi_eq_read / fi_eq_readerr
              On success, returns the number of bytes read from the event
              queue.  On error, a negative value corresponding to fabric errno
              is returned.  If no data is available to be read from the event
              queue, -FI_EAGAIN is returned.

       fi_eq_sread
              On success, returns the number of bytes read from the event
              queue.  On error, a negative value corresponding to fabric errno
              is returned.  If the timeout expires or the calling thread is
              signaled and no data is available to be read from the event
              queue, -FI_EAGAIN is returned.

       fi_eq_write
              On success, returns the number of bytes written to the event
              queue.  On error, a negative value corresponding to fabric errno
              is returned.

       fi_eq_strerror
              Returns a character string interpretation of the provider
              specific error returned with a completion.

       Fabric errno values are defined in rdma/fi_errno.h.

SEE ALSO

       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cntr(3), fi_poll(3)

AUTHORS

       OpenFabrics.



Libfabric Programmer’s Manual     2022-12-11                          fi_eq(3)