fi_poll(3)                     Libfabric v1.17.0                    fi_poll(3)

NAME

       fi_poll - Polling and wait set operations

       fi_poll_open / fi_close
              Open/close a polling set

       fi_poll_add / fi_poll_del
              Add/remove a completion queue or counter to/from a poll set.

       fi_poll
              Poll for progress and events across multiple completion queues
              and counters.

       fi_wait_open / fi_close
              Open/close a wait set

       fi_wait
              Waits for one or more wait objects in a set to be signaled.

       fi_trywait
              Indicate when it is safe to block on wait objects using native
              OS calls.

       fi_control
              Control wait set operation or attributes.

SYNOPSIS

              #include <rdma/fi_domain.h>

              int fi_poll_open(struct fid_domain *domain, struct fi_poll_attr *attr,
                  struct fid_poll **pollset);

              int fi_close(struct fid *pollset);

              int fi_poll_add(struct fid_poll *pollset, struct fid *event_fid,
                  uint64_t flags);

              int fi_poll_del(struct fid_poll *pollset, struct fid *event_fid,
                  uint64_t flags);

              int fi_poll(struct fid_poll *pollset, void **context, int count);

              int fi_wait_open(struct fid_fabric *fabric, struct fi_wait_attr *attr,
                  struct fid_wait **waitset);

              int fi_close(struct fid *waitset);

              int fi_wait(struct fid_wait *waitset, int timeout);

              int fi_trywait(struct fid_fabric *fabric, struct fid **fids, size_t count);

              int fi_control(struct fid *waitset, int command, void *arg);

ARGUMENTS

       fabric Fabric provider

       domain Resource domain

       pollset
              Event poll set

       waitset
              Wait object set

       attr   Poll or wait set attributes

       context
              On success, an array of user context values associated with
              completion queues or counters.

       fids   An array of fabric descriptors, each one associated with a
              native wait object.

       count  Number of entries in context or fids array.

       timeout
              Time to wait for a signal, in milliseconds.

       command
              Command of control operation to perform on the wait set.

       arg    Optional control argument.

DESCRIPTION

   fi_poll_open
       fi_poll_open creates a new polling set. A poll set enables an
       optimized method for progressing asynchronous operations across
       multiple completion queues and counters and checking for their
       completions.

       A poll set is defined with the following attributes.

              struct fi_poll_attr {
                  uint64_t             flags;     /* operation flags */
              };

       flags  Flags that set the default operation of the poll set. The use
              of this field is reserved and must be set to 0 by the caller.

   fi_close
       The fi_close call releases all resources associated with a poll set.
       The poll set must not be associated with any other resources prior to
       being closed, otherwise the call will return -FI_EBUSY.

   fi_poll_add
       Associates a completion queue or counter with a poll set.

   fi_poll_del
       Removes a completion queue or counter from a poll set.

   fi_poll
       Progresses all completion queues and counters associated with a poll
       set and checks for events. If events might have occurred, contexts
       associated with the completion queues and/or counters are returned.
       Completion queues will return their context if they are not empty.
       The context associated with a counter will be returned if the
       counter’s success value or error value has changed since the last
       time fi_poll, fi_cntr_set, or fi_cntr_add was called. The number of
       contexts is limited to the size of the context array, indicated by
       the count parameter.

       Note that fi_poll only indicates that events might be available. In
       some cases, providers may consume such events internally, for example
       to drive progress. This can result in fi_poll returning false
       positives. Applications should drive their progress based on the
       results of reading events from a completion queue or reading counter
       values. The fi_poll function will always return all completion queues
       and counters that do have new events.
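
       The false-positive behavior described above suggests a dispatch
       pattern, sketched here under stated assumptions. The application is
       assumed to have registered a pointer to its own per-queue record as
       the context of each completion queue or counter, so each entry that
       fi_poll writes into the context array can be cast back to that
       record. struct poll_target, drain_pending, and dispatch_ready are
       hypothetical names, and the pending field stands in for events that a
       real drain routine would read with fi_cq_read or fi_cntr_read. No
       libfabric call is made in the sketch itself.

```c
#include <stddef.h>

/* Hypothetical per-queue record that the application registers as context. */
struct poll_target {
    const char *name;
    int (*drain)(struct poll_target *self); /* returns events consumed */
    int pending;        /* stand-in for events queued on the CQ or counter */
    int drained_total;  /* running count of events actually processed */
};

/* Stand-in drain routine: a real one would loop on fi_cq_read(). */
int drain_pending(struct poll_target *self)
{
    int n = self->pending;

    self->pending = 0;
    return n;           /* 0 is valid: fi_poll may report false positives */
}

/*
 * Walk the contexts written by one fi_poll() call. nready is the value
 * fi_poll returned; a negative fabric errno is passed through unchanged.
 * Returns the number of events the drain routines actually consumed.
 */
int dispatch_ready(void **contexts, int nready)
{
    int total = 0;

    if (nready < 0)
        return nready;

    for (int i = 0; i < nready; i++) {
        struct poll_target *t = contexts[i];
        int n = t->drain(t);

        /* Progress is driven by what was read, not by fi_poll's hint. */
        if (n > 0) {
            t->drained_total += n;
            total += n;
        }
    }
    return total;
}
```

       With contexts filled by ret = fi_poll(pollset, contexts, count),
       calling dispatch_ready(contexts, ret) visits each reported queue
       once. A result of 0 means every report was a false positive.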

   fi_wait_open
       fi_wait_open allocates a new wait set. A wait set enables an optimized
       method of waiting for events across multiple completion queues and
       counters. Where possible, a wait set uses a single underlying wait
       object that is signaled when a specified condition occurs on an
       associated completion queue or counter.

       The properties and behavior of a wait set are defined by struct
       fi_wait_attr.

              struct fi_wait_attr {
                  enum fi_wait_obj     wait_obj;  /* requested wait object */
                  uint64_t             flags;     /* operation flags */
              };

       wait_obj
              Wait sets are associated with specific wait object(s). Wait
              objects allow applications to block until the wait object is
              signaled, indicating that an event is available to be read. The
              following values may be used to specify the type of wait object
              associated with a wait set: FI_WAIT_UNSPEC, FI_WAIT_FD,
              FI_WAIT_MUTEX_COND, FI_WAIT_POLLFD, and FI_WAIT_YIELD.

       - FI_WAIT_UNSPEC
              Specifies that the user will only wait on the wait set using
              fabric interface calls, such as fi_wait. In this case, the
              underlying provider may select the most appropriate or highest
              performing wait object available, including custom wait
              mechanisms. Applications that select FI_WAIT_UNSPEC are not
              guaranteed to retrieve the underlying wait object.

       - FI_WAIT_FD
              Indicates that the wait set should use a single file descriptor
              as its wait mechanism, as exposed to the application.
              Internally, this may require the use of epoll in order to
              support waiting on a single file descriptor. File descriptor
              wait objects must be usable in the POSIX select(2) and poll(2),
              and Linux epoll(7) routines (if available). Providers signal an
              FD wait object by marking it as readable or with an error.
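
       As a minimal sketch of the native blocking step for an FI_WAIT_FD
       wait object, the helper below uses poll(2) to wait until a descriptor
       is marked readable or reports an error, which is how the text above
       says providers signal it. The name wait_fd_signaled is hypothetical
       and not part of libfabric. The descriptor is assumed to have been
       obtained with FI_GETWAIT, and the caller is assumed to have received
       FI_SUCCESS from fi_trywait before calling the helper.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper (not a libfabric call): block on one FI_WAIT_FD
 * descriptor for up to timeout_ms milliseconds (-1 waits indefinitely).
 * Returns 1 if the fd was signaled, 0 on timeout, or a negative errno value.
 */
int wait_fd_signaled(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    /* Retry if a signal handler interrupts the call. This restarts the
     * full timeout, which is acceptable for a sketch. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -errno;
    if (rc == 0)
        return 0;                       /* timed out, nothing signaled */
    if (pfd.revents & POLLNVAL)
        return -EBADF;                  /* fd is not an open descriptor */

    /* Readable and error states both count as "signaled". */
    return (pfd.revents & (POLLIN | POLLERR | POLLHUP)) ? 1 : 0;
}
```

       A return of 1 only means the descriptor was signaled. The application
       still reads the completion queue or counter to learn what happened.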

       - FI_WAIT_MUTEX_COND
              Specifies that the wait set should use a pthread mutex and
              condition variable as a wait object.

       - FI_WAIT_POLLFD
              This option is similar to FI_WAIT_FD, but allows the wait set
              to use multiple file descriptors as its wait mechanism, as
              viewed by the application. The use of FI_WAIT_POLLFD can
              eliminate the need to use epoll to abstract away needing to
              check multiple file descriptors when waiting for events. The
              file descriptors must be usable in the POSIX select(2) and
              poll(2) routines, and map directly to use with poll(2). See
              the NOTES section below for details on using pollfd.

       - FI_WAIT_YIELD
              Indicates that the wait set will wait without a wait object but
              instead yield on every wait.

       flags  Flags that set the default operation of the wait set. The use
              of this field is reserved and must be set to 0 by the caller.

   fi_close
       The fi_close call releases all resources associated with a wait set.
       The wait set must not be bound to any other opened resources prior to
       being closed, otherwise the call will return -FI_EBUSY.

   fi_wait
       Waits on a wait set until one or more of its underlying wait objects is
       signaled.

   fi_trywait
       The fi_trywait call was introduced in libfabric version 1.3. The
       behavior of using native wait objects without the use of fi_trywait is
       provider specific and should be considered non-deterministic.

       The fi_trywait() call is used in conjunction with native operating
       system calls to block on wait objects, such as file descriptors. The
       application must call fi_trywait and obtain a return value of
       FI_SUCCESS prior to blocking on a native wait object. Failure to do so
       may result in the wait object not being signaled, and the application
       not observing the desired events. The following pseudo-code
       demonstrates the use of fi_trywait in conjunction with the OS
       select(2) call.

              /* Retrieve the file descriptor behind the CQ's wait object. */
              fi_control(&cq->fid, FI_GETWAIT, (void *) &fd);
              struct fid *fids[1] = { &cq->fid };

              while (1) {
                  /* Block only when fi_trywait reports that it is safe. */
                  if (fi_trywait(fabric, fids, 1) == FI_SUCCESS) {
                      /* select() modifies its sets, so rebuild them. */
                      FD_ZERO(&rfds);
                      FD_SET(fd, &rfds);
                      efds = rfds;
                      select(fd + 1, &rfds, NULL, &efds, &timeout);
                  }

                  /* Drain all queued completions before blocking again. */
                  do {
                      ret = fi_cq_read(cq, &comp, 1);
                  } while (ret > 0);
              }

       fi_trywait() will return FI_SUCCESS if it is safe to block on the wait
       object(s) corresponding to the fabric descriptor(s), or -FI_EAGAIN if
       there are events queued on the fabric descriptor or if blocking could
       hang the application.

       The call takes an array of fabric descriptors. For each wait object
       that will be passed to the native wait routine, the corresponding
       fabric descriptor should first be passed to fi_trywait. All fabric
       descriptors passed into a single fi_trywait call must make use of the
       same underlying wait object type.

       The following types of fabric descriptors may be passed into
       fi_trywait: event queues, completion queues, counters, and wait sets.
       Applications that wish to use native wait calls should select specific
       wait objects when allocating such resources, for example by setting
       the item’s creation attribute wait_obj value to FI_WAIT_FD.

       If the wait object to check belongs to a wait set, only the wait set
       itself needs to be passed into fi_trywait. The fabric resources
       associated with the wait set do not.

       On receiving a return value of -FI_EAGAIN from fi_trywait, an
       application should read all queued completions and events, and call
       fi_trywait again before attempting to block. Applications can make use
       of a fabric poll set to identify completion queues and counters that
       may require processing.

   fi_control
       The fi_control call is used to access provider or implementation
       specific details of fids that support blocking calls, such as wait
       sets, completion queues, counters, and event queues. Access to the
       wait set or fid should be serialized across all calls when fi_control
       is invoked, as it may redirect the implementation of wait set
       operations. The following control commands are usable with a wait set
       or fid.

       FI_GETWAIT (void **)
              This command allows the user to retrieve the low-level wait
              object associated with a wait set or fid. The format of the
              wait object is specified during wait set creation, through the
              wait set attributes. The fi_control arg parameter should be an
              address where a pointer to the returned wait object will be
              written. This should be an 'int *' for FI_WAIT_FD, 'struct
              fi_mutex_cond' for FI_WAIT_MUTEX_COND, or 'struct
              fi_wait_pollfd' for FI_WAIT_POLLFD. Support for FI_GETWAIT is
              provider specific.

       FI_GETWAITOBJ (enum fi_wait_obj *)
              This command returns the type of wait object associated with a
              wait set or fid.

RETURN VALUES

       Returns FI_SUCCESS on success. On error, a negative value
       corresponding to fabric errno is returned.

       Fabric errno values are defined in rdma/fi_errno.h.

       fi_poll
              On success, if events are available, returns the number of
              entries written to the context array.

NOTES

       In many situations, blocking calls may need to wait on signals sent to
       a number of file descriptors. For example, this is the case for socket
       based providers, such as tcp and udp, as well as utility providers such
       as multi-rail. For simplicity, when epoll is available, it can be used
       to limit the number of file descriptors that an application must
       monitor. The use of epoll may also be required in order to support
       FI_WAIT_FD.

       To support waiting on multiple file descriptors on systems where epoll
       is not available, or where epoll overhead may negatively impact
       performance, applications can use FI_WAIT_POLLFD. A significant
       difference between using POLLFD versus FD wait objects is that with
       FI_WAIT_POLLFD, the file descriptors may change dynamically. As an
       example, the file descriptors associated with a completion queue’s
       wait set may change as endpoint associations with the CQ are added and
       removed.

       Struct fi_wait_pollfd is used to retrieve all file descriptors for fids
       using FI_WAIT_POLLFD to support blocking calls.

              struct fi_wait_pollfd {
                  uint64_t      change_index;
                  size_t        nfds;
                  struct pollfd *fd;
              };

       change_index
              The change_index may be used to determine if there have been any
              changes to the file descriptor list. Anytime a file descriptor
              is added, removed, or its events are updated, this field is
              incremented by the provider. Applications wishing to wait on
              file descriptors directly should cache the change_index value.
              Before blocking on file descriptor events, the app should use
              fi_control() to retrieve the current change_index and compare
              that against its cached value. If the values differ, then the
              app should update its file descriptor list prior to blocking.

       nfds   On input to fi_control(), this indicates the number of entries
              in the struct pollfd * array. On output, this will be set to
              the number of entries needed to store the current number of file
              descriptors. If the input value is smaller than the output
              value, fi_control() will return the error -FI_ETOOSMALL. Note
              that setting nfds = 0 allows an efficient way of checking the
              change_index.

       fd     This points to an array of struct pollfd entries. The number of
              entries is specified through the nfds field. If the number of
              needed entries is less than or equal to the number of entries
              available, the struct pollfd array will be filled out with a
              list of file descriptors and corresponding events that can be
              used in the select(2) and poll(2) calls.

       The change_index is updated only when the file descriptors associated
       with the pollfd file set have changed. Checking the change_index is an
       additional step needed when working with FI_WAIT_POLLFD wait objects
       directly. The use of the fi_trywait() function is still required if
       accessing wait objects directly.
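
       The change_index and nfds rules above can be combined into a refresh
       routine, sketched here under stated assumptions. So that the block
       builds without libfabric headers, struct wait_pollfd mirrors struct
       fi_wait_pollfd, getwait_fn stands in for
       fi_control(fid, FI_GETWAIT, &wp), and SKETCH_ETOOSMALL is a
       placeholder for FI_ETOOSMALL. These names, along with pollfd_cache,
       fake_provider, and fake_getwait, are hypothetical. The sketch also
       assumes that a call with nfds = 0 still reports change_index when it
       returns -FI_ETOOSMALL, as the nfds description implies.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of struct fi_wait_pollfd so the sketch builds standalone. */
struct wait_pollfd {
    uint64_t change_index;
    size_t nfds;
    struct pollfd *fd;
};

/* Placeholder for FI_ETOOSMALL; real code takes it from rdma/fi_errno.h. */
#define SKETCH_ETOOSMALL 1000

/* Hypothetical stand-in for fi_control(fid, FI_GETWAIT, wp). */
typedef int (*getwait_fn)(void *handle, struct wait_pollfd *wp);

struct pollfd_cache {
    uint64_t change_index;  /* value seen at the last refresh */
    size_t nfds;            /* valid entries in fds */
    size_t capacity;        /* allocated entries in fds */
    struct pollfd *fds;
    int valid;              /* nonzero once the cache has been filled */
};

/* Returns 1 if the list was refreshed, 0 if unchanged, negative on error. */
int pollfd_cache_refresh(struct pollfd_cache *c, getwait_fn getwait, void *handle)
{
    struct wait_pollfd wp = { .change_index = 0, .nfds = 0, .fd = NULL };
    int rc = getwait(handle, &wp);  /* nfds = 0: cheap change_index probe */

    if (rc != 0 && rc != -SKETCH_ETOOSMALL)
        return rc;
    if (c->valid && wp.change_index == c->change_index)
        return 0;

    for (;;) {
        if (wp.nfds > c->capacity) {    /* grow to the size just reported */
            struct pollfd *p = realloc(c->fds, wp.nfds * sizeof(*p));
            if (p == NULL)
                return -ENOMEM;
            c->fds = p;
            c->capacity = wp.nfds;
        }
        wp.fd = c->fds;
        wp.nfds = c->capacity;
        rc = getwait(handle, &wp);
        if (rc == 0)
            break;
        /* Retry only if the list grew between the two calls. */
        if (rc != -SKETCH_ETOOSMALL || wp.nfds <= c->capacity)
            return rc;
    }
    c->nfds = wp.nfds;
    c->change_index = wp.change_index;
    c->valid = 1;
    return 1;
}

/* Test double that plays the provider; it is not libfabric behavior. */
struct fake_provider { uint64_t change_index; size_t nfds; };

int fake_getwait(void *handle, struct wait_pollfd *wp)
{
    struct fake_provider *p = handle;
    size_t have = wp->nfds;

    wp->change_index = p->change_index;
    wp->nfds = p->nfds;
    if (have < p->nfds)
        return -SKETCH_ETOOSMALL;
    for (size_t i = 0; i < p->nfds; i++) {
        wp->fd[i].fd = (int)i;          /* made-up descriptor numbers */
        wp->fd[i].events = POLLIN;
        wp->fd[i].revents = 0;
    }
    return 0;
}
```

       After a refresh, the cached array can be handed to poll(2) as is. The
       fi_trywait() check is still required before blocking.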

SEE ALSO

       fi_getinfo(3), fi_domain(3), fi_cntr(3), fi_eq(3)

AUTHORS

       OpenFabrics.

Libfabric Programmer’s Manual     2022-12-11                        fi_poll(3)