fi_poll(3)                    Libfabric v1.17.0                    fi_poll(3)

NAME
fi_poll - Polling and wait set operations

fi_poll_open / fi_close
    Open/close a polling set

fi_poll_add / fi_poll_del
    Add/remove a completion queue or counter to/from a poll set.

fi_poll
    Poll for progress and events across multiple completion queues and
    counters.

fi_wait_open / fi_close
    Open/close a wait set

fi_wait
    Waits for one or more wait objects in a set to be signaled.

fi_trywait
    Indicate when it is safe to block on wait objects using native OS
    calls.

fi_control
    Control wait set operation or attributes.

SYNOPSIS
#include <rdma/fi_domain.h>

int fi_poll_open(struct fid_domain *domain, struct fi_poll_attr *attr,
    struct fid_poll **pollset);

int fi_close(struct fid *pollset);

int fi_poll_add(struct fid_poll *pollset, struct fid *event_fid,
    uint64_t flags);

int fi_poll_del(struct fid_poll *pollset, struct fid *event_fid,
    uint64_t flags);

int fi_poll(struct fid_poll *pollset, void **context, int count);

int fi_wait_open(struct fid_fabric *fabric, struct fi_wait_attr *attr,
    struct fid_wait **waitset);

int fi_close(struct fid *waitset);

int fi_wait(struct fid_wait *waitset, int timeout);

int fi_trywait(struct fid_fabric *fabric, struct fid **fids, size_t count);

int fi_control(struct fid *waitset, int command, void *arg);

ARGUMENTS
fabric
    Fabric provider

domain
    Resource domain

pollset
    Event poll set

waitset
    Wait object set

attr
    Poll or wait set attributes

context
    On success, an array of user context values associated with
    completion queues or counters.

fids
    An array of fabric descriptors, each one associated with a native
    wait object.

count
    Number of entries in context or fids array.

timeout
    Time to wait for a signal, in milliseconds.

command
    Command of control operation to perform on the wait set.

arg
    Optional control argument.

DESCRIPTION
fi_poll_open
fi_poll_open creates a new polling set. A poll set enables an optimized
method for progressing asynchronous operations across multiple
completion queues and counters and checking for their completions.

A poll set is defined with the following attributes.

    struct fi_poll_attr {
        uint64_t flags;   /* operation flags */
    };

flags
    Flags that set the default operation of the poll set. The use of
    this field is reserved and must be set to 0 by the caller.

fi_close
The fi_close call releases all resources associated with a poll set.
The poll set must not be associated with any other resources prior to
being closed, otherwise the call will return -FI_EBUSY.

fi_poll_add
Associates a completion queue or counter with a poll set.

fi_poll_del
Removes a completion queue or counter from a poll set.

fi_poll
Progresses all completion queues and counters associated with a poll
set and checks for events. If events might have occurred, contexts
associated with the completion queues and/or counters are returned.
Completion queues will return their context if they are not empty. The
context associated with a counter will be returned if the counter’s
success value or error value has changed since the last time fi_poll,
fi_cntr_set, or fi_cntr_add was called. The number of contexts is
limited to the size of the context array, indicated by the count
parameter.

Note that fi_poll only indicates that events might be available. In
some cases, providers may consume such events internally, for example
to drive progress. This can result in fi_poll returning false
positives. Applications should drive their progress based on the
results of reading events from a completion queue or reading counter
values. The fi_poll function will always return all completion queues
and counters that do have new events.
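
One way to act on the contexts that fi_poll returns is sketched below.
It assumes the application registered a pointer to its own record as
the context of each completion queue or counter. The names struct
app_source, drain_pending, and dispatch_ready are hypothetical and not
part of libfabric, and the pending field only simulates events queued
by a provider. The sketch illustrates that a returned context is a
hint: the drain step may find nothing to read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-object record that the application registers as the
 * context of a completion queue or counter when it opens it. */
struct app_source {
    /* Reads everything currently available and returns the number of
     * events consumed (0 on a false positive). Real code would wrap
     * calls such as fi_cq_read or fi_cntr_read here. */
    int (*drain)(struct app_source *self);
    int pending;   /* stand-in for events queued by the provider */
    int handled;   /* total events the application has processed */
};

/* Stand-in drain routine: consumes whatever is marked pending. */
static int drain_pending(struct app_source *self)
{
    int n = self->pending;

    self->pending = 0;
    self->handled += n;
    return n;
}

/* Walk the contexts written by fi_poll, where n is its return value.
 * Returns the number of events actually consumed, which may be 0 even
 * when n > 0 because fi_poll can report false positives. */
static int dispatch_ready(void **contexts, int n)
{
    int total = 0;

    for (int i = 0; i < n; i++) {
        struct app_source *src = contexts[i];

        total += src->drain(src);
    }
    return total;
}
```

In a real program, contexts and n would come from a call such as
n = fi_poll(pollset, contexts, count), a negative n would be handled as
an error before dispatching, and each drain callback would read from
the queue or counter it represents.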

fi_wait_open
fi_wait_open allocates a new wait set. A wait set enables an optimized
method of waiting for events across multiple completion queues and
counters. Where possible, a wait set uses a single underlying wait
object that is signaled when a specified condition occurs on an
associated completion queue or counter.

The properties and behavior of a wait set are defined by struct
fi_wait_attr.

    struct fi_wait_attr {
        enum fi_wait_obj wait_obj;   /* requested wait object */
        uint64_t flags;              /* operation flags */
    };

wait_obj
    Wait sets are associated with specific wait object(s). Wait
    objects allow applications to block until the wait object is
    signaled, indicating that an event is available to be read. The
    following values may be used to specify the type of wait object
    associated with a wait set: FI_WAIT_UNSPEC, FI_WAIT_FD,
    FI_WAIT_MUTEX_COND, FI_WAIT_POLLFD, and FI_WAIT_YIELD.

- FI_WAIT_UNSPEC
    Specifies that the user will only wait on the wait set using
    fabric interface calls, such as fi_wait. In this case, the
    underlying provider may select the most appropriate or highest
    performing wait object available, including custom wait
    mechanisms. Applications that select FI_WAIT_UNSPEC are not
    guaranteed to be able to retrieve the underlying wait object.

- FI_WAIT_FD
    Indicates that the wait set should use a single file descriptor
    as its wait mechanism, as exposed to the application. Internally,
    this may require the use of epoll in order to support waiting on
    a single file descriptor. File descriptor wait objects must be
    usable in the POSIX select(2) and poll(2), and Linux epoll(7)
    routines (if available). Providers signal an FD wait object by
    marking it as readable or with an error.

- FI_WAIT_MUTEX_COND
    Specifies that the wait set should use a pthread mutex and
    condition variable as a wait object.

- FI_WAIT_POLLFD
    This option is similar to FI_WAIT_FD, but allows the wait set to
    use multiple file descriptors as its wait mechanism, as viewed by
    the application. The use of FI_WAIT_POLLFD can eliminate the need
    to use epoll to abstract away needing to check multiple file
    descriptors when waiting for events. The file descriptors must be
    usable in the POSIX select(2) and poll(2) routines, and map
    directly onto use with poll(2). See the NOTES section below for
    details on using pollfd.

- FI_WAIT_YIELD
    Indicates that the wait set will wait without a wait object but
    instead yield on every wait.

flags
    Flags that set the default operation of the wait set. The use of
    this field is reserved and must be set to 0 by the caller.

fi_close
The fi_close call releases all resources associated with a wait set.
The wait set must not be bound to any other opened resources prior to
being closed, otherwise the call will return -FI_EBUSY.

fi_wait
Waits on a wait set until one or more of its underlying wait objects is
signaled.

fi_trywait
The fi_trywait call was introduced in libfabric version 1.3. The
behavior of using native wait objects without the use of fi_trywait is
provider specific and should be considered non-deterministic.

The fi_trywait() call is used in conjunction with native operating
system calls to block on wait objects, such as file descriptors. The
application must call fi_trywait and obtain a return value of
FI_SUCCESS prior to blocking on a native wait object. Failure to do so
may result in the wait object not being signaled, and the application
not observing the desired events. The following pseudo-code
demonstrates the use of fi_trywait in conjunction with the OS select(2)
call.

    struct fid *fids[1] = { &cq->fid };   /* descriptors to check */

    fi_control(&cq->fid, FI_GETWAIT, (void *) &fd);

    while (1) {
        if (fi_trywait(fabric, fids, 1) == FI_SUCCESS) {
            /* select() modifies its sets; rebuild them before each call */
            FD_ZERO(&fds);
            FD_SET(fd, &fds);
            efds = fds;
            select(fd + 1, &fds, NULL, &efds, &timeout);
        }

        /* drain every completion that is currently available */
        do {
            ret = fi_cq_read(cq, &comp, 1);
        } while (ret > 0);
    }

fi_trywait() will return FI_SUCCESS if it is safe to block on the wait
object(s) corresponding to the fabric descriptor(s), or -FI_EAGAIN if
there are events queued on the fabric descriptor or if blocking could
hang the application.

The call takes an array of fabric descriptors. For each wait object
that will be passed to the native wait routine, the corresponding
fabric descriptor should first be passed to fi_trywait. All fabric
descriptors passed into a single fi_trywait call must make use of the
same underlying wait object type.

The following types of fabric descriptors may be passed into
fi_trywait: event queues, completion queues, counters, and wait sets.
Applications that wish to use native wait calls should select specific
wait objects when allocating such resources, for example by setting the
item’s creation attribute wait_obj value to FI_WAIT_FD.

If the wait object to check belongs to a wait set, only the wait set
itself needs to be passed into fi_trywait. The fabric resources
associated with the wait set do not.

On receiving a return value of -FI_EAGAIN from fi_trywait, an
application should read all queued completions and events, and call
fi_trywait again before attempting to block. Applications can make use
of a fabric poll set to identify completion queues and counters that
may require processing.

fi_control
The fi_control call is used to access provider or implementation
specific details of fids that support blocking calls, such as wait
sets, completion queues, counters, and event queues. Access to the wait
set or fid should be serialized across all calls when fi_control is
invoked, as it may redirect the implementation of wait set operations.
The following control commands are usable with a wait set or fid.

FI_GETWAIT (void **)
    This command allows the user to retrieve the low-level wait
    object associated with a wait set or fid. The format of the wait
    object is specified during wait set creation, through the wait
    set attributes. The fi_control arg parameter should be an address
    where a pointer to the returned wait object will be written. This
    should be an 'int *' for FI_WAIT_FD, 'struct fi_mutex_cond' for
    FI_WAIT_MUTEX_COND, or 'struct fi_wait_pollfd' for
    FI_WAIT_POLLFD. Support for FI_GETWAIT is provider specific.

FI_GETWAITOBJ (enum fi_wait_obj *)
    This command returns the type of wait object associated with a
    wait set or fid.
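
Once FI_GETWAIT has returned the descriptor behind an FI_WAIT_FD
object, that descriptor can be handed to poll(2). The following is a
minimal sketch using only POSIX calls. fd_is_signaled is a hypothetical
helper, not a libfabric function, and it treats both readability and
error conditions as a signal, matching the description of FI_WAIT_FD
above. It does not call fi_trywait, which a real application must do
successfully before blocking.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is signaled (readable or in
 * error), 0 if timeout_ms elapsed first, and -1 if poll itself failed. */
static int fd_is_signaled(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    /* POLLERR and POLLHUP are reported even though only POLLIN was
     * requested, so an error on the descriptor also counts. */
    return (pfd.revents & (POLLIN | POLLERR | POLLHUP)) ? 1 : 0;
}
```

To experiment without a provider, the read end of a pipe(2) can stand
in for the descriptor: the helper reports 0 while the pipe is empty and
1 once a byte has been written to the other end.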

RETURN VALUES
Returns FI_SUCCESS on success. On error, a negative value corresponding
to fabric errno is returned.

Fabric errno values are defined in rdma/fi_errno.h.

fi_poll
    On success, if events are available, returns the number of entries
    written to the context array.

NOTES
In many situations, blocking calls may need to wait on signals sent to
a number of file descriptors. For example, this is the case for socket
based providers, such as tcp and udp, as well as utility providers such
as multi-rail. For simplicity, when epoll is available, it can be used
to limit the number of file descriptors that an application must
monitor. The use of epoll may also be required in order to support
FI_WAIT_FD.

However, epoll is not available on every system, and on some systems
its use may negatively impact performance. FI_WAIT_POLLFD provides a
mechanism for waiting on multiple file descriptors in those cases. A
significant difference between using POLLFD versus FD wait objects is
that with FI_WAIT_POLLFD, the file descriptors may change dynamically.
As an example, the file descriptors associated with a completion
queue’s wait set may change as endpoint associations with the CQ are
added and removed.

struct fi_wait_pollfd is used to retrieve all file descriptors for fids
using FI_WAIT_POLLFD to support blocking calls.

    struct fi_wait_pollfd {
        uint64_t change_index;
        size_t nfds;
        struct pollfd *fd;
    };

change_index
    The change_index may be used to determine if there have been any
    changes to the file descriptor list. Any time a file descriptor
    is added, removed, or its events are updated, this field is
    incremented by the provider. Applications wishing to wait on file
    descriptors directly should cache the change_index value. Before
    blocking on file descriptor events, the app should use
    fi_control() to retrieve the current change_index and compare
    that against its cached value. If the values differ, then the
    app should update its file descriptor list prior to blocking.

nfds
    On input to fi_control(), this indicates the number of entries
    in the struct pollfd * array. On output, this will be set to the
    number of entries needed to store the current number of file
    descriptors. If the input value is smaller than the output
    value, fi_control() will return the error -FI_ETOOSMALL. Note
    that setting nfds = 0 allows an efficient way of checking the
    change_index.

fd
    This points to an array of struct pollfd entries. The number of
    entries is specified through the nfds field. If the number of
    needed entries is less than or equal to the number of entries
    available, the struct pollfd array will be filled out with a
    list of file descriptors and corresponding events that can be
    used in the select(2) and poll(2) calls.

The change_index is updated only when the file descriptors associated
with the pollfd file set have changed. Checking the change_index is an
additional step needed when working with FI_WAIT_POLLFD wait objects
directly. The use of the fi_trywait() function is still required if
accessing wait objects directly.
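
The change_index and nfds protocol described above can be sketched as a
small cache that is refreshed before each blocking call. This is an
illustration under stated assumptions, not libfabric code: struct
sketch_wait_pollfd is a local mirror of struct fi_wait_pollfd,
SKETCH_ETOOSMALL is a placeholder for FI_ETOOSMALL, getwait_fn is a
hypothetical hook standing in for fi_control(fid, FI_GETWAIT, &pfd),
and mock_getwait is a fake provider included only so the sketch can be
exercised. The sketch also assumes that a probe with nfds = 0 reports
the change_index even when it returns the too-small error.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors struct fi_wait_pollfd so the sketch builds without libfabric. */
struct sketch_wait_pollfd {
    uint64_t change_index;
    size_t nfds;
    struct pollfd *fd;
};

#define SKETCH_ETOOSMALL 257  /* placeholder; real code tests -FI_ETOOSMALL */

/* Hypothetical hook; real code would call
 * fi_control(fid, FI_GETWAIT, pfd) here. */
typedef int (*getwait_fn)(void *fid, struct sketch_wait_pollfd *pfd);

/* Application-side cache of the descriptor list; zero-initialize it. */
struct fd_cache {
    uint64_t change_index;  /* index seen when fds was last filled */
    size_t nfds;            /* valid entries in fds */
    size_t capacity;        /* allocated entries in fds */
    struct pollfd *fds;
    int valid;              /* 0 until the first successful refresh */
};

/* Returns 1 if the list was reloaded, 0 if the cache was already
 * current, or a negative value on error. */
static int fd_cache_refresh(struct fd_cache *c, getwait_fn getwait, void *fid)
{
    struct sketch_wait_pollfd q = { 0, 0, NULL };
    int ret = getwait(fid, &q);  /* nfds = 0: cheap change_index probe */

    if (ret != 0 && ret != -SKETCH_ETOOSMALL)
        return ret;
    if (c->valid && q.change_index == c->change_index)
        return 0;

    for (;;) {
        if (q.nfds > c->capacity) {  /* grow to the reported size */
            struct pollfd *p = realloc(c->fds, q.nfds * sizeof(*p));
            if (!p)
                return -1;           /* allocation failure */
            c->fds = p;
            c->capacity = q.nfds;
        }
        q.fd = c->fds;
        q.nfds = c->capacity;
        ret = getwait(fid, &q);
        if (ret == 0)
            break;
        if (ret != -SKETCH_ETOOSMALL || q.nfds <= c->capacity)
            return ret;  /* hard error, or an inconsistent size report */
        /* The list grew between calls; retry with more room. */
    }
    c->nfds = q.nfds;
    c->change_index = q.change_index;
    c->valid = 1;
    return 1;
}

/* Fake provider used only to exercise the sketch. */
struct mock_provider {
    uint64_t change_index;
    size_t nfds;
};

static int mock_getwait(void *fid, struct sketch_wait_pollfd *pfd)
{
    struct mock_provider *m = fid;
    size_t room = pfd->nfds;

    pfd->change_index = m->change_index;
    pfd->nfds = m->nfds;
    if (room < m->nfds)
        return -SKETCH_ETOOSMALL;
    for (size_t i = 0; i < m->nfds; i++) {
        pfd->fd[i].fd = (int) i;
        pfd->fd[i].events = POLLIN;
        pfd->fd[i].revents = 0;
    }
    return 0;
}
```

An application would call fd_cache_refresh before each blocking poll(2)
and pass the cached fds and nfds to poll only after fi_trywait has
returned FI_SUCCESS. Keeping the probe separate from the reload means
the common case, an unchanged list, costs a single control call.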

SEE ALSO
fi_getinfo(3), fi_domain(3), fi_cntr(3), fi_eq(3)

AUTHORS
OpenFabrics.

Libfabric Programmer’s Manual        2022-12-11        fi_poll(3)