fi_atomic(3) Libfabric v1.10.0 fi_atomic(3)

NAME
fi_atomic - Remote atomic functions

fi_atomic / fi_atomicv / fi_atomicmsg / fi_inject_atomic
Initiates an atomic operation to remote memory

fi_fetch_atomic / fi_fetch_atomicv / fi_fetch_atomicmsg
Initiates an atomic operation to remote memory, retrieving the
initial value.

fi_compare_atomic / fi_compare_atomicv / fi_compare_atomicmsg
Initiates an atomic compare-operation to remote memory,
retrieving the initial value.

fi_atomicvalid / fi_fetch_atomicvalid / fi_compare_atomicvalid /
fi_query_atomic
Indicates if a provider supports a specific atomic operation

SYNOPSIS
#include <rdma/fi_atomic.h>

ssize_t fi_atomic(struct fid_ep *ep, const void *buf,
    size_t count, void *desc, fi_addr_t dest_addr,
    uint64_t addr, uint64_t key,
    enum fi_datatype datatype, enum fi_op op, void *context);

ssize_t fi_atomicv(struct fid_ep *ep, const struct fi_ioc *iov,
    void **desc, size_t count, fi_addr_t dest_addr,
    uint64_t addr, uint64_t key,
    enum fi_datatype datatype, enum fi_op op, void *context);

ssize_t fi_atomicmsg(struct fid_ep *ep, const struct fi_msg_atomic *msg,
    uint64_t flags);

ssize_t fi_inject_atomic(struct fid_ep *ep, const void *buf,
    size_t count, fi_addr_t dest_addr,
    uint64_t addr, uint64_t key,
    enum fi_datatype datatype, enum fi_op op);

ssize_t fi_fetch_atomic(struct fid_ep *ep, const void *buf,
    size_t count, void *desc, void *result, void *result_desc,
    fi_addr_t dest_addr, uint64_t addr, uint64_t key,
    enum fi_datatype datatype, enum fi_op op, void *context);

ssize_t fi_fetch_atomicv(struct fid_ep *ep, const struct fi_ioc *iov,
    void **desc, size_t count, struct fi_ioc *resultv,
    void **result_desc, size_t result_count, fi_addr_t dest_addr,
    uint64_t addr, uint64_t key, enum fi_datatype datatype,
    enum fi_op op, void *context);

ssize_t fi_fetch_atomicmsg(struct fid_ep *ep,
    const struct fi_msg_atomic *msg, struct fi_ioc *resultv,
    void **result_desc, size_t result_count, uint64_t flags);

ssize_t fi_compare_atomic(struct fid_ep *ep, const void *buf,
    size_t count, void *desc, const void *compare,
    void *compare_desc, void *result, void *result_desc,
    fi_addr_t dest_addr, uint64_t addr, uint64_t key,
    enum fi_datatype datatype, enum fi_op op, void *context);

ssize_t fi_compare_atomicv(struct fid_ep *ep, const struct fi_ioc *iov,
    void **desc, size_t count, const struct fi_ioc *comparev,
    void **compare_desc, size_t compare_count, struct fi_ioc *resultv,
    void **result_desc, size_t result_count, fi_addr_t dest_addr,
    uint64_t addr, uint64_t key, enum fi_datatype datatype,
    enum fi_op op, void *context);

ssize_t fi_compare_atomicmsg(struct fid_ep *ep,
    const struct fi_msg_atomic *msg, const struct fi_ioc *comparev,
    void **compare_desc, size_t compare_count,
    struct fi_ioc *resultv, void **result_desc, size_t result_count,
    uint64_t flags);

int fi_atomicvalid(struct fid_ep *ep, enum fi_datatype datatype,
    enum fi_op op, size_t *count);

int fi_fetch_atomicvalid(struct fid_ep *ep, enum fi_datatype datatype,
    enum fi_op op, size_t *count);

int fi_compare_atomicvalid(struct fid_ep *ep, enum fi_datatype datatype,
    enum fi_op op, size_t *count);

int fi_query_atomic(struct fid_domain *domain,
    enum fi_datatype datatype, enum fi_op op,
    struct fi_atomic_attr *attr, uint64_t flags);

ARGUMENTS
92 ep Fabric endpoint on which to initiate atomic operation.
93
94 buf Local data buffer that specifies first operand of atomic opera‐
95 tion
96
97 iov / comparev / resultv
98 Vectored data buffer(s).
99
100 count / compare_count / result_count
101 Count of vectored data entries. The number of elements refer‐
102 enced, where each element is the indicated datatype.
103
104 addr Address of remote memory to access.
105
106 key Protection key associated with the remote memory.
107
108 datatype
109 Datatype associated with atomic operands
110
111 op Atomic operation to perform
112
113 compare
114 Local compare buffer, containing comparison data.
115
116 result Local data buffer to store initial value of remote buffer
117
118 desc / compare_desc / result_desc
119 Data descriptor associated with the local data buffer, local
120 compare buffer, and local result buffer, respectively. See
121 fi_mr(3).
122
123 dest_addr
124 Destination address for connectionless atomic operations. Ig‐
125 nored for connected endpoints.
126
127 msg Message descriptor for atomic operations
128
129 flags Additional flags to apply for the atomic operation
130
131 context
132 User specified pointer to associate with the operation. This
133 parameter is ignored if the operation will not generate a suc‐
134 cessful completion, unless an op flag specifies the context pa‐
135 rameter be used for required input.

DESCRIPTION
138 Atomic transfers are used to read and update data located in remote
139 memory regions in an atomic fashion. Conceptually, they are similar to
140 local atomic operations of a similar nature (e.g. atomic increment,
141 compare and swap, etc.). Updates to remote data involve one of several
142 operations on the data, and act on specific types of data, as listed
143 below. As such, atomic transfers have knowledge of the format of the
144 data being accessed. A single atomic function may operate across an
145 array of data applying an atomic operation to each entry, but the atom‐
146 icity of an operation is limited to a single datatype or entry.
147
148 Atomic Data Types
149 Atomic functions may operate on one of the following identified data
150 types. A given atomic function may support any datatype, subject to
151 provider implementation constraints.
152
153 FI_INT8
154 Signed 8-bit integer.
155
156 FI_UINT8
157 Unsigned 8-bit integer.
158
159 FI_INT16
160 Signed 16-bit integer.
161
162 FI_UINT16
163 Unsigned 16-bit integer.
164
165 FI_INT32
166 Signed 32-bit integer.
167
168 FI_UINT32
169 Unsigned 32-bit integer.
170
171 FI_INT64
172 Signed 64-bit integer.
173
174 FI_UINT64
175 Unsigned 64-bit integer.
176
177 FI_FLOAT
178 A single-precision floating point value (IEEE 754).
179
180 FI_DOUBLE
181 A double-precision floating point value (IEEE 754).
182
183 FI_FLOAT_COMPLEX
184 An ordered pair of single-precision floating point values (IEEE
185 754), with the first value representing the real portion of a
186 complex number and the second representing the imaginary por‐
187 tion.
188
189 FI_DOUBLE_COMPLEX
190 An ordered pair of double-precision floating point values (IEEE
191 754), with the first value representing the real portion of a
192 complex number and the second representing the imaginary por‐
193 tion.
194
195 FI_LONG_DOUBLE
196 A double-extended precision floating point value (IEEE 754).
197 Note that the size of a long double and number of bits used for
198 precision is compiler, platform, and/or provider specific. De‐
199 velopers that use long double should ensure that libfabric is
200 built using a long double format that is compatible with their
201 application, and that format is supported by the provider. The
202 mechanism used for this validation is currently beyond the scope
203 of the libfabric API.
204
205 FI_LONG_DOUBLE_COMPLEX
206 An ordered pair of double-extended precision floating point val‐
207 ues (IEEE 754), with the first value representing the real por‐
208 tion of a complex number and the second representing the imagi‐
209 nary portion.
210
211 Atomic Operations
212 The following atomic operations are defined. An atomic operation often
213 acts against a target value in the remote memory buffer and source val‐
214 ue provided with the atomic function. It may also carry source data to
215 replace the target value in compare and swap operations. A conceptual
216 description of each operation is provided.
217
218 FI_MIN Minimum
219
220 if (buf[i] < addr[i])
221 addr[i] = buf[i]
222
223 FI_MAX Maximum
224
225 if (buf[i] > addr[i])
226 addr[i] = buf[i]
227
228 FI_SUM Sum
229
230 addr[i] = addr[i] + buf[i]
231
232 FI_PROD
233 Product
234
235 addr[i] = addr[i] * buf[i]
236
237 FI_LOR Logical OR
238
239 addr[i] = (addr[i] || buf[i])
240
241 FI_LAND
242 Logical AND
243
244 addr[i] = (addr[i] && buf[i])
245
246 FI_BOR Bitwise OR
247
248 addr[i] = addr[i] | buf[i]
249
250 FI_BAND
251 Bitwise AND
252
253 addr[i] = addr[i] & buf[i]
254
255 FI_LXOR
256 Logical exclusive-OR (XOR)
257
258 addr[i] = ((addr[i] && !buf[i]) || (!addr[i] && buf[i]))
259
260 FI_BXOR
261 Bitwise exclusive-OR (XOR)
262
263 addr[i] = addr[i] ^ buf[i]
264
265 FI_ATOMIC_READ
266 Read data atomically
267
268 result[i] = addr[i]
269
270 FI_ATOMIC_WRITE
271 Write data atomically
272
273 addr[i] = buf[i]
274
275 FI_CSWAP
276 Compare values and if equal swap with data
277
278 if (compare[i] == addr[i])
279 addr[i] = buf[i]
280
281 FI_CSWAP_NE
282 Compare values and if not equal swap with data
283
284 if (compare[i] != addr[i])
285 addr[i] = buf[i]
286
287 FI_CSWAP_LE
288 Compare values and if less than or equal swap with data
289
290 if (compare[i] <= addr[i])
291 addr[i] = buf[i]
292
293 FI_CSWAP_LT
294 Compare values and if less than swap with data
295
296 if (compare[i] < addr[i])
297 addr[i] = buf[i]
298
299 FI_CSWAP_GE
300 Compare values and if greater than or equal swap with data
301
302 if (compare[i] >= addr[i])
303 addr[i] = buf[i]
304
305 FI_CSWAP_GT
306 Compare values and if greater than swap with data
307
308 if (compare[i] > addr[i])
309 addr[i] = buf[i]
310
311 FI_MSWAP
312 Swap masked bits with data
313
314 addr[i] = (buf[i] & compare[i]) | (addr[i] & ~compare[i])
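
The conceptual descriptions above can be tried out locally. The
following is a minimal sketch of a reference model for a few of the
operations on 64-bit unsigned values. The names demo_op, demo_apply and
demo_apply_array are hypothetical and are not part of libfabric, and the
loop is an ordinary local loop, so it only models the arithmetic and not
the atomicity of a real transfer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local stand-ins for a few fi_op values (illustration only). */
enum demo_op { DEMO_MIN, DEMO_MAX, DEMO_SUM, DEMO_BXOR, DEMO_LXOR, DEMO_MSWAP };

/* Apply one operation to one target element, following the conceptual
 * descriptions. 'cmp' is used only by DEMO_MSWAP, where it is the bit mask. */
static uint64_t demo_apply(enum demo_op op, uint64_t target,
                           uint64_t src, uint64_t cmp)
{
    switch (op) {
    case DEMO_MIN:   return src < target ? src : target;
    case DEMO_MAX:   return src > target ? src : target;
    case DEMO_SUM:   return target + src;
    case DEMO_BXOR:  return target ^ src;
    case DEMO_LXOR:  return (uint64_t)((target && !src) || (!target && src));
    case DEMO_MSWAP: return (src & cmp) | (target & ~cmp);
    }
    return target;
}

/* Apply the operation to each of 'count' elements. In a real transfer each
 * element update would be atomic on its own; this local loop is not. */
static void demo_apply_array(enum demo_op op, uint64_t *target,
                             const uint64_t *src, const uint64_t *cmp,
                             size_t count)
{
    for (size_t i = 0; i < count; i++)
        target[i] = demo_apply(op, target[i], src[i], cmp ? cmp[i] : 0);
}
```

For example, applying DEMO_MSWAP to a target of 0xAAAA with source
0x5555 and mask 0x00FF replaces only the low byte.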
315
316 Base Atomic Functions
The base atomic functions -- fi_atomic, fi_atomicv, fi_atomicmsg -- are
used to transmit data to a remote node, where the specified atomic
operation is performed against the target data. The result of a base
atomic function is stored at the remote memory region. The main
difference between the base atomic functions is the number and type of
parameters that they accept as input. Otherwise, they perform the same
general function.
324
The call fi_atomic transfers the data contained in the user-specified
data buffer to a remote node. For unconnected endpoints, the
destination endpoint is specified through the dest_addr parameter.
Unless the endpoint has been configured differently, the data buffer
passed into fi_atomic must not be touched by the application until the
fi_atomic call completes asynchronously. The target buffer of a base
atomic operation must allow for remote read and/or write access, as
appropriate.

The fi_atomicv call adds support for a scatter-gather list to
fi_atomic. The fi_atomicv call transfers the set of data buffers
referenced by the iov parameter to the remote node for processing.
336
The fi_inject_atomic call is an optimized version of fi_atomic. The
fi_inject_atomic function behaves as if the FI_INJECT transfer flag
were set, and FI_COMPLETION were not. That is, the data buffer is
available for reuse immediately on returning from fi_inject_atomic,
and no completion event will be generated for this atomic. The
completion event will be suppressed even if the endpoint has not been
configured with FI_SELECTIVE_COMPLETION. See the flags discussion
below for more details. The requested message size that can be used
with fi_inject_atomic is limited by inject_size.
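
The size limit above can be checked before choosing fi_inject_atomic.
The following is a small sketch, assuming that inject_size is a byte
limit taken from the endpoint's transmit attributes (see
fi_endpoint(3)) and that the caller knows the size of one element. The
helper name demo_fits_inject is hypothetical, and treating a payload of
exactly inject_size bytes as acceptable is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when 'count' elements of 'elem_size' bytes fit within
 * 'inject_size' bytes. Accepting a payload of exactly inject_size is an
 * assumption of this sketch. */
static bool demo_fits_inject(size_t count, size_t elem_size,
                             size_t inject_size)
{
    if (count == 0 || elem_size == 0)
        return false;               /* nothing meaningful to send */
    if (count > inject_size / elem_size)
        return false;               /* too large; division avoids overflow */
    return true;
}
```

When the check fails, a caller would typically fall back to fi_atomic
and wait for its completion before reusing the buffer.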
346
347 The fi_atomicmsg call supports atomic functions over both connected and
348 unconnected endpoints, with the ability to control the atomic operation
349 per call through the use of flags. The fi_atomicmsg function takes a
350 struct fi_msg_atomic as input.
351
struct fi_msg_atomic {
    const struct fi_ioc *msg_iov;     /* local scatter-gather array */
    void **desc;                      /* local access descriptors */
    size_t iov_count;                 /* # elements in msg_iov */
    fi_addr_t addr;                   /* optional endpoint address */
    const struct fi_rma_ioc *rma_iov; /* remote SGL */
    size_t rma_iov_count;             /* # elements in remote SGL */
    enum fi_datatype datatype;        /* operand datatype */
    enum fi_op op;                    /* atomic operation */
    void *context;                    /* user-defined context */
    uint64_t data;                    /* optional data */
};

struct fi_ioc {
    void *addr;   /* local address */
    size_t count; /* # target operands */
};

struct fi_rma_ioc {
    uint64_t addr; /* target address */
    size_t count;  /* # target operands */
    uint64_t key;  /* access key */
};
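
As an illustration of how these structures fit together, the sketch
below fills in a message that describes one local array and one remote
range. It uses local mirror types (demo_ioc, demo_rma_ioc,
demo_msg_atomic) so that it builds without the libfabric headers; these
names, the helper demo_make_msg, and the integer stand-ins for the
endpoint address, datatype and operation are all hypothetical. Leaving
desc as NULL assumes that no local descriptor is required.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the structures shown above so that the sketch builds
 * without libfabric headers. Real code would include <rdma/fi_atomic.h>. */
struct demo_ioc     { void *addr; size_t count; };
struct demo_rma_ioc { uint64_t addr; size_t count; uint64_t key; };
struct demo_msg_atomic {
    const struct demo_ioc *msg_iov;
    void **desc;
    size_t iov_count;
    uint64_t addr;                 /* stand-in for the endpoint address */
    const struct demo_rma_ioc *rma_iov;
    size_t rma_iov_count;
    int datatype;                  /* stand-in for enum fi_datatype */
    int op;                        /* stand-in for enum fi_op */
    void *context;
    uint64_t data;
};

/* Describe one local array of 'count' operands aimed at one remote range. */
static struct demo_msg_atomic
demo_make_msg(struct demo_ioc *local, struct demo_rma_ioc *remote,
              void *buf, size_t count, uint64_t peer,
              uint64_t remote_addr, uint64_t key,
              int datatype, int op, void *context)
{
    struct demo_msg_atomic msg = {0};  /* desc and data stay zeroed */

    local->addr = buf;
    local->count = count;
    remote->addr = remote_addr;
    remote->count = count;             /* same operand count on both sides */
    remote->key = key;

    msg.msg_iov = local;
    msg.iov_count = 1;
    msg.addr = peer;
    msg.rma_iov = remote;
    msg.rma_iov_count = 1;
    msg.datatype = datatype;
    msg.op = op;
    msg.context = context;
    return msg;
}
```

The local and remote entries must remain valid for as long as the
returned message refers to them.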
375
The following atomic operations are usable with base atomic functions:
FI_MIN, FI_MAX, FI_SUM, FI_PROD, FI_LOR, FI_LAND, FI_BOR, FI_BAND,
FI_LXOR, FI_BXOR, and FI_ATOMIC_WRITE.
379
380 Fetch-Atomic Functions
The fetch atomic functions -- fi_fetch_atomic, fi_fetch_atomicv, and
fi_fetch_atomicmsg -- behave similarly to the equivalent base atomic
functions. The difference between the fetch and base atomic calls is
that the fetch atomic routines return the initial value that was stored
at the target to the user. The initial value is read into the
user-provided result buffer. The target buffer of fetch-atomic
operations must be enabled for remote read access.

The following atomic operations are usable with fetch atomic functions:
FI_MIN, FI_MAX, FI_SUM, FI_PROD, FI_LOR, FI_LAND, FI_BOR, FI_BAND,
FI_LXOR, FI_BXOR, FI_ATOMIC_READ, and FI_ATOMIC_WRITE.
392
393 For FI_ATOMIC_READ operations, the source buffer operand (e.g.
394 fi_fetch_atomic buf parameter) is ignored and may be NULL. The results
395 are written into the result buffer.
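
The fetch behavior can be pictured with a small local model. The
sketch below, with the hypothetical names demo_fetch_op and demo_fetch,
copies the initial target value into the result buffer and then applies
the operation. It covers only sum, read and write on 64-bit unsigned
values and does not model atomicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local stand-ins for FI_SUM, FI_ATOMIC_READ, FI_ATOMIC_WRITE. */
enum demo_fetch_op { DEMO_F_SUM, DEMO_F_READ, DEMO_F_WRITE };

/* Local model: store the initial target value in result[], then update the
 * target. For DEMO_F_READ the source buffer is ignored and may be NULL. */
static void demo_fetch(enum demo_fetch_op op, uint64_t *target,
                       const uint64_t *src, uint64_t *result, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        result[i] = target[i];          /* initial value goes to the caller */
        if (op == DEMO_F_SUM)
            target[i] += src[i];
        else if (op == DEMO_F_WRITE)
            target[i] = src[i];
        /* DEMO_F_READ leaves the target unchanged */
    }
}
```

With DEMO_F_SUM this models a fetch-and-add: the caller receives the
value that was in place before its own contribution was added.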
396
397 Compare-Atomic Functions
The compare atomic functions -- fi_compare_atomic, fi_compare_atomicv,
and fi_compare_atomicmsg -- are used for operations that require
comparing the target data against a value before performing a swap
operation. The compare atomic functions support: FI_CSWAP, FI_CSWAP_NE,
FI_CSWAP_LE, FI_CSWAP_LT, FI_CSWAP_GE, FI_CSWAP_GT, and FI_MSWAP.
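
A local model can make the compare-and-swap case concrete. The sketch
below follows the FI_CSWAP description for 64-bit unsigned values and
also returns the initial target value, as the compare functions do. The
names demo_cswap and demo_swapped are hypothetical, and the loop does
not model atomicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Local model of FI_CSWAP on 64-bit values: result[] receives the initial
 * target value, and the target is replaced only where it equals compare[]. */
static void demo_cswap(uint64_t *target, const uint64_t *src,
                       const uint64_t *compare, uint64_t *result,
                       size_t count)
{
    for (size_t i = 0; i < count; i++) {
        result[i] = target[i];
        if (compare[i] == target[i])
            target[i] = src[i];
    }
}

/* A caller can tell whether element i was swapped by checking that the
 * returned initial value matches the value it compared against. */
static int demo_swapped(const uint64_t *compare, const uint64_t *result,
                        size_t i)
{
    return compare[i] == result[i];
}
```

This is the usual pattern for building a lock or a claim-once flag:
the swap took effect only for elements where demo_swapped is true.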
403
404 Atomic Valid Functions
The atomic valid functions -- fi_atomicvalid, fi_fetch_atomicvalid, and
fi_compare_atomicvalid -- indicate which operations the local provider
supports. Needed operations not supported by the provider must be
emulated by the application. Each valid call corresponds to a set of
atomic functions. fi_atomicvalid checks whether a provider supports a
specific base atomic operation for a given datatype and operation.
fi_fetch_atomicvalid indicates if a provider supports a specific
fetch-atomic operation for a given datatype and operation. Likewise,
fi_compare_atomicvalid checks if a provider supports a specified
compare-atomic operation for a given datatype and operation.
415
416 If an operation is supported, an atomic valid call will return 0, along
417 with a count of atomic data units that a single function call will op‐
418 erate on.
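
Because the returned count bounds what one call may operate on, a
larger array has to be split across several calls. The following is a
minimal sketch of such a split. The callback type demo_post_fn stands
in for whatever code would actually issue the atomic call, and all
names here are hypothetical. Splitting does not change the guarantee
that atomicity applies per element only.

```c
#include <stddef.h>

/* Hypothetical callback that would post one atomic call covering 'n'
 * elements starting at element 'offset'; returns 0 on success. */
typedef int (*demo_post_fn)(size_t offset, size_t n, void *arg);

/* Split 'total' elements into calls of at most 'max_count' elements each.
 * Returns the number of calls made, or -1 on bad input or a failed post. */
static long demo_post_chunked(size_t total, size_t max_count,
                              demo_post_fn post, void *arg)
{
    long calls = 0;
    size_t off = 0;

    if (max_count == 0 || post == NULL)
        return -1;
    while (off < total) {
        size_t left = total - off;
        size_t n = left < max_count ? left : max_count;

        if (post(off, n, arg) != 0)
            return -1;
        off += n;
        calls++;
    }
    return calls;
}

/* Example callback for experimentation: only adds up the element counts. */
static int demo_count_elems(size_t offset, size_t n, void *arg)
{
    (void)offset;
    *(size_t *)arg += n;
    return 0;
}
```

With a maximum of 4 elements per call, an array of 10 elements is
posted as three calls of 4, 4 and 2 elements.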
419
420 Query Atomic Attributes
421 The fi_query_atomic call acts as an enhanced atomic valid operation
422 (see the atomic valid function definitions above). It is provided, in
423 part, for future extensibility. The query operation reports which
424 atomic operations are supported by the domain, for suitably configured
425 endpoints.
426
The behavior of fi_query_atomic is adjusted based on the flags
parameter. If flags is 0, then the operation reports the supported
atomic attributes for base atomic operations, similar to fi_atomicvalid
for endpoints. If flags has the FI_FETCH_ATOMIC bit set, the operation
behaves like fi_fetch_atomicvalid. Likewise, the FI_COMPARE_ATOMIC bit
makes the query act as fi_compare_atomicvalid. The FI_FETCH_ATOMIC and
FI_COMPARE_ATOMIC bits may not both be set.

If the FI_TAGGED bit is set, the provider will indicate if it supports
atomic operations to tagged receive buffers. The FI_TAGGED bit may be
used by itself, or in conjunction with either the FI_FETCH_ATOMIC or
the FI_COMPARE_ATOMIC flag.
439
440 The output of fi_query_atomic is struct fi_atomic_attr:
441
442 struct fi_atomic_attr {
443 size_t count;
444 size_t size;
445 };
446
The count attribute field is as defined for the atomic valid calls.
The size field indicates the size in bytes of the atomic datatype. The
size field is useful for datatypes that may differ in size based on
the platform or compiler, such as FI_LONG_DOUBLE.
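
The flag rules and the meaning of the two output fields can be
summarized in a few lines of code. The sketch below is illustrative
only: the DEMO_ bit values are placeholders rather than the real
libfabric flag values, demo_atomic_attr is a local mirror of struct
fi_atomic_attr, and rejecting unknown bits is a choice made by this
sketch, not something the text above specifies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for illustration; the real FI_FETCH_ATOMIC,
 * FI_COMPARE_ATOMIC and FI_TAGGED values come from the libfabric headers. */
#define DEMO_FETCH_ATOMIC   (1ULL << 0)
#define DEMO_COMPARE_ATOMIC (1ULL << 1)
#define DEMO_TAGGED         (1ULL << 2)

/* Local mirror of struct fi_atomic_attr as shown above. */
struct demo_atomic_attr { size_t count; size_t size; };

/* The fetch and compare bits may not both be set; the tagged bit may be
 * used alone or combined with one of them. */
static bool demo_query_flags_ok(uint64_t flags)
{
    uint64_t known = DEMO_FETCH_ATOMIC | DEMO_COMPARE_ATOMIC | DEMO_TAGGED;

    if (flags & ~known)
        return false;   /* sketch-specific choice: reject unknown bits */
    return !((flags & DEMO_FETCH_ATOMIC) && (flags & DEMO_COMPARE_ATOMIC));
}

/* Largest payload, in bytes, that one call may carry for this datatype. */
static size_t demo_max_bytes(const struct demo_atomic_attr *attr)
{
    return attr->count * attr->size;
}
```

For example, a reported count of 4 with a size of 16 bytes means a
single call can carry at most 64 bytes of operands.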
451
452 Completions
453 Completed atomic operations are reported to the initiator of the re‐
454 quest through an associated completion queue or counter. Any user pro‐
455 vided context specified with the request will be returned as part of
456 any completion event written to a CQ. See fi_cq for completion event
457 details.
458
459 Any results returned to the initiator as part of an atomic operation
460 will be available prior to a completion event being generated. This
461 will be true even if the requested completion semantic provides a weak‐
462 er guarantee. That is, atomic fetch operations have FI_DELIVERY_COM‐
463 PLETE semantics. Completions generated for other types of atomic oper‐
464 ations indicate that it is safe to re-use the source data buffers.
465
466 Any updates to data at the target of an atomic operation will be visi‐
467 ble to agents (CPU processes, NICs, and other devices) on the target
468 node prior to one of the following occurring. If the atomic operation
469 generates a completion event or updates a completion counter at the
470 target endpoint, the results will be available prior to the completion
471 notification. After processing a completion for the atomic, if the
472 initiator submits a transfer between the same endpoints that generates
473 a completion at the target, the results will be available prior to the
474 subsequent transfer's event. Or, if a fenced data transfer from the
475 initiator follows the atomic request, the results will be available
476 prior to a completion at the target for the fenced transfer.
477
478 The correctness of atomic operations on a target memory region is guar‐
479 anteed only when performed by a single actor for a given window of
480 time. An actor is defined as a single libfabric domain (identified by
481 the domain name, and not an open instance of that domain), a coherent
482 CPU complex, or other device (e.g. GPU) capable of performing atomic
483 operations on the target memory. The results of atomic operations per‐
484 formed by multiple actors simultaneously are undefined. For example,
485 issuing CPU based atomic operations to a target region concurrently be‐
486 ing updated by NIC based atomics may leave the region's data in an un‐
487 known state. The results of a first actor's atomic operations must be
488 visible to a second actor prior to the second actor issuing its own
489 atomics.

FLAGS
The fi_atomicmsg, fi_fetch_atomicmsg, and fi_compare_atomicmsg calls
allow the user to specify flags which can change the default data
transfer operation. Flags specified with atomic message operations
override most flags previously configured with the endpoint, except
where noted (see fi_control). The following flags are usable with
atomic message calls.
498
499 FI_COMPLETION
500 Indicates that a completion entry should be generated for the
501 specified operation. The endpoint must be bound to a completion
502 queue with FI_SELECTIVE_COMPLETION that corresponds to the spec‐
503 ified operation, or this flag is ignored.
504
505 FI_MORE
506 Indicates that the user has additional requests that will imme‐
507 diately be posted after the current call returns. Use of this
508 flag may improve performance by enabling the provider to opti‐
509 mize its access to the fabric hardware.
510
511 FI_INJECT
Indicates that the control of constant data buffers should be
returned to the user immediately after the call returns, even if
the operation is handled asynchronously. This may require that
the underlying provider implementation copy the data into a local
buffer and transfer out of that buffer. Constant data buffers
refer to any data buffer or iovec used by the atomic APIs that is
marked as 'const'. Non-constant or output buffers are unaffected
by this flag and may be accessed by the provider at any time until
the operation has completed. This flag can only be used with
messages smaller than inject_size.
522
523 FI_FENCE
524 Applies to transmits. Indicates that the requested operation,
525 also known as the fenced operation, and any operation posted af‐
526 ter the fenced operation will be deferred until all previous op‐
527 erations targeting the same peer endpoint have completed. Oper‐
528 ations posted after the fencing will see and/or replace the re‐
529 sults of any operations initiated prior to the fenced operation.
530
531 The ordering of operations starting at the posting of the fenced opera‐
532 tion (inclusive) to the posting of a subsequent fenced operation (ex‐
533 clusive) is controlled by the endpoint's ordering semantics.
534
535 FI_TAGGED
536 Specifies that the target of the atomic operation is a tagged
537 receive buffer instead of an RMA buffer. When a tagged buffer
538 is the target memory region, the addr parameter is used as a
539 0-based byte offset into the tagged buffer, with the key parame‐
540 ter specifying the tag.

RETURN VALUE
Returns 0 on success. On error, a negative value corresponding to
fabric errno is returned. Fabric errno values are defined in
rdma/fi_errno.h.

ERRORS
548 -FI_EAGAIN
549 See fi_msg(3) for a detailed description of handling FI_EAGAIN.
550
551 -FI_EOPNOTSUPP
552 The requested atomic operation is not supported on this end‐
553 point.
554
555 -FI_EMSGSIZE
556 The number of atomic operations in a single request exceeds that
557 supported by the underlying provider.
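
A common way to handle -FI_EAGAIN is to retry the call after giving the
provider a chance to make progress. The sketch below shows only the
shape of such a loop. It assumes that FI_EAGAIN has the same value as
the system EAGAIN, and it uses hypothetical callbacks (demo_try_fn,
demo_progress_fn) in place of the real post and progress calls; see
fi_msg(3) for what progress involves.

```c
#include <errno.h>
#include <sys/types.h>   /* ssize_t */

typedef ssize_t (*demo_try_fn)(void *arg);    /* hypothetical: posts once */
typedef void (*demo_progress_fn)(void *arg);  /* hypothetical: drives progress */

/* Retry while the post reports -EAGAIN, up to 'max_tries' attempts.
 * Returns the last value returned by 'post' (or -EAGAIN if never called). */
static ssize_t demo_post_retry(demo_try_fn post, demo_progress_fn progress,
                               void *arg, int max_tries)
{
    ssize_t ret = -EAGAIN;

    for (int i = 0; i < max_tries && ret == -EAGAIN; i++) {
        ret = post(arg);
        if (ret == -EAGAIN && progress)
            progress(arg);
    }
    return ret;
}

/* Example callback: fails with -EAGAIN until the counter reaches zero. */
static ssize_t demo_busy_post(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return -EAGAIN;
    }
    return 0;
}
```

Bounding the number of attempts keeps the caller from spinning forever
if the resource shortage does not clear.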

NOTES
Atomic operations operate on an array of values of a specific data
type. Atomicity is only guaranteed for each data type operation, not
across the entire array. The following pseudo-code demonstrates this
operation for 64-bit unsigned atomic write. ATOMIC_WRITE_U64 is a
platform dependent macro that atomically writes 8 bytes to an aligned
memory location.

fi_atomic(ep, buf, count, NULL, dest_addr, addr, key,
          FI_UINT64, FI_ATOMIC_WRITE, context)
{
    /* each element is written atomically; the array as a whole is not */
    for (i = 0; i < count; i++)
        ATOMIC_WRITE_U64(((uint64_t *) addr)[i],
                         ((uint64_t *) buf)[i]);
}
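
The same idea can be expressed with standard C11 atomics in place of
the platform dependent macro. This is only a local analogy, with the
hypothetical name demo_atomic_write_u64, and says nothing about how a
provider implements the operation.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Each element is stored with its own atomic operation. A concurrent reader
 * can never see a torn element, but it may see some elements already
 * updated and others not yet updated. */
static void demo_atomic_write_u64(_Atomic uint64_t *target,
                                  const uint64_t *buf, size_t count)
{
    for (size_t i = 0; i < count; i++)
        atomic_store(&target[i], buf[i]);
}
```

A reader that needs a consistent view of the whole array therefore
needs additional synchronization beyond the per-element guarantee.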
574
575 The number of array elements to operate on is specified through a count
576 parameter. This must be between 1 and the maximum returned through the
577 relevant valid operation, inclusive. The requested operation and data
578 type must also be valid for the given provider.
579
580 The ordering of atomic operations carried as part of different request
581 messages is subject to the message and data ordering definitions as‐
582 signed to the transmitting and receiving endpoints. Both message and
583 data ordering are required if the results of two atomic operations to
584 the same memory buffers are to reflect the second operation acting on
585 the results of the first. See fi_endpoint(3) for further details and
586 message size restrictions.

SEE ALSO
fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cq(3), fi_rma(3)

AUTHORS
OpenFabrics.

Libfabric Programmer's Manual 2019-09-27 fi_atomic(3)