fi_atomic(3)                    Libfabric v1.12.1                    fi_atomic(3)

NAME
       fi_atomic - Remote atomic functions

       fi_atomic / fi_atomicv / fi_atomicmsg / fi_inject_atomic
              Initiates an atomic operation to remote memory

       fi_fetch_atomic / fi_fetch_atomicv / fi_fetch_atomicmsg
              Initiates an atomic operation to remote memory, retrieving the
              initial value.

       fi_compare_atomic / fi_compare_atomicv / fi_compare_atomicmsg
              Initiates an atomic compare-operation to remote memory,
              retrieving the initial value.

       fi_atomicvalid / fi_fetch_atomicvalid / fi_compare_atomicvalid /
       fi_query_atomic
              Indicates if a provider supports a specific atomic operation

SYNOPSIS
              #include <rdma/fi_atomic.h>

              ssize_t fi_atomic(struct fid_ep *ep, const void *buf,
                  size_t count, void *desc, fi_addr_t dest_addr,
                  uint64_t addr, uint64_t key,
                  enum fi_datatype datatype, enum fi_op op, void *context);

              ssize_t fi_atomicv(struct fid_ep *ep, const struct fi_ioc *iov,
                  void **desc, size_t count, fi_addr_t dest_addr,
                  uint64_t addr, uint64_t key,
                  enum fi_datatype datatype, enum fi_op op, void *context);

              ssize_t fi_atomicmsg(struct fid_ep *ep, const struct fi_msg_atomic *msg,
                  uint64_t flags);

              ssize_t fi_inject_atomic(struct fid_ep *ep, const void *buf,
                  size_t count, fi_addr_t dest_addr,
                  uint64_t addr, uint64_t key,
                  enum fi_datatype datatype, enum fi_op op);

              ssize_t fi_fetch_atomic(struct fid_ep *ep, const void *buf,
                  size_t count, void *desc, void *result, void *result_desc,
                  fi_addr_t dest_addr, uint64_t addr, uint64_t key,
                  enum fi_datatype datatype, enum fi_op op, void *context);

              ssize_t fi_fetch_atomicv(struct fid_ep *ep, const struct fi_ioc *iov,
                  void **desc, size_t count, struct fi_ioc *resultv,
                  void **result_desc, size_t result_count, fi_addr_t dest_addr,
                  uint64_t addr, uint64_t key, enum fi_datatype datatype,
                  enum fi_op op, void *context);

              ssize_t fi_fetch_atomicmsg(struct fid_ep *ep,
                  const struct fi_msg_atomic *msg, struct fi_ioc *resultv,
                  void **result_desc, size_t result_count, uint64_t flags);

              ssize_t fi_compare_atomic(struct fid_ep *ep, const void *buf,
                  size_t count, void *desc, const void *compare,
                  void *compare_desc, void *result, void *result_desc,
                  fi_addr_t dest_addr, uint64_t addr, uint64_t key,
                  enum fi_datatype datatype, enum fi_op op, void *context);

              ssize_t fi_compare_atomicv(struct fid_ep *ep, const struct fi_ioc *iov,
                  void **desc, size_t count, const struct fi_ioc *comparev,
                  void **compare_desc, size_t compare_count, struct fi_ioc *resultv,
                  void **result_desc, size_t result_count, fi_addr_t dest_addr,
                  uint64_t addr, uint64_t key, enum fi_datatype datatype,
                  enum fi_op op, void *context);

              ssize_t fi_compare_atomicmsg(struct fid_ep *ep,
                  const struct fi_msg_atomic *msg, const struct fi_ioc *comparev,
                  void **compare_desc, size_t compare_count,
                  struct fi_ioc *resultv, void **result_desc, size_t result_count,
                  uint64_t flags);

              int fi_atomicvalid(struct fid_ep *ep, enum fi_datatype datatype,
                  enum fi_op op, size_t *count);

              int fi_fetch_atomicvalid(struct fid_ep *ep, enum fi_datatype datatype,
                  enum fi_op op, size_t *count);

              int fi_compare_atomicvalid(struct fid_ep *ep, enum fi_datatype datatype,
                  enum fi_op op, size_t *count);

              int fi_query_atomic(struct fid_domain *domain,
                  enum fi_datatype datatype, enum fi_op op,
                  struct fi_atomic_attr *attr, uint64_t flags);

ARGUMENTS
       ep     Fabric endpoint on which to initiate atomic operation.

       buf    Local data buffer that specifies first operand of atomic
              operation

       iov / comparev / resultv
              Vectored data buffer(s).

       count / compare_count / result_count
              Count of vectored data entries. The number of elements
              referenced, where each element is the indicated datatype.

       addr   Address of remote memory to access.

       key    Protection key associated with the remote memory.

       datatype
              Datatype associated with atomic operands

       op     Atomic operation to perform

       compare
              Local compare buffer, containing comparison data.

       result Local data buffer to store initial value of remote buffer

       desc / compare_desc / result_desc
              Data descriptor associated with the local data buffer, local
              compare buffer, and local result buffer, respectively. See
              fi_mr(3).

       dest_addr
              Destination address for connectionless atomic operations.
              Ignored for connected endpoints.

       msg    Message descriptor for atomic operations

       flags  Additional flags to apply for the atomic operation

       context
              User specified pointer to associate with the operation. This
              parameter is ignored if the operation will not generate a
              successful completion, unless an op flag specifies the context
              parameter be used for required input.

DESCRIPTION
       Atomic transfers are used to read and update data located in remote
       memory regions in an atomic fashion. Conceptually, they are similar to
       local atomic operations of a similar nature (e.g. atomic increment,
       compare and swap, etc.). Updates to remote data involve one of several
       operations on the data, and act on specific types of data, as listed
       below. As such, atomic transfers have knowledge of the format of the
       data being accessed. A single atomic function may operate across an
       array of data applying an atomic operation to each entry, but the
       atomicity of an operation is limited to a single datatype or entry.

   Atomic Data Types
       Atomic functions may operate on one of the following identified data
       types. A given atomic function may support any datatype, subject to
       provider implementation constraints.

       FI_INT8
              Signed 8-bit integer.

       FI_UINT8
              Unsigned 8-bit integer.

       FI_INT16
              Signed 16-bit integer.

       FI_UINT16
              Unsigned 16-bit integer.

       FI_INT32
              Signed 32-bit integer.

       FI_UINT32
              Unsigned 32-bit integer.

       FI_INT64
              Signed 64-bit integer.

       FI_UINT64
              Unsigned 64-bit integer.

       FI_FLOAT
              A single-precision floating point value (IEEE 754).

       FI_DOUBLE
              A double-precision floating point value (IEEE 754).

       FI_FLOAT_COMPLEX
              An ordered pair of single-precision floating point values (IEEE
              754), with the first value representing the real portion of a
              complex number and the second representing the imaginary
              portion.

       FI_DOUBLE_COMPLEX
              An ordered pair of double-precision floating point values (IEEE
              754), with the first value representing the real portion of a
              complex number and the second representing the imaginary
              portion.

       FI_LONG_DOUBLE
              A double-extended precision floating point value (IEEE 754).
              Note that the size of a long double and number of bits used for
              precision is compiler, platform, and/or provider specific.
              Developers that use long double should ensure that libfabric is
              built using a long double format that is compatible with their
              application, and that format is supported by the provider. The
              mechanism used for this validation is currently beyond the scope
              of the libfabric API.

       FI_LONG_DOUBLE_COMPLEX
              An ordered pair of double-extended precision floating point
              values (IEEE 754), with the first value representing the real
              portion of a complex number and the second representing the
              imaginary portion.

   Atomic Operations
       The following atomic operations are defined. An atomic operation often
       acts against a target value in the remote memory buffer and source
       value provided with the atomic function. It may also carry source data
       to replace the target value in compare and swap operations. A
       conceptual description of each operation is provided.

       FI_MIN Minimum

                  if (buf[i] < addr[i])
                      addr[i] = buf[i]

       FI_MAX Maximum

                  if (buf[i] > addr[i])
                      addr[i] = buf[i]

       FI_SUM Sum

                  addr[i] = addr[i] + buf[i]

       FI_PROD
              Product

                  addr[i] = addr[i] * buf[i]

       FI_LOR Logical OR

                  addr[i] = (addr[i] || buf[i])

       FI_LAND
              Logical AND

                  addr[i] = (addr[i] && buf[i])

       FI_BOR Bitwise OR

                  addr[i] = addr[i] | buf[i]

       FI_BAND
              Bitwise AND

                  addr[i] = addr[i] & buf[i]

       FI_LXOR
              Logical exclusive-OR (XOR)

                  addr[i] = ((addr[i] && !buf[i]) || (!addr[i] && buf[i]))

       FI_BXOR
              Bitwise exclusive-OR (XOR)

                  addr[i] = addr[i] ^ buf[i]

       FI_ATOMIC_READ
              Read data atomically

                  result[i] = addr[i]

       FI_ATOMIC_WRITE
              Write data atomically

                  addr[i] = buf[i]

       FI_CSWAP
              Compare values and if equal swap with data

                  if (compare[i] == addr[i])
                      addr[i] = buf[i]

       FI_CSWAP_NE
              Compare values and if not equal swap with data

                  if (compare[i] != addr[i])
                      addr[i] = buf[i]

       FI_CSWAP_LE
              Compare values and if less than or equal swap with data

                  if (compare[i] <= addr[i])
                      addr[i] = buf[i]

       FI_CSWAP_LT
              Compare values and if less than swap with data

                  if (compare[i] < addr[i])
                      addr[i] = buf[i]

       FI_CSWAP_GE
              Compare values and if greater than or equal swap with data

                  if (compare[i] >= addr[i])
                      addr[i] = buf[i]

       FI_CSWAP_GT
              Compare values and if greater than swap with data

                  if (compare[i] > addr[i])
                      addr[i] = buf[i]

       FI_MSWAP
              Swap masked bits with data

                  addr[i] = (buf[i] & compare[i]) | (addr[i] & ~compare[i])

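       The conceptual descriptions above can be exercised locally. The
       following is a minimal sketch, not libfabric code: it applies a few of
       the operations element by element to ordinary arrays of 64-bit
       unsigned values. The enum model_op and the function model_apply are
       hypothetical names introduced only for this illustration, and the
       loop makes no attempt at atomicity; it shows the arithmetic only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local stand-ins for a few enum fi_op values.
 * These are NOT libfabric's constants. */
enum model_op { MODEL_MIN, MODEL_SUM, MODEL_CSWAP, MODEL_MSWAP };

/*
 * Apply `op` to each element of target[], following the conceptual
 * descriptions in this section. `compare` is only read for
 * MODEL_CSWAP and MODEL_MSWAP and may be NULL otherwise.
 * Returns 0 on success, -1 for an operation this sketch does not model.
 */
static int model_apply(enum model_op op, uint64_t *target,
                       const uint64_t *buf, const uint64_t *compare,
                       size_t count)
{
    for (size_t i = 0; i < count; i++) {
        switch (op) {
        case MODEL_MIN:
            if (buf[i] < target[i])
                target[i] = buf[i];
            break;
        case MODEL_SUM:
            target[i] = target[i] + buf[i];
            break;
        case MODEL_CSWAP:
            if (compare[i] == target[i])
                target[i] = buf[i];
            break;
        case MODEL_MSWAP:
            /* Bits set in compare[i] come from buf[i]; the rest are kept. */
            target[i] = (buf[i] & compare[i]) | (target[i] & ~compare[i]);
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

       For example, a masked swap of source 0x5555 into target 0xAAAA with
       mask 0x00FF replaces only the low byte, leaving 0xAA55 at the target.
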
   Base Atomic Functions
       The base atomic functions -- fi_atomic, fi_atomicv, fi_atomicmsg -- are
       used to transmit data to a remote node, where the specified atomic
       operation is performed against the target data. The result of a base
       atomic function is stored at the remote memory region. The main
       difference between the atomic functions is the number and type of
       parameters that they accept as input. Otherwise, they perform the same
       general function.

       The call fi_atomic transfers the data contained in the user-specified
       data buffer to a remote node. For connectionless endpoints, the
       destination endpoint is specified through the dest_addr parameter.
       Unless the endpoint has been configured differently, the data buffer
       passed into fi_atomic must not be touched by the application until the
       fi_atomic call completes asynchronously. The target buffer of a base
       atomic operation must allow for remote read and/or write access, as
       appropriate.

       The fi_atomicv call adds support for a scatter-gather list to
       fi_atomic. The fi_atomicv call transfers the set of data buffers
       referenced by the iov parameter to the remote node for processing.

       The fi_inject_atomic call is an optimized version of fi_atomic. The
       fi_inject_atomic function behaves as if the FI_INJECT transfer flag
       were set, and FI_COMPLETION were not. That is, the data buffer is
       available for reuse immediately on returning from fi_inject_atomic,
       and no completion event will be generated for this atomic. The
       completion event will be suppressed even if the endpoint has not been
       configured with FI_SELECTIVE_COMPLETION. See the flags discussion
       below for more details. The requested message size that can be used
       with fi_inject_atomic is limited by inject_size.

       The fi_atomicmsg call supports atomic functions over both connected and
       connectionless endpoints, with the ability to control the atomic
       operation per call through the use of flags. The fi_atomicmsg function
       takes a struct fi_msg_atomic as input.

              struct fi_msg_atomic {
                  const struct fi_ioc *msg_iov; /* local scatter-gather array */
                  void                **desc;   /* local access descriptors */
                  size_t              iov_count;/* # elements in ioc */
                  const void          *addr;    /* optional endpoint address */
                  const struct fi_rma_ioc *rma_iov; /* remote SGL */
                  size_t              rma_iov_count;/* # elements in remote SGL */
                  enum fi_datatype    datatype; /* operand datatype */
                  enum fi_op          op;       /* atomic operation */
                  void                *context; /* user-defined context */
                  uint64_t            data;     /* optional data */
              };

              struct fi_ioc {
                  void        *addr;    /* local address */
                  size_t      count;    /* # target operands */
              };

              struct fi_rma_ioc {
                  uint64_t    addr;     /* target address */
                  size_t      count;    /* # target operands */
                  uint64_t    key;      /* access key */
              };

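       Note that the count field of struct fi_ioc is expressed in operands
       of the chosen datatype, not in bytes. The sketch below illustrates
       that distinction under stated assumptions: struct model_ioc is a
       local mirror of the fi_ioc layout shown above, and model_ioc_total
       and model_ioc_bytes are hypothetical helpers, not part of libfabric.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the struct fi_ioc layout shown above.
 * Hypothetical name; real code would use struct fi_ioc. */
struct model_ioc {
    void   *addr;   /* local address */
    size_t  count;  /* number of operands, NOT bytes */
};

/* Total number of operands referenced by a scatter-gather list. */
static size_t model_ioc_total(const struct model_ioc *iov, size_t iov_count)
{
    size_t total = 0;
    for (size_t i = 0; i < iov_count; i++)
        total += iov[i].count;
    return total;
}

/* Total bytes covered, given the size in bytes of one operand. */
static size_t model_ioc_bytes(const struct model_ioc *iov, size_t iov_count,
                              size_t elem_size)
{
    return model_ioc_total(iov, iov_count) * elem_size;
}
```

       With two entries holding 2 and 3 FI_UINT64 operands, the list
       references 5 operands covering 40 bytes.
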
       The following atomic operations are usable with base atomic
       operations: FI_MIN, FI_MAX, FI_SUM, FI_PROD, FI_LOR, FI_LAND, FI_BOR,
       FI_BAND, FI_LXOR, FI_BXOR, and FI_ATOMIC_WRITE.

   Fetch-Atomic Functions
       The fetch atomic functions -- fi_fetch_atomic, fi_fetch_atomicv, and
       fi_fetch_atomicmsg -- behave similarly to the equivalent base atomic
       functions. The difference between the fetch and base atomic calls is
       that the fetch atomic routines return the initial value that was
       stored at the target to the user. The initial value is read into the
       user provided result buffer. The target buffer of fetch-atomic
       operations must be enabled for remote read access.

       The following atomic operations are usable with fetch atomic
       operations: FI_MIN, FI_MAX, FI_SUM, FI_PROD, FI_LOR, FI_LAND, FI_BOR,
       FI_BAND, FI_LXOR, FI_BXOR, FI_ATOMIC_READ, and FI_ATOMIC_WRITE.

       For FI_ATOMIC_READ operations, the source buffer operand (e.g. the
       fi_fetch_atomic buf parameter) is ignored and may be NULL. The results
       are written into the result buffer.

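       The fetch semantics can be modeled with C11 atomics on local memory.
       This is a sketch under stated assumptions, not how a provider
       implements the calls: model_fetch_sum_u64 and model_atomic_read_u64
       are hypothetical names, and the target is a local array rather than
       a registered remote region.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local model of a fetching FI_SUM on FI_UINT64 operands: each target
 * element is updated atomically and its initial value is stored in
 * result[i]. Atomicity holds per element, not across the array.
 */
static void model_fetch_sum_u64(_Atomic uint64_t *target, const uint64_t *buf,
                                uint64_t *result, size_t count)
{
    for (size_t i = 0; i < count; i++)
        result[i] = atomic_fetch_add(&target[i], buf[i]);
}

/*
 * Local model of FI_ATOMIC_READ: there is no source operand, so no
 * buf parameter is taken; the target is left unchanged.
 */
static void model_atomic_read_u64(_Atomic uint64_t *target, uint64_t *result,
                                  size_t count)
{
    for (size_t i = 0; i < count; i++)
        result[i] = atomic_load(&target[i]);
}
```

       After a fetching sum of {1, 2} into a target holding {10, 20}, the
       result buffer holds {10, 20} and the target holds {11, 22}.
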
   Compare-Atomic Functions
       The compare atomic functions -- fi_compare_atomic, fi_compare_atomicv,
       and fi_compare_atomicmsg -- are used for operations that require
       comparing the target data against a value before performing a swap
       operation. The compare atomic functions support: FI_CSWAP,
       FI_CSWAP_NE, FI_CSWAP_LE, FI_CSWAP_LT, FI_CSWAP_GE, FI_CSWAP_GT, and
       FI_MSWAP.

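       As an illustration of FI_CSWAP on FI_UINT64 operands, the following
       sketch uses a C11 compare-exchange on local memory. It assumes, as
       the result argument description states, that the result buffer
       receives the initial target value whether or not the swap happens.
       The name model_cswap_u64 is hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local model of FI_CSWAP: for each element, if compare[i] equals the
 * target value, the target is replaced with buf[i]. The initial target
 * value is written to result[i] in both cases.
 */
static void model_cswap_u64(_Atomic uint64_t *target, const uint64_t *buf,
                            const uint64_t *compare, uint64_t *result,
                            size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t expected = compare[i];
        /* On success `expected` already equals the initial value; on
         * failure it is overwritten with the value found at the target. */
        atomic_compare_exchange_strong(&target[i], &expected, buf[i]);
        result[i] = expected;
    }
}
```

       An initiator can detect whether its swap took effect by comparing
       result[i] against compare[i].
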
   Atomic Valid Functions
       The atomic valid functions -- fi_atomicvalid, fi_fetch_atomicvalid, and
       fi_compare_atomicvalid -- indicate which operations the local provider
       supports. Needed operations not supported by the provider must be
       emulated by the application. Each valid call corresponds to a set of
       atomic functions. fi_atomicvalid checks whether a provider supports a
       specific base atomic operation for a given datatype and operation.
       fi_fetch_atomicvalid indicates if a provider supports a specific
       fetch-atomic operation for a given datatype and operation.
       fi_compare_atomicvalid checks if a provider supports a specified
       compare-atomic operation for a given datatype and operation.

       If an operation is supported, an atomic valid call will return 0, along
       with a count of atomic data units that a single function call will
       operate on.

   Query Atomic Attributes
       The fi_query_atomic call acts as an enhanced atomic valid operation
       (see the atomic valid function definitions above). It is provided, in
       part, for future extensibility. The query operation reports which
       atomic operations are supported by the domain, for suitably configured
       endpoints.

       The behavior of fi_query_atomic is adjusted based on the flags
       parameter. If flags is 0, then the operation reports the supported
       atomic attributes for base atomic operations, similar to
       fi_atomicvalid for endpoints. If flags has the FI_FETCH_ATOMIC bit
       set, the operation behaves similarly to fi_fetch_atomicvalid.
       Likewise, the flag bit FI_COMPARE_ATOMIC results in the query acting
       as fi_compare_atomicvalid. The FI_FETCH_ATOMIC and FI_COMPARE_ATOMIC
       bits may not both be set.

       If the FI_TAGGED bit is set, the provider will indicate if it supports
       atomic operations to tagged receive buffers. The FI_TAGGED bit may be
       used by itself, or in conjunction with the FI_FETCH_ATOMIC and
       FI_COMPARE_ATOMIC flags.

       The output of fi_query_atomic is struct fi_atomic_attr:

              struct fi_atomic_attr {
                  size_t count;
                  size_t size;
              };

       The count attribute field is as defined for the atomic valid calls.
       The size field indicates the size in bytes of the atomic datatype. The
       size field is useful for datatypes that may differ in sizes based on
       the platform or compiler, such as FI_LONG_DOUBLE.

   Completions
       Completed atomic operations are reported to the initiator of the
       request through an associated completion queue or counter. Any user
       provided context specified with the request will be returned as part of
       any completion event written to a CQ. See fi_cq(3) for completion
       event details.

       Any results returned to the initiator as part of an atomic operation
       will be available prior to a completion event being generated. This
       will be true even if the requested completion semantic provides a
       weaker guarantee. That is, atomic fetch operations have
       FI_DELIVERY_COMPLETE semantics. Completions generated for other types
       of atomic operations indicate that it is safe to re-use the source
       data buffers.

       Any updates to data at the target of an atomic operation will be
       visible to agents (CPU processes, NICs, and other devices) on the
       target node prior to one of the following occurring. If the atomic
       operation generates a completion event or updates a completion counter
       at the target endpoint, the results will be available prior to the
       completion notification. After processing a completion for the atomic,
       if the initiator submits a transfer between the same endpoints that
       generates a completion at the target, the results will be available
       prior to the subsequent transfer's event. Or, if a fenced data transfer
       from the initiator follows the atomic request, the results will be
       available prior to a completion at the target for the fenced transfer.

       The correctness of atomic operations on a target memory region is
       guaranteed only when performed by a single actor for a given window of
       time. An actor is defined as a single libfabric domain (identified by
       the domain name, and not an open instance of that domain), a coherent
       CPU complex, or other device (e.g. GPU) capable of performing atomic
       operations on the target memory. The results of atomic operations
       performed by multiple actors simultaneously are undefined. For example,
       issuing CPU based atomic operations to a target region concurrently
       being updated by NIC based atomics may leave the region's data in an
       unknown state. The results of a first actor's atomic operations must be
       visible to a second actor prior to the second actor issuing its own
       atomics.

FLAGS
       The fi_atomicmsg, fi_fetch_atomicmsg, and fi_compare_atomicmsg calls
       allow the user to specify flags which can change the default data
       transfer operation. Flags specified with atomic message operations
       override most flags previously configured with the endpoint, except
       where noted (see fi_control). The following flags are usable with
       atomic message calls.

       FI_COMPLETION
              Indicates that a completion entry should be generated for the
              specified operation. The endpoint must be bound to a completion
              queue with FI_SELECTIVE_COMPLETION that corresponds to the
              specified operation, or this flag is ignored.

       FI_MORE
              Indicates that the user has additional requests that will
              immediately be posted after the current call returns. Use of
              this flag may improve performance by enabling the provider to
              optimize its access to the fabric hardware.

       FI_INJECT
              Indicates that the control of constant data buffers should be
              returned to the user immediately after the call returns, even if
              the operation is handled asynchronously. This may require that
              the underlying provider implementation copy the data into a
              local buffer and transfer out of that buffer. Constant data
              buffers are any data buffers or iovecs used by the atomic APIs
              that are marked as 'const'. Non-constant or output buffers are
              unaffected by this flag and may be accessed by the provider at
              any time until the operation has completed. This flag can only
              be used with messages smaller than inject_size.

       FI_FENCE
              Applies to transmits. Indicates that the requested operation,
              also known as the fenced operation, and any operation posted
              after the fenced operation will be deferred until all previous
              operations targeting the same peer endpoint have completed.
              Operations posted after the fencing will see and/or replace the
              results of any operations initiated prior to the fenced
              operation.

              The ordering of operations starting at the posting of the fenced
              operation (inclusive) to the posting of a subsequent fenced
              operation (exclusive) is controlled by the endpoint's ordering
              semantics.

       FI_TAGGED
              Specifies that the target of the atomic operation is a tagged
              receive buffer instead of an RMA buffer. When a tagged buffer
              is the target memory region, the addr parameter is used as a
              0-based byte offset into the tagged buffer, with the key
              parameter specifying the tag.

RETURN VALUE
       Returns 0 on success. On error, a negative value corresponding to
       fabric errno is returned. Fabric errno values are defined in
       rdma/fi_errno.h.

ERRORS
       -FI_EAGAIN
              See fi_msg(3) for a detailed description of handling FI_EAGAIN.

       -FI_EOPNOTSUPP
              The requested atomic operation is not supported on this
              endpoint.

       -FI_EMSGSIZE
              The number of atomic operations in a single request exceeds that
              supported by the underlying provider.

NOTES
       Atomic operations operate on an array of values of a specific data
       type. Atomicity is only guaranteed for each data type operation, not
       across the entire array. The following pseudo-code demonstrates this
       operation for 64-bit unsigned atomic write. ATOMIC_WRITE_U64 is a
       platform dependent macro that atomically writes 8 bytes to an aligned
       memory location.

              fi_atomic(ep, buf, count, NULL, dest_addr, addr, key,
                    FI_UINT64, FI_ATOMIC_WRITE, context)
              {
                  /* each element is written atomically; the loop as a whole is not */
                  for (i = 0; i < count; i++)
                      ATOMIC_WRITE_U64(((uint64_t *) addr)[i],
                          ((uint64_t *) buf)[i]);
              }

       The number of array elements to operate on is specified through a count
       parameter. This must be between 1 and the maximum returned through the
       relevant valid operation, inclusive. The requested operation and data
       type must also be valid for the given provider.

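       The count rule and the per-element guarantee can be combined in a
       small local model. This is a sketch under stated assumptions:
       model_atomic_write_u64 is a hypothetical name, max_count stands for
       the value a valid call would report, the target is local memory, and
       returning -EINVAL for a zero count is this sketch's own choice. The
       -EMSGSIZE return mirrors the -FI_EMSGSIZE error listed under ERRORS,
       since rdma/fi_errno.h defines FI_EMSGSIZE as EMSGSIZE.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local model of an FI_ATOMIC_WRITE of `count` FI_UINT64 operands.
 * `max_count` is a stand-in for the limit reported by the relevant
 * valid call. Each element is stored atomically; the array as a whole
 * is not written as one atomic unit.
 */
static int model_atomic_write_u64(_Atomic uint64_t *target,
                                  const uint64_t *buf,
                                  size_t count, size_t max_count)
{
    if (count < 1)
        return -EINVAL;     /* sketch's own choice for a zero count */
    if (count > max_count)
        return -EMSGSIZE;   /* too many operands for a single request */

    for (size_t i = 0; i < count; i++)
        atomic_store(&target[i], buf[i]);
    return 0;
}
```

       A request for more operands than max_count would need to be split
       into several calls, each of which is subject to the ordering rules
       described below.
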
       The ordering of atomic operations carried as part of different request
       messages is subject to the message and data ordering definitions
       assigned to the transmitting and receiving endpoints. Both message and
       data ordering are required if the results of two atomic operations to
       the same memory buffers are to reflect the second operation acting on
       the results of the first. See fi_endpoint(3) for further details and
       message size restrictions.

SEE ALSO
       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cq(3), fi_rma(3)

AUTHORS
       OpenFabrics.

Libfabric Programmer's Manual      2020-10-14                      fi_atomic(3)