fi_mr(3)                       Libfabric v1.17.0                      fi_mr(3)


NAME
       fi_mr - Memory region operations
7
8 fi_mr_reg / fi_mr_regv / fi_mr_regattr
9 Register local memory buffers for direct fabric access
10
11 fi_close
12 Deregister registered memory buffers.
13
14 fi_mr_desc
15 Return a local descriptor associated with a registered memory
16 region
17
18 fi_mr_key
19 Return the remote key needed to access a registered memory re‐
20 gion
21
22 fi_mr_raw_attr
23 Return raw memory region attributes.
24
25 fi_mr_map_raw
26 Converts a raw memory region key into a key that is usable for
27 data transfer operations.
28
29 fi_mr_unmap_key
30 Releases a previously mapped raw memory region key.
31
32 fi_mr_bind
33 Associate a registered memory region with a completion counter
34 or an endpoint.
35
36 fi_mr_refresh
37 Updates the memory pages associated with a memory region.
38
39 fi_mr_enable
40 Enables a memory region for use.
41
SYNOPSIS
       #include <rdma/fi_domain.h>
44
45 int fi_mr_reg(struct fid_domain *domain, const void *buf, size_t len,
46 uint64_t access, uint64_t offset, uint64_t requested_key,
47 uint64_t flags, struct fid_mr **mr, void *context);
48
49 int fi_mr_regv(struct fid_domain *domain, const struct iovec * iov,
50 size_t count, uint64_t access, uint64_t offset, uint64_t requested_key,
51 uint64_t flags, struct fid_mr **mr, void *context);
52
53 int fi_mr_regattr(struct fid_domain *domain, const struct fi_mr_attr *attr,
54 uint64_t flags, struct fid_mr **mr);
55
56 int fi_close(struct fid *mr);
57
58 void * fi_mr_desc(struct fid_mr *mr);
59
60 uint64_t fi_mr_key(struct fid_mr *mr);
61
62 int fi_mr_raw_attr(struct fid_mr *mr, uint64_t *base_addr,
63 uint8_t *raw_key, size_t *key_size, uint64_t flags);
64
65 int fi_mr_map_raw(struct fid_domain *domain, uint64_t base_addr,
66 uint8_t *raw_key, size_t key_size, uint64_t *key, uint64_t flags);
67
68 int fi_mr_unmap_key(struct fid_domain *domain, uint64_t key);
69
70 int fi_mr_bind(struct fid_mr *mr, struct fid *bfid, uint64_t flags);
71
72 int fi_mr_refresh(struct fid_mr *mr, const struct iovec *iov,
73 size_t count, uint64_t flags);
74
75 int fi_mr_enable(struct fid_mr *mr);
76
ARGUMENTS
       domain Resource domain
79
80 mr Memory region
81
82 bfid Fabric identifier of an associated resource.
83
84 context
85 User specified context associated with the memory region.
86
87 buf Memory buffer to register with the fabric hardware.
88
89 len Length of memory buffer to register. Must be > 0.
90
91 iov Vectored memory buffer.
92
93 count Count of vectored buffer entries.
94
95 access Memory access permissions associated with registration
96
97 offset Optional specified offset for accessing specified registered
98 buffers. This parameter is reserved for future use and must be
99 0.
100
101 requested_key
102 Requested remote key associated with registered buffers. Param‐
103 eter is ignored if FI_MR_PROV_KEY flag is set in the domain
104 mr_mode bits.
105
106 attr Memory region attributes
107
108 flags Additional flags to apply to the operation.
109
DESCRIPTION
       Registered memory regions associate memory buffers with permissions
112 granted for access by fabric resources. A memory buffer must be regis‐
113 tered with a resource domain before it can be used as the target of a
114 remote RMA or atomic data transfer. Additionally, a fabric provider
115 may require that data buffers be registered before being used in local
116 transfers. Memory registration restrictions are controlled using a
117 separate set of mode bits, specified through the domain attributes
118 (mr_mode field). Each mr_mode bit requires that an application take
119 specific steps in order to use memory buffers with libfabric inter‐
120 faces.
121
122 The following apply to memory registration.
123
124 Default Memory Registration
       If no mr_mode bits are set, the default behaviors described below
       are followed.  Historically, these defaults were collectively
127 referred to as scalable memory registration. The default re‐
128 quirements are outlined below, followed by definitions of how
129 each mr_mode bit alters the definition.
130
131 Compatibility: For library versions 1.4 and earlier, this was indicated
132 by setting mr_mode to FI_MR_SCALABLE and the fi_info mode bit FI_LO‐
133 CAL_MR to 0. FI_MR_SCALABLE and FI_LOCAL_MR were deprecated in libfab‐
134 ric version 1.5, though they are supported for backwards compatibility
135 purposes.
136
137 For security, memory registration is required for data buffers that are
138 accessed directly by a peer process. For example, registration is re‐
139 quired for RMA target buffers (read or written to), and those accessed
140 by atomic or collective operations.
141
142 By default, registration occurs on virtual address ranges. Because
143 registration refers to address ranges, rather than allocated data buf‐
144 fers, the address ranges do not need to map to data buffers allocated
145 by the application at the time the registration call is made. That is,
146 an application can register any range of addresses in their virtual ad‐
147 dress space, whether or not those addresses are backed by physical
148 pages or have been allocated.
149
150 Note that physical pages must back addresses prior to the addresses be‐
151 ing accessed as part of a data transfer operation, or the data trans‐
152 fers will fail. Additionally, depending on the operation, this could
153 result in the local process receiving a segmentation fault for access‐
154 ing invalid memory.
155
156 Once registered, the resulting memory regions are accessible by peers
157 starting at a base address of 0. That is, the target address that is
158 specified is a byte offset into the registered region.
159
160 The application also selects the access key associated with the MR.
161 The key size is restricted to a maximum of 8 bytes.
162
163 With scalable registration, locally accessed data buffers are not reg‐
164 istered. This includes source buffers for all transmit operations –
165 sends, tagged sends, RMA, and atomics – as well as buffers posted for
166 receive and tagged receive operations.
167
168 Although the default memory registration behavior is convenient for ap‐
169 plication developers, it is difficult to implement in hardware. At‐
       tempts to hide the hardware requirements from the application often
       result in significant and unacceptable impacts to performance.  The fol‐
172 lowing mr_mode bits are provided as input into fi_getinfo. If a
173 provider requires the behavior defined for an mr_mode bit, it will
174 leave the bit set on output to fi_getinfo. Otherwise, the provider can
175 clear the bit to indicate that the behavior is not needed.
176
177 By setting an mr_mode bit, the application has agreed to adjust its be‐
178 havior as indicated. Importantly, applications that choose to support
179 an mr_mode must be prepared to handle the case where the mr_mode is not
180 required. A provider will clear an mr_mode bit if it is not needed.
181
182 FI_MR_LOCAL
183 When the FI_MR_LOCAL mode bit is set, applications must register
184 all data buffers that will be accessed by the local hardware and
185 provide a valid desc parameter into applicable data transfer op‐
186 erations. When FI_MR_LOCAL is zero, applications are not re‐
187 quired to register data buffers before using them for local op‐
188 erations (e.g. send and receive data buffers). The desc parame‐
189 ter into data transfer operations will be ignored in this case,
       unless otherwise required (e.g. see FI_MR_HMEM).  It is recom‐
191 mended that applications pass in NULL for desc when not re‐
192 quired.
193
194 A provider may hide local registration requirements from applications
195 by making use of an internal registration cache or similar mechanisms.
196 Such mechanisms, however, may negatively impact performance for some
197 applications, notably those which manage their own network buffers. In
       order to support as broad a range of applications as possible, without
199 unduly affecting their performance, applications that wish to manage
200 their own local memory registrations may do so by using the memory reg‐
201 istration calls.
202
203 Note: the FI_MR_LOCAL mr_mode bit replaces the FI_LOCAL_MR fi_info mode
204 bit. When FI_MR_LOCAL is set, FI_LOCAL_MR is ignored.
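
       For example, when FI_MR_LOCAL is reported by the provider, an applica‐
       tion may register its transmit buffer and pass the returned descriptor
       into fi_send.  The sketch below is illustrative only; the helper name,
       endpoint, and peer address are placeholders, and completion handling
       is omitted.

          /* Sketch: local registration when FI_MR_LOCAL is required.
           * Requires <rdma/fi_domain.h> and <rdma/fi_endpoint.h>. */
          int send_with_local_mr(struct fid_domain *domain, struct fid_ep *ep,
                                 const void *buf, size_t len, fi_addr_t peer)
          {
              struct fid_mr *mr;
              void *desc;
              int ret;

              ret = fi_mr_reg(domain, buf, len, FI_SEND, 0, 0, 0, &mr, NULL);
              if (ret)
                  return ret;

              /* desc is meaningful when FI_MR_LOCAL (or FI_MR_HMEM) applies;
               * otherwise NULL may be passed instead. */
              desc = fi_mr_desc(mr);

              ret = (int) fi_send(ep, buf, len, desc, peer, NULL);
              /* ... wait for the send completion before fi_close(&mr->fid) ... */
              return ret;
          }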
205
206 FI_MR_RAW
       Raw memory regions are used to support providers with keys
       larger than 64 bits or that require setup at the peer.  When the
209 FI_MR_RAW bit is set, applications must use fi_mr_raw_attr() lo‐
210 cally and fi_mr_map_raw() at the peer before targeting a memory
211 region as part of any data transfer request.
212
213 FI_MR_VIRT_ADDR
214 The FI_MR_VIRT_ADDR bit indicates that the provider references
215 memory regions by virtual address, rather than a 0-based offset.
216 Peers that target memory regions registered with FI_MR_VIRT_ADDR
217 specify the destination memory buffer using the target’s virtual
218 address, with any offset into the region specified as virtual
219 address + offset. Support of this bit typically implies that
220 peers must exchange addressing data prior to initiating any RMA
221 or atomic operation.
222
223 FI_MR_ALLOCATED
224 When set, all registered memory regions must be backed by physi‐
225 cal memory pages at the time the registration call is made.
226
227 FI_MR_PROV_KEY
228 This memory region mode indicates that the provider does not
229 support application requested MR keys. MR keys are returned by
230 the provider. Applications that support FI_MR_PROV_KEY can ob‐
231 tain the provider key using fi_mr_key(), unless FI_MR_RAW is al‐
232 so set. The returned key should then be exchanged with peers
233 prior to initiating an RMA or atomic operation.
234
235 FI_MR_MMU_NOTIFY
236 FI_MR_MMU_NOTIFY is typically set by providers that support mem‐
237 ory registration against memory regions that are not necessarily
238 backed by allocated physical pages at the time the memory regis‐
239 tration occurs. (That is, FI_MR_ALLOCATED is typically 0).
240 However, such providers require that applications notify the
241 provider prior to the MR being accessed as part of a data trans‐
242 fer operation. This notification informs the provider that all
243 necessary physical pages now back the region. The notification
244 is necessary for providers that cannot hook directly into the
245 operating system page tables or memory management unit. See
246 fi_mr_refresh() for notification details.
247
248 FI_MR_RMA_EVENT
249 This mode bit indicates that the provider must configure memory
250 regions that are associated with RMA events prior to their use.
251 This includes all memory regions that are associated with com‐
252 pletion counters. When set, applications must indicate if a
253 memory region will be associated with a completion counter as
254 part of the region’s creation. This is done by passing in the
255 FI_RMA_EVENT flag to the memory registration call.
256
257 Such memory regions will be created in a disabled state and must be as‐
258 sociated with all completion counters prior to being enabled. To en‐
259 able a memory region, the application must call fi_mr_enable(). After
260 calling fi_mr_enable(), no further resource bindings may be made to the
261 memory region.
262
263 FI_MR_ENDPOINT
264 This mode bit indicates that the provider associates memory re‐
265 gions with endpoints rather than domains. Memory regions that
266 are registered with the provider are created in a disabled state
267 and must be bound to an endpoint prior to being enabled. To
268 bind the MR with an endpoint, the application must use
269 fi_mr_bind(). To enable the memory region, the application must
270 call fi_mr_enable().
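
       A minimal sketch of that sequence, assuming the endpoint already
       exists and the registration call has returned, is shown below; error
       handling beyond return codes is omitted.

          /* Sketch: enabling an MR when FI_MR_ENDPOINT is set. */
          int bind_mr_to_ep(struct fid_mr *mr, struct fid_ep *ep)
          {
              int ret;

              /* Associate the region with the endpoint; flags are 0 for
               * endpoint bindings. */
              ret = fi_mr_bind(mr, &ep->fid, 0);
              if (ret)
                  return ret;

              /* The region stays disabled until explicitly enabled. */
              return fi_mr_enable(mr);
          }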
271
272 FI_MR_HMEM
273 This mode bit is associated with the FI_HMEM capability. If
274 FI_MR_HMEM is set, the application must register buffers that
275 were allocated using a device call and provide a valid desc pa‐
276 rameter into applicable data transfer operations even if they
277 are only used for local operations (e.g. send and receive data
278 buffers). Device memory must be registered using the fi_mr_re‐
279 gattr call, with the iface and device fields filled out.
280
281 If FI_MR_HMEM is set, but FI_MR_LOCAL is unset, only device buffers
282 must be registered when used locally. In this case, the desc parameter
283 passed into data transfer operations must either be valid or NULL.
284 Similarly, if FI_MR_LOCAL is set, but FI_MR_HMEM is not, the desc pa‐
285 rameter must either be valid or NULL.
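
       As an illustration, a registration of CUDA device memory through
       fi_mr_regattr might look like the sketch below.  The buffer is
       assumed to have been allocated with cudaMalloc on CUDA device 0, and
       the FI_HMEM capability is assumed to have been requested; the helper
       name is a placeholder.

          /* Sketch: registering device memory when FI_MR_HMEM is set. */
          int reg_cuda_buf(struct fid_domain *domain, void *buf, size_t len,
                           struct fid_mr **mr)
          {
              struct iovec iov = { .iov_base = buf, .iov_len = len };
              struct fi_mr_attr attr = {
                  .mr_iov      = &iov,
                  .iov_count   = 1,
                  .access      = FI_SEND | FI_RECV,
                  .iface       = FI_HMEM_CUDA,
                  .device.cuda = 0,     /* CUdevice ordinal (assumed) */
              };

              return fi_mr_regattr(domain, &attr, 0, mr);
          }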
286
287 FI_MR_COLLECTIVE
288 This bit is associated with the FI_COLLECTIVE capability. When
       set, the provider requires that memory regions used in collec‐
       tive operations be explicitly registered for use with col‐
291 lective calls. This requires registering regions passed to col‐
292 lective calls using the FI_COLLECTIVE flag.
293
294 Basic Memory Registration
295 Basic memory registration was deprecated in libfabric version
296 1.5, but is supported for backwards compatibility. Basic memory
297 registration is indicated by setting mr_mode equal to FI_MR_BA‐
       SIC.  FI_MR_BASIC must be set alone and not paired with other mr_mode
299 bits. Unlike other mr_mode bits, if FI_MR_BASIC is set on input
300 to fi_getinfo(), it will not be cleared by the provider. That
301 is, setting mr_mode equal to FI_MR_BASIC forces basic registra‐
302 tion if the provider supports it.
303
304 The behavior of basic registration is equivalent to requiring the fol‐
305 lowing mr_mode bits: FI_MR_VIRT_ADDR, FI_MR_ALLOCATED, and
306 FI_MR_PROV_KEY. Additionally, providers that support basic registra‐
307 tion usually require the (deprecated) fi_info mode bit FI_LOCAL_MR,
308 which was incorporated into the FI_MR_LOCAL mr_mode bit.
309
       The registration functions – fi_mr_reg, fi_mr_regv, and fi_mr_regattr
       – are used to register one or more memory regions with fabric re‐
       sources.  The main difference between the registration functions is the
       number and type of parameters that they accept as input.  Otherwise,
314 they perform the same general function.
315
       By default, memory registration completes synchronously; that is, the
       registration call will not return until the registration has completed.
       Memory registration can complete asynchronously by binding the resource
       domain to an event queue using the FI_REG_MR flag.  See fi_domain_bind.
320 When memory registration is asynchronous, in order to avoid a race con‐
321 dition between the registration call returning and the corresponding
322 reading of the event from the EQ, the mr output parameter will be writ‐
323 ten before any event associated with the operation may be read by the
324 application. An asynchronous event will not be generated unless the
325 registration call returns success (0).
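
       A sketch of the asynchronous flow is shown below: the domain is bound
       to an event queue with FI_REG_MR, the registration is issued, and the
       FI_MR_COMPLETE event is then read.  The blocking read and the helper
       name are illustrative choices, not requirements.

          /* Sketch: asynchronous registration completion via an event queue. */
          int reg_async(struct fid_domain *domain, struct fid_eq *eq,
                        const void *buf, size_t len, struct fid_mr **mr)
          {
              struct fi_eq_entry entry;
              uint32_t event;
              ssize_t rd;
              int ret;

              ret = fi_domain_bind(domain, &eq->fid, FI_REG_MR);
              if (ret)
                  return ret;

              ret = fi_mr_reg(domain, buf, len, FI_REMOTE_WRITE, 0, 0, 0,
                              mr, NULL);
              if (ret)
                  return ret;

              /* *mr is written before the event can be read; the event
               * returns the context passed to fi_mr_reg (NULL here). */
              rd = fi_eq_sread(eq, &event, &entry, sizeof(entry), -1, 0);
              if (rd < 0)
                  return (int) rd;
              return event == FI_MR_COMPLETE ? 0 : -FI_EOTHER;
          }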
326
327 fi_mr_reg
328 The fi_mr_reg call registers the user-specified memory buffer with the
329 resource domain. The buffer is enabled for access by the fabric hard‐
330 ware based on the provided access permissions. See the access field
331 description for memory region attributes below.
332
333 Registered memory is associated with a local memory descriptor and, op‐
334 tionally, a remote memory key. A memory descriptor is a provider spe‐
335 cific identifier associated with registered memory. Memory descriptors
336 often map to hardware specific indices or keys associated with the mem‐
337 ory region. Remote memory keys provide limited protection against un‐
338 wanted access by a remote node. Remote accesses to a memory region
339 must provide the key associated with the registration.
340
341 Because MR keys must be provided by a remote process, an application
342 can use the requested_key parameter to indicate that a specific key
343 value be returned. Support for user requested keys is provider specif‐
344 ic and is determined by the FI_MR_PROV_KEY flag value in the mr_mode
345 domain attribute.
346
347 Remote RMA and atomic operations indicate the location within a regis‐
348 tered memory region by specifying an address. The location is refer‐
349 enced by adding the offset to either the base virtual address of the
350 buffer or to 0, depending on the mr_mode.
351
352 The offset parameter is reserved for future use and must be 0.
353
354 For asynchronous memory registration requests, the result will be re‐
355 ported to the user through an event queue associated with the resource
356 domain. If successful, the allocated memory region structure will be
357 returned to the user through the mr parameter. The mr address must re‐
358 main valid until the registration operation completes. The context
359 specified with the registration request is returned with the completion
360 event.
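
       As an example of the synchronous case, the sketch below registers a
       buffer as the target of remote RMA writes and returns the key to be
       exchanged with the peer.  The requested key value and helper name are
       arbitrary.

          /* Sketch: register a buffer as the target of remote RMA writes. */
          int expose_rma_target(struct fid_domain *domain, void *buf,
                                size_t len, struct fid_mr **mr, uint64_t *key)
          {
              int ret;

              /* requested_key (0x1234 here) is honored only if the provider
               * did not set FI_MR_PROV_KEY in the domain mr_mode bits. */
              ret = fi_mr_reg(domain, buf, len, FI_REMOTE_WRITE, 0, 0x1234,
                              0, mr, NULL);
              if (ret)
                  return ret;

              *key = fi_mr_key(*mr);   /* exchange this key with the peer */
              return 0;
          }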
361
362 fi_mr_regv
363 The fi_mr_regv call adds support for a scatter-gather list to
364 fi_mr_reg. Multiple memory buffers are registered as a single memory
365 region. Otherwise, the operation is the same.
366
367 fi_mr_regattr
368 The fi_mr_regattr call is a more generic, extensible registration call
369 that allows the user to specify the registration request using a struct
370 fi_mr_attr (defined below).
371
372 fi_close
       Fi_close is used to release all resources associated with a registered
       memory region.  Once unregistered, further access to the registered
375 memory is not guaranteed. Active or queued operations that reference a
376 memory region being closed may fail or result in accesses to invalid
377 memory. Applications are responsible for ensuring that a MR is no
378 longer needed prior to closing it. Note that accesses to a closed MR
379 from a remote peer will result in an error at the peer. The state of
380 the local endpoint will be unaffected.
381
382 When closing the MR, there must be no opened endpoints or counters as‐
383 sociated with the MR. If resources are still associated with the MR
384 when attempting to close, the call will return -FI_EBUSY.
385
386 fi_mr_desc
387 Obtains the local memory descriptor associated with a MR. The memory
388 registration must have completed successfully before invoking this
389 call.
390
391 fi_mr_key
392 Returns the remote protection key associated with a MR. The memory
       registration must have completed successfully before invoking this
       call.  The returned key may be used in data transfer operations at a
       peer.  If the FI_MR_RAW mode bit has been set for the domain, then the
       memory key must be obtained using the fi_mr_raw_attr function instead.
       A return
397 value of FI_KEY_NOTAVAIL will be returned if the registration has not
398 completed or a raw memory key is required.
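
       A brief sketch of the expected usage and error check follows; the
       helper name and the chosen return code are placeholders.

          /* Sketch: fetch the remote key once registration has completed. */
          int get_mr_key(struct fid_mr *mr, uint64_t *key)
          {
              *key = fi_mr_key(mr);
              if (*key == FI_KEY_NOTAVAIL) {
                  /* Registration not complete, or FI_MR_RAW requires the
                   * raw key interfaces described below. */
                  return -FI_ENOKEY;
              }
              /* The peer passes *key as the key argument of fi_write(),
               * fi_read(), or atomic calls that target this region. */
              return 0;
          }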
399
400 fi_mr_raw_attr
401 Returns the raw, remote protection key and base address associated with
402 a MR. The memory registration must have completed successfully before
       invoking this routine.  Use of this call is required if the FI_MR_RAW
404 mode bit has been set by the provider; however, it is safe to use this
405 call with any memory region.
406
407 On input, the key_size parameter should indicate the size of the
408 raw_key buffer. If the actual key is larger than what can fit into the
409 buffer, it will return -FI_ETOOSMALL. On output, key_size is set to
410 the size of the buffer needed to store the key, which may be larger
411 than the input value. The needed key_size can also be obtained through
412 the mr_key_size domain attribute (fi_domain_attr) field.
413
414 A raw key must be mapped by a peer before it can be used in data trans‐
415 fer operations. See fi_mr_map_raw below.
416
417 fi_mr_map_raw
418 Raw protection keys must be mapped to a usable key value before they
419 can be used for data transfer operations. The mapping is done by the
420 peer that initiates the RMA or atomic operation. The mapping function
421 takes as input the raw key and its size, and returns the mapped key.
422 Use of the fi_mr_map_raw function is required if the peer has the
       FI_MR_RAW mode bit set, but this routine may be called on any valid
424 key. All mapped keys must be freed by calling fi_mr_unmap_key when ac‐
425 cess to the peer memory region is no longer necessary.
426
427 fi_mr_unmap_key
428 This call releases any resources that may have been allocated as part
429 of mapping a raw memory key. All mapped keys must be freed before the
430 corresponding domain is closed.
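
       The sketch below shows both sides of the raw key exchange: the MR
       owner queries the raw key, and the peer maps it before use and unmaps
       it when finished.  How the base address, key size, and raw key bytes
       are transferred between the processes is left to the application and
       is assumed here.

          /* Sketch: MR owner queries the raw key to send to the peer.
           * *key_size must hold the raw_key buffer size on input; the
           * mr_key_size domain attribute is sufficient. */
          int get_raw_key(struct fid_mr *mr, uint64_t *base,
                          uint8_t *raw_key, size_t *key_size)
          {
              return fi_mr_raw_attr(mr, base, raw_key, key_size, 0);
          }

          /* Sketch: peer maps the received raw key into a 64-bit key usable
           * with fi_write()/fi_read(), then releases it when finished. */
          int use_raw_key(struct fid_domain *domain, uint64_t base,
                          uint8_t *raw_key, size_t key_size)
          {
              uint64_t key;
              int ret;

              ret = fi_mr_map_raw(domain, base, raw_key, key_size, &key, 0);
              if (ret)
                  return ret;

              /* ... issue RMA or atomic operations using key ... */

              return fi_mr_unmap_key(domain, key);
          }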
431
432 fi_mr_bind
433 The fi_mr_bind function associates a memory region with a counter or
434 endpoint. Counter bindings are needed by providers that support the
435 generation of completions based on fabric operations. Endpoint bind‐
436 ings are needed if the provider associates memory regions with end‐
437 points (see FI_MR_ENDPOINT).
438
439 When binding with a counter, the type of events tracked against the
440 memory region is based on the bitwise OR of the following flags.
441
442 FI_REMOTE_WRITE
443 Generates an event whenever a remote RMA write or atomic opera‐
444 tion modifies the memory region. Use of this flag requires that
445 the endpoint through which the MR is accessed be created with
446 the FI_RMA_EVENT capability.
447
448 When binding the memory region to an endpoint, flags should be 0.
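
       For example, a counter that counts remote RMA writes into a region
       could be attached as in the sketch below.  It assumes the MR was reg‐
       istered with the FI_RMA_EVENT flag, the endpoint was created with the
       FI_RMA_EVENT capability, and the counter has already been opened.

          /* Sketch: count remote RMA writes targeting the MR. */
          int count_remote_writes(struct fid_mr *mr, struct fid_cntr *cntr)
          {
              int ret;

              ret = fi_mr_bind(mr, &cntr->fid, FI_REMOTE_WRITE);
              if (ret)
                  return ret;

              /* Needed when FI_MR_RMA_EVENT is set: all bindings must be
               * made before the region is enabled. */
              return fi_mr_enable(mr);
          }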
449
450 fi_mr_refresh
451 The use of this call is required to notify the provider of any change
452 to the physical pages backing a registered memory region if the
453 FI_MR_MMU_NOTIFY mode bit has been set. This call informs the provider
454 that the page table entries associated with the region may have been
455 modified, and the provider should verify and update the registered re‐
456 gion accordingly. The iov parameter is optional and may be used to
       specify which portions of the registered region require updating.
458 Providers are only guaranteed to update the specified address ranges.
459
460 The refresh operation has the effect of disabling and re-enabling ac‐
461 cess to the registered region. Any operations from peers that attempt
462 to access the region will fail while the refresh is occurring. Addi‐
463 tionally, attempts to access the region by the local process through
464 libfabric APIs may result in a page fault or other fatal operation.
465
466 The fi_mr_refresh call is only needed if the physical pages might have
467 been updated after the memory region was created.
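
       A minimal sketch, assuming FI_MR_MMU_NOTIFY is in effect and that
       physical pages were just allocated or remapped behind part of a
       registered range, is shown below; addr and len identify the changed
       sub-range.

          /* Sketch: refresh one sub-range of a previously registered MR. */
          int refresh_range(struct fid_mr *mr, void *addr, size_t len)
          {
              struct iovec iov = { .iov_base = addr, .iov_len = len };

              return fi_mr_refresh(mr, &iov, 1, 0);
          }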
468
469 fi_mr_enable
470 The enable call is used with memory registration associated with the
471 FI_MR_RMA_EVENT mode bit. Memory regions created in the disabled state
472 must be explicitly enabled after being fully configured by the applica‐
473 tion. Any resource bindings to the MR must be done prior to enabling
474 the MR.
475
MEMORY REGION ATTRIBUTES
       Memory regions are created using the following attributes.  The struct
478 fi_mr_attr is passed into fi_mr_regattr, but individual fields also ap‐
479 ply to other memory registration calls, with the fields passed directly
480 into calls as function parameters.
481
       struct fi_mr_attr {
           const struct iovec *mr_iov;
           size_t             iov_count;
           uint64_t           access;
           uint64_t           offset;
           uint64_t           requested_key;
           void               *context;
           size_t             auth_key_size;
           uint8_t            *auth_key;
           enum fi_hmem_iface iface;
           union {
               uint64_t       reserved;
               int            cuda;
               int            ze;
           } device;
       };
498
499 mr_iov
500 This is an IO vector of addresses that will represent a single memory
501 region. The number of entries in the iovec is specified by iov_count.
502
503 iov_count
504 The number of entries in the mr_iov array. The maximum number of memo‐
505 ry buffers that may be associated with a single memory region is speci‐
506 fied as the mr_iov_limit domain attribute. See fi_domain(3).
507
508 access
509 Indicates the type of operations that the local or a peer endpoint may
       perform on the registered memory region.  Supported access permissions are
511 the bitwise OR of the following flags:
512
513 FI_SEND
514 The memory buffer may be used in outgoing message data trans‐
515 fers. This includes fi_msg and fi_tagged send operations, as
516 well as fi_collective operations.
517
518 FI_RECV
519 The memory buffer may be used to receive inbound message trans‐
520 fers. This includes fi_msg and fi_tagged receive operations, as
521 well as fi_collective operations.
522
523 FI_READ
524 The memory buffer may be used as the result buffer for RMA read
525 and atomic operations on the initiator side. Note that from the
526 viewpoint of the application, the memory buffer is being written
527 into by the network.
528
529 FI_WRITE
530 The memory buffer may be used as the source buffer for RMA write
531 and atomic operations on the initiator side. Note that from the
532 viewpoint of the application, the endpoint is reading from the
533 memory buffer and copying the data onto the network.
534
535 FI_REMOTE_READ
536 The memory buffer may be used as the source buffer of an RMA
537 read operation on the target side. The contents of the memory
538 buffer are not modified by such operations.
539
540 FI_REMOTE_WRITE
541 The memory buffer may be used as the target buffer of an RMA
542 write or atomic operation. The contents of the memory buffer
543 may be modified as a result of such operations.
544
545 FI_COLLECTIVE
546 This flag provides an explicit indication that the memory buffer
547 may be used with collective operations. Use of this flag is re‐
548 quired if the FI_MR_COLLECTIVE mr_mode bit has been set on the
549 domain. This flag should be paired with FI_SEND and/or FI_RECV
550
551 Note that some providers may not enforce fine grained access permis‐
552 sions. For example, a memory region registered for FI_WRITE access may
553 also behave as if FI_SEND were specified as well. Relaxed enforcement
554 of such access is permitted, though not guaranteed, provided security
555 is maintained.
556
557 offset
558 The offset field is reserved for future use and must be 0.
559
560 requested_key
561 An application specified access key associated with the memory region.
562 The MR key must be provided by a remote process when performing RMA or
563 atomic operations to a memory region. Applications can use the re‐
564 quested_key field to indicate that a specific key be used by the
565 provider. This allows applications to use well known key values, which
566 can avoid applications needing to exchange and store keys. Support for
       user requested keys is provider specific and is determined by the
568 FI_MR_PROV_KEY flag in the mr_mode domain attribute field.
569
570 context
571 Application context associated with asynchronous memory registration
572 operations. This value is returned as part of any asynchronous event
573 associated with the registration. This field is ignored for synchro‐
574 nous registration calls.
575
576 auth_key_size
       The size, in bytes, of the key referenced by the auth_key field, or 0 if no
578 authorization key is given. This field is ignored unless the fabric is
579 opened with API version 1.5 or greater.
580
581 auth_key
582 Indicates the key to associate with this memory registration. Autho‐
583 rization keys are used to limit communication between endpoints. Only
584 peer endpoints that are programmed to use the same authorization key
585 may access the memory region. The domain authorization key will be
586 used if the auth_key_size provided is 0. This field is ignored unless
587 the fabric is opened with API version 1.5 or greater.
588
589 iface
590 Indicates the software interfaces used by the application to allocate
591 and manage the memory region. This field is ignored unless the appli‐
592 cation has requested the FI_HMEM capability.
593
594 FI_HMEM_SYSTEM
595 Uses standard operating system calls and libraries, such as mal‐
596 loc, calloc, realloc, mmap, and free.
597
598 FI_HMEM_CUDA
599 Uses Nvidia CUDA interfaces such as cuMemAlloc, cuMemAllocHost,
600 cuMemAllocManaged, cuMemFree, cudaMalloc, cudaFree.
601
602 FI_HMEM_ROCR
603 Uses AMD ROCR interfaces such as hsa_memory_allocate and
604 hsa_memory_free.
605
606 FI_HMEM_ZE
607 Uses oneAPI Level Zero interfaces such as zeDriverAllocShared‐
608 Mem, zeDriverFreeMem.
609
610 FI_HMEM_NEURON
611 Uses the AWS Neuron SDK to support AWS Trainium devices.
612
613 FI_HMEM_SYNAPSEAI
614 Uses the SynapseAI API to support Habana Gaudi devices.
615
616 device
       Reserved 64 bits for a device identifier when using a non-standard HMEM
       interface.  This field is ignored unless the iface field is valid.
619
620 cuda For FI_HMEM_CUDA, this is equivalent to CUdevice (int).
621
622 ze For FI_HMEM_ZE, this is equivalent to the ze_device_handle_t in‐
623 dex (int).
624
625 neuron For FI_HMEM_NEURON, the device identifier for AWS Trainium de‐
626 vices.
627
628 synapseai
629 For FI_HMEM_SYNAPSEAI, the device identifier for Habana Gaudi
630 hardware.
631
NOTES
       Direct access to an application’s memory by a remote peer requires that
634 the application register the targeted memory buffer(s). This is typi‐
635 cally done by calling one of the fi_mr_reg* routines. For
636 FI_MR_PROV_KEY, the provider will return a key that must be used by the
637 peer when accessing the memory region. The application is responsible
638 for transferring this key to the peer. If FI_MR_RAW mode has been set,
639 the key must be retrieved using the fi_mr_raw_attr function.
640
       FI_MR_RAW allows support for providers that require more than 8 bytes
642 for their protection keys or need additional setup before a key can be
643 used for transfers. After a raw key has been retrieved, it must be ex‐
644 changed with the remote peer. The peer must use fi_mr_map_raw to con‐
645 vert the raw key into a usable 64-bit key. The mapping must be done
646 even if the raw key is 64-bits or smaller.
647
648 The raw key support functions are usable with all registered memory re‐
649 gions, even if FI_MR_RAW has not been set. It is recommended that por‐
       table applications target those interfaces; however, their use
651 does carry extra message and memory footprint overhead, making it less
652 desirable for highly scalable apps.
653
654 There may be cases where device peer to peer support should not be used
655 or cannot be used, such as when the PCIe ACS configuration does not
656 permit the transfer. The FI_HMEM_DISABLE_P2P environment variable can
657 be set to notify Libfabric that peer to peer transactions should not be
       used.  The provider may choose to perform a copy instead, or it will
       fail to support FI_HMEM if it is unable to do so.
660
FLAGS
       The following flags may be specified in any memory registration call.
663
664 FI_RMA_EVENT
665 This flag indicates that the specified memory region will be as‐
666 sociated with a completion counter used to count RMA operations
667 that access the MR.
668
669 FI_RMA_PMEM
670 This flag indicates that the underlying memory region is backed
671 by persistent memory and will be used in RMA operations. It
672 must be specified if persistent completion semantics or persis‐
673 tent data transfers are required when accessing the registered
674 region.
675
676 FI_HMEM_DEVICE_ONLY
       This flag indicates that the memory is only accessible by a de‐
       vice.  The device is specified by the fi_mr_attr fields iface
       and device.  This refers to memory regions that were allocated
680 using a device API AllocDevice call (as opposed to using the
681 host allocation or unified/shared memory allocation).
682
683 FI_HMEM_HOST_ALLOC
684 This flag indicates that the memory is owned by the host only.
685 Whether it can be accessed by the device is implementation de‐
686 pendent. The fi_mr_attr field iface is still used to identify
687 the device API, but the field device is ignored. This refers to
688 memory regions that were allocated using a device API AllocHost
689 call (as opposed to using malloc-like host allocation, uni‐
690 fied/shared memory allocation, or AllocDevice).
691
MEMORY DOMAINS
       Memory domains identify the physical separation of memory which may or
694 may not be accessible through the same virtual address space. Tradi‐
695 tionally, applications only dealt with a single memory domain, that of
696 host memory tightly coupled with the system CPUs. With the introduc‐
697 tion of device and non-uniform memory subsystems, applications often
698 need to be aware of which memory domain a particular virtual address
699 maps to.
700
701 As a general rule, separate physical devices can be considered to have
702 their own memory domains. For example, a NIC may have user accessible
703 memory, and would be considered a separate memory domain from memory on
704 a GPU. Both the NIC and GPU memory domains are separate from host sys‐
705 tem memory. Individual GPUs or computation accelerators may have dis‐
706 tinct memory domains, or may be connected in such a way (e.g. a GPU
707 specific fabric) that all GPUs would belong to the same memory domain.
708 Unfortunately, identifying memory domains is specific to each system
709 and its physical and/or virtual configuration.
710
711 Understanding memory domains in heterogenous memory environments is im‐
712 portant as it can impact data ordering and visibility as viewed by an
713 application. It is also important to understand which memory domain an
714 application is most tightly coupled to. In most cases, applications
715 are tightly coupled to host memory. However, an application running
716 directly on a GPU or NIC may be more tightly coupled to memory associ‐
717 ated with those devices.
718
719 Memory regions are often associated with a single memory domain. The
720 domain is often indicated by the fi_mr_attr iface and device fields.
       It is possible, though, for the physical pages backing a virtual memory
       region to migrate between memory domains based on access patterns.  For
723 example, the physical pages referenced by a virtual address range could
724 migrate between host memory and GPU memory, depending on which computa‐
725 tional unit is actively using it.
726
       See the fi_endpoint(3) and fi_cq(3) man pages for additional discussion
728 on message, data, and completion ordering semantics, including the im‐
729 pact of memory domains.
730
RETURN VALUES
       Returns 0 on success.  On error, a negative value corresponding to fab‐
733 ric errno is returned.
734
735 Fabric errno values are defined in rdma/fi_errno.h.
736
ERRORS
       -FI_ENOKEY
739 The requested_key is already in use.
740
741 -FI_EKEYREJECTED
       The requested_key is not available.  The key may be out of the
743 range supported by the provider, or the provider may not support
744 user-requested memory registration keys.
745
746 -FI_ENOSYS
747 Returned by fi_mr_bind if the provider does not support report‐
748 ing events based on access to registered memory regions.
749
750 -FI_EBADFLAGS
751 Returned if the specified flags are not supported by the
752 provider.
753
MEMORY REGISTRATION CACHE
       Many hardware NICs accessed by libfabric require that data buffers be
       registered with the hardware while the hardware accesses them.  This en‐
757 sures that the virtual to physical address mappings for those buffers
758 do not change while the transfer is occurring. The performance impact
759 of registering memory regions can be significant. As a result, some
760 providers make use of a registration cache, particularly when working
761 with applications that are unable to manage their own network buffers.
762 A registration cache avoids the overhead of registering and unregister‐
763 ing a data buffer with each transfer.
764
765 If a registration cache is going to be used for host and device memory,
766 the device must support unified virtual addressing. If the device does
767 not support unified virtual addressing, either an additional registra‐
768 tion cache is required to track this device memory, or device memory
769 cannot be cached.
770
771 As a general rule, if hardware requires the FI_MR_LOCAL mode bit de‐
772 scribed above, but this is not supported by the application, a memory
773 registration cache may be in use. The following environment variables
774 may be used to configure registration caches.
775
776 FI_MR_CACHE_MAX_SIZE
777 This defines the total number of bytes for all memory regions
778 that may be tracked by the cache. If not set, the cache has no
779 limit on how many bytes may be registered and cached. Setting
780 this will reduce the amount of memory that is not actively being
781 used as part of a data transfer that is registered with a
782 provider. By default, the cache size is unlimited.
783
784 FI_MR_CACHE_MAX_COUNT
785 This defines the total number of memory regions that may be reg‐
786 istered with the cache. If not set, a default limit is chosen.
787 Setting this will reduce the number of regions that are regis‐
788 tered, regardless of their size, which are not actively being
789 used as part of a data transfer. Setting this to zero will dis‐
790 able registration caching.
791
792 FI_MR_CACHE_MONITOR
793 The cache monitor is responsible for detecting system memory
794 (FI_HMEM_SYSTEM) changes made between the virtual addresses used
795 by an application and the underlying physical pages. Valid mon‐
796 itor options are: userfaultfd, memhooks, and disabled. Select‐
797 ing disabled will turn off the registration cache. Userfaultfd
798 is a Linux kernel feature used to report virtual to physical ad‐
799 dress mapping changes to user space. Memhooks operates by in‐
800 tercepting relevant memory allocation and deallocation calls
801 which may result in the mappings changing, such as malloc, mmap,
       free, etc.  Note that memhooks operates at the ELF linker layer,
803 and does not use glibc memory hooks.
804
805 FI_MR_CUDA_CACHE_MONITOR_ENABLED
806 The CUDA cache monitor is responsible for detecting CUDA device
807 memory (FI_HMEM_CUDA) changes made between the device virtual
808 addresses used by an application and the underlying device phys‐
809 ical pages. Valid monitor options are: 0 or 1. Note that the
810 CUDA memory monitor requires a CUDA toolkit version with unified
811 virtual addressing enabled.
812
813 FI_MR_ROCR_CACHE_MONITOR_ENABLED
814 The ROCR cache monitor is responsible for detecting ROCR device
815 memory (FI_HMEM_ROCR) changes made between the device virtual
816 addresses used by an application and the underlying device phys‐
817 ical pages. Valid monitor options are: 0 or 1. Note that the
818 ROCR memory monitor requires a ROCR version with unified virtual
819 addressing enabled.
820
821 FI_MR_ZE_CACHE_MONITOR_ENABLED
822 The ZE cache monitor is responsible for detecting oneAPI Level
823 Zero device memory (FI_HMEM_ZE) changes made between the device
824 virtual addresses used by an application and the underlying de‐
825 vice physical pages. Valid monitor options are: 0 or 1.
826
827 More direct access to the internal registration cache is possible
828 through the fi_open() call, using the “mr_cache” service name. Once
829 opened, custom memory monitors may be installed. A memory monitor is a
830 component of the cache responsible for detecting changes in virtual to
831 physical address mappings. Some level of control over the cache is
832 possible through the above mentioned environment variables.
833
SEE ALSO
       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_rma(3), fi_msg(3),
836 fi_atomic(3)
837
AUTHORS
       OpenFabrics.



Libfabric Programmer’s Manual          2022-12-11                    fi_mr(3)