fi_mr(3)                      Libfabric v1.18.1                      fi_mr(3)

NAME
       fi_mr - Memory region operations

       fi_mr_reg / fi_mr_regv / fi_mr_regattr
              Register local memory buffers for direct fabric access.

       fi_close
              Deregister registered memory buffers.

       fi_mr_desc
              Return a local descriptor associated with a registered memory region.

       fi_mr_key
              Return the remote key needed to access a registered memory region.

       fi_mr_raw_attr
              Return raw memory region attributes.

       fi_mr_map_raw
              Converts a raw memory region key into a key that is usable for data transfer operations.

       fi_mr_unmap_key
              Releases a previously mapped raw memory region key.

       fi_mr_bind
              Associate a registered memory region with a completion counter or an endpoint.

       fi_mr_refresh
              Updates the memory pages associated with a memory region.

       fi_mr_enable
              Enables a memory region for use.

       fi_hmem_ze_device
              Returns an hmem device identifier for a Level Zero driver and device.

SYNOPSIS
       #include <rdma/fi_domain.h>

       int fi_mr_reg(struct fid_domain *domain, const void *buf, size_t len,
           uint64_t access, uint64_t offset, uint64_t requested_key,
           uint64_t flags, struct fid_mr **mr, void *context);

       int fi_mr_regv(struct fid_domain *domain, const struct iovec *iov,
           size_t count, uint64_t access, uint64_t offset, uint64_t requested_key,
           uint64_t flags, struct fid_mr **mr, void *context);

       int fi_mr_regattr(struct fid_domain *domain, const struct fi_mr_attr *attr,
           uint64_t flags, struct fid_mr **mr);

       int fi_close(struct fid *mr);

       void *fi_mr_desc(struct fid_mr *mr);

       uint64_t fi_mr_key(struct fid_mr *mr);

       int fi_mr_raw_attr(struct fid_mr *mr, uint64_t *base_addr,
           uint8_t *raw_key, size_t *key_size, uint64_t flags);

       int fi_mr_map_raw(struct fid_domain *domain, uint64_t base_addr,
           uint8_t *raw_key, size_t key_size, uint64_t *key, uint64_t flags);

       int fi_mr_unmap_key(struct fid_domain *domain, uint64_t key);

       int fi_mr_bind(struct fid_mr *mr, struct fid *bfid, uint64_t flags);

       int fi_mr_refresh(struct fid_mr *mr, const struct iovec *iov,
           size_t count, uint64_t flags);

       int fi_mr_enable(struct fid_mr *mr);

       int fi_hmem_ze_device(int driver_index, int device_index);

ARGUMENTS
       domain Resource domain.

       mr     Memory region.

       bfid   Fabric identifier of an associated resource.

       context
              User specified context associated with the memory region.

       buf    Memory buffer to register with the fabric hardware.

       len    Length of memory buffer to register. Must be > 0.

       iov    Vectored memory buffer.

       count  Count of vectored buffer entries.

       access Memory access permissions associated with the registration.

       offset Optional offset for accessing the registered buffers. This parameter is reserved for future use and must be 0.

       requested_key
              Requested remote key associated with registered buffers. This parameter is ignored if the FI_MR_PROV_KEY flag is set in the domain mr_mode bits.

       attr   Memory region attributes.

       flags  Additional flags to apply to the operation.

DESCRIPTION
       Registered memory regions associate memory buffers with permissions granted for access by fabric resources. A memory buffer must be registered with a resource domain before it can be used as the target of a remote RMA or atomic data transfer. Additionally, a fabric provider may require that data buffers be registered before being used in local transfers. Memory registration restrictions are controlled using a separate set of mode bits, specified through the domain attributes (mr_mode field). Each mr_mode bit requires that an application take specific steps in order to use memory buffers with libfabric interfaces.

       The following apply to memory registration.

   Default Memory Registration
       If no mr_mode bits are set, the default behaviors described below are followed. Historically, these defaults were collectively referred to as scalable memory registration. The default requirements are outlined below, followed by definitions of how each mr_mode bit alters the defaults.

       Compatibility: For library versions 1.4 and earlier, this was indicated by setting mr_mode to FI_MR_SCALABLE and the fi_info mode bit FI_LOCAL_MR to 0. FI_MR_SCALABLE and FI_LOCAL_MR were deprecated in libfabric version 1.5, though they are supported for backwards compatibility purposes.

       For security, memory registration is required for data buffers that are accessed directly by a peer process. For example, registration is required for RMA target buffers (read from or written to), and for buffers accessed by atomic or collective operations.

       By default, registration occurs on virtual address ranges. Because registration refers to address ranges, rather than allocated data buffers, the address ranges do not need to map to data buffers allocated by the application at the time the registration call is made. That is, an application can register any range of addresses in its virtual address space, whether or not those addresses are backed by physical pages or have been allocated.

       Note that physical pages must back addresses prior to those addresses being accessed as part of a data transfer operation, or the data transfers will fail. Additionally, depending on the operation, this could result in the local process receiving a segmentation fault for accessing invalid memory.

       Once registered, the resulting memory regions are accessible by peers starting at a base address of 0. That is, the target address that is specified is a byte offset into the registered region.

       The application also selects the access key associated with the MR. The key size is restricted to a maximum of 8 bytes.

       With scalable registration, locally accessed data buffers are not registered. This includes source buffers for all transmit operations – sends, tagged sends, RMA, and atomics – as well as buffers posted for receive and tagged receive operations.

       Although the default memory registration behavior is convenient for application developers, it is difficult to implement in hardware. Attempts to hide the hardware requirements from the application often result in significant and unacceptable impacts to performance. The following mr_mode bits are provided as input into fi_getinfo. If a provider requires the behavior defined for an mr_mode bit, it will leave the bit set on output to fi_getinfo. Otherwise, the provider can clear the bit to indicate that the behavior is not needed.

       By setting an mr_mode bit, the application has agreed to adjust its behavior as indicated. Importantly, applications that choose to support an mr_mode bit must be prepared to handle the case where that mr_mode is not required. A provider will clear an mr_mode bit if it is not needed.

       FI_MR_LOCAL
              When the FI_MR_LOCAL mode bit is set, applications must register all data buffers that will be accessed by the local hardware and provide a valid desc parameter into applicable data transfer operations. When FI_MR_LOCAL is zero, applications are not required to register data buffers before using them for local operations (e.g. send and receive data buffers). The desc parameter into data transfer operations will be ignored in this case, unless otherwise required (e.g. see FI_MR_HMEM). It is recommended that applications pass in NULL for desc when not required.

       A provider may hide local registration requirements from applications by making use of an internal registration cache or similar mechanisms. Such mechanisms, however, may negatively impact performance for some applications, notably those which manage their own network buffers. In order to support as broad a range of applications as possible, without unduly affecting their performance, applications that wish to manage their own local memory registrations may do so by using the memory registration calls.

       Note: the FI_MR_LOCAL mr_mode bit replaces the FI_LOCAL_MR fi_info mode bit. When FI_MR_LOCAL is set, FI_LOCAL_MR is ignored.

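       As a sketch of the FI_MR_LOCAL contract described above (the function name and surrounding setup are illustrative, not part of libfabric; completion handling and cleanup are omitted):

```c
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>
#include <rdma/fi_endpoint.h>

/* Illustrative sketch: registering a send buffer when FI_MR_LOCAL is set.
 * Assumes 'domain' and 'ep' were opened earlier. */
static ssize_t send_with_local_mr(struct fid_domain *domain, struct fid_ep *ep,
                                  fi_addr_t dest, void *buf, size_t len)
{
    struct fid_mr *mr;
    int ret;

    /* Register the buffer for local send access. */
    ret = fi_mr_reg(domain, buf, len, FI_SEND, 0, 0, 0, &mr, NULL);
    if (ret)
        return ret;

    /* With FI_MR_LOCAL set, a valid desc is required; had the provider
     * cleared FI_MR_LOCAL, NULL could be passed instead. */
    return fi_send(ep, buf, len, fi_mr_desc(mr), dest, NULL);
}
```

       The caller would wait for the send completion before releasing the region with fi_close.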
       FI_MR_RAW
              Raw memory regions are used to support providers with keys larger than 64 bits or that require setup at the peer. When the FI_MR_RAW bit is set, applications must use fi_mr_raw_attr() locally and fi_mr_map_raw() at the peer before targeting a memory region as part of any data transfer request.

       FI_MR_VIRT_ADDR
              The FI_MR_VIRT_ADDR bit indicates that the provider references memory regions by virtual address, rather than a 0-based offset. Peers that target memory regions registered with FI_MR_VIRT_ADDR specify the destination memory buffer using the target's virtual address, with any offset into the region specified as virtual address + offset. Support of this bit typically implies that peers must exchange addressing data prior to initiating any RMA or atomic operation.

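       The addressing rule above can be captured in a small helper (a hypothetical function, not a libfabric API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the address to pass as the RMA target,
 * depending on whether the peer's domain reported FI_MR_VIRT_ADDR. */
static inline uint64_t rma_target_addr(bool mr_virt_addr,
                                       uint64_t peer_region_base,
                                       uint64_t offset_into_region)
{
    /* FI_MR_VIRT_ADDR: address the region by the peer's virtual address
     * plus the offset; otherwise regions are 0-based, so the offset alone
     * is the target address. */
    return mr_virt_addr ? peer_region_base + offset_into_region
                        : offset_into_region;
}
```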
       FI_MR_ALLOCATED
              When set, all registered memory regions must be backed by physical memory pages at the time the registration call is made.

       FI_MR_PROV_KEY
              This memory region mode indicates that the provider does not support application requested MR keys. MR keys are returned by the provider. Applications that support FI_MR_PROV_KEY can obtain the provider key using fi_mr_key(), unless FI_MR_RAW is also set. The returned key should then be exchanged with peers prior to initiating an RMA or atomic operation.

       FI_MR_MMU_NOTIFY
              FI_MR_MMU_NOTIFY is typically set by providers that support memory registration against memory regions that are not necessarily backed by allocated physical pages at the time the memory registration occurs. (That is, FI_MR_ALLOCATED is typically 0.) However, such providers require that applications notify the provider prior to the MR being accessed as part of a data transfer operation. This notification informs the provider that all necessary physical pages now back the region. The notification is necessary for providers that cannot hook directly into the operating system page tables or memory management unit. See fi_mr_refresh() for notification details.

       FI_MR_RMA_EVENT
              This mode bit indicates that the provider must configure memory regions that are associated with RMA events prior to their use. This includes all memory regions that are associated with completion counters. When set, applications must indicate if a memory region will be associated with a completion counter as part of the region's creation. This is done by passing in the FI_RMA_EVENT flag to the memory registration call.

       Such memory regions will be created in a disabled state and must be associated with all completion counters prior to being enabled. To enable a memory region, the application must call fi_mr_enable(). After calling fi_mr_enable(), no further resource bindings may be made to the memory region.

       FI_MR_ENDPOINT
              This mode bit indicates that the provider associates memory regions with endpoints rather than domains. Memory regions that are registered with the provider are created in a disabled state and must be bound to an endpoint prior to being enabled. To bind the MR with an endpoint, the application must use fi_mr_bind(). To enable the memory region, the application must call fi_mr_enable().

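       The register/bind/enable sequence under FI_MR_ENDPOINT can be sketched as follows (the function name is illustrative; 'domain' and 'ep' are assumed to have been opened earlier):

```c
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>
#include <rdma/fi_endpoint.h>

/* Illustrative sketch: MR setup under FI_MR_ENDPOINT. The region starts
 * disabled, must be bound to the endpoint, then explicitly enabled. */
static int setup_endpoint_mr(struct fid_domain *domain, struct fid_ep *ep,
                             void *buf, size_t len, struct fid_mr **mr)
{
    int ret;

    ret = fi_mr_reg(domain, buf, len, FI_REMOTE_WRITE, 0, 0, 0, mr, NULL);
    if (ret)
        return ret;

    /* Bind the disabled MR to the endpoint; flags must be 0 here. */
    ret = fi_mr_bind(*mr, &ep->fid, 0);
    if (ret)
        return ret;

    /* Enable the region; no further bindings are allowed afterward. */
    return fi_mr_enable(*mr);
}
```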
       FI_MR_HMEM
              This mode bit is associated with the FI_HMEM capability. If FI_MR_HMEM is set, the application must register buffers that were allocated using a device call and provide a valid desc parameter into applicable data transfer operations, even if they are only used for local operations (e.g. send and receive data buffers). Device memory must be registered using the fi_mr_regattr call, with the iface and device fields filled out.

       If FI_MR_HMEM is set, but FI_MR_LOCAL is unset, only device buffers must be registered when used locally. In this case, the desc parameter passed into data transfer operations must either be valid or NULL. Similarly, if FI_MR_LOCAL is set, but FI_MR_HMEM is not, the desc parameter must either be valid or NULL.

       FI_MR_COLLECTIVE
              This bit is associated with the FI_COLLECTIVE capability. When set, the provider requires that memory regions used in collective operations must explicitly be registered for use with collective calls. This requires registering regions passed to collective calls using the FI_COLLECTIVE flag.

   Basic Memory Registration
       Basic memory registration was deprecated in libfabric version 1.5, but is supported for backwards compatibility. Basic memory registration is indicated by setting mr_mode equal to FI_MR_BASIC. FI_MR_BASIC must be set alone and not paired with other mr_mode bits. Unlike other mr_mode bits, if FI_MR_BASIC is set on input to fi_getinfo(), it will not be cleared by the provider. That is, setting mr_mode equal to FI_MR_BASIC forces basic registration if the provider supports it.

       The behavior of basic registration is equivalent to requiring the following mr_mode bits: FI_MR_VIRT_ADDR, FI_MR_ALLOCATED, and FI_MR_PROV_KEY. Additionally, providers that support basic registration usually require the (deprecated) fi_info mode bit FI_LOCAL_MR, which was incorporated into the FI_MR_LOCAL mr_mode bit.

       The registration functions – fi_mr_reg, fi_mr_regv, and fi_mr_regattr – are used to register one or more memory regions with fabric resources. The main difference between the registration functions is the number and type of parameters that they accept as input. Otherwise, they perform the same general function.

       By default, memory registration completes synchronously, i.e. the registration call will not return until the registration has completed. Memory registration can complete asynchronously by binding the resource domain to an event queue using the FI_REG_MR flag. See fi_domain_bind. When memory registration is asynchronous, in order to avoid a race condition between the registration call returning and the corresponding reading of the event from the EQ, the mr output parameter will be written before any event associated with the operation may be read by the application. An asynchronous event will not be generated unless the registration call returns success (0).

   fi_mr_reg
       The fi_mr_reg call registers the user-specified memory buffer with the resource domain. The buffer is enabled for access by the fabric hardware based on the provided access permissions. See the access field description for memory region attributes below.

       Registered memory is associated with a local memory descriptor and, optionally, a remote memory key. A memory descriptor is a provider specific identifier associated with registered memory. Memory descriptors often map to hardware specific indices or keys associated with the memory region. Remote memory keys provide limited protection against unwanted access by a remote node. Remote accesses to a memory region must provide the key associated with the registration.

       Because MR keys must be provided by a remote process, an application can use the requested_key parameter to indicate that a specific key value be returned. Support for user requested keys is provider specific and is determined by the FI_MR_PROV_KEY flag value in the mr_mode domain attribute.

       Remote RMA and atomic operations indicate the location within a registered memory region by specifying an address. The location is referenced by adding the offset to either the base virtual address of the buffer or to 0, depending on the mr_mode.

       The offset parameter is reserved for future use and must be 0.

       For asynchronous memory registration requests, the result will be reported to the user through an event queue associated with the resource domain. If successful, the allocated memory region structure will be returned to the user through the mr parameter. The mr address must remain valid until the registration operation completes. The context specified with the registration request is returned with the completion event.

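       A minimal sketch of fi_mr_reg for an RMA write target follows (the function name is illustrative; how the key and address reach the peer is application specific and not shown):

```c
#include <stdio.h>
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>

/* Illustrative sketch: register a buffer as an RMA write target and report
 * the values a peer needs. Whether the peer targets the virtual address or
 * a 0-based offset depends on FI_MR_VIRT_ADDR. */
static int register_rma_target(struct fid_domain *domain, void *buf,
                               size_t len, struct fid_mr **mr)
{
    int ret = fi_mr_reg(domain, buf, len, FI_REMOTE_WRITE,
                        0 /* offset: reserved, must be 0 */,
                        0 /* requested_key */, 0 /* flags */, mr, NULL);
    if (ret)
        return ret;

    /* The application must convey the key (and, under FI_MR_VIRT_ADDR,
     * the address) to the peer out of band. */
    printf("key=%llu addr=%p\n",
           (unsigned long long)fi_mr_key(*mr), buf);
    return 0;
}
```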
   fi_mr_regv
       The fi_mr_regv call adds support for a scatter-gather list to fi_mr_reg. Multiple memory buffers are registered as a single memory region. Otherwise, the operation is the same.

   fi_mr_regattr
       The fi_mr_regattr call is a more generic, extensible registration call that allows the user to specify the registration request using a struct fi_mr_attr (defined below).

   fi_close
       fi_close is used to release all resources associated with registering a memory region. Once unregistered, further access to the registered memory is not guaranteed. Active or queued operations that reference a memory region being closed may fail or result in accesses to invalid memory. Applications are responsible for ensuring that a MR is no longer needed prior to closing it. Note that accesses to a closed MR from a remote peer will result in an error at the peer. The state of the local endpoint will be unaffected.

       When closing the MR, there must be no opened endpoints or counters associated with the MR. If resources are still associated with the MR when attempting to close, the call will return -FI_EBUSY.

   fi_mr_desc
       Obtains the local memory descriptor associated with a MR. The memory registration must have completed successfully before invoking this call.

   fi_mr_key
       Returns the remote protection key associated with a MR. The memory registration must have completed successfully before invoking this call. The returned key may be used in data transfer operations at a peer. If the FI_MR_RAW mode bit has been set for the domain, then the memory key must be obtained using the fi_mr_raw_attr function instead. A value of FI_KEY_NOTAVAIL will be returned if the registration has not completed or a raw memory key is required.

   fi_mr_raw_attr
       Returns the raw, remote protection key and base address associated with a MR. The memory registration must have completed successfully before invoking this routine. Use of this call is required if the FI_MR_RAW mode bit has been set by the provider; however, it is safe to use this call with any memory region.

       On input, the key_size parameter should indicate the size of the raw_key buffer. If the actual key is larger than what can fit into the buffer, the call will return -FI_ETOOSMALL. On output, key_size is set to the size of the buffer needed to store the key, which may be larger than the input value. The needed key_size can also be obtained through the mr_key_size domain attribute (fi_domain_attr) field.

       A raw key must be mapped by a peer before it can be used in data transfer operations. See fi_mr_map_raw below.

   fi_mr_map_raw
       Raw protection keys must be mapped to a usable key value before they can be used for data transfer operations. The mapping is done by the peer that initiates the RMA or atomic operation. The mapping function takes as input the raw key and its size, and returns the mapped key. Use of the fi_mr_map_raw function is required if the peer has the FI_MR_RAW mode bit set, but this routine may be called on any valid key. All mapped keys must be freed by calling fi_mr_unmap_key when access to the peer memory region is no longer necessary.

   fi_mr_unmap_key
       This call releases any resources that may have been allocated as part of mapping a raw memory key. All mapped keys must be freed before the corresponding domain is closed.

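       The initiator side of the raw-key handshake can be sketched as follows (the function name is illustrative; the raw key and base address are assumed to have arrived from the peer over some out-of-band channel, which is not shown):

```c
#include <stdint.h>
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>

/* Illustrative sketch: map a peer's raw key into a usable 64-bit key,
 * use it for RMA, then release it. Mapping is required even when the raw
 * key is 64 bits or smaller. */
static int use_peer_raw_key(struct fid_domain *domain, uint64_t base_addr,
                            uint8_t *raw_key, size_t key_size)
{
    uint64_t mapped_key;
    int ret;

    ret = fi_mr_map_raw(domain, base_addr, raw_key, key_size,
                        &mapped_key, 0);
    if (ret)
        return ret;

    /* ... use mapped_key in fi_write/fi_read calls targeting the peer ... */

    /* Free the mapping before the domain is closed. */
    return fi_mr_unmap_key(domain, mapped_key);
}
```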
   fi_mr_bind
       The fi_mr_bind function associates a memory region with a counter or endpoint. Counter bindings are needed by providers that support the generation of completions based on fabric operations. Endpoint bindings are needed if the provider associates memory regions with endpoints (see FI_MR_ENDPOINT).

       When binding with a counter, the type of events tracked against the memory region is based on the bitwise OR of the following flags.

       FI_REMOTE_WRITE
              Generates an event whenever a remote RMA write or atomic operation modifies the memory region. Use of this flag requires that the endpoint through which the MR is accessed be created with the FI_RMA_EVENT capability.

       When binding the memory region to an endpoint, flags should be 0.

   fi_mr_refresh
       The use of this call is required to notify the provider of any change to the physical pages backing a registered memory region if the FI_MR_MMU_NOTIFY mode bit has been set. This call informs the provider that the page table entries associated with the region may have been modified, and the provider should verify and update the registered region accordingly. The iov parameter is optional and may be used to specify which portions of the registered region require updating. Providers are only guaranteed to update the specified address ranges.

       The refresh operation has the effect of disabling and re-enabling access to the registered region. Any operations from peers that attempt to access the region will fail while the refresh is occurring. Additionally, attempts to access the region by the local process through libfabric APIs may result in a page fault or other fatal operation.

       The fi_mr_refresh call is only needed if the physical pages might have been updated after the memory region was created.

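       Under FI_MR_MMU_NOTIFY, a refresh after committing pages might look like this sketch (the function name is illustrative):

```c
#include <sys/uio.h>
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>

/* Illustrative sketch: after the pages backing part of a region have been
 * allocated/committed, notify the provider before peers access them. */
static int notify_pages_ready(struct fid_mr *mr, void *addr, size_t len)
{
    struct iovec iov = {
        .iov_base = addr,  /* range whose backing pages changed */
        .iov_len  = len,
    };

    /* Only the specified ranges are guaranteed to be updated. */
    return fi_mr_refresh(mr, &iov, 1, 0);
}
```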
   fi_mr_enable
       The enable call is used with memory registration associated with the FI_MR_RMA_EVENT mode bit. Memory regions created in the disabled state must be explicitly enabled after being fully configured by the application. Any resource bindings to the MR must be done prior to enabling the MR.

MEMORY REGION ATTRIBUTES
       Memory regions are created using the following attributes. The struct fi_mr_attr is passed into fi_mr_regattr, but individual fields also apply to other memory registration calls, with the fields passed directly into calls as function parameters.

       struct fi_mr_attr {
           const struct iovec *mr_iov;
           size_t             iov_count;
           uint64_t           access;
           uint64_t           offset;
           uint64_t           requested_key;
           void               *context;
           size_t             auth_key_size;
           uint8_t            *auth_key;
           enum fi_hmem_iface iface;
           union {
               uint64_t       reserved;
               int            cuda;
               int            ze;
               int            neuron;
               int            synapseai;
           } device;
       };

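       As a sketch of filling these attributes for device memory (required when FI_MR_HMEM is set; the function name is illustrative, and 'cuda_ptr'/'cuda_dev' are assumed to come from cudaMalloc()/cudaGetDevice(), not shown):

```c
#include <sys/uio.h>
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>

/* Illustrative sketch: describe CUDA device memory with struct fi_mr_attr
 * and register it via fi_mr_regattr. */
static int register_cuda_buffer(struct fid_domain *domain, void *cuda_ptr,
                                size_t len, int cuda_dev, struct fid_mr **mr)
{
    struct iovec iov = { .iov_base = cuda_ptr, .iov_len = len };
    struct fi_mr_attr attr = {
        .mr_iov      = &iov,
        .iov_count   = 1,
        .access      = FI_SEND | FI_RECV,
        .iface       = FI_HMEM_CUDA,  /* memory managed via CUDA calls */
        .device.cuda = cuda_dev,      /* CUdevice ordinal */
    };

    return fi_mr_regattr(domain, &attr, 0, mr);
}
```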
   mr_iov
       This is an IO vector of addresses that will represent a single memory region. The number of entries in the iovec is specified by iov_count.

   iov_count
       The number of entries in the mr_iov array. The maximum number of memory buffers that may be associated with a single memory region is specified as the mr_iov_limit domain attribute. See fi_domain(3).

   access
       Indicates the type of operations that the local or a peer endpoint may perform on the registered memory region. Supported access permissions are the bitwise OR of the following flags:

       FI_SEND
              The memory buffer may be used in outgoing message data transfers. This includes fi_msg and fi_tagged send operations, as well as fi_collective operations.

       FI_RECV
              The memory buffer may be used to receive inbound message transfers. This includes fi_msg and fi_tagged receive operations, as well as fi_collective operations.

       FI_READ
              The memory buffer may be used as the result buffer for RMA read and atomic operations on the initiator side. Note that from the viewpoint of the application, the memory buffer is being written into by the network.

       FI_WRITE
              The memory buffer may be used as the source buffer for RMA write and atomic operations on the initiator side. Note that from the viewpoint of the application, the endpoint is reading from the memory buffer and copying the data onto the network.

       FI_REMOTE_READ
              The memory buffer may be used as the source buffer of an RMA read operation on the target side. The contents of the memory buffer are not modified by such operations.

       FI_REMOTE_WRITE
              The memory buffer may be used as the target buffer of an RMA write or atomic operation. The contents of the memory buffer may be modified as a result of such operations.

       FI_COLLECTIVE
              This flag provides an explicit indication that the memory buffer may be used with collective operations. Use of this flag is required if the FI_MR_COLLECTIVE mr_mode bit has been set on the domain. This flag should be paired with FI_SEND and/or FI_RECV.

       Note that some providers may not enforce fine grained access permissions. For example, a memory region registered for FI_WRITE access may also behave as if FI_SEND were specified as well. Relaxed enforcement of such access is permitted, though not guaranteed, provided security is maintained.

   offset
       The offset field is reserved for future use and must be 0.

   requested_key
       An application specified access key associated with the memory region. The MR key must be provided by a remote process when performing RMA or atomic operations to a memory region. Applications can use the requested_key field to indicate that a specific key be used by the provider. This allows applications to use well known key values, which can avoid applications needing to exchange and store keys. Support for user requested keys is provider specific and is determined by the FI_MR_PROV_KEY flag in the mr_mode domain attribute field.

   context
       Application context associated with asynchronous memory registration operations. This value is returned as part of any asynchronous event associated with the registration. This field is ignored for synchronous registration calls.

   auth_key_size
       The size, in bytes, of the key referenced by the auth_key field, or 0 if no authorization key is given. This field is ignored unless the fabric is opened with API version 1.5 or greater.

   auth_key
       Indicates the key to associate with this memory registration. Authorization keys are used to limit communication between endpoints. Only peer endpoints that are programmed to use the same authorization key may access the memory region. The domain authorization key will be used if the auth_key_size provided is 0. This field is ignored unless the fabric is opened with API version 1.5 or greater.

   iface
       Indicates the software interfaces used by the application to allocate and manage the memory region. This field is ignored unless the application has requested the FI_HMEM capability.

       FI_HMEM_SYSTEM
              Uses standard operating system calls and libraries, such as malloc, calloc, realloc, mmap, and free.

       FI_HMEM_CUDA
              Uses Nvidia CUDA interfaces such as cuMemAlloc, cuMemAllocHost, cuMemAllocManaged, cuMemFree, cudaMalloc, cudaFree.

       FI_HMEM_ROCR
              Uses AMD ROCR interfaces such as hsa_memory_allocate and hsa_memory_free.

       FI_HMEM_ZE
              Uses oneAPI Level Zero interfaces such as zeDriverAllocSharedMem, zeDriverFreeMem.

       FI_HMEM_NEURON
              Uses the AWS Neuron SDK to support AWS Trainium devices.

       FI_HMEM_SYNAPSEAI
              Uses the SynapseAI API to support Habana Gaudi devices.

   device
       Reserved 64 bits for a device identifier when using a non-standard HMEM interface. This field is ignored unless the iface field is valid.

       cuda   For FI_HMEM_CUDA, this is equivalent to CUdevice (int).

       ze     For FI_HMEM_ZE, this is equivalent to the index of the device in the ze_device_handle_t array. If there is only a single Level Zero driver present, an application may set this directly. However, it is recommended that this value be set using the fi_hmem_ze_device() macro, which will encode the driver index with the device.

       neuron For FI_HMEM_NEURON, the device identifier for AWS Trainium devices.

       synapseai
              For FI_HMEM_SYNAPSEAI, the device identifier for Habana Gaudi hardware.

   fi_hmem_ze_device
       Returns an hmem device identifier for a Level Zero <driver, device> tuple. The output of this call should be used to set fi_mr_attr::device.ze for FI_HMEM_ZE interfaces. The driver and device index values represent their 0-based positions in the arrays returned from zeDriverGet and zeDeviceGet, respectively.

NOTES
       Direct access to an application's memory by a remote peer requires that the application register the targeted memory buffer(s). This is typically done by calling one of the fi_mr_reg* routines. For FI_MR_PROV_KEY, the provider will return a key that must be used by the peer when accessing the memory region. The application is responsible for transferring this key to the peer. If FI_MR_RAW mode has been set, the key must be retrieved using the fi_mr_raw_attr function.

       FI_MR_RAW allows support for providers that require more than 8 bytes for their protection keys or need additional setup before a key can be used for transfers. After a raw key has been retrieved, it must be exchanged with the remote peer. The peer must use fi_mr_map_raw to convert the raw key into a usable 64-bit key. The mapping must be done even if the raw key is 64 bits or smaller.

       The raw key support functions are usable with all registered memory regions, even if FI_MR_RAW has not been set. It is recommended that portable applications use these interfaces; however, their use does carry extra message and memory footprint overhead, making them less desirable for highly scalable apps.

       There may be cases where device peer to peer support should not be used or cannot be used, such as when the PCIe ACS configuration does not permit the transfer. The FI_HMEM_DISABLE_P2P environment variable can be set to notify Libfabric that peer to peer transactions should not be used. The provider may choose to perform a copy instead, or will fail support for FI_HMEM if the provider is unable to do that.

681 The follow flag may be specified to any memory registration call.
682
683 FI_RMA_EVENT
684 This flag indicates that the specified memory region will be as‐
685 sociated with a completion counter used to count RMA operations
686 that access the MR.
687
688 FI_RMA_PMEM
689 This flag indicates that the underlying memory region is backed
690 by persistent memory and will be used in RMA operations. It
691 must be specified if persistent completion semantics or persis‐
692 tent data transfers are required when accessing the registered
693 region.
694
       FI_HMEM_DEVICE_ONLY
              This flag indicates that the memory is only accessible by a
              device.  The device is identified by the fi_mr_attr fields
              iface and device.  This refers to memory regions that were
              allocated using a device API AllocDevice call (as opposed to
              using the host allocation or unified/shared memory
              allocation).

       FI_HMEM_HOST_ALLOC
              This flag indicates that the memory is owned by the host
              only.  Whether it can be accessed by the device is
              implementation dependent.  The fi_mr_attr field iface is
              still used to identify the device API, but the field device
              is ignored.  This refers to memory regions that were
              allocated using a device API AllocHost call (as opposed to
              using malloc-like host allocation, unified/shared memory
              allocation, or AllocDevice).

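       For example, registering device-only GPU memory through
       fi_mr_regattr might be sketched as follows.  The domain, buffer,
       and CUDA device id are assumed to already exist, dev_buf is assumed
       to have been allocated with a device-only allocation call, and the
       access flags are illustrative:

```c
#include <string.h>
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>

/* Sketch: register CUDA device-only memory.  The provider must support
 * FI_HMEM for this registration to succeed. */
int register_cuda_mr(struct fid_domain *domain, void *dev_buf, size_t len,
                     int cuda_device_id, struct fid_mr **mr)
{
    struct fi_mr_attr attr;
    struct iovec iov;

    memset(&attr, 0, sizeof(attr));
    iov.iov_base = dev_buf;
    iov.iov_len = len;

    attr.mr_iov = &iov;
    attr.iov_count = 1;
    attr.access = FI_READ | FI_WRITE | FI_REMOTE_READ | FI_REMOTE_WRITE;
    attr.iface = FI_HMEM_CUDA;          /* device API owning the pages */
    attr.device.cuda = cuda_device_id;  /* which GPU the pages live on */

    /* FI_HMEM_DEVICE_ONLY tells the provider the buffer is not host
     * accessible, so it must not fall back to CPU copies. */
    return fi_mr_regattr(domain, &attr, FI_HMEM_DEVICE_ONLY, mr);
}
```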
MEMORY DOMAINS
       Memory domains identify the physical separation of memory, which
       may or may not be accessible through the same virtual address
       space.  Traditionally, applications only dealt with a single memory
       domain: that of host memory tightly coupled with the system CPUs.
       With the introduction of device and non-uniform memory subsystems,
       applications often need to be aware of which memory domain a
       particular virtual address maps to.

       As a general rule, separate physical devices can be considered to
       have their own memory domains.  For example, a NIC may have user
       accessible memory, and would be considered a separate memory domain
       from memory on a GPU.  Both the NIC and GPU memory domains are
       separate from host system memory.  Individual GPUs or computation
       accelerators may have distinct memory domains, or may be connected
       in such a way (e.g. a GPU specific fabric) that all GPUs would
       belong to the same memory domain.  Unfortunately, identifying
       memory domains is specific to each system and its physical and/or
       virtual configuration.

       Understanding memory domains in heterogeneous memory environments
       is important, as it can impact data ordering and visibility as
       viewed by an application.  It is also important to understand which
       memory domain an application is most tightly coupled to.  In most
       cases, applications are tightly coupled to host memory.  However,
       an application running directly on a GPU or NIC may be more tightly
       coupled to memory associated with those devices.

       Memory regions are often associated with a single memory domain.
       The domain is often indicated by the fi_mr_attr iface and device
       fields.  It is possible, however, for physical pages backing a
       virtual memory region to migrate between memory domains based on
       access patterns.  For example, the physical pages referenced by a
       virtual address range could migrate between host memory and GPU
       memory, depending on which computational unit is actively using it.

       See the fi_endpoint(3) and fi_cq(3) man pages for additional
       discussion on message, data, and completion ordering semantics,
       including the impact of memory domains.

RETURN VALUES
       Returns 0 on success.  On error, a negative value corresponding to
       a fabric errno is returned.

       Fabric errno values are defined in rdma/fi_errno.h.

ERRORS
       -FI_ENOKEY
              The requested_key is already in use.

       -FI_EKEYREJECTED
              The requested_key is not available.  The key may be out of
              the range supported by the provider, or the provider may not
              support user-requested memory registration keys.

       -FI_ENOSYS
              Returned by fi_mr_bind if the provider does not support
              reporting events based on access to registered memory
              regions.

       -FI_EBADFLAGS
              Returned if the specified flags are not supported by the
              provider.

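       A registration path that reacts to these errors might be sketched
       as follows.  The retry limit and access flags are illustrative, and
       whether a fallback registration yields a provider-selected key
       (readable afterwards via fi_mr_key) depends on the provider's
       mr_mode settings:

```c
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>
#include <rdma/fi_errno.h>

/* Sketch: try successive requested keys, falling back to letting the
 * provider pick when user-requested keys are rejected. */
int register_with_key(struct fid_domain *domain, const void *buf,
                      size_t len, struct fid_mr **mr)
{
    uint64_t key;
    int ret = -FI_ENOKEY;

    for (key = 0; key < 16 && ret == -FI_ENOKEY; key++) {
        /* -FI_ENOKEY: this requested_key is already in use; try the
         * next candidate. */
        ret = fi_mr_reg(domain, buf, len,
                        FI_REMOTE_READ | FI_REMOTE_WRITE,
                        0, key, 0, mr, NULL);
    }

    if (ret == -FI_EKEYREJECTED) {
        /* The provider does not accept user-requested keys, or the key
         * is out of range.  Register without insisting on a key and
         * read the assigned key back with fi_mr_key(*mr). */
        ret = fi_mr_reg(domain, buf, len,
                        FI_REMOTE_READ | FI_REMOTE_WRITE,
                        0, 0, 0, mr, NULL);
    }
    return ret;
}
```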
NOTES
       Many hardware NICs accessed by libfabric require that data buffers
       be registered with the hardware while the hardware accesses them.
       This ensures that the virtual to physical address mappings for
       those buffers do not change while the transfer is occurring.  The
       performance impact of registering memory regions can be
       significant.  As a result, some providers make use of a
       registration cache, particularly when working with applications
       that are unable to manage their own network buffers.  A
       registration cache avoids the overhead of registering and
       unregistering a data buffer with each transfer.

       If a registration cache is going to be used for host and device
       memory, the device must support unified virtual addressing.  If the
       device does not support unified virtual addressing, either an
       additional registration cache is required to track this device
       memory, or device memory cannot be cached.

       As a general rule, if hardware requires the FI_MR_LOCAL mode bit
       described above, but this is not supported by the application, a
       memory registration cache may be in use.  The following environment
       variables may be used to configure registration caches.

       FI_MR_CACHE_MAX_SIZE
              This defines the total number of bytes for all memory
              regions that may be tracked by the cache.  If not set, the
              cache has no limit on how many bytes may be registered and
              cached.  Setting this limits the amount of memory that can
              remain registered with a provider while not actively in use
              for a data transfer.  By default, the cache size is
              unlimited.

       FI_MR_CACHE_MAX_COUNT
              This defines the total number of memory regions that may be
              registered with the cache.  If not set, a default limit is
              chosen.  Setting this limits the number of regions,
              regardless of their size, that can remain registered while
              not actively in use for a data transfer.  Setting this to
              zero will disable registration caching.

       FI_MR_CACHE_MONITOR
              The cache monitor is responsible for detecting changes to
              the mappings between the virtual addresses used by an
              application and the underlying physical pages of system
              memory (FI_HMEM_SYSTEM).  Valid monitor options are:
              userfaultfd, memhooks, and disabled.  Selecting disabled
              will turn off the registration cache.  Userfaultfd is a
              Linux kernel feature used to report virtual to physical
              address mapping changes to user space.  Memhooks operates by
              intercepting relevant memory allocation and deallocation
              calls which may result in the mappings changing, such as
              malloc, mmap, free, etc.  Note that memhooks operates at the
              ELF linker layer and does not use glibc memory hooks.

       FI_MR_CUDA_CACHE_MONITOR_ENABLED
              The CUDA cache monitor is responsible for detecting changes
              to the mappings between the device virtual addresses used by
              an application and the underlying device physical pages of
              CUDA device memory (FI_HMEM_CUDA).  Valid options are: 0 or
              1.  Note that the CUDA memory monitor requires a CUDA
              toolkit version with unified virtual addressing enabled.

       FI_MR_ROCR_CACHE_MONITOR_ENABLED
              The ROCR cache monitor is responsible for detecting changes
              to the mappings between the device virtual addresses used by
              an application and the underlying device physical pages of
              ROCR device memory (FI_HMEM_ROCR).  Valid options are: 0 or
              1.  Note that the ROCR memory monitor requires a ROCR
              version with unified virtual addressing enabled.

       FI_MR_ZE_CACHE_MONITOR_ENABLED
              The ZE cache monitor is responsible for detecting changes to
              the mappings between the device virtual addresses used by an
              application and the underlying device physical pages of
              oneAPI Level Zero device memory (FI_HMEM_ZE).  Valid options
              are: 0 or 1.

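       These variables must be present in the environment before the
       library first reads its configuration, i.e. before the first call
       to fi_getinfo().  A sketch using setenv(3), with purely
       illustrative values:

```c
#include <stdlib.h>

/* Sketch: configure the registration cache before libfabric is
 * initialized.  The values below are examples, not recommendations. */
int set_mr_cache_env(void)
{
    /* Cap cached registrations at 4 GiB total. */
    if (setenv("FI_MR_CACHE_MAX_SIZE", "4294967296", 1))
        return -1;
    /* Allow at most 4096 cached regions; "0" would disable caching. */
    if (setenv("FI_MR_CACHE_MAX_COUNT", "4096", 1))
        return -1;
    /* Track mapping changes via the userfaultfd kernel mechanism. */
    if (setenv("FI_MR_CACHE_MONITOR", "userfaultfd", 1))
        return -1;
    return 0;
}
```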
       More direct access to the internal registration cache is possible
       through the fi_open() call, using the “mr_cache” service name.
       Once opened, custom memory monitors may be installed.  A memory
       monitor is a component of the cache responsible for detecting
       changes in virtual to physical address mappings.  Some level of
       control over the cache is possible through the above mentioned
       environment variables.

SEE ALSO
       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_rma(3), fi_msg(3),
       fi_atomic(3)

AUTHORS
       OpenFabrics.



Libfabric Programmer’s Manual          2023-03-10                   fi_mr(3)