fi_mr(3)                       Libfabric v1.15.1                      fi_mr(3)

NAME

       fi_mr - Memory region operations

       fi_mr_reg / fi_mr_regv / fi_mr_regattr
              Register local memory buffers for direct fabric access

       fi_close
              Deregister registered memory buffers.

       fi_mr_desc
              Return a local descriptor associated with a registered memory
              region

       fi_mr_key
              Return the remote key needed to access a registered memory
              region

       fi_mr_raw_attr
              Return raw memory region attributes.

       fi_mr_map_raw
              Converts a raw memory region key into a key that is usable for
              data transfer operations.

       fi_mr_unmap_key
              Releases a previously mapped raw memory region key.

       fi_mr_bind
              Associate a registered memory region with a completion counter
              or an endpoint.

       fi_mr_refresh
              Updates the memory pages associated with a memory region.

       fi_mr_enable
              Enables a memory region for use.


SYNOPSIS

       #include <rdma/fi_domain.h>

       int fi_mr_reg(struct fid_domain *domain, const void *buf, size_t len,
           uint64_t access, uint64_t offset, uint64_t requested_key,
           uint64_t flags, struct fid_mr **mr, void *context);

       int fi_mr_regv(struct fid_domain *domain, const struct iovec *iov,
           size_t count, uint64_t access, uint64_t offset,
           uint64_t requested_key, uint64_t flags, struct fid_mr **mr,
           void *context);

       int fi_mr_regattr(struct fid_domain *domain,
           const struct fi_mr_attr *attr, uint64_t flags, struct fid_mr **mr);

       int fi_close(struct fid *mr);

       void *fi_mr_desc(struct fid_mr *mr);

       uint64_t fi_mr_key(struct fid_mr *mr);

       int fi_mr_raw_attr(struct fid_mr *mr, uint64_t *base_addr,
           uint8_t *raw_key, size_t *key_size, uint64_t flags);

       int fi_mr_map_raw(struct fid_domain *domain, uint64_t base_addr,
           uint8_t *raw_key, size_t key_size, uint64_t *key, uint64_t flags);

       int fi_mr_unmap_key(struct fid_domain *domain, uint64_t key);

       int fi_mr_bind(struct fid_mr *mr, struct fid *bfid, uint64_t flags);

       int fi_mr_refresh(struct fid_mr *mr, const struct iovec *iov,
           size_t count, uint64_t flags);

       int fi_mr_enable(struct fid_mr *mr);


ARGUMENTS

       domain Resource domain

       mr     Memory region

       bfid   Fabric identifier of an associated resource.

       context
              User specified context associated with the memory region.

       buf    Memory buffer to register with the fabric hardware.

       len    Length of memory buffer to register. Must be > 0.

       iov    Vectored memory buffer.

       count  Count of vectored buffer entries.

       access Memory access permissions associated with registration

       offset Optional offset for accessing the registered buffers. This
              parameter is reserved for future use and must be 0.

       requested_key
              Requested remote key associated with registered buffers. This
              parameter is ignored if the FI_MR_PROV_KEY flag is set in the
              domain mr_mode bits.

       attr   Memory region attributes

       flags  Additional flags to apply to the operation.


DESCRIPTION

       Registered memory regions associate memory buffers with permissions
       granted for access by fabric resources. A memory buffer must be
       registered with a resource domain before it can be used as the target
       of a remote RMA or atomic data transfer. Additionally, a fabric
       provider may require that data buffers be registered before being used
       in local transfers. Memory registration restrictions are controlled
       using a separate set of mode bits, specified through the domain
       attributes (mr_mode field). Each mr_mode bit requires that an
       application take specific steps in order to use memory buffers with
       libfabric interfaces.

       The following apply to memory registration.

       Default Memory Registration
              If no mr_mode bits are set, the default behaviors described
              below are followed. Historically, these defaults were
              collectively referred to as scalable memory registration. The
              default requirements are outlined below, followed by
              definitions of how each mr_mode bit alters the definition.

       Compatibility: For library versions 1.4 and earlier, this was
       indicated by setting mr_mode to FI_MR_SCALABLE and the fi_info mode
       bit FI_LOCAL_MR to 0. FI_MR_SCALABLE and FI_LOCAL_MR were deprecated
       in libfabric version 1.5, though they are supported for backwards
       compatibility purposes.

       For security, memory registration is required for data buffers that
       are accessed directly by a peer process. For example, registration is
       required for RMA target buffers (read or written to), and those
       accessed by atomic or collective operations.

       By default, registration occurs on virtual address ranges. Because
       registration refers to address ranges, rather than allocated data
       buffers, the address ranges do not need to map to data buffers
       allocated by the application at the time the registration call is
       made. That is, an application can register any range of addresses in
       their virtual address space, whether or not those addresses are backed
       by physical pages or have been allocated.

       Note that physical pages must back addresses prior to the addresses
       being accessed as part of a data transfer operation, or the data
       transfers will fail. Additionally, depending on the operation, this
       could result in the local process receiving a segmentation fault for
       accessing invalid memory.

       Once registered, the resulting memory regions are accessible by peers
       starting at a base address of 0. That is, the target address that is
       specified is a byte offset into the registered region.

       The application also selects the access key associated with the MR.
       The key size is restricted to a maximum of 8 bytes.

       With scalable registration, locally accessed data buffers are not
       registered. This includes source buffers for all transmit operations
       – sends, tagged sends, RMA, and atomics – as well as buffers posted
       for receive and tagged receive operations.

       Although the default memory registration behavior is convenient for
       application developers, it is difficult to implement in hardware.
       Attempts to hide the hardware requirements from the application often
       result in significant and unacceptable impacts to performance. The
       following mr_mode bits are provided as input into fi_getinfo. If a
       provider requires the behavior defined for an mr_mode bit, it will
       leave the bit set on output to fi_getinfo. Otherwise, the provider can
       clear the bit to indicate that the behavior is not needed.

       By setting an mr_mode bit, the application has agreed to adjust its
       behavior as indicated. Importantly, applications that choose to
       support an mr_mode must be prepared to handle the case where the
       mr_mode is not required. A provider will clear an mr_mode bit if it is
       not needed.

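       As an illustration of this negotiation, the sketch below requests a
       set of mr_mode bits through the fi_getinfo hints and then checks which
       bits the provider left set. The endpoint type, capability bits, and
       error handling are placeholder assumptions, not requirements of this
       interface.

       #include <stdbool.h>
       #include <rdma/fabric.h>
       #include <rdma/fi_domain.h>

       static int query_mr_modes(struct fi_info **info)
       {
           struct fi_info *hints = fi_allocinfo();
           int ret;

           if (!hints)
               return -FI_ENOMEM;

           /* Assumed example settings; adjust for the application. */
           hints->ep_attr->type = FI_EP_RDM;
           hints->caps = FI_MSG | FI_RMA;

           /* Advertise which registration behaviors we can handle. */
           hints->domain_attr->mr_mode = FI_MR_LOCAL | FI_MR_ALLOCATED |
                                         FI_MR_VIRT_ADDR | FI_MR_PROV_KEY;

           ret = fi_getinfo(FI_VERSION(1, 15), NULL, NULL, 0, hints, info);
           fi_freeinfo(hints);
           if (ret)
               return ret;

           /* Bits still set on output are actually required by the provider. */
           bool need_local_reg = (*info)->domain_attr->mr_mode & FI_MR_LOCAL;
           (void)need_local_reg;  /* drive later registration decisions */

           return 0;
       }
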
       FI_MR_LOCAL
              When the FI_MR_LOCAL mode bit is set, applications must
              register all data buffers that will be accessed by the local
              hardware and provide a valid desc parameter into applicable
              data transfer operations. When FI_MR_LOCAL is zero,
              applications are not required to register data buffers before
              using them for local operations (e.g. send and receive data
              buffers). The desc parameter into data transfer operations will
              be ignored in this case, unless otherwise required (e.g. see
              FI_MR_HMEM). It is recommended that applications pass in NULL
              for desc when not required.

       A provider may hide local registration requirements from applications
       by making use of an internal registration cache or similar mechanisms.
       Such mechanisms, however, may negatively impact performance for some
       applications, notably those which manage their own network buffers. In
       order to support as broad a range of applications as possible, without
       unduly affecting their performance, applications that wish to manage
       their own local memory registrations may do so by using the memory
       registration calls.

       Note: the FI_MR_LOCAL mr_mode bit replaces the FI_LOCAL_MR fi_info
       mode bit. When FI_MR_LOCAL is set, FI_LOCAL_MR is ignored.

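       The sketch below shows one way to satisfy FI_MR_LOCAL for a send
       buffer: the buffer is registered with FI_SEND access and the resulting
       descriptor is passed to fi_send. The domain and endpoint are assumed
       to have been opened already, and the destination address and buffer
       lifetime management are placeholders.

       #include <rdma/fi_domain.h>
       #include <rdma/fi_endpoint.h>

       /* Minimal sketch, assuming FI_MR_LOCAL was reported by the provider. */
       static ssize_t send_registered(struct fid_domain *domain,
                                      struct fid_ep *ep, const void *buf,
                                      size_t len, fi_addr_t dest)
       {
           struct fid_mr *mr;
           ssize_t ret;

           /* FI_SEND permission covers use as an outgoing message buffer. */
           ret = fi_mr_reg(domain, buf, len, FI_SEND, 0, 0, 0, &mr, NULL);
           if (ret)
               return ret;

           /* With FI_MR_LOCAL set, a valid desc must accompany the transfer. */
           ret = fi_send(ep, buf, len, fi_mr_desc(mr), dest, NULL);

           /* Real code would close the MR only after the send completes. */
           return ret;
       }
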
       FI_MR_RAW
              Raw memory regions are used to support providers with keys
              larger than 64 bits or that require setup at the peer. When the
              FI_MR_RAW bit is set, applications must use fi_mr_raw_attr()
              locally and fi_mr_map_raw() at the peer before targeting a
              memory region as part of any data transfer request.

       FI_MR_VIRT_ADDR
              The FI_MR_VIRT_ADDR bit indicates that the provider references
              memory regions by virtual address, rather than a 0-based
              offset. Peers that target memory regions registered with
              FI_MR_VIRT_ADDR specify the destination memory buffer using the
              target’s virtual address, with any offset into the region
              specified as virtual address + offset. Support of this bit
              typically implies that peers must exchange addressing data
              prior to initiating any RMA or atomic operation.

       FI_MR_ALLOCATED
              When set, all registered memory regions must be backed by
              physical memory pages at the time the registration call is
              made.

       FI_MR_PROV_KEY
              This memory region mode indicates that the provider does not
              support application requested MR keys. MR keys are returned by
              the provider. Applications that support FI_MR_PROV_KEY can
              obtain the provider key using fi_mr_key(), unless FI_MR_RAW is
              also set. The returned key should then be exchanged with peers
              prior to initiating an RMA or atomic operation.

       FI_MR_MMU_NOTIFY
              FI_MR_MMU_NOTIFY is typically set by providers that support
              memory registration against memory regions that are not
              necessarily backed by allocated physical pages at the time the
              memory registration occurs. (That is, FI_MR_ALLOCATED is
              typically 0.) However, such providers require that applications
              notify the provider prior to the MR being accessed as part of a
              data transfer operation. This notification informs the provider
              that all necessary physical pages now back the region. The
              notification is necessary for providers that cannot hook
              directly into the operating system page tables or memory
              management unit. See fi_mr_refresh() for notification details.

       FI_MR_RMA_EVENT
              This mode bit indicates that the provider must configure memory
              regions that are associated with RMA events prior to their use.
              This includes all memory regions that are associated with
              completion counters. When set, applications must indicate if a
              memory region will be associated with a completion counter as
              part of the region’s creation. This is done by passing in the
              FI_RMA_EVENT flag to the memory registration call.

       Such memory regions will be created in a disabled state and must be
       associated with all completion counters prior to being enabled. To
       enable a memory region, the application must call fi_mr_enable().
       After calling fi_mr_enable(), no further resource bindings may be made
       to the memory region.

       FI_MR_ENDPOINT
              This mode bit indicates that the provider associates memory
              regions with endpoints rather than domains. Memory regions that
              are registered with the provider are created in a disabled
              state and must be bound to an endpoint prior to being enabled.
              To bind the MR with an endpoint, the application must use
              fi_mr_bind(). To enable the memory region, the application must
              call fi_mr_enable(), as shown in the sketch below.

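       A minimal sketch of the bind-then-enable sequence that FI_MR_ENDPOINT
       requires, assuming the memory region and endpoint have already been
       created; error handling is abbreviated.

       #include <rdma/fi_domain.h>
       #include <rdma/fi_endpoint.h>

       /* mr was registered against a domain whose mr_mode includes
        * FI_MR_ENDPOINT, and ep is an opened endpoint. */
       static int attach_mr_to_ep(struct fid_mr *mr, struct fid_ep *ep)
       {
           int ret;

           /* Associate the disabled MR with the endpoint; flags must be 0. */
           ret = fi_mr_bind(mr, &ep->fid, 0);
           if (ret)
               return ret;

           /* Enable the MR once all bindings are in place. */
           return fi_mr_enable(mr);
       }
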
       FI_MR_HMEM
              This mode bit is associated with the FI_HMEM capability. If
              FI_MR_HMEM is set, the application must register buffers that
              were allocated using a device call and provide a valid desc
              parameter into applicable data transfer operations even if they
              are only used for local operations (e.g. send and receive data
              buffers). Device memory must be registered using the
              fi_mr_regattr call, with the iface and device fields filled
              out, as illustrated below.

       If FI_MR_HMEM is set, but FI_MR_LOCAL is unset, only device buffers
       must be registered when used locally. In this case, the desc parameter
       passed into data transfer operations must either be valid or NULL.
       Similarly, if FI_MR_LOCAL is set, but FI_MR_HMEM is not, the desc
       parameter must either be valid or NULL.

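       A minimal sketch of registering device memory through fi_mr_regattr,
       assuming a CUDA allocation, a domain opened with the FI_HMEM
       capability, and device ordinal 0; the access bits are illustrative
       assumptions.

       #include <string.h>
       #include <sys/uio.h>
       #include <rdma/fi_domain.h>

       /* Register a CUDA device buffer as the target of remote RMA writes.
        * buf is assumed to have been allocated with a CUDA allocator. */
       static int reg_cuda_buffer(struct fid_domain *domain, void *buf,
                                  size_t len, struct fid_mr **mr)
       {
           struct iovec iov = { .iov_base = buf, .iov_len = len };
           struct fi_mr_attr attr;

           memset(&attr, 0, sizeof attr);
           attr.mr_iov = &iov;
           attr.iov_count = 1;
           attr.access = FI_REMOTE_WRITE;
           attr.iface = FI_HMEM_CUDA;   /* memory was allocated via CUDA */
           attr.device.cuda = 0;        /* CUdevice ordinal, assumed */

           return fi_mr_regattr(domain, &attr, 0, mr);
       }
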
       FI_MR_COLLECTIVE
              This bit is associated with the FI_COLLECTIVE capability. When
              set, the provider requires that memory regions used in
              collective operations must explicitly be registered for use
              with collective calls. This requires registering regions passed
              to collective calls using the FI_COLLECTIVE flag.

       Basic Memory Registration
              Basic memory registration was deprecated in libfabric version
              1.5, but is supported for backwards compatibility. Basic memory
              registration is indicated by setting mr_mode equal to
              FI_MR_BASIC. FI_MR_BASIC must be set alone and not paired with
              other mr_mode bits. Unlike other mr_mode bits, if FI_MR_BASIC
              is set on input to fi_getinfo(), it will not be cleared by the
              provider. That is, setting mr_mode equal to FI_MR_BASIC forces
              basic registration if the provider supports it.

       The behavior of basic registration is equivalent to requiring the
       following mr_mode bits: FI_MR_VIRT_ADDR, FI_MR_ALLOCATED, and
       FI_MR_PROV_KEY. Additionally, providers that support basic
       registration usually require the (deprecated) fi_info mode bit
       FI_LOCAL_MR, which was incorporated into the FI_MR_LOCAL mr_mode bit.

       The registration functions – fi_mr_reg, fi_mr_regv, and fi_mr_regattr
       – are used to register one or more memory regions with fabric
       resources. The main difference between the registration functions is
       the number and type of parameters that they accept as input.
       Otherwise, they perform the same general function.

       By default, memory registration completes synchronously. That is, the
       registration call will not return until the registration has
       completed. Memory registration can complete asynchronously by binding
       the resource domain to an event queue using the FI_REG_MR flag. See
       fi_domain_bind. When memory registration is asynchronous, in order to
       avoid a race condition between the registration call returning and the
       corresponding reading of the event from the EQ, the mr output
       parameter will be written before any event associated with the
       operation may be read by the application. An asynchronous event will
       not be generated unless the registration call returns success (0).

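       A sketch of the asynchronous flow described above: the domain is bound
       to an event queue with FI_REG_MR, and the registration completion is
       then read from the EQ. The event queue is assumed to have been opened
       from the same fabric, and the busy-wait loop and access bits are
       simplified assumptions.

       #include <rdma/fi_domain.h>
       #include <rdma/fi_eq.h>

       static int reg_async(struct fid_domain *domain, struct fid_eq *eq,
                            void *buf, size_t len, struct fid_mr **mr)
       {
           struct fi_eq_entry entry;
           uint32_t event;
           ssize_t rd;
           int ret;

           /* Route MR registration completions to the EQ. */
           ret = fi_domain_bind(domain, &eq->fid, FI_REG_MR);
           if (ret)
               return ret;

           ret = fi_mr_reg(domain, buf, len, FI_REMOTE_WRITE, 0, 0, 0,
                           mr, NULL);
           if (ret)
               return ret;

           /* Wait for the FI_MR_COMPLETE event before using the region. */
           do {
               rd = fi_eq_read(eq, &event, &entry, sizeof entry, 0);
           } while (rd == -FI_EAGAIN);

           return (rd == (ssize_t) sizeof entry && event == FI_MR_COMPLETE) ?
                  0 : -FI_EOTHER;
       }
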
   fi_mr_reg
       The fi_mr_reg call registers the user-specified memory buffer with the
       resource domain. The buffer is enabled for access by the fabric
       hardware based on the provided access permissions. See the access
       field description for memory region attributes below.

       Registered memory is associated with a local memory descriptor and,
       optionally, a remote memory key. A memory descriptor is a provider
       specific identifier associated with registered memory. Memory
       descriptors often map to hardware specific indices or keys associated
       with the memory region. Remote memory keys provide limited protection
       against unwanted access by a remote node. Remote accesses to a memory
       region must provide the key associated with the registration.

       Because MR keys must be provided by a remote process, an application
       can use the requested_key parameter to indicate that a specific key
       value be returned. Support for user requested keys is provider
       specific and is determined by the FI_MR_PROV_KEY flag value in the
       mr_mode domain attribute.

       Remote RMA and atomic operations indicate the location within a
       registered memory region by specifying an address. The location is
       referenced by adding the offset to either the base virtual address of
       the buffer or to 0, depending on the mr_mode.

       The offset parameter is reserved for future use and must be 0.

       For asynchronous memory registration requests, the result will be
       reported to the user through an event queue associated with the
       resource domain. If successful, the allocated memory region structure
       will be returned to the user through the mr parameter. The mr address
       must remain valid until the registration operation completes. The
       context specified with the registration request is returned with the
       completion event.

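       As a usage illustration, the sketch below registers a buffer as an RMA
       target and retrieves the values a peer would need. The requested key
       of 1 is an arbitrary assumption and is only honored when the provider
       did not require FI_MR_PROV_KEY; the exchange of the key with peers is
       left to the application.

       #include <inttypes.h>
       #include <stdio.h>
       #include <rdma/fi_domain.h>

       /* domain is assumed to be an opened resource domain. */
       static int expose_rma_buffer(struct fid_domain *domain, void *buf,
                                    size_t len)
       {
           struct fid_mr *mr;
           uint64_t key;
           int ret;

           ret = fi_mr_reg(domain, buf, len, FI_REMOTE_WRITE | FI_REMOTE_READ,
                           0 /* offset */, 1 /* requested_key */, 0,
                           &mr, NULL);
           if (ret)
               return ret;

           key = fi_mr_key(mr);
           if (key == FI_KEY_NOTAVAIL)
               return -FI_ENOKEY;  /* e.g. FI_MR_RAW; see fi_mr_raw_attr */

           /* The key (and, with FI_MR_VIRT_ADDR, the buffer address) must
            * be conveyed to peers before they issue RMA operations. */
           printf("rkey=%" PRIu64 "\n", key);
           return 0;
       }
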
   fi_mr_regv
       The fi_mr_regv call adds support for a scatter-gather list to
       fi_mr_reg. Multiple memory buffers are registered as a single memory
       region. Otherwise, the operation is the same.

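       For example, two separate buffers could be combined into one region as
       sketched below. The buffer sources are assumptions, and the count must
       not exceed the mr_iov_limit domain attribute.

       #include <sys/uio.h>
       #include <rdma/fi_domain.h>

       /* Register two discontiguous buffers as a single remotely readable
        * memory region. */
       static int reg_two_buffers(struct fid_domain *domain,
                                  void *a, size_t alen, void *b, size_t blen,
                                  struct fid_mr **mr)
       {
           struct iovec iov[2] = {
               { .iov_base = a, .iov_len = alen },
               { .iov_base = b, .iov_len = blen },
           };

           return fi_mr_regv(domain, iov, 2, FI_REMOTE_READ, 0, 0, 0,
                             mr, NULL);
       }
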
   fi_mr_regattr
       The fi_mr_regattr call is a more generic, extensible registration call
       that allows the user to specify the registration request using a
       struct fi_mr_attr (defined below).

   fi_close
       fi_close is used to release all resources associated with registering
       a memory region. Once unregistered, further access to the registered
       memory is not guaranteed. Active or queued operations that reference a
       memory region being closed may fail or result in accesses to invalid
       memory. Applications are responsible for ensuring that a MR is no
       longer needed prior to closing it. Note that accesses to a closed MR
       from a remote peer will result in an error at the peer. The state of
       the local endpoint will be unaffected.

       When closing the MR, there must be no opened endpoints or counters
       associated with the MR. If resources are still associated with the MR
       when attempting to close, the call will return -FI_EBUSY.

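       A small sketch of the teardown path: the MR’s fid is passed to
       fi_close, and -FI_EBUSY indicates that an endpoint or counter binding
       is still outstanding. The recovery policy shown is an assumption.

       #include <rdma/fi_domain.h>

       /* Release a memory region once no transfers reference it. */
       static int release_mr(struct fid_mr *mr)
       {
           int ret = fi_close(&mr->fid);

           if (ret == -FI_EBUSY) {
               /* A bound endpoint or counter is still open; close those
                * resources (or drop the bindings) and try again. */
           }
           return ret;
       }
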
   fi_mr_desc
       Obtains the local memory descriptor associated with a MR. The memory
       registration must have completed successfully before invoking this
       call.

   fi_mr_key
       Returns the remote protection key associated with a MR. The memory
       registration must have completed successfully before invoking this
       call. The returned key may be used in data transfer operations at a
       peer. If the FI_MR_RAW mode bit has been set for the domain, then the
       memory key must be obtained using the fi_mr_raw_attr function instead.
       A return value of FI_KEY_NOTAVAIL will be returned if the registration
       has not completed or a raw memory key is required.

   fi_mr_raw_attr
       Returns the raw, remote protection key and base address associated
       with a MR. The memory registration must have completed successfully
       before invoking this routine. Use of this call is required if the
       FI_MR_RAW mode bit has been set by the provider; however, it is safe
       to use this call with any memory region.

       On input, the key_size parameter should indicate the size of the
       raw_key buffer. If the actual key is larger than what can fit into the
       buffer, the call will return -FI_ETOOSMALL. On output, key_size is set
       to the size of the buffer needed to store the key, which may be larger
       than the input value. The needed key_size can also be obtained through
       the mr_key_size domain attribute (fi_domain_attr) field.

       A raw key must be mapped by a peer before it can be used in data
       transfer operations. See fi_mr_map_raw below.

   fi_mr_map_raw
       Raw protection keys must be mapped to a usable key value before they
       can be used for data transfer operations. The mapping is done by the
       peer that initiates the RMA or atomic operation. The mapping function
       takes as input the raw key and its size, and returns the mapped key.
       Use of the fi_mr_map_raw function is required if the peer has the
       FI_MR_RAW mode bit set, but this routine may be called on any valid
       key. All mapped keys must be freed by calling fi_mr_unmap_key when
       access to the peer memory region is no longer necessary.

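       To make the two halves of this exchange concrete, the sketch below
       shows the owner querying its raw key and the initiator mapping it. How
       base_addr, the key bytes, and the key size travel between peers is
       left as an assumption, as is sizing the raw key buffer from the
       mr_key_size domain attribute.

       #include <rdma/fi_domain.h>

       /* Owner side: fetch the raw key for an already registered MR. The
        * raw_key buffer is sized by the caller, e.g. from mr_key_size. */
       static int get_raw_key(struct fid_mr *mr, uint64_t *base_addr,
                              uint8_t *raw_key, size_t *key_size)
       {
           /* On entry *key_size holds the buffer size; on success it holds
            * the actual key size. -FI_ETOOSMALL means the buffer was too
            * small. */
           return fi_mr_raw_attr(mr, base_addr, raw_key, key_size, 0);
       }

       /* Initiator side: convert the received raw key into a usable 64-bit
        * key, then release it with fi_mr_unmap_key when access to the peer
        * MR is no longer needed. */
       static int map_peer_key(struct fid_domain *domain, uint64_t base_addr,
                               uint8_t *raw_key, size_t key_size,
                               uint64_t *key)
       {
           return fi_mr_map_raw(domain, base_addr, raw_key, key_size, key, 0);
       }
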
   fi_mr_unmap_key
       This call releases any resources that may have been allocated as part
       of mapping a raw memory key. All mapped keys must be freed before the
       corresponding domain is closed.

   fi_mr_bind
       The fi_mr_bind function associates a memory region with a counter or
       endpoint. Counter bindings are needed by providers that support the
       generation of completions based on fabric operations. Endpoint
       bindings are needed if the provider associates memory regions with
       endpoints (see FI_MR_ENDPOINT).

       When binding with a counter, the type of events tracked against the
       memory region is based on the bitwise OR of the following flags.

       FI_REMOTE_WRITE
              Generates an event whenever a remote RMA write or atomic
              operation modifies the memory region. Use of this flag requires
              that the endpoint through which the MR is accessed be created
              with the FI_RMA_EVENT capability.

       When binding the memory region to an endpoint, flags should be 0.

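       A sketch of the counter case, assuming the domain reported
       FI_MR_RMA_EVENT and that a counter has already been opened: the MR is
       registered with the FI_RMA_EVENT flag, bound to the counter, and then
       enabled.

       #include <rdma/fi_domain.h>

       /* Count remote RMA writes into buf using an existing counter. */
       static int count_remote_writes(struct fid_domain *domain,
                                      struct fid_cntr *cntr, void *buf,
                                      size_t len, struct fid_mr **mr)
       {
           int ret;

           /* FI_RMA_EVENT marks the MR as counter-associated at creation. */
           ret = fi_mr_reg(domain, buf, len, FI_REMOTE_WRITE, 0, 0,
                           FI_RMA_EVENT, mr, NULL);
           if (ret)
               return ret;

           ret = fi_mr_bind(*mr, &cntr->fid, FI_REMOTE_WRITE);
           if (ret)
               return ret;

           /* All bindings are done; enable before any peer accesses it. */
           return fi_mr_enable(*mr);
       }
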
   fi_mr_refresh
       The use of this call is required to notify the provider of any change
       to the physical pages backing a registered memory region if the
       FI_MR_MMU_NOTIFY mode bit has been set. This call informs the provider
       that the page table entries associated with the region may have been
       modified, and the provider should verify and update the registered
       region accordingly. The iov parameter is optional and may be used to
       specify which portions of the registered region require updating.
       Providers are only guaranteed to update the specified address ranges.

       The refresh operation has the effect of disabling and re-enabling
       access to the registered region. Any operations from peers that
       attempt to access the region will fail while the refresh is occurring.
       Additionally, attempts to access the region by the local process
       through libfabric APIs may result in a page fault or other fatal
       error.

       The fi_mr_refresh call is only needed if the physical pages might have
       been updated after the memory region was created.

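       For example, after newly faulting in or remapping part of a region,
       the updated range could be reported as sketched below; the range
       passed in is an illustrative assumption.

       #include <sys/uio.h>
       #include <rdma/fi_domain.h>

       /* Tell the provider that [addr, addr + len) inside an existing MR
        * now has valid page mappings (FI_MR_MMU_NOTIFY providers only). */
       static int refresh_range(struct fid_mr *mr, void *addr, size_t len)
       {
           struct iovec iov = { .iov_base = addr, .iov_len = len };

           return fi_mr_refresh(mr, &iov, 1, 0);
       }
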
   fi_mr_enable
       The enable call is used with memory registration associated with the
       FI_MR_RMA_EVENT mode bit. Memory regions created in the disabled state
       must be explicitly enabled after being fully configured by the
       application. Any resource bindings to the MR must be done prior to
       enabling the MR.


MEMORY REGION ATTRIBUTES

       Memory regions are created using the following attributes. The struct
       fi_mr_attr is passed into fi_mr_regattr, but individual fields also
       apply to other memory registration calls, with the fields passed
       directly into calls as function parameters.

              struct fi_mr_attr {
                  const struct iovec *mr_iov;
                  size_t             iov_count;
                  uint64_t           access;
                  uint64_t           offset;
                  uint64_t           requested_key;
                  void               *context;
                  size_t             auth_key_size;
                  uint8_t            *auth_key;
                  enum fi_hmem_iface iface;
                  union {
                      uint64_t       reserved;
                      int            cuda;
                      int            ze;
                  } device;
              };

   mr_iov
       This is an IO vector of addresses that will represent a single memory
       region. The number of entries in the iovec is specified by iov_count.

   iov_count
       The number of entries in the mr_iov array. The maximum number of
       memory buffers that may be associated with a single memory region is
       specified as the mr_iov_limit domain attribute. See fi_domain(3).

   access
       Indicates the type of operations that the local or a peer endpoint may
       perform on the registered memory region. Supported access permissions
       are the bitwise OR of the following flags:

       FI_SEND
              The memory buffer may be used in outgoing message data
              transfers. This includes fi_msg and fi_tagged send operations,
              as well as fi_collective operations.

       FI_RECV
              The memory buffer may be used to receive inbound message
              transfers. This includes fi_msg and fi_tagged receive
              operations, as well as fi_collective operations.

       FI_READ
              The memory buffer may be used as the result buffer for RMA read
              and atomic operations on the initiator side. Note that from the
              viewpoint of the application, the memory buffer is being
              written into by the network.

       FI_WRITE
              The memory buffer may be used as the source buffer for RMA
              write and atomic operations on the initiator side. Note that
              from the viewpoint of the application, the endpoint is reading
              from the memory buffer and copying the data onto the network.

       FI_REMOTE_READ
              The memory buffer may be used as the source buffer of an RMA
              read operation on the target side. The contents of the memory
              buffer are not modified by such operations.

       FI_REMOTE_WRITE
              The memory buffer may be used as the target buffer of an RMA
              write or atomic operation. The contents of the memory buffer
              may be modified as a result of such operations.

       FI_COLLECTIVE
              This flag provides an explicit indication that the memory
              buffer may be used with collective operations. Use of this flag
              is required if the FI_MR_COLLECTIVE mr_mode bit has been set on
              the domain. This flag should be paired with FI_SEND and/or
              FI_RECV.

       Note that some providers may not enforce fine grained access
       permissions. For example, a memory region registered for FI_WRITE
       access may also behave as if FI_SEND were specified as well. Relaxed
       enforcement of such access is permitted, though not guaranteed,
       provided security is maintained.

   offset
       The offset field is reserved for future use and must be 0.

   requested_key
       An application specified access key associated with the memory region.
       The MR key must be provided by a remote process when performing RMA or
       atomic operations to a memory region. Applications can use the
       requested_key field to indicate that a specific key be used by the
       provider. This allows applications to use well known key values, which
       can avoid applications needing to exchange and store keys. Support for
       user requested keys is provider specific and is determined by the
       FI_MR_PROV_KEY flag in the mr_mode domain attribute field.

   context
       Application context associated with asynchronous memory registration
       operations. This value is returned as part of any asynchronous event
       associated with the registration. This field is ignored for
       synchronous registration calls.

   auth_key_size
       The size of the key referenced by the auth_key field, in bytes, or 0
       if no authorization key is given. This field is ignored unless the
       fabric is opened with API version 1.5 or greater.

   auth_key
       Indicates the key to associate with this memory registration.
       Authorization keys are used to limit communication between endpoints.
       Only peer endpoints that are programmed to use the same authorization
       key may access the memory region. The domain authorization key will be
       used if the auth_key_size provided is 0. This field is ignored unless
       the fabric is opened with API version 1.5 or greater.

   iface
       Indicates the software interfaces used by the application to allocate
       and manage the memory region. This field is ignored unless the
       application has requested the FI_HMEM capability.

       FI_HMEM_SYSTEM
              Uses standard operating system calls and libraries, such as
              malloc, calloc, realloc, mmap, and free.

       FI_HMEM_CUDA
              Uses Nvidia CUDA interfaces such as cuMemAlloc, cuMemAllocHost,
              cuMemAllocManaged, cuMemFree, cudaMalloc, cudaFree.

       FI_HMEM_ROCR
              Uses AMD ROCR interfaces such as hsa_memory_allocate and
              hsa_memory_free.

       FI_HMEM_ZE
              Uses oneAPI Level Zero interfaces such as
              zeDriverAllocSharedMem and zeDriverFreeMem.

       FI_HMEM_NEURON
              Uses the AWS Neuron SDK to support AWS Trainium devices.

   device
       Reserved 64 bits for a device identifier if using a non-standard HMEM
       interface. This field is ignored unless the iface field is valid.

       cuda   For FI_HMEM_CUDA, this is equivalent to CUdevice (int).

       ze     For FI_HMEM_ZE, this is equivalent to the ze_device_handle_t
              index (int).

       neuron For FI_HMEM_NEURON, the device identifier for AWS Trainium
              devices.


NOTES

       Direct access to an application’s memory by a remote peer requires
       that the application register the targeted memory buffer(s). This is
       typically done by calling one of the fi_mr_reg* routines. For
       FI_MR_PROV_KEY, the provider will return a key that must be used by
       the peer when accessing the memory region. The application is
       responsible for transferring this key to the peer. If FI_MR_RAW mode
       has been set, the key must be retrieved using the fi_mr_raw_attr
       function.

       FI_MR_RAW allows support for providers that require more than 8 bytes
       for their protection keys or need additional setup before a key can be
       used for transfers. After a raw key has been retrieved, it must be
       exchanged with the remote peer. The peer must use fi_mr_map_raw to
       convert the raw key into a usable 64-bit key. The mapping must be done
       even if the raw key is 64 bits or smaller.

       The raw key support functions are usable with all registered memory
       regions, even if FI_MR_RAW has not been set. It is recommended that
       portable applications use those interfaces; however, their use does
       carry extra message and memory footprint overhead, making it less
       desirable for highly scalable apps.

       There may be cases where device peer to peer support should not be
       used or cannot be used, such as when the PCIe ACS configuration does
       not permit the transfer. The FI_HMEM_DISABLE_P2P environment variable
       can be set to notify Libfabric that peer to peer transactions should
       not be used. The provider may choose to perform a copy instead, or
       will fail to support FI_HMEM if it is unable to do so.


FLAGS

       The following flags may be specified in any memory registration call.

       FI_RMA_EVENT
              This flag indicates that the specified memory region will be
              associated with a completion counter used to count RMA
              operations that access the MR.

       FI_RMA_PMEM
              This flag indicates that the underlying memory region is backed
              by persistent memory and will be used in RMA operations. It
              must be specified if persistent completion semantics or
              persistent data transfers are required when accessing the
              registered region.

       FI_HMEM_DEVICE_ONLY
              This flag indicates that the memory is only accessible by a
              device. The device is specified by the fi_mr_attr fields iface
              and device. This refers to memory regions that were allocated
              using a device API AllocDevice call (as opposed to using the
              host allocation or unified/shared memory allocation).

       FI_HMEM_HOST_ALLOC
              This flag indicates that the memory is owned by the host only.
              Whether it can be accessed by the device is implementation
              dependent. The fi_mr_attr field iface is still used to identify
              the device API, but the field device is ignored. This refers to
              memory regions that were allocated using a device API AllocHost
              call (as opposed to using malloc-like host allocation,
              unified/shared memory allocation, or AllocDevice).


MEMORY DOMAINS

       Memory domains identify the physical separation of memory, which may
       or may not be accessible through the same virtual address space.
       Traditionally, applications only dealt with a single memory domain,
       that of host memory tightly coupled with the system CPUs. With the
       introduction of device and non-uniform memory subsystems, applications
       often need to be aware of which memory domain a particular virtual
       address maps to.

       As a general rule, separate physical devices can be considered to have
       their own memory domains. For example, a NIC may have user accessible
       memory, and would be considered a separate memory domain from memory
       on a GPU. Both the NIC and GPU memory domains are separate from host
       system memory. Individual GPUs or computation accelerators may have
       distinct memory domains, or may be connected in such a way (e.g. a GPU
       specific fabric) that all GPUs would belong to the same memory domain.
       Unfortunately, identifying memory domains is specific to each system
       and its physical and/or virtual configuration.

       Understanding memory domains in heterogeneous memory environments is
       important, as it can impact data ordering and visibility as viewed by
       an application. It is also important to understand which memory domain
       an application is most tightly coupled to. In most cases, applications
       are tightly coupled to host memory. However, an application running
       directly on a GPU or NIC may be more tightly coupled to memory
       associated with those devices.

       Memory regions are often associated with a single memory domain. The
       domain is often indicated by the fi_mr_attr iface and device fields.
       It is possible, however, for physical pages backing a virtual memory
       region to migrate between memory domains based on access patterns. For
       example, the physical pages referenced by a virtual address range
       could migrate between host memory and GPU memory, depending on which
       computational unit is actively using it.

       See the fi_endpoint(3) and fi_cq(3) man pages for additional
       discussion on message, data, and completion ordering semantics,
       including the impact of memory domains.


RETURN VALUES

       Returns 0 on success. On error, a negative value corresponding to
       fabric errno is returned.

       Fabric errno values are defined in rdma/fi_errno.h.


ERRORS

       -FI_ENOKEY
              The requested_key is already in use.

       -FI_EKEYREJECTED
              The requested_key is not available. The key may be out of the
              range supported by the provider, or the provider may not
              support user-requested memory registration keys.

       -FI_ENOSYS
              Returned by fi_mr_bind if the provider does not support
              reporting events based on access to registered memory regions.

       -FI_EBADFLAGS
              Returned if the specified flags are not supported by the
              provider.


MEMORY REGISTRATION CACHE

       Many hardware NICs accessed by libfabric require that data buffers be
       registered with the hardware while the hardware accesses them. This
       ensures that the virtual to physical address mappings for those
       buffers do not change while the transfer is occurring. The performance
       impact of registering memory regions can be significant. As a result,
       some providers make use of a registration cache, particularly when
       working with applications that are unable to manage their own network
       buffers. A registration cache avoids the overhead of registering and
       unregistering a data buffer with each transfer.

       If a registration cache is going to be used for host and device
       memory, the device must support unified virtual addressing. If the
       device does not support unified virtual addressing, either an
       additional registration cache is required to track this device memory,
       or device memory cannot be cached.

       As a general rule, if hardware requires the FI_MR_LOCAL mode bit
       described above, but this is not supported by the application, a
       memory registration cache may be in use. The following environment
       variables may be used to configure registration caches.

       FI_MR_CACHE_MAX_SIZE
              This defines the total number of bytes for all memory regions
              that may be tracked by the cache. If not set, the cache has no
              limit on how many bytes may be registered and cached. Setting
              this will reduce the amount of registered memory that is not
              actively being used as part of a data transfer. By default, the
              cache size is unlimited.

       FI_MR_CACHE_MAX_COUNT
              This defines the total number of memory regions that may be
              registered with the cache. If not set, a default limit is
              chosen. Setting this will reduce the number of regions that are
              registered, regardless of their size, which are not actively
              being used as part of a data transfer. Setting this to zero
              will disable registration caching.

       FI_MR_CACHE_MONITOR
              The cache monitor is responsible for detecting changes to
              system memory (FI_HMEM_SYSTEM) mappings between the virtual
              addresses used by an application and the underlying physical
              pages. Valid monitor options are: userfaultfd, memhooks, and
              disabled. Selecting disabled will turn off the registration
              cache. Userfaultfd is a Linux kernel feature used to report
              virtual to physical address mapping changes to user space.
              Memhooks operates by intercepting relevant memory allocation
              and deallocation calls which may result in the mappings
              changing, such as malloc, mmap, free, etc. Note that memhooks
              operates at the ELF linker layer, and does not use glibc memory
              hooks.

       FI_MR_CUDA_CACHE_MONITOR_ENABLED
              The CUDA cache monitor is responsible for detecting changes to
              CUDA device memory (FI_HMEM_CUDA) mappings between the device
              virtual addresses used by an application and the underlying
              device physical pages. Valid options are: 0 or 1. Note that the
              CUDA memory monitor requires a CUDA toolkit version with
              unified virtual addressing enabled.

       FI_MR_ROCR_CACHE_MONITOR_ENABLED
              The ROCR cache monitor is responsible for detecting changes to
              ROCR device memory (FI_HMEM_ROCR) mappings between the device
              virtual addresses used by an application and the underlying
              device physical pages. Valid options are: 0 or 1. Note that the
              ROCR memory monitor requires a ROCR version with unified
              virtual addressing enabled.

       FI_MR_ZE_CACHE_MONITOR_ENABLED
              The ZE cache monitor is responsible for detecting changes to
              oneAPI Level Zero device memory (FI_HMEM_ZE) mappings between
              the device virtual addresses used by an application and the
              underlying device physical pages. Valid options are: 0 or 1.

       More direct access to the internal registration cache is possible
       through the fi_open() call, using the “mr_cache” service name. Once
       opened, custom memory monitors may be installed. A memory monitor is a
       component of the cache responsible for detecting changes in virtual to
       physical address mappings. Some level of control over the cache is
       possible through the above mentioned environment variables.


SEE ALSO

       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_rma(3), fi_msg(3),
       fi_atomic(3)


AUTHORS

       OpenFabrics.



Libfabric Programmer’s Manual     2022-03-31                          fi_mr(3)