fi_mr(3)                       Libfabric v1.18.1                      fi_mr(3)

NAME

       fi_mr - Memory region operations

       fi_mr_reg / fi_mr_regv / fi_mr_regattr
              Register local memory buffers for direct fabric access

       fi_close
              Deregister registered memory buffers.

       fi_mr_desc
              Return a local descriptor associated with a registered memory
              region

       fi_mr_key
              Return the remote key needed to access a registered memory
              region

       fi_mr_raw_attr
              Return raw memory region attributes.

       fi_mr_map_raw
              Converts a raw memory region key into a key that is usable for
              data transfer operations.

       fi_mr_unmap_key
              Releases a previously mapped raw memory region key.

       fi_mr_bind
              Associate a registered memory region with a completion counter
              or an endpoint.

       fi_mr_refresh
              Updates the memory pages associated with a memory region.

       fi_mr_enable
              Enables a memory region for use.

       fi_hmem_ze_device
              Returns an hmem device identifier for a Level Zero driver and
              device.

SYNOPSIS

              #include <rdma/fi_domain.h>

              int fi_mr_reg(struct fid_domain *domain, const void *buf, size_t len,
                  uint64_t access, uint64_t offset, uint64_t requested_key,
                  uint64_t flags, struct fid_mr **mr, void *context);

              int fi_mr_regv(struct fid_domain *domain, const struct iovec *iov,
                  size_t count, uint64_t access, uint64_t offset, uint64_t requested_key,
                  uint64_t flags, struct fid_mr **mr, void *context);

              int fi_mr_regattr(struct fid_domain *domain, const struct fi_mr_attr *attr,
                  uint64_t flags, struct fid_mr **mr);

              int fi_close(struct fid *mr);

              void *fi_mr_desc(struct fid_mr *mr);

              uint64_t fi_mr_key(struct fid_mr *mr);

              int fi_mr_raw_attr(struct fid_mr *mr, uint64_t *base_addr,
                  uint8_t *raw_key, size_t *key_size, uint64_t flags);

              int fi_mr_map_raw(struct fid_domain *domain, uint64_t base_addr,
                  uint8_t *raw_key, size_t key_size, uint64_t *key, uint64_t flags);

              int fi_mr_unmap_key(struct fid_domain *domain, uint64_t key);

              int fi_mr_bind(struct fid_mr *mr, struct fid *bfid, uint64_t flags);

              int fi_mr_refresh(struct fid_mr *mr, const struct iovec *iov,
                  size_t count, uint64_t flags);

              int fi_mr_enable(struct fid_mr *mr);

              int fi_hmem_ze_device(int driver_index, int device_index);

ARGUMENTS

       domain Resource domain

       mr     Memory region

       bfid   Fabric identifier of an associated resource.

       context
              User specified context associated with the memory region.

       buf    Memory buffer to register with the fabric hardware.

       len    Length of memory buffer to register.  Must be > 0.

       iov    Vectored memory buffer.

       count  Count of vectored buffer entries.

       access Memory access permissions associated with registration

       offset Optional offset for accessing the registered buffers.  This
              parameter is reserved for future use and must be 0.

       requested_key
              Requested remote key associated with registered buffers.  This
              parameter is ignored if the FI_MR_PROV_KEY flag is set in the
              domain mr_mode bits.

       attr   Memory region attributes

       flags  Additional flags to apply to the operation.

DESCRIPTION

       Registered memory regions associate memory buffers with permissions
       granted for access by fabric resources.  A memory buffer must be
       registered with a resource domain before it can be used as the target
       of a remote RMA or atomic data transfer.  Additionally, a fabric
       provider may require that data buffers be registered before being used
       in local transfers.  Memory registration restrictions are controlled
       using a separate set of mode bits, specified through the domain
       attributes (mr_mode field).  Each mr_mode bit requires that an
       application take specific steps in order to use memory buffers with
       libfabric interfaces.

       The following apply to memory registration.

       Default Memory Registration
              If no mr_mode bits are set, the default behaviors described
              below are followed.  Historically, these defaults were
              collectively referred to as scalable memory registration.  The
              default requirements are outlined below, followed by
              definitions of how each mr_mode bit alters the definition.

       Compatibility: For library versions 1.4 and earlier, this was
       indicated by setting mr_mode to FI_MR_SCALABLE and the fi_info mode
       bit FI_LOCAL_MR to 0.  FI_MR_SCALABLE and FI_LOCAL_MR were deprecated
       in libfabric version 1.5, though they are supported for backwards
       compatibility purposes.

       For security, memory registration is required for data buffers that
       are accessed directly by a peer process.  For example, registration is
       required for RMA target buffers (read or written to), and those
       accessed by atomic or collective operations.

       By default, registration occurs on virtual address ranges.  Because
       registration refers to address ranges, rather than allocated data
       buffers, the address ranges do not need to map to data buffers
       allocated by the application at the time the registration call is
       made.  That is, an application can register any range of addresses in
       their virtual address space, whether or not those addresses are backed
       by physical pages or have been allocated.

       Note that physical pages must back addresses prior to the addresses
       being accessed as part of a data transfer operation, or the data
       transfers will fail.  Additionally, depending on the operation, this
       could result in the local process receiving a segmentation fault for
       accessing invalid memory.

       Once registered, the resulting memory regions are accessible by peers
       starting at a base address of 0.  That is, the target address that is
       specified is a byte offset into the registered region.

       The application also selects the access key associated with the MR.
       The key size is restricted to a maximum of 8 bytes.

       With scalable registration, locally accessed data buffers are not
       registered.  This includes source buffers for all transmit operations
       – sends, tagged sends, RMA, and atomics – as well as buffers posted
       for receive and tagged receive operations.

       Although the default memory registration behavior is convenient for
       application developers, it is difficult to implement in hardware.
       Attempts to hide the hardware requirements from the application often
       result in significant and unacceptable impacts to performance.  The
       following mr_mode bits are provided as input into fi_getinfo.  If a
       provider requires the behavior defined for an mr_mode bit, it will
       leave the bit set on output to fi_getinfo.  Otherwise, the provider
       can clear the bit to indicate that the behavior is not needed.

       By setting an mr_mode bit, the application has agreed to adjust its
       behavior as indicated.  Importantly, applications that choose to
       support an mr_mode must be prepared to handle the case where the
       mr_mode is not required.  A provider will clear an mr_mode bit if it
       is not needed.
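
       As an illustration of this negotiation, the following sketch passes a
       set of mr_mode bits the application is willing to honor into
       fi_getinfo() and checks which bits the provider left set.  The helper
       name select_provider() and the chosen capability bits are illustrative
       only; error handling is abbreviated.

              #include <rdma/fabric.h>
              #include <rdma/fi_domain.h>

              static struct fi_info *select_provider(void)
              {
                  struct fi_info *hints, *info = NULL;

                  hints = fi_allocinfo();
                  if (!hints)
                      return NULL;

                  hints->caps = FI_MSG | FI_RMA;
                  /* Registration behaviors this application can follow. */
                  hints->domain_attr->mr_mode = FI_MR_LOCAL | FI_MR_ALLOCATED |
                                                FI_MR_VIRT_ADDR | FI_MR_PROV_KEY;

                  if (fi_getinfo(FI_VERSION(1, 18), NULL, NULL, 0, hints, &info))
                      info = NULL;

                  /* Bits still set on output are required by the provider;
                   * cleared bits indicate the behavior is not needed. */
                  fi_freeinfo(hints);
                  return info;
              }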

       FI_MR_LOCAL
              When the FI_MR_LOCAL mode bit is set, applications must
              register all data buffers that will be accessed by the local
              hardware and provide a valid desc parameter into applicable
              data transfer operations.  When FI_MR_LOCAL is zero,
              applications are not required to register data buffers before
              using them for local operations (e.g. send and receive data
              buffers).  The desc parameter into data transfer operations
              will be ignored in this case, unless otherwise required
              (e.g. see FI_MR_HMEM).  It is recommended that applications
              pass in NULL for desc when not required.

       A provider may hide local registration requirements from applications
       by making use of an internal registration cache or similar mechanisms.
       Such mechanisms, however, may negatively impact performance for some
       applications, notably those which manage their own network buffers.
       In order to support as broad a range of applications as possible,
       without unduly affecting their performance, applications that wish to
       manage their own local memory registrations may do so by using the
       memory registration calls.

       Note: the FI_MR_LOCAL mr_mode bit replaces the FI_LOCAL_MR fi_info
       mode bit.  When FI_MR_LOCAL is set, FI_LOCAL_MR is ignored.

       FI_MR_RAW
              Raw memory regions are used to support providers with keys
              larger than 64 bits or that require setup at the peer.  When
              the FI_MR_RAW bit is set, applications must use
              fi_mr_raw_attr() locally and fi_mr_map_raw() at the peer before
              targeting a memory region as part of any data transfer request.

       FI_MR_VIRT_ADDR
              The FI_MR_VIRT_ADDR bit indicates that the provider references
              memory regions by virtual address, rather than a 0-based
              offset.  Peers that target memory regions registered with
              FI_MR_VIRT_ADDR specify the destination memory buffer using the
              target’s virtual address, with any offset into the region
              specified as virtual address + offset.  Support of this bit
              typically implies that peers must exchange addressing data
              prior to initiating any RMA or atomic operation.

       FI_MR_ALLOCATED
              When set, all registered memory regions must be backed by
              physical memory pages at the time the registration call is
              made.

       FI_MR_PROV_KEY
              This memory region mode indicates that the provider does not
              support application requested MR keys.  MR keys are returned by
              the provider.  Applications that support FI_MR_PROV_KEY can
              obtain the provider key using fi_mr_key(), unless FI_MR_RAW is
              also set.  The returned key should then be exchanged with peers
              prior to initiating an RMA or atomic operation.

       FI_MR_MMU_NOTIFY
              FI_MR_MMU_NOTIFY is typically set by providers that support
              memory registration against memory regions that are not
              necessarily backed by allocated physical pages at the time the
              memory registration occurs.  (That is, FI_MR_ALLOCATED is
              typically 0).  However, such providers require that
              applications notify the provider prior to the MR being accessed
              as part of a data transfer operation.  This notification
              informs the provider that all necessary physical pages now back
              the region.  The notification is necessary for providers that
              cannot hook directly into the operating system page tables or
              memory management unit.  See fi_mr_refresh() for notification
              details.

       FI_MR_RMA_EVENT
              This mode bit indicates that the provider must configure memory
              regions that are associated with RMA events prior to their use.
              This includes all memory regions that are associated with
              completion counters.  When set, applications must indicate if a
              memory region will be associated with a completion counter as
              part of the region’s creation.  This is done by passing in the
              FI_RMA_EVENT flag to the memory registration call.

       Such memory regions will be created in a disabled state and must be
       associated with all completion counters prior to being enabled.  To
       enable a memory region, the application must call fi_mr_enable().
       After calling fi_mr_enable(), no further resource bindings may be made
       to the memory region.

       FI_MR_ENDPOINT
              This mode bit indicates that the provider associates memory
              regions with endpoints rather than domains.  Memory regions
              that are registered with the provider are created in a disabled
              state and must be bound to an endpoint prior to being enabled.
              To bind the MR with an endpoint, the application must use
              fi_mr_bind().  To enable the memory region, the application
              must call fi_mr_enable().

       FI_MR_HMEM
              This mode bit is associated with the FI_HMEM capability.  If
              FI_MR_HMEM is set, the application must register buffers that
              were allocated using a device call and provide a valid desc
              parameter into applicable data transfer operations even if they
              are only used for local operations (e.g. send and receive data
              buffers).  Device memory must be registered using the
              fi_mr_regattr call, with the iface and device fields filled
              out.

       If FI_MR_HMEM is set, but FI_MR_LOCAL is unset, only device buffers
       must be registered when used locally.  In this case, the desc
       parameter passed into data transfer operations must either be valid or
       NULL.  Similarly, if FI_MR_LOCAL is set, but FI_MR_HMEM is not, the
       desc parameter must either be valid or NULL.
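
       As a hedged sketch of device memory registration under FI_MR_HMEM, the
       following registers a CUDA buffer with fi_mr_regattr, filling in the
       iface and device fields.  The names register_cuda_buffer, dev_buf, and
       dev_id are illustrative; the buffer is assumed to have been allocated
       through the CUDA API (e.g. cudaMalloc) on device dev_id.

              #include <sys/uio.h>
              #include <rdma/fi_domain.h>

              static int register_cuda_buffer(struct fid_domain *domain,
                                              void *dev_buf, size_t len,
                                              int dev_id, struct fid_mr **mr)
              {
                  struct iovec iov = { .iov_base = dev_buf, .iov_len = len };
                  struct fi_mr_attr attr = {
                      .mr_iov      = &iov,
                      .iov_count   = 1,
                      .access      = FI_SEND | FI_RECV | FI_REMOTE_WRITE,
                      .iface       = FI_HMEM_CUDA, /* allocated via CUDA */
                      .device.cuda = dev_id,       /* CUdevice ordinal */
                  };

                  /* The resulting fi_mr_desc() value must accompany any
                   * local use of this buffer. */
                  return fi_mr_regattr(domain, &attr, 0, mr);
              }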

       FI_MR_COLLECTIVE
              This bit is associated with the FI_COLLECTIVE capability.  When
              set, the provider requires that memory regions used in
              collective operations be explicitly registered for use with
              collective calls.  This requires registering regions passed to
              collective calls using the FI_COLLECTIVE flag.

       Basic Memory Registration
              Basic memory registration was deprecated in libfabric version
              1.5, but is supported for backwards compatibility.  Basic
              memory registration is indicated by setting mr_mode equal to
              FI_MR_BASIC.  FI_MR_BASIC must be set alone and not paired with
              other mr_mode bits.  Unlike other mr_mode bits, if FI_MR_BASIC
              is set on input to fi_getinfo(), it will not be cleared by the
              provider.  That is, setting mr_mode equal to FI_MR_BASIC forces
              basic registration if the provider supports it.

       The behavior of basic registration is equivalent to requiring the
       following mr_mode bits: FI_MR_VIRT_ADDR, FI_MR_ALLOCATED, and
       FI_MR_PROV_KEY.  Additionally, providers that support basic
       registration usually require the (deprecated) fi_info mode bit
       FI_LOCAL_MR, which was incorporated into the FI_MR_LOCAL mr_mode bit.

       The registration functions – fi_mr_reg, fi_mr_regv, and fi_mr_regattr
       – are used to register one or more memory regions with fabric
       resources.  The main difference between the registration functions is
       the number and type of parameters that they accept as input.
       Otherwise, they perform the same general function.

       By default, memory registration completes synchronously.  That is,
       the registration call will not return until the registration has
       completed.  Memory registration can complete asynchronously by binding
       the resource domain to an event queue using the FI_REG_MR flag.  See
       fi_domain_bind.  When memory registration is asynchronous, in order to
       avoid a race condition between the registration call returning and the
       corresponding reading of the event from the EQ, the mr output
       parameter will be written before any event associated with the
       operation may be read by the application.  An asynchronous event will
       not be generated unless the registration call returns success (0).
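
       The following sketch shows one way asynchronous registration could
       look: the event queue is bound to the domain with FI_REG_MR, and the
       FI_MR_COMPLETE event is read from the EQ after fi_mr_reg() returns.
       The helper name register_async and the polling loop are illustrative
       only.

              #include <rdma/fi_domain.h>
              #include <rdma/fi_eq.h>

              static int register_async(struct fid_domain *domain,
                                        struct fid_eq *eq, void *buf,
                                        size_t len, struct fid_mr **mr)
              {
                  struct fi_eq_entry entry;
                  uint32_t event;
                  ssize_t rd;
                  int ret;

                  ret = fi_domain_bind(domain, &eq->fid, FI_REG_MR);
                  if (ret)
                      return ret;

                  ret = fi_mr_reg(domain, buf, len, FI_REMOTE_WRITE, 0, 0, 0,
                                  mr, NULL);
                  if (ret)
                      return ret;   /* no event is generated on failure */

                  /* *mr is valid here, before the event is read. */
                  do {
                      rd = fi_eq_read(eq, &event, &entry, sizeof(entry), 0);
                  } while (rd == -FI_EAGAIN);

                  return (rd < 0 || event != FI_MR_COMPLETE) ? -FI_EOTHER : 0;
              }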

   fi_mr_reg
       The fi_mr_reg call registers the user-specified memory buffer with the
       resource domain.  The buffer is enabled for access by the fabric
       hardware based on the provided access permissions.  See the access
       field description for memory region attributes below.

       Registered memory is associated with a local memory descriptor and,
       optionally, a remote memory key.  A memory descriptor is a provider
       specific identifier associated with registered memory.  Memory
       descriptors often map to hardware specific indices or keys associated
       with the memory region.  Remote memory keys provide limited protection
       against unwanted access by a remote node.  Remote accesses to a memory
       region must provide the key associated with the registration.

       Because MR keys must be provided by a remote process, an application
       can use the requested_key parameter to indicate that a specific key
       value be returned.  Support for user requested keys is provider
       specific and is determined by the FI_MR_PROV_KEY flag value in the
       mr_mode domain attribute.

       Remote RMA and atomic operations indicate the location within a
       registered memory region by specifying an address.  The location is
       referenced by adding the offset to either the base virtual address of
       the buffer or to 0, depending on the mr_mode.

       The offset parameter is reserved for future use and must be 0.

       For asynchronous memory registration requests, the result will be
       reported to the user through an event queue associated with the
       resource domain.  If successful, the allocated memory region structure
       will be returned to the user through the mr parameter.  The mr address
       must remain valid until the registration operation completes.  The
       context specified with the registration request is returned with the
       completion event.
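
       A minimal synchronous example, assuming default (scalable)
       registration: the buffer is exposed as an RMA target and the key
       returned by fi_mr_key() is exchanged with peers out of band.  The
       function name export_rma_buffer and the requested key value are
       illustrative.

              #include <rdma/fi_domain.h>

              static int export_rma_buffer(struct fid_domain *domain,
                                           void *buf, size_t len,
                                           struct fid_mr **mr, uint64_t *rkey)
              {
                  int ret;

                  ret = fi_mr_reg(domain, buf, len,
                                  FI_REMOTE_WRITE | FI_REMOTE_READ,
                                  0 /* offset: reserved, must be 0 */,
                                  0x1234 /* requested_key */,
                                  0 /* flags */, mr, NULL /* context */);
                  if (ret)
                      return ret;

                  *rkey = fi_mr_key(*mr);  /* send to peers out of band */
                  /* fi_mr_desc(*mr) supplies desc if FI_MR_LOCAL is set. */
                  return 0;
              }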

   fi_mr_regv
       The fi_mr_regv call adds support for a scatter-gather list to
       fi_mr_reg.  Multiple memory buffers are registered as a single memory
       region.  Otherwise, the operation is the same.
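
       For example, two discontiguous buffers might be combined into one
       region as sketched below (iov_count must not exceed the domain’s
       mr_iov_limit attribute); the helper name register_sgl is illustrative.

              #include <sys/uio.h>
              #include <rdma/fi_domain.h>

              static int register_sgl(struct fid_domain *domain,
                                      void *hdr, size_t hdr_len,
                                      void *payload, size_t pay_len,
                                      struct fid_mr **mr)
              {
                  struct iovec iov[2] = {
                      { .iov_base = hdr,     .iov_len = hdr_len },
                      { .iov_base = payload, .iov_len = pay_len },
                  };

                  return fi_mr_regv(domain, iov, 2, FI_SEND | FI_RECV,
                                    0, 0, 0, mr, NULL);
              }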

   fi_mr_regattr
       The fi_mr_regattr call is a more generic, extensible registration call
       that allows the user to specify the registration request using a
       struct fi_mr_attr (defined below).

   fi_close
       fi_close is used to release all resources associated with a registered
       memory region.  Once unregistered, further access to the registered
       memory is not guaranteed.  Active or queued operations that reference
       a memory region being closed may fail or result in accesses to invalid
       memory.  Applications are responsible for ensuring that a MR is no
       longer needed prior to closing it.  Note that accesses to a closed MR
       from a remote peer will result in an error at the peer.  The state of
       the local endpoint will be unaffected.

       When closing the MR, there must be no opened endpoints or counters
       associated with the MR.  If resources are still associated with the MR
       when attempting to close, the call will return -FI_EBUSY.

   fi_mr_desc
       Obtains the local memory descriptor associated with a MR.  The memory
       registration must have completed successfully before invoking this
       call.

   fi_mr_key
       Returns the remote protection key associated with a MR.  The memory
       registration must have completed successfully before invoking this
       call.  The returned key may be used in data transfer operations at a
       peer.  If the FI_MR_RAW mode bit has been set for the domain, then the
       memory key must be obtained using the fi_mr_raw_attr function instead.
       FI_KEY_NOTAVAIL will be returned if the registration has not completed
       or a raw memory key is required.

   fi_mr_raw_attr
       Returns the raw, remote protection key and base address associated
       with a MR.  The memory registration must have completed successfully
       before invoking this routine.  Use of this call is required if the
       FI_MR_RAW mode bit has been set by the provider; however, it is safe
       to use this call with any memory region.

       On input, the key_size parameter should indicate the size of the
       raw_key buffer.  If the actual key is larger than what can fit into
       the buffer, the call will return -FI_ETOOSMALL.  On output, key_size
       is set to the size of the buffer needed to store the key, which may be
       larger than the input value.  The needed key_size can also be obtained
       through the mr_key_size domain attribute (fi_domain_attr) field.

       A raw key must be mapped by a peer before it can be used in data
       transfer operations.  See fi_mr_map_raw below.

   fi_mr_map_raw
       Raw protection keys must be mapped to a usable key value before they
       can be used for data transfer operations.  The mapping is done by the
       peer that initiates the RMA or atomic operation.  The mapping function
       takes as input the raw key and its size, and returns the mapped key.
       Use of the fi_mr_map_raw function is required if the peer has the
       FI_MR_RAW mode bit set, but this routine may be called on any valid
       key.  All mapped keys must be freed by calling fi_mr_unmap_key when
       access to the peer memory region is no longer necessary.
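
       A hedged sketch of the raw-key flow: the MR owner queries the key size
       and raw key with fi_mr_raw_attr(), sends them to the peer, and the
       peer converts them with fi_mr_map_raw().  The helper names are
       illustrative, the zero-sized first query relies on the -FI_ETOOSMALL
       behavior described above, and how the key is exchanged is application
       specific.

              #include <stdlib.h>
              #include <rdma/fi_domain.h>

              /* Owner side: retrieve the base address and raw key. */
              static int get_raw_key(struct fid_mr *mr, uint64_t *base_addr,
                                     uint8_t **raw_key, size_t *key_size)
              {
                  int ret;

                  *key_size = 0;
                  ret = fi_mr_raw_attr(mr, base_addr, NULL, key_size, 0);
                  if (ret != -FI_ETOOSMALL)
                      return ret;

                  *raw_key = malloc(*key_size);
                  if (!*raw_key)
                      return -FI_ENOMEM;

                  return fi_mr_raw_attr(mr, base_addr, *raw_key, key_size, 0);
              }

              /* Peer side: map the received raw key into a usable key. */
              static int map_peer_key(struct fid_domain *domain,
                                      uint64_t base_addr, uint8_t *raw_key,
                                      size_t key_size, uint64_t *key)
              {
                  /* Release later with fi_mr_unmap_key(domain, *key). */
                  return fi_mr_map_raw(domain, base_addr, raw_key, key_size,
                                       key, 0);
              }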

   fi_mr_unmap_key
       This call releases any resources that may have been allocated as part
       of mapping a raw memory key.  All mapped keys must be freed before the
       corresponding domain is closed.

   fi_mr_bind
       The fi_mr_bind function associates a memory region with a counter or
       endpoint.  Counter bindings are needed by providers that support the
       generation of completions based on fabric operations.  Endpoint
       bindings are needed if the provider associates memory regions with
       endpoints (see FI_MR_ENDPOINT).

       When binding with a counter, the type of events tracked against the
       memory region is based on the bitwise OR of the following flags.

       FI_REMOTE_WRITE
              Generates an event whenever a remote RMA write or atomic
              operation modifies the memory region.  Use of this flag
              requires that the endpoint through which the MR is accessed be
              created with the FI_RMA_EVENT capability.

       When binding the memory region to an endpoint, flags should be 0.
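
       As an illustration, the sketch below configures a region when both
       FI_MR_ENDPOINT and FI_MR_RMA_EVENT are in effect: the MR (registered
       with the FI_RMA_EVENT flag) is bound to its endpoint and to a counter
       that tracks remote writes, and is then enabled.  The helper name
       activate_mr is illustrative.

              #include <rdma/fi_domain.h>
              #include <rdma/fi_endpoint.h>

              static int activate_mr(struct fid_mr *mr, struct fid_ep *ep,
                                     struct fid_cntr *rx_cntr)
              {
                  int ret;

                  ret = fi_mr_bind(mr, &ep->fid, 0);    /* FI_MR_ENDPOINT */
                  if (ret)
                      return ret;

                  /* Count remote RMA writes and atomics that modify the MR. */
                  ret = fi_mr_bind(mr, &rx_cntr->fid, FI_REMOTE_WRITE);
                  if (ret)
                      return ret;

                  /* All bindings must be made before enabling the region. */
                  return fi_mr_enable(mr);
              }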

   fi_mr_refresh
       The use of this call is required to notify the provider of any change
       to the physical pages backing a registered memory region if the
       FI_MR_MMU_NOTIFY mode bit has been set.  This call informs the
       provider that the page table entries associated with the region may
       have been modified, and the provider should verify and update the
       registered region accordingly.  The iov parameter is optional and may
       be used to specify which portions of the registered region require
       updating.  Providers are only guaranteed to update the specified
       address ranges.

       The refresh operation has the effect of disabling and re-enabling
       access to the registered region.  Any operations from peers that
       attempt to access the region will fail while the refresh is occurring.
       Additionally, attempts to access the region by the local process
       through libfabric APIs may result in a page fault or other fatal
       error.

       The fi_mr_refresh call is only needed if the physical pages might have
       been updated after the memory region was created.
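
       A minimal sketch, assuming FI_MR_MMU_NOTIFY is in effect and the pages
       backing part of the region have changed (the helper name refresh_range
       is illustrative):

              #include <sys/uio.h>
              #include <rdma/fi_domain.h>

              static int refresh_range(struct fid_mr *mr, void *addr,
                                       size_t len)
              {
                  struct iovec iov = { .iov_base = addr, .iov_len = len };

                  /* Only the supplied range is guaranteed to be updated; no
                   * transfers may target the region during the refresh. */
                  return fi_mr_refresh(mr, &iov, 1, 0);
              }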

   fi_mr_enable
       The enable call is used with memory registrations associated with the
       FI_MR_RMA_EVENT or FI_MR_ENDPOINT mode bits.  Memory regions created
       in the disabled state must be explicitly enabled after being fully
       configured by the application.  Any resource bindings to the MR must
       be done prior to enabling the MR.

MEMORY REGION ATTRIBUTES

       Memory regions are created using the following attributes.  The struct
       fi_mr_attr is passed into fi_mr_regattr, but individual fields also
       apply to other memory registration calls, with the fields passed
       directly into calls as function parameters.

              struct fi_mr_attr {
                  const struct iovec *mr_iov;
                  size_t             iov_count;
                  uint64_t           access;
                  uint64_t           offset;
                  uint64_t           requested_key;
                  void               *context;
                  size_t             auth_key_size;
                  uint8_t            *auth_key;
                  enum fi_hmem_iface iface;
                  union {
                      uint64_t       reserved;
                      int            cuda;
                      int            ze;
                      int            neuron;
                      int            synapseai;
                  } device;
              };

   mr_iov
       This is an IO vector of addresses that will represent a single memory
       region.  The number of entries in the iovec is specified by iov_count.

   iov_count
       The number of entries in the mr_iov array.  The maximum number of
       memory buffers that may be associated with a single memory region is
       specified as the mr_iov_limit domain attribute.  See fi_domain(3).

   access
       Indicates the type of operations that the local or a peer endpoint may
       perform on the registered memory region.  Supported access permissions
       are the bitwise OR of the following flags:

       FI_SEND
              The memory buffer may be used in outgoing message data
              transfers.  This includes fi_msg and fi_tagged send operations,
              as well as fi_collective operations.

       FI_RECV
              The memory buffer may be used to receive inbound message
              transfers.  This includes fi_msg and fi_tagged receive
              operations, as well as fi_collective operations.

       FI_READ
              The memory buffer may be used as the result buffer for RMA read
              and atomic operations on the initiator side.  Note that from
              the viewpoint of the application, the memory buffer is being
              written into by the network.

       FI_WRITE
              The memory buffer may be used as the source buffer for RMA
              write and atomic operations on the initiator side.  Note that
              from the viewpoint of the application, the endpoint is reading
              from the memory buffer and copying the data onto the network.

       FI_REMOTE_READ
              The memory buffer may be used as the source buffer of an RMA
              read operation on the target side.  The contents of the memory
              buffer are not modified by such operations.

       FI_REMOTE_WRITE
              The memory buffer may be used as the target buffer of an RMA
              write or atomic operation.  The contents of the memory buffer
              may be modified as a result of such operations.

       FI_COLLECTIVE
              This flag provides an explicit indication that the memory
              buffer may be used with collective operations.  Use of this
              flag is required if the FI_MR_COLLECTIVE mr_mode bit has been
              set on the domain.  This flag should be paired with FI_SEND
              and/or FI_RECV.

       Note that some providers may not enforce fine grained access
       permissions.  For example, a memory region registered for FI_WRITE
       access may also behave as if FI_SEND were specified as well.  Relaxed
       enforcement of such access is permitted, though not guaranteed,
       provided security is maintained.

   offset
       The offset field is reserved for future use and must be 0.

   requested_key
       An application specified access key associated with the memory region.
       The MR key must be provided by a remote process when performing RMA or
       atomic operations to a memory region.  Applications can use the
       requested_key field to indicate that a specific key be used by the
       provider.  This allows applications to use well known key values,
       which can avoid the need to exchange and store keys.  Support for user
       requested keys is provider specific and is determined by the
       FI_MR_PROV_KEY flag in the mr_mode domain attribute field.

   context
       Application context associated with asynchronous memory registration
       operations.  This value is returned as part of any asynchronous event
       associated with the registration.  This field is ignored for
       synchronous registration calls.

   auth_key_size
       The size of the key referenced by the auth_key field in bytes, or 0 if
       no authorization key is given.  This field is ignored unless the
       fabric is opened with API version 1.5 or greater.

   auth_key
       Indicates the key to associate with this memory registration.
       Authorization keys are used to limit communication between endpoints.
       Only peer endpoints that are programmed to use the same authorization
       key may access the memory region.  The domain authorization key will
       be used if the auth_key_size provided is 0.  This field is ignored
       unless the fabric is opened with API version 1.5 or greater.

   iface
       Indicates the software interfaces used by the application to allocate
       and manage the memory region.  This field is ignored unless the
       application has requested the FI_HMEM capability.

       FI_HMEM_SYSTEM
              Uses standard operating system calls and libraries, such as
              malloc, calloc, realloc, mmap, and free.

       FI_HMEM_CUDA
              Uses Nvidia CUDA interfaces such as cuMemAlloc, cuMemAllocHost,
              cuMemAllocManaged, cuMemFree, cudaMalloc, cudaFree.

       FI_HMEM_ROCR
              Uses AMD ROCR interfaces such as hsa_memory_allocate and
              hsa_memory_free.

       FI_HMEM_ZE
              Uses oneAPI Level Zero interfaces such as
              zeDriverAllocSharedMem, zeDriverFreeMem.

       FI_HMEM_NEURON
              Uses the AWS Neuron SDK to support AWS Trainium devices.

       FI_HMEM_SYNAPSEAI
              Uses the SynapseAI API to support Habana Gaudi devices.

   device
       Reserved 64 bits for a device identifier if using a non-standard HMEM
       interface.  This field is ignored unless the iface field is valid.

       cuda   For FI_HMEM_CUDA, this is equivalent to CUdevice (int).

       ze     For FI_HMEM_ZE, this is equivalent to the index of the device
              in the ze_device_handle_t array.  If there is only a single
              Level Zero driver present, an application may set this
              directly.  However, it is recommended that this value be set
              using the fi_hmem_ze_device() macro, which will encode the
              driver index with the device.

       neuron For FI_HMEM_NEURON, the device identifier for AWS Trainium
              devices.

       synapseai
              For FI_HMEM_SYNAPSEAI, the device identifier for Habana Gaudi
              hardware.

   fi_hmem_ze_device
       Returns an hmem device identifier for a Level Zero <driver, device>
       tuple.  The output of this call should be used to set
       fi_mr_attr::device.ze for FI_HMEM_ZE interfaces.  The driver and
       device index values represent their 0-based positions in arrays
       returned from zeDriverGet and zeDeviceGet, respectively.
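
       For instance, an application might fill in the Level Zero device as in
       the hedged sketch below, where the driver and device indices (0 and 1
       here are illustrative) come from the application’s own zeDriverGet and
       zeDeviceGet enumeration:

              #include <rdma/fi_domain.h>

              static void set_ze_device(struct fi_mr_attr *attr)
              {
                  attr->iface = FI_HMEM_ZE;
                  attr->device.ze = fi_hmem_ze_device(0 /* driver index */,
                                                      1 /* device index */);
              }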

NOTES

       Direct access to an application’s memory by a remote peer requires
       that the application register the targeted memory buffer(s).  This is
       typically done by calling one of the fi_mr_reg* routines.  For
       FI_MR_PROV_KEY, the provider will return a key that must be used by
       the peer when accessing the memory region.  The application is
       responsible for transferring this key to the peer.  If FI_MR_RAW mode
       has been set, the key must be retrieved using the fi_mr_raw_attr
       function.

       FI_MR_RAW allows support for providers that require more than 8 bytes
       for their protection keys or need additional setup before a key can be
       used for transfers.  After a raw key has been retrieved, it must be
       exchanged with the remote peer.  The peer must use fi_mr_map_raw to
       convert the raw key into a usable 64-bit key.  The mapping must be
       done even if the raw key is 64 bits or smaller.

       The raw key support functions are usable with all registered memory
       regions, even if FI_MR_RAW has not been set.  It is recommended that
       portable applications use those interfaces; however, their use does
       carry extra message and memory footprint overhead, making it less
       desirable for highly scalable apps.

       There may be cases where device peer to peer support should not be
       used or cannot be used, such as when the PCIe ACS configuration does
       not permit the transfer.  The FI_HMEM_DISABLE_P2P environment variable
       can be set to notify Libfabric that peer to peer transactions should
       not be used.  The provider may choose to perform a copy instead, or
       will fail to support FI_HMEM if it is unable to do so.

FLAGS

       The following flags may be specified for any memory registration call.

       FI_RMA_EVENT
              This flag indicates that the specified memory region will be
              associated with a completion counter used to count RMA
              operations that access the MR.

       FI_RMA_PMEM
              This flag indicates that the underlying memory region is backed
              by persistent memory and will be used in RMA operations.  It
              must be specified if persistent completion semantics or
              persistent data transfers are required when accessing the
              registered region.

       FI_HMEM_DEVICE_ONLY
              This flag indicates that the memory is only accessible by a
              device.  The device is specified by the fi_mr_attr fields iface
              and device.  This refers to memory regions that were allocated
              using a device API AllocDevice call (as opposed to using the
              host allocation or unified/shared memory allocation).

       FI_HMEM_HOST_ALLOC
              This flag indicates that the memory is owned by the host only.
              Whether it can be accessed by the device is implementation
              dependent.  The fi_mr_attr field iface is still used to
              identify the device API, but the field device is ignored.  This
              refers to memory regions that were allocated using a device API
              AllocHost call (as opposed to using malloc-like host
              allocation, unified/shared memory allocation, or AllocDevice).

MEMORY DOMAINS

       Memory domains identify the physical separation of memory which may or
       may not be accessible through the same virtual address space.
       Traditionally, applications only dealt with a single memory domain,
       that of host memory tightly coupled with the system CPUs.  With the
       introduction of device and non-uniform memory subsystems, applications
       often need to be aware of which memory domain a particular virtual
       address maps to.

       As a general rule, separate physical devices can be considered to have
       their own memory domains.  For example, a NIC may have user accessible
       memory, and would be considered a separate memory domain from memory
       on a GPU.  Both the NIC and GPU memory domains are separate from host
       system memory.  Individual GPUs or computation accelerators may have
       distinct memory domains, or may be connected in such a way (e.g. a GPU
       specific fabric) that all GPUs would belong to the same memory domain.
       Unfortunately, identifying memory domains is specific to each system
       and its physical and/or virtual configuration.

       Understanding memory domains in heterogeneous memory environments is
       important as it can impact data ordering and visibility as viewed by
       an application.  It is also important to understand which memory
       domain an application is most tightly coupled to.  In most cases,
       applications are tightly coupled to host memory.  However, an
       application running directly on a GPU or NIC may be more tightly
       coupled to memory associated with those devices.

       Memory regions are often associated with a single memory domain.  The
       domain is often indicated by the fi_mr_attr iface and device fields.
       However, it is possible for physical pages backing a virtual memory
       region to migrate between memory domains based on access patterns.
       For example, the physical pages referenced by a virtual address range
       could migrate between host memory and GPU memory, depending on which
       computational unit is actively using it.

       See the fi_endpoint(3) and fi_cq(3) man pages for additional
       discussion on message, data, and completion ordering semantics,
       including the impact of memory domains.

RETURN VALUES

       Returns 0 on success.  On error, a negative value corresponding to
       fabric errno is returned.

       Fabric errno values are defined in rdma/fi_errno.h.

ERRORS

       -FI_ENOKEY
              The requested_key is already in use.

       -FI_EKEYREJECTED
              The requested_key is not available.  The key may be out of the
              range supported by the provider, or the provider may not
              support user-requested memory registration keys.

       -FI_ENOSYS
              Returned by fi_mr_bind if the provider does not support
              reporting events based on access to registered memory regions.

       -FI_EBADFLAGS
              Returned if the specified flags are not supported by the
              provider.

MEMORY REGISTRATION CACHE

       Many hardware NICs accessed by libfabric require that data buffers be
       registered with the hardware while the hardware accesses it.  This
       ensures that the virtual to physical address mappings for those
       buffers do not change while the transfer is occurring.  The
       performance impact of registering memory regions can be significant.
       As a result, some providers make use of a registration cache,
       particularly when working with applications that are unable to manage
       their own network buffers.  A registration cache avoids the overhead
       of registering and unregistering a data buffer with each transfer.

       If a registration cache is going to be used for host and device
       memory, the device must support unified virtual addressing.  If the
       device does not support unified virtual addressing, either an
       additional registration cache is required to track this device memory,
       or device memory cannot be cached.

       As a general rule, if hardware requires the FI_MR_LOCAL mode bit
       described above, but this is not supported by the application, a
       memory registration cache may be in use.  The following environment
       variables may be used to configure registration caches.

       FI_MR_CACHE_MAX_SIZE
              This defines the total number of bytes for all memory regions
              that may be tracked by the cache.  If not set, the cache has no
              limit on how many bytes may be registered and cached.  Setting
              this will reduce the amount of memory that is registered with a
              provider but not actively being used as part of a data
              transfer.  By default, the cache size is unlimited.

       FI_MR_CACHE_MAX_COUNT
              This defines the total number of memory regions that may be
              registered with the cache.  If not set, a default limit is
              chosen.  Setting this will reduce the number of registered
              regions, regardless of their size, that are not actively being
              used as part of a data transfer.  Setting this to zero will
              disable registration caching.

       FI_MR_CACHE_MONITOR
              The cache monitor is responsible for detecting system memory
              (FI_HMEM_SYSTEM) changes made between the virtual addresses
              used by an application and the underlying physical pages.
              Valid monitor options are: userfaultfd, memhooks, and disabled.
              Selecting disabled will turn off the registration cache.
              Userfaultfd is a Linux kernel feature used to report virtual to
              physical address mapping changes to user space.  Memhooks
              operates by intercepting relevant memory allocation and
              deallocation calls which may result in the mappings changing,
              such as malloc, mmap, free, etc.  Note that memhooks operates
              at the ELF linker layer, and does not use glibc memory hooks.

       FI_MR_CUDA_CACHE_MONITOR_ENABLED
              The CUDA cache monitor is responsible for detecting CUDA device
              memory (FI_HMEM_CUDA) changes made between the device virtual
              addresses used by an application and the underlying device
              physical pages.  Valid monitor options are: 0 or 1.  Note that
              the CUDA memory monitor requires a CUDA toolkit version with
              unified virtual addressing enabled.

       FI_MR_ROCR_CACHE_MONITOR_ENABLED
              The ROCR cache monitor is responsible for detecting ROCR device
              memory (FI_HMEM_ROCR) changes made between the device virtual
              addresses used by an application and the underlying device
              physical pages.  Valid monitor options are: 0 or 1.  Note that
              the ROCR memory monitor requires a ROCR version with unified
              virtual addressing enabled.

       FI_MR_ZE_CACHE_MONITOR_ENABLED
              The ZE cache monitor is responsible for detecting oneAPI Level
              Zero device memory (FI_HMEM_ZE) changes made between the device
              virtual addresses used by an application and the underlying
              device physical pages.  Valid monitor options are: 0 or 1.

       More direct access to the internal registration cache is possible
       through the fi_open() call, using the “mr_cache” service name.  Once
       opened, custom memory monitors may be installed.  A memory monitor is
       a component of the cache responsible for detecting changes in virtual
       to physical address mappings.  Some level of control over the cache is
       possible through the above mentioned environment variables.

SEE ALSO

       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_rma(3), fi_msg(3),
       fi_atomic(3)

AUTHORS

       OpenFabrics.


Libfabric Programmer’s Manual     2023-03-10                          fi_mr(3)