fi_domain(3)

fi_domain(3)                   Libfabric v1.15.1                  fi_domain(3)

NAME

       fi_domain - Open a fabric access domain


SYNOPSIS

              #include <rdma/fabric.h>

              #include <rdma/fi_domain.h>

              int fi_domain(struct fid_fabric *fabric, struct fi_info *info,
                  struct fid_domain **domain, void *context);

              int fi_close(struct fid *domain);

              int fi_domain_bind(struct fid_domain *domain, struct fid *eq,
                  uint64_t flags);

              int fi_open_ops(struct fid *domain, const char *name, uint64_t flags,
                  void **ops, void *context);

              int fi_set_ops(struct fid *domain, const char *name, uint64_t flags,
                  void *ops, void *context);


ARGUMENTS

       fabric Fabric domain

       info   Fabric information, including domain capabilities and
              attributes.

       domain An opened access domain.

       context
              User specified context associated with the domain.  This context
              is returned as part of any asynchronous event associated with
              the domain.

       eq     Event queue for asynchronous operations initiated on the domain.

       name   Name associated with an interface.

       ops    Fabric interface operations.


DESCRIPTION

       An access domain typically refers to a physical or virtual NIC or
       hardware port; however, a domain may span across multiple hardware
       components for fail-over or data striping purposes.  A domain defines
       the boundary for associating different resources together.  Fabric
       resources belonging to the same domain may share resources.

   fi_domain
       Opens a fabric access domain, also referred to as a resource domain.
       Fabric domains are identified by a name.  The properties of the opened
       domain are specified using the info parameter.

   fi_open_ops
       fi_open_ops is used to open provider-specific interfaces.  Provider
       interfaces may be used to access low-level resources and operations
       that are specific to the opened resource domain.  The details of
       domain interfaces are outside the scope of this documentation.

   fi_set_ops
       fi_set_ops assigns callbacks that a provider should invoke in place of
       performing selected tasks.  This allows users to modify or control a
       provider’s default behavior.  Conceptually, it allows the user to hook
       specific functions used by a provider and replace them with their own.

       The operations being modified are identified using a well-known
       character string, passed as the name parameter.  The format of the ops
       parameter is dependent upon the name value.  The ops parameter will
       reference a structure containing the callbacks and other fields needed
       by the provider to invoke the user’s functions.

       If a provider accepts the override, it will return FI_SUCCESS.  If the
       override is unknown or not supported, the provider will return
       -FI_ENOSYS.  Overrides should be set prior to allocating resources on
       the domain.

       The following fi_set_ops operations and corresponding callback
       structures are defined.

       FI_SET_OPS_HMEM_OVERRIDE – Heterogeneous Memory Overrides

       HMEM override allows users to override HMEM-related operations a
       provider may perform.  Currently, the scope of the HMEM override is to
       allow a user to define the memory movement functions a provider should
       use when accessing a user buffer.  The user-defined memory movement
       functions need to account for all the different HMEM iface types a
       provider may encounter.

       All objects allocated against a domain will inherit this override.

       The following is the HMEM override operation name and structure.

              #define FI_SET_OPS_HMEM_OVERRIDE "hmem_override_ops"

              struct fi_hmem_override_ops {
                  size_t  size;

                  ssize_t (*copy_from_hmem_iov)(void *dest, size_t size,
                      enum fi_hmem_iface iface, uint64_t device,
                      const struct iovec *hmem_iov, size_t hmem_iov_count,
                      uint64_t hmem_iov_offset);

                  ssize_t (*copy_to_hmem_iov)(enum fi_hmem_iface iface,
                      uint64_t device, const struct iovec *hmem_iov,
                      size_t hmem_iov_count, uint64_t hmem_iov_offset,
                      const void *src, size_t size);
              };

       All fields in struct fi_hmem_override_ops must be set (non-null) to a
       valid value.

       size   This should be set to sizeof(struct fi_hmem_override_ops).  The
              size field is used for forward and backward compatibility
              purposes.

       copy_from_hmem_iov
              Copy data from the device/hmem to host memory.  This function
              should return a negative fi_errno on error, or the number of
              bytes copied on success.

       copy_to_hmem_iov
              Copy data from host memory to the device/hmem.  This function
              should return a negative fi_errno on error, or the number of
              bytes copied on success.

   fi_domain_bind
       Associates an event queue with the domain.  An event queue bound to a
       domain will be the default EQ associated with asynchronous control
       events that occur on the domain or active endpoints allocated on a
       domain.  This includes CM events.  Endpoints may direct their control
       events to alternate EQs by binding directly with the EQ.

       Binding an event queue to a domain with the FI_REG_MR flag indicates
       that the provider should perform all memory registration operations
       asynchronously, with the completion reported through the event queue.
       If an event queue is not bound to the domain with the FI_REG_MR flag,
       then memory registration requests complete synchronously.

       See fi_av_bind(3), fi_ep_bind(3), fi_mr_bind(3), fi_pep_bind(3), and
       fi_scalable_ep_bind(3) for more information.

   fi_close
       The fi_close call is used to release all resources associated with a
       domain or interface.  All objects associated with the opened domain
       must be released prior to calling fi_close; otherwise, the call will
       return -FI_EBUSY.


DOMAIN ATTRIBUTES

       The fi_domain_attr structure defines the set of attributes associated
       with a domain.

              struct fi_domain_attr {
                  struct fid_domain     *domain;
                  char                  *name;
                  enum fi_threading     threading;
                  enum fi_progress      control_progress;
                  enum fi_progress      data_progress;
                  enum fi_resource_mgmt resource_mgmt;
                  enum fi_av_type       av_type;
                  int                   mr_mode;
                  size_t                mr_key_size;
                  size_t                cq_data_size;
                  size_t                cq_cnt;
                  size_t                ep_cnt;
                  size_t                tx_ctx_cnt;
                  size_t                rx_ctx_cnt;
                  size_t                max_ep_tx_ctx;
                  size_t                max_ep_rx_ctx;
                  size_t                max_ep_stx_ctx;
                  size_t                max_ep_srx_ctx;
                  size_t                cntr_cnt;
                  size_t                mr_iov_limit;
                  uint64_t              caps;
                  uint64_t              mode;
                  uint8_t               *auth_key;
                  size_t                auth_key_size;
                  size_t                max_err_data;
                  size_t                mr_cnt;
                  uint32_t              tclass;
              };

   domain
       On input to fi_getinfo, a user may set this to an opened domain
       instance to restrict output to the given domain.  On output from
       fi_getinfo, if no domain was specified, but the user has an opened
       instance of the named domain, this will reference the first opened
       instance.  If no instance has been opened, this field will be NULL.

       The domain instance returned by fi_getinfo should only be considered
       valid if the application does not close any domain instances from
       another thread while fi_getinfo is being processed.

   Name
       The name of the access domain.

   Multi-threading Support (threading)
       The threading model specifies the level of serialization required of an
       application when using the libfabric data transfer interfaces.  Control
       interfaces are always considered thread safe and may be accessed by
       multiple threads.  Applications that can guarantee serialization in
       their access of provider-allocated resources and interfaces enable a
       provider to eliminate lower-level locks.

       FI_THREAD_COMPLETION
              The completion threading model is intended for providers that
              make use of manual progress.  Applications must serialize access
              to all objects that are associated by sharing a completion
              structure.  This includes endpoint, transmit context, receive
              context, completion queue, counter, wait set, and poll set
              objects.

       For example, threads must serialize access to an endpoint and its bound
       completion queue(s) and/or counters.  Access to endpoints that share
       the same completion queue must also be serialized.

       The use of FI_THREAD_COMPLETION can increase parallelism over
       FI_THREAD_SAFE, but requires the use of isolated resources.

       FI_THREAD_DOMAIN
              A domain serialization model requires applications to serialize
              access to all objects belonging to a domain.

       FI_THREAD_ENDPOINT
              The endpoint threading model is similar to FI_THREAD_FID, but
              with the added restriction that serialization is required when
              accessing the same endpoint, even if multiple transmit and
              receive contexts are used.  Conceptually, FI_THREAD_ENDPOINT maps
              well to providers that implement fabric services in hardware but
              use a single command queue to access different data flows.

       FI_THREAD_FID
              A fabric descriptor (FID) serialization model requires
              applications to serialize access to individual fabric resources
              associated with data transfer operations and completions.
              Multiple threads must be serialized when accessing the same
              endpoint, transmit context, receive context, completion queue,
              counter, wait set, or poll set.  Serialization is required only
              by threads accessing the same object.

       For example, one thread may be initiating a data transfer on an
       endpoint, while another thread reads from a completion queue associated
       with the endpoint.

       Serialization of endpoint access is only required when accessing the
       same endpoint data flow.  Multiple threads may initiate transfers on
       different transmit contexts of the same endpoint without serializing,
       and no serialization is required between the submission of data
       transmit requests and data receive operations.

       In general, FI_THREAD_FID allows the provider to be implemented without
       needing internal locking when handling data transfers.  Conceptually,
       FI_THREAD_FID maps well to providers that implement fabric services in
       hardware and provide separate command queues to different data flows.

       FI_THREAD_SAFE
              A thread safe serialization model allows a multi-threaded
              application to access any allocated resources through any
              interface without restriction.  All providers are required to
              support FI_THREAD_SAFE.

       FI_THREAD_UNSPEC
              This value indicates that no threading model has been defined.
              It may be used on input hints to the fi_getinfo call.  When
              specified, providers will return a threading model that allows
              for the greatest level of parallelism.

   Progress Models (control_progress / data_progress)
       Progress is the ability of the underlying implementation to complete
       processing of an asynchronous request.  In many cases, the processing
       of an asynchronous request requires the use of the host processor.  For
       example, a received message may need to be matched with the correct
       buffer, or a timed out request may need to be retransmitted.  For
       performance reasons, it may be undesirable for the provider to allocate
       a thread for this purpose, which will compete with the application
       threads.

       Control progress indicates the method that the provider uses to make
       progress on asynchronous control operations.  Control operations are
       functions which do not directly involve the transfer of application
       data between endpoints.  They include address vector, memory
       registration, and connection management routines.

       Data progress indicates the method that the provider uses to make
       progress on data transfer operations.  This includes message queue,
       RMA, tagged messaging, and atomic operations, along with their
       completion processing.

       Progress frequently requires action to be taken at both the
       transmitting and receiving sides of an operation.  This is often a
       requirement for reliable transfers, as a result of retry and
       acknowledgement processing.

       To balance between performance and ease of use, two progress models are
       defined.

       FI_PROGRESS_AUTO
              This progress model indicates that the provider will make
              forward progress on an asynchronous operation without further
              intervention by the application.  When FI_PROGRESS_AUTO is
              provided as output to fi_getinfo in the absence of any progress
              hints, it often indicates that the desired functionality is
              implemented by the provider hardware or is a standard service of
              the operating system.

       It is recommended that providers support FI_PROGRESS_AUTO.  However, if
       a provider does not natively support automatic progress, forcing the
       use of FI_PROGRESS_AUTO may result in threads being allocated below the
       fabric interfaces.

       Note that prior versions of the library required providers to support
       FI_PROGRESS_AUTO.  However, in some cases progress threads cannot be
       blocked when communication is idle, which results in threads spinning
       in progress functions.  As a result, those providers only supported
       FI_PROGRESS_MANUAL.

       FI_PROGRESS_MANUAL
              This progress model indicates that the provider requires the use
              of an application thread to complete an asynchronous request.
              When manual progress is set, the provider will attempt to
              advance an asynchronous operation forward when the application
              attempts to wait on or read an event queue, completion queue, or
              counter where the completed operation will be reported.
              Progress also occurs when the application processes a poll or
              wait set that has been associated with the event or completion
              queue.

       Only wait operations defined by the fabric interface will result in an
       operation progressing.  Operating system or external wait functions,
       such as select, poll, or pthread routines, cannot.

       Manual progress requirements not only apply to endpoints that initiate
       transmit operations, but also to endpoints that may be the target of
       such operations.  This holds true even if the target endpoint will not
       generate completion events for the operations.  For example, an
       endpoint that acts purely as the target of RMA or atomic operations
       and uses manual progress may still need application assistance to
       process received operations.

       FI_PROGRESS_UNSPEC
              This value indicates that no progress model has been defined.
              It may be used on input hints to the fi_getinfo call.

   Resource Management (resource_mgmt)
       Resource management (RM) is provider and protocol support to protect
       against overrunning local and remote resources.  This includes local
       and remote transmit contexts, receive contexts, completion queues, and
       source and target data buffers.

       When enabled, applications are given some level of protection against
       overrunning provider queues and local and remote data buffers.  Such
       support may be built directly into the hardware and/or network
       protocol, but may also require that checks be enabled in the provider
       software.  By disabling resource management, an application assumes
       all responsibility for preventing queue and buffer overruns, but doing
       so may allow a provider to eliminate internal synchronization calls,
       such as atomic variables or locks.

       It should be noted that even if resource management is disabled, the
       provider implementation and protocol may still provide some level of
       protection against overruns.  However, such protection is not
       guaranteed.  The following values for resource management are
       defined.

       FI_RM_DISABLED
              The provider is free to select an implementation and protocol
              that does not protect against resource overruns.  The
              application is responsible for resource protection.

       FI_RM_ENABLED
              Resource management is enabled for this provider domain.

       FI_RM_UNSPEC
              This value indicates that no resource management model has been
              defined.  It may be used on input hints to the fi_getinfo call.

       The behavior of the various resource management options depends on
       whether the endpoint is reliable or unreliable, as well as provider and
       protocol specific implementation details, as shown in the following
       table.  The table assumes that all peers enable or disable RM in the
       same way.

       Resource         DGRAM EP          DGRAM EP          RDM/MSG EP         RDM/MSG EP
                        (no RM)           (with RM)         (no RM)            (with RM)
       ──────────────────────────────────────────────────────────────────────────────────
       Tx Ctx           undefined error   EAGAIN            undefined error    EAGAIN
       Rx Ctx           undefined error   EAGAIN            undefined error    EAGAIN
       Tx CQ            undefined error   EAGAIN            undefined error    EAGAIN
       Rx CQ            undefined error   EAGAIN            undefined error    EAGAIN
       Target EP        dropped           dropped           transmit error     retried
       No Rx Buffer     dropped           dropped           transmit error     retried
       Rx Buf Overrun   truncate or drop  truncate or drop  truncate or error  truncate or error
       Unmatched RMA    not applicable    not applicable    transmit error     transmit error
       RMA Overrun      not applicable    not applicable    transmit error     transmit error

       The resource column indicates the resource being accessed by a data
       transfer operation.

       Tx Ctx / Rx Ctx
              Refers to the transmit/receive contexts when a data transfer
              operation is submitted.  When RM is enabled, attempting to
              submit a request will fail if the context is full.  If RM is
              disabled, an undefined error (provider specific) will occur.
              Such errors should be considered fatal to the context, and
              applications must take steps to avoid queue overruns.

       Tx CQ / Rx CQ
              Refers to the completion queue associated with the Tx or Rx
              context when a local operation completes.  When RM is disabled,
              applications must take care to ensure that completion queues do
              not get overrun.  When an overrun occurs, an undefined, but
              fatal, error will occur affecting all endpoints associated with
              the CQ.  Overruns can be avoided by sizing the CQs appropriately
              or by deferring the posting of a data transfer operation unless
              CQ space is available to store its completion.  When RM is
              enabled, providers may use different mechanisms to prevent CQ
              overruns.  This includes failing (returning -FI_EAGAIN) the
              posting of operations that could result in CQ overruns, or
              internally retrying requests (which will be hidden from the
              application).  See notes at the end of this section regarding CQ
              resource management restrictions.

       Target EP / No Rx Buffer
              Target EP refers to resources associated with the endpoint that
              is the target of a transmit operation.  This includes the target
              endpoint’s receive queue, posted receive buffers (no Rx
              buffers), the receive side completion queue, and other related
              packet processing queues.  The defined behavior is that seen by
              the initiator of a request.  For FI_EP_DGRAM endpoints, if the
              target EP queues are unable to accept incoming messages,
              received messages will be dropped.  For reliable endpoints, if
              RM is disabled, the transmit operation will complete in error.
              A provider may choose to return an error completion with the
              error code FI_ENORX for that transmit operation so that it can
              be retried.  If RM is enabled, the provider will internally
              retry the operation.

       Rx Buffer Overrun
              This refers to buffers posted to receive incoming tagged or
              untagged messages, with the behavior defined from the viewpoint
              of the sender.  The behavior for handling received messages that
              are larger than the buffers provided by the application is
              provider specific.  Providers may either truncate the message
              and report a successful completion, or fail the operation.  For
              datagram endpoints, failed sends will result in the message
              being dropped.  For reliable endpoints, send operations may
              complete successfully, yet be truncated at the receive side.
              This can occur when the target side buffers received data until
              an application buffer is made available.  The completion status
              may also be dependent upon the completion model selected by the
              application (e.g. FI_DELIVERY_COMPLETE versus
              FI_TRANSMIT_COMPLETE).

       Unmatched RMA / RMA Overrun
              Unmatched RMA and RMA overruns deal with the processing of RMA
              and atomic operations.  Unlike send operations, RMA operations
              that attempt to access a memory address that is either not
              registered for such operations, or attempt to access outside of
              the target memory region will fail, resulting in a transmit
              error.

454              application buffer is made available.  The completion status may
455              also be dependent upon the completion model selected byt the ap‐
456              plication   (e.g. FI_DELIVERY_COMPLETE  versus  FI_TRANSMIT_COM‐
457              PLETE).
458
459       Unmatched RMA / RMA Overrun
460              Unmatched RMA and RMA overruns deal with the processing  of  RMA
461              and  atomic  operations.  Unlike send operations, RMA operations
462              that attempt to access a memory address that is either not  reg‐
463              istered for such operations, or attempt to access outside of the
464              target memory region will fail, resulting in a transmit error.
465
466       When a resource management error occurs on an endpoint, the endpoint is
467       transitioned  into a disabled state.  Any operations which have not al‐
468       ready completed will fail and be discarded.   For  connectionless  end‐
469       points,  the endpoint must be re-enabled before it will accept new data
470       transfer operations.  For connected endpoints, the connection  is  torn
471       down and must be re-established.
472
473       There is one notable restriction on the protections offered by resource
474       management.  This occurs when resource management is enabled on an end‐
475       point  that  has  been bound to completion queue(s) using the FI_SELEC‐
476       TIVE_COMPLETION flag.  Operations posted to such an endpoint may speci‐
477       fy that a successful completion should not generate a entry on the cor‐
478       responding completion queue.  (I.e.  the operation leaves  the  FI_COM‐
479       PLETION  flag unset).  In such situations, the provider is not required
480       to reserve an entry in the completion queue to handle  the  case  where
481       the  operation  fails  and does generate a CQ entry, which would effec‐
482       tively require tracking the operation to completion.  Applications con‐
483       cerned  with  avoiding CQ overruns in the occurrence of errors must en‐
484       sure that there is sufficient space in the CQ to report  failed  opera‐
485       tions.  This can typically be achieved by sizing the CQ to at least the
486       same size as the endpoint queue(s) that are bound to it.
487
488   AV Type (av_type)
489       Specifies the type of address vectors that are usable with this domain.
490       For  additional details on AV type, see fi_av(3).  The following values
491       may be specified.
492
493       FI_AV_MAP
494              Only address vectors of type AV map are requested or supported.
495
496       FI_AV_TABLE
497              Only address vectors of type AV index are requested or  support‐
498              ed.
499
500       FI_AV_UNSPEC
501              Any address vector format is requested and supported.
502
503       Address  vectors  are  only used by connectionless endpoints.  Applica‐
504       tions that require the use of a specific type of address vector  should
505       set  the  domain  attribute av_type to the necessary value when calling
506       fi_getinfo.  The value FI_AV_UNSPEC may be used to  indicate  that  the
507       provider  can  support  either  address vector format.  In this case, a
508       provider may return FI_AV_UNSPEC to indicate that either format is sup‐
509       portable, or may return another AV type to indicate the optimal AV type
510       supported by this domain.
511
512   Memory Registration Mode (mr_mode)
513       Defines memory registration specific mode bits used with  this  domain.
514       Full details on MR mode options are available in fi_mr(3).  The follow‐
515       ing values may be specified.
516
517       FI_MR_ALLOCATED
518              Indicates that memory registration occurs on allocated data buf‐
519              fers,  and  physical pages must back all virtual addresses being
520              registered.
521
522       FI_MR_COLLECTIVE
523              Requires data buffers passed to collective operations be explic‐
524              itly  registered  for collective operations using the FI_COLLEC‐
525              TIVE flag.
526
527       FI_MR_ENDPOINT
528              Memory registration occurs at the endpoint  level,  rather  than
529              domain.
530
531       FI_MR_LOCAL
532              The  provider  is  optimized around having applications register
533              memory for locally accessed data buffers.  Data buffers used  in
534              send and receive operations and as the source buffer for RMA and
535              atomic operations must be registered by the application for  ac‐
536              cess domains opened with this capability.
537
538       FI_MR_MMU_NOTIFY
539              Indicates  that the application is responsible for notifying the
540              provider when the page tables referencing  a  registered  memory
541              region may have been updated.
542
543       FI_MR_PROV_KEY
544              Memory  registration  keys  are  selected  and  returned  by the
545              provider.
546
547       FI_MR_RAW
548              The provider requires additional setup as part of  their  memory
549              registration  process.   This mode is required by providers that
550              use a memory key that is larger than 64-bits.
551
552       FI_MR_RMA_EVENT
553              Indicates that the memory  regions  associated  with  completion
554              counters  must  be  explicitly  enabled after being bound to any
555              counter.
556
557       FI_MR_UNSPEC
558              Defined for compatibility – library versions  1.4  and  earlier.
559              Setting  mr_mode  to 0 indicates that FI_MR_BASIC or FI_MR_SCAL‐
560              ABLE are requested and supported.
561
562       FI_MR_VIRT_ADDR
563              Registered memory regions are referenced by peers using the vir‐
564              tual  address  of  the  registered  memory region, rather than a
565              0-based offset.
566
567       FI_MR_BASIC
568              Defined for compatibility – library versions  1.4  and  earlier.
569              Only  basic memory registration operations are requested or sup‐
570              ported.   This  mode  is  equivalent  to  the   FI_MR_VIRT_ADDR,
571              FI_MR_ALLOCATED, and FI_MR_PROV_KEY flags being set in later li‐
572              brary versions.  This flag may not be used in  conjunction  with
573              other mr_mode bits.
574
575       FI_MR_SCALABLE
576              Defined  for  compatibility  – library versions 1.4 and earlier.
577              Only scalable memory registration operations  are  requested  or
578              supported.   Scalable registration uses offset based addressing,
579              with application selectable memory keys.  For  library  versions
580              1.5  and  later, this is the default if no mr_mode bits are set.
581              This flag may not be used  in  conjunction  with  other  mr_mode
582              bits.
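The flags above differ in how a peer names a location inside a registered region. With FI_MR_VIRT_ADDR (and the legacy FI_MR_BASIC) the target address is the virtual address of the registered memory, while offset-based addressing (FI_MR_SCALABLE, or any mr_mode without FI_MR_VIRT_ADDR) uses a 0-based offset into the region. The sketch below isolates that choice. It does not include libfabric: `EX_MR_VIRT_ADDR` is a placeholder bit standing in for FI_MR_VIRT_ADDR, and `target_rma_addr` is a hypothetical helper.

```c
#include <stdint.h>

/* Placeholder for FI_MR_VIRT_ADDR; real code should use the constant
 * from <rdma/fabric.h>. The value here is arbitrary. */
#define EX_MR_VIRT_ADDR (UINT64_C(1) << 4)

/* Hypothetical helper: the address a peer would pass to an RMA call in
 * order to reach byte `offset` of a region registered at `region_base`. */
static uint64_t target_rma_addr(uint64_t mr_mode, uintptr_t region_base,
                                uint64_t offset)
{
    if (mr_mode & EX_MR_VIRT_ADDR)
        return (uint64_t)region_base + offset; /* virtual addressing */
    return offset;                             /* 0-based offset */
}
```

In a real application the peer learns `region_base` (when virtual addressing is in effect) together with the memory key from the region's owner, for example through an out-of-band exchange.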
583
584       Buffers  used  in  data  transfer  operations may require notifying the
585       provider of their use before a data transfer can  occur.   The  mr_mode
586       field  indicates  the type of memory registration that is required, and
587       when registration is necessary.  Applications that require the use of a
588       specific  registration  mode should set the domain attribute mr_mode to
589       the necessary value when calling fi_getinfo.   The  value  FI_MR_UNSPEC
590       may be used to indicate support for any registration mode.
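An application that negotiates mr_mode has to honor every registration requirement the provider reports back. The sketch below shows one way an application might check that, after expanding the legacy FI_MR_BASIC and FI_MR_SCALABLE values into their modern equivalents as described above. All `EX_` constants are placeholders for the corresponding FI_MR_* definitions in <rdma/fabric.h>, and `normalize_mr_mode` and `app_can_use_mr_mode` are hypothetical helpers, not libfabric functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values; use the real FI_MR_* constants in actual code. */
#define EX_MR_BASIC      UINT64_C(1)          /* legacy, exclusive value */
#define EX_MR_SCALABLE   UINT64_C(2)          /* legacy, exclusive value */
#define EX_MR_VIRT_ADDR  (UINT64_C(1) << 4)
#define EX_MR_ALLOCATED  (UINT64_C(1) << 5)
#define EX_MR_PROV_KEY   (UINT64_C(1) << 6)

/* Expand a legacy (library 1.4 style) mr_mode into the equivalent bits.
 * Legacy values may not be combined with other bits, so compare exactly. */
static uint64_t normalize_mr_mode(uint64_t mode)
{
    if (mode == EX_MR_BASIC)
        return EX_MR_VIRT_ADDR | EX_MR_ALLOCATED | EX_MR_PROV_KEY;
    if (mode == EX_MR_SCALABLE)
        return 0; /* offset-based, application-selected keys */
    return mode;
}

/* True if every requirement the provider reports is one the
 * application is prepared to handle. */
static bool app_can_use_mr_mode(uint64_t app_supported, uint64_t prov_required)
{
    uint64_t have = normalize_mr_mode(app_supported);
    uint64_t need = normalize_mr_mode(prov_required);
    return (need & ~have) == 0;
}
```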
591
592   MR Key Size (mr_key_size)
593       Size  of  the  memory region remote access key, in bytes.  Applications
594       that request their own MR key must select a key value that fits within
595       this size.  Key sizes larger than 8 bytes require  using  the  FI_MR_RAW
596       mode bit.
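When an application selects its own keys, it can test a candidate key against mr_key_size before registering. The sketch below assumes the key is carried in a 64-bit integer, which only covers sizes up to 8 bytes; wider keys fall under FI_MR_RAW and are not handled here. `key_fits` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: does an application-chosen key fit within
 * mr_key_size bytes? Any 64-bit key fits when mr_key_size >= 8. */
static bool key_fits(uint64_t key, size_t mr_key_size)
{
    if (mr_key_size >= sizeof(uint64_t))
        return true;
    if (mr_key_size == 0)
        return key == 0;
    /* 1..7 bytes: the key must be below 2^(8 * mr_key_size). */
    return key < (UINT64_C(1) << (8 * mr_key_size));
}
```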
597
598   CQ Data Size (cq_data_size)
599       Applications may include a small message with a data transfer  that  is
600       placed  directly into a remote completion queue as part of a completion
601       event.  This is referred to as remote CQ data (sometimes referred to as
602       immediate  data).   This  field  indicates the number of bytes that the
603       provider supports for remote CQ data.  If remote CQ data is supported
604       (a non-zero value is returned), the reported size is at least 4 bytes.
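Before attaching remote CQ data to a transfer, an application can check that the value fits in the reported size. Because a supporting provider reports at least 4 bytes, any 32-bit value is safe whenever cq_data_size is non-zero. The sketch assumes the data is held in a 64-bit integer; `cq_data_fits` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: can `data` be delivered as remote CQ data when
 * the domain reports cq_data_size bytes? Zero means unsupported. */
static bool cq_data_fits(uint64_t data, size_t cq_data_size)
{
    if (cq_data_size == 0)
        return false;                 /* remote CQ data not supported */
    if (cq_data_size >= sizeof(uint64_t))
        return true;                  /* any 64-bit value fits */
    return data < (UINT64_C(1) << (8 * cq_data_size));
}
```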
605
606   Completion Queue Count (cq_cnt)
607       The  optimal number of completion queues supported by the domain, rela‐
608       tive to any specified or default CQ attributes.  The cq_cnt  value  may
609       be a fixed value of the maximum number of CQs supported by the underly‐
610       ing hardware, or may be a dynamic  value,  based  on  the  default  at‐
611       tributes of an allocated CQ, such as the CQ size and data format.
612
613   Endpoint Count (ep_cnt)
614       The  total number of endpoints supported by the domain, relative to any
615       specified or default endpoint attributes.  The ep_cnt value  may  be  a
616       fixed  value of the maximum number of endpoints supported by the under‐
617       lying hardware, or may be a dynamic value, based  on  the  default  at‐
618       tributes  of  an  allocated endpoint, such as the endpoint capabilities
619       and size.  The endpoint count is the number  of  addressable  endpoints
620       supported by the provider.  Providers return capability limits based on
621       configured hardware maximum capabilities.  Providers cannot predict all
622       possible system limitations without a posteriori knowledge acquired
623       during runtime that will further limit these hardware maximums (e.g.
624       application memory consumption, FD usage, etc.).
625
626   Transmit Context Count (tx_ctx_cnt)
627       The  number  of  outbound  command  queues  optimally  supported by the
628       provider.  For a low-level provider, this represents the number of com‐
629       mand  queues to the hardware and/or the number of parallel transmit en‐
630       gines effectively supported by the hardware and  caches.   Applications
631       which allocate more transmit contexts than this value will end up shar‐
632       ing underlying resources.  By default, there is a single transmit  con‐
633       text  associated with each endpoint, but in an advanced usage model, an
634       endpoint may be configured with multiple transmit contexts.
635
636   Receive Context Count (rx_ctx_cnt)
637       The number of inbound processing  queues  optimally  supported  by  the
638       provider.  For a low-level provider, this represents the number of
639       hardware queues that can be effectively utilized for processing incoming
640       packets.   Applications  which allocate more receive contexts than this
641       value will end up sharing underlying resources.  By default,  a  single
642       receive  context  is  associated with each endpoint, but in an advanced
643       usage model, an endpoint may be configured with multiple  receive  con‐
644       texts.
645
646   Maximum Endpoint Transmit Context (max_ep_tx_ctx)
647       The  maximum number of transmit contexts that may be associated with an
648       endpoint.
649
650   Maximum Endpoint Receive Context (max_ep_rx_ctx)
651       The maximum number of receive contexts that may be associated  with  an
652       endpoint.
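The context limits above interact: max_ep_tx_ctx and max_ep_rx_ctx are hard per-endpoint limits, while tx_ctx_cnt and rx_ctx_cnt are domain-wide optima beyond which contexts share underlying resources. The sketch below picks a per-endpoint context count that respects both, assuming (as an illustration) that the application splits the domain optimum evenly across its endpoints. `choose_ctx_count` is a hypothetical helper; pass the tx or rx attributes as appropriate.

```c
#include <stddef.h>

/* Hypothetical helper.
 *   wanted     - contexts the application would like per endpoint
 *   domain_opt - tx_ctx_cnt or rx_ctx_cnt (domain-wide optimum)
 *   ep_max     - max_ep_tx_ctx or max_ep_rx_ctx (per-endpoint limit)
 *   n_eps      - number of endpoints sharing the domain              */
static size_t choose_ctx_count(size_t wanted, size_t domain_opt,
                               size_t ep_max, size_t n_eps)
{
    /* Divide the domain-wide optimum evenly across endpoints. */
    size_t share = n_eps ? domain_opt / n_eps : domain_opt;
    size_t n = wanted;

    if (n > ep_max)   /* never exceed the per-endpoint maximum */
        n = ep_max;
    if (n > share)    /* beyond this, contexts would share hardware */
        n = share;
    return n ? n : 1; /* keep at least the single default context */
}
```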
653
654   Maximum Sharing of Transmit Context (max_ep_stx_ctx)
655       The  maximum  number  of endpoints that may be associated with a shared
656       transmit context.
657
658   Maximum Sharing of Receive Context (max_ep_srx_ctx)
659       The maximum number of endpoints that may be associated  with  a  shared
660       receive context.
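Given max_ep_srx_ctx (or max_ep_stx_ctx), the minimum number of shared contexts needed for a set of endpoints is a ceiling division. `shared_ctx_needed` is a hypothetical helper, and treating a limit of 0 as "no sharing available" is an assumption of this sketch rather than documented behavior.

```c
#include <stddef.h>

/* Hypothetical helper: fewest shared contexts such that no shared
 * context serves more than max_share endpoints. Returns 0 when
 * max_share is 0 (assumed here to mean sharing is unavailable). */
static size_t shared_ctx_needed(size_t n_endpoints, size_t max_share)
{
    if (max_share == 0)
        return 0;
    return (n_endpoints + max_share - 1) / max_share; /* ceiling */
}
```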
661
662   Counter Count (cntr_cnt)
663       The optimal number of completion counters supported by the domain.  The
664       cntr_cnt value may be a fixed value of the maximum number of counters
665       supported  by the underlying hardware, or may be a dynamic value, based
666       on the default attributes of the domain.
667
668   MR IOV Limit (mr_iov_limit)
669       This is the maximum number of IO vectors (scatter-gather elements) that
670       a single memory registration operation may reference.
671
672   Capabilities (caps)
673       Domain  level  capabilities.  Domain capabilities indicate domain level
674       features that are supported by the provider.
675
676       FI_LOCAL_COMM
677              At a conceptual level, this field indicates that the  underlying
678              device supports loopback communication.  More specifically, this
679              field indicates that an endpoint may communicate with other end‐
680              points that are allocated from the same underlying named domain.
681              If this field is not set, an application may need to use an  al‐
682              ternate  domain or mechanism (e.g. shared memory) to communicate
683              with peers that execute on the same node.
684
685       FI_REMOTE_COMM
686              This field indicates that the underlying provider supports  com‐
687              munication  with  nodes that are reachable over the network.  If
688              this field is not set, then the provider only supports  communi‐
689              cation  between  processes  that  execute  on  the same node – a
690              shared memory provider, for example.
691
692       FI_SHARED_AV
693              Indicates that the domain supports the ability to share  address
694              vectors  among multiple processes using the named address vector
695              feature.
696
697       See fi_getinfo(3) for a discussion on primary versus secondary capabil‐
698       ities.  All domain capabilities are considered secondary capabilities.
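FI_LOCAL_COMM and FI_REMOTE_COMM tell an application whether a domain can reach a particular peer directly. The sketch below makes that decision from the domain caps and whether the peer runs on the same node. `EX_LOCAL_COMM` and `EX_REMOTE_COMM` are placeholder bits for the real constants in <rdma/fabric.h>, and `domain_can_reach` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits; use FI_LOCAL_COMM / FI_REMOTE_COMM in real code. */
#define EX_LOCAL_COMM  (UINT64_C(1) << 0)
#define EX_REMOTE_COMM (UINT64_C(1) << 1)

/* Hypothetical routing check: can this domain reach the peer directly? */
static bool domain_can_reach(uint64_t domain_caps, bool peer_is_local)
{
    if (peer_is_local)
        return (domain_caps & EX_LOCAL_COMM) != 0;
    return (domain_caps & EX_REMOTE_COMM) != 0;
}
```

When this returns false for a local peer, the application would fall back to an alternate domain or mechanism, such as a shared memory provider, as the FI_LOCAL_COMM description suggests.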
699
700   mode
701       The operational mode bit related to using the domain.
702
703       FI_RESTRICTED_COMP
704              This  bit indicates that the domain limits completion queues and
705              counters to only be used with endpoints, transmit contexts,  and
706              receive contexts that have the same set of capability flags.
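Under FI_RESTRICTED_COMP, every endpoint or context bound to a given CQ or counter must have the same capability flags. An application might track this with bookkeeping like the sketch below, which records the caps of the first binding and rejects mismatches. The `ex_cq_binding` struct and `try_bind` function are hypothetical; libfabric does not expose them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CQ (or per-counter) bookkeeping. */
struct ex_cq_binding {
    bool     bound; /* has any endpoint been bound yet? */
    uint64_t caps;  /* capability flags of the bound endpoints */
};

/* Returns true if an endpoint with `ep_caps` may be bound, recording
 * its caps on the first binding when the restriction applies. */
static bool try_bind(struct ex_cq_binding *st, bool restricted_comp,
                     uint64_t ep_caps)
{
    if (!restricted_comp)
        return true;          /* no restriction: any caps may mix */
    if (!st->bound) {
        st->bound = true;
        st->caps = ep_caps;
        return true;
    }
    return st->caps == ep_caps;
}
```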
707
708   Default authorization key (auth_key)
709       The  default  authorization  key  to associate with endpoint and memory
710       registrations created within the domain.  This field is ignored  unless
711       the fabric is opened with API version 1.5 or greater.
712
713   Default authorization key length (auth_key_size)
714       The  length  in  bytes of the default authorization key for the domain.
715       If set to 0, then no authorization key will  be  associated  with  end‐
716       points and memory registrations created within the domain unless speci‐
717       fied in the endpoint or memory registration attributes.  This field  is
718       ignored unless the fabric is opened with API version 1.5 or greater.
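The two auth_key fields combine with per-object attributes: a key given in the endpoint or memory registration attributes applies to that object, otherwise the domain default applies when auth_key_size is non-zero and the fabric was opened with API version 1.5 or greater. The sketch below encodes that precedence. `EX_VERSION` is intended to mirror the layout of libfabric's FI_VERSION macro (major in the upper 16 bits, minor in the lower 16), and `effective_auth_key` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the FI_VERSION layout; renamed to mark it as a stand-in. */
#define EX_VERSION(major, minor) \
    (((uint32_t)(major) << 16) | (uint32_t)(minor))

/* Hypothetical resolution of the key for a new endpoint or MR.
 * Returns the key length and stores the key in *key (NULL if none). */
static size_t effective_auth_key(uint32_t api_version,
                                 const uint8_t *obj_key, size_t obj_key_size,
                                 const uint8_t *dom_key, size_t dom_key_size,
                                 const uint8_t **key)
{
    if (obj_key_size > 0) {                 /* per-object key wins */
        *key = obj_key;
        return obj_key_size;
    }
    if (api_version >= EX_VERSION(1, 5) && dom_key_size > 0) {
        *key = dom_key;                     /* domain default */
        return dom_key_size;
    }
    *key = NULL;                            /* no authorization key */
    return 0;
}
```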
719
720   Max Error Data Size (max_err_data)
721       The  maximum amount of error data, in bytes, that may be returned as
722       part of a completion or event queue error.  This value  corresponds  to
723       the   err_data_size   field   in   struct  fi_cq_err_entry  and  struct
724       fi_eq_err_entry.
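Because max_err_data bounds the err_data_size an application may see in an error entry, an application can size a single error-data buffer from it up front. The sketch below only shows the defensive copy into such a buffer, truncating if a reported size ever exceeds the buffer; `copy_err_data` is a hypothetical helper and does not model how a provider fills struct fi_cq_err_entry or struct fi_eq_err_entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy error data into a buffer of dst_size bytes
 * (e.g. allocated with max_err_data bytes). Returns bytes copied. */
static size_t copy_err_data(void *dst, size_t dst_size,
                            const void *err_data, size_t err_data_size)
{
    size_t n = err_data_size < dst_size ? err_data_size : dst_size;

    if (n > 0)
        memcpy(dst, err_data, n);
    return n;
}
```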
725
726   Memory Regions Count (mr_cnt)
727       The optimal number of memory regions supported by the domain,  or  end‐
728       point if the mr_mode FI_MR_ENDPOINT bit has been set.  The mr_cnt value
729       may be a fixed value of the maximum number of MRs supported by the  un‐
730       derlying  hardware, or may be a dynamic value, based on the default at‐
731       tributes of the domain,  such  as  the  supported  memory  registration
732       modes.   Applications can set the mr_cnt on input to fi_getinfo, in or‐
733       der to indicate their memory registration requirements.  Doing  so  may
734       allow  the provider to optimize any memory registration cache or lookup
735       tables.
736
737   Traffic Class (tclass)
738       This specifies the default traffic class that will be associated with
739       any endpoints created within the domain.  See fi_endpoint(3) for additional
740       information.
741

RETURN VALUE

743       Returns 0 on success.  On error, a negative value corresponding to
744       fabric errno is returned.  Fabric errno values are defined in
745       rdma/fi_errno.h.
746

NOTES

748       Users should call fi_close to release all resources  allocated  to  the
749       fabric domain.
750
751       The following fabric resources are associated with domains: active end‐
752       points, memory regions, completion event queues, and address vectors.
753
754       Domain attributes reflect the limitations and capabilities of  the  un‐
755       derlying hardware and/or software provider.  They do not reflect system
756       limitations, such as the number of physical pages that  an  application
757       may  pin  or  number of file descriptors that the application may open.
758       As a result, the reported maximums may not be  achievable,  even  on  a
759       lightly loaded system, without an administrator configuring system
760       resources appropriately for the installed provider(s).
761

AUTHORS

766       OpenFabrics.
767
768
769
770Libfabric Programmer’s Manual     2022-03-30                      fi_domain(3)