fi_domain(3)

fi_domain(3)                   Libfabric v1.17.0                  fi_domain(3)

NAME

       fi_domain - Open a fabric access domain

SYNOPSIS

              #include <rdma/fabric.h>

              #include <rdma/fi_domain.h>

              int fi_domain(struct fid_fabric *fabric, struct fi_info *info,
                  struct fid_domain **domain, void *context);

              int fi_domain2(struct fid_fabric *fabric, struct fi_info *info,
                  struct fid_domain **domain, uint64_t flags, void *context);

              int fi_close(struct fid *domain);

              int fi_domain_bind(struct fid_domain *domain, struct fid *eq,
                  uint64_t flags);

              int fi_open_ops(struct fid *domain, const char *name, uint64_t flags,
                  void **ops, void *context);

              int fi_set_ops(struct fid *domain, const char *name, uint64_t flags,
                  void *ops, void *context);

ARGUMENTS

31       fabric Fabric domain
32
33       info   Fabric   information,  including  domain  capabilities  and  at‐
34              tributes.
35
36       domain An opened access domain.
37
38       context
39              User specified context associated with the domain.  This context
40              is  returned  as  part of any asynchronous event associated with
41              the domain.
42
43       eq     Event queue for asynchronous operations initiated on the domain.
44
45       name   Name associated with an interface.
46
47       ops    Fabric interface operations.
48

DESCRIPTION

50       An access domain typically refers to a physical or virtual NIC or hard‐
51       ware  port;  however, a domain may span across multiple hardware compo‐
52       nents for fail-over or data striping purposes.  A  domain  defines  the
53       boundary  for  associating  different  resources  together.  Fabric re‐
54       sources belonging to the same domain may share resources.
55
56   fi_domain
57       Opens a fabric access domain, also referred to as  a  resource  domain.
58       Fabric  domains are identified by a name.  The properties of the opened
59       domain are specified using the info parameter.
60
   fi_domain2
       Similar to fi_domain, but accepts an additional flags parameter.  It is
       mainly used to open a peer domain.  See fi_peer(3).
64
65   fi_open_ops
66       fi_open_ops is used to open provider specific interfaces.  Provider in‐
67       terfaces may be used to access low-level resources and operations  that
68       are  specific to the opened resource domain.  The details of domain in‐
69       terfaces are outside the scope of this documentation.
70
71   fi_set_ops
       fi_set_ops assigns callbacks that a provider should invoke in place of
       performing selected tasks.  This allows users to modify or control a
       provider’s default behavior.  Conceptually, it allows the user to hook
       specific functions used by a provider and replace them with their own.
76
77       The operations being modified are identified using a well-known charac‐
78       ter string, passed as the name parameter.  The format of the ops param‐
79       eter  is  dependent upon the name value.  The ops parameter will refer‐
80       ence a structure containing the callbacks and other  fields  needed  by
81       the provider to invoke the user’s functions.
82
83       If  a provider accepts the override, it will return FI_SUCCESS.  If the
84       override  is  unknown  or  not  supported,  the  provider  will  return
85       -FI_ENOSYS.   Overrides  should be set prior to allocating resources on
86       the domain.
87
88       The following fi_set_ops operations and corresponding  callback  struc‐
89       tures are defined.
90
91       FI_SET_OPS_HMEM_OVERRIDE – Heterogeneous Memory Overrides
92
93       HMEM  override  allows  users  to  override  HMEM  related operations a
94       provider may perform.  Currently, the scope of the HMEM override is  to
95       allow  a user to define the memory movement functions a provider should
96       use when accessing a user buffer.   The  user-defined  memory  movement
97       functions  need  to  account  for  all the different HMEM iface types a
98       provider may encounter.
99
100       All objects allocated against a domain will inherit this override.
101
102       The following is the HMEM override operation name and structure.
103
              #define FI_SET_OPS_HMEM_OVERRIDE "hmem_override_ops"

              struct fi_hmem_override_ops {
                  size_t  size;

                  ssize_t (*copy_from_hmem_iov)(void *dest, size_t size,
                      enum fi_hmem_iface iface, uint64_t device,
                      const struct iovec *hmem_iov, size_t hmem_iov_count,
                      uint64_t hmem_iov_offset);

                  ssize_t (*copy_to_hmem_iov)(enum fi_hmem_iface iface,
                      uint64_t device, const struct iovec *hmem_iov,
                      size_t hmem_iov_count, uint64_t hmem_iov_offset,
                      const void *src, size_t size);
              };
117
118       All fields in struct fi_hmem_override_ops must be set (non-null)  to  a
119       valid value.
120
121       size   This  should  be set to the sizeof(struct fi_hmem_override_ops).
122              The size field is used for forward  and  backward  compatibility
123              purposes.
124
125       copy_from_hmem_iov
126              Copy  data  from  the device/hmem to host memory.  This function
127              should return a negative fi_errno on error,  or  the  number  of
128              bytes copied on success.
129
130       copy_to_hmem_iov
131              Copy  data  from  host memory to the device/hmem.  This function
132              should return a negative fi_errno on error,  or  the  number  of
133              bytes copied on success.
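
       As an illustration of the callback contract described above, the
       following is a minimal sketch of the two copy functions for plain
       host memory only.  The enum demo_hmem_iface and the demo_ function
       names are stand-ins invented here so the sketch builds without the
       libfabric headers; real callbacks would use enum fi_hmem_iface and
       would also need to handle every device iface the provider may pass.
       The sketch relies on fi_errno values mirroring the system errno
       values.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Stand-in for enum fi_hmem_iface (assumption) so the sketch builds alone. */
enum demo_hmem_iface { DEMO_HMEM_SYSTEM = 0 };

/* Gather up to `size` bytes from the iovec, starting `hmem_iov_offset`
 * bytes into it, into the contiguous host buffer `dest`.
 * Returns the number of bytes copied or a negative errno value. */
ssize_t demo_copy_from_hmem_iov(void *dest, size_t size,
        enum demo_hmem_iface iface, uint64_t device,
        const struct iovec *hmem_iov, size_t hmem_iov_count,
        uint64_t hmem_iov_offset)
{
    size_t done = 0;

    (void) device;                      /* unused for host memory */
    assert(hmem_iov != NULL || hmem_iov_count == 0);
    if (iface != DEMO_HMEM_SYSTEM)
        return -ENOSYS;                 /* device memory not handled here */

    for (size_t i = 0; i < hmem_iov_count && done < size; i++) {
        size_t len = hmem_iov[i].iov_len;

        if (hmem_iov_offset >= len) {   /* segment lies before the offset */
            hmem_iov_offset -= len;
            continue;
        }
        size_t n = len - (size_t) hmem_iov_offset;
        if (n > size - done)
            n = size - done;
        memcpy((char *) dest + done,
               (const char *) hmem_iov[i].iov_base + hmem_iov_offset, n);
        done += n;
        hmem_iov_offset = 0;            /* later segments start at byte 0 */
    }
    return (ssize_t) done;
}

/* Scatter up to `size` bytes from the host buffer `src` into the iovec,
 * starting `hmem_iov_offset` bytes into it. */
ssize_t demo_copy_to_hmem_iov(enum demo_hmem_iface iface, uint64_t device,
        const struct iovec *hmem_iov, size_t hmem_iov_count,
        uint64_t hmem_iov_offset, const void *src, size_t size)
{
    size_t done = 0;

    (void) device;
    assert(hmem_iov != NULL || hmem_iov_count == 0);
    if (iface != DEMO_HMEM_SYSTEM)
        return -ENOSYS;

    for (size_t i = 0; i < hmem_iov_count && done < size; i++) {
        size_t len = hmem_iov[i].iov_len;

        if (hmem_iov_offset >= len) {
            hmem_iov_offset -= len;
            continue;
        }
        size_t n = len - (size_t) hmem_iov_offset;
        if (n > size - done)
            n = size - done;
        memcpy((char *) hmem_iov[i].iov_base + hmem_iov_offset,
               (const char *) src + done, n);
        done += n;
        hmem_iov_offset = 0;
    }
    return (ssize_t) done;
}
```

       Both functions return fewer bytes than requested when the iovec is
       exhausted first, which matches the "number of bytes copied" return
       convention stated above.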
134
135   fi_domain_bind
136       Associates  an  event queue with the domain.  An event queue bound to a
137       domain will be the default  EQ  associated  with  asynchronous  control
138       events  that occur on the domain or active endpoints allocated on a do‐
139       main.  This includes CM events.  Endpoints  may  direct  their  control
140       events to alternate EQs by binding directly with the EQ.
141
142       Binding  an  event  queue to a domain with the FI_REG_MR flag indicates
143       that the provider should perform  all  memory  registration  operations
144       asynchronously,  with  the completion reported through the event queue.
145       If an event queue is not bound to the domain with the  FI_REG_MR  flag,
146       then memory registration requests complete synchronously.
147
148       See  fi_av_bind(3),  fi_ep_bind(3),  fi_mr_bind(3), fi_pep_bind(3), and
149       fi_scalable_ep_bind(3) for more information.
150
151   fi_close
152       The fi_close call is used to release all resources  associated  with  a
153       domain  or  interface.   All  objects associated with the opened domain
154       must be released prior to calling fi_close, otherwise the call will re‐
155       turn -FI_EBUSY.
156

DOMAIN ATTRIBUTES

158       The  fi_domain_attr  structure defines the set of attributes associated
159       with a domain.
160
              struct fi_domain_attr {
                  struct fid_domain     *domain;
                  char                  *name;
                  enum fi_threading     threading;
                  enum fi_progress      control_progress;
                  enum fi_progress      data_progress;
                  enum fi_resource_mgmt resource_mgmt;
                  enum fi_av_type       av_type;
                  int                   mr_mode;
                  size_t                mr_key_size;
                  size_t                cq_data_size;
                  size_t                cq_cnt;
                  size_t                ep_cnt;
                  size_t                tx_ctx_cnt;
                  size_t                rx_ctx_cnt;
                  size_t                max_ep_tx_ctx;
                  size_t                max_ep_rx_ctx;
                  size_t                max_ep_stx_ctx;
                  size_t                max_ep_srx_ctx;
                  size_t                cntr_cnt;
                  size_t                mr_iov_limit;
                  uint64_t              caps;
                  uint64_t              mode;
                  uint8_t               *auth_key;
                  size_t                auth_key_size;
                  size_t                max_err_data;
                  size_t                mr_cnt;
                  uint32_t              tclass;
              };
190
191   domain
192       On input to fi_getinfo, a user may set this to  an  opened  domain  in‐
193       stance  to restrict output to the given domain.  On output from fi_get‐
194       info, if no domain was specified, but the user has an  opened  instance
195       of the named domain, this will reference the first opened instance.  If
196       no instance has been opened, this field will be NULL.
197
198       The domain instance returned by fi_getinfo should  only  be  considered
199       valid  if  the application does not close any domain instances from an‐
200       other thread while fi_getinfo is being processed.
201
202   Name
203       The name of the access domain.
204
205   Multi-threading Support (threading)
206       The threading model specifies the level of serialization required of an
207       application when using the libfabric data transfer interfaces.  Control
208       interfaces are always considered thread safe, and may  be  accessed  by
       multiple threads.  Applications that can guarantee serialization in
       their access to provider allocated resources and interfaces enable a
       provider to eliminate lower-level locks.
212
213       FI_THREAD_COMPLETION
              The completion threading model is intended for providers that
              make use of manual progress.  Applications must serialize access
              to all objects that are associated through a shared completion
              structure.  This includes endpoint, transmit context, receive
              context, completion queue, counter, wait set, and poll set
              objects.
220
221       For example, threads must serialize access to an endpoint and its bound
222       completion  queue(s)  and/or  counters.  Access to endpoints that share
223       the same completion queue must also be serialized.
224
225       The  use  of  FI_THREAD_COMPLETION  can   increase   parallelism   over
226       FI_THREAD_SAFE, but requires the use of isolated resources.
227
228       FI_THREAD_DOMAIN
229              A  domain serialization model requires applications to serialize
230              access to all objects belonging to a domain.
231
232       FI_THREAD_ENDPOINT
233              The endpoint threading model is similar  to  FI_THREAD_FID,  but
234              with  the  added restriction that serialization is required when
235              accessing the same endpoint, even if multiple transmit  and  re‐
236              ceive  contexts are used.  Conceptually, FI_THREAD_ENDPOINT maps
237              well to providers that implement fabric services in hardware but
238              use a single command queue to access different data flows.
239
240       FI_THREAD_FID
241              A  fabric descriptor (FID) serialization model requires applica‐
242              tions to serialize access to individual fabric resources associ‐
243              ated  with  data  transfer operations and completions.  Multiple
244              threads must be serialized when  accessing  the  same  endpoint,
245              transmit  context,  receive  context, completion queue, counter,
246              wait set, or  poll  set.   Serialization  is  required  only  by
247              threads accessing the same object.
248
249       For  example,  one  thread may be initiating a data transfer on an end‐
250       point, while another thread reads from a  completion  queue  associated
251       with the endpoint.
252
       Serialization of endpoint access is only required when accessing the
       same endpoint data flow.  Multiple threads may initiate transfers on
       different transmit contexts of the same endpoint without serializing,
       and no serialization is required between the submission of data
       transmit requests and data receive operations.
258
259       In general, FI_THREAD_FID allows the provider to be implemented without
260       needing internal locking when handling data  transfers.   Conceptually,
261       FI_THREAD_FID  maps well to providers that implement fabric services in
262       hardware and provide separate command queues to different data flows.
263
264       FI_THREAD_SAFE
265              A thread safe serialization model allows a multi-threaded appli‐
266              cation  to  access any allocated resources through any interface
267              without restriction.  All  providers  are  required  to  support
268              FI_THREAD_SAFE.
269
270       FI_THREAD_UNSPEC
271              This  value  indicates that no threading model has been defined.
272              It may be used on input hints  to  the  fi_getinfo  call.   When
273              specified,  providers  will return a threading model that allows
274              for the greatest level of parallelism.
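
       Under FI_THREAD_COMPLETION, objects linked through a shared
       completion structure form one serialization group.  The following
       sketch, which is not part of libfabric, shows one way an application
       might compute those groups with a small union-find structure, so that
       it can assign one lock per group.  All names (demo_groups, demo_bind,
       and so on) and the fixed object limit are assumptions made for
       illustration; objects are identified by small integer indices.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_OBJS 16

/* parent[i] == i marks the representative of a serialization group. */
struct demo_groups {
    int parent[DEMO_MAX_OBJS];
};

/* Start with every object (endpoint, CQ, counter, ...) in its own group. */
void demo_groups_init(struct demo_groups *g)
{
    for (int i = 0; i < DEMO_MAX_OBJS; i++)
        g->parent[i] = i;
}

/* Find the group representative, halving the path as we walk up. */
int demo_find(struct demo_groups *g, int x)
{
    assert(x >= 0 && x < DEMO_MAX_OBJS);
    while (g->parent[x] != x) {
        g->parent[x] = g->parent[g->parent[x]];
        x = g->parent[x];
    }
    return x;
}

/* Record that object `a` is bound to completion structure `b`;
 * both now belong to the same serialization group. */
void demo_bind(struct demo_groups *g, int a, int b)
{
    int ra = demo_find(g, a);
    int rb = demo_find(g, b);

    if (ra != rb)
        g->parent[ra] = rb;
}

/* Non-zero if accesses to `a` and `b` must be serialized. */
int demo_must_serialize(struct demo_groups *g, int a, int b)
{
    return demo_find(g, a) == demo_find(g, b);
}
```

       For example, two endpoints bound to the same completion queue end up
       in one group, while an endpoint bound only to a different completion
       queue stays in a separate group and may be driven by another thread
       without locking.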
275
276   Progress Models (control_progress / data_progress)
277       Progress is the ability of the underlying  implementation  to  complete
278       processing  of  an asynchronous request.  In many cases, the processing
279       of an asynchronous request requires the use of the host processor.  For
280       example,  a  received  message  may need to be matched with the correct
281       buffer, or a timed out request may need to be retransmitted.  For  per‐
282       formance  reasons, it may be undesirable for the provider to allocate a
283       thread for this  purpose,  which  will  compete  with  the  application
284       threads.
285
286       Control  progress  indicates  the method that the provider uses to make
287       progress on asynchronous control operations.   Control  operations  are
288       functions which do not directly involve the transfer of application da‐
289       ta between endpoints.  They include address  vector,  memory  registra‐
290       tion, and connection management routines.
291
292       Data  progress  indicates  the  method  that  the provider uses to make
293       progress on data transfer operations.   This  includes  message  queue,
294       RMA,  tagged messaging, and atomic operations, along with their comple‐
295       tion processing.
296
297       Progress frequently requires action being taken at both  the  transmit‐
298       ting  and receiving sides of an operation.  This is often a requirement
299       for reliable transfers, as a result of retry and  acknowledgement  pro‐
300       cessing.
301
302       To balance between performance and ease of use, two progress models are
303       defined.
304
305       FI_PROGRESS_AUTO
306              This progress model indicates that the provider will  make  for‐
307              ward  progress  on an asynchronous operation without further in‐
308              tervention by the application.  When FI_PROGRESS_AUTO is provid‐
309              ed as output to fi_getinfo in the absence of any progress hints,
310              it often indicates that the desired functionality is implemented
311              by the provider hardware or is a standard service of the operat‐
312              ing system.
313
314       It is recommended that providers support FI_PROGRESS_AUTO.  However, if
315       a  provider  does  not natively support automatic progress, forcing the
316       use of FI_PROGRESS_AUTO may result in threads being allocated below the
317       fabric interfaces.
318
319       Note  that  prior versions of the library required providers to support
320       FI_PROGRESS_AUTO.  However, in some cases progress  threads  cannot  be
321       blocked  when  communication is idle, which results in threads spinning
322       in progress functions.  As a result,  those  providers  only  supported
323       FI_PROGRESS_MANUAL.
324
325       FI_PROGRESS_MANUAL
326              This progress model indicates that the provider requires the use
327              of an application thread to complete  an  asynchronous  request.
328              When  manual  progress  is set, the provider will attempt to ad‐
329              vance an asynchronous operation forward when the application at‐
330              tempts  to  wait on or read an event queue, completion queue, or
331              counter  where  the  completed  operation  will   be   reported.
332              Progress  also  occurs  when the application processes a poll or
333              wait set that has been associated with the event  or  completion
334              queue.
335
336       Only  wait operations defined by the fabric interface will result in an
337       operation progressing.  Operating system or  external  wait  functions,
338       such as select, poll, or pthread routines, cannot.
339
340       Manual  progress requirements not only apply to endpoints that initiate
341       transmit operations, but also to endpoints that may be  the  target  of
342       such  operations.  This holds true even if the target endpoint will not
343       generate completion events for the operations.  For  example,  an  end‐
344       point  that  acts purely as the target of RMA or atomic operations that
345       uses manual progress may still need application assistance  to  process
346       received operations.
347
348       FI_PROGRESS_UNSPEC
349              This  value  indicates  that no progress model has been defined.
350              It may be used on input hints to the fi_getinfo call.
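
       The practical consequence of FI_PROGRESS_MANUAL is that protocol work
       only happens inside calls into the fabric interface.  The toy model
       below, which does not use libfabric, illustrates this: the
       hypothetical demo_cq_read performs one unit of pending work per call,
       so an operation completes only if the application keeps polling.  The
       struct demo_cq type, the function names, and the notion of "steps"
       are all invented for this sketch.

```c
#include <assert.h>
#include <errno.h>

/* Toy completion queue: work advances only when the application reads it. */
struct demo_cq {
    int pending_steps;   /* units of protocol work left for the operation */
    int completions;     /* completions ready to be returned */
};

/* Returns 1 if a completion was read, or -EAGAIN if none is ready yet.
 * As a side effect, one unit of pending work is performed per call,
 * which is how manual progress behaves conceptually. */
int demo_cq_read(struct demo_cq *cq)
{
    if (cq->completions > 0) {
        cq->completions--;
        return 1;
    }
    if (cq->pending_steps > 0 && --cq->pending_steps == 0)
        cq->completions++;
    return -EAGAIN;
}

/* Poll until one completion arrives or `max_polls` is exhausted.
 * Returns the number of polls used, or -ETIMEDOUT. */
int demo_wait_one(struct demo_cq *cq, int max_polls)
{
    assert(max_polls > 0);
    for (int polls = 1; polls <= max_polls; polls++) {
        if (demo_cq_read(cq) == 1)
            return polls;
    }
    return -ETIMEDOUT;
}
```

       In this model, a thread that sleeps in select or poll instead of
       calling demo_cq_read never reduces pending_steps, which mirrors the
       statement above that operating system wait functions do not drive
       progress.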
351
352   Resource Management (resource_mgmt)
353       Resource management (RM) is provider and protocol  support  to  protect
354       against  overrunning  local  and remote resources.  This includes local
355       and remote transmit contexts, receive contexts, completion queues,  and
356       source and target data buffers.
357
358       When  enabled,  applications are given some level of protection against
359       overrunning provider queues and local and remote  data  buffers.   Such
360       support  may  be built directly into the hardware and/or network proto‐
361       col, but may also require that checks be enabled in the provider  soft‐
362       ware.  By disabling resource management, an application assumes all re‐
363       sponsibility for preventing queue and buffer overruns, but doing so may
364       allow  a  provider to eliminate internal synchronization calls, such as
365       atomic variables or locks.
366
367       It should be noted that even if resource management  is  disabled,  the
368       provider  implementation  and  protocol may still provide some level of
369       protection against overruns.  However, such protection is  not  guaran‐
370       teed.  The following values for resource management are defined.
371
372       FI_RM_DISABLED
373              The  provider  is  free to select an implementation and protocol
374              that does not protect against resource overruns.   The  applica‐
375              tion is responsible for resource protection.
376
377       FI_RM_ENABLED
378              Resource management is enabled for this provider domain.
379
380       FI_RM_UNSPEC
381              This  value indicates that no resource management model has been
382              defined.  It may be used on input hints to the fi_getinfo call.
383
384       The behavior of the various  resource  management  options  depends  on
385       whether the endpoint is reliable or unreliable, as well as provider and
       protocol specific implementation details, as shown in the following
       table.  The table assumes that all peers enable or disable RM in the
       same way.
388
       Resource          DGRAM EP, no RM     DGRAM EP, with RM   RDM/MSG EP, no RM   RDM/MSG EP, with RM
       ─────────────────────────────────────────────────────────────────────────────────────────────────
       Tx Ctx            undefined error     EAGAIN              undefined error     EAGAIN
       Rx Ctx            undefined error     EAGAIN              undefined error     EAGAIN
       Tx CQ             undefined error     EAGAIN              undefined error     EAGAIN
       Rx CQ             undefined error     EAGAIN              undefined error     EAGAIN
       Target EP         dropped             dropped             transmit error      retried
       No Rx Buffer      dropped             dropped             transmit error      retried
       Rx Buf Overrun    truncate or drop    truncate or drop    truncate or error   truncate or error
       Unmatched RMA     not applicable      not applicable      transmit error      transmit error
       RMA Overrun       not applicable      not applicable      transmit error      transmit error
408
409       The  resource  column  indicates  the resource being accessed by a data
410       transfer operation.
411
412       Tx Ctx / Rx Ctx
413              Refers to the transmit/receive contexts when a data transfer op‐
414              eration  is submitted.  When RM is enabled, attempting to submit
415              a request will fail if the context is full.  If RM is  disabled,
416              an  undefined error (provider specific) will occur.  Such errors
417              should be considered fatal to the context, and applications must
418              take steps to avoid queue overruns.
419
420       Tx CQ / Rx CQ
421              Refers to the completion queue associated with the Tx or Rx con‐
422              text when a local operation completes.  When RM is disabled, ap‐
423              plications  must  take  care to ensure that completion queues do
424              not get overrun.  When an overrun occurs, an undefined, but  fa‐
425              tal,  error  will  occur affecting all endpoints associated with
426              the CQ.  Overruns can be avoided by sizing the CQs appropriately
427              or  by deferring the posting of a data transfer operation unless
428              CQ space is available to store its completion.  When RM  is  en‐
429              abled,  providers  may  use  different  mechanisms to prevent CQ
430              overruns.  This  includes  failing  (returning  -FI_EAGAIN)  the
431              posting  of  operations that could result in CQ overruns, or in‐
432              ternally retrying requests (which will be hidden from the appli‐
433              cation).   See notes at the end of this section regarding CQ re‐
434              source management restrictions.
435
436       Target EP / No Rx Buffer
437              Target EP refers to resources associated with the endpoint  that
438              is the target of a transmit operation.  This includes the target
439              endpoint’s receive queue, posted receive  buffers  (no  Rx  buf‐
440              fers),  the  receive  side  completion  queue, and other related
441              packet processing queues.  The defined behavior is that seen  by
442              the  initiator  of a request.  For FI_EP_DGRAM endpoints, if the
443              target EP queues are unable to  accept  incoming  messages,  re‐
444              ceived  messages will be dropped.  For reliable endpoints, if RM
445              is disabled, the transmit operation will complete in  error.   A
446              provider may choose to return an error completion with the error
447              code FI_ENORX for that transmit operation so that it can be  re‐
448              tried.  If RM is enabled, the provider will internally retry the
449              operation.
450
451       Rx Buffer Overrun
452              This refers to buffers posted to receive incoming tagged or  un‐
453              tagged messages, with the behavior defined from the viewpoint of
454              the sender.  The behavior for handling  received  messages  that
455              are  larger  than  the  buffers  provided  by the application is
456              provider specific.  Providers may either  truncate  the  message
457              and  report a successful completion, or fail the operation.  For
458              datagram endpoints, failed sends will result in the message  be‐
459              ing  dropped.   For reliable endpoints, send operations may com‐
460              plete successfully, yet be truncated at the receive side.   This
461              can  occur  when  the target side buffers received data until an
              application buffer is made available.  The completion status may
              also be dependent upon the completion model selected by the
              application (e.g. FI_DELIVERY_COMPLETE versus
              FI_TRANSMIT_COMPLETE).
466
467       Unmatched RMA / RMA Overrun
468              Unmatched  RMA  and RMA overruns deal with the processing of RMA
469              and atomic operations.  Unlike send operations,  RMA  operations
470              that  attempt to access a memory address that is either not reg‐
471              istered for such operations, or attempt to access outside of the
472              target memory region will fail, resulting in a transmit error.
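
       With RM enabled, a full transmit or receive context causes submission
       to fail with -FI_EAGAIN rather than an undefined error.  A common way
       for an application to react is to reap a completion and retry.  The
       sketch below models that pattern without libfabric: struct
       demo_tx_ctx, demo_post, and demo_reap are hypothetical stand-ins for
       a context, a data transfer call, and a completion queue read.  It
       uses the system EAGAIN value in place of FI_EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy transmit context with a fixed number of slots (hypothetical type). */
struct demo_tx_ctx {
    size_t capacity;      /* total slots in the context */
    size_t outstanding;   /* operations posted but not yet reaped */
};

/* Mimics submission with RM enabled: fail with -EAGAIN when full. */
int demo_post(struct demo_tx_ctx *tx)
{
    if (tx->outstanding == tx->capacity)
        return -EAGAIN;
    tx->outstanding++;
    return 0;
}

/* Mimics reaping one completion, which frees one slot.
 * Returns 1 if a completion was reaped, 0 if none was outstanding. */
int demo_reap(struct demo_tx_ctx *tx)
{
    if (tx->outstanding == 0)
        return 0;
    tx->outstanding--;
    return 1;
}

/* Retry submission, reaping a completion each time the context is full. */
int demo_post_with_retry(struct demo_tx_ctx *tx)
{
    int ret;

    assert(tx->capacity > 0);   /* otherwise the loop could never succeed */
    while ((ret = demo_post(tx)) == -EAGAIN)
        (void) demo_reap(tx);
    return ret;
}
```

       In a real application the reap step would be a read of the bound
       completion queue, which under manual progress also gives the provider
       the opportunity to advance outstanding operations.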
473
474       When a resource management error occurs on an endpoint, the endpoint is
475       transitioned into a disabled state.  Any operations which have not  al‐
476       ready  completed  will  fail and be discarded.  For connectionless end‐
477       points, the endpoint must be re-enabled before it will accept new  data
478       transfer  operations.   For connected endpoints, the connection is torn
479       down and must be re-established.
480
481       There is one notable restriction on the protections offered by resource
       management.  This occurs when resource management is enabled on an
       endpoint that has been bound to completion queue(s) using the
       FI_SELECTIVE_COMPLETION flag.  Operations posted to such an endpoint
       may specify that a successful completion should not generate an entry
       on the corresponding completion queue (that is, the operation leaves
       the FI_COMPLETION flag unset).  In such situations, the provider is
       not required
488       to  reserve  an  entry in the completion queue to handle the case where
489       the operation fails and does generate a CQ entry,  which  would  effec‐
490       tively require tracking the operation to completion.  Applications con‐
491       cerned with avoiding CQ overruns in the occurrence of errors  must  en‐
492       sure  that  there is sufficient space in the CQ to report failed opera‐
493       tions.  This can typically be achieved by sizing the CQ to at least the
494       same size as the endpoint queue(s) that are bound to it.
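
       The sizing guideline above can be expressed as a small helper.  This
       is a sketch under the assumption that "at least the same size as the
       endpoint queue(s)" means the sum of the sizes of every transmit and
       receive queue bound to the CQ; demo_min_cq_size is a hypothetical
       name, and the caller supplies the queue sizes (for instance, the
       sizes requested for the transmit and receive contexts).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the sizes of all endpoint queues bound to one CQ.
 * Saturates at SIZE_MAX instead of wrapping on overflow. */
size_t demo_min_cq_size(const size_t *queue_sizes, size_t count)
{
    size_t total = 0;

    assert(queue_sizes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (queue_sizes[i] > SIZE_MAX - total)
            return SIZE_MAX;
        total += queue_sizes[i];
    }
    return total;
}
```

       A CQ shared by a transmit queue of 128 entries and a receive queue of
       256 entries would therefore be opened with room for at least 384
       entries.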
495
496   AV Type (av_type)
497       Specifies the type of address vectors that are usable with this domain.
498       For additional details on AV type, see fi_av(3).  The following  values
499       may be specified.
500
501       FI_AV_MAP
502              Only address vectors of type AV map are requested or supported.
503
504       FI_AV_TABLE
505              Only  address vectors of type AV index are requested or support‐
506              ed.
507
508       FI_AV_UNSPEC
509              Any address vector format is requested and supported.
510
511       Address vectors are only used by  connectionless  endpoints.   Applica‐
512       tions  that require the use of a specific type of address vector should
513       set the domain attribute av_type to the necessary  value  when  calling
514       fi_getinfo.   The  value  FI_AV_UNSPEC may be used to indicate that the
515       provider can support either address vector format.   In  this  case,  a
516       provider may return FI_AV_UNSPEC to indicate that either format is sup‐
517       portable, or may return another AV type to indicate the optimal AV type
518       supported by this domain.
519
520   Memory Registration Mode (mr_mode)
521       Defines  memory  registration specific mode bits used with this domain.
522       Full details on MR mode options are available in fi_mr(3).  The follow‐
523       ing values may be specified.
524
525       FI_MR_ALLOCATED
526              Indicates that memory registration occurs on allocated data buf‐
527              fers, and physical pages must back all virtual  addresses  being
528              registered.
529
530       FI_MR_COLLECTIVE
531              Requires data buffers passed to collective operations be explic‐
532              itly registered for collective operations using  the  FI_COLLEC‐
533              TIVE flag.
534
535       FI_MR_ENDPOINT
536              Memory  registration  occurs  at the endpoint level, rather than
537              domain.
538
539       FI_MR_LOCAL
540              The provider is optimized around  having  applications  register
541              memory  for locally accessed data buffers.  Data buffers used in
542              send and receive operations and as the source buffer for RMA and
543              atomic  operations must be registered by the application for ac‐
544              cess domains opened with this capability.
545
546       FI_MR_MMU_NOTIFY
547              Indicates that the application is responsible for notifying  the
548              provider  when  the  page tables referencing a registered memory
549              region may have been updated.
550
551       FI_MR_PROV_KEY
552              Memory registration  keys  are  selected  and  returned  by  the
553              provider.
554
555       FI_MR_RAW
              The provider requires additional setup as part of its memory
              registration process.  This mode is required by providers that
              use a memory key that is larger than 64 bits.
559
560       FI_MR_RMA_EVENT
561              Indicates  that  the  memory  regions associated with completion
562              counters must be explicitly enabled after  being  bound  to  any
563              counter.
564
565       FI_MR_UNSPEC
566              Defined  for  compatibility  – library versions 1.4 and earlier.
567              Setting mr_mode to 0 indicates that FI_MR_BASIC  or  FI_MR_SCAL‐
568              ABLE are requested and supported.
569
570       FI_MR_VIRT_ADDR
571              Registered memory regions are referenced by peers using the vir‐
572              tual address of the registered  memory  region,  rather  than  a
573              0-based offset.
574
575       FI_MR_BASIC
576              Defined  for  compatibility  – library versions 1.4 and earlier.
577              Only basic memory registration operations are requested or  sup‐
578              ported.    This  mode  is  equivalent  to  the  FI_MR_VIRT_ADDR,
579              FI_MR_ALLOCATED, and FI_MR_PROV_KEY flags being set in later li‐
580              brary  versions.   This flag may not be used in conjunction with
581              other mr_mode bits.
582
583       FI_MR_SCALABLE
584              Defined for compatibility – library versions  1.4  and  earlier.
585              Only  scalable  memory  registration operations are requested or
586              supported.  Scalable registration uses offset based  addressing,
587              with  application  selectable memory keys.  For library versions
588              1.5 and later, this is the default if no mr_mode bits  are  set.
589              This  flag  may  not  be  used in conjunction with other mr_mode
590              bits.
591
592       Buffers used in data transfer  operations  may  require  notifying  the
593       provider  of  their  use before a data transfer can occur.  The mr_mode
594       field indicates the type of memory registration that is  required,  and
595       when registration is necessary.  Applications that require the use of a
596       specific registration mode should set the domain attribute  mr_mode  to
597       the  necessary  value  when calling fi_getinfo.  The value FI_MR_UNSPEC
598       may be used to indicate support for any registration mode.
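
       As a rough illustration of how the mr_mode values above relate to one
       another, the sketch below expands the legacy FI_MR_BASIC value into
       its equivalent mode bits and derives the address a peer would use to
       target a registered region.  The helpers expand_mr_mode and
       rma_target_addr are hypothetical, and the FI_MR_* values are
       placeholders so the fragment compiles on its own; real code would
       take them from the libfabric headers.

```c
#include <stdint.h>

/* Placeholder values (an assumption for this sketch); real applications
 * should use the definitions supplied by the libfabric headers. */
#ifndef FI_MR_BASIC
#define FI_MR_BASIC      (1 << 0)
#define FI_MR_SCALABLE   (1 << 1)
#define FI_MR_VIRT_ADDR  (1 << 4)
#define FI_MR_ALLOCATED  (1 << 5)
#define FI_MR_PROV_KEY   (1 << 6)
#endif

/* Hypothetical helper: map the legacy modes onto individual mode bits. */
int expand_mr_mode(int mr_mode)
{
    if (mr_mode == FI_MR_BASIC)
        return FI_MR_VIRT_ADDR | FI_MR_ALLOCATED | FI_MR_PROV_KEY;
    if (mr_mode == FI_MR_SCALABLE)
        return 0;   /* offset-based addressing, application-selected keys */
    return mr_mode; /* already expressed as individual bits */
}

/* Hypothetical helper: address a peer would pass to target byte `offset`
 * of a region that was registered at virtual address `base`. */
uint64_t rma_target_addr(int mr_mode, uint64_t base, uint64_t offset)
{
    int bits = expand_mr_mode(mr_mode);

    return (bits & FI_MR_VIRT_ADDR) ? base + offset : offset;
}
```

       With FI_MR_VIRT_ADDR in effect the peer uses the virtual address;
       otherwise it uses a 0-based offset into the region.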
599
600   MR Key Size (mr_key_size)
601       Size of the memory region remote access key,  in  bytes.   Applications
602       that  request  their  own  MR  key must select a value within the range
603       specified by this value.  Key sizes larger than 8 bytes  require  using
604       the FI_MR_RAW mr_mode bit.
605
606   CQ Data Size (cq_data_size)
607       Applications  may  include a small message with a data transfer that is
608       placed directly into a remote completion queue as part of a  completion
609       event.  This is referred to as remote CQ data (sometimes referred to as
610       immediate data).  This field indicates the number  of  bytes  that  the
611       provider  supports for remote CQ data.  If supported (non-zero value is
612       returned), the minimum size of remote CQ data must be at least 4 bytes.
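
       A minimal sketch of how an application might check that a value fits
       in the remote CQ data the provider can carry.  The helper names
       cq_data_mask and cq_data_fits are hypothetical; only the
       cq_data_size attribute comes from the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mask covering the bytes of remote CQ data that a
 * provider reporting `cq_data_size` bytes can carry. */
uint64_t cq_data_mask(size_t cq_data_size)
{
    if (cq_data_size == 0)
        return 0;                 /* remote CQ data is not supported */
    if (cq_data_size >= sizeof(uint64_t))
        return UINT64_MAX;        /* a full 64-bit value fits */
    return (UINT64_C(1) << (8 * cq_data_size)) - 1;
}

/* Hypothetical helper: non-zero if `data` fits without truncation. */
int cq_data_fits(uint64_t data, size_t cq_data_size)
{
    return (data & ~cq_data_mask(cq_data_size)) == 0;
}
```

       Since a supporting provider reports at least 4 bytes, any 32-bit
       value fits whenever cq_data_size is non-zero.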
613
614   Completion Queue Count (cq_cnt)
615       The optimal number of completion queues supported by the domain,  rela‐
616       tive  to  any specified or default CQ attributes.  The cq_cnt value may
617       be a fixed value of the maximum number of CQs supported by the underly‐
618       ing  hardware,  or  may  be  a  dynamic value, based on the default at‐
619       tributes of an allocated CQ, such as the CQ size and data format.
620
621   Endpoint Count (ep_cnt)
622       The total number of endpoints supported by the domain, relative to  any
623       specified  or  default  endpoint attributes.  The ep_cnt value may be a
624       fixed value of the maximum number of endpoints supported by the  under‐
625       lying  hardware,  or  may  be a dynamic value, based on the default at‐
626       tributes of an allocated endpoint, such as  the  endpoint  capabilities
627       and  size.   The  endpoint count is the number of addressable endpoints
628       supported by the provider.  Providers return capability limits based on
629       configured hardware maximum capabilities.  Providers cannot predict all
630       possible system limitations without a posteriori knowledge acquired
631       during runtime that will further limit these hardware maximums (e.g.
632       application memory consumption, FD usage, etc.).
633
634   Transmit Context Count (tx_ctx_cnt)
635       The number of  outbound  command  queues  optimally  supported  by  the
636       provider.  For a low-level provider, this represents the number of com‐
637       mand queues to the hardware and/or the number of parallel transmit  en‐
638       gines  effectively  supported by the hardware and caches.  Applications
639       which allocate more transmit contexts than this value will end up shar‐
640       ing  underlying resources.  By default, there is a single transmit con‐
641       text associated with each endpoint, but in an advanced usage model,  an
642       endpoint may be configured with multiple transmit contexts.
643
644   Receive Context Count (rx_ctx_cnt)
645       The  number  of  inbound  processing  queues optimally supported by the
646       provider.  For a low-level provider, this represents the number of
647       hardware queues that can be effectively utilized for processing incoming
648       packets.  Applications which allocate more receive contexts  than  this
649       value  will  end up sharing underlying resources.  By default, a single
650       receive context is associated with each endpoint, but  in  an  advanced
651       usage  model,  an endpoint may be configured with multiple receive con‐
652       texts.
653
654   Maximum Endpoint Transmit Context (max_ep_tx_ctx)
655       The maximum number of transmit contexts that may be associated with  an
656       endpoint.
657
658   Maximum Endpoint Receive Context (max_ep_rx_ctx)
659       The  maximum  number of receive contexts that may be associated with an
660       endpoint.
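
       One way an application might combine the context counts described
       above is sketched below.  The helper pick_tx_ctx is hypothetical,
       and staying within tx_ctx_cnt is a policy choice made for this
       example (to avoid sharing underlying resources), not a requirement
       of the interface.  The same logic would apply to rx_ctx_cnt and
       max_ep_rx_ctx.

```c
#include <stddef.h>

/* Hypothetical helper: number of transmit contexts to request for one
 * endpoint, given what the application wants and the domain attributes. */
size_t pick_tx_ctx(size_t wanted, size_t tx_ctx_cnt, size_t max_ep_tx_ctx)
{
    size_t n = wanted;

    if (n > max_ep_tx_ctx)
        n = max_ep_tx_ctx;  /* per-endpoint maximum */
    if (n > tx_ctx_cnt)
        n = tx_ctx_cnt;     /* beyond this, contexts share resources */
    return n;
}
```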
661
662   Maximum Sharing of Transmit Context (max_ep_stx_ctx)
663       The maximum number of endpoints that may be associated  with  a  shared
664       transmit context.
665
666   Maximum Sharing of Receive Context (max_ep_srx_ctx)
667       The  maximum  number  of endpoints that may be associated with a shared
668       receive context.
669
670   Counter Count (cntr_cnt)
671       The optimal number of completion counters supported by the domain.  The
672       cntr_cnt value may be a fixed value of the maximum number of counters
673       supported by the underlying hardware, or may be a dynamic value,  based
674       on the default attributes of the domain.
675
676   MR IOV Limit (mr_iov_limit)
677       This is the maximum number of IO vectors (scatter-gather elements) that
678       a single memory registration operation may reference.
679
680   Capabilities (caps)
681       Domain level capabilities.  Domain capabilities indicate  domain  level
682       features that are supported by the provider.
683
684       FI_LOCAL_COMM
685              At  a conceptual level, this field indicates that the underlying
686              device supports loopback communication.  More specifically, this
687              field indicates that an endpoint may communicate with other end‐
688              points that are allocated from the same underlying named domain.
689              If  this field is not set, an application may need to use an al‐
690              ternate domain or mechanism (e.g. shared memory) to  communicate
691              with peers that execute on the same node.
692
693       FI_REMOTE_COMM
694              This  field indicates that the underlying provider supports com‐
695              munication with nodes that are reachable over the  network.   If
696              this  field is not set, then the provider only supports communi‐
697              cation between processes that execute  on  the  same  node  –  a
698              shared memory provider, for example.
699
700       FI_SHARED_AV
701              Indicates  that the domain supports the ability to share address
702              vectors among multiple processes using the named address  vector
703              feature.
704
705       See fi_getinfo(3) for a discussion on primary versus secondary capabil‐
706       ities.  All domain capabilities are considered secondary capabilities.
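
       The communication capabilities can be tested with simple bit
       checks.  The sketch below is illustrative only: domain_can_reach is
       a hypothetical helper, and the bit positions are placeholders so
       the fragment compiles without the libfabric headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions (an assumption for this sketch); real code
 * should use the definitions supplied by the libfabric headers. */
#ifndef FI_LOCAL_COMM
#define FI_LOCAL_COMM  (1ULL << 51)
#define FI_REMOTE_COMM (1ULL << 52)
#endif

/* Hypothetical helper: can a domain with capabilities `caps` reach a
 * peer, depending on whether that peer runs on the same node? */
bool domain_can_reach(uint64_t caps, bool peer_on_same_node)
{
    uint64_t needed = peer_on_same_node ? FI_LOCAL_COMM : FI_REMOTE_COMM;

    return (caps & needed) != 0;
}
```

       When this check fails for a local peer, the application would fall
       back to an alternate domain or mechanism, such as shared memory.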
707
708   mode
709       Operational mode bits related to using the domain.
710
711       FI_RESTRICTED_COMP
712              This bit indicates that the domain limits completion queues  and
713              counters  to only be used with endpoints, transmit contexts, and
714              receive contexts that have the same set of capability flags.
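
       As a sketch of what FI_RESTRICTED_COMP implies for an application,
       the hypothetical helper below decides whether two endpoints or
       contexts may be bound to the same completion queue or counter.  The
       bit value is a placeholder for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit position (an assumption for this sketch); real code
 * should use the definition supplied by the libfabric headers. */
#ifndef FI_RESTRICTED_COMP
#define FI_RESTRICTED_COMP (1ULL << 53)
#endif

/* Hypothetical helper: may two endpoints or contexts with capability
 * flags `caps_a` and `caps_b` share one CQ or counter in a domain whose
 * mode bits are `domain_mode`? */
bool can_share_completion(uint64_t domain_mode,
                          uint64_t caps_a, uint64_t caps_b)
{
    if (!(domain_mode & FI_RESTRICTED_COMP))
        return true;          /* no restriction in effect */
    return caps_a == caps_b;  /* capability sets must match */
}
```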
715
716   Default authorization key (auth_key)
717       The default authorization key to associate with endpoints and memory
718       registrations  created within the domain.  This field is ignored unless
719       the fabric is opened with API version 1.5 or greater.
720
721   Default authorization key length (auth_key_size)
722       The length in bytes of the default authorization key  for  the  domain.
723       If  set  to  0,  then no authorization key will be associated with end‐
724       points and memory registrations created within the domain unless speci‐
725       fied  in the endpoint or memory registration attributes.  This field is
726       ignored unless the fabric is opened with API version 1.5 or greater.
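
       The relationship between the domain default and a key given in the
       endpoint or memory registration attributes might be modeled as
       follows.  This is one reading of the text above, not provider
       code; struct key_ref and effective_auth_key are hypothetical names
       introduced for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of a key pointer with its length in bytes. */
struct key_ref {
    const uint8_t *key;
    size_t size;
};

/* Hypothetical helper: a key supplied with the endpoint or memory
 * registration attributes is used when present; otherwise the domain
 * default applies.  A result with size 0 means no key is associated. */
struct key_ref effective_auth_key(struct key_ref domain_default,
                                  struct key_ref specific)
{
    if (specific.key != NULL && specific.size > 0)
        return specific;
    return domain_default;
}
```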
727
728   Max Error Data Size (max_err_data)
729       The maximum amount of error data, in bytes, that may be returned as
730       part  of  a completion or event queue error.  This value corresponds to
731       the  err_data_size  field  in   struct   fi_cq_err_entry   and   struct
732       fi_eq_err_entry.
733
734   Memory Regions Count (mr_cnt)
735       The  optimal  number of memory regions supported by the domain, or end‐
736       point if the mr_mode FI_MR_ENDPOINT bit has been set.  The mr_cnt value
737       may  be a fixed value of the maximum number of MRs supported by the un‐
738       derlying hardware, or may be a dynamic value, based on the default  at‐
739       tributes  of  the  domain,  such  as  the supported memory registration
740       modes.  Applications can set the mr_cnt on input to fi_getinfo, in  or‐
741       der  to  indicate their memory registration requirements.  Doing so may
742       allow the provider to optimize any memory registration cache or  lookup
743       tables.
744
745   Traffic Class (tclass)
746       This specifies the default traffic class that will be associated with any
747       endpoints created within the domain.  See fi_endpoint(3) for additional
748       information.
749

RETURN VALUE

751       Returns 0 on success.  On error, a negative value corresponding to fab‐
752       ric errno is returned.  Fabric errno values are defined in  rdma/fi_er‐
753       rno.h.
754

NOTES

756       Users  should  call  fi_close to release all resources allocated to the
757       fabric domain.
758
759       The following fabric resources are associated with domains: active end‐
760       points, memory regions, completion event queues, and address vectors.
761
762       Domain  attributes  reflect the limitations and capabilities of the un‐
763       derlying hardware and/or software provider.  They do not reflect system
764       limitations,  such  as the number of physical pages that an application
765       may pin or number of file descriptors that the  application  may  open.
766       As  a  result,  the  reported maximums may not be achievable, even on a
767       lightly loaded system, without an administrator configuring system
768       resources appropriately for the installed provider(s).
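
       Because reported maximums ignore system limits, an application may
       want to bound them itself.  The sketch below caps a reported
       endpoint count by the process file descriptor limit obtained from
       the POSIX getrlimit call.  It assumes, purely for illustration,
       that each endpoint consumes one file descriptor; actual usage
       depends on the provider.  Both helper names are hypothetical.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: cap a provider-reported count by a system limit. */
size_t clamp_to_system(size_t reported, size_t system_limit)
{
    return reported < system_limit ? reported : system_limit;
}

/* Hypothetical helper: rough upper bound on usable endpoints, assuming
 * (for illustration only) one file descriptor per endpoint. */
size_t usable_ep_estimate(size_t ep_cnt)
{
    struct rlimit rl;

    /* If the limit is unknown or unlimited, keep the reported value. */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return ep_cnt;
    return clamp_to_system(ep_cnt, (size_t)rl.rlim_cur);
}
```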
769

AUTHORS

774       OpenFabrics.
775
776
777
778Libfabric Programmer’s Manual     2022-12-11                      fi_domain(3)