fi_domain(3)                   Libfabric v1.14.0                  fi_domain(3)

NAME

       fi_domain - Open a fabric access domain

SYNOPSIS

              #include <rdma/fabric.h>

              #include <rdma/fi_domain.h>

              int fi_domain(struct fid_fabric *fabric, struct fi_info *info,
                  struct fid_domain **domain, void *context);

              int fi_close(struct fid *domain);

              int fi_domain_bind(struct fid_domain *domain, struct fid *eq,
                  uint64_t flags);

              int fi_open_ops(struct fid *domain, const char *name, uint64_t flags,
                  void **ops, void *context);

              int fi_set_ops(struct fid *domain, const char *name, uint64_t flags,
                  void *ops, void *context);

ARGUMENTS

       fabric Fabric domain

       info   Fabric   information,  including  domain  capabilities  and  at‐
              tributes.

       domain An opened access domain.

       context
              User specified context associated with the domain.  This context
              is  returned  as  part of any asynchronous event associated with
              the domain.

       eq     Event queue for asynchronous operations initiated on the domain.

       name   Name associated with an interface.

       ops    Fabric interface operations.

DESCRIPTION

       An access domain typically refers to a physical or virtual NIC or hard‐
       ware  port;  however, a domain may span across multiple hardware compo‐
       nents for fail-over or data striping purposes.  A  domain  defines  the
       boundary  for  associating  different  resources  together.  Fabric re‐
       sources belonging to the same domain may share resources.

   fi_domain
       Opens a fabric access domain, also referred to as  a  resource  domain.
       Fabric  domains are identified by a name.  The properties of the opened
       domain are specified using the info parameter.

   fi_open_ops
       fi_open_ops is used to open provider specific interfaces.  Provider in‐
       terfaces  may be used to access low-level resources and operations that
       are specific to the opened resource domain.  The details of domain  in‐
       terfaces are outside the scope of this documentation.

   fi_set_ops
       fi_set_ops  assigns callbacks that a provider should invoke in place of
       performing selected tasks.  This allows users to modify  or  control  a
       provider’s  default behavior.  Conceptually, it allows the user to hook
       specific functions used by a provider and replace them with their own.

       The operations being modified are identified using a well-known charac‐
       ter string, passed as the name parameter.  The format of the ops param‐
       eter is dependent upon the name value.  The ops parameter  will  refer‐
       ence  a  structure  containing the callbacks and other fields needed by
       the provider to invoke the user’s functions.

       If a provider accepts the override, it will return FI_SUCCESS.  If  the
       override  is  unknown  or  not  supported,  the  provider  will  return
       -FI_ENOSYS.  Overrides should be set prior to allocating  resources  on
       the domain.

       The  following  fi_set_ops operations and corresponding callback struc‐
       tures are defined.

       FI_SET_OPS_HMEM_OVERRIDE – Heterogeneous Memory Overrides

       HMEM override allows  users  to  override  HMEM  related  operations  a
       provider  may perform.  Currently, the scope of the HMEM override is to
       allow a user to define the memory movement functions a provider  should
       use  when  accessing  a  user buffer.  The user-defined memory movement
       functions need to account for all the  different  HMEM  iface  types  a
       provider may encounter.

       All objects allocated against a domain will inherit this override.

       The following is the HMEM override operation name and structure.

              #define FI_SET_OPS_HMEM_OVERRIDE "hmem_override_ops"

              struct fi_hmem_override_ops {
                  size_t  size;

                  ssize_t (*copy_from_hmem_iov)(void *dest, size_t size,
                      enum fi_hmem_iface iface, uint64_t device, const struct iovec *hmem_iov,
                      size_t hmem_iov_count, uint64_t hmem_iov_offset);

                  ssize_t (*copy_to_hmem_iov)(enum fi_hmem_iface iface, uint64_t device,
                      const struct iovec *hmem_iov, size_t hmem_iov_count,
                      uint64_t hmem_iov_offset, const void *src, size_t size);
              };

       All  fields  in struct fi_hmem_override_ops must be set (non-null) to a
       valid value.

       size   This should be set to  sizeof(struct  fi_hmem_override_ops).
              The  size  field  is used for forward and backward compatibility
              purposes.

       copy_from_hmem_iov
              Copy data from the device/hmem to host  memory.   This  function
              should  return  a  negative  fi_errno on error, or the number of
              bytes copied on success.

       copy_to_hmem_iov
              Copy data from host memory to the  device/hmem.   This  function
              should  return  a  negative  fi_errno on error, or the number of
              bytes copied on success.
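       As an illustration of the copy semantics, the sketch below shows
       two helpers that the copy_from_hmem_iov and copy_to_hmem_iov call‐
       backs could delegate to when the buffer is ordinary host memory.
       The helper names are hypothetical, and the sketch assumes that the
       offset counts bytes from the start of the first iovec entry.  Han‐
       dling of device memory is not shown.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Copy up to `size` bytes out of an iovec, starting `offset` bytes in.
 * Returns the number of bytes copied. */
static ssize_t host_copy_from_iov(void *dest, size_t size,
                                  const struct iovec *iov, size_t iov_count,
                                  uint64_t offset)
{
    size_t done = 0;

    for (size_t i = 0; i < iov_count && done < size; i++) {
        if (offset >= iov[i].iov_len) {
            offset -= iov[i].iov_len;   /* segment lies before the offset */
            continue;
        }
        size_t len = iov[i].iov_len - (size_t)offset;
        if (len > size - done)
            len = size - done;
        memcpy((char *)dest + done,
               (const char *)iov[i].iov_base + offset, len);
        done += len;
        offset = 0;                     /* later segments start at byte 0 */
    }
    return (ssize_t)done;
}

/* Mirror image: copy up to `size` bytes from `src` into the iovec. */
static ssize_t host_copy_to_iov(const struct iovec *iov, size_t iov_count,
                                uint64_t offset, const void *src, size_t size)
{
    size_t done = 0;

    for (size_t i = 0; i < iov_count && done < size; i++) {
        if (offset >= iov[i].iov_len) {
            offset -= iov[i].iov_len;
            continue;
        }
        size_t len = iov[i].iov_len - (size_t)offset;
        if (len > size - done)
            len = size - done;
        memcpy((char *)iov[i].iov_base + offset,
               (const char *)src + done, len);
        done += len;
        offset = 0;
    }
    return (ssize_t)done;
}
```

       Both helpers return the number of bytes copied, matching the suc‐
       cess convention described for the callbacks above.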

   fi_domain_bind
       Associates an event queue with the domain.  An event queue bound  to  a
       domain  will  be  the  default  EQ associated with asynchronous control
       events that occur on the domain or active endpoints allocated on a  do‐
       main.   This  includes  CM  events.  Endpoints may direct their control
       events to alternate EQs by binding directly with the EQ.

       Binding an event queue to a domain with the  FI_REG_MR  flag  indicates
       that  the  provider  should  perform all memory registration operations
       asynchronously, with the completion reported through the  event  queue.
       If  an  event queue is not bound to the domain with the FI_REG_MR flag,
       then memory registration requests complete synchronously.

       See fi_av_bind(3), fi_ep_bind(3),  fi_mr_bind(3),  fi_pep_bind(3),  and
       fi_scalable_ep_bind(3) for more information.

   fi_close
       The  fi_close  call  is used to release all resources associated with a
       domain or interface.  All objects associated  with  the  opened  domain
       must be released prior to calling fi_close, otherwise the call will re‐
       turn -FI_EBUSY.

DOMAIN ATTRIBUTES

       The fi_domain_attr structure defines the set of  attributes  associated
       with a domain.

              struct fi_domain_attr {
                  struct fid_domain     *domain;
                  char                  *name;
                  enum fi_threading     threading;
                  enum fi_progress      control_progress;
                  enum fi_progress      data_progress;
                  enum fi_resource_mgmt resource_mgmt;
                  enum fi_av_type       av_type;
                  int                   mr_mode;
                  size_t                mr_key_size;
                  size_t                cq_data_size;
                  size_t                cq_cnt;
                  size_t                ep_cnt;
                  size_t                tx_ctx_cnt;
                  size_t                rx_ctx_cnt;
                  size_t                max_ep_tx_ctx;
                  size_t                max_ep_rx_ctx;
                  size_t                max_ep_stx_ctx;
                  size_t                max_ep_srx_ctx;
                  size_t                cntr_cnt;
                  size_t                mr_iov_limit;
                  uint64_t              caps;
                  uint64_t              mode;
                  uint8_t               *auth_key;
                  size_t                auth_key_size;
                  size_t                max_err_data;
                  size_t                mr_cnt;
                  uint32_t              tclass;
              };

   domain
       On  input  to  fi_getinfo,  a user may set this to an opened domain in‐
       stance to restrict output to the given domain.  On output from  fi_get‐
       info,  if  no domain was specified, but the user has an opened instance
       of the named domain, this will reference the first opened instance.  If
       no instance has been opened, this field will be NULL.

       The  domain  instance  returned by fi_getinfo should only be considered
       valid if the application does not close any domain instances  from  an‐
       other thread while fi_getinfo is being processed.

   Name
       The name of the access domain.

   Multi-threading Support (threading)
       The threading model specifies the level of serialization required of an
       application when using the libfabric data transfer interfaces.  Control
       interfaces  are  always  considered thread safe, and may be accessed by
       multiple threads.  Applications that can  guarantee  serialization  in
       their  access  of  provider allocated resources and interfaces enable a
       provider to eliminate lower-level locks.

       FI_THREAD_COMPLETION
              The completion threading model is intended  for  providers  that
              make use of manual progress.  Applications must serialize access
              to all objects that are associated through a  shared  completion
              structure.  This includes endpoint, transmit  context,  receive
              context, completion queue, counter, wait  set,  and  poll  set
              objects.

       For example, threads must serialize access to an endpoint and its bound
       completion queue(s) and/or counters.  Access to  endpoints  that  share
       the same completion queue must also be serialized.

       The   use   of   FI_THREAD_COMPLETION  can  increase  parallelism  over
       FI_THREAD_SAFE, but requires the use of isolated resources.

       FI_THREAD_DOMAIN
              A domain serialization model requires applications to  serialize
              access to all objects belonging to a domain.

       FI_THREAD_ENDPOINT
              The  endpoint  threading  model is similar to FI_THREAD_FID, but
              with the added restriction that serialization is  required  when
              accessing  the  same endpoint, even if multiple transmit and re‐
              ceive contexts are used.  Conceptually, FI_THREAD_ENDPOINT  maps
              well to providers that implement fabric services in hardware but
              use a single command queue to access different data flows.

       FI_THREAD_FID
              A fabric descriptor (FID) serialization model requires  applica‐
              tions to serialize access to individual fabric resources associ‐
              ated with data transfer operations  and  completions.   Multiple
              threads  must  be  serialized  when accessing the same endpoint,
              transmit context, receive context,  completion  queue,  counter,
              wait  set,  or  poll  set.   Serialization  is  required only by
              threads accessing the same object.

       For example, one thread may be initiating a data transfer  on  an  end‐
       point,  while  another  thread reads from a completion queue associated
       with the endpoint.

       Serialization of endpoint access is only required  when  accessing  the
       same  endpoint  data  flow.  Multiple threads may initiate transfers on
       different transmit contexts of the same endpoint  without  serializing,
       and  no serialization is required between the submission of data trans‐
       mit requests and data receive operations.

       In general, FI_THREAD_FID allows the provider to be implemented without
       needing  internal  locking when handling data transfers.  Conceptually,
       FI_THREAD_FID maps well to providers that implement fabric services  in
       hardware and provide separate command queues to different data flows.

       FI_THREAD_SAFE
              A thread safe serialization model allows a multi-threaded appli‐
              cation to access any allocated resources through  any  interface
              without  restriction.   All  providers  are  required to support
              FI_THREAD_SAFE.

       FI_THREAD_UNSPEC
              This value indicates that no threading model has  been  defined.
              It  may  be  used  on  input hints to the fi_getinfo call.  When
              specified, providers will return a threading model  that  allows
              for the greatest level of parallelism.

   Progress Models (control_progress / data_progress)
       Progress  is  the  ability of the underlying implementation to complete
       processing of an asynchronous request.  In many cases,  the  processing
       of an asynchronous request requires the use of the host processor.  For
       example, a received message may need to be  matched  with  the  correct
       buffer,  or a timed out request may need to be retransmitted.  For per‐
       formance reasons, it may be undesirable for the provider to allocate  a
       thread  for  this  purpose,  which  will  compete  with the application
       threads.

       Control progress indicates the method that the provider  uses  to  make
       progress  on  asynchronous  control operations.  Control operations are
       functions which do not directly involve the transfer of application da‐
       ta  between  endpoints.   They include address vector, memory registra‐
       tion, and connection management routines.

       Data progress indicates the method  that  the  provider  uses  to  make
       progress  on  data  transfer  operations.  This includes message queue,
       RMA, tagged messaging, and atomic operations, along with their  comple‐
       tion processing.

       Progress  frequently  requires action being taken at both the transmit‐
       ting and receiving sides of an operation.  This is often a  requirement
       for  reliable  transfers, as a result of retry and acknowledgement pro‐
       cessing.

       To balance between performance and ease of use, two progress models are
       defined.

       FI_PROGRESS_AUTO
              This  progress  model indicates that the provider will make for‐
              ward progress on an asynchronous operation without  further  in‐
              tervention by the application.  When FI_PROGRESS_AUTO is provid‐
              ed as output to fi_getinfo in the absence of any progress hints,
              it often indicates that the desired functionality is implemented
              by the provider hardware or is a standard service of the operat‐
              ing system.

       All  providers are required to support FI_PROGRESS_AUTO.  However, if a
       provider does not natively support automatic progress, forcing the  use
       of  FI_PROGRESS_AUTO  may  result  in threads being allocated below the
       fabric interfaces.

       FI_PROGRESS_MANUAL
              This progress model indicates that the provider requires the use
              of  an  application  thread to complete an asynchronous request.
              When manual progress is set, the provider will  attempt  to  ad‐
              vance an asynchronous operation forward when the application at‐
              tempts to wait on or read an event queue, completion  queue,  or
              counter   where   the  completed  operation  will  be  reported.
              Progress also occurs when the application processes  a  poll  or
              wait  set  that has been associated with the event or completion
              queue.

       Only wait operations defined by the fabric interface will result in  an
       operation  progressing.   Operating  system or external wait functions,
       such as select, poll, or pthread routines, cannot.

       Manual progress requirements not only apply to endpoints that  initiate
       transmit  operations,  but  also to endpoints that may be the target of
       such operations.  This holds true even if the target endpoint will  not
       generate  completion  events  for the operations.  For example, an end‐
       point that acts purely as the target of RMA or atomic  operations  that
       uses  manual  progress may still need application assistance to process
       received operations.

       FI_PROGRESS_UNSPEC
              This value indicates that no progress model  has  been  defined.
              It may be used on input hints to the fi_getinfo call.

   Resource Management (resource_mgmt)
       Resource  management  (RM)  is provider and protocol support to protect
       against overrunning local and remote resources.   This  includes  local
       and  remote transmit contexts, receive contexts, completion queues, and
       source and target data buffers.

       When enabled, applications are given some level of  protection  against
       overrunning  provider  queues  and local and remote data buffers.  Such
       support may be built directly into the hardware and/or  network  proto‐
       col,  but may also require that checks be enabled in the provider soft‐
       ware.  By disabling resource management, an application assumes all re‐
       sponsibility for preventing queue and buffer overruns, but doing so may
       allow a provider to eliminate internal synchronization calls,  such  as
       atomic variables or locks.

       It  should  be  noted that even if resource management is disabled, the
       provider implementation and protocol may still provide  some  level  of
       protection  against  overruns.  However, such protection is not guaran‐
       teed.  The following values for resource management are defined.

       FI_RM_DISABLED
              The provider is free to select an  implementation  and  protocol
              that  does  not protect against resource overruns.  The applica‐
              tion is responsible for resource protection.

       FI_RM_ENABLED
              Resource management is enabled for this provider domain.

       FI_RM_UNSPEC
              This value indicates that no resource management model has  been
              defined.  It may be used on input hints to the fi_getinfo call.

       The  behavior  of  the  various  resource management options depends on
       whether the endpoint is reliable or unreliable, as well as provider and
       protocol specific implementation details, as shown in the following ta‐
       ble.  The table assumes that all peers use the same RM setting.

       Resource        DGRAM EP-no RM    DGRAM EP-with RM   RDM/MSG EP-no RM   RDM/MSG EP-with RM
       ──────────────────────────────────────────────────────────────────────────────────────────
       Tx Ctx          undefined error   EAGAIN             undefined error    EAGAIN
       Rx Ctx          undefined error   EAGAIN             undefined error    EAGAIN
       Tx CQ           undefined error   EAGAIN             undefined error    EAGAIN
       Rx CQ           undefined error   EAGAIN             undefined error    EAGAIN
       Target EP       dropped           dropped            transmit error     retried
       No Rx Buffer    dropped           dropped            transmit error     retried
       Rx Buf Overrun  truncate or drop  truncate or drop   truncate or error  truncate or error
       Unmatched RMA   not applicable    not applicable     transmit error     transmit error
       RMA Overrun     not applicable    not applicable     transmit error     transmit error

       The resource column indicates the resource being  accessed  by  a  data
       transfer operation.

       Tx Ctx / Rx Ctx
              Refers to the transmit/receive contexts when a data transfer op‐
              eration is submitted.  When RM is enabled, attempting to  submit
              a  request will fail if the context is full.  If RM is disabled,
              an undefined error (provider specific) will occur.  Such  errors
              should be considered fatal to the context, and applications must
              take steps to avoid queue overruns.

       Tx CQ / Rx CQ
              Refers to the completion queue associated with the Tx or Rx con‐
              text when a local operation completes.  When RM is disabled, ap‐
              plications must take care to ensure that  completion  queues  do
              not  get overrun.  When an overrun occurs, an undefined, but fa‐
              tal, error will occur affecting all  endpoints  associated  with
              the CQ.  Overruns can be avoided by sizing the CQs appropriately
              or by deferring the posting of a data transfer operation  unless
              CQ  space  is available to store its completion.  When RM is en‐
              abled, providers may use  different  mechanisms  to  prevent  CQ
              overruns.   This  includes  failing  (returning  -FI_EAGAIN) the
              posting of operations that could result in CQ overruns,  or  in‐
              ternally retrying requests (which will be hidden from the appli‐
              cation).  See notes at the end of this section regarding CQ  re‐
              source management restrictions.

       Target EP / No Rx Buffer
              Target  EP refers to resources associated with the endpoint that
              is the target of a transmit operation.  This includes the target
              endpoint’s  receive  queue,  posted  receive buffers (no Rx buf‐
              fers), the receive side  completion  queue,  and  other  related
              packet  processing queues.  The defined behavior is that seen by
              the initiator of a request.  For FI_EP_DGRAM endpoints,  if  the
              target  EP  queues  are  unable to accept incoming messages, re‐
              ceived messages will be dropped.  For reliable endpoints, if  RM
              is  disabled,  the transmit operation will complete in error.  A
              provider may choose to return an error completion with the error
              code  FI_ENORX for that transmit operation so that it can be re‐
              tried.  If RM is enabled, the provider will internally retry the
              operation.

       Rx Buffer Overrun
              This  refers to buffers posted to receive incoming tagged or un‐
              tagged messages, with the behavior defined from the viewpoint of
              the  sender.   The  behavior for handling received messages that
              are larger than the  buffers  provided  by  the  application  is
              provider  specific.   Providers  may either truncate the message
              and report a successful completion, or fail the operation.   For
              datagram  endpoints, failed sends will result in the message be‐
              ing dropped.  For reliable endpoints, send operations  may  com‐
              plete  successfully, yet be truncated at the receive side.  This
              can occur when the target side buffers received  data  until  an
              application buffer is made available.  The completion status may
              also be dependent upon the completion model selected by  the ap‐
              plication   (e.g. FI_DELIVERY_COMPLETE  versus  FI_TRANSMIT_COM‐
              PLETE).

       Unmatched RMA / RMA Overrun
              Unmatched RMA and RMA overruns deal with the processing  of  RMA
              and  atomic  operations.  Unlike send operations, RMA operations
              that attempt to access a memory address that is either not  reg‐
              istered for such operations, or attempt to access outside of the
              target memory region will fail, resulting in a transmit error.
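       With RM enabled, a post that would overrun a context or CQ is re‐
       jected with -FI_EAGAIN and can be retried once the provider has
       made progress.  The sketch below illustrates that pattern without
       depending on libfabric headers.  It uses EAGAIN from <errno.h>, on
       the understanding that libfabric defines FI_EAGAIN as EAGAIN.  The
       callback types, post_with_retry, and the busy_then_ok stand-in are
       all hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callbacks: `post` submits one operation and returns 0 or a
 * negative errno-style code; `progress` drives the provider, for example
 * by reading the completion queue. */
typedef ssize_t (*post_fn)(void *arg);
typedef void (*progress_fn)(void *arg);

/* Retry `post` while it reports -EAGAIN, giving up after `max_tries`. */
static ssize_t post_with_retry(post_fn post, progress_fn progress, void *arg,
                               unsigned max_tries)
{
    ssize_t ret = -EAGAIN;

    for (unsigned i = 0; i < max_tries && ret == -EAGAIN; i++) {
        ret = post(arg);
        if (ret == -EAGAIN && progress)
            progress(arg);      /* free queue space before trying again */
    }
    return ret;
}

/* Toy stand-in for a post call: reports -EAGAIN until the counter that
 * `arg` points to reaches zero, then succeeds. */
static ssize_t busy_then_ok(void *arg)
{
    int *busy = arg;

    if (*busy > 0) {
        (*busy)--;
        return -EAGAIN;
    }
    return 0;
}
```

       Bounding the number of attempts keeps the caller from spinning
       forever when the queue never drains.  Errors other than -EAGAIN
       are returned immediately.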

       When a resource management error occurs on an endpoint, the endpoint is
       transitioned  into a disabled state.  Any operations which have not al‐
       ready completed will fail and be discarded.   For  connectionless  end‐
       points,  the endpoint must be re-enabled before it will accept new data
       transfer operations.  For connected endpoints, the connection  is  torn
       down and must be re-established.

       There is one notable restriction on the protections offered by resource
       management.  This occurs when resource management is enabled on an end‐
       point  that  has  been bound to completion queue(s) using the FI_SELEC‐
       TIVE_COMPLETION flag.  Operations posted to such an endpoint may speci‐
       fy that a successful completion should not generate an entry on the cor‐
       responding completion queue (i.e., the  operation  leaves  the  FI_COM‐
       PLETION  flag unset).  In such situations, the provider is not required
       to reserve an entry in the completion queue to handle  the  case  where
       the  operation  fails  and does generate a CQ entry, which would effec‐
       tively require tracking the operation to completion.  Applications con‐
       cerned  with  avoiding CQ overruns in the occurrence of errors must en‐
       sure that there is sufficient space in the CQ to report  failed  opera‐
       tions.  This can typically be achieved by sizing the CQ to at least the
       same size as the endpoint queue(s) that are bound to it.
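       The sizing rule above can be expressed as a small helper.  This is
       a sketch only: min_cq_size is a hypothetical name, and the queue
       sizes would in practice come from the size fields of the transmit
       and receive attributes of each endpoint bound to the CQ.

```c
#include <stddef.h>

/* Smallest CQ depth that can hold one entry for every operation that may
 * be outstanding on the queues bound to it: the sum of their sizes. */
static size_t min_cq_size(const size_t *queue_sizes, size_t count)
{
    size_t total = 0;

    for (size_t i = 0; i < count; i++)
        total += queue_sizes[i];
    return total;
}
```

       For example, an endpoint with a transmit queue of 128 entries and
       a receive queue of 64 entries, both bound to the same CQ, would
       need a CQ of at least 192 entries under this rule.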
481
482   AV Type (av_type)
483       Specifies the type of address vectors that are usable with this domain.
484       For  additional details on AV type, see fi_av(3).  The following values
485       may be specified.
486
487       FI_AV_MAP
488              Only address vectors of type AV map are requested or supported.
489
490       FI_AV_TABLE
491              Only address vectors of type AV index are requested or  support‐
492              ed.
493
494       FI_AV_UNSPEC
495              Any address vector format is requested and supported.
496
497       Address  vectors  are  only used by connectionless endpoints.  Applica‐
498       tions that require the use of a specific type of address vector  should
499       set  the  domain  attribute av_type to the necessary value when calling
500       fi_getinfo.  The value FI_AV_UNSPEC may be used to  indicate  that  the
501       provider  can  support  either  address vector format.  In this case, a
502       provider may return FI_AV_UNSPEC to indicate that either format is sup‐
503       portable, or may return another AV type to indicate the optimal AV type
504       supported by this domain.
505
506   Memory Registration Mode (mr_mode)
507       Defines memory registration specific mode bits used with  this  domain.
508       Full details on MR mode options are available in fi_mr(3).  The follow‐
509       ing values may be specified.
510
511       FI_MR_ALLOCATED
512              Indicates that memory registration occurs on allocated data buf‐
513              fers,  and  physical pages must back all virtual addresses being
514              registered.
515
516       FI_MR_COLLECTIVE
517              Requires data buffers passed to collective operations be explic‐
518              itly  registered  for collective operations using the FI_COLLEC‐
519              TIVE flag.
520
521       FI_MR_ENDPOINT
522              Memory registration occurs at the endpoint  level,  rather  than
523              domain.
524
525       FI_MR_LOCAL
526              The  provider  is  optimized around having applications register
527              memory for locally accessed data buffers.  Data buffers used  in
528              send and receive operations and as the source buffer for RMA and
529              atomic operations must be registered by the application for  ac‐
530              cess domains opened with this capability.
531
532       FI_MR_MMU_NOTIFY
533              Indicates  that the application is responsible for notifying the
534              provider when the page tables referencing  a  registered  memory
535              region may have been updated.
536
537       FI_MR_PROV_KEY
538              Memory  registration  keys  are  selected  and  returned  by the
539              provider.
540
541       FI_MR_RAW
542              The provider requires additional setup as part of  their  memory
543              registration  process.   This mode is required by providers that
544              use a memory key that is larger than 64-bits.
545
546       FI_MR_RMA_EVENT
547              Indicates that the memory  regions  associated  with  completion
548              counters  must  be  explicitly  enabled after being bound to any
549              counter.
550
551       FI_MR_UNSPEC
552              Defined for compatibility – library versions  1.4  and  earlier.
553              Setting  mr_mode  to 0 indicates that FI_MR_BASIC or FI_MR_SCAL‐
554              ABLE are requested and supported.
555
556       FI_MR_VIRT_ADDR
557              Registered memory regions are referenced by peers using the vir‐
558              tual  address  of  the  registered  memory region, rather than a
559              0-based offset.
560
561       FI_MR_BASIC
562              Defined for compatibility – library versions  1.4  and  earlier.
563              Only  basic memory registration operations are requested or sup‐
564              ported.   This  mode  is  equivalent  to  the   FI_MR_VIRT_ADDR,
565              FI_MR_ALLOCATED, and FI_MR_PROV_KEY flags being set in later li‐
566              brary versions.  This flag may not be used in  conjunction  with
567              other mr_mode bits.
568
569       FI_MR_SCALABLE
570              Defined  for  compatibility  – library versions 1.4 and earlier.
571              Only scalable memory registration operations  are  requested  or
572              supported.   Scalable registration uses offset based addressing,
573              with application selectable memory keys.  For  library  versions
574              1.5  and  later, this is the default if no mr_mode bits are set.
575              This flag may not be used  in  conjunction  with  other  mr_mode
576              bits.
577
578       Buffers  used  in  data  transfer  operations may require notifying the
579       provider of their use before a data transfer can  occur.   The  mr_mode
580       field  indicates  the type of memory registration that is required, and
581       when registration is necessary.  Applications that require the use of a
582       specific  registration  mode should set the domain attribute mr_mode to
583       the necessary value when calling fi_getinfo.   The  value  FI_MR_UNSPEC
584       may be used to indicate support for any registration mode.
585
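       The rules above for the legacy FI_MR_BASIC and FI_MR_SCALABLE values can
       be sketched as a small validity check.  This is an illustration only:
       the EX_MR_* names and their bit values are stand-ins invented for this
       example, and real code should use the FI_MR_* constants from
       rdma/fi_domain.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit values for illustration only.  Real code should use the
 * FI_MR_* constants from <rdma/fi_domain.h>. */
#define EX_MR_BASIC     (1u << 0)
#define EX_MR_SCALABLE  (1u << 1)
#define EX_MR_VIRT_ADDR (1u << 4)
#define EX_MR_ALLOCATED (1u << 5)
#define EX_MR_PROV_KEY  (1u << 6)

/* The legacy BASIC and SCALABLE values may not be combined with other bits. */
bool ex_mr_mode_is_valid(uint32_t mr_mode)
{
    if (mr_mode & EX_MR_BASIC)
        return mr_mode == EX_MR_BASIC;
    if (mr_mode & EX_MR_SCALABLE)
        return mr_mode == EX_MR_SCALABLE;
    return true;
}

/* Rewrite a legacy value as the equivalent set of later mode bits. */
uint32_t ex_mr_mode_expand(uint32_t mr_mode)
{
    assert(ex_mr_mode_is_valid(mr_mode));
    if (mr_mode == EX_MR_BASIC)
        return EX_MR_VIRT_ADDR | EX_MR_ALLOCATED | EX_MR_PROV_KEY;
    if (mr_mode == EX_MR_SCALABLE)
        return 0; /* scalable is the default when no bits are set */
    return mr_mode;
}
```

       An application could run such a check on the mr_mode value it intends
       to place in its hints before calling fi_getinfo.
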
586   MR Key Size (mr_key_size)
587       Size  of  the  memory region remote access key, in bytes.  Applications
588       that request their own MR key must select a key value that fits within
589       this number of bytes.  Key sizes larger than 8 bytes require using
590       the FI_RAW_KEY mode bit.
591
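       As a minimal sketch, an application that selects its own 64-bit key
       might check it against mr_key_size as follows.  The helper name
       ex_mr_key_fits is hypothetical, and the check assumes that "fits" means
       the key is representable in mr_key_size bytes.  Keys wider than 8 bytes
       are outside the scope of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if `key` can be represented in `mr_key_size` bytes.
 * Precondition: the provider reported a non-zero key size. */
bool ex_mr_key_fits(uint64_t key, size_t mr_key_size)
{
    assert(mr_key_size > 0);
    if (mr_key_size >= sizeof(key))
        return true;            /* any 64-bit key fits */
    return (key >> (8 * mr_key_size)) == 0;
}
```
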
592   CQ Data Size (cq_data_size)
593       Applications may include a small message with a data transfer  that  is
594       placed  directly into a remote completion queue as part of a completion
595       event.  This is referred to as remote CQ data (sometimes referred to as
596       immediate  data).   This  field  indicates the number of bytes that the
597       provider supports for remote CQ data.  If supported (a non-zero value is
598       returned), the size of remote CQ data must be at least 4 bytes.
599
600   Completion Queue Count (cq_cnt)
601       The  optimal number of completion queues supported by the domain, rela‐
602       tive to any specified or default CQ attributes.  The cq_cnt  value  may
603       be a fixed value of the maximum number of CQs supported by the underly‐
604       ing hardware, or may be a dynamic  value,  based  on  the  default  at‐
605       tributes of an allocated CQ, such as the CQ size and data format.
606
607   Endpoint Count (ep_cnt)
608       The  total number of endpoints supported by the domain, relative to any
609       specified or default endpoint attributes.  The ep_cnt value  may  be  a
610       fixed  value of the maximum number of endpoints supported by the under‐
611       lying hardware, or may be a dynamic value, based  on  the  default  at‐
612       tributes  of  an  allocated endpoint, such as the endpoint capabilities
613       and size.  The endpoint count is the number  of  addressable  endpoints
614       supported by the provider.  Providers return capability limits based on
615       configured hardware maximum capabilities.  Providers cannot predict all
616       system limitations that further reduce these hardware maximums (e.g.
617       application memory consumption or FD usage); such limits become known
618       only at runtime.
619
620   Transmit Context Count (tx_ctx_cnt)
621       The  number  of  outbound  command  queues  optimally  supported by the
622       provider.  For a low-level provider, this represents the number of com‐
623       mand  queues to the hardware and/or the number of parallel transmit en‐
624       gines effectively supported by the hardware and  caches.   Applications
625       which allocate more transmit contexts than this value will end up shar‐
626       ing underlying resources.  By default, there is a single transmit  con‐
627       text  associated with each endpoint, but in an advanced usage model, an
628       endpoint may be configured with multiple transmit contexts.
629
630   Receive Context Count (rx_ctx_cnt)
631       The number of inbound processing  queues  optimally  supported  by  the
632       provider.  For a low-level provider, this represents the number of
633       hardware queues that can be effectively utilized for processing incoming
634       packets.   Applications  which allocate more receive contexts than this
635       value will end up sharing underlying resources.  By default,  a  single
636       receive  context  is  associated with each endpoint, but in an advanced
637       usage model, an endpoint may be configured with multiple  receive  con‐
638       texts.
639
640   Maximum Endpoint Transmit Context (max_ep_tx_ctx)
641       The  maximum number of transmit contexts that may be associated with an
642       endpoint.
643
644   Maximum Endpoint Receive Context (max_ep_rx_ctx)
645       The maximum number of receive contexts that may be associated  with  an
646       endpoint.
647
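       One way an application might combine the optimal context counts with
       the per-endpoint maximums is sketched below.  The function
       ex_pick_ctx_count and its clamping policy are assumptions made for
       illustration; they are not part of the libfabric API.  The caller would
       pass tx_ctx_cnt with max_ep_tx_ctx, or rx_ctx_cnt with max_ep_rx_ctx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy: never request more contexts on one endpoint than the
 * per-endpoint maximum, and prefer to stay within the optimal count so that
 * contexts do not end up sharing underlying resources.  An optimal count of
 * 0 is treated here as "no recommendation". */
size_t ex_pick_ctx_count(size_t wanted, size_t optimal_cnt, size_t max_per_ep)
{
    assert(wanted > 0);
    size_t n = wanted;
    if (n > max_per_ep)
        n = max_per_ep;         /* hard limit */
    if (optimal_cnt > 0 && n > optimal_cnt)
        n = optimal_cnt;        /* soft limit: avoid shared resources */
    return n;
}
```

       Exceeding the optimal count is permitted; this sketch simply chooses
       the conservative option.
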
648   Maximum Sharing of Transmit Context (max_ep_stx_ctx)
649       The  maximum  number  of endpoints that may be associated with a shared
650       transmit context.
651
652   Maximum Sharing of Receive Context (max_ep_srx_ctx)
653       The maximum number of endpoints that may be associated  with  a  shared
654       receive context.
655
656   Counter Count (cntr_cnt)
657       The optimal number of completion counters supported by the domain.  The
658       cntr_cnt value may be a fixed value of the maximum number of counters
659       supported  by the underlying hardware, or may be a dynamic value, based
660       on the default attributes of the domain.
661
662   MR IOV Limit (mr_iov_limit)
663       This is the maximum number of IO vectors (scatter-gather elements) that
664       a single memory registration operation may reference.
665
666   Capabilities (caps)
667       Domain  level  capabilities.  Domain capabilities indicate domain level
668       features that are supported by the provider.
669
670       FI_LOCAL_COMM
671              At a conceptual level, this field indicates that the  underlying
672              device supports loopback communication.  More specifically, this
673              field indicates that an endpoint may communicate with other end‐
674              points that are allocated from the same underlying named domain.
675              If this field is not set, an application may need to use an  al‐
676              ternate  domain or mechanism (e.g. shared memory) to communicate
677              with peers that execute on the same node.
678
679       FI_REMOTE_COMM
680              This field indicates that the underlying provider supports  com‐
681              munication  with  nodes that are reachable over the network.  If
682              this field is not set, then the provider only supports  communi‐
683              cation  between  processes  that  execute  on  the same node – a
684              shared memory provider, for example.
685
686       FI_SHARED_AV
687              Indicates that the domain supports the ability to share  address
688              vectors  among multiple processes using the named address vector
689              feature.
690
691       See fi_getinfo(3) for a discussion on primary versus secondary capabil‐
692       ities.  All domain capabilities are considered secondary capabilities.
693
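       The FI_LOCAL_COMM and FI_REMOTE_COMM descriptions suggest a simple
       reachability test, sketched here.  The EX_* bit values and struct
       ex_domain_attr are stand-ins for this example; real code would test the
       FI_LOCAL_COMM and FI_REMOTE_COMM bits in the caps field of struct
       fi_domain_attr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in bit values for illustration only; not the library's values. */
#define EX_LOCAL_COMM  (1ULL << 0)
#define EX_REMOTE_COMM (1ULL << 1)

/* Stand-in for the caps member of struct fi_domain_attr. */
struct ex_domain_attr {
    uint64_t caps;
};

/* Returns true if the domain's capabilities cover a peer at that location. */
bool ex_domain_can_reach(const struct ex_domain_attr *attr,
                         bool peer_on_same_node)
{
    assert(attr != NULL);
    uint64_t needed = peer_on_same_node ? EX_LOCAL_COMM : EX_REMOTE_COMM;
    return (attr->caps & needed) != 0;
}
```

       When the check fails for a peer on the same node, the application may
       need an alternate domain or mechanism, such as shared memory.
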
694   mode
695       The operational mode bits related to using the domain.
696
697       FI_RESTRICTED_COMP
698              This  bit indicates that the domain limits completion queues and
699              counters to only be used with endpoints, transmit contexts,  and
700              receive contexts that have the same set of capability flags.
701
702   Default authorization key (auth_key)
703       The  default  authorization  key  to associate with endpoint and memory
704       registrations created within the domain.  This field is ignored  unless
705       the fabric is opened with API version 1.5 or greater.
706
707   Default authorization key length (auth_key_size)
708       The  length  in  bytes of the default authorization key for the domain.
709       If set to 0, then no authorization key will  be  associated  with  end‐
710       points and memory registrations created within the domain unless speci‐
711       fied in the endpoint or memory registration attributes.  This field  is
712       ignored unless the fabric is opened with API version 1.5 or greater.
713
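       The relationship between the domain default key and a key given in the
       endpoint or memory registration attributes can be sketched as follows.
       The type struct ex_auth_key and the function ex_effective_auth_key are
       hypothetical, and the sketch assumes that a key specified on the
       endpoint or memory registration takes precedence over the domain
       default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of an authorization key with its length. */
struct ex_auth_key {
    const uint8_t *key;
    size_t size;                /* 0 means "no key specified" */
};

/* Returns the key that applies to a new endpoint or memory registration:
 * the object's own key if it specifies one, otherwise the domain default.
 * A result with size 0 means no authorization key is associated. */
struct ex_auth_key ex_effective_auth_key(const struct ex_auth_key *domain_default,
                                         const struct ex_auth_key *object_attr)
{
    assert(domain_default != NULL && object_attr != NULL);
    if (object_attr->size > 0)
        return *object_attr;
    return *domain_default;
}
```
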
714   Max Error Data Size (max_err_data)
715       The maximum amount of error data, in bytes, that may be returned as
716       part of a completion or event queue error.  This value  corresponds  to
717       the   err_data_size   field   in   struct  fi_cq_err_entry  and  struct
718       fi_eq_err_entry.
719
720   Memory Regions Count (mr_cnt)
721       The optimal number of memory regions supported by the domain,  or  end‐
722       point if the mr_mode FI_MR_ENDPOINT bit has been set.  The mr_cnt value
723       may be a fixed value of the maximum number of MRs supported by the  un‐
724       derlying  hardware, or may be a dynamic value, based on the default at‐
725       tributes of the domain,  such  as  the  supported  memory  registration
726       modes.   Applications can set the mr_cnt on input to fi_getinfo, in or‐
727       der to indicate their memory registration requirements.  Doing  so  may
728       allow  the provider to optimize any memory registration cache or lookup
729       tables.
730
731   Traffic Class (tclass)
732       This specifies the default traffic class that will be associated with any
733       endpoints created within the domain.  See fi_endpoint(3) for additional
734       information.
735

RETURN VALUE

737       Returns 0 on success.  On error, a negative value corresponding to fab‐
738       ric  errno is returned.  Fabric errno values are defined in rdma/fi_er‐
739       rno.h.
740

NOTES

742       Users should call fi_close to release all resources  allocated  to  the
743       fabric domain.
744
745       The following fabric resources are associated with domains: active end‐
746       points, memory regions, completion event queues, and address vectors.
747
748       Domain attributes reflect the limitations and capabilities of  the  un‐
749       derlying hardware and/or software provider.  They do not reflect system
750       limitations, such as the number of physical pages that  an  application
751       may  pin  or  number of file descriptors that the application may open.
752       As a result, the reported maximums may not be achievable, even on
753       lightly loaded systems, unless an administrator configures system
754       resources appropriately for the installed provider(s).
755
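       Because reported maximums ignore system limits, an application may want
       to bound them itself.  The sketch below caps an endpoint count by the
       process file descriptor limit using POSIX getrlimit.  The function
       ex_usable_ep_cnt is hypothetical, and the fds_per_ep parameter is an
       assumption: how many file descriptors an endpoint consumes, if any,
       depends on the provider.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Caps a provider-reported endpoint count by the soft RLIMIT_NOFILE limit,
 * assuming each endpoint consumes `fds_per_ep` file descriptors. */
size_t ex_usable_ep_cnt(size_t ep_cnt, size_t fds_per_ep)
{
    struct rlimit rl;

    assert(fds_per_ep > 0);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return ep_cnt;          /* no usable limit information */

    rlim_t by_fds = rl.rlim_cur / fds_per_ep;
    return by_fds < ep_cnt ? (size_t)by_fds : ep_cnt;
}
```

       The result is only an upper bound; descriptors already open in the
       process reduce the number actually available.
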

SEE ALSO

757       fi_getinfo(3), fi_endpoint(3), fi_av(3), fi_ep(3), fi_eq(3), fi_mr(3)
758

AUTHORS

760       OpenFabrics.
761
762
763
764Libfabric Programmer’s Manual     2021-10-07                      fi_domain(3)