fi_domain(3)

fi_domain(3)                   Libfabric v1.12.1                  fi_domain(3)

NAME

       fi_domain - Open a fabric access domain

SYNOPSIS

              #include <rdma/fabric.h>

              #include <rdma/fi_domain.h>

              int fi_domain(struct fid_fabric *fabric, struct fi_info *info,
                  struct fid_domain **domain, void *context);

              int fi_close(struct fid *domain);

              int fi_domain_bind(struct fid_domain *domain, struct fid *eq,
                  uint64_t flags);

              int fi_open_ops(struct fid *domain, const char *name, uint64_t flags,
                  void **ops, void *context);

              int fi_set_ops(struct fid *domain, const char *name, uint64_t flags,
                  void *ops, void *context);

ARGUMENTS

       fabric Fabric domain

       info   Fabric information, including domain capabilities and
              attributes.

       domain An opened access domain.

       context
              User specified context associated with the domain.  This context
              is returned as part of any asynchronous event associated with
              the domain.

       eq     Event queue for asynchronous operations initiated on the domain.

       name   Name associated with an interface.

       ops    Fabric interface operations.

DESCRIPTION

       An access domain typically refers to a physical or virtual NIC or
       hardware port; however, a domain may span across multiple hardware
       components for fail-over or data striping purposes.  A domain defines
       the boundary for associating different resources together.  Fabric
       resources belonging to the same domain may share resources.

   fi_domain
       Opens a fabric access domain, also referred to as a resource domain.
       Fabric domains are identified by a name.  The properties of the opened
       domain are specified using the info parameter.

   fi_open_ops
       fi_open_ops is used to open provider specific interfaces.  Provider
       interfaces may be used to access low-level resources and operations
       that are specific to the opened resource domain.  The details of
       domain interfaces are outside the scope of this documentation.

   fi_set_ops
       fi_set_ops assigns callbacks that a provider should invoke in place of
       performing selected tasks.  This allows users to modify or control a
       provider's default behavior.  Conceptually, it allows the user to hook
       specific functions used by a provider and replace them with their own.

       The operations being modified are identified using a well-known
       character string, passed as the name parameter.  The format of the ops
       parameter is dependent upon the name value.  The ops parameter will
       reference a structure containing the callbacks and other fields needed
       by the provider to invoke the user's functions.

       If a provider accepts the override, it will return FI_SUCCESS.  If the
       override is unknown or not supported, the provider will return
       -FI_ENOSYS.  Overrides should be set prior to allocating resources on
       the domain.

       The following fi_set_ops operations and corresponding callback
       structures are defined.

       FI_SET_OPS_HMEM_OVERRIDE -- Heterogeneous Memory Overrides

       HMEM override allows users to override HMEM related operations a
       provider may perform.  Currently, the scope of the HMEM override is to
       allow a user to define the memory movement functions a provider should
       use when accessing a user buffer.  The user-defined memory movement
       functions need to account for all the different HMEM iface types a
       provider may encounter.

       All objects allocated against a domain will inherit this override.

       The following is the HMEM override operation name and structure.

              #define FI_SET_OPS_HMEM_OVERRIDE "hmem_override_ops"

              struct fi_hmem_override_ops {
                  size_t  size;

                  ssize_t (*copy_from_hmem_iov)(void *dest, size_t size,
                      enum fi_hmem_iface iface, uint64_t device,
                      const struct iovec *hmem_iov, size_t hmem_iov_count,
                      uint64_t hmem_iov_offset);

                  ssize_t (*copy_to_hmem_iov)(enum fi_hmem_iface iface,
                      uint64_t device, const struct iovec *hmem_iov,
                      size_t hmem_iov_count, uint64_t hmem_iov_offset,
                      const void *src, size_t size);
              };

       All fields in struct fi_hmem_override_ops must be set (non-null) to a
       valid value.

       size   This should be set to sizeof(struct fi_hmem_override_ops).
              The size field is used for forward and backward compatibility
              purposes.

       copy_from_hmem_iov
              Copy data from the device/hmem to host memory.  This function
              should return a negative fi_errno on error, or the number of
              bytes copied on success.

       copy_to_hmem_iov
              Copy data from host memory to the device/hmem.  This function
              should return a negative fi_errno on error, or the number of
              bytes copied on success.
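
       As an illustration of the callback contract above, the following is a
       minimal sketch of a copy_from_hmem_iov-style function that handles host
       memory only.  It is not libfabric code: enum demo_hmem_iface and
       demo_copy_from_hmem_iov are hypothetical stand-ins so that the example
       builds without the libfabric headers, and treating hmem_iov_offset as a
       byte offset into the concatenated iovec segments is an assumption.  A
       real override would take enum fi_hmem_iface and would need to handle
       every iface type the provider may pass.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical stand-in for enum fi_hmem_iface; only host memory is modeled. */
enum demo_hmem_iface { DEMO_HMEM_SYSTEM = 0 };

/*
 * Copy up to size bytes out of an iovec array into dest, starting
 * hmem_iov_offset bytes into the logical concatenation of the segments
 * (assumed semantics).  Returns the number of bytes copied, or a negative
 * errno-style value on error.
 */
ssize_t demo_copy_from_hmem_iov(void *dest, size_t size,
        enum demo_hmem_iface iface, uint64_t device,
        const struct iovec *hmem_iov, size_t hmem_iov_count,
        uint64_t hmem_iov_offset)
{
    size_t done = 0;

    (void) device;              /* unused for host memory */
    if (iface != DEMO_HMEM_SYSTEM)
        return -ENOSYS;         /* a real override must handle every iface */
    assert(dest != NULL || size == 0);

    for (size_t i = 0; i < hmem_iov_count && done < size; i++) {
        size_t len = hmem_iov[i].iov_len;

        if (hmem_iov_offset >= len) {
            hmem_iov_offset -= len;     /* skip this segment entirely */
            continue;
        }

        size_t avail = len - (size_t) hmem_iov_offset;
        size_t n = avail < size - done ? avail : size - done;

        memcpy((char *) dest + done,
               (const char *) hmem_iov[i].iov_base + hmem_iov_offset, n);
        done += n;
        hmem_iov_offset = 0;            /* later segments start at byte 0 */
    }
    return (ssize_t) done;
}
```

       A matching copy_to_hmem_iov would walk the iovec the same way with the
       copy direction reversed.  Both function pointers would then be stored
       in a struct fi_hmem_override_ops, with size set to the size of that
       structure, and passed to fi_set_ops under the name
       FI_SET_OPS_HMEM_OVERRIDE.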

   fi_domain_bind
       Associates an event queue with the domain.  An event queue bound to a
       domain will be the default EQ associated with asynchronous control
       events that occur on the domain or active endpoints allocated on a
       domain.  This includes CM events.  Endpoints may direct their control
       events to alternate EQs by binding directly with the EQ.

       Binding an event queue to a domain with the FI_REG_MR flag indicates
       that the provider should perform all memory registration operations
       asynchronously, with the completion reported through the event queue.
       If an event queue is not bound to the domain with the FI_REG_MR flag,
       then memory registration requests complete synchronously.

       See fi_av_bind(3), fi_ep_bind(3), fi_mr_bind(3), fi_pep_bind(3), and
       fi_scalable_ep_bind(3) for more information.

   fi_close
       The fi_close call is used to release all resources associated with a
       domain or interface.  All objects associated with the opened domain
       must be released prior to calling fi_close, otherwise the call will
       return -FI_EBUSY.

DOMAIN ATTRIBUTES

       The fi_domain_attr structure defines the set of attributes associated
       with a domain.

              struct fi_domain_attr {
                  struct fid_domain     *domain;
                  char                  *name;
                  enum fi_threading     threading;
                  enum fi_progress      control_progress;
                  enum fi_progress      data_progress;
                  enum fi_resource_mgmt resource_mgmt;
                  enum fi_av_type       av_type;
                  int                   mr_mode;
                  size_t                mr_key_size;
                  size_t                cq_data_size;
                  size_t                cq_cnt;
                  size_t                ep_cnt;
                  size_t                tx_ctx_cnt;
                  size_t                rx_ctx_cnt;
                  size_t                max_ep_tx_ctx;
                  size_t                max_ep_rx_ctx;
                  size_t                max_ep_stx_ctx;
                  size_t                max_ep_srx_ctx;
                  size_t                cntr_cnt;
                  size_t                mr_iov_limit;
                  uint64_t              caps;
                  uint64_t              mode;
                  uint8_t               *auth_key;
                  size_t                auth_key_size;
                  size_t                max_err_data;
                  size_t                mr_cnt;
                  uint32_t              tclass;
              };

   domain
       On input to fi_getinfo, a user may set this to an opened domain
       instance to restrict output to the given domain.  On output from
       fi_getinfo, if no domain was specified, but the user has an opened
       instance of the named domain, this will reference the first opened
       instance.  If no instance has been opened, this field will be NULL.

       The domain instance returned by fi_getinfo should only be considered
       valid if the application does not close any domain instances from
       another thread while fi_getinfo is being processed.

   Name
       The name of the access domain.

   Multi-threading Support (threading)
       The threading model specifies the level of serialization required of an
       application when using the libfabric data transfer interfaces.  Control
       interfaces are always considered thread safe, and may be accessed by
       multiple threads.  Applications that can guarantee serialization in
       their access of provider allocated resources and interfaces enable a
       provider to eliminate lower-level locks.

       FI_THREAD_COMPLETION
              The completion threading model is intended for providers that
              make use of manual progress.  Applications must serialize access
              to all objects that are associated through a shared completion
              structure.  This includes endpoint, transmit context, receive
              context, completion queue, counter, wait set, and poll set
              objects.

       For example, threads must serialize access to an endpoint and its bound
       completion queue(s) and/or counters.  Access to endpoints that share
       the same completion queue must also be serialized.

       The use of FI_THREAD_COMPLETION can increase parallelism over
       FI_THREAD_SAFE, but requires the use of isolated resources.

       FI_THREAD_DOMAIN
              A domain serialization model requires applications to serialize
              access to all objects belonging to a domain.

       FI_THREAD_ENDPOINT
              The endpoint threading model is similar to FI_THREAD_FID, but
              with the added restriction that serialization is required when
              accessing the same endpoint, even if multiple transmit and
              receive contexts are used.  Conceptually, FI_THREAD_ENDPOINT
              maps well to providers that implement fabric services in
              hardware but use a single command queue to access different data
              flows.

       FI_THREAD_FID
              A fabric descriptor (FID) serialization model requires
              applications to serialize access to individual fabric resources
              associated with data transfer operations and completions.
              Multiple threads must be serialized when accessing the same
              endpoint, transmit context, receive context, completion queue,
              counter, wait set, or poll set.  Serialization is required only
              by threads accessing the same object.

       For example, one thread may be initiating a data transfer on an
       endpoint, while another thread reads from a completion queue associated
       with the endpoint.

       Serialization of endpoint access is only required when accessing the
       same endpoint data flow.  Multiple threads may initiate transfers on
       different transmit contexts of the same endpoint without serializing,
       and no serialization is required between the submission of data
       transmit requests and data receive operations.

       In general, FI_THREAD_FID allows the provider to be implemented without
       needing internal locking when handling data transfers.  Conceptually,
       FI_THREAD_FID maps well to providers that implement fabric services in
       hardware and provide separate command queues to different data flows.

       FI_THREAD_SAFE
              A thread safe serialization model allows a multi-threaded
              application to access any allocated resources through any
              interface without restriction.  All providers are required to
              support FI_THREAD_SAFE.

       FI_THREAD_UNSPEC
              This value indicates that no threading model has been defined.
              It may be used on input hints to the fi_getinfo call.  When
              specified, providers will return a threading model that allows
              for the greatest level of parallelism.

   Progress Models (control_progress / data_progress)
       Progress is the ability of the underlying implementation to complete
       processing of an asynchronous request.  In many cases, the processing
       of an asynchronous request requires the use of the host processor.  For
       example, a received message may need to be matched with the correct
       buffer, or a timed out request may need to be retransmitted.  For
       performance reasons, it may be undesirable for the provider to allocate
       a thread for this purpose, which will compete with the application
       threads.

       Control progress indicates the method that the provider uses to make
       progress on asynchronous control operations.  Control operations are
       functions which do not directly involve the transfer of application
       data between endpoints.  They include address vector, memory
       registration, and connection management routines.

       Data progress indicates the method that the provider uses to make
       progress on data transfer operations.  This includes message queue,
       RMA, tagged messaging, and atomic operations, along with their
       completion processing.

       Progress frequently requires action being taken at both the
       transmitting and receiving sides of an operation.  This is often a
       requirement for reliable transfers, as a result of retry and
       acknowledgement processing.

       To balance between performance and ease of use, two progress models are
       defined.

       FI_PROGRESS_AUTO
              This progress model indicates that the provider will make
              forward progress on an asynchronous operation without further
              intervention by the application.  When FI_PROGRESS_AUTO is
              provided as output to fi_getinfo in the absence of any progress
              hints, it often indicates that the desired functionality is
              implemented by the provider hardware or is a standard service of
              the operating system.

       All providers are required to support FI_PROGRESS_AUTO.  However, if a
       provider does not natively support automatic progress, forcing the use
       of FI_PROGRESS_AUTO may result in threads being allocated below the
       fabric interfaces.

       FI_PROGRESS_MANUAL
              This progress model indicates that the provider requires the use
              of an application thread to complete an asynchronous request.
              When manual progress is set, the provider will attempt to
              advance an asynchronous operation forward when the application
              attempts to wait on or read an event queue, completion queue, or
              counter where the completed operation will be reported.
              Progress also occurs when the application processes a poll or
              wait set that has been associated with the event or completion
              queue.

       Only wait operations defined by the fabric interface will result in an
       operation progressing.  Operating system or external wait functions,
       such as select, poll, or pthread routines, cannot drive progress.

       Manual progress requirements not only apply to endpoints that initiate
       transmit operations, but also to endpoints that may be the target of
       such operations.  This holds true even if the target endpoint will not
       generate completion events for the operations.  For example, an
       endpoint that acts purely as the target of RMA or atomic operations
       and uses manual progress may still need application assistance to
       process received operations.
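
       The manual progress contract can be pictured with a small toy model.
       The sketch below is not libfabric code: struct toy_cq, toy_post,
       toy_cq_read, and toy_wait_for are hypothetical names, and advancing
       exactly one operation per read is an arbitrary simplification.  It
       only shows that posted work stays pending until the application reads
       the queue on which the completion will be reported.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy provider state: work advances only inside toy_cq_read(). */
struct toy_cq {
    size_t pending;     /* operations posted but not yet processed */
    size_t completed;   /* completions ready to be reaped */
};

/* Posting an operation queues it; nothing is processed yet. */
void toy_post(struct toy_cq *cq)
{
    cq->pending++;
}

/*
 * Reading the queue drives progress: one pending operation is processed
 * per call, then at most one completion is handed back.
 * Returns the number of completions reaped (0 or 1).
 */
size_t toy_cq_read(struct toy_cq *cq)
{
    if (cq->pending > 0) {
        cq->pending--;
        cq->completed++;
    }
    if (cq->completed == 0)
        return 0;
    cq->completed--;
    return 1;
}

/* Poll until n completions are reaped or max_polls reads have been made. */
size_t toy_wait_for(struct toy_cq *cq, size_t n, size_t max_polls)
{
    size_t reaped = 0;

    for (size_t i = 0; i < max_polls && reaped < n; i++)
        reaped += toy_cq_read(cq);
    return reaped;
}
```

       In a real application the read step would be a fabric interface call
       such as fi_cq_read on the bound completion queue.  An application that
       instead blocked in select or on a pthread condition would never enter
       the provider, so under this model the pending count would not change.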

       FI_PROGRESS_UNSPEC
              This value indicates that no progress model has been defined.
              It may be used on input hints to the fi_getinfo call.

   Resource Management (resource_mgmt)
       Resource management (RM) is provider and protocol support to protect
       against overrunning local and remote resources.  This includes local
       and remote transmit contexts, receive contexts, completion queues, and
       source and target data buffers.

       When enabled, applications are given some level of protection against
       overrunning provider queues and local and remote data buffers.  Such
       support may be built directly into the hardware and/or network
       protocol, but may also require that checks be enabled in the provider
       software.  By disabling resource management, an application assumes all
       responsibility for preventing queue and buffer overruns, but doing so
       may allow a provider to eliminate internal synchronization calls, such
       as atomic variables or locks.

       It should be noted that even if resource management is disabled, the
       provider implementation and protocol may still provide some level of
       protection against overruns.  However, such protection is not
       guaranteed.  The following values for resource management are defined.

       FI_RM_DISABLED
              The provider is free to select an implementation and protocol
              that does not protect against resource overruns.  The
              application is responsible for resource protection.

       FI_RM_ENABLED
              Resource management is enabled for this provider domain.

       FI_RM_UNSPEC
              This value indicates that no resource management model has been
              defined.  It may be used on input hints to the fi_getinfo call.

       The behavior of the various resource management options depends on
       whether the endpoint is reliable or unreliable, as well as provider and
       protocol specific implementation details, as shown in the following
       table.  The table assumes that all peers use the same RM setting.

       Resource         DGRAM EP,          DGRAM EP,          RDM/MSG EP,        RDM/MSG EP,
                        no RM              with RM            no RM              with RM
       ───────────────────────────────────────────────────────────────────────────────────────────
       Tx Ctx           undefined error    EAGAIN             undefined error    EAGAIN
       Rx Ctx           undefined error    EAGAIN             undefined error    EAGAIN
       Tx CQ            undefined error    EAGAIN             undefined error    EAGAIN
       Rx CQ            undefined error    EAGAIN             undefined error    EAGAIN
       Target EP        dropped            dropped            transmit error     retried
       No Rx Buffer     dropped            dropped            transmit error     retried
       Rx Buf Overrun   truncate or drop   truncate or drop   truncate or error  truncate or error
       Unmatched RMA    not applicable     not applicable     transmit error     transmit error
       RMA Overrun      not applicable     not applicable     transmit error     transmit error

       The resource column indicates the resource being accessed by a data
       transfer operation.

       Tx Ctx / Rx Ctx
              Refers to the transmit/receive contexts when a data transfer
              operation is submitted.  When RM is enabled, attempting to
              submit a request will fail if the context is full.  If RM is
              disabled, an undefined error (provider specific) will occur.
              Such errors should be considered fatal to the context, and
              applications must take steps to avoid queue overruns.

       Tx CQ / Rx CQ
              Refers to the completion queue associated with the Tx or Rx
              context when a local operation completes.  When RM is disabled,
              applications must take care to ensure that completion queues do
              not get overrun.  When an overrun occurs, an undefined, but
              fatal, error will occur affecting all endpoints associated with
              the CQ.  Overruns can be avoided by sizing the CQs appropriately
              or by deferring the posting of a data transfer operation until
              CQ space is available to store its completion.  When RM is
              enabled, providers may use different mechanisms to prevent CQ
              overruns.  This includes failing (returning -FI_EAGAIN) the
              posting of operations that could result in CQ overruns, or
              internally retrying requests (which will be hidden from the
              application).  See notes at the end of this section regarding CQ
              resource management restrictions.

       Target EP / No Rx Buffer
              Target EP refers to resources associated with the endpoint that
              is the target of a transmit operation.  This includes the target
              endpoint's receive queue, posted receive buffers (no Rx
              buffers), the receive side completion queue, and other related
              packet processing queues.  The defined behavior is that seen by
              the initiator of a request.  For FI_EP_DGRAM endpoints, if the
              target EP queues are unable to accept incoming messages,
              received messages will be dropped.  For reliable endpoints, if
              RM is disabled, the transmit operation will complete in error.
              If RM is enabled, the provider will internally retry the
              operation.

       Rx Buffer Overrun
              This refers to buffers posted to receive incoming tagged or
              untagged messages, with the behavior defined from the viewpoint
              of the sender.  The behavior for handling received messages that
              are larger than the buffers provided by the application is
              provider specific.  Providers may either truncate the message
              and report a successful completion, or fail the operation.  For
              datagram endpoints, failed sends will result in the message
              being dropped.  For reliable endpoints, send operations may
              complete successfully, yet be truncated at the receive side.
              This can occur when the target side buffers received data until
              an application buffer is made available.  The completion status
              may also be dependent upon the completion model selected by the
              application (e.g. FI_DELIVERY_COMPLETE versus
              FI_TRANSMIT_COMPLETE).

       Unmatched RMA / RMA Overrun
              Unmatched RMA and RMA overruns deal with the processing of RMA
              and atomic operations.  Unlike send operations, RMA operations
              that attempt to access a memory address that is either not
              registered for such operations, or attempt to access outside of
              the target memory region will fail, resulting in a transmit
              error.
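
       The EAGAIN entries in the table suggest a common calling pattern: when
       a post is rejected because a queue is full, reap a completion and try
       again.  The sketch below models that pattern with a toy endpoint.  All
       names (struct toy_ep, toy_send, toy_reap, toy_send_retry, TOY_TX_DEPTH)
       are hypothetical, and plain -EAGAIN from errno.h stands in for
       -FI_EAGAIN; it is an illustration under those assumptions, not a
       description of any provider.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_TX_DEPTH 4      /* hypothetical transmit context size */

struct toy_ep {
    size_t in_flight;       /* posted operations not yet completed */
};

/* With RM enabled, a full context rejects the post instead of overrunning. */
int toy_send(struct toy_ep *ep)
{
    if (ep->in_flight == TOY_TX_DEPTH)
        return -EAGAIN;
    ep->in_flight++;
    return 0;
}

/* Reap one completion, which frees one context slot.  Returns 1 if reaped. */
int toy_reap(struct toy_ep *ep)
{
    if (ep->in_flight == 0)
        return 0;
    ep->in_flight--;
    return 1;
}

/*
 * Post one send, reaping a completion each time the context is full.
 * *retries counts how many times the post had to be repeated.
 */
int toy_send_retry(struct toy_ep *ep, size_t *retries)
{
    int ret;

    while ((ret = toy_send(ep)) == -EAGAIN) {
        if (!toy_reap(ep))
            return ret;     /* nothing to reap: report the failure */
        (*retries)++;
    }
    return ret;
}
```

       In a real application the reap step would be a read of the bound
       completion queue, for example with fi_cq_read, which also gives a
       manual progress provider the chance to advance outstanding work.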

       When a resource management error occurs on an endpoint, the endpoint is
       transitioned into a disabled state.  Any operations which have not
       already completed will fail and be discarded.  For connectionless
       endpoints, the endpoint must be re-enabled before it will accept new
       data transfer operations.  For connected endpoints, the connection is
       torn down and must be re-established.

       There is one notable restriction on the protections offered by resource
       management.  This occurs when resource management is enabled on an
       endpoint that has been bound to completion queue(s) using the
       FI_SELECTIVE_COMPLETION flag.  Operations posted to such an endpoint
       may specify that a successful completion should not generate an entry
       on the corresponding completion queue (i.e. the operation leaves the
       FI_COMPLETION flag unset).  In such situations, the provider is not
       required to reserve an entry in the completion queue to handle the case
       where the operation fails and does generate a CQ entry, which would
       effectively require tracking the operation to completion.  Applications
       concerned with avoiding CQ overruns in the occurrence of errors must
       ensure that there is sufficient space in the CQ to report failed
       operations.  This can typically be achieved by sizing the CQ to at
       least the same size as the endpoint queue(s) that are bound to it.
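
       The sizing rule above amounts to simple arithmetic.  As a sketch, the
       hypothetical helper below (min_cq_size is not a libfabric function)
       adds up the depths of every transmit and receive queue bound to one
       CQ.  Summing all bound queues is one conservative reading of "at least
       the same size as the endpoint queue(s)"; the depths themselves would
       come from the application's own endpoint attributes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest CQ size that leaves room for one entry per operation that can
 * be outstanding on the queues bound to the CQ.  queue_sizes holds the
 * depth of each bound transmit or receive queue (caller supplied).
 */
size_t min_cq_size(const size_t *queue_sizes, size_t count)
{
    size_t total = 0;

    for (size_t i = 0; i < count; i++)
        total += queue_sizes[i];
    return total;
}
```

       For example, a CQ shared by two endpoints, each with a 128-entry
       transmit queue and a 64-entry receive queue, would be sized for at
       least 384 entries under this reading.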

   AV Type (av_type)
       Specifies the type of address vectors that are usable with this domain.
       For additional details on AV type, see fi_av(3).  The following values
       may be specified.

       FI_AV_MAP
              Only address vectors of type AV map are requested or supported.

       FI_AV_TABLE
              Only address vectors of type AV index are requested or
              supported.

       FI_AV_UNSPEC
              Any address vector format is requested and supported.

       Address vectors are only used by connectionless endpoints.
       Applications that require the use of a specific type of address vector
       should set the domain attribute av_type to the necessary value when
       calling fi_getinfo.  The value FI_AV_UNSPEC may be used to indicate
       that the provider can support either address vector format.  In this
       case, a provider may return FI_AV_UNSPEC to indicate that either format
       is supportable, or may return another AV type to indicate the optimal
       AV type supported by this domain.

   Memory Registration Mode (mr_mode)
       Defines memory registration specific mode bits used with this domain.
       Full details on MR mode options are available in fi_mr(3).  The
       following values may be specified.

       FI_MR_ALLOCATED
              Indicates that memory registration occurs on allocated data
              buffers, and physical pages must back all virtual addresses
              being registered.

       FI_MR_ENDPOINT
              Memory registration occurs at the endpoint level, rather than
              the domain level.

       FI_MR_LOCAL
              The provider is optimized around having applications register
              memory for locally accessed data buffers.  Data buffers used in
              send and receive operations and as the source buffer for RMA and
              atomic operations must be registered by the application for
              access domains opened with this capability.

       FI_MR_MMU_NOTIFY
              Indicates that the application is responsible for notifying the
              provider when the page tables referencing a registered memory
              region may have been updated.

       FI_MR_PROV_KEY
              Memory registration keys are selected and returned by the
              provider.

       FI_MR_RAW
              The provider requires additional setup as part of its memory
              registration process.  This mode is required by providers that
              use a memory key that is larger than 64 bits.

       FI_MR_RMA_EVENT
              Indicates that the memory regions associated with completion
              counters must be explicitly enabled after being bound to any
              counter.

       FI_MR_UNSPEC
              Defined for compatibility -- library versions 1.4 and earlier.
              Setting mr_mode to 0 indicates that FI_MR_BASIC or
              FI_MR_SCALABLE are requested and supported.

       FI_MR_VIRT_ADDR
              Registered memory regions are referenced by peers using the
              virtual address of the registered memory region, rather than a
              0-based offset.

       FI_MR_BASIC
              Defined for compatibility -- library versions 1.4 and earlier.
              Only basic memory registration operations are requested or
              supported.  This mode is equivalent to the FI_MR_VIRT_ADDR,
              FI_MR_ALLOCATED, and FI_MR_PROV_KEY flags being set in later
              library versions.  This flag may not be used in conjunction with
              other mr_mode bits.

       FI_MR_SCALABLE
              Defined for compatibility -- library versions 1.4 and earlier.
              Only scalable memory registration operations are requested or
              supported.  Scalable registration uses offset based addressing,
              with application selectable memory keys.  For library versions
              1.5 and later, this is the default if no mr_mode bits are set.
              This flag may not be used in conjunction with other mr_mode
              bits.

       Buffers used in data transfer operations may require notifying the
       provider of their use before a data transfer can occur.  The mr_mode
       field indicates the type of memory registration that is required, and
       when registration is necessary.  Applications that require the use of a
       specific registration mode should set the domain attribute mr_mode to
       the necessary value when calling fi_getinfo.  The value FI_MR_UNSPEC
       may be used to indicate support for any registration mode.
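
       Because mr_mode is a set of bits, checking whether an application can
       work with a domain reduces to a subset test.  The sketch below assumes
       the usual negotiation described in fi_mr(3), in which the application
       lists the bits it supports and the provider reports the bits it
       requires.  The DEMO_MR_* constants and mr_mode_ok are hypothetical
       stand-ins with arbitrary values; real code would use the FI_MR_*
       definitions from the libfabric headers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins only; the values are NOT the real FI_MR_* values. */
#define DEMO_MR_LOCAL     (1u << 0)
#define DEMO_MR_VIRT_ADDR (1u << 1)
#define DEMO_MR_ALLOCATED (1u << 2)
#define DEMO_MR_PROV_KEY  (1u << 3)

/* The legacy basic mode expressed as the three modern bits named above. */
#define DEMO_MR_BASIC_EQUIV \
    (DEMO_MR_VIRT_ADDR | DEMO_MR_ALLOCATED | DEMO_MR_PROV_KEY)

/* Returns 1 if every bit the provider requires is one the app supports. */
int mr_mode_ok(uint32_t required, uint32_t supported)
{
    return (required & ~supported) == 0;
}
```

       An application written for the basic registration model could pass
       DEMO_MR_BASIC_EQUIV as its supported set; a provider that additionally
       required local registration would then fail the check.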
577
578   MR Key Size (mr_key_size)
579       Size of the memory region remote access key,  in  bytes.   Applications
580       that  request  their  own  MR  key must select a value within the range
581       specified by this value.  Key sizes larger than 8 bytes  require  using
582       the FI_RAW_KEY mode bit.
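
       An application that selects its own keys may want to check a
       candidate key against mr_key_size before registering memory.  The
       sketch below assumes 64-bit application keys; ex_key_fits is a
       hypothetical helper, and treating a size of 0 as "no application key
       fits" is an assumption of this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: does a 64-bit key fit within mr_key_size bytes? */
bool ex_key_fits(uint64_t key, size_t mr_key_size)
{
    if (mr_key_size == 0)
        return false;                 /* assumption: nothing fits in 0 bytes */
    if (mr_key_size >= sizeof(key))
        return true;                  /* any 64-bit value fits in 8+ bytes */
    /* The key fits if no bits are set above the low mr_key_size bytes. */
    return (key >> (8 * mr_key_size)) == 0;
}
```

       For a reported size of 4, the key 0xFFFFFFFF is accepted and
       0x100000000 is rejected.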
583
584   CQ Data Size (cq_data_size)
585       Applications  may  include a small message with a data transfer that is
586       placed directly into a remote completion queue as part of a  completion
587       event.  This is referred to as remote CQ data (sometimes referred to as
588       immediate data).  This field indicates the number  of  bytes  that  the
589       provider supports for remote CQ data.  If remote CQ data is supported
590       (a non-zero value is returned), the supported size is at least 4 bytes.
591
592   Completion Queue Count (cq_cnt)
593       The optimal number of completion queues supported by the domain,  rela‐
594       tive  to  any specified or default CQ attributes.  The cq_cnt value may
595       be a fixed value of the maximum number of CQs supported by the underly‐
596       ing  hardware,  or  may  be  a  dynamic value, based on the default at‐
597       tributes of an allocated CQ, such as the CQ size and data format.
598
599   Endpoint Count (ep_cnt)
600       The total number of endpoints supported by the domain, relative to  any
601       specified  or  default  endpoint attributes.  The ep_cnt value may be a
602       fixed value of the maximum number of endpoints supported by the  under‐
603       lying  hardware,  or  may  be a dynamic value, based on the default at‐
604       tributes of an allocated endpoint, such as  the  endpoint  capabilities
605       and  size.   The  endpoint count is the number of addressable endpoints
606       supported by the provider.  Providers return capability limits based on
607       configured hardware maximum capabilities.  Providers cannot predict
608       system limitations that further reduce these hardware maximums (e.g.
609       application memory consumption or file descriptor usage), because such
610       limits only become known a posteriori, at runtime.
611
612   Transmit Context Count (tx_ctx_cnt)
613       The number of  outbound  command  queues  optimally  supported  by  the
614       provider.  For a low-level provider, this represents the number of com‐
615       mand queues to the hardware and/or the number of parallel transmit  en‐
616       gines  effectively  supported by the hardware and caches.  Applications
617       which allocate more transmit contexts than this value will end up shar‐
618       ing  underlying resources.  By default, there is a single transmit con‐
619       text associated with each endpoint, but in an advanced usage model,  an
620       endpoint may be configured with multiple transmit contexts.
621
622   Receive Context Count (rx_ctx_cnt)
623       The  number  of  inbound  processing  queues optimally supported by the
624       provider.  For a low-level provider, this represents the number of
625       hardware queues that can be effectively utilized for processing incoming
626       packets.  Applications which allocate more receive contexts  than  this
627       value  will  end up sharing underlying resources.  By default, a single
628       receive context is associated with each endpoint, but  in  an  advanced
629       usage  model,  an endpoint may be configured with multiple receive con‐
630       texts.
631
632   Maximum Endpoint Transmit Context (max_ep_tx_ctx)
633       The maximum number of transmit contexts that may be associated with  an
634       endpoint.
635
636   Maximum Endpoint Receive Context (max_ep_rx_ctx)
637       The  maximum  number of receive contexts that may be associated with an
638       endpoint.
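
       One way to combine the optimal context count (tx_ctx_cnt or
       rx_ctx_cnt) with the per-endpoint maximum (max_ep_tx_ctx or
       max_ep_rx_ctx) is to clamp the requested number against both.  The
       helper below is a hypothetical sketch of that policy, not a libfabric
       call.

```c
#include <stddef.h>

/*
 * Hypothetical helper: choose how many contexts to configure on one
 * endpoint, given the optimal count and the per-endpoint maximum
 * reported in the domain attributes.
 */
size_t ex_pick_ctx_count(size_t wanted, size_t optimal_cnt, size_t max_per_ep)
{
    size_t n = wanted;

    if (n > max_per_ep)
        n = max_per_ep;       /* limit on contexts per endpoint */
    if (n > optimal_cnt)
        n = optimal_cnt;      /* more than this would share resources */
    if (n == 0 && max_per_ep > 0)
        n = 1;                /* default: a single context per endpoint */
    return n;
}
```

       Clamping to the optimal count is a design choice of this sketch.  An
       application may still allocate more contexts, up to the per-endpoint
       maximum, at the cost of sharing underlying resources.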
639
640   Maximum Sharing of Transmit Context (max_ep_stx_ctx)
641       The maximum number of endpoints that may be associated  with  a  shared
642       transmit context.
643
644   Maximum Sharing of Receive Context (max_ep_srx_ctx)
645       The  maximum  number  of endpoints that may be associated with a shared
646       receive context.
647
648   Counter Count (cntr_cnt)
649       The optimal number of completion counters supported by the domain.  The
650       cntr_cnt value may be a fixed value of the maximum number of counters
651       supported by the underlying hardware, or may be a dynamic value,  based
652       on the default attributes of the domain.
653
654   MR IOV Limit (mr_iov_limit)
655       This is the maximum number of IO vectors (scatter-gather elements) that
656       a single memory registration operation may reference.
657
658   Capabilities (caps)
659       Domain level capabilities.  Domain capabilities indicate  domain  level
660       features that are supported by the provider.
661
662       FI_LOCAL_COMM
663              At  a conceptual level, this field indicates that the underlying
664              device supports loopback communication.  More specifically, this
665              field indicates that an endpoint may communicate with other end‐
666              points that are allocated from the same underlying named domain.
667              If  this field is not set, an application may need to use an al‐
668              ternate domain or mechanism (e.g.  shared memory) to communicate
669              with peers that execute on the same node.
670
671       FI_REMOTE_COMM
672              This  field indicates that the underlying provider supports com‐
673              munication with nodes that are reachable over the  network.   If
674              this  field is not set, then the provider only supports communi‐
675              cation between processes that execute on  the  same  node  --  a
676              shared memory provider, for example.
677
678       FI_SHARED_AV
679              Indicates  that the domain supports the ability to share address
680              vectors among multiple processes using the named address  vector
681              feature.
682
683       See fi_getinfo(3) for a discussion on primary versus secondary capabil‐
684       ities.  All domain capabilities are considered secondary capabilities.
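
       The following sketch shows how an application might test the domain
       capability bits before choosing a domain for a given peer.  The EX_
       macros are stand-in values for illustration, and ex_domain_can_reach
       is a hypothetical helper.  Real code should use FI_LOCAL_COMM and
       FI_REMOTE_COMM from <rdma/fabric.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit values for illustration only. */
#define EX_LOCAL_COMM   (1ULL << 52)
#define EX_REMOTE_COMM  (1ULL << 53)

/*
 * Hypothetical helper: can a domain with the given caps be used to
 * communicate with a peer on the same node or on a remote node?
 */
bool ex_domain_can_reach(uint64_t caps, bool peer_on_same_node)
{
    uint64_t needed = peer_on_same_node ? EX_LOCAL_COMM : EX_REMOTE_COMM;

    return (caps & needed) != 0;
}
```

       When the check fails for a peer on the same node, the application may
       fall back to an alternate domain or mechanism, such as shared memory.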
685
686   mode
687       Operational mode bits related to using the domain.
688
689       FI_RESTRICTED_COMP
690              This bit indicates that the domain limits completion queues  and
691              counters  to only be used with endpoints, transmit contexts, and
692              receive contexts that have the same set of capability flags.
693
694   Default authorization key (auth_key)
695       The default authorization key to associate with endpoints and memory
696       registrations  created within the domain.  This field is ignored unless
697       the fabric is opened with API version 1.5 or greater.
698
699   Default authorization key length (auth_key_size)
700       The length in bytes of the default authorization key  for  the  domain.
701       If  set  to  0,  then no authorization key will be associated with end‐
702       points and memory registrations created within the domain unless speci‐
703       fied  in the endpoint or memory registration attributes.  This field is
704       ignored unless the fabric is opened with API version 1.5 or greater.
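
       The relationship between the domain default and a key given in the
       endpoint or memory registration attributes can be sketched as below.
       The struct and helper are hypothetical, and the rule that a specific
       key takes precedence over the domain default is an assumption drawn
       from the word "default" in the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of an authorization key with its length in bytes. */
struct ex_auth_key {
    const uint8_t *key;
    size_t size;
};

/*
 * Hypothetical helper: pick the key that applies to an endpoint or
 * memory registration.  Assumes a specific key overrides the default.
 */
struct ex_auth_key ex_effective_auth_key(struct ex_auth_key domain_default,
                                         struct ex_auth_key specific)
{
    struct ex_auth_key none = { NULL, 0 };

    if (specific.size > 0)
        return specific;          /* set in endpoint or MR attributes */
    if (domain_default.size > 0)
        return domain_default;    /* fall back to the domain default */
    return none;                  /* auth_key_size of 0: no key */
}
```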
705
706   Max Error Data Size (max_err_data)
707       The maximum amount of error data, in bytes, that may be returned as
708       part  of  a completion or event queue error.  This value corresponds to
709       the  err_data_size  field  in   struct   fi_cq_err_entry   and   struct
710       fi_eq_err_entry.
711
712   Memory Regions Count (mr_cnt)
713       The  optimal  number of memory regions supported by the domain, or end‐
714       point if the mr_mode FI_MR_ENDPOINT bit has been set.  The mr_cnt value
715       may  be a fixed value of the maximum number of MRs supported by the un‐
716       derlying hardware, or may be a dynamic value, based on the default  at‐
717       tributes  of  the  domain,  such  as  the supported memory registration
718       modes.  Applications can set the mr_cnt on input to fi_getinfo, in  or‐
719       der  to  indicate their memory registration requirements.  Doing so may
720       allow the provider to optimize any memory registration cache or  lookup
721       tables.
722
723   Traffic Class (tclass)
724       This specifies the default traffic class that will be associated with
725       any endpoints created within the domain.  See fi_endpoint(3) for
726       additional information.
727

RETURN VALUE

729       Returns 0 on success.  On error, a negative value corresponding to fab‐
730       ric errno is returned.  Fabric errno values are defined in  rdma/fi_er‐
731       rno.h.
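
       A caller typically checks for a negative return and negates it to
       obtain the fabric errno.  The helper below is a hypothetical sketch
       of that convention and does not call into libfabric.  Treating any
       non-negative value as success is an assumption of this example.

```c
#include <stdio.h>

/*
 * Hypothetical helper: report a failed call and return the positive
 * fabric errno, or 0 when the call succeeded.
 */
int ex_fabric_errno(const char *what, int ret)
{
    if (ret >= 0)
        return 0;                 /* success */
    fprintf(stderr, "%s failed: fabric errno %d\n", what, -ret);
    return -ret;                  /* positive errno value */
}
```

       In a real application the positive value would normally be passed to
       fi_strerror, declared in rdma/fi_errno.h, to obtain a readable
       message.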
732

NOTES

734       Users  should  call  fi_close to release all resources allocated to the
735       fabric domain.
736
737       The following fabric resources are associated with domains: active end‐
738       points, memory regions, completion event queues, and address vectors.
739
740       Domain  attributes  reflect the limitations and capabilities of the un‐
741       derlying hardware and/or software provider.  They do not reflect system
742       limitations,  such  as the number of physical pages that an application
743       may pin or number of file descriptors that the  application  may  open.
744       As a result, the reported maximums may not be achievable, even on
745       lightly loaded systems, unless an administrator configures system
746       resources appropriately for the installed provider(s).
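
       As an illustration of the point above, an application could compare
       a reported maximum such as ep_cnt against the process file
       descriptor limit.  This is a sketch under stated assumptions: the
       helpers are hypothetical, and the number of descriptors consumed per
       endpoint (fds_per_ep) depends on the provider and must be supplied
       by the caller.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: cap ep_cnt by how many endpoints fd_limit allows. */
size_t ex_cap_by_fd_limit(size_t ep_cnt, size_t fd_limit, size_t fds_per_ep)
{
    size_t by_fds;

    if (fds_per_ep == 0)
        return ep_cnt;            /* endpoints consume no descriptors */
    by_fds = fd_limit / fds_per_ep;
    return by_fds < ep_cnt ? by_fds : ep_cnt;
}

/* Hypothetical helper: apply the current RLIMIT_NOFILE soft limit. */
size_t ex_usable_endpoints(size_t ep_cnt, size_t fds_per_ep)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return ep_cnt;            /* limit unknown or unlimited */
    return ex_cap_by_fd_limit(ep_cnt, (size_t)rl.rlim_cur, fds_per_ep);
}
```

       The same pattern applies to other system limits, such as the amount
       of memory that a process may pin.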
747

AUTHORS

752       OpenFabrics.
753
754
755
756Libfabric Programmer's Manual     2020-10-14                      fi_domain(3)