fi_domain(3)                    Libfabric v1.18.1                   fi_domain(3)

NAME
       fi_domain - Open a fabric access domain

SYNOPSIS
              #include <rdma/fabric.h>

              #include <rdma/fi_domain.h>

              int fi_domain(struct fid_fabric *fabric, struct fi_info *info,
                  struct fid_domain **domain, void *context);

              int fi_domain2(struct fid_fabric *fabric, struct fi_info *info,
                  struct fid_domain **domain, uint64_t flags, void *context);

              int fi_close(struct fid *domain);

              int fi_domain_bind(struct fid_domain *domain, struct fid *eq,
                  uint64_t flags);

              int fi_open_ops(struct fid *domain, const char *name, uint64_t flags,
                  void **ops, void *context);

              int fi_set_ops(struct fid *domain, const char *name, uint64_t flags,
                  void *ops, void *context);

ARGUMENTS
       fabric Fabric domain.

       info   Fabric information, including domain capabilities and attributes.

       domain An opened access domain.

       context
              User-specified context associated with the domain. This context is returned as part of any asynchronous event associated with the domain.

       eq     Event queue for asynchronous operations initiated on the domain.

       name   Name associated with an interface.

       ops    Fabric interface operations.

DESCRIPTION
       An access domain typically refers to a physical or virtual NIC or hardware port; however, a domain may span multiple hardware components for fail-over or data striping purposes. A domain defines the boundary for associating different resources together. Fabric resources belonging to the same domain may share resources.

   fi_domain
       Opens a fabric access domain, also referred to as a resource domain. Fabric domains are identified by a name. The properties of the opened domain are specified using the info parameter.

   fi_domain2
       Similar to fi_domain, but accepts an additional flags parameter. It is mainly used to open a peer domain; see fi_peer(3).

   fi_open_ops
       fi_open_ops is used to open provider-specific interfaces. Provider interfaces may be used to access low-level resources and operations that are specific to the opened resource domain. The details of domain interfaces are outside the scope of this documentation.

   fi_set_ops
       fi_set_ops assigns callbacks that a provider should invoke in place of performing selected tasks. This allows users to modify or control a provider’s default behavior. Conceptually, it allows the user to hook specific functions used by a provider and replace them with their own.

       The operations being modified are identified using a well-known character string, passed as the name parameter. The format of the ops parameter is dependent upon the name value. The ops parameter will reference a structure containing the callbacks and other fields needed by the provider to invoke the user’s functions.

       If a provider accepts the override, it will return FI_SUCCESS. If the override is unknown or not supported, the provider will return -FI_ENOSYS. Overrides should be set prior to allocating resources on the domain.

       The following fi_set_ops operations and corresponding callback structures are defined.

       FI_SET_OPS_HMEM_OVERRIDE – Heterogeneous Memory Overrides

       HMEM override allows users to override HMEM-related operations a provider may perform. Currently, the scope of the HMEM override is to allow a user to define the memory movement functions a provider should use when accessing a user buffer. The user-defined memory movement functions need to account for all the different HMEM iface types a provider may encounter.

       All objects allocated against a domain will inherit this override.

       The following is the HMEM override operation name and structure.

              #define FI_SET_OPS_HMEM_OVERRIDE "hmem_override_ops"

              struct fi_hmem_override_ops {
                  size_t size;

                  ssize_t (*copy_from_hmem_iov)(void *dest, size_t size,
                      enum fi_hmem_iface iface, uint64_t device,
                      const struct iovec *hmem_iov, size_t hmem_iov_count,
                      uint64_t hmem_iov_offset);

                  ssize_t (*copy_to_hmem_iov)(enum fi_hmem_iface iface, uint64_t device,
                      const struct iovec *hmem_iov, size_t hmem_iov_count,
                      uint64_t hmem_iov_offset, const void *src, size_t size);
              };

       All fields in struct fi_hmem_override_ops must be set (non-null) to a valid value.

       size   This should be set to sizeof(struct fi_hmem_override_ops). The size field is used for forward and backward compatibility purposes.

       copy_from_hmem_iov
              Copy data from the device/hmem to host memory. This function should return a negative fi_errno on error, or the number of bytes copied on success.

       copy_to_hmem_iov
              Copy data from host memory to the device/hmem. This function should return a negative fi_errno on error, or the number of bytes copied on success.
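
       Both callbacks come down to walking an iovec array after skipping hmem_iov_offset bytes. The sketch below covers only ordinary host memory (the FI_HMEM_SYSTEM case); the helper names host_copy_from_iov and host_copy_to_iov are hypothetical. An actual override installed with fi_set_ops would also have to dispatch on iface and device, use the matching device copy routine for other HMEM types, and return a negative fi_errno value on failure.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Copy up to `size` bytes out of a host-memory iovec array into `dest`,
 * starting `offset` bytes into the concatenated iovec data.
 * Returns the number of bytes copied (may be less than `size`). */
static ssize_t host_copy_from_iov(void *dest, size_t size,
                                  const struct iovec *iov, size_t iov_count,
                                  uint64_t offset)
{
    char *out = dest;
    size_t copied = 0;

    for (size_t i = 0; i < iov_count && copied < size; i++) {
        if (offset >= iov[i].iov_len) {   /* entry lies wholly before offset */
            offset -= iov[i].iov_len;
            continue;
        }
        size_t chunk = iov[i].iov_len - offset;
        if (chunk > size - copied)
            chunk = size - copied;
        memcpy(out + copied, (const char *)iov[i].iov_base + offset, chunk);
        copied += chunk;
        offset = 0;                       /* only the first entry is offset */
    }
    return (ssize_t)copied;
}

/* Mirror image: copy up to `size` bytes from `src` into the iovec array. */
static ssize_t host_copy_to_iov(const struct iovec *iov, size_t iov_count,
                                uint64_t offset, const void *src, size_t size)
{
    const char *in = src;
    size_t copied = 0;

    for (size_t i = 0; i < iov_count && copied < size; i++) {
        if (offset >= iov[i].iov_len) {
            offset -= iov[i].iov_len;
            continue;
        }
        size_t chunk = iov[i].iov_len - offset;
        if (chunk > size - copied)
            chunk = size - copied;
        memcpy((char *)iov[i].iov_base + offset, in + copied, chunk);
        copied += chunk;
        offset = 0;
    }
    return (ssize_t)copied;
}
```

       Returning the number of bytes actually copied, rather than assuming the full size was transferred, matches the return convention described for both callbacks.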

   fi_domain_bind
       Associates an event queue with the domain. An event queue bound to a domain will be the default EQ associated with asynchronous control events that occur on the domain or on active endpoints allocated on the domain. This includes CM events. Endpoints may direct their control events to alternate EQs by binding directly with the EQ.

       Binding an event queue to a domain with the FI_REG_MR flag indicates that the provider should perform all memory registration operations asynchronously, with the completion reported through the event queue. If an event queue is not bound to the domain with the FI_REG_MR flag, then memory registration requests complete synchronously.

       See fi_av_bind(3), fi_ep_bind(3), fi_mr_bind(3), fi_pep_bind(3), and fi_scalable_ep_bind(3) for more information.

   fi_close
       The fi_close call is used to release all resources associated with a domain or interface. All objects associated with the opened domain must be released prior to calling fi_close; otherwise, the call will return -FI_EBUSY.

DOMAIN ATTRIBUTES
       The fi_domain_attr structure defines the set of attributes associated with a domain.

              struct fi_domain_attr {
                  struct fid_domain *domain;
                  char *name;
                  enum fi_threading threading;
                  enum fi_progress control_progress;
                  enum fi_progress data_progress;
                  enum fi_resource_mgmt resource_mgmt;
                  enum fi_av_type av_type;
                  int mr_mode;
                  size_t mr_key_size;
                  size_t cq_data_size;
                  size_t cq_cnt;
                  size_t ep_cnt;
                  size_t tx_ctx_cnt;
                  size_t rx_ctx_cnt;
                  size_t max_ep_tx_ctx;
                  size_t max_ep_rx_ctx;
                  size_t max_ep_stx_ctx;
                  size_t max_ep_srx_ctx;
                  size_t cntr_cnt;
                  size_t mr_iov_limit;
                  uint64_t caps;
                  uint64_t mode;
                  uint8_t *auth_key;
                  size_t auth_key_size;
                  size_t max_err_data;
                  size_t mr_cnt;
                  uint32_t tclass;
              };

   domain
       On input to fi_getinfo, a user may set this to an opened domain instance to restrict output to the given domain. On output from fi_getinfo, if no domain was specified, but the user has an opened instance of the named domain, this will reference the first opened instance. If no instance has been opened, this field will be NULL.

       The domain instance returned by fi_getinfo should only be considered valid if the application does not close any domain instances from another thread while fi_getinfo is being processed.

   Name
       The name of the access domain.

   Multi-threading Support (threading)
       The threading model specifies the level of serialization required of an application when using the libfabric data transfer interfaces. Control interfaces are always considered thread safe and may be accessed by multiple threads. Applications that can guarantee serialization in their access of provider-allocated resources and interfaces enable a provider to eliminate lower-level locks.

       FI_THREAD_COMPLETION
              The completion threading model is intended for providers that make use of manual progress. Applications must serialize access to all objects that are associated through a shared completion structure. This includes endpoint, transmit context, receive context, completion queue, counter, wait set, and poll set objects.

              For example, threads must serialize access to an endpoint and its bound completion queue(s) and/or counters. Access to endpoints that share the same completion queue must also be serialized.

              The use of FI_THREAD_COMPLETION can increase parallelism over FI_THREAD_SAFE, but requires the use of isolated resources.

       FI_THREAD_DOMAIN
              A domain serialization model requires applications to serialize access to all objects belonging to a domain.

       FI_THREAD_ENDPOINT
              The endpoint threading model is similar to FI_THREAD_FID, but with the added restriction that serialization is required when accessing the same endpoint, even if multiple transmit and receive contexts are used. Conceptually, FI_THREAD_ENDPOINT maps well to providers that implement fabric services in hardware but use a single command queue to access different data flows.

       FI_THREAD_FID
              A fabric descriptor (FID) serialization model requires applications to serialize access to individual fabric resources associated with data transfer operations and completions. Multiple threads must be serialized when accessing the same endpoint, transmit context, receive context, completion queue, counter, wait set, or poll set. Serialization is required only by threads accessing the same object.

              For example, one thread may be initiating a data transfer on an endpoint, while another thread reads from a completion queue associated with the endpoint.

              Serialization of endpoint access is only required when accessing the same endpoint data flow. Multiple threads may initiate transfers on different transmit contexts of the same endpoint without serializing, and no serialization is required between the submission of data transmit requests and data receive operations.

              In general, FI_THREAD_FID allows the provider to be implemented without needing internal locking when handling data transfers. Conceptually, FI_THREAD_FID maps well to providers that implement fabric services in hardware and provide separate command queues to different data flows.

       FI_THREAD_SAFE
              A thread safe serialization model allows a multi-threaded application to access any allocated resources through any interface without restriction. All providers are required to support FI_THREAD_SAFE.

       FI_THREAD_UNSPEC
              This value indicates that no threading model has been defined. It may be used on input hints to the fi_getinfo call. When specified, providers will return a threading model that allows for the greatest level of parallelism.
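
       As an illustration of FI_THREAD_DOMAIN, an application that shares one domain among several threads could funnel every data-path call through a single per-domain lock. The sketch below is hypothetical: struct app_domain, app_post, and app_worker are illustrative names, and the counter stands in for the fi_send(3) or fi_cq_read(3) calls a real program would make while holding the lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper: with FI_THREAD_DOMAIN, one lock guards every
 * object (endpoints, CQs, counters) opened on the domain. */
struct app_domain {
    pthread_mutex_t lock;
    unsigned long posted;   /* stands in for state touched by data calls */
};

static void app_domain_init(struct app_domain *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->posted = 0;
}

/* Every data-path call takes the domain lock. A real program would call
 * fi_send(), fi_cq_read(), etc. here; this sketch only bumps a counter. */
static void app_post(struct app_domain *d)
{
    pthread_mutex_lock(&d->lock);
    d->posted++;
    pthread_mutex_unlock(&d->lock);
}

/* Thread body: issue many "data-path calls" against the shared domain. */
static void *app_worker(void *arg)
{
    struct app_domain *d = arg;
    for (int i = 0; i < 10000; i++)
        app_post(d);
    return NULL;
}
```

       Under FI_THREAD_FID the same program could instead keep one lock per endpoint and one per CQ, so that a thread posting sends and a thread reading completions no longer contend for the same lock.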

   Progress Models (control_progress / data_progress)
       Progress is the ability of the underlying implementation to complete processing of an asynchronous request. In many cases, the processing of an asynchronous request requires the use of the host processor. For example, a received message may need to be matched with the correct buffer, or a timed-out request may need to be retransmitted. For performance reasons, it may be undesirable for the provider to allocate a thread for this purpose, since that thread would compete with the application threads.

       Control progress indicates the method that the provider uses to make progress on asynchronous control operations. Control operations are functions which do not directly involve the transfer of application data between endpoints. They include address vector, memory registration, and connection management routines.

       Data progress indicates the method that the provider uses to make progress on data transfer operations. This includes message queue, RMA, tagged messaging, and atomic operations, along with their completion processing.

       Progress frequently requires action to be taken at both the transmitting and receiving sides of an operation. This is often a requirement for reliable transfers, as a result of retry and acknowledgement processing.

       To balance between performance and ease of use, two progress models are defined.

       FI_PROGRESS_AUTO
              This progress model indicates that the provider will make forward progress on an asynchronous operation without further intervention by the application. When FI_PROGRESS_AUTO is provided as output to fi_getinfo in the absence of any progress hints, it often indicates that the desired functionality is implemented by the provider hardware or is a standard service of the operating system.

              It is recommended that providers support FI_PROGRESS_AUTO. However, if a provider does not natively support automatic progress, forcing the use of FI_PROGRESS_AUTO may result in threads being allocated below the fabric interfaces.

              Note that prior versions of the library required providers to support FI_PROGRESS_AUTO. However, in some cases progress threads cannot be blocked when communication is idle, which results in threads spinning in progress functions. As a result, those providers only supported FI_PROGRESS_MANUAL.

       FI_PROGRESS_MANUAL
              This progress model indicates that the provider requires the use of an application thread to complete an asynchronous request. When manual progress is set, the provider will attempt to advance an asynchronous operation when the application attempts to wait on or read an event queue, completion queue, or counter where the completed operation will be reported. Progress also occurs when the application processes a poll or wait set that has been associated with the event or completion queue.

              Only wait operations defined by the fabric interface will result in an operation progressing. Operating system or external wait functions, such as select, poll, or pthread routines, cannot.

              Manual progress requirements apply not only to endpoints that initiate transmit operations, but also to endpoints that may be the target of such operations. This holds true even if the target endpoint will not generate completion events for the operations. For example, an endpoint that uses manual progress and acts purely as the target of RMA or atomic operations may still need application assistance to process received operations.

       FI_PROGRESS_UNSPEC
              This value indicates that no progress model has been defined. It may be used on input hints to the fi_getinfo call.

   Resource Management (resource_mgmt)
       Resource management (RM) is provider and protocol support to protect against overrunning local and remote resources. This includes local and remote transmit contexts, receive contexts, completion queues, and source and target data buffers.

       When enabled, applications are given some level of protection against overrunning provider queues and local and remote data buffers. Such support may be built directly into the hardware and/or network protocol, but may also require that checks be enabled in the provider software. By disabling resource management, an application assumes all responsibility for preventing queue and buffer overruns, but doing so may allow a provider to eliminate internal synchronization calls, such as atomic variables or locks.

       It should be noted that even if resource management is disabled, the provider implementation and protocol may still provide some level of protection against overruns. However, such protection is not guaranteed. The following values for resource management are defined.

       FI_RM_DISABLED
              The provider is free to select an implementation and protocol that does not protect against resource overruns. The application is responsible for resource protection.

       FI_RM_ENABLED
              Resource management is enabled for this provider domain.

       FI_RM_UNSPEC
              This value indicates that no resource management model has been defined. It may be used on input hints to the fi_getinfo call.

       The behavior of the various resource management options depends on whether the endpoint is reliable or unreliable, as well as provider- and protocol-specific implementation details, as shown in the following table. The table assumes that all peers enable or disable RM in the same way.

       Resource         DGRAM EP,         DGRAM EP,         RDM/MSG EP,       RDM/MSG EP,
                        no RM             with RM           no RM             with RM
       ────────────────────────────────────────────────────────────────────────────────────────
       Tx Ctx           undefined error   EAGAIN            undefined error   EAGAIN
       Rx Ctx           undefined error   EAGAIN            undefined error   EAGAIN
       Tx CQ            undefined error   EAGAIN            undefined error   EAGAIN
       Rx CQ            undefined error   EAGAIN            undefined error   EAGAIN
       Target EP        dropped           dropped           transmit error    retried
       No Rx Buffer     dropped           dropped           transmit error    retried
       Rx Buf Overrun   truncate or drop  truncate or drop  truncate or error truncate or error
       Unmatched RMA    not applicable    not applicable    transmit error    transmit error
       RMA Overrun      not applicable    not applicable    transmit error    transmit error

       The resource column indicates the resource being accessed by a data transfer operation.

       Tx Ctx / Rx Ctx
              Refers to the transmit/receive contexts when a data transfer operation is submitted. When RM is enabled, attempting to submit a request will fail with -FI_EAGAIN if the context is full. If RM is disabled, an undefined (provider-specific) error will occur. Such errors should be considered fatal to the context, and applications must take steps to avoid queue overruns.

       Tx CQ / Rx CQ
              Refers to the completion queue associated with the Tx or Rx context when a local operation completes. When RM is disabled, applications must take care to ensure that completion queues do not get overrun. When an overrun occurs, an undefined but fatal error will occur, affecting all endpoints associated with the CQ. Overruns can be avoided by sizing the CQs appropriately or by deferring the posting of a data transfer operation unless CQ space is available to store its completion. When RM is enabled, providers may use different mechanisms to prevent CQ overruns. This includes failing (returning -FI_EAGAIN) the posting of operations that could result in CQ overruns, or internally retrying requests (which will be hidden from the application). See the notes at the end of this section regarding CQ resource management restrictions.

       Target EP / No Rx Buffer
              Target EP refers to resources associated with the endpoint that is the target of a transmit operation. This includes the target endpoint’s receive queue, posted receive buffers (no Rx buffers), the receive side completion queue, and other related packet processing queues. The defined behavior is that seen by the initiator of a request. For FI_EP_DGRAM endpoints, if the target EP queues are unable to accept incoming messages, received messages will be dropped. For reliable endpoints, if RM is disabled, the transmit operation will complete in error. A provider may choose to return an error completion with the error code FI_ENORX for that transmit operation so that it can be retried. If RM is enabled, the provider will internally retry the operation.

       Rx Buffer Overrun
              This refers to buffers posted to receive incoming tagged or untagged messages, with the behavior defined from the viewpoint of the sender. The behavior for handling received messages that are larger than the buffers provided by the application is provider specific. Providers may either truncate the message and report a successful completion, or fail the operation. For datagram endpoints, failed sends will result in the message being dropped. For reliable endpoints, send operations may complete successfully, yet be truncated at the receive side. This can occur when the target side buffers received data until an application buffer is made available. The completion status may also depend upon the completion model selected by the application (e.g., FI_DELIVERY_COMPLETE versus FI_TRANSMIT_COMPLETE).

       Unmatched RMA / RMA Overrun
              Unmatched RMA and RMA overruns deal with the processing of RMA and atomic operations. Unlike send operations, RMA operations that attempt to access a memory address that is not registered for such operations, or that attempt to access memory outside of the target memory region, will fail, resulting in a transmit error.

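       For the Tx CQ / Rx CQ case with RM disabled, one way to defer posting until CQ space is available is to keep a credit count per CQ. The sketch below is hypothetical application bookkeeping (struct cq_credits, cq_reserve, and cq_release are illustrative names) and assumes every posted operation produces exactly one CQ entry, either a completion or an error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping: one credit per CQ entry. */
struct cq_credits {
    size_t cq_size;      /* number of entries the CQ was opened with */
    size_t outstanding;  /* posted operations not yet reaped from the CQ */
};

/* Reserve a CQ slot before posting. Returns false if the post should
 * be deferred because its completion might not fit in the CQ. */
static bool cq_reserve(struct cq_credits *c)
{
    if (c->outstanding >= c->cq_size)
        return false;
    c->outstanding++;
    return true;
}

/* Return n slots after n completions (or error entries) were read. */
static void cq_release(struct cq_credits *c, size_t n)
{
    c->outstanding = n > c->outstanding ? 0 : c->outstanding - n;
}
```

       An application would call cq_reserve before each fi_send or fi_recv, retry later if it returns false, and call cq_release with the count returned by fi_cq_read.
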
       When a resource management error occurs on an endpoint, the endpoint is transitioned into a disabled state. Any operations which have not already completed will fail and be discarded. For connectionless endpoints, the endpoint must be re-enabled before it will accept new data transfer operations. For connected endpoints, the connection is torn down and must be re-established.

       There is one notable restriction on the protections offered by resource management. This occurs when resource management is enabled on an endpoint that has been bound to completion queue(s) using the FI_SELECTIVE_COMPLETION flag. Operations posted to such an endpoint may specify that a successful completion should not generate an entry on the corresponding completion queue (i.e., the operation leaves the FI_COMPLETION flag unset). In such situations, the provider is not required to reserve an entry in the completion queue to handle the case where the operation fails and does generate a CQ entry, which would effectively require tracking the operation to completion. Applications concerned with avoiding CQ overruns when errors occur must ensure that there is sufficient space in the CQ to report failed operations. This can typically be achieved by sizing the CQ to at least the same size as the endpoint queue(s) that are bound to it.

   AV Type (av_type)
       Specifies the type of address vectors that are usable with this domain. For additional details on AV type, see fi_av(3). The following values may be specified.

       FI_AV_MAP
              Only address vectors of type AV map are requested or supported.

       FI_AV_TABLE
              Only address vectors of type AV table are requested or supported.

       FI_AV_UNSPEC
              Any address vector format is requested and supported.

       Address vectors are only used by connectionless endpoints. Applications that require the use of a specific type of address vector should set the domain attribute av_type to the necessary value when calling fi_getinfo. The value FI_AV_UNSPEC may be used to indicate that the provider can support either address vector format. In this case, a provider may return FI_AV_UNSPEC to indicate that either format is supportable, or may return another AV type to indicate the optimal AV type supported by this domain.

   Memory Registration Mode (mr_mode)
       Defines memory registration specific mode bits used with this domain. Full details on MR mode options are available in fi_mr(3). The following values may be specified.

       FI_MR_ALLOCATED
              Indicates that memory registration occurs on allocated data buffers, and physical pages must back all virtual addresses being registered.

       FI_MR_COLLECTIVE
              Requires data buffers passed to collective operations to be explicitly registered for collective operations using the FI_COLLECTIVE flag.

       FI_MR_ENDPOINT
              Memory registration occurs at the endpoint level, rather than at the domain level.

       FI_MR_LOCAL
              The provider is optimized around having applications register memory for locally accessed data buffers. Data buffers used in send and receive operations and as the source buffer for RMA and atomic operations must be registered by the application for access domains opened with this capability.

       FI_MR_MMU_NOTIFY
              Indicates that the application is responsible for notifying the provider when the page tables referencing a registered memory region may have been updated.

       FI_MR_PROV_KEY
              Memory registration keys are selected and returned by the provider.

       FI_MR_RAW
              The provider requires additional setup as part of its memory registration process. This mode is required by providers that use a memory key that is larger than 64 bits.

       FI_MR_RMA_EVENT
              Indicates that the memory regions associated with completion counters must be explicitly enabled after being bound to any counter.

       FI_MR_UNSPEC
              Defined for compatibility – library versions 1.4 and earlier. Setting mr_mode to 0 indicates that FI_MR_BASIC or FI_MR_SCALABLE are requested and supported.

       FI_MR_VIRT_ADDR
              Registered memory regions are referenced by peers using the virtual address of the registered memory region, rather than a 0-based offset.

       FI_MR_BASIC
              Defined for compatibility – library versions 1.4 and earlier. Only basic memory registration operations are requested or supported. This mode is equivalent to the FI_MR_VIRT_ADDR, FI_MR_ALLOCATED, and FI_MR_PROV_KEY flags being set in later library versions. This flag may not be used in conjunction with other mr_mode bits.

       FI_MR_SCALABLE
              Defined for compatibility – library versions 1.4 and earlier. Only scalable memory registration operations are requested or supported. Scalable registration uses offset-based addressing, with application-selectable memory keys. For library versions 1.5 and later, this is the default if no mr_mode bits are set. This flag may not be used in conjunction with other mr_mode bits.

       Buffers used in data transfer operations may require notifying the provider of their use before a data transfer can occur. The mr_mode field indicates the type of memory registration that is required, and when registration is necessary. Applications that require the use of a specific registration mode should set the domain attribute mr_mode to the necessary value when calling fi_getinfo. The value FI_MR_UNSPEC may be used to indicate support for any registration mode.

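       The FI_MR_VIRT_ADDR bit changes how an initiator forms the remote address argument of RMA and atomic calls. A minimal sketch with a hypothetical helper, rma_target_addr; in a real program virt_addr_mode would be derived from info->domain_attr->mr_mode & FI_MR_VIRT_ADDR, and remote_base would be the peer's buffer address exchanged out of band together with the MR key (both assumptions).

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute the remote address passed to an RMA call for a target buffer
 * located `offset` bytes into a peer's registered region.
 *   virt_addr_mode: true if the domain's mr_mode includes FI_MR_VIRT_ADDR
 *   remote_base:    peer's virtual address of the region start */
static uint64_t rma_target_addr(bool virt_addr_mode, uint64_t remote_base,
                                uint64_t offset)
{
    /* FI_MR_VIRT_ADDR: peers address memory by its virtual address.
     * Otherwise: addresses are 0-based offsets into the region. */
    return virt_addr_mode ? remote_base + offset : offset;
}
```
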
   MR Key Size (mr_key_size)
       Size of the memory region remote access key, in bytes. Applications that request their own MR key must select a value within the range specified by this value. Key sizes larger than 8 bytes require using the FI_MR_RAW mode bit.

   CQ Data Size (cq_data_size)
       Applications may include a small message with a data transfer that is placed directly into a remote completion queue as part of a completion event. This is referred to as remote CQ data (sometimes referred to as immediate data). This field indicates the number of bytes that the provider supports for remote CQ data. If supported (a non-zero value is returned), the size of remote CQ data must be at least 4 bytes.
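
       Both mr_key_size and cq_data_size are byte widths, so checking an application-chosen MR key or remote CQ data value against them is a simple range test. A minimal sketch; fits_in_bytes is a hypothetical helper, and keys wider than 8 bytes (which require FI_MR_RAW) are outside its scope.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if `value` can be represented in `nbytes` bytes, e.g.
 * remote CQ data against cq_data_size, or an application-selected MR
 * key against mr_key_size. A width of 0 means "not supported". */
static bool fits_in_bytes(uint64_t value, size_t nbytes)
{
    if (nbytes == 0)
        return false;
    if (nbytes >= sizeof(uint64_t))
        return true;
    return value < ((uint64_t)1 << (8 * nbytes));
}
```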

   Completion Queue Count (cq_cnt)
       The optimal number of completion queues supported by the domain, relative to any specified or default CQ attributes. The cq_cnt value may be a fixed value of the maximum number of CQs supported by the underlying hardware, or may be a dynamic value, based on the default attributes of an allocated CQ, such as the CQ size and data format.

   Endpoint Count (ep_cnt)
       The total number of endpoints supported by the domain, relative to any specified or default endpoint attributes. The ep_cnt value may be a fixed value of the maximum number of endpoints supported by the underlying hardware, or may be a dynamic value, based on the default attributes of an allocated endpoint, such as the endpoint capabilities and size. The endpoint count is the number of addressable endpoints supported by the provider. Providers return capability limits based on configured hardware maximum capabilities. Providers cannot predict all possible system limitations without a posteriori knowledge acquired during runtime that will further limit these hardware maximums (e.g., application memory consumption, FD usage, etc.).

   Transmit Context Count (tx_ctx_cnt)
       The number of outbound command queues optimally supported by the provider. For a low-level provider, this represents the number of command queues to the hardware and/or the number of parallel transmit engines effectively supported by the hardware and caches. Applications which allocate more transmit contexts than this value will end up sharing underlying resources. By default, there is a single transmit context associated with each endpoint, but in an advanced usage model, an endpoint may be configured with multiple transmit contexts.

   Receive Context Count (rx_ctx_cnt)
       The number of inbound processing queues optimally supported by the provider. For a low-level provider, this represents the number of hardware queues that can be effectively utilized for processing incoming packets. Applications which allocate more receive contexts than this value will end up sharing underlying resources. By default, a single receive context is associated with each endpoint, but in an advanced usage model, an endpoint may be configured with multiple receive contexts.

   Maximum Endpoint Transmit Context (max_ep_tx_ctx)
       The maximum number of transmit contexts that may be associated with an endpoint.

   Maximum Endpoint Receive Context (max_ep_rx_ctx)
       The maximum number of receive contexts that may be associated with an endpoint.

   Maximum Sharing of Transmit Context (max_ep_stx_ctx)
       The maximum number of endpoints that may be associated with a shared transmit context.

   Maximum Sharing of Receive Context (max_ep_srx_ctx)
       The maximum number of endpoints that may be associated with a shared receive context.
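
       An application that wants one transmit or receive context per worker thread might clamp its request against these attributes. The helper below is a hypothetical policy sketch: it treats max_ep_tx_ctx/max_ep_rx_ctx as a hard per-endpoint limit and tx_ctx_cnt/rx_ctx_cnt as the point past which contexts start sharing hardware queues, and it ignores the fact that other endpoints in the same domain draw on the same tx_ctx_cnt/rx_ctx_cnt budget.

```c
#include <stddef.h>

/* Hypothetical policy helper: choose how many transmit (or receive)
 * contexts to request for one endpoint.
 *   wanted     - contexts the application would like (e.g. one per thread)
 *   optimal    - domain_attr->tx_ctx_cnt (or rx_ctx_cnt)
 *   max_per_ep - domain_attr->max_ep_tx_ctx (or max_ep_rx_ctx) */
static size_t pick_ctx_count(size_t wanted, size_t optimal, size_t max_per_ep)
{
    size_t n = wanted;
    if (n > max_per_ep)
        n = max_per_ep;    /* hard per-endpoint limit */
    if (n > optimal)
        n = optimal;       /* beyond this, contexts share hardware queues */
    return n > 0 ? n : 1;  /* an endpoint has one context by default */
}
```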

   Counter Count (cntr_cnt)
       The optimal number of completion counters supported by the domain. The cntr_cnt value may be a fixed value of the maximum number of counters supported by the underlying hardware, or may be a dynamic value, based on the default attributes of the domain.

   MR IOV Limit (mr_iov_limit)
       This is the maximum number of IO vectors (scatter-gather elements) that a single memory registration operation may reference.

   Capabilities (caps)
       Domain level capabilities. Domain capabilities indicate domain level features that are supported by the provider.

       FI_LOCAL_COMM
              At a conceptual level, this field indicates that the underlying device supports loopback communication. More specifically, this field indicates that an endpoint may communicate with other endpoints that are allocated from the same underlying named domain. If this field is not set, an application may need to use an alternate domain or mechanism (e.g., shared memory) to communicate with peers that execute on the same node.

       FI_REMOTE_COMM
              This field indicates that the underlying provider supports communication with nodes that are reachable over the network. If this field is not set, then the provider only supports communication between processes that execute on the same node (a shared memory provider, for example).

       FI_SHARED_AV
              Indicates that the domain supports the ability to share address vectors among multiple processes using the named address vector feature.

       See fi_getinfo(3) for a discussion on primary versus secondary capabilities. All domain capabilities are considered secondary capabilities.

   mode
       The operational mode bits related to using the domain.

       FI_RESTRICTED_COMP
              This bit indicates that the domain limits completion queues and counters to only be used with endpoints, transmit contexts, and receive contexts that have the same set of capability flags.

   Default authorization key (auth_key)
       The default authorization key to associate with endpoint and memory registrations created within the domain. This field is ignored unless the fabric is opened with API version 1.5 or greater.

   Default authorization key length (auth_key_size)
       The length in bytes of the default authorization key for the domain. If set to 0, then no authorization key will be associated with endpoints and memory registrations created within the domain unless specified in the endpoint or memory registration attributes. This field is ignored unless the fabric is opened with API version 1.5 or greater.

   Max Error Data Size (max_err_data)
       The maximum amount of error data, in bytes, that may be returned as part of a completion or event queue error. This value corresponds to the err_data_size field in struct fi_cq_err_entry and struct fi_eq_err_entry.

   Memory Regions Count (mr_cnt)
       The optimal number of memory regions supported by the domain, or endpoint if the mr_mode FI_MR_ENDPOINT bit has been set. The mr_cnt value may be a fixed value of the maximum number of MRs supported by the underlying hardware, or may be a dynamic value, based on the default attributes of the domain, such as the supported memory registration modes. Applications can set mr_cnt on input to fi_getinfo in order to indicate their memory registration requirements. Doing so may allow the provider to optimize any memory registration cache or lookup tables.

   Traffic Class (tclass)
       This specifies the default traffic class that will be associated with any endpoints created within the domain. See fi_endpoint(3) for additional information.

RETURN VALUE
       Returns 0 on success. On error, a negative value corresponding to fabric errno is returned. Fabric errno values are defined in rdma/fi_errno.h.

NOTES
       Users should call fi_close to release all resources allocated to the fabric domain.

       The following fabric resources are associated with domains: active endpoints, memory regions, completion event queues, and address vectors.

       Domain attributes reflect the limitations and capabilities of the underlying hardware and/or software provider. They do not reflect system limitations, such as the number of physical pages that an application may pin or the number of file descriptors that the application may open. As a result, the reported maximums may not be achievable, even on a lightly loaded system, without an administrator configuring system resources appropriately for the installed provider(s).
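
       Because these system limits are not reflected in the domain attributes, an application can query them directly. The following is a minimal sketch using getrlimit(2); soft_limit is a hypothetical helper. On Linux, RLIMIT_MEMLOCK often bounds how much memory can be pinned for registration, and RLIMIT_NOFILE bounds the number of open file descriptors. RLIMIT_MEMLOCK is not part of POSIX itself but is available on Linux and the BSDs.

```c
#include <sys/resource.h>

/* Return the soft limit for `resource` (RLIM_INFINITY if unlimited),
 * or 0 if the limit cannot be queried. */
static rlim_t soft_limit(int resource)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0)
        return 0;
    return rl.rlim_cur;
}
```

       Comparing, for example, soft_limit(RLIMIT_MEMLOCK) with the total size the application plans to register is one way to notice an unachievable configuration before memory registration fails.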

SEE ALSO
       fi_getinfo(3), fi_endpoint(3), fi_av(3), fi_eq(3), fi_mr(3), fi_peer(3)

AUTHORS
       OpenFabrics.

Libfabric Programmer’s Manual               2022-12-09                        fi_domain(3)