fi_domain(3)                  Libfabric v1.15.1                  fi_domain(3)

NAME
fi_domain - Open a fabric access domain
7
SYNOPSIS
    #include <rdma/fabric.h>
    #include <rdma/fi_domain.h>

    int fi_domain(struct fid_fabric *fabric, struct fi_info *info,
        struct fid_domain **domain, void *context);

    int fi_close(struct fid *domain);

    int fi_domain_bind(struct fid_domain *domain, struct fid *eq,
        uint64_t flags);

    int fi_open_ops(struct fid *domain, const char *name, uint64_t flags,
        void **ops, void *context);

    int fi_set_ops(struct fid *domain, const char *name, uint64_t flags,
        void *ops, void *context);
26
ARGUMENTS
fabric Opened fabric instance on which the domain is created.
29
30 info Fabric information, including domain capabilities and at‐
31 tributes.
32
33 domain An opened access domain.
34
35 context
36 User specified context associated with the domain. This context
37 is returned as part of any asynchronous event associated with
38 the domain.
39
40 eq Event queue for asynchronous operations initiated on the domain.
41
42 name Name associated with an interface.
43
44 ops Fabric interface operations.
45
DESCRIPTION
An access domain typically refers to a physical or virtual NIC or
hardware port; however, a domain may span multiple hardware components
for fail-over or data striping purposes. A domain defines the boundary
for associating different resources together. Fabric objects that
belong to the same domain may share underlying resources.
52
53 fi_domain
54 Opens a fabric access domain, also referred to as a resource domain.
55 Fabric domains are identified by a name. The properties of the opened
56 domain are specified using the info parameter.
57
58 fi_open_ops
59 fi_open_ops is used to open provider specific interfaces. Provider in‐
60 terfaces may be used to access low-level resources and operations that
61 are specific to the opened resource domain. The details of domain in‐
62 terfaces are outside the scope of this documentation.
63
64 fi_set_ops
fi_set_ops assigns callbacks that a provider should invoke in place of
performing selected tasks. This allows users to modify or control a
provider’s default behavior. Conceptually, it allows the user to hook
specific functions used by a provider and replace them with their own.
69
70 The operations being modified are identified using a well-known charac‐
71 ter string, passed as the name parameter. The format of the ops param‐
72 eter is dependent upon the name value. The ops parameter will refer‐
73 ence a structure containing the callbacks and other fields needed by
74 the provider to invoke the user’s functions.
75
76 If a provider accepts the override, it will return FI_SUCCESS. If the
77 override is unknown or not supported, the provider will return
78 -FI_ENOSYS. Overrides should be set prior to allocating resources on
79 the domain.
80
81 The following fi_set_ops operations and corresponding callback struc‐
82 tures are defined.
83
84 FI_SET_OPS_HMEM_OVERRIDE – Heterogeneous Memory Overrides
85
86 HMEM override allows users to override HMEM related operations a
87 provider may perform. Currently, the scope of the HMEM override is to
88 allow a user to define the memory movement functions a provider should
89 use when accessing a user buffer. The user-defined memory movement
90 functions need to account for all the different HMEM iface types a
91 provider may encounter.
92
93 All objects allocated against a domain will inherit this override.
94
95 The following is the HMEM override operation name and structure.
96
    #define FI_SET_OPS_HMEM_OVERRIDE "hmem_override_ops"

    struct fi_hmem_override_ops {
        size_t size;

        ssize_t (*copy_from_hmem_iov)(void *dest, size_t size,
            enum fi_hmem_iface iface, uint64_t device,
            const struct iovec *hmem_iov, size_t hmem_iov_count,
            uint64_t hmem_iov_offset);

        ssize_t (*copy_to_hmem_iov)(enum fi_hmem_iface iface,
            uint64_t device, const struct iovec *hmem_iov,
            size_t hmem_iov_count, uint64_t hmem_iov_offset,
            const void *src, size_t size);
    };
110
111 All fields in struct fi_hmem_override_ops must be set (non-null) to a
112 valid value.
113
114 size This should be set to the sizeof(struct fi_hmem_override_ops).
115 The size field is used for forward and backward compatibility
116 purposes.
117
118 copy_from_hmem_iov
119 Copy data from the device/hmem to host memory. This function
120 should return a negative fi_errno on error, or the number of
121 bytes copied on success.
122
123 copy_to_hmem_iov
124 Copy data from host memory to the device/hmem. This function
125 should return a negative fi_errno on error, or the number of
126 bytes copied on success.
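
As an illustration of the copy semantics described above, the following is a minimal sketch of the gather logic that a copy_from_hmem_iov callback could use when the buffers are ordinary host memory. The helper name host_iov_gather is hypothetical, and treating hmem_iov_offset as a byte offset into the concatenated iovec is an assumption made for this example. The iface and device arguments are left out so that the block compiles without libfabric headers; a real override must also handle every device iface the provider may pass.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper (not a libfabric function): gather up to `size`
 * bytes from the iovec array into `dest`, starting `offset` bytes into
 * the concatenated iovec. Returns the number of bytes copied. */
static ssize_t host_iov_gather(void *dest, size_t size,
                               const struct iovec *iov, size_t iov_count,
                               uint64_t offset)
{
    char *out = dest;
    size_t copied = 0;

    for (size_t i = 0; i < iov_count && copied < size; i++) {
        size_t len = iov[i].iov_len;

        /* Skip segments that lie entirely before the starting offset. */
        if (offset >= len) {
            offset -= len;
            continue;
        }

        size_t avail = len - (size_t)offset;
        size_t n = avail < size - copied ? avail : size - copied;

        memcpy(out + copied, (const char *)iov[i].iov_base + offset, n);
        copied += n;
        offset = 0; /* later segments are read from their start */
    }
    return (ssize_t)copied;
}
```

A copy_to_hmem_iov counterpart for host memory would walk the iovec the same way, with the memcpy direction reversed.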
127
128 fi_domain_bind
129 Associates an event queue with the domain. An event queue bound to a
130 domain will be the default EQ associated with asynchronous control
131 events that occur on the domain or active endpoints allocated on a do‐
132 main. This includes CM events. Endpoints may direct their control
133 events to alternate EQs by binding directly with the EQ.
134
135 Binding an event queue to a domain with the FI_REG_MR flag indicates
136 that the provider should perform all memory registration operations
137 asynchronously, with the completion reported through the event queue.
138 If an event queue is not bound to the domain with the FI_REG_MR flag,
139 then memory registration requests complete synchronously.
140
141 See fi_av_bind(3), fi_ep_bind(3), fi_mr_bind(3), fi_pep_bind(3), and
142 fi_scalable_ep_bind(3) for more information.
143
144 fi_close
145 The fi_close call is used to release all resources associated with a
146 domain or interface. All objects associated with the opened domain
147 must be released prior to calling fi_close, otherwise the call will re‐
148 turn -FI_EBUSY.
149
DOMAIN ATTRIBUTES
The fi_domain_attr structure defines the set of attributes associated
with a domain.
153
    struct fi_domain_attr {
        struct fid_domain     *domain;
        char                  *name;
        enum fi_threading     threading;
        enum fi_progress      control_progress;
        enum fi_progress      data_progress;
        enum fi_resource_mgmt resource_mgmt;
        enum fi_av_type       av_type;
        int                   mr_mode;
        size_t                mr_key_size;
        size_t                cq_data_size;
        size_t                cq_cnt;
        size_t                ep_cnt;
        size_t                tx_ctx_cnt;
        size_t                rx_ctx_cnt;
        size_t                max_ep_tx_ctx;
        size_t                max_ep_rx_ctx;
        size_t                max_ep_stx_ctx;
        size_t                max_ep_srx_ctx;
        size_t                cntr_cnt;
        size_t                mr_iov_limit;
        uint64_t              caps;
        uint64_t              mode;
        uint8_t               *auth_key;
        size_t                auth_key_size;
        size_t                max_err_data;
        size_t                mr_cnt;
        uint32_t              tclass;
    };
183
184 domain
185 On input to fi_getinfo, a user may set this to an opened domain in‐
186 stance to restrict output to the given domain. On output from fi_get‐
187 info, if no domain was specified, but the user has an opened instance
188 of the named domain, this will reference the first opened instance. If
189 no instance has been opened, this field will be NULL.
190
191 The domain instance returned by fi_getinfo should only be considered
192 valid if the application does not close any domain instances from an‐
193 other thread while fi_getinfo is being processed.
194
195 Name
196 The name of the access domain.
197
198 Multi-threading Support (threading)
The threading model specifies the level of serialization required of an
application when using the libfabric data transfer interfaces. Control
interfaces are always considered thread safe and may be accessed by
multiple threads. An application that can guarantee serialization in
its access to provider-allocated resources and interfaces enables the
provider to eliminate lower-level locks.
205
206 FI_THREAD_COMPLETION
The completion threading model is intended for providers that
make use of manual progress. Applications must serialize access
to all objects that are associated through a shared completion
structure. This includes endpoint, transmit context, receive
context, completion queue, counter, wait set, and poll set
objects.
213
214 For example, threads must serialize access to an endpoint and its bound
215 completion queue(s) and/or counters. Access to endpoints that share
216 the same completion queue must also be serialized.
217
218 The use of FI_THREAD_COMPLETION can increase parallelism over
219 FI_THREAD_SAFE, but requires the use of isolated resources.
220
221 FI_THREAD_DOMAIN
222 A domain serialization model requires applications to serialize
223 access to all objects belonging to a domain.
224
225 FI_THREAD_ENDPOINT
226 The endpoint threading model is similar to FI_THREAD_FID, but
227 with the added restriction that serialization is required when
228 accessing the same endpoint, even if multiple transmit and re‐
229 ceive contexts are used. Conceptually, FI_THREAD_ENDPOINT maps
230 well to providers that implement fabric services in hardware but
231 use a single command queue to access different data flows.
232
233 FI_THREAD_FID
234 A fabric descriptor (FID) serialization model requires applica‐
235 tions to serialize access to individual fabric resources associ‐
236 ated with data transfer operations and completions. Multiple
237 threads must be serialized when accessing the same endpoint,
238 transmit context, receive context, completion queue, counter,
239 wait set, or poll set. Serialization is required only by
240 threads accessing the same object.
241
242 For example, one thread may be initiating a data transfer on an end‐
243 point, while another thread reads from a completion queue associated
244 with the endpoint.
245
246 Serialization to endpoint access is only required when accessing the
247 same endpoint data flow. Multiple threads may initiate transfers on
248 different transmit contexts of the same endpoint without serializing,
249 and no serialization is required between the submission of data trans‐
250 mit requests and data receive operations.
251
252 In general, FI_THREAD_FID allows the provider to be implemented without
253 needing internal locking when handling data transfers. Conceptually,
254 FI_THREAD_FID maps well to providers that implement fabric services in
255 hardware and provide separate command queues to different data flows.
256
257 FI_THREAD_SAFE
258 A thread safe serialization model allows a multi-threaded appli‐
259 cation to access any allocated resources through any interface
260 without restriction. All providers are required to support
261 FI_THREAD_SAFE.
262
263 FI_THREAD_UNSPEC
264 This value indicates that no threading model has been defined.
265 It may be used on input hints to the fi_getinfo call. When
266 specified, providers will return a threading model that allows
267 for the greatest level of parallelism.
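
One way an application might satisfy FI_THREAD_DOMAIN is to funnel every call that touches objects of a domain through a single lock. The sketch below is illustrative only: struct domain_guard, domain_guard_call, and count_op are hypothetical application-side names, not libfabric interfaces, and the callback stands in for whatever libfabric calls the application makes on the domain's endpoints, completion queues, or counters.

```c
#include <pthread.h>

/* Hypothetical application-side guard: one lock per opened domain. */
struct domain_guard {
    pthread_mutex_t lock;
};

#define DOMAIN_GUARD_INIT { PTHREAD_MUTEX_INITIALIZER }

/* Run op(arg) while holding the domain lock, so that no two threads
 * access objects of the same domain concurrently. Returns op's result. */
static int domain_guard_call(struct domain_guard *guard,
                             int (*op)(void *arg), void *arg)
{
    int ret;

    pthread_mutex_lock(&guard->lock);
    ret = op(arg);
    pthread_mutex_unlock(&guard->lock);
    return ret;
}

/* Placeholder operation: counts how often it ran. A real application
 * would post a transfer or read a completion queue here. */
static int count_op(void *arg)
{
    ++*(int *)arg;
    return 0;
}
```

Under FI_THREAD_COMPLETION or FI_THREAD_FID the same pattern could be applied with one guard per shared completion structure or per object, trading more locks for more parallelism.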
268
269 Progress Models (control_progress / data_progress)
270 Progress is the ability of the underlying implementation to complete
271 processing of an asynchronous request. In many cases, the processing
272 of an asynchronous request requires the use of the host processor. For
273 example, a received message may need to be matched with the correct
274 buffer, or a timed out request may need to be retransmitted. For per‐
275 formance reasons, it may be undesirable for the provider to allocate a
276 thread for this purpose, which will compete with the application
277 threads.
278
279 Control progress indicates the method that the provider uses to make
280 progress on asynchronous control operations. Control operations are
281 functions which do not directly involve the transfer of application da‐
282 ta between endpoints. They include address vector, memory registra‐
283 tion, and connection management routines.
284
285 Data progress indicates the method that the provider uses to make
286 progress on data transfer operations. This includes message queue,
287 RMA, tagged messaging, and atomic operations, along with their comple‐
288 tion processing.
289
290 Progress frequently requires action being taken at both the transmit‐
291 ting and receiving sides of an operation. This is often a requirement
292 for reliable transfers, as a result of retry and acknowledgement pro‐
293 cessing.
294
295 To balance between performance and ease of use, two progress models are
296 defined.
297
298 FI_PROGRESS_AUTO
299 This progress model indicates that the provider will make for‐
300 ward progress on an asynchronous operation without further in‐
301 tervention by the application. When FI_PROGRESS_AUTO is provid‐
302 ed as output to fi_getinfo in the absence of any progress hints,
303 it often indicates that the desired functionality is implemented
304 by the provider hardware or is a standard service of the operat‐
305 ing system.
306
307 It is recommended that providers support FI_PROGRESS_AUTO. However, if
308 a provider does not natively support automatic progress, forcing the
309 use of FI_PROGRESS_AUTO may result in threads being allocated below the
310 fabric interfaces.
311
312 Note that prior versions of the library required providers to support
313 FI_PROGRESS_AUTO. However, in some cases progress threads cannot be
314 blocked when communication is idle, which results in threads spinning
315 in progress functions. As a result, those providers only supported
316 FI_PROGRESS_MANUAL.
317
318 FI_PROGRESS_MANUAL
319 This progress model indicates that the provider requires the use
320 of an application thread to complete an asynchronous request.
321 When manual progress is set, the provider will attempt to ad‐
322 vance an asynchronous operation forward when the application at‐
323 tempts to wait on or read an event queue, completion queue, or
324 counter where the completed operation will be reported.
325 Progress also occurs when the application processes a poll or
326 wait set that has been associated with the event or completion
327 queue.
328
Only wait operations defined by the fabric interface will result in an
operation progressing. Operating system or external wait functions,
such as select, poll, or pthread routines, do not drive progress.
332
333 Manual progress requirements not only apply to endpoints that initiate
334 transmit operations, but also to endpoints that may be the target of
335 such operations. This holds true even if the target endpoint will not
336 generate completion events for the operations. For example, an end‐
337 point that acts purely as the target of RMA or atomic operations that
338 uses manual progress may still need application assistance to process
339 received operations.
340
341 FI_PROGRESS_UNSPEC
342 This value indicates that no progress model has been defined.
343 It may be used on input hints to the fi_getinfo call.
344
345 Resource Management (resource_mgmt)
346 Resource management (RM) is provider and protocol support to protect
347 against overrunning local and remote resources. This includes local
348 and remote transmit contexts, receive contexts, completion queues, and
349 source and target data buffers.
350
351 When enabled, applications are given some level of protection against
352 overrunning provider queues and local and remote data buffers. Such
353 support may be built directly into the hardware and/or network proto‐
354 col, but may also require that checks be enabled in the provider soft‐
355 ware. By disabling resource management, an application assumes all re‐
356 sponsibility for preventing queue and buffer overruns, but doing so may
357 allow a provider to eliminate internal synchronization calls, such as
358 atomic variables or locks.
359
360 It should be noted that even if resource management is disabled, the
361 provider implementation and protocol may still provide some level of
362 protection against overruns. However, such protection is not guaran‐
363 teed. The following values for resource management are defined.
364
365 FI_RM_DISABLED
366 The provider is free to select an implementation and protocol
367 that does not protect against resource overruns. The applica‐
368 tion is responsible for resource protection.
369
370 FI_RM_ENABLED
371 Resource management is enabled for this provider domain.
372
373 FI_RM_UNSPEC
374 This value indicates that no resource management model has been
375 defined. It may be used on input hints to the fi_getinfo call.
376
377 The behavior of the various resource management options depends on
378 whether the endpoint is reliable or unreliable, as well as provider and
379 protocol specific implementation details, as shown in the following ta‐
380 ble. The table assumes that all peers enable or disable RM the same.
381
Resource        DGRAM EP-no RM    DGRAM EP-with RM  RDM/MSG EP-no RM   RDM/MSG EP-with RM
──────────────────────────────────────────────────────────────────────────────────────────
Tx Ctx          undefined error   EAGAIN            undefined error    EAGAIN
Rx Ctx          undefined error   EAGAIN            undefined error    EAGAIN
Tx CQ           undefined error   EAGAIN            undefined error    EAGAIN
Rx CQ           undefined error   EAGAIN            undefined error    EAGAIN
Target EP       dropped           dropped           transmit error     retried
No Rx Buffer    dropped           dropped           transmit error     retried
Rx Buf Overrun  truncate or drop  truncate or drop  truncate or error  truncate or error
Unmatched RMA   not applicable    not applicable    transmit error     transmit error
RMA Overrun     not applicable    not applicable    transmit error     transmit error
400
401 The resource column indicates the resource being accessed by a data
402 transfer operation.
403
404 Tx Ctx / Rx Ctx
405 Refers to the transmit/receive contexts when a data transfer op‐
406 eration is submitted. When RM is enabled, attempting to submit
407 a request will fail if the context is full. If RM is disabled,
408 an undefined error (provider specific) will occur. Such errors
409 should be considered fatal to the context, and applications must
410 take steps to avoid queue overruns.
411
412 Tx CQ / Rx CQ
413 Refers to the completion queue associated with the Tx or Rx con‐
414 text when a local operation completes. When RM is disabled, ap‐
415 plications must take care to ensure that completion queues do
416 not get overrun. When an overrun occurs, an undefined, but fa‐
417 tal, error will occur affecting all endpoints associated with
418 the CQ. Overruns can be avoided by sizing the CQs appropriately
419 or by deferring the posting of a data transfer operation unless
420 CQ space is available to store its completion. When RM is en‐
421 abled, providers may use different mechanisms to prevent CQ
422 overruns. This includes failing (returning -FI_EAGAIN) the
423 posting of operations that could result in CQ overruns, or in‐
424 ternally retrying requests (which will be hidden from the appli‐
425 cation). See notes at the end of this section regarding CQ re‐
426 source management restrictions.
427
428 Target EP / No Rx Buffer
429 Target EP refers to resources associated with the endpoint that
430 is the target of a transmit operation. This includes the target
431 endpoint’s receive queue, posted receive buffers (no Rx buf‐
432 fers), the receive side completion queue, and other related
433 packet processing queues. The defined behavior is that seen by
434 the initiator of a request. For FI_EP_DGRAM endpoints, if the
435 target EP queues are unable to accept incoming messages, re‐
436 ceived messages will be dropped. For reliable endpoints, if RM
437 is disabled, the transmit operation will complete in error. A
438 provider may choose to return an error completion with the error
439 code FI_ENORX for that transmit operation so that it can be re‐
440 tried. If RM is enabled, the provider will internally retry the
441 operation.
442
443 Rx Buffer Overrun
444 This refers to buffers posted to receive incoming tagged or un‐
445 tagged messages, with the behavior defined from the viewpoint of
446 the sender. The behavior for handling received messages that
447 are larger than the buffers provided by the application is
448 provider specific. Providers may either truncate the message
449 and report a successful completion, or fail the operation. For
450 datagram endpoints, failed sends will result in the message be‐
451 ing dropped. For reliable endpoints, send operations may com‐
452 plete successfully, yet be truncated at the receive side. This
453 can occur when the target side buffers received data until an
454 application buffer is made available. The completion status may
also depend on the completion model selected by the application
(e.g. FI_DELIVERY_COMPLETE versus FI_TRANSMIT_COMPLETE).
458
459 Unmatched RMA / RMA Overrun
460 Unmatched RMA and RMA overruns deal with the processing of RMA
461 and atomic operations. Unlike send operations, RMA operations
462 that attempt to access a memory address that is either not reg‐
463 istered for such operations, or attempt to access outside of the
464 target memory region will fail, resulting in a transmit error.
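
With RM enabled, a full transmit or receive context, or a post that could overrun a CQ, is reported by failing the post with -FI_EAGAIN. A common application response is to drive progress (for example, by reading the CQ) and then retry. The sketch below shows that pattern with hypothetical callbacks in place of real libfabric calls: post_with_retry, post_fn, progress_fn, and the toy_* helpers are names invented for this example, and POSIX EAGAIN is used on the assumption that FI_EAGAIN has the same value.

```c
#include <errno.h>

typedef int (*post_fn)(void *ctx);      /* e.g. wraps a send or RMA post */
typedef void (*progress_fn)(void *ctx); /* e.g. wraps a CQ read */

/* Retry post while it reports -EAGAIN, calling progress between
 * attempts so queued work can drain. Gives up after max_tries attempts
 * and returns the last result (0 on success, negative errno on failure). */
static int post_with_retry(post_fn post, progress_fn progress,
                           void *ctx, int max_tries)
{
    int ret = -EAGAIN;

    for (int i = 0; i < max_tries && ret == -EAGAIN; i++) {
        ret = post(ctx);
        if (ret == -EAGAIN)
            progress(ctx);
    }
    return ret;
}

/* Toy stand-ins used only to exercise the helper: the "queue" has
 * `busy` occupied slots, and each progress call frees one. */
struct toy_queue {
    int busy;
};

static int toy_post(void *ctx)
{
    struct toy_queue *q = ctx;
    return q->busy > 0 ? -EAGAIN : 0;
}

static void toy_progress(void *ctx)
{
    struct toy_queue *q = ctx;
    if (q->busy > 0)
        q->busy--;
}
```

Bounding the number of attempts keeps a stalled peer from turning the retry into an unbounded spin; what to do once the limit is reached is an application decision.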
465
466 When a resource management error occurs on an endpoint, the endpoint is
467 transitioned into a disabled state. Any operations which have not al‐
468 ready completed will fail and be discarded. For connectionless end‐
469 points, the endpoint must be re-enabled before it will accept new data
470 transfer operations. For connected endpoints, the connection is torn
471 down and must be re-established.
472
There is one notable restriction on the protections offered by resource
management. It applies when resource management is enabled on an
endpoint that has been bound to completion queue(s) using the
FI_SELECTIVE_COMPLETION flag. Operations posted to such an endpoint may
specify that a successful completion should not generate an entry on
the corresponding completion queue (that is, the operation leaves the
FI_COMPLETION flag unset). In such situations, the provider is not
required to reserve an entry in the completion queue to handle the case
where the operation fails and does generate a CQ entry, since doing so
would effectively require tracking the operation to completion.
Applications concerned with avoiding CQ overruns when errors occur must
ensure that there is sufficient space in the CQ to report failed
operations. This can typically be achieved by sizing the CQ to at least
the same size as the endpoint queue(s) that are bound to it.
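
The sizing rule in the preceding paragraph can be sketched as a small helper. The function name min_cq_size and the choice to sum the sizes of every transmit and receive queue bound to the CQ are assumptions for illustration; the actual queue sizes would come from the endpoint attributes the application requested.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return a CQ size large enough to hold one entry
 * for every operation that can be outstanding on the queues bound to it.
 * Saturates at SIZE_MAX instead of wrapping on overflow. */
static size_t min_cq_size(const size_t *queue_sizes, size_t queue_count)
{
    size_t total = 0;

    for (size_t i = 0; i < queue_count; i++) {
        if (queue_sizes[i] > SIZE_MAX - total)
            return SIZE_MAX;
        total += queue_sizes[i];
    }
    return total;
}
```

For an endpoint with a 256-entry transmit queue and a 1024-entry receive queue bound to the same CQ, the helper yields 1280 entries.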
487
488 AV Type (av_type)
489 Specifies the type of address vectors that are usable with this domain.
490 For additional details on AV type, see fi_av(3). The following values
491 may be specified.
492
493 FI_AV_MAP
494 Only address vectors of type AV map are requested or supported.
495
496 FI_AV_TABLE
Only address vectors of type AV table (index-based) are requested
or supported.
499
500 FI_AV_UNSPEC
501 Any address vector format is requested and supported.
502
503 Address vectors are only used by connectionless endpoints. Applica‐
504 tions that require the use of a specific type of address vector should
505 set the domain attribute av_type to the necessary value when calling
506 fi_getinfo. The value FI_AV_UNSPEC may be used to indicate that the
507 provider can support either address vector format. In this case, a
508 provider may return FI_AV_UNSPEC to indicate that either format is sup‐
509 portable, or may return another AV type to indicate the optimal AV type
510 supported by this domain.
511
512 Memory Registration Mode (mr_mode)
513 Defines memory registration specific mode bits used with this domain.
514 Full details on MR mode options are available in fi_mr(3). The follow‐
515 ing values may be specified.
516
517 FI_MR_ALLOCATED
518 Indicates that memory registration occurs on allocated data buf‐
519 fers, and physical pages must back all virtual addresses being
520 registered.
521
522 FI_MR_COLLECTIVE
523 Requires data buffers passed to collective operations be explic‐
524 itly registered for collective operations using the FI_COLLEC‐
525 TIVE flag.
526
527 FI_MR_ENDPOINT
528 Memory registration occurs at the endpoint level, rather than
529 domain.
530
531 FI_MR_LOCAL
532 The provider is optimized around having applications register
533 memory for locally accessed data buffers. Data buffers used in
534 send and receive operations and as the source buffer for RMA and
535 atomic operations must be registered by the application for ac‐
536 cess domains opened with this capability.
537
538 FI_MR_MMU_NOTIFY
539 Indicates that the application is responsible for notifying the
540 provider when the page tables referencing a registered memory
541 region may have been updated.
542
543 FI_MR_PROV_KEY
544 Memory registration keys are selected and returned by the
545 provider.
546
547 FI_MR_RAW
548 The provider requires additional setup as part of their memory
549 registration process. This mode is required by providers that
550 use a memory key that is larger than 64-bits.
551
552 FI_MR_RMA_EVENT
553 Indicates that the memory regions associated with completion
554 counters must be explicitly enabled after being bound to any
555 counter.
556
557 FI_MR_UNSPEC
558 Defined for compatibility – library versions 1.4 and earlier.
559 Setting mr_mode to 0 indicates that FI_MR_BASIC or FI_MR_SCAL‐
560 ABLE are requested and supported.
561
562 FI_MR_VIRT_ADDR
563 Registered memory regions are referenced by peers using the vir‐
564 tual address of the registered memory region, rather than a
565 0-based offset.
566
567 FI_MR_BASIC
568 Defined for compatibility – library versions 1.4 and earlier.
569 Only basic memory registration operations are requested or sup‐
570 ported. This mode is equivalent to the FI_MR_VIRT_ADDR,
571 FI_MR_ALLOCATED, and FI_MR_PROV_KEY flags being set in later li‐
572 brary versions. This flag may not be used in conjunction with
573 other mr_mode bits.
574
575 FI_MR_SCALABLE
576 Defined for compatibility – library versions 1.4 and earlier.
577 Only scalable memory registration operations are requested or
578 supported. Scalable registration uses offset based addressing,
579 with application selectable memory keys. For library versions
580 1.5 and later, this is the default if no mr_mode bits are set.
581 This flag may not be used in conjunction with other mr_mode
582 bits.
583
584 Buffers used in data transfer operations may require notifying the
585 provider of their use before a data transfer can occur. The mr_mode
586 field indicates the type of memory registration that is required, and
587 when registration is necessary. Applications that require the use of a
588 specific registration mode should set the domain attribute mr_mode to
589 the necessary value when calling fi_getinfo. The value FI_MR_UNSPEC
590 may be used to indicate support for any registration mode.
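
A typical use of mr_mode is to compare the bits a provider reports with the bits the application is prepared to honor. The sketch below shows only the bit arithmetic: unsupported_mr_bits is a hypothetical helper, and in a real program the arguments would be combinations of the FI_MR_* bits listed above, such as the mr_mode value returned in the domain attributes by fi_getinfo.

```c
/* Hypothetical helper: return the mr_mode bits that the provider requires
 * but the application did not declare support for. A result of 0 means
 * the application can satisfy every registration requirement. */
static int unsupported_mr_bits(int provider_mr_mode, int app_mr_mode)
{
    return provider_mr_mode & ~app_mr_mode;
}
```

An application could treat a non-zero result as a reason to skip that fi_info entry and try the next one.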
591
592 MR Key Size (mr_key_size)
593 Size of the memory region remote access key, in bytes. Applications
594 that request their own MR key must select a value within the range
specified by this value. Key sizes larger than 8 bytes require using
the FI_MR_RAW mr_mode bit.
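
The range check implied by mr_key_size can be written as follows. This is a sketch: key_fits is a hypothetical helper, it assumes an application-selected key is carried in a 64-bit integer (as with the requested_key argument of fi_mr_reg), and it treats a key size of 0 as meaning that no application-selected key is acceptable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if key is representable in mr_key_size
 * bytes, i.e. key < 2^(8 * mr_key_size). */
static bool key_fits(uint64_t key, size_t mr_key_size)
{
    if (mr_key_size >= sizeof(uint64_t))
        return true;            /* every 64-bit value fits */
    if (mr_key_size == 0)
        return false;           /* assumption: no application keys */
    return (key >> (8 * mr_key_size)) == 0;
}
```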
597
598 CQ Data Size (cq_data_size)
599 Applications may include a small message with a data transfer that is
600 placed directly into a remote completion queue as part of a completion
601 event. This is referred to as remote CQ data (sometimes referred to as
602 immediate data). This field indicates the number of bytes that the
provider supports for remote CQ data. If remote CQ data is supported (a
non-zero value is returned), its size must be at least 4 bytes.
605
606 Completion Queue Count (cq_cnt)
607 The optimal number of completion queues supported by the domain, rela‐
608 tive to any specified or default CQ attributes. The cq_cnt value may
609 be a fixed value of the maximum number of CQs supported by the underly‐
610 ing hardware, or may be a dynamic value, based on the default at‐
611 tributes of an allocated CQ, such as the CQ size and data format.
612
613 Endpoint Count (ep_cnt)
614 The total number of endpoints supported by the domain, relative to any
615 specified or default endpoint attributes. The ep_cnt value may be a
616 fixed value of the maximum number of endpoints supported by the under‐
617 lying hardware, or may be a dynamic value, based on the default at‐
618 tributes of an allocated endpoint, such as the endpoint capabilities
619 and size. The endpoint count is the number of addressable endpoints
620 supported by the provider. Providers return capability limits based on
configured hardware maximum capabilities. Providers cannot predict all
possible system limitations without a posteriori knowledge acquired
during runtime that will further limit these hardware maximums (e.g.
application memory consumption, FD usage, etc.).
625
626 Transmit Context Count (tx_ctx_cnt)
627 The number of outbound command queues optimally supported by the
628 provider. For a low-level provider, this represents the number of com‐
629 mand queues to the hardware and/or the number of parallel transmit en‐
630 gines effectively supported by the hardware and caches. Applications
631 which allocate more transmit contexts than this value will end up shar‐
632 ing underlying resources. By default, there is a single transmit con‐
633 text associated with each endpoint, but in an advanced usage model, an
634 endpoint may be configured with multiple transmit contexts.
635
636 Receive Context Count (rx_ctx_cnt)
637 The number of inbound processing queues optimally supported by the
provider. For a low-level provider, this represents the number of
hardware queues that can be effectively utilized for processing incoming
640 packets. Applications which allocate more receive contexts than this
641 value will end up sharing underlying resources. By default, a single
642 receive context is associated with each endpoint, but in an advanced
643 usage model, an endpoint may be configured with multiple receive con‐
644 texts.
645
646 Maximum Endpoint Transmit Context (max_ep_tx_ctx)
647 The maximum number of transmit contexts that may be associated with an
648 endpoint.
649
650 Maximum Endpoint Receive Context (max_ep_rx_ctx)
651 The maximum number of receive contexts that may be associated with an
652 endpoint.
653
654 Maximum Sharing of Transmit Context (max_ep_stx_ctx)
655 The maximum number of endpoints that may be associated with a shared
656 transmit context.
657
658 Maximum Sharing of Receive Context (max_ep_srx_ctx)
659 The maximum number of endpoints that may be associated with a shared
660 receive context.
661
662 Counter Count (cntr_cnt)
The optimal number of completion counters supported by the domain. The
cntr_cnt value may be a fixed value of the maximum number of counters
supported by the underlying hardware, or may be a dynamic value, based
on the default attributes of the domain.
667
668 MR IOV Limit (mr_iov_limit)
669 This is the maximum number of IO vectors (scatter-gather elements) that
670 a single memory registration operation may reference.
671
672 Capabilities (caps)
673 Domain level capabilities. Domain capabilities indicate domain level
674 features that are supported by the provider.
675
676 FI_LOCAL_COMM
677 At a conceptual level, this field indicates that the underlying
678 device supports loopback communication. More specifically, this
679 field indicates that an endpoint may communicate with other end‐
680 points that are allocated from the same underlying named domain.
681 If this field is not set, an application may need to use an al‐
682 ternate domain or mechanism (e.g. shared memory) to communicate
683 with peers that execute on the same node.
684
685 FI_REMOTE_COMM
686 This field indicates that the underlying provider supports com‐
687 munication with nodes that are reachable over the network. If
688 this field is not set, then the provider only supports communi‐
689 cation between processes that execute on the same node – a
690 shared memory provider, for example.
691
692 FI_SHARED_AV
693 Indicates that the domain supports the ability to share address
694 vectors among multiple processes using the named address vector
695 feature.
696
697 See fi_getinfo(3) for a discussion on primary versus secondary capabil‐
698 ities. All domain capabilities are considered secondary capabilities.
699
700 mode
The operational mode bits related to using the domain.
702
703 FI_RESTRICTED_COMP
704 This bit indicates that the domain limits completion queues and
705 counters to only be used with endpoints, transmit contexts, and
706 receive contexts that have the same set of capability flags.
707
708 Default authorization key (auth_key)
The default authorization key to associate with endpoints and memory
710 registrations created within the domain. This field is ignored unless
711 the fabric is opened with API version 1.5 or greater.
712
713 Default authorization key length (auth_key_size)
714 The length in bytes of the default authorization key for the domain.
715 If set to 0, then no authorization key will be associated with end‐
716 points and memory registrations created within the domain unless speci‐
717 fied in the endpoint or memory registration attributes. This field is
718 ignored unless the fabric is opened with API version 1.5 or greater.
719
720 Max Error Data Size (max_err_data)
The maximum amount of error data, in bytes, that may be returned as
part of a completion or event queue error. This value corresponds to
the err_data_size field in struct fi_cq_err_entry and struct
fi_eq_err_entry.
725
726 Memory Regions Count (mr_cnt)
727 The optimal number of memory regions supported by the domain, or end‐
728 point if the mr_mode FI_MR_ENDPOINT bit has been set. The mr_cnt value
729 may be a fixed value of the maximum number of MRs supported by the un‐
730 derlying hardware, or may be a dynamic value, based on the default at‐
731 tributes of the domain, such as the supported memory registration
732 modes. Applications can set the mr_cnt on input to fi_getinfo, in or‐
733 der to indicate their memory registration requirements. Doing so may
734 allow the provider to optimize any memory registration cache or lookup
735 tables.
736
737 Traffic Class (tclass)
This specifies the default traffic class that will be associated with
any endpoints created within the domain. See fi_endpoint(3) for
additional information.
741
RETURN VALUE
Returns 0 on success. On error, a negative value corresponding to
fabric errno is returned. Fabric errno values are defined in
rdma/fi_errno.h.
746
NOTES
Users should call fi_close to release all resources allocated to the
fabric domain.
750
751 The following fabric resources are associated with domains: active end‐
752 points, memory regions, completion event queues, and address vectors.
753
754 Domain attributes reflect the limitations and capabilities of the un‐
755 derlying hardware and/or software provider. They do not reflect system
756 limitations, such as the number of physical pages that an application
757 may pin or number of file descriptors that the application may open.
As a result, the reported maximums may not be achievable, even on a
lightly loaded system, without an administrator configuring system
resources appropriately for the installed provider(s).
761
SEE ALSO
fi_getinfo(3), fi_endpoint(3), fi_av(3), fi_eq(3), fi_mr(3)
764
AUTHORS
OpenFabrics.

Libfabric Programmer’s Manual         2022-03-30                 fi_domain(3)