fi_setup(7)                    Libfabric v1.18.1                    fi_setup(7)

NAME
       fi_setup - libfabric setup and initialization

OVERVIEW
       A full description of the libfabric API is documented in the relevant man pages.  This section provides an introduction to select interfaces, including how they may be used.  It does not attempt to capture all subtleties or use cases, nor describe all possible data structures or fields.  However, it is useful for new developers trying to kick-start using libfabric.

fi_getinfo()
       The fi_getinfo() call is one of the first calls that applications invoke.  It is designed to be easy to use for simple applications, but extensible enough to configure a network for optimal performance.  It serves several purposes.  First, it abstracts away network implementation and addressing details.  Second, it allows an application to specify which features it requires of the network.  Last, it provides a mechanism for a provider to report how an application can use the network in order to achieve the best performance.  fi_getinfo() is loosely based on the getaddrinfo() call.

           /* API prototypes */
           struct fi_info *fi_allocinfo(void);

           int fi_getinfo(int version, const char *node, const char *service,
               uint64_t flags, struct fi_info *hints, struct fi_info **info);

           /* Sample initialization code flow */
           struct fi_info *hints, *info;

           hints = fi_allocinfo();

           /* hints will point to a cleared fi_info structure
            * Initialize hints here to request specific network capabilities
            */

           fi_getinfo(FI_VERSION(1, 16), NULL, NULL, 0, hints, &info);
           fi_freeinfo(hints);

           /* Use the returned info structure to allocate fabric resources */

       The hints parameter is the key for requesting fabric services.  The fi_info structure contains several data fields, plus pointers to a wide variety of attributes.  The fi_allocinfo() call simplifies the creation of an fi_info structure and is strongly recommended for use.  In this example, the application is merely attempting to get a list of what providers are available in the system and the features that they support.  Note that the API is designed to be extensible.  Versioning information is provided as part of the fi_getinfo() call.  The version is used by libfabric to determine what API features the application is aware of.  In this case, the application indicates that it can properly handle any feature that was defined for the 1.16 release (or earlier).

       Applications should always hard code the version that they are written for into the fi_getinfo() call.  This ensures that newer versions of libfabric will provide backwards compatibility with that used by the application.  Newer versions of libfabric must support applications that were compiled against an older version of the library.  It must also support applications written against header files from an older library version, but re-compiled against newer header files.  Among other things, the version parameter allows libfabric to determine if an application is aware of new fields that may have been added to structures, or if the data in those fields may be uninitialized.

       Typically, an application will initialize the hints parameter to list the features that it will use.

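       For example, an application that intends to use tagged messages over a reliable-datagram endpoint might fill in hints as follows.  This is a sketch only; the capability and endpoint type shown are illustrative choices, and error handling is omitted:

```c
struct fi_info *hints, *info;

hints = fi_allocinfo();

/* Illustrative requests: tagged-message support over a
 * reliable-datagram endpoint */
hints->caps = FI_TAGGED;
hints->ep_attr->type = FI_EP_RDM;

fi_getinfo(FI_VERSION(1, 16), NULL, NULL, 0, hints, &info);
fi_freeinfo(hints);
```
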
           /* Taking a peek at the contents of fi_info */
           struct fi_info {
               struct fi_info        *next;
               uint64_t              caps;
               uint64_t              mode;
               uint32_t              addr_format;
               size_t                src_addrlen;
               size_t                dest_addrlen;
               void                  *src_addr;
               void                  *dest_addr;
               fid_t                 handle;
               struct fi_tx_attr     *tx_attr;
               struct fi_rx_attr     *rx_attr;
               struct fi_ep_attr     *ep_attr;
               struct fi_domain_attr *domain_attr;
               struct fi_fabric_attr *fabric_attr;
               struct fid_nic        *nic;
           };

       The fi_info structure references several different attributes, which correspond to the different libfabric objects that an application allocates.  For basic applications, modifying or accessing most attribute fields is unnecessary.  Many applications will only need to deal with a few fields of fi_info, most notably the endpoint type, capability (caps) bits, and mode bits.  These are defined in more detail below.

       On success, the fi_getinfo() function returns a linked list of fi_info structures.  Each entry in the list will meet the conditions specified through the hints parameter.  The returned entries may come from different network providers, or may differ in the returned attributes.  For example, if hints does not specify a particular endpoint type, there may be an entry for each of the three endpoint types.  As a general rule, libfabric attempts to return the list of fi_info structures in order from most desirable to least.  High-performance network providers are listed before more generic providers.

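       The returned list can be inspected with a simple walk; a sketch (error handling omitted):

```c
/* Entries are ordered from most to least desirable, so many
 * applications simply use the first acceptable entry. */
struct fi_info *cur;

for (cur = info; cur; cur = cur->next)
    printf("provider %s on fabric %s\n",
           cur->fabric_attr->prov_name, cur->fabric_attr->name);

/* ... allocate fabric resources using the chosen entry ... */
fi_freeinfo(info);
```
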
   Capabilities (fi_info::caps)
       The fi_info caps field is used to specify the features and services that the application requires of the network.  This field is a bit-mask of desired capabilities.  There are capability bits for each of the data transfer services previously mentioned: FI_MSG, FI_TAGGED, FI_RMA, FI_ATOMIC, and FI_COLLECTIVE.  An application should set the bit for each set of operations that it will use.  These bits are often the only caps bits set by an application.

       Capabilities are grouped into three general categories: primary, secondary, and primary modifiers.  Primary capabilities must explicitly be requested by an application, and a provider must enable support for only those primary capabilities which were selected.  This is required for both performance and security reasons.  Primary modifiers are used to limit a primary capability, such as restricting an endpoint to being send-only.

       Secondary capabilities may optionally be requested by an application.  If requested, a provider must support the capability or fail the fi_getinfo request.  A provider may optionally report non-requested secondary capabilities if doing so would not compromise performance or security.  That is, a provider may grant an application a secondary capability, whether or not the application requested it.  The most commonly accessed secondary capability bits indicate if provider communication is restricted to the local node (for example, the shared memory provider only supports local communication) and/or remote nodes (which can be the case for NICs that lack loopback support).  Other secondary capability bits mostly deal with features targeting highly-scalable applications, but may not be commonly supported across multiple providers.

       Because different providers support different sets of capabilities, applications that desire optimal network performance may need to code for a capability being either present or absent.  When present, such capabilities can offer a scalability or performance boost.  When absent, an application may prefer to adjust its protocol or implementation to work around the network limitations.  Although providers can often emulate features, doing so can impact overall performance, including the performance of data transfers that otherwise appear unrelated to the feature in use.  For example, if a provider needs to insert protocol headers into the message stream in order to implement a given capability, the insertion of that header could negatively impact the performance of all transfers.  By exposing such limitations to the application, the application developer has better control over how to best emulate the feature or work around its absence.

       It is recommended that applications code for only those capabilities required to achieve the best performance.  If a capability would have little to no effect on overall performance, developers should avoid using such features as part of an initial implementation.  This will allow the application to work well across the widest variety of hardware.  Application optimizations can then add support for less common features.  To see which features are supported by which providers, see the libfabric Provider Feature Matrix for the relevant release.

   Mode Bits (fi_info::mode)
       Where capability bits represent features desired by applications, mode bits correspond to behavior needed by the provider.  That is, capability bits are top down requests, whereas mode bits are bottom up restrictions.  Mode bits are set by the provider to request that the application use the API in a specific way in order to achieve optimal performance.  Mode bits often imply that the additional work needed to implement certain communication semantics will be less if done by the application than if that same implementation were forced down into the provider.  Mode bits arise as a result of hardware implementation restrictions.

       An application developer decides which mode bits they want to or can easily support as part of their development process.  Each mode bit describes a particular behavior that the application must follow to use various interfaces.  Applications set the mode bits that they support when calling fi_getinfo().  If a provider requires a mode bit that isn't set, that provider will be skipped by fi_getinfo().  If a provider does not need a mode bit that is set, it will respond to the fi_getinfo() call with the mode bit cleared.  This indicates that the application does not need to perform the action required by the mode bit.

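       The negotiation rule above amounts to simple bit logic, sketched below.  The mode bit values here are hypothetical placeholders, not the real FI_* definitions from rdma/fabric.h:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mode bits, for illustration only */
#define EX_MODE_CONTEXT    (1ULL << 0)
#define EX_MODE_MSG_PREFIX (1ULL << 1)

/* Mirrors the fi_getinfo() rule: a provider requiring a bit the
 * application did not set is skipped; otherwise the returned mode
 * contains only the bits the provider actually needs. */
static bool negotiate_mode(uint64_t app_supported, uint64_t prov_required,
                           uint64_t *returned_mode)
{
    if (prov_required & ~app_supported)
        return false;               /* provider skipped */
    *returned_mode = prov_required; /* unneeded bits come back cleared */
    return true;
}
```

       For example, an application that sets both bits matched against a provider that needs only EX_MODE_CONTEXT gets back just that one bit.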
       One common mode bit needed by providers is FI_CONTEXT (and FI_CONTEXT2).  This mode bit requires that applications pass in a libfabric defined data structure (struct fi_context) into any data transfer function.  That structure must remain valid and unused by the application until the data transfer operation completes.  The purpose behind this mode bit is that the struct fi_context provides "scratch" space that the provider can use to track the request.  For example, it may need to insert the request into a linked list while it is pending, or track the number of times that an outbound transfer has been retried.  Since many applications already track outstanding operations with their own data structure, by embedding the struct fi_context into that same structure, overall performance can be improved.  This avoids the provider needing to allocate and free internal structures for each request.

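       The embedding pattern can be sketched as follows.  struct fi_context is defined locally here only so the example is self-contained; the real definition comes from rdma/fabric.h, where it is an opaque block of provider scratch space:

```c
#include <stddef.h>

/* Local stand-in for libfabric's struct fi_context (assumption:
 * an opaque array of pointers used as provider scratch space) */
struct fi_context {
    void *internal[4];
};

/* Application request tracking structure with the provider scratch
 * space embedded; no extra allocation is needed per transfer */
struct app_request {
    int               id;     /* application bookkeeping */
    size_t            length;
    struct fi_context ctx;    /* passed as the context argument of
                               * fi_send() and friends */
};

/* Recover the owning request from the context pointer reported in a
 * completion (container_of in plain C) */
static struct app_request *request_of(struct fi_context *ctx)
{
    return (struct app_request *)
           ((char *)ctx - offsetof(struct app_request, ctx));
}
```
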
       Continuing with this example, if an application does not already track outstanding requests, then it would leave the FI_CONTEXT mode bit unset.  This would indicate that the provider needs to get and release its own structure for tracking purposes.  In this case, the costs would essentially be the same whether it were done by the application or provider.

       For the broadest support of different network technologies, applications should attempt to support as many mode bits as feasible.  It is recommended that providers support applications that cannot support any mode bits, with as small an impact as possible.  However, implementation of mode bit avoidance in the provider can still impact performance, even when the mode bit is disabled.  As a result, some providers may always require specific mode bits be set.

FIDs
       FID stands for fabric identifier.  It is the base object type assigned to all libfabric API objects.  All fabric resources are represented by a fid structure, and all fids are derived from a base fid type.  In object-oriented terms, a fid would be the parent class.  The contents of a fid are visible to the application.

           /* Base FID definition */
           enum {
               FI_CLASS_UNSPEC,
               FI_CLASS_FABRIC,
               FI_CLASS_DOMAIN,
               ...
           };

           struct fi_ops {
               size_t size;
               int (*close)(struct fid *fid);
               ...
           };

           /* All fabric interface descriptors must start with this structure */
           struct fid {
               size_t fclass;
               void *context;
               struct fi_ops *ops;
           };

       The fid structure is designed as a trade-off between minimizing memory footprint versus software overhead.  Each fid is identified as a specific object class, which helps with debugging.  Examples are given above (e.g. FI_CLASS_FABRIC).  The context field is an application defined data value, assigned to an object during its creation.  The use of the context field is application specific, but it is meant to be read by applications.  Applications often set context to a corresponding structure that they have allocated.  The context field is the only field that applications are recommended to access directly.  Access to other fields should be done using defined function calls (for example, the close() operation).

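       The context pattern can be illustrated with the struct fid definition shown above, reproduced locally so the example stands alone.  app_conn is a hypothetical application structure:

```c
#include <stddef.h>

struct fi_ops;

/* struct fid as shown above; normally provided by rdma/fabric.h */
struct fid {
    size_t         fclass;
    void          *context;
    struct fi_ops *ops;
};

/* Hypothetical per-connection application state */
struct app_conn {
    int        peer_id;
    struct fid fid;   /* in real code this would be a fid_ep, etc. */
};

/* At creation, the application stores a back-pointer in context;
 * later, given only a fid (e.g. reported with an event), it recovers
 * its own state by reading context directly. */
static void conn_init(struct app_conn *conn, int peer_id)
{
    conn->peer_id = peer_id;
    conn->fid.context = conn;
}
```
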
       The ops field points to a set of function pointers.  The fi_ops structure defines the operations that apply to that class.  The size field in the fi_ops structure is used for extensibility, and allows the fi_ops structure to grow in a backward compatible manner as new operations are added.  The fid deliberately points to the fi_ops structure, rather than embedding the operations directly.  This allows multiple fids to point to the same set of ops, which minimizes the memory footprint of each fid.  (Internally, providers usually set ops to a static data structure, with the fid structure dynamically allocated.)

       Although it's possible for applications to access function pointers directly, it is strongly recommended that the static inline functions defined in the man pages be used instead.  This is required by applications that may be built using the FABRIC_DIRECT library feature.  (FABRIC_DIRECT is a compile time option that allows for highly optimized builds by tightly coupling an application with a specific provider.)

       Other OFI classes are derived from this structure, adding their own set of operations.

           /* Example of deriving a new class for a fabric object */
           struct fi_ops_fabric {
               size_t size;
               int (*domain)(struct fid_fabric *fabric, struct fi_info *info,
                   struct fid_domain **dom, void *context);
               ...
           };

           struct fid_fabric {
               struct fid fid;
               struct fi_ops_fabric *ops;
           };

       Other fid classes follow a similar pattern as that shown for fid_fabric.  The base fid structure is followed by zero or more pointers to operation sets.

Fabric (fid_fabric)
       The top-level object that applications open is the fabric identifier.  The fabric can mostly be viewed as a container object by applications, though it does identify which provider(s) applications use.

       Opening a fabric is usually a straightforward call after calling fi_getinfo().

           int fi_fabric(struct fi_fabric_attr *attr, struct fid_fabric **fabric, void *context);

       The fabric attributes can be directly accessed from struct fi_info.  The newly opened fabric is returned through the `fabric' parameter.  The `context' parameter appears in many operations.  It is a user-specified value that is associated with the fabric.  It may be used to point to an application specific structure and is retrievable from struct fid_fabric.

   Attributes (fi_fabric_attr)
       The fabric attributes are straightforward.

           struct fi_fabric_attr {
               struct fid_fabric *fabric;
               char *name;
               char *prov_name;
               uint32_t prov_version;
               uint32_t api_version;
           };

       The only field that applications are likely to use directly is the prov_name.  This is a string value that can be used by hints to select a specific provider for use.  On most systems, there will be multiple providers available.  Only one is likely to represent the high-performance network attached to the system.  Others are generic providers that may be available on any system, such as the TCP socket and UDP providers.

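       To force selection of one provider, an application can set prov_name in its hints before calling fi_getinfo().  The provider name "tcp" below is only an example; note that the string should be heap-allocated, since fi_freeinfo(hints) will free it:

```c
hints->fabric_attr->prov_name = strdup("tcp");
fi_getinfo(FI_VERSION(1, 16), NULL, NULL, 0, hints, &info);
```
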
       The fabric field is used to help applications manage opened fabric resources.  If an application has already opened a fabric that can support the returned fi_info structure, this will be set to that fabric.

Domains (fid_domain)
       Domains frequently map to a specific local network interface adapter.  A domain may refer to the entire NIC, a port on a multi-port NIC, a virtual device exposed by a NIC, multiple NICs being used in a multi-rail fashion, and so forth.  Although it's convenient to think of a domain as referring to a NIC, such an association isn't expected by libfabric.  From the viewpoint of the application, a domain identifies a set of resources that may be used together.

       Similar to a fabric, opening a domain is straightforward after calling fi_getinfo().

           int fi_domain(struct fid_fabric *fabric, struct fi_info *info,
               struct fid_domain **domain, void *context);

       The fi_info structure returned from fi_getinfo() can be passed directly to fi_domain() to open a new domain.

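       Putting the fabric and domain calls together (a sketch; error handling omitted, and info is assumed to be an entry returned by fi_getinfo()):

```c
struct fid_fabric *fabric;
struct fid_domain *domain;

fi_fabric(info->fabric_attr, &fabric, NULL);
fi_domain(fabric, info, &domain, NULL);
```
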
   Attributes (fi_domain_attr)
       One of the goals of a domain is to define the relationship between data transfer services (endpoints) and completion services (completion queues and counters).  Many of the domain attributes describe that relationship and its impact to the application.

           struct fi_domain_attr {
               struct fid_domain *domain;
               char *name;
               enum fi_threading threading;
               enum fi_progress control_progress;
               enum fi_progress data_progress;
               enum fi_resource_mgmt resource_mgmt;
               enum fi_av_type av_type;
               enum fi_mr_mode mr_mode;
               size_t mr_key_size;
               size_t cq_data_size;
               size_t cq_cnt;
               size_t ep_cnt;
               size_t tx_ctx_cnt;
               size_t rx_ctx_cnt;
               ...
           };

       Full details of the domain attributes and their meaning are in the fi_domain man page.  Information on select attributes and their impact to the application is described below.

   Threading (fi_threading)
       libfabric defines a unique threading model.  The libfabric design is heavily influenced by object-oriented programming concepts.  A multi-threaded application must determine how libfabric objects (domains, endpoints, completion queues, etc.) will be allocated among its threads, or if any thread can access any object.  For example, an application may spawn a new thread to handle each new connected endpoint.  The domain threading field provides a mechanism for an application to identify which objects may be accessed simultaneously by different threads.  This in turn allows a provider to optimize or, in some cases, eliminate internal synchronization and locking around those objects.

       Threading defines where providers could optimize synchronization primitives.  However, providers may still implement more serialization than is needed by the application.  (This is usually a result of keeping the provider implementation simpler.)

       It is recommended that applications target either FI_THREAD_SAFE (full thread safety implemented by the provider) or FI_THREAD_DOMAIN (objects associated with a single domain will only be accessed by a single thread).

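       The threading model is requested through hints.  For example, an application that dedicates each domain and its associated objects to a single thread might set:

```c
/* All objects under a domain are accessed by one thread only */
hints->domain_attr->threading = FI_THREAD_DOMAIN;
```
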
   Progress (fi_progress)
       Progress models are a result of using the host processor in order to perform some portion of the transport protocol.  In order to simplify development, libfabric defines two progress models: automatic or manual.  It does not attempt to identify which specific interface features may be offloaded, or what operations require additional processing by the application's thread.

       Automatic progress means that an operation initiated by the application will eventually complete, even if the application makes no further calls into the libfabric API.  The operation is either offloaded entirely onto hardware, the provider uses an internal thread, or the operating system kernel may perform the task.  The use of automatic progress may increase system overhead and latency in the latter two cases.  For control operations, such as connection setup, this is usually acceptable.  However, the impact to data transfers may be measurable, especially if internal threads are required to provide automatic progress.

       The manual progress model can avoid this overhead for providers that do not offload all transport features into hardware.  With manual progress the provider implementation will handle transport operations as part of specific libfabric functions.  For example, a call to fi_cq_read() which retrieves an array of completed operations may also be responsible for sending ack messages to notify peers that a message has been received.  Since reading the completion queue is part of the normal operation of an application, there is minimal impact to the application and additional threads are avoided.

       Applications need to take care when using manual progress, particularly if they link into libfabric multiple times through different code paths or library dependencies.  If application threads are used to drive progress, such as responding to received data with ACKs, then it is critical that the application thread call into libfabric in a timely manner.

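       With manual progress, a typical pattern is to poll the completion queue from the application's main loop, which both reaps completions and gives the provider cycles to progress the transport; a sketch (error handling simplified):

```c
struct fi_cq_entry entry;
ssize_t ret;

for (;;) {
    ret = fi_cq_read(cq, &entry, 1);
    if (ret > 0)
        break;            /* entry.op_context identifies the request */
    if (ret != -FI_EAGAIN) {
        /* use fi_cq_readerr() to retrieve the error completion */
        break;
    }
    /* -FI_EAGAIN: nothing completed, but progress was still driven */
}
```
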
   Memory Registration (fid_mr)
       RMA, atomic, and collective operations can read and write memory that is owned by a peer process, without requiring the involvement of the target processor.  Because the memory can be modified over the network, an application must opt into exposing its memory to peers.  This is handled by the memory registration process.  Registered memory regions associate memory buffers with permissions granted for access by fabric resources.  A memory buffer must be registered before it can be used as the target of a remote RMA, atomic, or collective data transfer.  Additionally, a fabric provider may require that data buffers be registered before being used even in the case of local transfers.  The latter is necessary to ensure that the virtual to physical page mappings do not change while network hardware is performing the transfer.

       In order to handle diverse hardware requirements, there are a set of mr_mode bits associated with memory registration.  The mr_mode bits behave similarly to fi_info mode bits.  Applications indicate which types of restrictions they can support, and the providers clear those bits which aren't needed.

       For hardware that requires memory registration, managing registration is critical to achieving good performance and scalability.  The act of registering memory is costly and should be avoided on a per data transfer basis.  libfabric has extensive internal support for managing memory registration, hiding registration from the user application, caching registrations to reduce per transfer overhead, and detecting when cached registrations are no longer valid.  It is recommended that applications that are not natively designed to account for registering memory make use of libfabric's registration cache.  This can be done by simply not setting the relevant mr_mode bits.

   Memory Region APIs
       The following APIs highlight how to allocate and access a registered memory region.  Note that this is not a complete list of memory region (MR) calls, and for full details on each API, readers should refer directly to the fi_mr man page.

           int fi_mr_reg(struct fid_domain *domain, const void *buf, size_t len,
               uint64_t access, uint64_t offset, uint64_t requested_key, uint64_t flags,
               struct fid_mr **mr, void *context);

           void * fi_mr_desc(struct fid_mr *mr);
           uint64_t fi_mr_key(struct fid_mr *mr);

       By default, memory regions are associated with a domain.  A MR is accessible by any endpoint that is opened on that domain.  A region starts at the address specified by `buf', and is `len' bytes long.  The `access' parameter is a set of permission flags that are OR'ed together.  The permissions indicate which type of operations may be invoked against the region (e.g. FI_READ, FI_WRITE, FI_REMOTE_READ, FI_REMOTE_WRITE).  The `buf' parameter typically references allocated virtual memory.

       A MR is associated with local and remote protection keys.  The local key is referred to as a memory descriptor and may be retrieved by calling fi_mr_desc().  This call is only needed if the FI_MR_LOCAL mr_mode bit has been set.  The memory descriptor is passed directly into data transfer operations, for example:

           /* fi_mr_desc() example using fi_send() */
           fi_send(ep, buf, len, fi_mr_desc(mr), 0, NULL);

       The remote key, or simply MR key, is used by the peer when targeting the MR with an RMA or atomic operation.  In many cases, the key will need to be sent in a separate message to the initiating peer.  The libfabric API uses a 64-bit key where one is used.  The actual key size used by a provider is part of its domain attributes.  Support for larger key sizes, as required by some providers, is conveyed through an mr_mode bit, and requires the use of extended MR API calls that map the larger size to a 64-bit value.

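       A common sequence is to register a buffer, retrieve its key, and send the key to the peer, which then targets the region.  A sketch (error handling omitted; note that whether the target address is a virtual address or an offset into the region depends on the FI_MR_VIRT_ADDR mr_mode bit):

```c
/* Register a buffer for remote writes and advertise its key */
fi_mr_reg(domain, buf, len, FI_REMOTE_WRITE, 0, 0, 0, &mr, NULL);

uint64_t key = fi_mr_key(mr);
/* ... send `key' (and the buffer address) to the peer in a normal
 * message; the peer can then issue, for example:
 *
 *     fi_write(ep, src, len, desc, peer_addr,
 *              (uint64_t) buf, key, NULL);
 */
```
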
Endpoints
       Endpoints are transport level communication portals.  Opening an endpoint is trivial after calling fi_getinfo().

   Active (fid_ep)
       Active endpoints may be connection-oriented or connection-less.  They are considered active as they may be used to perform data transfers.  All data transfer interfaces – messages (fi_msg), tagged messages (fi_tagged), RMA (fi_rma), atomics (fi_atomic), and collectives (fi_collective) – are associated with active endpoints, though an individual endpoint may not be enabled to use all data transfers.  In standard configurations, an active endpoint has one transmit and one receive queue.  In general, operations that generate traffic on the fabric are posted to the transmit queue.  This includes all RMA and atomic operations, along with sent messages and sent tagged messages.  Operations that post buffers for receiving incoming data are submitted to the receive queue.

       Active endpoints are created in the disabled state.  The endpoint must first be configured prior to it being enabled.  Endpoints must transition into an enabled state before accepting data transfer operations, including posting of receive buffers.  The fi_enable() call is used to transition an active endpoint into an enabled state.  The fi_connect() and fi_accept() calls will also transition an endpoint into the enabled state, if it is not already enabled.

           int fi_endpoint(struct fid_domain *domain, struct fi_info *info,
               struct fid_ep **ep, void *context);

   Enabling (fi_enable)
       In order to transition an endpoint into an enabled state, it must be bound to one or more fabric resources.  This includes binding the endpoint to a completion queue and event queue.  Unconnected endpoints must also be bound to an address vector.

           /* Example to enable an unconnected endpoint */

           /* Allocate an address vector and associate it with the endpoint */
           fi_av_open(domain, &av_attr, &av, NULL);
           fi_ep_bind(ep, &av->fid, 0);

           /* Allocate and associate completion queues with the endpoint */
           fi_cq_open(domain, &cq_attr, &cq, NULL);
           fi_ep_bind(ep, &cq->fid, FI_TRANSMIT | FI_RECV);

           fi_enable(ep);

       In the above example, we allocate an AV and CQ.  The attributes for the AV and CQ are omitted (additional discussion below).  Those are then associated with the endpoint through the fi_ep_bind() call.  After all necessary resources have been assigned to the endpoint, we enable it.  Enabling the endpoint indicates to the provider that it should allocate any hardware and software resources and complete the initialization for the endpoint.  (If the endpoint is not bound to all necessary resources, the fi_enable() call will fail.)

       The fi_enable() call is always called for unconnected endpoints.  Connected endpoints may be able to skip calling fi_enable(), since fi_connect() and fi_accept() will enable the endpoint automatically.  However, applications may still call fi_enable() prior to calling fi_connect() or fi_accept().  Doing so allows the application to post receive buffers to the endpoint, which ensures that they are available to receive data in the case the peer endpoint sends messages immediately after it establishes the connection.

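       For example, the client side of a connection might enable the endpoint and pre-post a receive buffer before initiating the connection.  This sketch mirrors the accept example below and omits error handling:

```c
fi_endpoint(domain, info, &ep, NULL);
fi_ep_bind(ep, &eq->fid, 0);
fi_ep_bind(ep, &cq->fid, FI_TRANSMIT | FI_RECV);

fi_enable(ep);
fi_recv(ep, rx_buf, len, NULL, 0, NULL);

fi_connect(ep, info->dest_addr, NULL, 0);
fi_eq_sread(eq, &event, &cm_entry, sizeof cm_entry, -1, 0);
assert(event == FI_CONNECTED);
```
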
   Passive (fid_pep)
       Passive endpoints are used to listen for incoming connection requests.  Passive endpoints are of type FI_EP_MSG, and may not perform any data transfers.  An application wishing to create a passive endpoint typically calls fi_getinfo() using the FI_SOURCE flag, often only specifying a `service' address.  The service address corresponds to a TCP port number.

       Passive endpoints are associated with event queues.  Event queues report connection requests from peers.  Unlike active endpoints, passive endpoints are not associated with a domain.  This allows an application to listen for connection requests across multiple domains, though still restricted to a single provider.

           /* Example passive endpoint listen */
           fi_passive_ep(fabric, info, &pep, NULL);

           fi_eq_open(fabric, &eq_attr, &eq, NULL);
           fi_pep_bind(pep, &eq->fid, 0);

           fi_listen(pep);

       A passive endpoint must be bound to an event queue before calling listen.  This ensures that connection requests can be reported to the application.  To accept new connections, the application waits for a request, allocates a new active endpoint for it, and accepts the request.

           /* Example accepting a new connection */

           /* Wait for a CONNREQ event */
           fi_eq_sread(eq, &event, &cm_entry, sizeof cm_entry, -1, 0);
           assert(event == FI_CONNREQ);

           /* Allocate a new endpoint for the connection */
           if (!cm_entry.info->domain_attr->domain)
               fi_domain(fabric, cm_entry.info, &domain, NULL);
           fi_endpoint(domain, cm_entry.info, &ep, NULL);

           fi_ep_bind(ep, &eq->fid, 0);
           fi_cq_open(domain, &cq_attr, &cq, NULL);
           fi_ep_bind(ep, &cq->fid, FI_TRANSMIT | FI_RECV);

           fi_enable(ep);
           fi_recv(ep, rx_buf, len, NULL, 0, NULL);

           fi_accept(ep, NULL, 0);
           fi_eq_sread(eq, &event, &cm_entry, sizeof cm_entry, -1, 0);
           assert(event == FI_CONNECTED);

       The connection request event (FI_CONNREQ) includes information about the type of endpoint to allocate, including default attributes to use.  If a domain has not already been opened for the endpoint, one must be opened.  Then the endpoint and related resources can be allocated.  Unlike the unconnected endpoint example above, a connected endpoint does not have an AV, but does need to be bound to an event queue.  In this case, we use the same EQ as the listening endpoint.  Once the other EP resources (e.g. CQ) have been allocated and bound, the EP can be enabled.

       To accept the connection, the application calls fi_accept().  Note that because of thread synchronization issues, it is possible for the active endpoint to receive data even before fi_accept() can return.  The posting of receive buffers prior to calling fi_accept() handles this condition, which avoids network flow control issues occurring immediately after connecting.

636 The fi_eq_sread() calls are blocking (synchronous) read calls to the
637 event queue. These calls wait until an event occurs, which in this
638 case are connection request and establishment events.
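       The active (client) side of the connection follows a similar flow:
       it allocates an endpoint from the fi_info returned by fi_getinfo(),
       binds its EQ and CQ, enables the endpoint, and calls fi_connect().
       The sketch below assumes fabric, domain, eq, and cq objects
       allocated as shown earlier; error handling is omitted.

```c
/* Example client-side connection setup (sketch; error checks omitted).
 * Assumes info was returned by fi_getinfo() with the server's address.
 */
fi_endpoint(domain, info, &ep, NULL);

/* Connected endpoints bind to an EQ for CM events and a CQ for transfers */
fi_ep_bind(ep, &eq->fid, 0);
fi_ep_bind(ep, &cq->fid, FI_TRANSMIT | FI_RECV);
fi_enable(ep);

/* Pre-post a receive before connecting, for the same reason as the
 * accepting side: data may arrive as soon as the connection is up.
 */
fi_recv(ep, rx_buf, len, NULL, 0, NULL);

/* Initiate the connection and wait for the FI_CONNECTED event */
fi_connect(ep, info->dest_addr, NULL, 0);
fi_eq_sread(eq, &event, &cm_entry, sizeof cm_entry, -1, 0);
assert(event == FI_CONNECTED);
```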
639
640 EP Attributes (fi_ep_attr)
641 The properties of an endpoint are specified using endpoint attributes.
642 These are attributes for the endpoint as a whole. There are additional
643 attributes specifically related to the transmit and receive contexts
644 underpinning the endpoint (details below).
645
646 struct fi_ep_attr {
647 enum fi_ep_type type;
648 uint32_t protocol;
649 uint32_t protocol_version;
650 size_t max_msg_size;
651 ...
652 };
653
654 A full description of each field is available in the fi_endpoint man
655 page, with selected details listed below.
656
657 Endpoint Type (fi_ep_type)
658 This indicates the type of endpoint: reliable datagram (FI_EP_RDM), re‐
659 liable-connected (FI_EP_MSG), or unreliable datagram (FI_EP_DGRAM).
660 Nearly all applications will want to specify the endpoint type as a
661 hint passed into fi_getinfo, as most applications will only be coded to
662 support a single endpoint type.
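       For example, an application written for reliable unconnected
       semantics would restrict fi_getinfo() to FI_EP_RDM, following the
       initialization flow shown earlier (a sketch):

```c
/* Restrict results to reliable-datagram endpoints (sketch) */
hints = fi_allocinfo();
hints->ep_attr->type = FI_EP_RDM;

fi_getinfo(FI_VERSION(1, 16), NULL, NULL, 0, hints, &info);
/* info now describes only providers offering FI_EP_RDM endpoints */
```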
663
664 Maximum Message Size (max_msg_size)
665 This size is the maximum size for any data transfer operation that goes
666 over the endpoint. For unreliable datagram endpoints, this is often
667 the MTU of the underlying network. For reliable endpoints, this value
       is often a restriction of the underlying transport protocol.  A
       maximum message size of 2GB is a common lower bound across
       providers, though some support an arbitrarily large size.  Applica-
       tions that require transfers larger than the reported maximum must
       break a single, large transfer into multiple operations.
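       The split itself is simple ceiling arithmetic.  The helper below is
       a hypothetical illustration (it is not part of the libfabric API)
       of how many operations a given transfer requires:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: operations needed to move `total` bytes when a
 * single operation may carry at most `max_msg_size` bytes.
 */
static size_t xfer_count(size_t total, size_t max_msg_size)
{
        /* ceiling division: round up to cover a partial final chunk */
        return (total + max_msg_size - 1) / max_msg_size;
}
```

       Each resulting chunk would then be posted as its own fi_send() or
       fi_write(), with the application matching one completion per posted
       operation.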
673
674 Providers expose their hardware or network limits to the applications,
675 rather than segmenting large transfers internally, in order to minimize
676 completion overhead. For example, for a provider to support large mes‐
677 sage segmentation internally, it would need to emulate all completion
       mechanisms (queues and counters) in software, even if transfers
       larger than the transport-supported maximum were never used.
680
681 Message Order Size (max_order_xxx_size)
682 These fields specify data ordering. They define the delivery order of
683 transport data into target memory for RMA and atomic operations. Data
684 ordering requires message ordering. If message ordering is not speci‐
685 fied, these fields do not apply.
686
687 For example, suppose that an application issues two RMA write opera‐
688 tions to the same target memory location. (The application may be
689 writing a time stamp value every time a local condition is met, for in‐
690 stance). Message ordering indicates that the first write as initiated
691 by the sender is the first write processed by the receiver. Data or‐
692 dering indicates whether the data from the first write updates memory
693 before the second write updates memory.
694
695 The max_order_xxx_size fields indicate how large a message may be while
696 still achieving data ordering. If a field is 0, then no data ordering
697 is guaranteed. If a field is the same as the max_msg_size, then data
698 order is guaranteed for all messages.
699
       Providers may support data ordering up to max_msg_size for back-to-
       back operations of the same type.  For example, an RMA write fol-
       lowed by another RMA write may have data ordering regardless of the
       size of the data transfer (max_order_waw_size = max_msg_size).
       Mixed operations, such as a read followed by a write, are often re-
       stricted.  This is because RMA read operations may require acknowl-
       edgments from the initiator, which impacts the re-transmission pro-
       tocol.
707
708 For example, consider an RMA read followed by a write. The target will
709 process the read request, retrieve the data, and send a reply. While
710 that is occurring, a write is received that wants to update the same
711 memory location accessed by the read. If the target processes the
712 write, it will overwrite the memory used by the read. If the read re‐
713 sponse is lost, and the read is retried, the target will be unable to
714 re-send the data. To handle this, the target either needs to: defer
715 handling the write until it receives an acknowledgment for the read re‐
716 sponse, buffer the read response so it can be re-transmitted, or indi‐
717 cate that data ordering is not guaranteed.
718
       Because the read or write operation may be gigabytes in size, de-
       ferring the write may add significant latency, and buffering the
       read response may be impractical.  The max_order_xxx_size fields
       indicate how large back-to-back operations may be while still main-
       taining ordering.  In many cases, read-after-write and write-after-
       read ordering may be significantly limited, but still usable for
       implementing specific algorithms, such as a global locking mecha-
       nism.
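       For example, an application relying on write-after-write data or-
       dering might request the message order through the hints and check
       the returned limit (a sketch; required_size is an application-
       defined value):

```c
/* Request write-after-write message ordering (sketch) */
hints->tx_attr->msg_order = FI_ORDER_WAW;
hints->rx_attr->msg_order = FI_ORDER_WAW;
fi_getinfo(FI_VERSION(1, 16), NULL, NULL, 0, hints, &info);

/* Verify that ordered writes of the needed size are guaranteed */
if (info->ep_attr->max_order_waw_size < required_size) {
        /* Data ordering is not guaranteed at this size; the application
         * must order writes itself, e.g. by waiting for a completion
         * before issuing the next write.
         */
}
```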
726
727 Rx/Tx Context Attributes (fi_rx_attr / fi_tx_attr)
728 The endpoint attributes define the overall abilities for the endpoint;
729 however, attributes that apply specifically to receive or transmit con‐
730 texts are defined by struct fi_rx_attr and fi_tx_attr, respectively:
731
732 struct fi_rx_attr {
733 uint64_t caps;
734 uint64_t mode;
735 uint64_t op_flags;
736 uint64_t msg_order;
737 uint64_t comp_order;
738 ...
739 };
740
741 struct fi_tx_attr {
742 uint64_t caps;
743 uint64_t mode;
744 uint64_t op_flags;
745 uint64_t msg_order;
746 uint64_t comp_order;
747 size_t inject_size;
748 ...
749 };
750
751 Rx/Tx context capabilities must be a subset of the endpoint capabili‐
752 ties. For many applications, the default attributes returned by the
753 provider will be sufficient, with the application only needing to spec‐
754 ify endpoint attributes.
755
       Both context attributes include an op_flags field.  Applications
       use this field to specify the default operation flags for data
       transfer calls.  For example, by setting the transmit context's
       op_flags to FI_INJECT, the application indicates that all transmit
       operations should assume `inject' behavior; that is, ownership of
       the buffer provided to the call reverts to the application as soon
       as the function returns.  The op_flags defaults apply to operations
       that do not take a flags parameter; calls that accept flags di-
       rectly, such as fi_sendmsg, override the defaults per operation.
       One use of op_flags is to specify the default completion semantic
       desired by the application (discussed next).  By setting the de-
       fault op_flags at initialization time, an application can avoid
       passing flags as arguments into some data transfer calls, avoid
       parsing the flags, and allow the provider to prepare submitted com-
       mands ahead of time.
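       For example, an application could make a stronger completion seman-
       tic the default for its transmit context through the hints (a
       sketch; FI_TRANSMIT_COMPLETE is described in the completions dis-
       cussion later in this page):

```c
/* Set a default completion semantic for all transmit operations (sketch) */
hints->tx_attr->op_flags = FI_TRANSMIT_COMPLETE;
fi_getinfo(FI_VERSION(1, 16), NULL, NULL, 0, hints, &info);

/* Calls without a flags argument, such as fi_send(), inherit the
 * default; fi_sendmsg() can still override it per operation.
 */
```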
769
       It should be noted that some attributes depend upon the peer end-
       point having supporting attributes in order to achieve correct ap-
       plication behavior.  For example, message order must be compatible
       between the initiator's transmit attributes and the target's re-
       ceive attributes.  Any mismatch may result in incorrect behavior
       that could be difficult to debug.
776
COMPLETIONS
       Data transfer operations complete asynchronously.  Libfabric de-
       fines two mechanisms by which an application can be notified that
       an operation has completed: completion queues and counters.  Re-
       gardless of which mechanism is used to notify the application that
       an operation is done, developers must be aware of what a completion
       indicates.
783
       In all cases, a completion indicates that it is safe to reuse the
       buffer(s) associated with the data transfer.  This completion mode
       is referred to as inject complete and corresponds to the opera-
       tional flag FI_INJECT_COMPLETE.  However, a completion may also
       guarantee stronger semantics.
789
790 Although libfabric does not define an implementation, a provider can
791 meet the requirement for inject complete by copying the application’s
792 buffer into a network buffer before generating the completion. Even if
793 the transmit operation is lost and must be retried, the provider can
794 resend the original data from the copied location. For large trans‐
795 fers, a provider may not mark a request as inject complete until the
796 data has been acknowledged by the target. Applications, however,
797 should only infer that it is safe to reuse their data buffer for an in‐
798 ject complete operation.
799
800 Transmit complete is a completion mode that provides slightly stronger
801 guarantees to the application. The meaning of transmit complete de‐
802 pends on whether the endpoint is reliable or unreliable. For an unre‐
803 liable endpoint (FI_EP_DGRAM), a transmit completion indicates that the
804 request has been delivered to the network. That is, the message has
805 been delivered at least as far as hardware queues on the local NIC.
806 For reliable endpoints, a transmit complete occurs when the request has
807 reached the target endpoint. Typically, this indicates that the target
808 has acked the request. Transmit complete maps to the operation flag
809 FI_TRANSMIT_COMPLETE.
810
811 A third completion mode is defined to provide guarantees beyond trans‐
812 mit complete. With transmit complete, an application knows that the
813 message is no longer dependent on the local NIC or network
814 (e.g. switches). However, the data may be buffered at the remote NIC
815 and has not necessarily been written to the target memory. As a re‐
816 sult, data sent in the request may not be visible to all processes.
817 The third completion mode is delivery complete.
818
819 Delivery complete indicates that the results of the operation are
820 available to all processes on the fabric. The distinction between
821 transmit and delivery complete is subtle, but important. It often
822 deals with when the target endpoint generates an acknowledgment to a
823 message. For providers that offload transport protocol to the NIC,
824 support for transmit complete is common. Delivery complete guarantees
825 are more easily met by providers that implement portions of their pro‐
826 tocol on the host processor. Delivery complete corresponds to the
827 FI_DELIVERY_COMPLETE operation flag.
828
       Applications can request a default completion mode when opening an
       endpoint by setting one of the above-mentioned completion flags as
       op_flags for the context's attributes.  However, it is usually rec-
       ommended that applications use the provider's default flags for
       best performance and amend their protocol to achieve the desired
       completion semantics.  For example, many applications will perform
       a `finalize' or `commit' procedure as part of their operation,
       which synchronizes the processing of all peers and guarantees that
       all previously sent data has been received.
838
839 A full discussion of completion semantics is given in the fi_cq man
840 page.
841
842 CQs (fid_cq)
843 Completion queues often map directly to provider hardware mechanisms,
844 and libfabric is designed around minimizing the software impact of ac‐
845 cessing those mechanisms. Unlike other objects discussed so far (fab‐
846 rics, domains, endpoints), completion queues are not part of the fi_in‐
847 fo structure or involved with the fi_getinfo() call.
848
849 All active endpoints must be bound with one or more completion queues.
850 This is true even if completions will be suppressed by the application
851 (e.g. using the FI_SELECTIVE_COMPLETION flag). Completion queues are
852 needed to report operations that complete in error and help drive
853 progress in the case of manual progress.
854
855 CQs are allocated separately from endpoints and are associated with
856 endpoints through the fi_ep_bind() function.
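       A typical allocation sequence, assuming a domain and endpoint have
       already been opened (a sketch; error handling omitted):

```c
/* Open a completion queue and attach it to an endpoint (sketch) */
struct fi_cq_attr cq_attr = {0};

cq_attr.format = FI_CQ_FORMAT_CONTEXT; /* completion structure to use */
cq_attr.size = 1024;                   /* suggested minimum CQ depth */

fi_cq_open(domain, &cq_attr, &cq, NULL);

/* Report completions for both transmit and receive operations posted
 * to this endpoint through the same CQ.
 */
fi_ep_bind(ep, &cq->fid, FI_TRANSMIT | FI_RECV);
```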
857
858 CQ Format (fi_cq_format)
859 In order to minimize the amount of data that a provider must report,
860 the type of completion data written back to the application is select-
861 able. This limits the number of bytes the provider writes to memory,
862 and allows necessary completion data to fit into a compact structure.
863 Each CQ format maps to a specific completion structure. Developers
864 should analyze each structure, select the smallest structure that con‐
865 tains all of the data it requires, and specify the corresponding enum
866 value as the CQ format.
867
868 For example, if an application only needs to know which request com‐
869 pleted, along with the size of a received message, it can select the
870 following:
871
872 cq_attr->format = FI_CQ_FORMAT_MSG;
873
874 struct fi_cq_msg_entry {
875 void *op_context;
876 uint64_t flags;
877 size_t len;
878 };
879
880 Once the format has been selected, the underlying provider will assume
881 that read operations against the CQ will pass in an array of the corre‐
882 sponding structure. The CQ data formats are designed such that a
883 structure that reports more information can be cast to one that reports
884 less.
885
886 Reading Completions (fi_cq_read)
887 Completions may be read from a CQ by using one of the non-blocking
888 calls, fi_cq_read / fi_cq_readfrom, or one of the blocking calls,
889 fi_cq_sread / fi_cq_sreadfrom. Regardless of which call is used, ap‐
890 plications pass in an array of completion structures based on the se‐
891 lected CQ format. The CQ interfaces are optimized for batch completion
892 processing, allowing the application to retrieve multiple completions
893 from a single read call. The difference between the read and readfrom
894 calls is that readfrom returns source addressing data, if available.
895 The readfrom derivative of the calls is only useful for unconnected
896 endpoints, and only if the corresponding endpoint has been configured
897 with the FI_SOURCE capability.
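       For example, a server using an RDM endpoint might retrieve a batch
       of completions along with each sender's address (a sketch;
       process_completion is a hypothetical application handler):

```c
struct fi_cq_msg_entry comp[8];
fi_addr_t src[8];
ssize_t ret, i;

/* Retrieve up to 8 completions in one call; src[i] identifies the
 * peer for each completion when FI_SOURCE was requested.
 */
ret = fi_cq_readfrom(cq, comp, 8, src);
for (i = 0; i < ret; i++)
        process_completion(&comp[i], src[i]);  /* hypothetical handler */
```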
898
899 FI_SOURCE requires that the provider use the source address available
900 in the raw completion data, such as the packet’s source address, to re‐
901 trieve a matching entry in the endpoint’s address vector. Applications
902 that carry some sort of source identifier as part of their data packets
903 can avoid the overhead associated with using FI_SOURCE.
904
905 Retrieving Errors
906 Because the selected completion structure is insufficient to report all
907 data necessary to debug or handle an operation that completes in error,
908 failed operations are reported using a separate fi_cq_readerr() func‐
909 tion. This call takes as input a CQ error entry structure, which al‐
910 lows the provider to report more information regarding the reason for
911 the failure.
912
       /* read error prototype */
       ssize_t fi_cq_readerr(struct fid_cq *cq, struct fi_cq_err_entry *buf,
           uint64_t flags);
915
916 /* error data structure */
917 struct fi_cq_err_entry {
918 void *op_context;
919 uint64_t flags;
920 size_t len;
921 void *buf;
922 uint64_t data;
923 uint64_t tag;
924 size_t olen;
925 int err;
926 int prov_errno;
927 void *err_data;
928 size_t err_data_size;
929 };
930
931 /* Sample error handling */
932 struct fi_cq_msg_entry entry;
933 struct fi_cq_err_entry err_entry;
934 int ret;
935
936 ret = fi_cq_read(cq, &entry, 1);
937 if (ret == -FI_EAVAIL)
938 ret = fi_cq_readerr(cq, &err_entry, 0);
939
940 As illustrated, if an error entry has been inserted into the completion
941 queue, then attempting to read the CQ will result in the read call re‐
942 turning -FI_EAVAIL (error available). This indicates that the applica‐
943 tion must use the fi_cq_readerr() call to remove the failed operation’s
944 completion information before other completions can be reaped from the
945 CQ.
946
947 A fabric error code regarding the failure is reported as the err field
948 in the fi_cq_err_entry structure. A provider specific error code is
949 also available through the prov_errno field. This field can be decoded
950 into a displayable string using the fi_cq_strerror() routine. The
951 err_data field is provider specific data that assists the provider in
952 decoding the reason for the failure.
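       Continuing the sample above, the captured error can be converted to
       readable strings using fi_strerror() for the fabric error code and
       fi_cq_strerror() for the provider-specific code (a sketch):

```c
/* Decode the error captured in err_entry (sketch; assumes <stdio.h>) */
fprintf(stderr, "completion error: %s\n", fi_strerror(err_entry.err));
fprintf(stderr, "provider detail: %s\n",
        fi_cq_strerror(cq, err_entry.prov_errno,
                       err_entry.err_data, NULL, 0));
```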
953
ADDRESS VECTORS (fid_av)
955 A primary goal of address vectors is to allow applications to communi‐
956 cate with thousands to millions of peers while minimizing the amount of
957 data needed to store peer addressing information. It pushes fabric
958 specific addressing details away from the application to the provider.
959 This allows the provider to optimize how it converts addresses into
960 routing data, and enables data compression techniques that may be dif‐
961 ficult for an application to achieve without being aware of low-level
962 fabric addressing details. For example, providers may be able to algo‐
963 rithmically calculate addressing components, rather than storing the
964 data locally. Additionally, providers can communicate with resource
965 management entities or fabric manager agents to obtain quality of ser‐
966 vice or other information about the fabric, in order to improve network
967 utilization.
968
969 An equally important objective is ensuring that the resulting inter‐
970 faces, particularly data transfer operations, are fast and easy to use.
971 Conceptually, an address vector converts an endpoint address into an
972 fi_addr_t. The fi_addr_t (fabric interface address datatype) is a
973 64-bit value that is used in all `fast-path' operations – data trans‐
974 fers and completions.
975
976 Address vectors are associated with domain objects. This allows
977 providers to implement portions of an address vector, such as quality
978 of service mappings, in hardware.
979
980 AV Type (fi_av_type)
981 There are two types of address vectors. The type refers to the format
982 of the returned fi_addr_t values for addresses that are inserted into
983 the AV. With type FI_AV_TABLE, returned addresses are simple indices,
984 and developers may think of the AV as an array of addresses. Each ad‐
985 dress that is inserted into the AV is mapped to the index of the next
986 free array slot. The advantage of FI_AV_TABLE is that applications can
987 refer to peers using a simple index, eliminating an application’s need
988 to store any addressing data. I.e. the application can generate the
989 fi_addr_t values themselves. This type maps well to applications, such
990 as MPI, where a peer is referenced by rank.
991
992 The second type is FI_AV_MAP. This type does not define any specific
993 format for the fi_addr_t value. Applications that use type map are re‐
994 quired to provide the correct fi_addr_t for a given peer when issuing a
995 data transfer operation. The advantage of FI_AV_MAP is that a provider
996 can use the fi_addr_t to encode the target’s address, which avoids re‐
997 trieving the data from memory. As a simple example, consider a fabric
998 that uses TCP/IPv4 based addressing. An fi_addr_t is large enough to
999 contain the address, which allows a provider to copy the data from the
1000 fi_addr_t directly into an outgoing packet.
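       An AV of type table can be opened and populated as follows (a
       sketch; peer_addr and peer_cnt are application-supplied, and error
       handling is omitted):

```c
struct fi_av_attr av_attr = {0};
fi_addr_t fi_addr;

av_attr.type = FI_AV_TABLE; /* returned fi_addr_t values are indices */
av_attr.count = peer_cnt;   /* hint: expected number of peers */

fi_av_open(domain, &av_attr, &av, NULL);
fi_ep_bind(ep, &av->fid, 0);

/* Insert one peer address (format given by fi_info::addr_format);
 * with FI_AV_TABLE the first insertion maps to index 0.
 */
fi_av_insert(av, peer_addr, 1, &fi_addr, 0, NULL);
```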
1001
1002 Sharing AVs Between Processes
1003 Large scale parallel programs typically run with multiple processes al‐
1004 located on each node. Because these processes communicate with the
1005 same set of peers, the addressing data needed by each process is the
1006 same. Libfabric defines a mechanism by which processes running on the
1007 same node may share their address vectors. This allows a system to
1008 maintain a single copy of addressing data, rather than one copy per
1009 process.
1010
1011 Although libfabric does not require any implementation for how an ad‐
1012 dress vector is shared, the interfaces map well to using shared memory.
       Address vectors which will be shared are given an application spe-
       cific name.  How an application selects a name that avoids con-
       flicts with unrelated processes, and how it communicates the name
       to peer processes, is outside the scope of libfabric.
1017
1018 In addition to having a name, a shared AV also has a base map address –
1019 map_addr. Use of map_addr is only important for address vectors that
1020 are of type FI_AV_MAP, and allows applications to share fi_addr_t val‐
1021 ues. From the viewpoint of the application, the map_addr is the base
1022 value for all fi_addr_t values. A common use for map_addr is for the
1023 process that creates the initial address vector to request a value from
1024 the provider, exchange the returned map_addr with its peers, and for
1025 the peers to open the shared AV using the same map_addr. This allows
1026 the fi_addr_t values to be stored in shared memory that is accessible
1027 by all peers.
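       A possible flow is sketched below.  It assumes the provider returns
       the base address through the av_attr structure on open, and that
       the name and map_addr are exchanged among peers out of band; both
       the name and shared_map_addr are application-defined:

```c
struct fi_av_attr av_attr = {0};

/* Creating process: open a named AV and obtain the base map_addr */
av_attr.type = FI_AV_MAP;
av_attr.name = "app_av_name";        /* application-chosen shared name */
fi_av_open(domain, &av_attr, &av, NULL);
/* exchange av_attr.map_addr with peer processes out of band */

/* Peer processes: open the same AV read-only using the shared base */
av_attr.type = FI_AV_MAP;
av_attr.name = "app_av_name";
av_attr.map_addr = shared_map_addr;  /* value received from the creator */
av_attr.flags = FI_READ;
fi_av_open(domain, &av_attr, &av, NULL);
```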
1028
WAIT OBJECTS
1030 There is an important difference between using libfabric completion ob‐
1031 jects, versus sockets, that may not be obvious from the discussions so
1032 far. With sockets, the object that is signaled is the same object that
1033 abstracts the queues, namely the file descriptor. When data is re‐
1034 ceived on a socket, that data is placed in a queue associated directly
1035 with the fd. Reading from the fd retrieves that data. If an applica‐
1036 tion wishes to block until data arrives on a socket, it calls select()
1037 or poll() on the fd. The fd is signaled when a message is received,
1038 which releases the blocked thread, allowing it to read the fd.
1039
1040 By associating the wait object with the underlying data queue, applica‐
1041 tions are exposed to an interface that is easy to use and race free.
1042 If data is available to read from the socket at the time select() or
1043 poll() is called, those calls simply return that the fd is readable.
1044
1045 There are a couple of significant disadvantages to this approach, which
1046 have been discussed previously, but from different perspectives. The
1047 first is that every socket must be associated with its own fd. There
1048 is no way to share a wait object among multiple sockets. (This is a
1049 main reason for the development of epoll semantics). The second is
1050 that the queue is maintained in the kernel, so that the select() and
1051 poll() calls can check them.
1052
1053 Libfabric allows for the separation of the wait object from the data
1054 queues. For applications that use libfabric interfaces to wait for
1055 events, such as fi_cq_sread, this separation is mostly hidden from the
1056 application. The exception is that applications may receive a signal,
1057 but no events are retrieved when a queue is read. This separation al‐
1058 lows the queues to reside in the application’s memory space, while wait
1059 objects may still use kernel components. A reason for the latter is
1060 that wait objects may be signaled as part of system interrupt process‐
1061 ing, which would go through a kernel driver.
1062
1063 Applications that want to use native wait objects (e.g. file descrip‐
1064 tors) directly in operating system calls must perform an additional
1065 step in their processing. In order to handle race conditions that can
1066 occur between inserting an event into a completion or event object and
1067 signaling the corresponding wait object, libfabric defines an `fi_try‐
1068 wait()' function. The fi_trywait implementation is responsible for
1069 handling potential race conditions which could result in an application
1070 either losing events or hanging. The following example demonstrates
1071 the use of fi_trywait().
1072
              /* Get the native wait object -- an fd in this case */
              fi_control(&cq->fid, FI_GETWAIT, (void *) &fd);

              /* fi_trywait() takes an array of fids */
              struct fid *fids[1];
              fids[0] = &cq->fid;

              while (1) {
                  ret = fi_trywait(fabric, fids, 1);
                  if (ret == FI_SUCCESS) {
                      /* It's safe to block on the fd.  Rebuild the fd set
                       * each pass, since select() modifies its arguments.
                       */
                      FD_ZERO(&fds);
                      FD_SET(fd, &fds);
                      select(fd + 1, &fds, NULL, NULL, &timeout);
                  } else if (ret == -FI_EAGAIN) {
                      /* Read and process all completions from the CQ */
                      do {
                          ret = fi_cq_read(cq, &comp, 1);
                      } while (ret > 0);
                  } else {
                      /* something really bad happened */
                  }
              }
1092
1093 In this example, the application has allocated a CQ with an fd as its
1094 wait object. It calls select() on the fd. Before calling select(),
1095 the application must call fi_trywait() successfully (return code of
1096 FI_SUCCESS). Success indicates that a blocking operation can now be
1097 invoked on the native wait object without fear of the application hang‐
       ing or events being lost.  If fi_trywait() returns -FI_EAGAIN, it
       usually indicates that there are queued events to process.
1100
ENVIRONMENT VARIABLES
1102 Environment variables are used by providers to configure internal op‐
1103 tions for optimal performance or memory consumption. Libfabric pro‐
1104 vides an interface for querying which environment variables are usable,
1105 along with an application to display the information to a command win‐
1106 dow. Although environment variables are usually configured by an ad‐
1107 ministrator, an application can query for variables programmatically.
1108
1109 /* APIs to query for supported environment variables */
1110 enum fi_param_type {
1111 FI_PARAM_STRING,
1112 FI_PARAM_INT,
1113 FI_PARAM_BOOL,
1114 FI_PARAM_SIZE_T,
1115 };
1116
1117 struct fi_param {
1118 /* The name of the environment variable */
1119 const char *name;
1120 /* What type of value it stores */
1121 enum fi_param_type type;
1122 /* A description of how the variable is used */
1123 const char *help_string;
1124 /* The current value of the variable */
1125 const char *value;
1126 };
1127
1128 int fi_getparams(struct fi_param **params, int *count);
1129 void fi_freeparams(struct fi_param *params);
1130
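       For example, an application could dump every variable supported by
       the loaded providers (a sketch; assumes <stdio.h>):

```c
struct fi_param *params;
int i, count;

/* Query the supported environment variables and print each one */
if (fi_getparams(&params, &count) == 0) {
        for (i = 0; i < count; i++)
                printf("%s (%s): %s\n", params[i].name,
                       params[i].help_string,
                       params[i].value ? params[i].value : "<unset>");
        fi_freeparams(params);
}
```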
       The modification of environment variables is typically a tuning ac-
       tivity done on larger clusters.  However, there are a few values
       that are useful for developers.  These can be seen by executing the
       fi_info command.
1135
1136 $ fi_info -e
1137 # FI_LOG_LEVEL: String
1138 # Specify logging level: warn, trace, info, debug (default: warn)
1139
1140 # FI_LOG_PROV: String
1141 # Specify specific provider to log (default: all)
1142
1143 # FI_PROVIDER: String
1144 # Only use specified provider (default: all available)
1145
1146 The fi_info application, which ships with libfabric, can be used to
1147 list all environment variables for all providers. The `-e' option will
1148 list all variables, and the `-g' option can be used to filter the out‐
1149 put to only those variables with a matching substring. Variables are
1150 documented directly in code with the description available as the
1151 help_string output.
1152
1153 The FI_LOG_LEVEL can be used to increase the debug output from libfab‐
1154 ric and the providers. Note that in the release build of libfabric,
1155 debug output from data path operations (transmit, receive, and comple‐
1156 tion processing) may not be available. The FI_PROVIDER variable can be
1157 used to enable or disable specific providers. This is useful to ensure
1158 that a given provider will be used.
1159
AUTHORS
1161 OpenFabrics.
1162
1163
1164
Libfabric Programmer’s Manual          2023-01-02               fi_setup(7)