fabric(7) Libfabric v1.18.1 fabric(7)

NAME
fabric - Fabric Interface Library

SYNOPSIS
#include <rdma/fabric.h>

Libfabric is a high-performance fabric software library designed to
provide low-latency interfaces to fabric hardware. For an in-depth
discussion of the motivation and design see fi_guide(7).

OVERVIEW
Libfabric provides 'process direct I/O' to application software
communicating across fabric software and hardware. Process direct I/O,
historically referred to as RDMA, allows an application to directly
access network resources without operating system intervention. Data
transfers can occur directly to and from application memory.

There are two components to the libfabric software:

Fabric Providers
Conceptually, a fabric provider may be viewed as a local hardware NIC
driver, though a provider is not limited by this definition. The first
component of libfabric is a general purpose framework that is capable
of handling different types of fabric hardware. All fabric hardware
devices and their software drivers are required to support this
framework. Devices and the drivers that plug into the libfabric
framework are referred to as fabric providers, or simply providers.
Provider details may be found in fi_provider(7).

Fabric Interfaces
The second component is a set of communication operations. Libfabric
defines several sets of communication functions that providers can
support. Providers are not required to implement all of the defined
interfaces; however, they clearly indicate which interfaces they
support.

FABRIC INTERFACES
The fabric interfaces are designed such that they are cohesive and not
simply a union of disjoint interfaces. The interfaces are logically
divided into two groups: control interfaces and communication
operations. The control interfaces are a common set of operations that
provide access to local communication resources, such as address
vectors and event queues. The communication operations expose
particular models of communication and fabric functionality, such as
message queues, remote memory access, and atomic operations.
Communication operations are associated with fabric endpoints.

Applications will typically use the control interfaces to discover
local capabilities and allocate necessary resources. They will then
allocate and configure a communication endpoint to send and receive
data, or perform other types of data transfers, with remote endpoints.

CONTROL INTERFACES
The control interface APIs provide applications access to network
resources. This involves listing all the interfaces available,
obtaining the capabilities of the interfaces, and opening a provider.

fi_getinfo - Fabric Information
The fi_getinfo call is the base call used to discover and request
fabric services offered by the system. Applications can use this call
to indicate the type of communication that they desire. The results
from fi_getinfo, returned as fi_info structures, are used to reserve
and configure fabric resources.

fi_getinfo returns a list of fi_info structures. Each structure
references a single fabric provider, indicating the interfaces that the
provider supports, along with a named set of resources. A fabric
provider may include multiple fi_info structures in the returned list.

fi_fabric - Fabric Domain
A fabric domain represents a collection of hardware and software
resources that access a single physical or virtual network. All network
ports on a system that can communicate with each other through the
fabric belong to the same fabric domain. A fabric domain shares network
addresses and can span multiple providers. libfabric supports systems
connected to multiple fabrics.

fi_domain - Access Domains
An access domain represents a single logical connection into a fabric.
It may map to a single physical or virtual NIC or a port. An access
domain defines the boundary across which fabric resources may be
associated. Each access domain belongs to a single fabric domain.

fi_endpoint - Fabric Endpoint
A fabric endpoint is a communication portal. An endpoint may be either
active or passive. Passive endpoints are used to listen for connection
requests. Active endpoints can perform data transfers. Endpoints are
configured with specific communication capabilities and data transfer
interfaces.

fi_eq - Event Queue
Event queues are used to collect and report the completion of
asynchronous operations and events. Event queues report events that
are not directly associated with data transfer operations.

fi_cq - Completion Queue
Completion queues are high-performance event queues used to report the
completion of data transfer operations.

fi_cntr - Event Counters
Event counters are used to report the number of completed asynchronous
operations. Event counters are considered lightweight, in that a
completion simply increments a counter, rather than placing an entry
into an event queue.

fi_mr - Memory Region
Memory regions describe application local memory buffers. In order for
fabric resources to access application memory, the application must
first grant permission to the fabric provider by constructing a memory
region. Memory regions are required for specific types of data transfer
operations, such as RMA transfers (see below).

fi_av - Address Vector
Address vectors are used to map higher level addresses, such as IP
addresses, which may be more natural for an application to use, into
fabric specific addresses. The use of address vectors allows providers
to reduce the amount of memory required to maintain large address
look-up tables, and eliminate expensive address resolution and look-up
methods during data transfer operations.

DATA TRANSFER INTERFACES
Fabric endpoints are associated with multiple data transfer interfaces.
Each interface set is designed to support a specific style of
communication, with an endpoint allowing the different interfaces to be
used in conjunction. The following data transfer interfaces are defined
by libfabric.

fi_msg - Message Queue
Message queues expose a simple, message-based FIFO queue interface to
the application. Message data transfers allow applications to send and
receive data with message boundaries being maintained.

fi_tagged - Tagged Message Queues
Tagged message queues expose send/receive data transfer operations
built on the concept of tagged messaging. The tagged message queue is
conceptually similar to standard message queues, but with the addition
of 64-bit tags for each message. Sent messages are matched with receive
buffers whose tags match, optionally ignoring selected tag bits.

fi_rma - Remote Memory Access
RMA transfers are one-sided operations that read data from or write
data to a remote memory region. Other than defining the appropriate
memory region, RMA operations do not require interaction at the target
side for the data transfer to complete.

fi_atomic - Atomic
Atomic operations can perform one of several operations on a remote
memory region. Atomic operations include well-known functionality, such
as atomic-add and compare-and-swap, plus several other pre-defined
calls. Unlike other data transfer interfaces, atomic operations are
aware of the data formatting at the target memory region.
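
The tag-matching rule can be expressed as a one-line predicate. The C sketch below follows the rule as fi_tagged(3) describes it: a posted receive carries a tag and an ignore mask, and a message matches when the tags agree in every bit that is not set in the mask. The sketch only models the comparison and does not call libfabric. The tag layout and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of tagged-receive matching: a receive posted with (recv_tag,
 * ignore) accepts a message sent with send_tag when the tags agree in
 * every bit not set in ignore. These are the tag and ignore values an
 * application would pass to fi_trecv(). */
static bool tag_matches(uint64_t send_tag, uint64_t recv_tag, uint64_t ignore)
{
    return (send_tag | ignore) == (recv_tag | ignore);
}

/* Hypothetical tag layout: message type in the upper 32 bits,
 * sequence number in the lower 32 bits. */
static uint64_t make_tag(uint32_t type, uint32_t seq)
{
    return ((uint64_t)type << 32) | seq;
}

/* Ignore mask that wildcards the low nbits bits of a tag. */
static uint64_t ignore_low_bits(unsigned nbits)
{
    assert(nbits < 64); /* shifting a 64-bit value by 64 is undefined */
    return (UINT64_C(1) << nbits) - 1;
}
```

With this layout, posting a receive with ignore_low_bits(32) accepts any sequence number for a given message type. An ignore mask of 0 requires an exact tag match.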

LOGGING INTERFACE
Logging can be controlled using the FI_LOG_LEVEL, FI_LOG_PROV, and
FI_LOG_SUBSYS environment variables.

FI_LOG_LEVEL
FI_LOG_LEVEL controls the amount of logging data that is output. The
following log levels are defined.

- Warn
Warn is the least verbose setting and is intended for reporting errors
or warnings.

- Trace
Trace is more verbose and is meant to include non-detailed output
helpful to tracing program execution.

- Info
Info is high traffic and meant for detailed output.

- Debug
Debug is high traffic and is likely to impact application performance.
Debug output is only available if the library has been compiled with
debugging enabled.

FI_LOG_PROV
The FI_LOG_PROV environment variable enables or disables logging from
specific providers. Providers can be enabled by listing them in a
comma-separated fashion. If the list begins with the '^' symbol, then
the list will be negated. By default, all providers are enabled.

Example: To enable logging from the psm and sockets providers:
FI_LOG_PROV="psm,sockets"

Example: To enable logging from providers other than psm:
FI_LOG_PROV="^psm"

FI_LOG_SUBSYS
The FI_LOG_SUBSYS environment variable enables or disables logging at
the subsystem level. The syntax for enabling or disabling subsystems is
similar to that used for FI_LOG_PROV. The following subsystems are
defined.

- core
Provides output related to the core framework and its management of
providers.

- fabric
Provides output specific to interactions associated with the fabric
object.

- domain
Provides output specific to interactions associated with the domain
object.

- ep_ctrl
Provides output specific to endpoint non-data transfer operations, such
as connection management (CM) operations.

- ep_data
Provides output specific to endpoint data transfer operations.

- av
Provides output specific to address vector operations.

- cq
Provides output specific to completion queue operations.

- eq
Provides output specific to event queue operations.

- mr
Provides output specific to memory registration.
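
FI_LOG_PROV and FI_LOG_SUBSYS use the same list syntax as FI_PROVIDER, which is described below. The following C sketch models that syntax. An unset or empty list allows everything. A plain list allows only the names it contains. A leading '^' allows everything except those names. filter_allows is a hypothetical helper written for illustration and is not libfabric's own parser, so details such as case sensitivity or whitespace handling may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a comma-separated name filter with optional
 * '^' negation, as used by FI_LOG_PROV, FI_LOG_SUBSYS, and FI_PROVIDER.
 * Returns true if 'name' is allowed by 'list' (NULL means unset). */
bool filter_allows(const char *list, const char *name)
{
    assert(name != NULL);
    if (!list || !*list)
        return true;               /* unset or empty: allow everything */

    bool negate = (list[0] == '^');
    const char *p = negate ? list + 1 : list;
    size_t len = strlen(name);
    bool listed = false;

    while (*p) {
        const char *end = strchr(p, ',');
        size_t tok = end ? (size_t)(end - p) : strlen(p);

        if (tok == len && strncmp(p, name, len) == 0) {
            listed = true;
            break;
        }
        if (!end)
            break;
        p = end + 1;               /* move past the comma */
    }
    return negate ? !listed : listed;
}
```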

PROVIDER INSTALLATION AND SELECTION
The libfabric build scripts will install all providers that are
supported by the installation system. Providers that are missing build
prerequisites will be disabled. Installed providers will dynamically
check for necessary hardware on library initialization and respond
appropriately to application queries.

Users can enable or disable available providers through build
configuration options. See 'configure --help' for details. In general,
a specific provider can be controlled using the configure option
'--enable-<provider_name>'. For example, '--enable-udp' (or
'--enable-udp=yes') will add the udp provider to the build. To disable
the provider, '--enable-udp=no' can be used.

Providers can also be enabled or disabled at run time using the
FI_PROVIDER environment variable. The FI_PROVIDER variable is set to a
comma-separated list of providers to include. If the list begins with
the '^' symbol, then the list will be negated.

Example: To enable the udp and tcp providers only, set:
FI_PROVIDER="udp,tcp"

The fi_info utility, which is included as part of the libfabric
package, can be used to retrieve information about which providers are
available in the system. Additionally, it can retrieve a list of all
environment variables that may be used to configure libfabric and each
provider. See fi_info(1) for more details.

ENVIRONMENT VARIABLE CONTROLS
Core features of libfabric and its providers may be configured by an
administrator through the use of environment variables. Man pages will
usually describe the most commonly accessed variables, such as those
mentioned above. However, libfabric defines interfaces for publishing
and obtaining environment variables. These are targeted for providers,
but allow applications and users to obtain the full list of variables
that may be set, along with a brief description of their use.

A full list of available variables may be obtained by running the
fi_info application with the -e or --env command line option.

NOTES
System Calls
Because libfabric is designed to provide applications direct access to
fabric hardware, there are limits on how libfabric resources may be
used in conjunction with system calls. These limitations are notable
for developers who may be familiar with programming to the sockets
interface. Although limits are provider specific, the following
restrictions apply to many providers and should be adhered to by
applications desiring portability across providers.

fork
Fabric resources are not guaranteed to be available to child
processes. This includes objects, such as endpoints and completion
queues, as well as application controlled data buffers which have been
assigned to the network. For example, data buffers that have been
registered with a fabric domain may not be available in a child process
because of copy-on-write restrictions.

CUDA deadlock
In some cases, calls to cudaMemcpy within libfabric may result in a
deadlock. This typically occurs when a CUDA kernel blocks until a
cudaMemcpy on the host completes. To avoid this deadlock, cudaMemcpy
may be disabled by setting FI_HMEM_CUDA_ENABLE_XFER=0. If this
environment variable is set and libfabric would otherwise call
cudaMemcpy, a warning will be emitted and no copy will occur. Note that
not all providers support this option.

Another mechanism which can be used to avoid deadlock is Nvidia’s
gdrcopy. Using gdrcopy requires an external library and kernel module
available at https://github.com/NVIDIA/gdrcopy. Libfabric must be
configured with gdrcopy support using the --with-gdrcopy option, and be
run with FI_HMEM_CUDA_USE_GDRCOPY=1. This may be used in conjunction
with the above option to provide a method for copying to/from CUDA
device memory when cudaMemcpy cannot be used. Again, this may not be
supported by all providers.
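
These variables are normally exported in the job's environment. A program can also set them in its own process, as the C sketch below does. It assumes that libfabric reads the variables when it initializes, so the helper must run before the first libfabric call, such as fi_getinfo. apply_cuda_workarounds is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: apply the CUDA deadlock workarounds to the
 * current process before libfabric is first used. Returns 0 on
 * success, -1 if setenv() fails. */
int apply_cuda_workarounds(bool use_gdrcopy)
{
    /* Disable cudaMemcpy-based copies inside libfabric. */
    if (setenv("FI_HMEM_CUDA_ENABLE_XFER", "0", 1) != 0)
        return -1;

    /* Copy to/from device memory through gdrcopy instead; this only
     * has an effect if libfabric was configured --with-gdrcopy. */
    if (use_gdrcopy && setenv("FI_HMEM_CUDA_USE_GDRCOPY", "1", 1) != 0)
        return -1;

    return 0;
}
```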

ABI CHANGES
libfabric releases maintain compatibility with older releases, so that
compiled applications can continue to work as-is, and previously
written applications will compile against newer versions of the
library without needing source code changes. The changes below
describe the ABI updates that have occurred and the libfabric release
that corresponds to each.

Note that most functions called by applications are actually static
inline functions, which in turn reference function pointers in order to
call directly into providers. Because of this, libfabric only exports a
handful of functions directly. ABI changes are limited to those
functions, most notably the fi_getinfo call and its returned attribute
structures.

The ABI version is independent from the libfabric release version.

ABI 1.0
The initial libfabric release (1.0.0) also corresponds to ABI version
1.0. The 1.0 ABI was unchanged for libfabric major.minor versions 1.0,
1.1, 1.2, 1.3, and 1.4.

ABI 1.1
Starting with libfabric version 1.5, a number of external data
structures were extended by appending the fields listed below. The 1.1
ABI was exported by libfabric versions 1.5 and 1.6.

fi_fabric_attr
Added api_version

fi_domain_attr
Added cntr_cnt, mr_iov_limit, caps, mode, auth_key, auth_key_size,
max_err_data, and mr_cnt fields. The mr_mode field was also changed
from an enum to an integer flag field.

fi_ep_attr
Added auth_key_size and auth_key fields.

ABI 1.2
The 1.2 ABI version was exported by libfabric versions 1.7 and 1.8, and
expanded the following structure.

fi_info
The fi_info structure was expanded to reference a new fabric object,
fid_nic. When available, the fid_nic references a new set of attributes
related to network hardware details.

ABI 1.3
The 1.3 ABI version was exported by libfabric versions 1.9, 1.10, and
1.11. It added new fields to the following attributes:

fi_domain_attr
Added tclass

fi_tx_attr
Added tclass

ABI 1.4
The 1.4 ABI version was exported by libfabric 1.12. It added
fi_tostr_r, a thread-safe (re-entrant) version of fi_tostr.

ABI 1.5
The 1.5 ABI version starts with libfabric 1.13. It added the new
fi_open API call.

ABI 1.6
The 1.6 ABI version starts with libfabric 1.14. It added fi_log_ready
for providers.

SEE ALSO
fi_info(1), fi_provider(7), fi_getinfo(3), fi_endpoint(3), fi_domain(3),
fi_av(3), fi_eq(3), fi_cq(3), fi_cntr(3), fi_mr(3)

AUTHORS
OpenFabrics.

Libfabric Programmer’s Manual 2023-01-02 fabric(7)