fabric(7) Libfabric v1.17.0 fabric(7)

fabric - Fabric Interface Library

#include <rdma/fabric.h>

Libfabric is a high-performance fabric software library designed to
provide low-latency interfaces to fabric hardware.

Libfabric provides `process direct I/O' to application software
communicating across fabric software and hardware. Process direct I/O,
historically referred to as RDMA, allows an application to directly
access network resources without operating system interventions. Data
transfers can occur directly to and from application memory.

There are two components to the libfabric software:

Fabric Providers
Conceptually, a fabric provider may be viewed as a local hardware NIC
driver, though a provider is not limited by this definition. The first
component of libfabric is a general purpose framework that is capable
of handling different types of fabric hardware. All fabric hardware
devices and their software drivers are required to support this
framework. Devices and the drivers that plug into the libfabric
framework are referred to as fabric providers, or simply providers.
Provider details may be found in fi_provider(7).

Fabric Interfaces
The second component is a set of communication operations. Libfabric
defines several sets of communication functions that providers can
support. It is not required that providers implement all the interfaces
that are defined; however, providers clearly indicate which interfaces
they do support.

The fabric interfaces are designed such that they are cohesive and not
simply a union of disjoint interfaces. The interfaces are logically
divided into two groups: control interfaces and communication
operations. The control interfaces are a common set of operations that
provide access to local communication resources, such as address
vectors and event queues. The communication operations expose
particular models of communication and fabric functionality, such as
message queues, remote memory access, and atomic operations.
Communication operations are associated with fabric endpoints.

Applications will typically use the control interfaces to discover
local capabilities and allocate necessary resources. They will then
allocate and configure a communication endpoint to send and receive
data, or perform other types of data transfers, with remote endpoints.

The control interface APIs provide applications access to network
resources. This involves listing all the interfaces available,
obtaining the capabilities of the interfaces, and opening a provider.

fi_getinfo - Fabric Information
The fi_getinfo call is the base call used to discover and request
fabric services offered by the system. Applications can use this call
to indicate the type of communication that they desire. The fi_info
structures returned by fi_getinfo are used to reserve and configure
fabric resources.

fi_getinfo returns a list of fi_info structures. Each structure
references a single fabric provider, indicating the interfaces that the
provider supports, along with a named set of resources. A fabric
provider may include multiple fi_info structures in the returned list.

fi_fabric - Fabric Domain
A fabric domain represents a collection of hardware and software
resources that access a single physical or virtual network. All network
ports on a system that can communicate with each other through the
fabric belong to the same fabric domain. A fabric domain shares network
addresses and can span multiple providers. libfabric supports systems
connected to multiple fabrics.

fi_domain - Access Domains
An access domain represents a single logical connection into a fabric.
It may map to a single physical or virtual NIC or a port. An access
domain defines the boundary across which fabric resources may be
associated. Each access domain belongs to a single fabric domain.

fi_endpoint - Fabric Endpoint
A fabric endpoint is a communication portal. An endpoint may be either
active or passive. Passive endpoints are used to listen for connection
requests. Active endpoints can perform data transfers. Endpoints are
configured with specific communication capabilities and data transfer
interfaces.

fi_eq - Event Queue
Event queues are used to collect and report the completion of
asynchronous operations and events. Event queues report events that are
not directly associated with data transfer operations.

fi_cq - Completion Queue
Completion queues are high-performance event queues used to report the
completion of data transfer operations.

fi_cntr - Event Counters
Event counters are used to report the number of completed asynchronous
operations. Event counters are considered lightweight, in that a
completion simply increments a counter, rather than placing an entry
into an event queue.

fi_mr - Memory Region
Memory regions describe application local memory buffers. In order for
fabric resources to access application memory, the application must
first grant permission to the fabric provider by constructing a memory
region. Memory regions are required for specific types of data transfer
operations, such as RMA transfers (see below).

fi_av - Address Vector
Address vectors are used to map higher level addresses, such as IP
addresses, which may be more natural for an application to use, into
fabric specific addresses. The use of address vectors allows providers
to reduce the amount of memory required to maintain large address
look-up tables, and eliminate expensive address resolution and look-up
methods during data transfer operations.

Fabric endpoints are associated with multiple data transfer interfaces.
Each interface set is designed to support a specific style of
communication, with an endpoint allowing the different interfaces to be
used in conjunction. The following data transfer interfaces are defined
by libfabric.

fi_msg - Message Queue
Message queues expose a simple, message-based FIFO queue interface to
the application. Message data transfers allow applications to send and
receive data with message boundaries being maintained.

fi_tagged - Tagged Message Queues
Tagged message queues expose send/receive data transfer operations
built on the concept of tagged messaging. The tagged message queue is
conceptually similar to standard message queues, but with the addition
of 64-bit tags for each message. Sent messages are matched with receive
buffers that are tagged with a matching value.

fi_rma - Remote Memory Access
RMA transfers are one-sided operations that read or write data directly
to a remote memory region. Other than defining the appropriate memory
region, RMA operations do not require interaction at the target side
for the data transfer to complete.

fi_atomic - Atomic
Atomic operations can perform one of several operations on a remote
memory region. Atomic operations include well-known functionality, such
as atomic-add and compare-and-swap, plus several other pre-defined
calls. Unlike other data transfer interfaces, atomic operations are
aware of the data formatting at the target memory region.

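The tag matching described for fi_tagged can be modeled in a few lines
of plain C. This is only a sketch: it does not call libfabric, and the
names sketch_posted_recv, sketch_tag_matches, and sketch_match_first
are hypothetical. It also assumes that a receive carries an ignore mask
whose set bits are excluded from the comparison, which is how
fi_tagged(3) describes receives; this page does not cover that detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a posted tagged receive; not a libfabric type. */
struct sketch_posted_recv {
    uint64_t tag;    /* tag the receiver is waiting for */
    uint64_t ignore; /* bits set here are excluded from the comparison */
};

/* True when the sent tag equals the receive tag on every non-ignored bit. */
bool sketch_tag_matches(uint64_t send_tag, uint64_t recv_tag, uint64_t ignore)
{
    return (send_tag | ignore) == (recv_tag | ignore);
}

/*
 * Returns the index of the first posted receive that matches send_tag,
 * or -1 when no posted receive matches.
 */
int sketch_match_first(const struct sketch_posted_recv *posted, size_t count,
                       uint64_t send_tag)
{
    assert(posted != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (sketch_tag_matches(send_tag, posted[i].tag, posted[i].ignore))
            return (int)i;
    }
    return -1;
}
```

With one receive posted for exactly tag 0x10 and a second posted for
tag 0x20 with the low four bits ignored, a message tagged 0x2A skips the
first receive and matches the second. A real provider performs this
matching internally; applications only supply the tag and mask.
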
Logging can be controlled using the FI_LOG_LEVEL, FI_LOG_PROV, and
FI_LOG_SUBSYS environment variables.

FI_LOG_LEVEL
FI_LOG_LEVEL controls the amount of logging data that is output. The
following log levels are defined.

- Warn
Warn is the least verbose setting and is intended for reporting errors
or warnings.

- Trace
Trace is more verbose and is meant to include non-detailed output
helpful to tracing program execution.

- Info
Info is high traffic and meant for detailed output.

- Debug
Debug is high traffic and is likely to impact application performance.
Debug output is only available if the library has been compiled with
debugging enabled.

FI_LOG_PROV
The FI_LOG_PROV environment variable enables or disables logging from
specific providers. Providers can be enabled by listing them in a comma
separated fashion. If the list begins with the `^' symbol, then the
list will be negated. By default all providers are enabled.

Example: To enable logging from the psm and sockets provider:
FI_LOG_PROV="psm,sockets"

Example: To enable logging from providers other than psm:
FI_LOG_PROV="^psm"

FI_LOG_SUBSYS
The FI_LOG_SUBSYS environment variable enables or disables logging at
the subsystem level. The syntax for enabling or disabling subsystems is
similar to that used for FI_LOG_PROV. The following subsystems are
defined.

- core
Provides output related to the core framework and its management of
providers.

- fabric
Provides output specific to interactions associated with the fabric
object.

- domain
Provides output specific to interactions associated with the domain
object.

- ep_ctrl
Provides output specific to endpoint non-data transfer operations, such
as CM operations.

- ep_data
Provides output specific to endpoint data transfer operations.

- av
Provides output specific to address vector operations.

- cq
Provides output specific to completion queue operations.

- eq
Provides output specific to event queue operations.

- mr
Provides output specific to memory registration.

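The comma separated list syntax used by FI_LOG_PROV and FI_LOG_SUBSYS,
including the leading `^' negation, can be sketched as a small
predicate. This is an illustrative model under stated assumptions, not
libfabric's parser: the function name sketch_filter_allows is
hypothetical, names are compared exactly and case-sensitively, and an
unset or empty variable is treated as "everything enabled".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true when `name` is enabled by `filter`.
 *   filter == NULL or ""  -> everything is enabled (the default)
 *   "a,b"                 -> only a and b are enabled
 *   "^a,b"                -> everything except a and b is enabled
 */
bool sketch_filter_allows(const char *filter, const char *name)
{
    assert(name != NULL);
    if (filter == NULL || *filter == '\0')
        return true;

    bool negated = (*filter == '^');
    if (negated)
        filter++; /* skip the negation marker */

    size_t name_len = strlen(name);
    bool listed = false;
    const char *item = filter;

    while (*item != '\0') {
        const char *comma = strchr(item, ',');
        size_t item_len = comma ? (size_t)(comma - item) : strlen(item);

        if (item_len == name_len && strncmp(item, name, item_len) == 0) {
            listed = true;
            break;
        }
        if (comma == NULL)
            break;
        item = comma + 1; /* move to the next list entry */
    }
    return negated ? !listed : listed;
}
```

Under this model, the filter "psm,sockets" allows psm and rejects udp,
while "^psm" rejects psm and allows every other name. The real library
may normalize or match names differently, so treat the predicate as a
reading aid for the syntax rather than a specification.
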
The libfabric build scripts will install all providers that are
supported by the installation system. Providers that are missing build
prerequisites will be disabled. Installed providers will dynamically
check for necessary hardware on library initialization and respond
appropriately to application queries.

Users can enable or disable available providers through build
configuration options. See `configure --help' for details. In general,
a specific provider can be controlled using the configure option
`--enable-<provider>'. For example, `--enable-udp' (or
`--enable-udp=yes') will add the udp provider to the build. To disable
the provider, `--enable-udp=no' can be used.

Providers can also be enabled or disabled at run time using the
FI_PROVIDER environment variable. The FI_PROVIDER variable is set to a
comma separated list of providers to include. If the list begins with
the `^' symbol, then the list will be negated.

Example: To enable the udp and tcp providers only, set:
FI_PROVIDER="udp,tcp"

The fi_info utility, which is included as part of the libfabric
package, can be used to retrieve information about which providers are
available in the system. Additionally, it can retrieve a list of all
environment variables that may be used to configure libfabric and each
provider. See fi_info(1) for more details.

Core features of libfabric and its providers may be configured by an
administrator through the use of environment variables. Man pages will
usually describe the most commonly accessed variables, such as those
mentioned above. However, libfabric defines interfaces for publishing
and obtaining environment variables. These are targeted for providers,
but allow applications and users to obtain the full list of variables
that may be set, along with a brief description of their use.

A full list of variables available may be obtained by running the
fi_info application, with the -e or --env command line option.

System Calls
Because libfabric is designed to provide applications direct access to
fabric hardware, there are limits on how libfabric resources may be
used in conjunction with system calls. These limitations are notable
for developers who may be familiar with programming to the sockets
interface. Although limits are provider specific, the following
restrictions apply to many providers and should be adhered to by
applications desiring portability across providers.

fork
Fabric resources are not guaranteed to be available to child processes.
This includes objects, such as endpoints and completion queues, as well
as application controlled data buffers which have been assigned to the
network. For example, data buffers that have been registered with a
fabric domain may not be available in a child process because of copy
on write restrictions.

CUDA deadlock
In some cases, calls to cudaMemcpy within libfabric may result in a
deadlock. This typically occurs when a CUDA kernel blocks until a
cudaMemcpy on the host completes. To avoid this deadlock, cudaMemcpy
may be disabled by setting FI_HMEM_CUDA_ENABLE_XFER=0. If this
environment variable is set and there is a call to cudaMemcpy within
libfabric, a warning will be emitted and no copy will occur. Note that
not all providers support this option.

Another mechanism which can be used to avoid deadlock is Nvidia’s
gdrcopy. Using gdrcopy requires an external library and kernel module
available at https://github.com/NVIDIA/gdrcopy. Libfabric must be
configured with gdrcopy support using the --with-gdrcopy option, and be
run with FI_HMEM_CUDA_USE_GDRCOPY=1. This may be used in conjunction
with the above option to provide a method for copying to/from CUDA
device memory when cudaMemcpy cannot be used. Again, this may not be
supported by all providers.

libfabric releases maintain compatibility with older releases, so that
compiled applications can continue to work as-is, and previously
written applications will compile against newer versions of the library
without needing source code changes. The changes below describe ABI
updates that have occurred and which libfabric release corresponds to
the changes.

Note that because most functions called by applications actually call
static inline functions, which in turn reference function pointers in
order to call directly into providers, libfabric only exports a handful
of functions directly. ABI changes are limited to those functions, most
notably the fi_getinfo call and its returned attribute structures.

The ABI version is independent from the libfabric release version.

ABI 1.0
The initial libfabric release (1.0.0) also corresponds to ABI version
1.0. The 1.0 ABI was unchanged for libfabric major.minor versions 1.0,
1.1, 1.2, 1.3, and 1.4.

ABI 1.1
A number of external data structures were appended starting with
libfabric version 1.5. These changes included adding fields to the
following data structures. The 1.1 ABI was exported by libfabric
versions 1.5 and 1.6.

fi_fabric_attr
Added api_version

fi_domain_attr
Added cntr_cnt, mr_iov_limit, caps, mode, auth_key, auth_key_size,
max_err_data, and mr_cnt fields. The mr_mode field was also changed
from an enum to an integer flag field.

fi_ep_attr
Added auth_key_size and auth_key fields.

ABI 1.2
The 1.2 ABI version was exported by libfabric versions 1.7 and 1.8, and
expanded the following structure.

fi_info
The fi_info structure was expanded to reference a new fabric object,
fid_nic. When available, the fid_nic references a new set of attributes
related to network hardware details.

ABI 1.3
The 1.3 ABI version was exported by libfabric versions 1.9, 1.10, and
1.11. Added new fields to the following attributes:

fi_domain_attr
Added tclass

fi_tx_attr
Added tclass

ABI 1.4
The 1.4 ABI version was exported by libfabric 1.12. Added fi_tostr_r, a
thread-safe (re-entrant) version of fi_tostr.

ABI 1.5
ABI version starting with libfabric 1.13. Added new fi_open API call.

ABI 1.6
ABI version starting with libfabric 1.14. Added fi_log_ready for
providers.

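The release-to-ABI correspondence listed above can be captured as a
lookup table. The sketch below is self-contained C that does not call
libfabric; the names sketch_abi_range, sketch_abi_table, and
sketch_abi_minor are hypothetical. The upper bound of 1.17 for ABI 1.6
is an assumption taken from the version of this page, since the text
only says that ABI 1.6 starts with libfabric 1.14.

```c
#include <stddef.h>

/* One row per ABI version: the 1.x releases that exported it. */
struct sketch_abi_range {
    int first_minor; /* first libfabric 1.x minor release in the range */
    int last_minor;  /* last libfabric 1.x minor release in the range */
    int abi_minor;   /* minor number of the ABI version (ABI 1.<abi_minor>) */
};

const struct sketch_abi_range sketch_abi_table[] = {
    {  0,  4, 0 }, /* ABI 1.0: libfabric 1.0 through 1.4 */
    {  5,  6, 1 }, /* ABI 1.1: libfabric 1.5 and 1.6 */
    {  7,  8, 2 }, /* ABI 1.2: libfabric 1.7 and 1.8 */
    {  9, 11, 3 }, /* ABI 1.3: libfabric 1.9 through 1.11 */
    { 12, 12, 4 }, /* ABI 1.4: libfabric 1.12 */
    { 13, 13, 5 }, /* ABI 1.5: libfabric 1.13 */
    { 14, 17, 6 }, /* ABI 1.6: libfabric 1.14 onward (1.17 assumed as cap) */
};

/*
 * Returns the ABI minor number for a libfabric release, or -1 when the
 * release is outside the range covered by the table.
 */
int sketch_abi_minor(int release_major, int release_minor)
{
    size_t rows = sizeof(sketch_abi_table) / sizeof(sketch_abi_table[0]);

    if (release_major != 1)
        return -1;
    for (size_t i = 0; i < rows; i++) {
        if (release_minor >= sketch_abi_table[i].first_minor &&
            release_minor <= sketch_abi_table[i].last_minor)
            return sketch_abi_table[i].abi_minor;
    }
    return -1;
}
```

For example, sketch_abi_minor(1, 10) yields 3, reflecting that
libfabric 1.10 exported ABI 1.3. Releases newer than this page are
reported as unknown instead of being guessed.
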
fi_info(1), fi_provider(7), fi_getinfo(3), fi_endpoint(3),
fi_domain(3), fi_av(3), fi_eq(3), fi_cq(3), fi_cntr(3), fi_mr(3)

OpenFabrics.

Libfabric Programmer’s Manual 2022-12-11 fabric(7)