fabric(7)                      Libfabric v1.18.1                     fabric(7)

NAME
       fabric - Fabric Interface Library

SYNOPSIS
              #include <rdma/fabric.h>

       Libfabric is a high-performance fabric software library designed to
       provide low-latency interfaces to fabric hardware.  For an in-depth
       discussion of the motivation and design see fi_guide(7).

OVERVIEW
       Libfabric provides `process direct I/O' to application software
       communicating across fabric software and hardware.  Process direct
       I/O, historically referred to as RDMA, allows an application to
       directly access network resources without operating system
       intervention.  Data transfers can occur directly to and from
       application memory.

       There are two components to the libfabric software:

       Fabric Providers
              Conceptually, a fabric provider may be viewed as a local
              hardware NIC driver, though a provider is not limited by this
              definition.  The first component of libfabric is a general
              purpose framework that is capable of handling different types
              of fabric hardware.  All fabric hardware devices and their
              software drivers are required to support this framework.
              Devices and the drivers that plug into the libfabric framework
              are referred to as fabric providers, or simply providers.
              Provider details may be found in fi_provider(7).

       Fabric Interfaces
              The second component is a set of communication operations.
              Libfabric defines several sets of communication functions that
              providers can support.  It is not required that providers
              implement all the interfaces that are defined; however,
              providers clearly indicate which interfaces they do support.

FABRIC INTERFACES
       The fabric interfaces are designed such that they are cohesive and not
       simply a union of disjoint interfaces.  The interfaces are logically
       divided into two groups: control interfaces and communication
       operations.  The control interfaces are a common set of operations
       that provide access to local communication resources, such as address
       vectors and event queues.  The communication operations expose
       particular models of communication and fabric functionality, such as
       message queues, remote memory access, and atomic operations.
       Communication operations are associated with fabric endpoints.

       Applications will typically use the control interfaces to discover
       local capabilities and allocate necessary resources.  They will then
       allocate and configure a communication endpoint to send and receive
       data, or perform other types of data transfers, with remote endpoints.

CONTROL INTERFACES
       The control interface APIs provide applications with access to network
       resources.  This involves listing all the interfaces available,
       obtaining the capabilities of the interfaces, and opening a provider.

       fi_getinfo - Fabric Information
              The fi_getinfo call is the base call used to discover and
              request fabric services offered by the system.  Applications
              can use this call to indicate the type of communication that
              they desire.  The results from fi_getinfo, fi_info, are used to
              reserve and configure fabric resources.

       fi_getinfo returns a list of fi_info structures.  Each structure
       references a single fabric provider, indicating the interfaces that
       the provider supports, along with a named set of resources.  A fabric
       provider may include multiple fi_info structures in the returned list.

       fi_fabric - Fabric Domain
              A fabric domain represents a collection of hardware and
              software resources that access a single physical or virtual
              network.  All network ports on a system that can communicate
              with each other through the fabric belong to the same fabric
              domain.  A fabric domain shares network addresses and can span
              multiple providers.  libfabric supports systems connected to
              multiple fabrics.

       fi_domain - Access Domains
              An access domain represents a single logical connection into a
              fabric.  It may map to a single physical or virtual NIC or a
              port.  An access domain defines the boundary across which
              fabric resources may be associated.  Each access domain belongs
              to a single fabric domain.

       fi_endpoint - Fabric Endpoint
              A fabric endpoint is a communication portal.  An endpoint may
              be either active or passive.  Passive endpoints are used to
              listen for connection requests.  Active endpoints can perform
              data transfers.  Endpoints are configured with specific
              communication capabilities and data transfer interfaces.

       fi_eq - Event Queue
              Event queues are used to collect and report the completion of
              asynchronous operations and events.  Event queues report events
              that are not directly associated with data transfer operations.

       fi_cq - Completion Queue
              Completion queues are high-performance event queues used to
              report the completion of data transfer operations.

       fi_cntr - Event Counters
              Event counters are used to report the number of completed
              asynchronous operations.  Event counters are considered
              lightweight, in that a completion simply increments a counter,
              rather than placing an entry into an event queue.

       fi_mr - Memory Region
              Memory regions describe application local memory buffers.  In
              order for fabric resources to access application memory, the
              application must first grant permission to the fabric provider
              by constructing a memory region.  Memory regions are required
              for specific types of data transfer operations, such as RMA
              transfers (see below).

       fi_av - Address Vector
              Address vectors are used to map higher level addresses, such as
              IP addresses, which may be more natural for an application to
              use, into fabric specific addresses.  The use of address
              vectors allows providers to reduce the amount of memory
              required to maintain large address look-up tables, and
              eliminate expensive address resolution and look-up methods
              during data transfer operations.
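       The idea behind address vectors, resolving an address once at setup
       time and using a compact handle on the data path, can be sketched
       with a toy table.  This is only a conceptual model: struct toy_av,
       toy_av_insert, and toy_av_lookup are hypothetical names invented for
       illustration and are not part of the fi_av API (see fi_av(3) for the
       real interface).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of an address vector; this is NOT the libfabric
 * fi_av API.  Names and sizes are illustrative only. */
#define TOY_AV_MAX   64
#define TOY_ADDR_LEN 32

struct toy_av {
    char   addr[TOY_AV_MAX][TOY_ADDR_LEN]; /* stored addresses */
    size_t count;                          /* number of entries in use */
};

/* Insert a higher-level address once, at setup time.  Returns a small
 * integer handle, or -1 if the table is full or the name is too long.
 * A real provider would resolve the name to a fabric address here. */
static long toy_av_insert(struct toy_av *av, const char *name)
{
    if (av->count >= TOY_AV_MAX || strlen(name) >= TOY_ADDR_LEN)
        return -1;
    strcpy(av->addr[av->count], name);
    return (long)av->count++;
}

/* Data-path lookup: a bounds check and an array index, no resolution. */
static const char *toy_av_lookup(const struct toy_av *av, long handle)
{
    if (handle < 0 || (size_t)handle >= av->count)
        return NULL;
    return av->addr[handle];
}
```

       In this model the application keeps only the integer handle per peer,
       which is the memory and look-up saving the fi_av description refers
       to.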

DATA TRANSFER INTERFACES
       Fabric endpoints are associated with multiple data transfer
       interfaces.  Each interface set is designed to support a specific
       style of communication, with an endpoint allowing the different
       interfaces to be used in conjunction.  The following data transfer
       interfaces are defined by libfabric.

       fi_msg - Message Queue
              Message queues expose a simple, message-based FIFO queue
              interface to the application.  Message data transfers allow
              applications to send and receive data with message boundaries
              being maintained.

       fi_tagged - Tagged Message Queues
              Tagged message queues expose send/receive data transfer
              operations built on the concept of tagged messaging.  The
              tagged message queue is conceptually similar to standard
              message queues, but with the addition of 64-bit tags for each
              message.  Sent messages are matched with receive buffers that
              are tagged with a matching value.

       fi_rma - Remote Memory Access
              RMA transfers are one-sided operations that read or write data
              directly to a remote memory region.  Other than defining the
              appropriate memory region, RMA operations do not require
              interaction at the target side for the data transfer to
              complete.

       fi_atomic - Atomic
              Atomic operations can perform one of several operations on a
              remote memory region.  Atomic operations include well-known
              functionality, such as atomic-add and compare-and-swap, plus
              several other pre-defined calls.  Unlike other data transfer
              interfaces, atomic operations are aware of the data formatting
              at the target memory region.
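       To make the tag matching described under fi_tagged concrete, the
       following minimal sketch compares two 64-bit tags.  It assumes that a
       receive may also carry an ignore mask whose set bits act as
       wildcards, as described in fi_tagged(3); this page does not state
       that detail.  The helper name toy_tag_matches is hypothetical and is
       not a libfabric function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative helper, not a libfabric function.  Bits set in `ignore`
 * are wildcards; all remaining bits of the two tags must be equal.
 * The ignore mask is an assumption taken from fi_tagged(3). */
static bool toy_tag_matches(uint64_t send_tag, uint64_t recv_tag,
                            uint64_t ignore)
{
    return (send_tag | ignore) == (recv_tag | ignore);
}
```

       With an ignore mask of 0 the tags must be identical.  Setting the low
       bit of the mask lets a receive tagged 0x11 accept a message tagged
       0x10.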

LOGGING INTERFACE
       Logging can be controlled using the FI_LOG_LEVEL, FI_LOG_PROV, and
       FI_LOG_SUBSYS environment variables.

       FI_LOG_LEVEL
              FI_LOG_LEVEL controls the amount of logging data that is
              output.  The following log levels are defined.

       - Warn Warn is the least verbose setting and is intended for reporting
              errors or warnings.

       - Trace
              Trace is more verbose and is meant to include non-detailed
              output helpful to tracing program execution.

       - Info Info is high traffic and meant for detailed output.

       - Debug
              Debug is high traffic and is likely to impact application
              performance.  Debug output is only available if the library
              has been compiled with debugging enabled.

       FI_LOG_PROV
              The FI_LOG_PROV environment variable enables or disables
              logging from specific providers.  Providers can be enabled by
              listing them in a comma separated fashion.  If the list begins
              with the `^' symbol, then the list will be negated.  By default
              all providers are enabled.

       Example: To enable logging from the psm and sockets providers:
       FI_LOG_PROV="psm,sockets"

       Example: To enable logging from providers other than psm:
       FI_LOG_PROV="^psm"

       FI_LOG_SUBSYS
              The FI_LOG_SUBSYS environment variable enables or disables
              logging at the subsystem level.  The syntax for enabling or
              disabling subsystems is similar to that used for FI_LOG_PROV.
              The following subsystems are defined.

       - core Provides output related to the core framework and its
              management of providers.

       - fabric
              Provides output specific to interactions associated with the
              fabric object.

       - domain
              Provides output specific to interactions associated with the
              domain object.

       - ep_ctrl
              Provides output specific to endpoint non-data transfer
              operations, such as CM operations.

       - ep_data
              Provides output specific to endpoint data transfer operations.

       - av   Provides output specific to address vector operations.

       - cq   Provides output specific to completion queue operations.

       - eq   Provides output specific to event queue operations.

       - mr   Provides output specific to memory registration.
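       The list syntax described for FI_LOG_PROV and FI_LOG_SUBSYS (a comma
       separated list, optionally negated by a leading `^') can be sketched
       as a small predicate.  This is an illustration of the documented
       semantics, not the library's implementation: the function name
       toy_filter_allows is hypothetical, and exact, case-sensitive name
       matching is an assumption.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of libfabric: returns true if `name`
 * is enabled by a filter such as "psm,sockets" or "^psm".
 * A NULL or empty filter enables everything (the documented default).
 * Exact, case-sensitive matching is an assumption of this sketch. */
static bool toy_filter_allows(const char *filter, const char *name)
{
    bool negated, listed = false;
    size_t name_len = strlen(name);
    const char *p;

    if (filter == NULL || *filter == '\0')
        return true;

    negated = (*filter == '^');
    p = negated ? filter + 1 : filter;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");      /* length of this list entry */
        if (len == name_len && strncmp(p, name, len) == 0)
            listed = true;
        p += len;
        if (*p == ',')
            p++;                           /* skip the separator */
    }
    /* Plain list: enabled if listed.  Negated list: enabled if absent. */
    return negated ? !listed : listed;
}
```

       For the two examples above, the filter "psm,sockets" allows psm and
       sockets only, while "^psm" allows every name except psm.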

PROVIDER INSTALLATION AND SELECTION
       The libfabric build scripts will install all providers that are
       supported by the installation system.  Providers that are missing
       build prerequisites will be disabled.  Installed providers will
       dynamically check for necessary hardware on library initialization
       and respond appropriately to application queries.

       Users can enable or disable available providers through build
       configuration options.  See `configure --help' for details.  In
       general, a specific provider can be controlled using a configure
       option of the form `--enable-' followed by the provider name.  For
       example, `--enable-udp' (or `--enable-udp=yes') will add the udp
       provider to the build.  To disable the provider, `--enable-udp=no'
       can be used.

       Providers can also be enabled or disabled at run time using the
       FI_PROVIDER environment variable.  The FI_PROVIDER variable is set to
       a comma separated list of providers to include.  If the list begins
       with the `^' symbol, then the list will be negated.

       Example: To enable the udp and tcp providers only, set:
       FI_PROVIDER="udp,tcp"

       The fi_info utility, which is included as part of the libfabric
       package, can be used to retrieve information about which providers
       are available in the system.  Additionally, it can retrieve a list of
       all environment variables that may be used to configure libfabric and
       each provider.  See fi_info(1) for more details.

ENVIRONMENT VARIABLE CONTROLS
       Core features of libfabric and its providers may be configured by an
       administrator through the use of environment variables.  Man pages
       will usually describe the most commonly accessed variables, such as
       those mentioned above.  However, libfabric defines interfaces for
       publishing and obtaining environment variables.  These are targeted
       for providers, but allow applications and users to obtain the full
       list of variables that may be set, along with a brief description of
       their use.

       A full list of variables available may be obtained by running the
       fi_info application, with the -e or --env command line option.

NOTES
   System Calls
       Because libfabric is designed to provide applications direct access to
       fabric hardware, there are limits on how libfabric resources may be
       used in conjunction with system calls.  These limitations are notable
       for developers who may be familiar with programming to the sockets
       interface.  Although limits are provider specific, the following
       restrictions apply to many providers and should be adhered to by
       applications desiring portability across providers.

       fork   Fabric resources are not guaranteed to be available to child
              processes.  This includes objects, such as endpoints and
              completion queues, as well as application controlled data
              buffers which have been assigned to the network.  For example,
              data buffers that have been registered with a fabric domain
              may not be available in a child process because of copy on
              write restrictions.

   CUDA deadlock
       In some cases, calls to cudaMemcpy within libfabric may result in a
       deadlock.  This typically occurs when a CUDA kernel blocks until a
       cudaMemcpy on the host completes.  To avoid this deadlock, cudaMemcpy
       may be disabled by setting FI_HMEM_CUDA_ENABLE_XFER=0.  If this
       environment variable is set and there is a call to cudaMemcpy within
       libfabric, a warning will be emitted and no copy will occur.  Note
       that not all providers support this option.

       Another mechanism which can be used to avoid deadlock is Nvidia’s
       gdrcopy.  Using gdrcopy requires an external library and kernel module
       available at https://github.com/NVIDIA/gdrcopy.  Libfabric must be
       configured with gdrcopy support using the --with-gdrcopy option, and
       be run with FI_HMEM_CUDA_USE_GDRCOPY=1.  This may be used in
       conjunction with the above option to provide a method for copying
       to/from CUDA device memory when cudaMemcpy cannot be used.  Again,
       this may not be supported by all providers.

ABI CHANGES
       libfabric releases maintain compatibility with older releases, so that
       compiled applications can continue to work as-is, and previously
       written applications will compile against newer versions of the
       library without needing source code changes.  The changes below
       describe ABI updates that have occurred and which libfabric release
       corresponds to the changes.

       Note that because most functions called by applications actually call
       static inline functions, which in turn reference function pointers in
       order to call directly into providers, libfabric only exports a
       handful of functions directly.  ABI changes are limited to those
       functions, most notably the fi_getinfo call and its returned attribute
       structures.

       The ABI version is independent from the libfabric release version.

   ABI 1.0
       The initial libfabric release (1.0.0) also corresponds to ABI version
       1.0.  The 1.0 ABI was unchanged for libfabric major.minor versions
       1.0, 1.1, 1.2, 1.3, and 1.4.

   ABI 1.1
       A number of external data structures were appended starting with
       libfabric version 1.5.  These changes included adding fields to the
       following data structures.  The 1.1 ABI was exported by libfabric
       versions 1.5 and 1.6.

       fi_fabric_attr
              Added api_version

       fi_domain_attr
              Added cntr_cnt, mr_iov_limit, caps, mode, auth_key,
              auth_key_size, max_err_data, and mr_cnt fields.  The mr_mode
              field was also changed from an enum to an integer flag field.

       fi_ep_attr
              Added auth_key_size and auth_key fields.

   ABI 1.2
       The 1.2 ABI version was exported by libfabric versions 1.7 and 1.8,
       and expanded the following structure.

       fi_info
              The fi_info structure was expanded to reference a new fabric
              object, fid_nic.  When available, the fid_nic references a new
              set of attributes related to network hardware details.

   ABI 1.3
       The 1.3 ABI version was exported by libfabric versions 1.9, 1.10, and
       1.11.  Added new fields to the following attributes:

       fi_domain_attr
              Added tclass

       fi_tx_attr
              Added tclass

   ABI 1.4
       The 1.4 ABI version was exported by libfabric 1.12.  Added
       fi_tostr_r, a thread-safe (re-entrant) version of fi_tostr.

   ABI 1.5
       ABI version starting with libfabric 1.13.  Added new fi_open API call.

   ABI 1.6
       ABI version starting with libfabric 1.14.  Added fi_log_ready for
       providers.
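       The release-to-ABI mapping listed above can be summarized as a small
       lookup.  This is only a restatement of the list in code form: the
       function name toy_abi_minor_for_release is hypothetical and not a
       libfabric API, and releases newer than those recorded on this page
       may define later ABI versions.

```c
/* Illustrative lookup, not a libfabric API.  Returns the ABI minor
 * version (the N in "ABI 1.N") for a libfabric 1.<minor> release,
 * following the list above, or -1 for a negative minor number. */
static int toy_abi_minor_for_release(int minor)
{
    if (minor < 0)   return -1;
    if (minor <= 4)  return 0;   /* 1.0 through 1.4  */
    if (minor <= 6)  return 1;   /* 1.5 and 1.6      */
    if (minor <= 8)  return 2;   /* 1.7 and 1.8      */
    if (minor <= 11) return 3;   /* 1.9 through 1.11 */
    if (minor == 12) return 4;   /* 1.12             */
    if (minor == 13) return 5;   /* 1.13             */
    return 6;                    /* 1.14 and later, as far as this page records */
}
```

       For example, a release of 1.10 maps to ABI 1.3, and the release this
       page documents (1.18.1) maps to ABI 1.6.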

SEE ALSO
       fi_info(1), fi_provider(7), fi_getinfo(3), fi_endpoint(3),
       fi_domain(3), fi_av(3), fi_eq(3), fi_cq(3), fi_cntr(3), fi_mr(3)

AUTHORS
       OpenFabrics.

Libfabric Programmer’s Manual     2023-01-02                         fabric(7)