fabric(7)                      Libfabric v1.15.1                     fabric(7)

NAME

       fabric - Fabric Interface Library

SYNOPSIS

              #include <rdma/fabric.h>

       Libfabric is a high-performance fabric software library designed to
       provide low-latency interfaces to fabric hardware.

OVERVIEW

       Libfabric provides `process direct I/O' to application software
       communicating across fabric software and hardware.  Process direct I/O,
       historically referred to as RDMA, allows an application to directly
       access network resources without operating system intervention.  Data
       transfers can occur directly to and from application memory.

       There are two components to the libfabric software:

       Fabric Providers
              Conceptually, a fabric provider may be viewed as a local hardware
              NIC driver, though a provider is not limited by this definition.
              The first component of libfabric is a general purpose framework
              that is capable of handling different types of fabric hardware.
              All fabric hardware devices and their software drivers are
              required to support this framework.  Devices and the drivers that
              plug into the libfabric framework are referred to as fabric
              providers, or simply providers.  Provider details may be found in
              fi_provider(7).

       Fabric Interfaces
              The second component is a set of communication operations.
              Libfabric defines several sets of communication functions that
              providers can support.  It is not required that providers
              implement all the interfaces that are defined; however, providers
              clearly indicate which interfaces they do support.

FABRIC INTERFACES

       The fabric interfaces are designed such that they are cohesive and not
       simply a union of disjoint interfaces.  The interfaces are logically
       divided into two groups: control interfaces and communication
       operations.  The control interfaces are a common set of operations that
       provide access to local communication resources, such as address vectors
       and event queues.  The communication operations expose particular models
       of communication and fabric functionality, such as message queues,
       remote memory access, and atomic operations.  Communication operations
       are associated with fabric endpoints.

       Applications will typically use the control interfaces to discover local
       capabilities and allocate necessary resources.  They will then allocate
       and configure a communication endpoint to send and receive data, or
       perform other types of data transfers, with remote endpoints.

CONTROL INTERFACES

       The control interface APIs provide applications access to network
       resources.  This involves listing all the interfaces available,
       obtaining the capabilities of the interfaces, and opening a provider.

       fi_getinfo - Fabric Information
              The fi_getinfo call is the base call used to discover and request
              fabric services offered by the system.  Applications can use this
              call to indicate the type of communication that they desire.  The
              results from fi_getinfo, returned as fi_info structures, are used
              to reserve and configure fabric resources.

       fi_getinfo returns a list of fi_info structures.  Each structure
       references a single fabric provider, indicating the interfaces that the
       provider supports, along with a named set of resources.  A fabric
       provider may include multiple fi_info structures in the returned list.

       fi_fabric - Fabric Domain
              A fabric domain represents a collection of hardware and software
              resources that access a single physical or virtual network.  All
              network ports on a system that can communicate with each other
              through the fabric belong to the same fabric domain.  A fabric
              domain shares network addresses and can span multiple providers.
              Libfabric supports systems connected to multiple fabrics.

       fi_domain - Access Domains
              An access domain represents a single logical connection into a
              fabric.  It may map to a single physical or virtual NIC or a
              port.  An access domain defines the boundary across which fabric
              resources may be associated.  Each access domain belongs to a
              single fabric domain.

       fi_endpoint - Fabric Endpoint
              A fabric endpoint is a communication portal.  An endpoint may be
              either active or passive.  Passive endpoints are used to listen
              for connection requests.  Active endpoints can perform data
              transfers.  Endpoints are configured with specific communication
              capabilities and data transfer interfaces.

       fi_eq - Event Queue
              Event queues are used to collect and report the completion of
              asynchronous operations and events.  Event queues report events
              that are not directly associated with data transfer operations.

       fi_cq - Completion Queue
              Completion queues are high-performance event queues used to
              report the completion of data transfer operations.

       fi_cntr - Event Counters
              Event counters are used to report the number of completed
              asynchronous operations.  Event counters are considered
              light-weight, in that a completion simply increments a counter,
              rather than placing an entry into an event queue.

       fi_mr - Memory Region
              Memory regions describe application local memory buffers.  In
              order for fabric resources to access application memory, the
              application must first grant permission to the fabric provider by
              constructing a memory region.  Memory regions are required for
              specific types of data transfer operations, such as RMA transfers
              (see below).

       fi_av - Address Vector
              Address vectors are used to map higher level addresses, such as
              IP addresses, which may be more natural for an application to
              use, into fabric specific addresses.  The use of address vectors
              allows providers to reduce the amount of memory required to
              maintain large address look-up tables, and eliminate expensive
              address resolution and look-up methods during data transfer
              operations.

DATA TRANSFER INTERFACES

       Fabric endpoints are associated with multiple data transfer interfaces.
       Each interface set is designed to support a specific style of
       communication, with an endpoint allowing the different interfaces to be
       used in conjunction.  The following data transfer interfaces are defined
       by libfabric.

       fi_msg - Message Queue
              Message queues expose a simple, message-based FIFO queue
              interface to the application.  Message data transfers allow
              applications to send and receive data with message boundaries
              being maintained.

       fi_tagged - Tagged Message Queues
              Tagged message queues expose send/receive data transfer
              operations built on the concept of tagged messaging.  The tagged
              message queue is conceptually similar to standard message queues,
              but with the addition of 64-bit tags for each message.  Sent
              messages are matched with receive buffers that are tagged with a
              similar value.

       fi_rma - Remote Memory Access
              RMA transfers are one-sided operations that read or write data
              directly to a remote memory region.  Other than defining the
              appropriate memory region, RMA operations do not require
              interaction at the target side for the data transfer to complete.

       fi_atomic - Atomic
              Atomic operations can perform one of several operations on a
              remote memory region.  Atomic operations include well-known
              functionality, such as atomic-add and compare-and-swap, plus
              several other pre-defined calls.  Unlike other data transfer
              interfaces, atomic operations are aware of the data formatting at
              the target memory region.
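
       The fi_tagged entry above says that sent messages are matched with
       receive buffers tagged with a "similar" value.  The sketch below shows
       one way to read that, assuming the tag plus ignore-mask rule described
       in fi_tagged(3): bits set in the ignore mask are left out of the
       comparison.  The helper name tag_matches is hypothetical and is not
       part of libfabric.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a libfabric function.
 * Returns true when a sent 64-bit tag matches a posted receive tag.
 * Bits that are set in `ignore` do not take part in the comparison
 * (assumed matching rule, see fi_tagged(3)).
 */
bool tag_matches(uint64_t send_tag, uint64_t recv_tag, uint64_t ignore)
{
    return (send_tag | ignore) == (recv_tag | ignore);
}
```

       With an ignore mask of 0 the tags must be identical; a mask of 0x0F
       would let any value in the low four bits match.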
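
       As a local analogy for the atomic-add and compare-and-swap operations
       named under fi_atomic, the sketch below uses C11 <stdatomic.h> on
       ordinary process memory.  It does not call libfabric; the function
       names are hypothetical.  It only illustrates the semantics, and why the
       operation has to know the data type (here a 64-bit unsigned integer)
       of the target.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-ins for fabric atomics (hypothetical names).
 * `target` plays the role of the remote memory region.
 */

/* Atomically adds `operand` to *target and returns the previous value. */
uint64_t local_atomic_add(_Atomic uint64_t *target, uint64_t operand)
{
    return atomic_fetch_add(target, operand);
}

/*
 * Atomically stores `desired` into *target only if *target currently
 * equals `expected`.  Returns true when the swap took place.
 */
bool local_compare_and_swap(_Atomic uint64_t *target,
                            uint64_t expected, uint64_t desired)
{
    return atomic_compare_exchange_strong(target, &expected, desired);
}
```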

LOGGING INTERFACE

       Logging can be controlled using the FI_LOG_LEVEL, FI_LOG_PROV, and
       FI_LOG_SUBSYS environment variables.

       FI_LOG_LEVEL
              FI_LOG_LEVEL controls the amount of logging data that is output.
              The following log levels are defined.

       - Warn Warn is the least verbose setting and is intended for reporting
              errors or warnings.

       - Trace
              Trace is more verbose and is meant to include non-detailed output
              helpful to tracing program execution.

       - Info Info is high traffic and meant for detailed output.

       - Debug
              Debug is high traffic and is likely to impact application
              performance.  Debug output is only available if the library has
              been compiled with debugging enabled.

       FI_LOG_PROV
              The FI_LOG_PROV environment variable enables or disables logging
              from specific providers.  Providers can be enabled by listing
              them in a comma separated fashion.  If the list begins with the
              `^' symbol, then the list will be negated.  By default all
              providers are enabled.

       Example: To enable logging from the psm and sockets providers:
       FI_LOG_PROV="psm,sockets"

       Example: To enable logging from providers other than psm:
       FI_LOG_PROV="^psm"

       FI_LOG_SUBSYS
              The FI_LOG_SUBSYS environment variable enables or disables
              logging at the subsystem level.  The syntax for enabling or
              disabling subsystems is similar to that used for FI_LOG_PROV.
              The following subsystems are defined.

       - core Provides output related to the core framework and its management
              of providers.

       - fabric
              Provides output specific to interactions associated with the
              fabric object.

       - domain
              Provides output specific to interactions associated with the
              domain object.

       - ep_ctrl
              Provides output specific to endpoint non-data transfer
              operations, such as CM operations.

       - ep_data
              Provides output specific to endpoint data transfer operations.

       - av   Provides output specific to address vector operations.

       - cq   Provides output specific to completion queue operations.

       - eq   Provides output specific to event queue operations.

       - mr   Provides output specific to memory registration.
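
       The logging variables above are normally set in the shell, but an
       application can also set them itself.  The sketch below uses the POSIX
       setenv call; it assumes that libfabric reads these variables when the
       library initializes, so the helper would have to run before the first
       libfabric call.  The helper name and the lowercase spelling of the
       level value in the usage note are assumptions.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not a libfabric function.
 * Sets FI_LOG_LEVEL and, when non-NULL, FI_LOG_PROV and FI_LOG_SUBSYS.
 * Assumption: must be called before the first libfabric call so that
 * the library sees the values when it initializes.
 * Returns 0 on success and -1 if setenv fails.
 */
int configure_fabric_logging(const char *level,
                             const char *providers,
                             const char *subsystems)
{
    if (setenv("FI_LOG_LEVEL", level, 1) != 0)
        return -1;
    if (providers != NULL && setenv("FI_LOG_PROV", providers, 1) != 0)
        return -1;
    if (subsystems != NULL && setenv("FI_LOG_SUBSYS", subsystems, 1) != 0)
        return -1;
    return 0;
}
```

       For example, configure_fabric_logging("warn", "^psm", "core,ep_data")
       would request warning-level output from the core and ep_data
       subsystems of every provider other than psm.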

PROVIDER INSTALLATION AND SELECTION

       The libfabric build scripts will install all providers that are
       supported by the installation system.  Providers that are missing build
       prerequisites will be disabled.  Installed providers will dynamically
       check for necessary hardware on library initialization and respond
       appropriately to application queries.

       Users can enable or disable available providers through build
       configuration options.  See `configure --help' for details.  In general,
       a specific provider can be controlled using the configure option
       `--enable-<provider>'.  For example, `--enable-udp' (or
       `--enable-udp=yes') will add the udp provider to the build.  To disable
       the provider, `--enable-udp=no' can be used.

       Providers can also be enabled or disabled at run time using the
       FI_PROVIDER environment variable.  The FI_PROVIDER variable is set to a
       comma separated list of providers to include.  If the list begins with
       the `^' symbol, then the list will be negated.

       Example: To enable the udp and tcp providers only, set:
       FI_PROVIDER="udp,tcp"

       The fi_info utility, which is included as part of the libfabric package,
       can be used to retrieve information about which providers are available
       in the system.  Additionally, it can retrieve a list of all environment
       variables that may be used to configure libfabric and each provider.
       See fi_info(1) for more details.
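
       The FI_PROVIDER list syntax described above (a comma separated list,
       negated when it begins with `^') can be modeled with a small function.
       This is a sketch of the rule as stated on this page, not libfabric's
       own parser; the name provider_enabled is hypothetical, and treating an
       unset or empty filter as "everything enabled" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a libfabric function.
 * Returns true if provider `name` passes `filter`, where `filter` is a
 * comma separated list of provider names that is negated when it
 * starts with '^'.  A NULL or empty filter enables every provider
 * (assumption).
 */
bool provider_enabled(const char *filter, const char *name)
{
    bool negated = false;
    bool listed = false;
    size_t name_len = strlen(name);

    if (filter == NULL || *filter == '\0')
        return true;

    if (*filter == '^') {
        negated = true;
        filter++;
    }

    /* Walk the list one comma separated entry at a time. */
    while (*filter != '\0') {
        size_t len = strcspn(filter, ",");

        if (len == name_len && strncmp(filter, name, len) == 0)
            listed = true;

        filter += len;
        if (*filter == ',')
            filter++;
    }

    return negated ? !listed : listed;
}
```

       With the filter "udp,tcp" only udp and tcp pass; with "^psm" every
       provider except psm passes.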

ENVIRONMENT VARIABLE CONTROLS

       Core features of libfabric and its providers may be configured by an
       administrator through the use of environment variables.  Man pages will
       usually describe the most commonly accessed variables, such as those
       mentioned above.  However, libfabric defines interfaces for publishing
       and obtaining environment variables.  These are targeted for providers,
       but allow applications and users to obtain the full list of variables
       that may be set, along with a brief description of their use.

       A full list of variables available may be obtained by running the
       fi_info application, with the -e or --env command line option.

NOTES

   System Calls
       Because libfabric is designed to provide applications direct access to
       fabric hardware, there are limits on how libfabric resources may be used
       in conjunction with system calls.  These limitations are notable for
       developers who may be familiar with programming to the sockets
       interface.  Although limits are provider specific, the following
       restrictions apply to many providers and should be adhered to by
       applications desiring portability across providers.

       fork   Fabric resources are not guaranteed to be available to child
              processes.  This includes objects, such as endpoints and
              completion queues, as well as application controlled data buffers
              which have been assigned to the network.  For example, data
              buffers that have been registered with a fabric domain may not be
              available in a child process because of copy on write
              restrictions.

   CUDA deadlock
       In some cases, calls to cudaMemcpy within libfabric may result in a
       deadlock.  This typically occurs when a CUDA kernel blocks until a
       cudaMemcpy on the host completes.  To avoid this deadlock, cudaMemcpy
       may be disabled by setting FI_HMEM_CUDA_ENABLE_XFER=0.  If this
       environment variable is set and there is a call to cudaMemcpy within
       libfabric, a warning will be emitted and no copy will occur.  Note that
       not all providers support this option.

       Another mechanism which can be used to avoid deadlock is Nvidia’s
       gdrcopy.  Using gdrcopy requires an external library and kernel module
       available at https://github.com/NVIDIA/gdrcopy.  Libfabric must be
       configured with gdrcopy support using the --with-gdrcopy option, and be
       run with FI_HMEM_CUDA_USE_GDRCOPY=1.  This may be used in conjunction
       with the above option to provide a method for copying to/from CUDA
       device memory when cudaMemcpy cannot be used.  Again, this may not be
       supported by all providers.

ABI CHANGES

       libfabric releases maintain compatibility with older releases, so that
       compiled applications can continue to work as-is, and previously written
       applications will compile against newer versions of the library without
       needing source code changes.  The changes below describe ABI updates
       that have occurred and which libfabric release corresponds to the
       changes.

       Note that because most functions called by applications actually call
       static inline functions, which in turn reference function pointers in
       order to call directly into providers, libfabric only exports a handful
       of functions directly.  ABI changes are limited to those functions, most
       notably the fi_getinfo call and its returned attribute structures.

       The ABI version is independent from the libfabric release version.

   ABI 1.0
       The initial libfabric release (1.0.0) also corresponds to ABI version
       1.0.  The 1.0 ABI was unchanged for libfabric major.minor versions 1.0,
       1.1, 1.2, 1.3, and 1.4.

   ABI 1.1
       A number of external data structures were extended starting with
       libfabric version 1.5.  These changes included adding fields to the
       following data structures.  The 1.1 ABI was exported by libfabric
       versions 1.5 and 1.6.

       fi_fabric_attr
              Added api_version

       fi_domain_attr
              Added cntr_cnt, mr_iov_limit, caps, mode, auth_key,
              auth_key_size, max_err_data, and mr_cnt fields.  The mr_mode
              field was also changed from an enum to an integer flag field.

       fi_ep_attr
              Added auth_key_size and auth_key fields.

   ABI 1.2
       The 1.2 ABI version was exported by libfabric versions 1.7 and 1.8, and
       expanded the following structure.

       fi_info
              The fi_info structure was expanded to reference a new fabric
              object, fid_nic.  When available, the fid_nic references a new
              set of attributes related to network hardware details.

   ABI 1.3
       The 1.3 ABI version was exported by libfabric versions 1.9, 1.10, and
       1.11.  Added new fields to the following attributes:

       fi_domain_attr
              Added tclass

       fi_tx_attr
              Added tclass

   ABI 1.4
       The 1.4 ABI version was exported by libfabric 1.12.  Added fi_tostr_r,
       a thread-safe (re-entrant) version of fi_tostr.

   ABI 1.5
       The 1.5 ABI version starts with libfabric 1.13.  Added new fi_open API
       call.

   ABI 1.6
       The 1.6 ABI version starts with libfabric 1.14.  Added fi_log_ready for
       providers.
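
       The release-to-ABI correspondence listed above can be summarized as a
       lookup.  The function below is a hypothetical helper, not part of
       libfabric; it encodes only what this page states, and it assumes that
       libfabric 1.15 (the release this page documents) still exports ABI 1.6
       because no newer ABI is listed here.

```c
/*
 * Hypothetical helper, not a libfabric function.
 * Given the minor number of a libfabric 1.<minor> release, returns the
 * minor number of the ABI 1.<n> that the release exports according to
 * this page, or -1 for releases the page does not cover.
 */
int abi_minor_for_release(int minor)
{
    if (minor < 0 || minor > 15)
        return -1;      /* not described on this page */
    if (minor <= 4)
        return 0;       /* libfabric 1.0 - 1.4  -> ABI 1.0 */
    if (minor <= 6)
        return 1;       /* libfabric 1.5 - 1.6  -> ABI 1.1 */
    if (minor <= 8)
        return 2;       /* libfabric 1.7 - 1.8  -> ABI 1.2 */
    if (minor <= 11)
        return 3;       /* libfabric 1.9 - 1.11 -> ABI 1.3 */
    if (minor == 12)
        return 4;       /* libfabric 1.12       -> ABI 1.4 */
    if (minor == 13)
        return 5;       /* libfabric 1.13       -> ABI 1.5 */
    return 6;           /* libfabric 1.14, and 1.15 by assumption */
}
```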

SEE ALSO

       fi_info(1), fi_provider(7), fi_getinfo(3), fi_endpoint(3),
       fi_domain(3), fi_av(3), fi_eq(3), fi_cq(3), fi_cntr(3), fi_mr(3)

AUTHORS

       OpenFabrics.

Libfabric Programmer’s Manual     2021-09-22                         fabric(7)