fi_av(3) Libfabric v1.17.0 fi_av(3)

NAME
fi_av - Address vector operations

fi_av_open / fi_close
Open or close an address vector.

fi_av_bind
Associate an address vector with an event queue.

fi_av_insert / fi_av_insertsvc / fi_av_remove
Insert/remove an address into/from the address vector.

fi_av_lookup
Retrieve an address stored in the address vector.

fi_av_straddr
Convert an address into a printable string.

SYNOPSIS
#include <rdma/fi_domain.h>

int fi_av_open(struct fid_domain *domain, struct fi_av_attr *attr,
    struct fid_av **av, void *context);

int fi_close(struct fid *av);

int fi_av_bind(struct fid_av *av, struct fid *eq, uint64_t flags);

int fi_av_insert(struct fid_av *av, void *addr, size_t count,
    fi_addr_t *fi_addr, uint64_t flags, void *context);

int fi_av_insertsvc(struct fid_av *av, const char *node,
    const char *service, fi_addr_t *fi_addr, uint64_t flags,
    void *context);

int fi_av_insertsym(struct fid_av *av, const char *node,
    size_t nodecnt, const char *service, size_t svccnt,
    fi_addr_t *fi_addr, uint64_t flags, void *context);

int fi_av_remove(struct fid_av *av, fi_addr_t *fi_addr, size_t count,
    uint64_t flags);

int fi_av_lookup(struct fid_av *av, fi_addr_t fi_addr,
    void *addr, size_t *addrlen);

fi_addr_t fi_rx_addr(fi_addr_t fi_addr, int rx_index,
    int rx_ctx_bits);

const char * fi_av_straddr(struct fid_av *av, const void *addr,
    char *buf, size_t *len);

ARGUMENTS
domain Resource domain

av Address vector

eq Event queue

attr Address vector attributes

context
User specified context associated with the address vector or insert
operation.

addr Buffer containing one or more addresses to insert into the address
vector.

addrlen
On input, specifies the size of the addr buffer. On output, stores the
size of the buffer needed to hold the address, which may be larger than
the input value.

fi_addr
For insert, a reference to an array where returned fabric addresses
will be written. For remove, one or more fabric addresses to remove.
If FI_AV_USER_ID is requested, also used as input into insert calls to
assign the user ID with the added address.

count Number of addresses to insert/remove from an AV.

flags Additional flags to apply to the operation.

DESCRIPTION
Address vectors are used to map higher-level addresses, which may be
more natural for an application to use, into fabric-specific addresses.
For example, an endpoint may be associated with a struct sockaddr_in
address, indicating the endpoint is reachable using a TCP port number
over an IPv4 address. This may hold even if the endpoint communicates
using a proprietary network protocol. The purpose of the AV is to
associate a higher-level address with a simpler, more efficient value
that can be used by the libfabric API in a fabric-agnostic way. The
mapped address is of type fi_addr_t and is returned through an AV
insertion call. The fi_addr_t is designed such that it may be a simple
index into an array, a pointer to a structure, or a compact network
address that may be placed directly into protocol headers.

The process of mapping an address is fabric and provider specific, but
may involve lengthy address resolution and fabric management protocols.
AV operations are synchronous by default, but may be set to operate
asynchronously by specifying the FI_EVENT flag to fi_av_open. When
requesting asynchronous operation, the application must first bind an
event queue to the AV before inserting addresses. See the NOTES
section for AV restrictions on duplicate addresses.

fi_av_open
fi_av_open allocates or opens an address vector. The properties and
behavior of the address vector are defined by struct fi_av_attr.

struct fi_av_attr {
    enum fi_av_type  type;        /* type of AV */
    int              rx_ctx_bits; /* address bits to identify rx ctx */
    size_t           count;       /* # entries for AV */
    size_t           ep_per_node; /* # endpoints per fabric address */
    const char       *name;       /* system name of AV */
    void             *map_addr;   /* base mmap address */
    uint64_t         flags;       /* operation flags */
};

type An AV type corresponds to a conceptual implementation of an
address vector. The type specifies how an application views data
stored in the AV, including how it may be accessed. Valid values are:

- FI_AV_MAP
Addresses which are inserted into an AV are mapped to a native fabric
address for use by the application. The use of FI_AV_MAP requires that
an application store the returned fi_addr_t value that is associated
with each inserted address. The advantage of using FI_AV_MAP is that
the returned fi_addr_t value may contain encoded address data, which is
immediately available when processing data transfer requests. This can
eliminate or reduce the number of memory lookups needed when initiating
a transfer. The disadvantage of FI_AV_MAP is the increase in memory
usage needed to store the returned addresses. Addresses are stored in
the AV using a provider specific mechanism, such as a tree, a hash
table, or a structure maintained on the heap.

- FI_AV_TABLE
Addresses which are inserted into an AV of type FI_AV_TABLE are
accessible using a simple index. Conceptually, the AV may be treated
as an array of addresses, though the provider may implement the AV
using a variety of mechanisms. When FI_AV_TABLE is used, the returned
fi_addr_t is an index, with the index for an inserted address the same
as its insertion order into the table. The index of the first address
inserted into an FI_AV_TABLE will be 0, and successive insertions will
be given sequential indices. Sequential indices will be assigned
across insertion calls on the same AV.

- FI_AV_UNSPEC
Provider will choose its preferred AV type. The AV type used will be
returned through the type field in fi_av_attr.

Receive Context Bits (rx_ctx_bits)
The receive context bits field is only for use with scalable endpoints.
It indicates the number of bits reserved in a returned fi_addr_t, which
will be used to identify a specific target receive context. See
fi_rx_addr() and fi_endpoint(3) for additional details on receive
contexts. The requested number of bits should be selected such that
2 ^ rx_ctx_bits >= rx_ctx_cnt for the endpoint.
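
The sizing rule above (2 ^ rx_ctx_bits >= rx_ctx_cnt) can be sketched
as a small standalone helper. The name min_rx_ctx_bits is hypothetical
and is not part of libfabric; the function only computes a candidate
value for the rx_ctx_bits attribute from a receive context count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a libfabric call: return the smallest bit
 * count b such that 2^b >= rx_ctx_cnt. */
int min_rx_ctx_bits(uint64_t rx_ctx_cnt)
{
    int bits = 0;

    assert(rx_ctx_cnt > 0);                     /* need at least one context */
    assert(rx_ctx_cnt <= (UINT64_C(1) << 62));  /* keeps the shift below in range */

    while ((UINT64_C(1) << bits) < rx_ctx_cnt)
        bits++;
    return bits;
}
```

For example, an endpoint with 5 receive contexts would need 3 bits,
since 2 ^ 2 = 4 is too small and 2 ^ 3 = 8 is sufficient.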

count Indicates the expected number of addresses that will be inserted
into the AV. The provider uses this to optimize resource allocations.

ep_per_node
This field indicates the number of endpoints that will be associated
with a specific fabric, or network, address. If the number of
endpoints per node is unknown, this value should be set to 0. The
provider uses this value to optimize resource allocations. For
example, distributed, parallel applications may set this to the number
of processes allocated per node, times the number of endpoints each
process will open.

name An optional system name associated with the address vector to
create or open. Address vectors may be shared across multiple
processes which access the same named domain on the same node. The
name field allows the underlying provider to identify a shared AV.

If the name field is non-NULL and the AV is not opened for read-only
access, a named AV will be created, if it does not already exist.

map_addr
The map_addr determines the base fi_addr_t address that a provider
should use when sharing an AV of type FI_AV_MAP between processes.
Processes that provide the same value for map_addr to a shared AV may
use the same fi_addr_t values returned from an fi_av_insert call.

The map_addr may be used by the provider to mmap memory allocated for a
shared AV between processes; however, the provider is not required to
use the map_addr in this fashion. The only requirement is that an
fi_addr_t returned as part of an fi_av_insert call on one process is
usable on another process which opens an AV of the same name at the
same map_addr value. The relationship between the map_addr and any
returned fi_addr_t is not defined.

If name is non-NULL and map_addr is 0, then the map_addr used by the
provider will be returned through the attribute structure. The
map_addr field is ignored if name is NULL.

flags The following flags may be used when opening an AV.

- FI_EVENT
When the flag FI_EVENT is specified, all insert operations on this AV
will occur asynchronously. There will be one EQ error entry generated
for each failed address insertion, followed by one non-error event
indicating that the insertion operation has completed. There will
always be one non-error completion event for each insert operation,
even if all addresses fail. The context field in all completions will
be the context specified to the insert call, and the data field in the
final completion entry will report the number of addresses successfully
inserted. For each address that fails to insert into the AV, an error
completion entry is returned (see fi_eq(3) for a discussion of the
fi_eq_err_entry error completion struct). The context field of the
error completion will be the context that was specified in the insert
call; the data field will contain the index of the failed address.

If an AV is opened with FI_EVENT, any insertions attempted before an EQ
is bound to the AV will fail with -FI_ENOEQ.

Note that the order of delivery of insert completions may not match the
order in which the calls to fi_av_insert were made. The only guarantee
is that all error completions for a given call to fi_av_insert will
precede the single associated non-error completion.

- FI_READ
Opens an AV for read-only access. An AV opened for read-only access
must be named (name attribute specified), and the AV must exist.

- FI_SYMMETRIC
Indicates that each node will be associated with the same number of
endpoints, the same transport addresses will be allocated on each node,
and the transport addresses will be sequential. This feature targets
distributed applications on large fabrics and allows for
highly-optimized storage of remote endpoint addressing.
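
The FI_EVENT completion sequence described above (zero or more error
entries, then exactly one non-error completion per insert call, all
carrying the insert call's context) can be modeled without a fabric.
The sketch below is a standalone illustration only: struct model_event
and collect_insert_events are hypothetical stand-ins, not the entry
structures from fi_eq(3), and a real application would read entries
from its event queue instead of an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an event queue entry (assumption, not the
 * libfabric structures). */
enum model_kind { MODEL_EQ_ERROR, MODEL_EQ_COMPLETE };

struct model_event {
    enum model_kind kind;
    void *context;  /* context that was passed to the insert call */
    uint64_t data;  /* error: index of the failed address;
                     * completion: number of addresses inserted */
};

struct insert_result {
    size_t failed;      /* error entries seen for this insert call */
    uint64_t inserted;  /* taken from the final completion entry */
    int complete;       /* nonzero once the completion was seen */
};

/* Walk the events in delivery order and summarize the ones that belong
 * to the insert call identified by context. Events from other insert
 * calls may be interleaved and are skipped. */
struct insert_result collect_insert_events(const struct model_event *ev,
                                           size_t n, const void *context)
{
    struct insert_result res = { 0, 0, 0 };
    size_t i;

    assert(ev != NULL || n == 0);
    for (i = 0; i < n && !res.complete; i++) {
        if (ev[i].context != context)
            continue;
        if (ev[i].kind == MODEL_EQ_ERROR) {
            res.failed++;
        } else {
            res.inserted = ev[i].data;
            res.complete = 1;
        }
    }
    return res;
}
```

Matching on the context value is what lets the application pair
completions with insert calls when delivery order differs from call
order.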

fi_close
The fi_close call is used to release all resources associated with an
address vector. Note that any events queued on an event queue
referencing the AV are left untouched. It is recommended that callers
retrieve all events associated with the AV before closing it.

When closing the address vector, there must be no opened endpoints
associated with the AV. If resources are still associated with the AV
when attempting to close, the call will return -FI_EBUSY.

fi_av_bind
Associates an event queue with the AV. If an AV has been opened with
FI_EVENT, then an event queue must be bound to the AV before any
insertion calls are attempted. Any calls to insert addresses before an
event queue has been bound will fail with -FI_ENOEQ. Flags are
reserved for future use and must be 0.

fi_av_insert
The fi_av_insert call inserts zero or more addresses into an AV. The
number of addresses is specified through the count parameter. The addr
parameter references an array of addresses to insert into the AV.
Addresses inserted into an address vector must be in the same format as
specified in the addr_format field of the fi_info struct provided when
opening the corresponding domain. When using the FI_ADDR_STR format,
the addr parameter should reference an array of strings (char **).

For AVs of type FI_AV_MAP, once inserted addresses have been mapped,
the mapped values are written into the buffer referenced by fi_addr.
The fi_addr buffer must remain valid until the AV insertion has
completed and an event has been generated to an associated event queue.
The value of the returned fi_addr should be considered opaque by the
application for AVs of type FI_AV_MAP. The returned value may point to
an internal structure or a provider specific encoding of low-level
addressing data, for example. In the latter case, use of FI_AV_MAP may
be able to avoid memory references during data transfer operations.

For AVs of type FI_AV_TABLE, addresses are placed into the table in
order. An address is inserted at the lowest index that corresponds to
an unused table location, with indices starting at 0. That is, the
first address inserted may be referenced at index 0, the second at
index 1, and so forth. When addresses are inserted into an AV table,
the assigned fi_addr values will be simple indices corresponding to the
entry into the table where the address was inserted. Index values
accumulate across successive insert calls in the order the calls are
made, not necessarily in the order the insertions complete.
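
The FI_AV_TABLE index assignment described above can be illustrated
with a tiny standalone model. This is a sketch under a stated
assumption: no addresses are ever removed, so the lowest unused index
is always the running total of inserted addresses. The names
table_model and table_model_insert are hypothetical and do not exist in
libfabric.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an FI_AV_TABLE that never has entries removed. */
struct table_model {
    uint64_t next_index;  /* index the next inserted address receives */
};

/* Model one insert call of count addresses: the addresses receive
 * indices first, first + 1, ..., first + count - 1, where first is the
 * value returned here. */
uint64_t table_model_insert(struct table_model *t, size_t count)
{
    uint64_t first;

    assert(t != NULL);
    first = t->next_index;
    t->next_index += count;  /* indices accumulate across calls */
    return first;
}
```

Under this model, a first call inserting three addresses covers indices
0 through 2, and a following call inserting two addresses covers
indices 3 and 4, regardless of which call completes first.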

Because insertions occur at a pre-determined index, the fi_addr
parameter may be NULL. If fi_addr is non-NULL, it must reference an
array of fi_addr_t, and the buffer must remain valid until the
insertion operation completes. Note that if fi_addr is NULL and
synchronous operation is requested without using the FI_SYNC_ERR flag,
individual insertion failures cannot be reported and the application
must use other calls, such as fi_av_lookup, to learn which specific
addresses failed to insert. Since the behavior of fi_av_remove is
provider-specific, it is recommended that calls to fi_av_insert
following a call to fi_av_remove always reference a valid buffer in the
fi_addr parameter. Otherwise it may be difficult to determine what the
next assigned index will be.

flags The following flags may be passed to AV insertion calls:
fi_av_insert, fi_av_insertsvc, or fi_av_insertsym.

- FI_MORE
In order to allow optimized address insertion, the application may
specify the FI_MORE flag to the insert call to give a hint to the
provider that more insertion requests will follow, allowing the
provider to aggregate insertion requests if desired. An application
may make any number of insertion calls with FI_MORE set, provided that
they are followed by an insertion call without FI_MORE. This signifies
to the provider that the insertion list is complete. Providers are
free to ignore FI_MORE.

- FI_SYNC_ERR
This flag applies to synchronous insertions only, and is used to
retrieve error details of failed insertions. If set, the context
parameter of insertion calls references an array of integers, with
context set to the address of the first element of the array. The
resulting status of attempting to insert each address will be written
to the corresponding array location. Successful insertions will be
updated to 0. Failures will contain a fabric errno code.

- FI_AV_USER_ID
This flag associates a user-assigned identifier with each AV entry that
is returned with any completion entry in place of the AV's address.
See the user ID section below.

fi_av_insertsvc
The fi_av_insertsvc call behaves similarly to fi_av_insert, but allows
the application to specify the node and service names, similar to the
fi_getinfo inputs, rather than an encoded address. The node and
service parameters are defined the same as fi_getinfo(3). Node should
be a string that corresponds to a hostname or network address. The
service string corresponds to a textual representation of a transport
address. Applications may also pass in an FI_ADDR_STR formatted
address as the node parameter. In such cases, the service parameter
must be NULL. See fi_getinfo(3) for details on using FI_ADDR_STR.
Supported flags are the same as for fi_av_insert.

fi_av_insertsym
fi_av_insertsym performs a symmetric insert that inserts a sequential
range of nodes and/or service addresses into an AV. The svccnt
parameter indicates the number of transport (endpoint) addresses to
insert into the AV for each node address, with the service parameter
specifying the starting transport address. Inserted transport
addresses will be of the range {service, service + svccnt - 1},
inclusive. All service addresses for a node will be inserted before
the next node is inserted.

The nodecnt parameter indicates the number of node (network) addresses
to insert into the AV, with the node parameter specifying the starting
node address. Inserted node addresses will be of the range {node,
node + nodecnt - 1}, inclusive. If node is a non-numeric string, such
as a hostname, it must contain a numeric suffix if nodecnt > 1.

As an example, if node = “10.1.1.1”, nodecnt = 2, service = “5000”, and
svccnt = 2, the following addresses will be inserted into the AV in the
order shown: 10.1.1.1:5000, 10.1.1.1:5001, 10.1.1.2:5000,
10.1.1.2:5001. If node were replaced by the hostname “host10”, the
addresses would be: host10:5000, host10:5001, host11:5000, host11:5001.

The total number of inserted addresses will be nodecnt x svccnt.
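
The insertion order above (all services for one node before moving to
the next node) amounts to a division and a remainder on the insertion
position. The following standalone sketch shows that arithmetic; the
names sym_offsets and sym_offsets_at are hypothetical and are not part
of libfabric.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets relative to the starting node and starting service. */
struct sym_offsets {
    size_t node_off;  /* added to the starting node address */
    size_t svc_off;   /* added to the starting service address */
};

/* Hypothetical helper: map the zero-based insertion position pos to
 * its node and service offsets, given svccnt services per node. */
struct sym_offsets sym_offsets_at(size_t pos, size_t svccnt)
{
    struct sym_offsets o;

    assert(svccnt > 0);
    o.node_off = pos / svccnt;  /* a new node every svccnt entries */
    o.svc_off = pos % svccnt;   /* services cycle within one node */
    return o;
}
```

With node = 10.1.1.1, service = 5000, and svccnt = 2, position 2 yields
node offset 1 and service offset 0, that is 10.1.1.2:5000, which
matches the third address in the example above.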

Supported flags are the same as for fi_av_insert.

fi_av_remove
fi_av_remove removes a set of addresses from an address vector. All
resources associated with the indicated addresses are released. The
removed address - either the mapped address (in the case of FI_AV_MAP)
or index (FI_AV_TABLE) - is invalid until it is returned again by a new
fi_av_insert.

The behavior of operations in progress that reference the removed
addresses is undefined.

The use of fi_av_remove is an optimization that applications may use to
free memory associated with addresses that will no longer be accessed.
Inserted addresses are not required to be removed. fi_close will
automatically clean up any resources associated with addresses
remaining in the AV when it is invoked.

Flags are reserved for future use and must be 0.

fi_av_lookup
This call returns the address stored in the address vector that
corresponds to the given fi_addr. The returned address is in the same
format as those stored by the AV. On input, the addrlen parameter
should indicate the size of the addr buffer. If the actual address is
larger than what can fit into the buffer, it will be truncated. On
output, addrlen is set to the size of the buffer needed to store the
address, which may be larger than the input value.
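
The in/out addrlen convention described above can be modeled in
isolation. The sketch below is not the libfabric implementation;
model_copy_addr is a hypothetical function that copies a stored address
the way the text describes, so the truncation check on the caller side
is easy to see.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the addrlen convention: copy at most *addrlen
 * bytes of the stored address into addr, then report the full size
 * that would have been needed. */
void model_copy_addr(const void *stored, size_t stored_len,
                     void *addr, size_t *addrlen)
{
    size_t n;

    assert(stored != NULL && addrlen != NULL);
    assert(addr != NULL || *addrlen == 0);

    n = *addrlen < stored_len ? *addrlen : stored_len;
    if (n > 0)
        memcpy(addr, stored, n);  /* truncated when the buffer is small */
    *addrlen = stored_len;        /* size needed, may exceed the input */
}
```

A caller can detect truncation by comparing the value of addrlen after
the call with the buffer size it passed in, and retry with a buffer of
the reported size if the output value is larger.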

fi_rx_addr
This function is used to convert an endpoint address, returned by
fi_av_insert, into an address that specifies a target receive context.
The specified fi_addr parameter must either be a value returned from
fi_av_insert, in the case of FI_AV_MAP, or an index, in the case of
FI_AV_TABLE. The value for rx_ctx_bits must match that specified in
the AV attributes for the given address.

Connected endpoints that support multiple receive contexts, but are not
associated with address vectors should specify FI_ADDR_NOTAVAIL for the
fi_addr parameter.
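
To make the role of rx_ctx_bits concrete, the sketch below models one
plausible encoding: the receive context index is placed in the
uppermost rx_ctx_bits bits of a 64-bit address value. This placement is
an assumption made for illustration; the real encoding is whatever
fi_rx_addr produces, and applications should call fi_rx_addr rather
than build the value themselves. The name model_rx_addr is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: combine an address value with a receive context
 * index, assuming the index occupies the top rx_ctx_bits bits. */
uint64_t model_rx_addr(uint64_t fi_addr, int rx_index, int rx_ctx_bits)
{
    assert(rx_ctx_bits > 0 && rx_ctx_bits < 64);
    assert(rx_index >= 0 &&
           (uint64_t)rx_index < (UINT64_C(1) << rx_ctx_bits));

    return ((uint64_t)rx_index << (64 - rx_ctx_bits)) | fi_addr;
}
```

The model also shows why rx_ctx_bits must match the value given in the
AV attributes: the bits reserved at AV creation are the ones that carry
the receive context index.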

fi_av_straddr
The fi_av_straddr function converts the provided address into a
printable string. The specified address must be of the same format as
those stored by the AV, though the address itself is not required to
have been inserted. On input, the len parameter should specify the
size of the buffer referenced by buf. On output, len is set to the
size of the buffer needed to store the address. This size may be
larger than the input len. If the provided buffer is too small, the
results will be truncated. fi_av_straddr returns a pointer to buf.

NOTES
An AV should only store a single instance of an address. Attempting to
insert a duplicate copy of the same address into an AV may result in
undefined behavior, depending on the provider implementation.
Providers are not required to check for duplicates, as doing so could
incur significant overhead to the insertion process. For portability,
applications may need to track which peer addresses have been inserted
into a given AV in order to avoid duplicate entries. However,
providers are required to support the removal, followed by the
re-insertion of an address. Only duplicate insertions are restricted.
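
One way an application might track inserted peer addresses, as
suggested above, is a small set keyed on the raw address bytes. The
sketch below is a minimal standalone illustration: addr_tracker,
tracker_add, tracker_remove, and the fixed sizes TRACK_MAX and
TRACK_ADDR_LEN are all hypothetical, and a real application would size
the entries to its address format and likely use a hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRACK_MAX      64  /* assumed capacity, for illustration */
#define TRACK_ADDR_LEN 16  /* assumed fixed address size in bytes */

struct addr_tracker {
    unsigned char addrs[TRACK_MAX][TRACK_ADDR_LEN];
    size_t count;
};

/* Returns 1 if addr was newly recorded (safe to insert into the AV),
 * 0 if it is already recorded (skip the insert), -1 if the set is full. */
int tracker_add(struct addr_tracker *t, const void *addr)
{
    size_t i;

    assert(t != NULL && addr != NULL);
    for (i = 0; i < t->count; i++)
        if (memcmp(t->addrs[i], addr, TRACK_ADDR_LEN) == 0)
            return 0;
    if (t->count == TRACK_MAX)
        return -1;
    memcpy(t->addrs[t->count++], addr, TRACK_ADDR_LEN);
    return 1;
}

/* Forget addr after it has been removed from the AV, so that a later
 * re-insertion is allowed. Returns 1 if found, 0 otherwise. */
int tracker_remove(struct addr_tracker *t, const void *addr)
{
    size_t i;

    assert(t != NULL && addr != NULL);
    for (i = 0; i < t->count; i++) {
        if (memcmp(t->addrs[i], addr, TRACK_ADDR_LEN) == 0) {
            /* Fill the hole with the last entry; memmove tolerates
             * the case where both are the same slot. */
            memmove(t->addrs[i], t->addrs[t->count - 1], TRACK_ADDR_LEN);
            t->count--;
            return 1;
        }
    }
    return 0;
}
```

The application would call tracker_add before each insertion and
tracker_remove alongside each fi_av_remove, which keeps removal
followed by re-insertion working while filtering out duplicates.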

Providers may implement AVs using a variety of mechanisms.
Specifically, a provider may begin resolving inserted addresses as soon
as they have been added to an AV, even if asynchronous operation has
been specified. Similarly, a provider may lazily release resources
from removed entries.

USER IDENTIFIERS FOR ADDRESSES
As described above, endpoint addresses that are inserted into an AV are
mapped to an fi_addr_t value. The fi_addr_t is used in data transfer
APIs to specify the destination of an outbound transfer, in receive
APIs to indicate the source for an inbound transfer, and also in
completion events to report the source address of inbound transfers.
The FI_AV_USER_ID capability bit and flag provide a mechanism by which
the fi_addr_t value reported by a completion event is replaced with a
user-specified value instead. This is useful for applications that
need to map the source address to their own data structure.

Support for FI_AV_USER_ID is provider specific, as it may not be
feasible for a provider to implement this support without significant
overhead. For example, some providers may need to add a reverse lookup
mechanism. This feature may be unavailable if shared AVs are
requested, or negatively impact the per process memory footprint if
implemented. For providers that do not support FI_AV_USER_ID, users
may be able to trade off lookup processing with protocol overhead, by
carrying source identification within a message header.

User-specified fi_addr_t values are provided as part of address
insertion (e.g. fi_av_insert) through the fi_addr parameter. The
fi_addr parameter acts as input/output in this case. When the
FI_AV_USER_ID flag is passed to any of the insert calls, the caller
must specify an fi_addr_t identifier value to associate with each
address. The provider will record that identifier and use it where
required as part of any completion event. Note that the output from
the AV insertion call is unchanged. The provider will return an
fi_addr_t value that maps to each address, and that value must be used
for all data transfer operations.

RETURN VALUES
Insertion calls for an AV opened for synchronous operation will return
the number of addresses that were successfully inserted. In the case
of failure, the return value will be less than the number of addresses
that was specified.

Insertion calls for an AV opened for asynchronous operation (with
FI_EVENT flag specified) will return 0 if the operation was
successfully initiated. In the case of failure, a negative fabric
errno will be returned. Providers are allowed to abort insertion
operations in the case of an error. Addresses that are not inserted
because they were aborted will fail with an error code of FI_ECANCELED.

In both the synchronous and asynchronous modes of operation, the
fi_addr buffer associated with a failed or aborted insertion will be
set to FI_ADDR_NOTAVAIL.
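
Since failed or aborted entries are marked in the fi_addr buffer, an
application can locate them by scanning that buffer after the insertion
completes. The sketch below is standalone and illustrative:
MODEL_ADDR_NOTAVAIL is a local stand-in that assumes FI_ADDR_NOTAVAIL
is the all-ones 64-bit value, and collect_failed is a hypothetical
helper; real code would compare against FI_ADDR_NOTAVAIL from the
libfabric headers and use fi_addr_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for FI_ADDR_NOTAVAIL, taken here to be the
 * all-ones 64-bit value. */
#define MODEL_ADDR_NOTAVAIL UINT64_MAX

/* Hypothetical helper: count the entries marked as not available and,
 * if failed_idx is non-NULL, record their positions. failed_idx must
 * have room for count entries. */
size_t collect_failed(const uint64_t *fi_addr, size_t count,
                      size_t *failed_idx)
{
    size_t i, nfailed = 0;

    assert(fi_addr != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (fi_addr[i] == MODEL_ADDR_NOTAVAIL) {
            if (failed_idx != NULL)
                failed_idx[nfailed] = i;
            nfailed++;
        }
    }
    return nfailed;
}
```

For a synchronous insertion, the number of marked entries would be
expected to equal the requested count minus the call's return value.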

All other calls return 0 on success, or a negative value corresponding
to fabric errno on error. Fabric errno values are defined in
rdma/fi_errno.h.

SEE ALSO
fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_eq(3)

AUTHORS
OpenFabrics.

Libfabric Programmer’s Manual 2022-12-11 fi_av(3)