fi_av(3)                     Libfabric v1.12.1                     fi_av(3)

NAME
fi_av - Address vector operations

fi_av_open / fi_close
Open or close an address vector

fi_av_bind
Associate an address vector with an event queue.

fi_av_insert / fi_av_insertsvc / fi_av_remove
Insert/remove an address into/from the address vector.

fi_av_lookup
Retrieve an address stored in the address vector.

fi_av_straddr
Convert an address into a printable string.

SYNOPSIS
#include <rdma/fi_domain.h>

int fi_av_open(struct fid_domain *domain, struct fi_av_attr *attr,
    struct fid_av **av, void *context);

int fi_close(struct fid *av);

int fi_av_bind(struct fid_av *av, struct fid *eq, uint64_t flags);

int fi_av_insert(struct fid_av *av, void *addr, size_t count,
    fi_addr_t *fi_addr, uint64_t flags, void *context);

int fi_av_insertsvc(struct fid_av *av, const char *node,
    const char *service, fi_addr_t *fi_addr, uint64_t flags,
    void *context);

int fi_av_insertsym(struct fid_av *av, const char *node,
    size_t nodecnt, const char *service, size_t svccnt,
    fi_addr_t *fi_addr, uint64_t flags, void *context);

int fi_av_remove(struct fid_av *av, fi_addr_t *fi_addr, size_t count,
    uint64_t flags);

int fi_av_lookup(struct fid_av *av, fi_addr_t fi_addr,
    void *addr, size_t *addrlen);

fi_addr_t fi_rx_addr(fi_addr_t fi_addr, int rx_index,
    int rx_ctx_bits);

const char * fi_av_straddr(struct fid_av *av, const void *addr,
    char *buf, size_t *len);

ARGUMENTS
domain  Resource domain

av      Address vector

eq      Event queue

attr    Address vector attributes

context
User specified context associated with the address vector or insert
operation.

addr    Buffer containing one or more addresses to insert into the
address vector.

addrlen
On input, specifies the size of the addr buffer. On output, stores the
size of the buffer needed to hold the address, which may be larger than
the input value (see fi_av_lookup).

fi_addr
For insert, a reference to an array where returned fabric addresses
will be written. For remove, one or more fabric addresses to remove.

count   Number of addresses to insert/remove from an AV.

flags   Additional flags to apply to the operation.

DESCRIPTION
Address vectors are used to map higher level addresses, which may be
more natural for an application to use, into fabric specific addresses.
The mapping of addresses is fabric and provider specific, but may
involve lengthy address resolution and fabric management protocols. AV
operations are synchronous by default, but may be set to operate
asynchronously by specifying the FI_EVENT flag to fi_av_open. When
requesting asynchronous operation, the application must first bind an
event queue to the AV before inserting addresses.

fi_av_open
fi_av_open allocates or opens an address vector. The properties and
behavior of the address vector are defined by struct fi_av_attr.

struct fi_av_attr {
    enum fi_av_type type;        /* type of AV */
    int             rx_ctx_bits; /* address bits to identify rx ctx */
    size_t          count;       /* # entries for AV */
    size_t          ep_per_node; /* # endpoints per fabric address */
    const char      *name;       /* system name of AV */
    void            *map_addr;   /* base mmap address */
    uint64_t        flags;       /* operation flags */
};

type    An AV type corresponds to a conceptual implementation of an
address vector. The type specifies how an application views data
stored in the AV, including how it may be accessed. Valid values are:

- FI_AV_MAP
Addresses which are inserted into an AV are mapped to a native fabric
address for use by the application. The use of FI_AV_MAP requires that
an application store the returned fi_addr_t value that is associated
with each inserted address. The advantage of using FI_AV_MAP is that
the returned fi_addr_t value may contain encoded address data, which is
immediately available when processing data transfer requests. This can
eliminate or reduce the number of memory lookups needed when initiating
a transfer. The disadvantage of FI_AV_MAP is the increase in memory
usage needed to store the returned addresses. Addresses are stored in
the AV using a provider specific mechanism, including, but not limited
to, a tree, a hash table, or storage maintained on the heap.

- FI_AV_TABLE
Addresses which are inserted into an AV of type FI_AV_TABLE are
accessible using a simple index. Conceptually, the AV may be treated as
an array of addresses, though the provider may implement the AV using a
variety of mechanisms. When FI_AV_TABLE is used, the returned fi_addr_t
is an index, with the index for an inserted address the same as its
insertion order into the table. The index of the first address inserted
into an FI_AV_TABLE will be 0, and successive insertions will be given
sequential indices. Sequential indices will be assigned across
insertion calls on the same AV.

- FI_AV_UNSPEC
Provider will choose its preferred AV type. The AV type used will be
returned through the type field in fi_av_attr.

Receive Context Bits (rx_ctx_bits)
The receive context bits field is only for use with scalable endpoints.
It indicates the number of bits reserved in a returned fi_addr_t, which
will be used to identify a specific target receive context. See
fi_rx_addr() and fi_endpoint(3) for additional details on receive
contexts. The requested number of bits should be selected such that
2 ^ rx_ctx_bits >= rx_ctx_cnt for the endpoint.

count   Indicates the expected number of addresses that will be
inserted into the AV. The provider uses this to optimize resource
allocations.

ep_per_node
This field indicates the number of endpoints that will be associated
with a specific fabric, or network, address. If the number of endpoints
per node is unknown, this value should be set to 0. The provider uses
this value to optimize resource allocations. For example, distributed,
parallel applications may set this to the number of processes allocated
per node, times the number of endpoints each process will open.

name    An optional system name associated with the address vector to
create or open. Address vectors may be shared across multiple processes
which access the same named domain on the same node. The name field
allows the underlying provider to identify a shared AV.

If the name field is non-NULL and the AV is not opened for read-only
access, a named AV will be created, if it does not already exist.

map_addr
The map_addr determines the base fi_addr_t address that a provider
should use when sharing an AV of type FI_AV_MAP between processes.
Processes that provide the same value for map_addr to a shared AV may
use the same fi_addr_t values returned from an fi_av_insert call.

The map_addr may be used by the provider to mmap memory allocated for a
shared AV between processes; however, the provider is not required to
use the map_addr in this fashion. The only requirement is that an
fi_addr_t returned as part of an fi_av_insert call on one process is
usable on another process which opens an AV of the same name at the
same map_addr value. The relationship between the map_addr and any
returned fi_addr_t is not defined.

If name is non-NULL and map_addr is 0, then the map_addr used by the
provider will be returned through the attribute structure. The map_addr
field is ignored if name is NULL.

flags   The following flags may be used when opening an AV.

- FI_EVENT
When the flag FI_EVENT is specified, all insert operations on this AV
will occur asynchronously. There will be one EQ error entry generated
for each failed address insertion, followed by one non-error event
indicating that the insertion operation has completed. There will
always be one non-error completion event for each insert operation,
even if all addresses fail. The context field in all completions will
be the context specified to the insert call, and the data field in the
final completion entry will report the number of addresses successfully
inserted. For each address that fails to insert, the data field of its
error completion entry will contain the index of the failed address
(see fi_eq(3) for a discussion of the fi_eq_err_entry error completion
struct).

If an AV is opened with FI_EVENT, any insertions attempted before an EQ
is bound to the AV will fail with -FI_ENOEQ.

Note that the order of delivery of insert completions may not match the
order in which the calls to fi_av_insert were made. The only guarantee
is that all error completions for a given call to fi_av_insert will
precede the single associated non-error completion.

- FI_READ
Opens an AV for read-only access. An AV opened for read-only access
must be named (name attribute specified), and the AV must exist.

- FI_SYMMETRIC
Indicates that each node will be associated with the same number of
endpoints, the same transport addresses will be allocated on each node,
and the transport addresses will be sequential. This feature targets
distributed applications on large fabrics and allows for
highly-optimized storage of remote endpoint addressing.
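
The FI_EVENT completion sequence described above can be sketched with a
small, library-independent model. The types and names below
(sketch_av_event, sketch_insert_result, tally_insert_events) are
hypothetical stand-ins for illustration only; a real application would
read the corresponding entries from the bound event queue as described
in fi_eq(3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an entry read from the bound EQ. This is a hypothetical
 * type for illustration; it is not a libfabric structure. */
struct sketch_av_event {
    int is_error;      /* nonzero: error entry for one failed address */
    void *context;     /* context passed to the insert call */
    uint64_t data;     /* error entry: index of the failed address;
                        * completion: number of addresses inserted */
};

struct sketch_insert_result {
    size_t failed;     /* error entries seen for this insert call */
    uint64_t inserted; /* count reported by the final completion */
    int done;          /* nonzero once the completion was seen */
};

/* Tally the events that belong to one insert call, identified by its
 * context. Events from other insert calls may be interleaved. */
static struct sketch_insert_result
tally_insert_events(const struct sketch_av_event *ev, size_t n,
                    void *context)
{
    struct sketch_insert_result r = { 0, 0, 0 };

    for (size_t i = 0; i < n && !r.done; i++) {
        if (ev[i].context != context)
            continue;               /* belongs to another insert call */
        if (ev[i].is_error) {
            r.failed++;             /* one entry per failed address */
        } else {
            r.inserted = ev[i].data;
            r.done = 1;             /* errors always precede this entry */
        }
    }
    return r;
}
```

Matching on the context value is what lets the application separate
events from concurrent insert calls, since delivery order across calls
is not guaranteed.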

fi_close
The fi_close call is used to release all resources associated with an
address vector. Note that any events queued on an event queue
referencing the AV are left untouched. It is recommended that callers
retrieve all events associated with the AV before closing it.

When closing the address vector, there must be no open endpoints
associated with the AV. If resources are still associated with the AV
when attempting to close, the call will return -FI_EBUSY.

fi_av_bind
Associates an event queue with the AV. If an AV has been opened with
FI_EVENT, then an event queue must be bound to the AV before any
insertion calls are attempted. Any calls to insert addresses before an
event queue has been bound will fail with -FI_ENOEQ. Flags are reserved
for future use and must be 0.

fi_av_insert
The fi_av_insert call inserts zero or more addresses into an AV. The
number of addresses is specified through the count parameter. The addr
parameter references an array of addresses to insert into the AV.
Addresses inserted into an address vector must be in the same format as
specified in the addr_format field of the fi_info struct provided when
opening the corresponding domain. When using the FI_ADDR_STR format,
the addr parameter should reference an array of strings (char **).

For AVs of type FI_AV_MAP, once inserted addresses have been mapped,
the mapped values are written into the buffer referenced by fi_addr.
The fi_addr buffer must remain valid until the AV insertion has
completed and an event has been generated to an associated event queue.
The value of the returned fi_addr should be considered opaque by the
application for AVs of type FI_AV_MAP. The returned value may point to
an internal structure or a provider specific encoding of low-level
addressing data, for example. In the latter case, use of FI_AV_MAP may
be able to avoid memory references during data transfer operations.

For AVs of type FI_AV_TABLE, addresses are placed into the table in
order. An address is inserted at the lowest index that corresponds to
an unused table location, with indices starting at 0. That is, the
first address inserted may be referenced at index 0, the second at
index 1, and so forth. When addresses are inserted into an AV table,
the assigned fi_addr values will be simple indices corresponding to the
entry into the table where the address was inserted. Index values
accumulate across successive insert calls in the order the calls are
made, not necessarily in the order the insertions complete.

Because insertions occur at a pre-determined index, the fi_addr
parameter may be NULL. If fi_addr is non-NULL, it must reference an
array of fi_addr_t, and the buffer must remain valid until the
insertion operation completes. Note that if fi_addr is NULL and
synchronous operation is requested without the FI_SYNC_ERR flag,
individual insertion failures cannot be reported and the application
must use other calls, such as fi_av_lookup, to learn which specific
addresses failed to insert. Because the behavior of fi_av_remove is
provider specific, it is recommended that calls to fi_av_insert
following a call to fi_av_remove always reference a valid buffer in the
fi_addr parameter. Otherwise it may be difficult to determine what the
next assigned index will be.
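
The FI_AV_TABLE index rule above can be illustrated with a toy model.
This is a conceptual sketch only: toy_av_table and toy_table_insert are
hypothetical names, the model assumes every insertion succeeds, and it
ignores removals, whose effect on later indices is provider specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of FI_AV_TABLE index assignment; not a provider
 * implementation. */
struct toy_av_table {
    uint64_t next_index;   /* index the next inserted address receives */
};

/* Record `count` insertions. When fi_addr is non-NULL, the assigned
 * indices are written to it; when it is NULL, the indices are still
 * consumed, mirroring the "fi_addr may be NULL" case. */
static void toy_table_insert(struct toy_av_table *av, size_t count,
                             uint64_t *fi_addr)
{
    for (size_t i = 0; i < count; i++) {
        if (fi_addr)
            fi_addr[i] = av->next_index;
        av->next_index++;
    }
}
```

In this model, inserting two addresses, then one with a NULL fi_addr,
then three more yields indices 0 and 1 for the first call and 3
through 5 for the last, because indices accumulate across calls.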

flags   The following flags may be passed to AV insertion calls:
fi_av_insert, fi_av_insertsvc, or fi_av_insertsym.

- FI_MORE
In order to allow optimized address insertion, the application may
specify the FI_MORE flag to the insert call to give a hint to the
provider that more insertion requests will follow, allowing the
provider to aggregate insertion requests if desired. An application may
make any number of insertion calls with FI_MORE set, provided that they
are followed by an insertion call without FI_MORE. This signifies to
the provider that the insertion list is complete. Providers are free to
ignore FI_MORE.

- FI_SYNC_ERR
This flag applies to synchronous insertions only, and is used to
retrieve error details of failed insertions. If set, the context
parameter of insertion calls references an array of integers, with
context set to the address of the first element of the array. The
resulting status of attempting to insert each address will be written
to the corresponding array location. Successful insertions will be
updated to 0. Failures will contain a fabric errno code.
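
As a sketch of how an application might consume the status array that
FI_SYNC_ERR fills in, the helper below counts failed entries and
reports the first one. count_failed_status is a hypothetical name, not
a libfabric call, and the helper only assumes what is stated above: 0
means success and any other value is a fabric errno code.

```c
#include <assert.h>
#include <stddef.h>

/* Scan a per-address status array (0 = inserted, nonzero = fabric
 * errno code). Returns the number of failed entries and, when there is
 * at least one, stores the index of the first failure in
 * *first_failed. */
static size_t count_failed_status(const int *status, size_t count,
                                  size_t *first_failed)
{
    size_t failed = 0;

    for (size_t i = 0; i < count; i++) {
        if (status[i] == 0)
            continue;                    /* this address was inserted */
        if (failed == 0 && first_failed)
            *first_failed = i;           /* remember the first failure */
        failed++;
    }
    return failed;
}
```

In a real call, the array would be the one passed through the context
parameter of the insertion call made with FI_SYNC_ERR set.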

fi_av_insertsvc
The fi_av_insertsvc call behaves similarly to fi_av_insert, but allows
the application to specify the node and service names, similar to the
fi_getinfo inputs, rather than an encoded address. The node and service
parameters are defined the same as for fi_getinfo(3). Node should be a
string that corresponds to a hostname or network address. The service
string corresponds to a textual representation of a transport address.
Applications may also pass in an FI_ADDR_STR formatted address as the
node parameter. In such cases, the service parameter must be NULL. See
fi_getinfo(3) for details on using FI_ADDR_STR. Supported flags are the
same as for fi_av_insert.

fi_av_insertsym
fi_av_insertsym performs a symmetric insert that inserts a sequential
range of nodes and/or service addresses into an AV. The svccnt
parameter indicates the number of transport (endpoint) addresses to
insert into the AV for each node address, with the service parameter
specifying the starting transport address. Inserted transport addresses
will be of the range {service, service + svccnt - 1}, inclusive. All
service addresses for a node will be inserted before the next node is
inserted.

The nodecnt parameter indicates the number of node (network) addresses
to insert into the AV, with the node parameter specifying the starting
node address. Inserted node addresses will be of the range {node, node
+ nodecnt - 1}, inclusive. If node is a non-numeric string, such as a
hostname, it must contain a numeric suffix if nodecnt > 1.

As an example, if node = "10.1.1.1", nodecnt = 2, service = "5000", and
svccnt = 2, the following addresses will be inserted into the AV in the
order shown: 10.1.1.1:5000, 10.1.1.1:5001, 10.1.1.2:5000,
10.1.1.2:5001. If node were replaced by the hostname "host10", the
addresses would be: host10:5000, host10:5001, host11:5000, host11:5001.

The total number of inserted addresses will be nodecnt x svccnt.
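
The insertion order in the example above can be reproduced with a short
sketch. It only models the ordering rule (all services for a node
before the next node) by incrementing a trailing decimal suffix;
bump_suffix and expand_symmetric are hypothetical helpers, and details
such as carrying across IP address octets or numeric overflow are
deliberately not modeled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write `name` with its trailing decimal number increased by `offset`
 * into out, keeping at least the original digit width. Returns 0 on
 * success, -1 if there is no numeric suffix or out is too small. */
static int bump_suffix(const char *name, unsigned long offset,
                       char *out, size_t outlen)
{
    size_t len = strlen(name);
    size_t split = len;

    while (split > 0 && isdigit((unsigned char)name[split - 1]))
        split--;
    if (split == len)
        return -1;                  /* nothing to increment */

    unsigned long base = strtoul(name + split, NULL, 10);
    int n = snprintf(out, outlen, "%.*s%0*lu", (int)split, name,
                     (int)(len - split), base + offset);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Fill out[] with "node:service" strings in the documented order.
 * Returns the number of entries written, or -1 on error. */
static int expand_symmetric(const char *node, size_t nodecnt,
                            const char *service, size_t svccnt,
                            char out[][64], size_t max_out)
{
    char nbuf[48], sbuf[15];
    size_t k = 0;

    for (size_t i = 0; i < nodecnt; i++) {
        if (bump_suffix(node, i, nbuf, sizeof nbuf))
            return -1;
        for (size_t j = 0; j < svccnt; j++) {   /* services first */
            if (k >= max_out ||
                bump_suffix(service, j, sbuf, sizeof sbuf))
                return -1;
            snprintf(out[k++], 64, "%s:%s", nbuf, sbuf);
        }
    }
    return (int)k;
}
```

The inner loop over services is what places 10.1.1.1:5001 ahead of
10.1.1.2:5000, and the entry count equals nodecnt x svccnt.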

Supported flags are the same as for fi_av_insert.

fi_av_remove
fi_av_remove removes a set of addresses from an address vector. All
resources associated with the indicated addresses are released. The
removed address, either the mapped address (in the case of FI_AV_MAP)
or the index (FI_AV_TABLE), is invalid until it is returned again by a
new fi_av_insert.

The behavior of operations in progress that reference the removed
addresses is undefined.

The use of fi_av_remove is an optimization that applications may use to
free memory associated with addresses that will no longer be accessed.
Inserted addresses are not required to be removed. fi_close will
automatically clean up any resources associated with addresses
remaining in the AV when it is invoked.

Flags are reserved for future use and must be 0.

fi_av_lookup
This call returns the address stored in the address vector that
corresponds to the given fi_addr. The returned address is the same
format as those stored by the AV. On input, the addrlen parameter
should indicate the size of the addr buffer. If the actual address is
larger than what can fit into the buffer, it will be truncated. On
output, addrlen is set to the size of the buffer needed to store the
address, which may be larger than the input value.
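
The addrlen contract above (copy what fits, then report the full size)
can be modeled without the library as follows. sketch_lookup_copy and
sketch_was_truncated are hypothetical names; this is a model of the
documented behavior, not the libfabric implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most *addrlen bytes of the stored address into addr, then
 * set *addrlen to the size needed for the whole address. */
static void sketch_lookup_copy(const void *stored, size_t stored_len,
                               void *addr, size_t *addrlen)
{
    size_t n = stored_len < *addrlen ? stored_len : *addrlen;

    if (n > 0)
        memcpy(addr, stored, n);   /* truncated if the buffer is small */
    *addrlen = stored_len;         /* always report the full size */
}

/* The caller detects truncation by comparing the reported size with
 * the size of the buffer it supplied. */
static int sketch_was_truncated(size_t buf_len, size_t reported_len)
{
    return reported_len > buf_len;
}
```

An application following this pattern would typically retry with a
buffer of the reported size when truncation is detected.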

fi_rx_addr
This function is used to convert an endpoint address, returned by
fi_av_insert, into an address that specifies a target receive context.
The specified fi_addr parameter must either be a value returned from
fi_av_insert, in the case of FI_AV_MAP, or an index, in the case of
FI_AV_TABLE. The value for rx_ctx_bits must match that specified in
the AV attributes for the given address.

Connected endpoints that support multiple receive contexts, but are not
associated with address vectors should specify FI_ADDR_NOTAVAIL for the
fi_addr parameter.
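
The two rules that involve rx_ctx_bits can be sketched as follows. The
first helper applies the sizing rule 2 ^ rx_ctx_bits >= rx_ctx_cnt. The
second shows one plausible encoding, under the assumption that the
receive context index occupies the uppermost rx_ctx_bits bits of a
64-bit address; both function names are hypothetical, and
rdma/fi_domain.h remains the reference for what fi_rx_addr actually
does.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bits such that 2^bits >= rx_ctx_cnt. */
static int min_rx_ctx_bits(uint64_t rx_ctx_cnt)
{
    int bits = 0;

    while (bits < 63 && ((uint64_t)1 << bits) < rx_ctx_cnt)
        bits++;
    return bits;
}

/* Assumed encoding: place rx_index in the top rx_ctx_bits bits of the
 * address. With no bits reserved, the address is returned unchanged. */
static uint64_t sketch_rx_addr(uint64_t fi_addr, int rx_index,
                               int rx_ctx_bits)
{
    if (rx_ctx_bits <= 0 || rx_ctx_bits > 64)
        return fi_addr;
    return ((uint64_t)rx_index << (64 - rx_ctx_bits)) | fi_addr;
}
```

Under this assumption, an endpoint with 5 receive contexts needs 3
bits, and the same rx_ctx_bits value must be used both when opening the
AV and when forming the target address.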

fi_av_straddr
The fi_av_straddr function converts the provided address into a
printable string. The specified address must be of the same format as
those stored by the AV, though the address itself is not required to
have been inserted. On input, the len parameter should specify the size
of the buffer referenced by buf. On output, len is set to the size of
the buffer needed to store the address. This size may be larger than
the input len. If the provided buffer is too small, the results will be
truncated. fi_av_straddr returns a pointer to buf.

NOTES
Providers may implement AVs using a variety of mechanisms.
Specifically, a provider may begin resolving inserted addresses as soon
as they have been added to an AV, even if asynchronous operation has
been specified. Similarly, a provider may lazily release resources from
removed entries.

RETURN VALUES
Insertion calls for an AV opened for synchronous operation will return
the number of addresses that were successfully inserted. In the case of
failure, the return value will be less than the number of addresses
that were specified.

Insertion calls for an AV opened for asynchronous operation (with
FI_EVENT flag specified) will return 0 if the operation was
successfully initiated. In the case of failure, a negative fabric errno
will be returned. Providers are allowed to abort insertion operations
in the case of an error. Addresses that are not inserted because they
were aborted will fail with an error code of FI_ECANCELED.

In both the synchronous and asynchronous modes of operation, the
fi_addr buffer associated with a failed or aborted insertion will be
set to FI_ADDR_NOTAVAIL.
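
A caller that supplied an fi_addr buffer can locate failed entries by
scanning for the not-available marker, as sketched below.
SKETCH_ADDR_NOTAVAIL and collect_failed are hypothetical names; the
all-ones value is an assumption made so the example is self-contained,
and real code should compare against FI_ADDR_NOTAVAIL from
rdma/fi_domain.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for FI_ADDR_NOTAVAIL. The value is an assumption made for
 * this sketch; use the macro from <rdma/fi_domain.h> in real code. */
#define SKETCH_ADDR_NOTAVAIL UINT64_MAX

/* Store the positions of failed entries in failed[] (up to max_failed
 * of them) and return the total number of failed entries found. */
static size_t collect_failed(const uint64_t *fi_addr, size_t count,
                             size_t *failed, size_t max_failed)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        if (fi_addr[i] != SKETCH_ADDR_NOTAVAIL)
            continue;               /* this address was inserted */
        if (n < max_failed)
            failed[n] = i;          /* position within the insert call */
        n++;
    }
    return n;
}
```

For synchronous insertions, such a scan is only needed when the return
value is less than the number of addresses passed in.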

All other calls return 0 on success, or a negative value corresponding
to fabric errno on error. Fabric errno values are defined in
rdma/fi_errno.h.

SEE ALSO
fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_eq(3)

AUTHORS
OpenFabrics.

Libfabric Programmer's Manual        2019-07-17        fi_av(3)