fi_alias(3)

fi_endpoint(3)                 Libfabric v1.7.0                 fi_endpoint(3)

NAME

       fi_endpoint - Fabric endpoint operations

       fi_endpoint / fi_scalable_ep / fi_passive_ep / fi_close
              Allocate or close an endpoint.

       fi_ep_bind
              Associate  an  endpoint  with  hardware resources, such as event
              queues, completion queues, counters, address vectors, or  shared
              transmit/receive contexts.

       fi_scalable_ep_bind
              Associate a scalable endpoint with an address vector.

       fi_pep_bind
              Associate a passive endpoint with an event queue.

       fi_enable
              Transition an active endpoint into an enabled state.

       fi_cancel
              Cancel a pending asynchronous data transfer.

       fi_ep_alias
              Create an alias to the endpoint.

       fi_control
              Control endpoint operation.

       fi_getopt / fi_setopt
              Get or set endpoint options.

       fi_rx_context / fi_tx_context / fi_srx_context / fi_stx_context
              Open a transmit or receive context.

       fi_rx_size_left / fi_tx_size_left (DEPRECATED)
              Query the lower bound on how many RX/TX operations may be posted
              without an operation returning -FI_EAGAIN.  These functions have
              been  deprecated  and will be removed in a future version of the
              library.

SYNOPSIS

              #include <rdma/fabric.h>

              #include <rdma/fi_endpoint.h>

              int fi_endpoint(struct fid_domain *domain, struct fi_info *info,
                  struct fid_ep **ep, void *context);

              int fi_scalable_ep(struct fid_domain *domain, struct fi_info *info,
                  struct fid_ep **sep, void *context);

              int fi_passive_ep(struct fid_fabric *fabric, struct fi_info *info,
                  struct fid_pep **pep, void *context);

              int fi_tx_context(struct fid_ep *sep, int index,
                  struct fi_tx_attr *attr, struct fid_ep **tx_ep,
                  void *context);

              int fi_rx_context(struct fid_ep *sep, int index,
                  struct fi_rx_attr *attr, struct fid_ep **rx_ep,
                  void *context);

              int fi_stx_context(struct fid_domain *domain,
                  struct fi_tx_attr *attr, struct fid_stx **stx,
                  void *context);

              int fi_srx_context(struct fid_domain *domain,
                  struct fi_rx_attr *attr, struct fid_ep **rx_ep,
                  void *context);

              int fi_close(struct fid *ep);

              int fi_ep_bind(struct fid_ep *ep, struct fid *fid, uint64_t flags);

              int fi_scalable_ep_bind(struct fid_ep *sep, struct fid *fid, uint64_t flags);

              int fi_pep_bind(struct fid_pep *pep, struct fid *fid, uint64_t flags);

              int fi_enable(struct fid_ep *ep);

              int fi_cancel(struct fid_ep *ep, void *context);

              int fi_ep_alias(struct fid_ep *ep, struct fid_ep **alias_ep, uint64_t flags);

              int fi_control(struct fid *ep, int command, void *arg);

              int fi_getopt(struct fid *ep, int level, int optname,
                  void *optval, size_t *optlen);

              int fi_setopt(struct fid *ep, int level, int optname,
                  const void *optval, size_t optlen);

              DEPRECATED ssize_t fi_rx_size_left(struct fid_ep *ep);

              DEPRECATED ssize_t fi_tx_size_left(struct fid_ep *ep);

ARGUMENTS

       fid    On creation, specifies a fabric  or  access  domain.   On  bind,
              identifies  the  event  queue, completion queue, counter, or ad‐
              dress vector to bind to the endpoint.  In other  cases,  it's  a
              fabric identifier of an associated resource.

       info   Details  about  the  fabric interface endpoint to be opened, ob‐
              tained from fi_getinfo.

       ep     A fabric endpoint.

       sep    A scalable fabric endpoint.

       pep    A passive fabric endpoint.

       context
              Context associated with the endpoint or asynchronous operation.

       index  Index to retrieve a specific transmit/receive context.

       attr   Transmit or receive context attributes.

       flags  Additional flags to apply to the operation.

       command
              Command of control operation to perform on endpoint.

       arg    Optional control argument.

       level  Protocol level at which the desired option resides.

       optname
              The protocol option to read or set.

       optval The option value that was read or is to be set.

       optlen The size of the optval buffer.

DESCRIPTION

       Endpoints are transport level communication  portals.   There  are  two
       types  of endpoints: active and passive.  Passive endpoints belong to a
       fabric domain and are most often used to listen for incoming connection
       requests.   However, a passive endpoint may be used to reserve a fabric
       address that can be granted to an active  endpoint.   Active  endpoints
       belong to access domains and can perform data transfers.

       Active  endpoints may be connection-oriented or connectionless, and may
       provide data reliability.  The data  transfer  interfaces  --  messages
       (fi_msg),  tagged  messages  (fi_tagged),  RMA  (fi_rma),  and  atomics
       (fi_atomic) -- are associated with active endpoints.  In basic configu‐
       rations, an active endpoint has transmit and receive queues.  In gener‐
       al, operations that generate traffic on the fabric are  posted  to  the
       transmit  queue.   This  includes  all RMA and atomic operations, along
       with sent messages and sent tagged messages.  Operations that post buf‐
       fers for receiving incoming data are submitted to the receive queue.

       Active  endpoints are created in the disabled state.  They must transi‐
       tion into an enabled state before accepting data  transfer  operations,
       including  posting  of  receive buffers.  The fi_enable call is used to
       transition an active endpoint into an enabled  state.   The  fi_connect
       and  fi_accept  calls will also transition an endpoint into the enabled
       state, if it is not already enabled.

       In order to transition an endpoint into an enabled state,  it  must  be
       bound  to one or more fabric resources.  An endpoint that will generate
       asynchronous completions, either through data  transfer  operations  or
       communication  establishment  events,  must be bound to the appropriate
       completion queues or event queues, respectively, before being  enabled.
       Additionally,  endpoints  that  use  manual progress must be associated
       with relevant completion queues or  event  queues  in  order  to  drive
       progress.   For  endpoints  that  are only used as the target of RMA or
       atomic operations, this means binding  the  endpoint  to  a  completion
       queue  associated  with receive processing.  Unconnected endpoints must
       be bound to an address vector.

       Once an endpoint has been activated, it may be associated with  an  ad‐
       dress  vector.   Receive  buffers  may be posted to it and calls may be
       made to connection establishment  routines.   Connectionless  endpoints
       may also perform data transfers.

       The behavior of an endpoint may be adjusted by setting its control data
       and protocol options.  This allows the underlying provider to  redirect
       function  calls to implementations optimized to meet the desired appli‐
       cation behavior.

       If an endpoint experiences a critical error, it  will  transition  back
       into  a disabled state.  Critical errors are reported through the event
       queue associated with the EP.  In certain cases,  a  disabled  endpoint
       may  be  re-enabled.   The  ability  to transition back into an enabled
       state is provider specific and depends on the type of  error  that  the
       endpoint  experienced.   When  an endpoint is disabled as a result of a
       critical error, all pending operations are discarded.
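
       The lifecycle described above can be pictured as a small state machine.
       The sketch below is a toy model, not libfabric code: every name in it
       (ep_model, ep_model_enable, and so on) is hypothetical, and it assumes
       the simplified rule that an endpoint needs a completion queue, plus an
       address vector when it is connectionless, before it can be enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the active-endpoint lifecycle; these are not
 * libfabric identifiers. */
enum ep_state { EP_DISABLED, EP_ENABLED };

struct ep_model {
    enum ep_state state;
    bool has_cq;          /* bound to a completion queue */
    bool has_av;          /* bound to an address vector */
    bool connectionless;  /* unconnected endpoints need an AV */
    int  pending_ops;     /* posted but not yet completed */
};

/* Enabling succeeds only once the required resources are bound. */
static int ep_model_enable(struct ep_model *ep)
{
    assert(ep != NULL);
    if (!ep->has_cq)
        return -1;
    if (ep->connectionless && !ep->has_av)
        return -1;
    ep->state = EP_ENABLED;
    return 0;
}

/* Data transfer operations are accepted only in the enabled state. */
static int ep_model_post(struct ep_model *ep)
{
    assert(ep != NULL);
    if (ep->state != EP_ENABLED)
        return -1;
    ep->pending_ops++;
    return 0;
}

/* A critical error disables the endpoint and discards pending operations. */
static void ep_model_critical_error(struct ep_model *ep)
{
    assert(ep != NULL);
    ep->state = EP_DISABLED;
    ep->pending_ops = 0;
}
```

       In this model a post against a freshly created endpoint fails, succeeds
       after a successful enable, and fails again after a critical error until
       the endpoint is re-enabled.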

   fi_endpoint / fi_passive_ep / fi_scalable_ep
       fi_endpoint allocates a new active endpoint.  fi_passive_ep allocates a
       new  passive  endpoint.   fi_scalable_ep allocates a scalable endpoint.
       The properties and behavior of the endpoint are defined  based  on  the
       provided  struct  fi_info.   See  fi_getinfo  for additional details on
       fi_info.  fi_info flags that control the operation of an  endpoint  are
       defined below.  See section SCALABLE ENDPOINTS.

       If  an active endpoint is allocated in order to accept a connection re‐
       quest, the fi_info parameter must be the same as the fi_info  structure
       provided with the connection request (FI_CONNREQ) event.

       An  active endpoint may acquire the properties of a passive endpoint by
       setting the fi_info handle field to the  passive  endpoint  fabric  de‐
       scriptor.   This  is  useful  for applications that need to reserve the
       fabric address of an endpoint prior to knowing if the endpoint will  be
       used  on the active or passive side of a connection.  For example, this
       feature is useful for simulating socket semantics.  Once an active end‐
       point  acquires  the properties of a passive endpoint, the passive end‐
       point is no longer bound to any fabric resources and must no longer  be
       used.  The user is expected to close the passive endpoint after opening
       the active endpoint in order to free up any  lingering  resources  that
       had been used.

   fi_close
       Closes an endpoint and releases all resources associated with it.

       When  closing  a  scalable  endpoint, there must be no open transmit or
       receive contexts associated with the scalable endpoint.   If  resources
       are  still  associated  with  the  scalable endpoint when attempting to
       close, the call will return -FI_EBUSY.
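
       The close rule for scalable endpoints amounts to a reference check.  The
       following sketch models it with a hypothetical sep_model structure; it
       uses the C library's EBUSY as a stand-in for FI_EBUSY, which is an
       assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a scalable endpoint; not a libfabric type. */
struct sep_model {
    int open_tx;  /* transmit contexts still open */
    int open_rx;  /* receive contexts still open */
    int closed;   /* set once the close succeeds */
};

/* Refuses to close while any transmit or receive context remains open. */
static int sep_model_close(struct sep_model *sep)
{
    assert(sep != NULL);
    if (sep->open_tx > 0 || sep->open_rx > 0)
        return -EBUSY;  /* stand-in for -FI_EBUSY */
    sep->closed = 1;
    return 0;
}
```

       An application following this pattern closes every context it opened
       with fi_tx_context or fi_rx_context first, then closes the scalable
       endpoint itself.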

       Outstanding operations posted to the endpoint when fi_close  is  called
       will be discarded.  Discarded operations will silently be dropped, with
       no completions reported.  Additionally, a provider may  discard  previ‐
       ously  completed  operations  from  the associated completion queue(s).
       The behavior to discard completed operations is provider specific.

   fi_ep_bind
       fi_ep_bind is used to associate an endpoint  with  hardware  resources.
       The common use of fi_ep_bind is to direct asynchronous operations asso‐
       ciated with an endpoint to a completion queue.   An  endpoint  must  be
       bound  with  CQs  capable of reporting completions for any asynchronous
       operation initiated on the endpoint.  This is true even  for  endpoints
       which  are configured to suppress successful completions, in order that
       operations that complete in error may be reported  to  the  user.   For
       passive  endpoints,  this requires binding the endpoint with an EQ that
       supports the communication management (CM) domain.

       An active endpoint may direct  asynchronous  completions  to  different
       CQs,  based  on  the  type  of  operation.   This  is  specified  using
       fi_ep_bind flags.  The following flags may be used separately or  OR'ed
       together when binding an endpoint to a completion queue.

       FI_TRANSMIT
              Directs the completion of outbound data transfer requests to the
              specified completion queue.  This includes  send  message,  RMA,
              and atomic operations.

       FI_RECV
              Directs the notification of inbound data transfers to the speci‐
              fied completion queue.  This includes received  messages.   This
              binding automatically includes FI_REMOTE_WRITE, if applicable to
              the endpoint.

       FI_SELECTIVE_COMPLETION
              By default, data transfer operations generate completion entries
              into  a completion queue after they have successfully completed.
              Applications can use this bind flag to selectively  enable  when
              completions are generated.  If FI_SELECTIVE_COMPLETION is speci‐
              fied, data transfer operations will  not  generate  entries  for
              successful  completions unless FI_COMPLETION is set as an opera‐
              tional flag for the given operation.  Operations that fail asyn‐
              chronously will still generate completions, even if a completion
              is not requested.  FI_SELECTIVE_COMPLETION must  be  OR'ed  with
              FI_TRANSMIT and/or FI_RECV flags.

       When  FI_SELECTIVE_COMPLETION  is set, the user must determine indirect‐
       ly, usually from the completion of a subsequent operation, when  a  re‐
       quest  that does NOT have FI_COMPLETION set has completed.  Use of this
       flag may improve performance by allowing the provider to avoid  writing
       a completion entry for every operation.
       Example:  An  application  can selectively generate send completions by
       using the following general approach:

                fi_tx_attr::op_flags = 0; // default - no completion
                fi_ep_bind(ep, cq, FI_TRANSMIT | FI_SELECTIVE_COMPLETION);
                fi_send(ep, ...);                   // no completion
                fi_sendv(ep, ...);                  // no completion
                fi_sendmsg(ep, ..., FI_COMPLETION); // completion!
                fi_inject(ep, ...);                 // no completion

       Example: An application can selectively  disable  send  completions  by
       modifying the operational flags:

                fi_tx_attr::op_flags = FI_COMPLETION; // default - completion
                fi_ep_bind(ep, cq, FI_TRANSMIT | FI_SELECTIVE_COMPLETION);
                fi_send(ep, ...);       // completion
                fi_sendv(ep, ...);      // completion
                fi_sendmsg(ep, ..., 0); // no completion!
                fi_inject(ep, ...);     // no completion!

       Example:  Omitting  FI_SELECTIVE_COMPLETION  when binding will generate
       completions for all non-fi_inject calls:

                fi_tx_attr::op_flags = 0;
                fi_ep_bind(ep, cq, FI_TRANSMIT);    // default - completion
                fi_send(ep, ...);                   // completion
                fi_sendv(ep, ...);                  // completion
                fi_sendmsg(ep, ..., 0);             // completion!
                fi_sendmsg(ep, ..., FI_COMPLETION); // completion
                fi_sendmsg(ep, ..., FI_INJECT|FI_COMPLETION); // completion!
                fi_inject(ep, ...);                 // no completion!
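
       The three examples above follow one decision rule for successful
       transmit completions, which can be written as a single predicate.  This
       is a sketch only: the M_* constants are stand-in bit values chosen for
       illustration (the real flag values come from <rdma/fabric.h>), and the
       call_kind enumeration is a hypothetical way of grouping the calls by
       where their flags come from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit values for illustration; not the libfabric definitions. */
#define M_TRANSMIT             (UINT64_C(1) << 0)
#define M_SELECTIVE_COMPLETION (UINT64_C(1) << 1)
#define M_COMPLETION           (UINT64_C(1) << 2)
#define M_INJECT               (UINT64_C(1) << 3)

/* Hypothetical grouping of the calls used in the examples. */
enum call_kind {
    CALL_DEFAULT_FLAGS,   /* fi_send, fi_sendv: use fi_tx_attr::op_flags */
    CALL_EXPLICIT_FLAGS,  /* fi_sendmsg: uses the flags argument */
    CALL_INJECT           /* fi_inject: no successful completion */
};

/* Returns true if a successful operation writes a CQ entry. */
static bool reports_success(uint64_t bind_flags, uint64_t default_op_flags,
                            enum call_kind kind, uint64_t call_flags)
{
    assert((bind_flags & M_TRANSMIT) != 0);

    if (kind == CALL_INJECT)
        return false;
    if ((bind_flags & M_SELECTIVE_COMPLETION) == 0)
        return true;  /* every non-inject call completes */

    uint64_t flags = (kind == CALL_EXPLICIT_FLAGS) ? call_flags
                                                   : default_op_flags;
    return (flags & M_COMPLETION) != 0;
}
```

       The predicate covers successful completions only.  As stated above,
       operations that fail asynchronously still generate a completion
       regardless of these flags.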

       An endpoint may also be bound to a fabric  counter.   When  binding  an
       endpoint to a counter, the following flags may be specified.

       FI_SEND
              Increments  the  specified  counter  whenever a message transfer
              initiated over the endpoint has completed successfully or in er‐
              ror.  Sent messages include both tagged and normal message oper‐
              ations.

       FI_RECV
              Increments the specified counter whenever a message is  received
              over  the  endpoint.   Received messages include both tagged and
              normal message operations.

       FI_READ
              Increments the specified counter whenever an  RMA  read,  atomic
              fetch,  or  atomic compare operation initiated from the endpoint
              has completed successfully or in error.

       FI_WRITE
              Increments the specified counter whenever an RMA write  or  base
              atomic  operation initiated from the endpoint has completed suc‐
              cessfully or in error.

       FI_REMOTE_READ
              Increments the specified counter whenever an  RMA  read,  atomic
              fetch,  or  atomic  compare operation is initiated from a remote
              endpoint that targets the given endpoint.  Use of this flag  re‐
              quires that the endpoint be created using FI_RMA_EVENT.

       FI_REMOTE_WRITE
              Increments  the  specified counter whenever an RMA write or base
              atomic operation is initiated from a remote endpoint  that  tar‐
              gets  the  given  endpoint.   Use of this flag requires that the
              endpoint be created using FI_RMA_EVENT.

       An endpoint may only be bound to a single CQ or  counter  for  a  given
       type  of  operation.   For example, an EP may not bind to two counters,
       both using FI_WRITE.  Furthermore,  providers  may  limit  CQ  and  counter
       bindings to endpoints of the same endpoint type (DGRAM, MSG, RDM, etc.).
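
       The one-binding-per-operation-type rule can be enforced with a bitmask
       of the operation types already claimed.  The sketch below is a model
       under assumptions: the M_* values are stand-in bits, cntr_bindings is a
       hypothetical structure, and the -1 return is a placeholder because the
       text above does not name the error a provider reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in bit values for illustration; not the libfabric definitions. */
#define M_SEND  (UINT64_C(1) << 0)
#define M_RECV  (UINT64_C(1) << 1)
#define M_READ  (UINT64_C(1) << 2)
#define M_WRITE (UINT64_C(1) << 3)

/* Hypothetical record of which operation types already have a counter. */
struct cntr_bindings {
    uint64_t bound;
};

/* Rejects a bind that names no type or reuses an already bound type. */
static int bind_counter(struct cntr_bindings *b, uint64_t flags)
{
    assert(b != NULL);
    if (flags == 0 || (b->bound & flags) != 0)
        return -1;  /* placeholder error value */
    b->bound |= flags;
    return 0;
}
```

       A rejected bind leaves the existing bindings untouched, so a second
       counter that asks for FI_WRITE together with a free type gets neither.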

       Connectionless endpoints must be bound to a single address vector.

       If  an  endpoint is using a shared transmit and/or receive context, the
       shared contexts must be bound to the endpoint.  CQs, counters, AV,  and
       shared contexts must be bound to endpoints before they are enabled.

   fi_scalable_ep_bind
       fi_scalable_ep_bind  is  used  to associate a scalable endpoint with an
       address vector.  See section on SCALABLE ENDPOINTS.   A  scalable  end‐
       point  has  a  single  transport level address and can support multiple
       transmit and receive contexts.  The transmit and receive contexts share
       the  transport-level  address.  Address vectors that are bound to scal‐
       able endpoints are implicitly bound to any transmit or receive contexts
       created using the scalable endpoint.

   fi_enable
       This  call transitions the endpoint into an enabled state.  An endpoint
       must be enabled before it may be used to perform data  transfers.   En‐
       abling  an  endpoint  typically results in hardware resources being as‐
       signed to it.  Endpoints making use  of  completion  queues,  counters,
       event queues, and/or address vectors must be bound to them before being
       enabled.

       Calling connect or accept on an endpoint will implicitly enable an end‐
       point if it has not already been enabled.

       fi_enable  may also be used to re-enable an endpoint that has been dis‐
       abled as a result  of  experiencing  a  critical  error.   Applications
       should  check the return value from fi_enable to see if a disabled end‐
       point has successfully been re-enabled.

   fi_cancel
       fi_cancel attempts to cancel  an  outstanding  asynchronous  operation.
       Canceling an operation causes the fabric provider to search for the op‐
       eration and, if it is still pending, complete it as  having  been  can‐
       celed.   An error queue entry will be available in the associated error
       queue with error code FI_ECANCELED.  On the other hand, if  the  opera‐
       tion completed before the call to fi_cancel, then the completion status
       of that operation will be available in the associated completion queue.
       No specific entry related to fi_cancel itself will be posted.

       Cancel uses the context parameter associated with an operation to iden‐
       tify the request to cancel.  Operations posted without a valid  context
       parameter  --  either  no context parameter is specified or the context
       value was ignored by the provider -- cannot be canceled.   If  multiple
       outstanding  operations  match  the context parameter, only one will be
       canceled.  In this case, the operation which is  canceled  is  provider
       specific.   The  cancel  operation  is  asynchronous, but will complete
       within a bounded period of time.
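
       The context-matching behavior can be modeled as a search over pending
       operations.  Everything in this sketch is hypothetical (op_model,
       cancel_model, the status values); in particular, picking the first
       pending match is an arbitrary choice made for the model, since the
       operation actually canceled is provider specific.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-operation record; not a libfabric type. */
enum op_status { OP_PENDING, OP_DONE, OP_CANCELED };

struct op_model {
    void          *context;  /* context the operation was posted with */
    enum op_status status;
};

/* Cancels at most one pending operation that matches the context.
 * Returns the index of the canceled operation, or -1 if none matched. */
static int cancel_model(struct op_model *ops, size_t n, void *context)
{
    assert(ops != NULL || n == 0);
    if (context == NULL)
        return -1;  /* operations without a context cannot be canceled */

    for (size_t i = 0; i < n; i++) {
        if (ops[i].context == context && ops[i].status == OP_PENDING) {
            ops[i].status = OP_CANCELED;  /* would surface as FI_ECANCELED */
            return (int)i;
        }
    }
    return -1;  /* already completed, or never posted */
}
```

       Giving each outstanding operation a distinct context avoids depending
       on which of several matches a provider picks.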

   fi_ep_alias
       This call creates an alias to the specified endpoint.  Conceptually, an
       endpoint alias provides an alternate software path from the application
       to the underlying provider hardware.  An alias EP differs from its par‐
       ent  endpoint only by its default data transfer flags.  For example, an
       alias EP may be configured to use a different completion mode.  By  de‐
       fault,  an alias EP inherits the same data transfer flags as the parent
       endpoint.  An application can use fi_control to modify the alias EP op‐
       erational flags.

       When  allocating  an  alias,  an  application  may configure either the
       transmit or receive operational flags.  This avoids needing a  separate
       call to fi_control to set those flags.  The flags passed to fi_ep_alias
       must include FI_TRANSMIT or FI_RECV (not both) with  other  operational
       flags  OR'ed in.  This will override the transmit or receive flags, re‐
       spectively, for operations posted through the alias endpoint.  All  al‐
       located  aliases  must  be closed for the underlying endpoint to be re‐
       leased.
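
       The flag rule for aliases (exactly one of FI_TRANSMIT or FI_RECV, with
       the remaining bits replacing that direction's operational flags) can be
       sketched as follows.  The M_* values are stand-in bits and the
       ep_flags_model structure is hypothetical.  The model reads the text
       above literally, so it rejects a flags value that names neither
       direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in bit values for illustration; not the libfabric definitions. */
#define M_TRANSMIT   (UINT64_C(1) << 0)
#define M_RECV       (UINT64_C(1) << 1)
#define M_COMPLETION (UINT64_C(1) << 2)

/* Hypothetical pair of default operational flags held by an endpoint. */
struct ep_flags_model {
    uint64_t tx_op_flags;
    uint64_t rx_op_flags;
};

/* True when exactly one of the two direction bits is present. */
static bool direction_flags_valid(uint64_t flags)
{
    bool tx = (flags & M_TRANSMIT) != 0;
    bool rx = (flags & M_RECV) != 0;
    return tx != rx;
}

/* The alias inherits both flag sets, then one direction is overridden. */
static int alias_model_create(const struct ep_flags_model *parent,
                              uint64_t flags, struct ep_flags_model *alias)
{
    assert(parent != NULL && alias != NULL);
    if (!direction_flags_valid(flags))
        return -1;

    *alias = *parent;
    if ((flags & M_TRANSMIT) != 0)
        alias->tx_op_flags = flags & ~M_TRANSMIT;
    else
        alias->rx_op_flags = flags & ~M_RECV;
    return 0;
}
```

       The same one-direction check applies to the FI_GETOPSFLAG and
       FI_SETOPSFLAG control commands, which also take FI_TRANSMIT or FI_RECV
       but not both.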

   fi_control
       The control operation is used to adjust the default behavior of an end‐
       point.  It allows the underlying provider to redirect function calls to
       implementations optimized to meet the desired application behavior.  As
       a  result,  calls  to  fi_control  must be serialized against all other
       calls to an endpoint.

       The base operation of an endpoint is  selected  during  creation  using
       struct  fi_info.   The  following control commands and arguments may be
       assigned to an endpoint.

       FI_GETOPSFLAG -- uint64_t *flags
              Used to retrieve the current value of flags associated with  the
              data transfer operations initiated on the endpoint.  The control
              argument must include FI_TRANSMIT or FI_RECV (not both) flags to
              indicate  the  type  of data transfer flags to be returned.  See
              below for a list of control flags.

       FI_SETOPSFLAG -- uint64_t *flags
              Used to change the data transfer operation flags associated with
              an  endpoint.   The control argument must include FI_TRANSMIT or
              FI_RECV (not both) to indicate the type of  data  transfer  that
              the flags should apply to, with other flags OR'ed in.  The given
              flags will override the previous transmit and receive attributes
              that  were  set  when  the  endpoint was created.  Valid control
              flags are defined below.

       FI_BACKLOG -- int *value
              This option only applies to passive endpoints.  It  is  used  to
              set the connection request backlog for listening endpoints.

       FI_GETWAIT -- void **
              This command allows the user to retrieve the file descriptor as‐
              sociated with a socket endpoint.  The fi_control  arg  parameter
              should  be  an  address where a pointer to the returned file de‐
              scriptor will be written.  See fi_eq.3 for additional details on
              using  fi_control with FI_GETWAIT.  The file descriptor may be used
              for notification that the endpoint is ready to send  or  receive
              data.

   fi_getopt / fi_setopt
       Endpoint protocol options may be retrieved using fi_getopt or set using
       fi_setopt.  Applications specify the level at which  a  desired  option
       exists, identify the option, and provide input/output buffers to get or
       set the option.  fi_setopt provides an  application  a  way  to  adjust
       low-level protocol and implementation specific details of an endpoint.

       The  following  option  levels  and option names and parameters are de‐
       fined.

       FI_OPT_ENDPOINT

       FI_OPT_MIN_MULTI_RECV - size_t
              Defines the minimum receive buffer space available when the  re‐
              ceive  buffer  is  released by the provider (see FI_MULTI_RECV).
              Modifying this value is only guaranteed to set the minimum  buf‐
              fer  space  needed  on  receives posted after the value has been
              changed.  It is recommended that applications that want to over‐
              ride the default MIN_MULTI_RECV value set this option before en‐
              abling the corresponding endpoint.

       FI_OPT_CM_DATA_SIZE - size_t
              Defines the size of available space in CM messages for  user-de‐
              fined  data.  This value limits the amount of data that applica‐
              tions can exchange between peer endpoints using the  fi_connect,
              fi_accept,  and  fi_reject operations.  The size returned is de‐
              pendent upon the properties of the endpoint, except in the  case
              of  passive  endpoints,  in  which the size reflects the maximum
              size of the data that may be present as part of a connection re‐
              quest event.  This option is read only.

       FI_OPT_BUFFERED_LIMIT - size_t
              Defines  the maximum size of a buffered message that will be re‐
              ported to users  as  part  of  a  receive  completion  when  the
              FI_BUFFERED_RECV mode is enabled on an endpoint.

              fi_getopt() will return the currently configured threshold,  or
              the  provider's default threshold if one has not been set by the
              application.  fi_setopt() allows an application to configure the
              threshold.   If  the  provider  cannot  support  the  requested
              threshold, it will fail the fi_setopt() call  with  FI_EMSGSIZE.
              Calling  fi_setopt() with the threshold set to SIZE_MAX will set
              the  threshold  to  the  maximum  supported  by  the  provider.
              fi_getopt() can then be used to retrieve the set size.

              In  most  cases,  the  sending  and receiving endpoints must be
              configured to use the same threshold value,  and  the  threshold
              must be set prior to enabling the endpoint.

       FI_OPT_BUFFERED_MIN - size_t
              Defines the minimum size of a buffered message that will be  re‐
              ported.   Applications  should  set this to a size large enough to
              decide, on getting a buffered receive  completion,  whether  to
              discard or claim the buffered receive and when to claim it.   A
              provider typically uses the value  when  sending  a  rendezvous
              protocol   request,   which   carries   at   least
              FI_OPT_BUFFERED_MIN bytes of application data.   A
              smaller  sized  rendezvous  protocol  message usually results in
              better latency for the overall transfer of a large message.
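
       The get/set behavior described for FI_OPT_BUFFERED_LIMIT can be
       summarized in a small model.  The limit_model structure and both
       functions are hypothetical, and the sketch uses the C library's
       EMSGSIZE, negated, as a stand-in for the FI_EMSGSIZE failure mentioned
       above; both are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical provider-side state for the buffered limit. */
struct limit_model {
    size_t provider_max;      /* largest threshold the provider supports */
    size_t provider_default;  /* reported until the application sets one */
    size_t configured;
    int    is_set;
};

/* SIZE_MAX selects the provider maximum; oversized requests are refused. */
static int limit_set(struct limit_model *m, size_t requested)
{
    assert(m != NULL);
    if (requested == SIZE_MAX)
        requested = m->provider_max;
    if (requested > m->provider_max)
        return -EMSGSIZE;  /* stand-in for the FI_EMSGSIZE failure */
    m->configured = requested;
    m->is_set = 1;
    return 0;
}

/* Returns the configured threshold, or the default if none was set. */
static size_t limit_get(const struct limit_model *m)
{
    assert(m != NULL);
    return m->is_set ? m->configured : m->provider_default;
}
```

       Setting SIZE_MAX and then reading the value back is the pattern the
       text describes for discovering the provider's maximum.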

   fi_rx_size_left (DEPRECATED)
       This function has been deprecated and will be removed in a future  ver‐
       sion of the library.  It may not be supported by all providers.

       The fi_rx_size_left call returns a lower bound on the number of receive
       operations that may be posted to the given endpoint without that opera‐
       tion  returning  -FI_EAGAIN.   Depending on the specific details of the
       subsequently posted receive operations (e.g., number  of  iov  entries,
       which  receive  function  is  called, etc.), it may be possible to post
       more receive operations than originally indicated by fi_rx_size_left.

   fi_tx_size_left (DEPRECATED)
       This function has been deprecated and will be removed in a future  ver‐
       sion of the library.  It may not be supported by all providers.

       The  fi_tx_size_left call returns a lower bound on the number of trans‐
       mit operations that may be posted to the given  endpoint  without  that
       operation  returning  -FI_EAGAIN.  Depending on the specific details of
       the subsequently posted transmit operations (e.g., number  of  iov  en‐
       tries,  which transmit function is called, etc.), it may be possible to
       post  more   transmit   operations   than   originally   indicated   by
       fi_tx_size_left.

ENDPOINT ATTRIBUTES

       The  fi_ep_attr structure defines the set of attributes associated with
       an endpoint.  Endpoint attributes may  be  further  refined  using  the
       transmit and receive context attributes as shown below.

              struct fi_ep_attr {
                  enum fi_ep_type type;
                  uint32_t        protocol;
                  uint32_t        protocol_version;
                  size_t          max_msg_size;
                  size_t          msg_prefix_size;
                  size_t          max_order_raw_size;
                  size_t          max_order_war_size;
                  size_t          max_order_waw_size;
                  uint64_t        mem_tag_format;
                  size_t          tx_ctx_cnt;
                  size_t          rx_ctx_cnt;
                  size_t          auth_key_size;
                  uint8_t         *auth_key;
              };

   type - Endpoint Type
       If  specified, indicates the type of fabric interface communication de‐
       sired.  Supported types are:

       FI_EP_UNSPEC
              The type of endpoint is not specified.  This is usually provided
              as  input, with other attributes of the endpoint or the provider
              selecting the type.

       FI_EP_MSG
              Provides a reliable, connection-oriented data  transfer  service
              with flow control that maintains message boundaries.

       FI_EP_DGRAM
              Supports  a  connectionless,  unreliable datagram communication.
              Message boundaries are maintained, but the maximum message  size
              may  be  limited to the fabric MTU.  Flow control is not guaran‐
              teed.

       FI_EP_RDM
              Reliable datagram message.  Provides a reliable, unconnected da‐
              ta  transfer  service  with  flow control that maintains message
              boundaries.

       FI_EP_SOCK_STREAM
              Data streaming endpoint with TCP  socket-like  semantics.   Pro‐
              vides a reliable, connection-oriented data transfer service that
              does not maintain message boundaries.  FI_EP_SOCK_STREAM is most
              useful  for applications designed around using TCP sockets.  See
              the SOCKET ENDPOINT section for additional details and  restric‐
              tions that apply to stream endpoints.

       FI_EP_SOCK_DGRAM
              A  connectionless,  unreliable  datagram endpoint with UDP sock‐
              et-like semantics.  FI_EP_SOCK_DGRAM is most useful for applica‐
              tions  designed  around  using UDP sockets.  See the SOCKET END‐
              POINT section for additional details and restrictions that apply
              to datagram socket endpoints.

605   Protocol
606       Specifies the low-level end-to-end protocol employed by the provider.
607       A matching protocol must be used by communicating endpoints  to  ensure
608       interoperability.  The following protocol values are defined.  Provider
609       specific protocols are also allowed.  Provider specific protocols  will
610       be indicated by having the upper bit of the protocol value set to one.
611
612       FI_PROTO_UNSPEC
613              The protocol is not specified.  This is usually provided as
614              input, with other attributes of the endpoint or the provider
615              selecting the actual protocol.
616
617       FI_PROTO_RDMA_CM_IB_RC
618              The  protocol  runs  over  InfiniBand  reliable-connected  queue
619              pairs, using the RDMA CM protocol for connection establishment.
620
621       FI_PROTO_IWARP
622              The protocol runs over the  Internet  wide  area  RDMA  protocol
623              transport.
624
625       FI_PROTO_IB_UD
626              The  protocol  runs  over  InfiniBand  unreliable datagram queue
627              pairs.
628
629       FI_PROTO_PSMX
630              The protocol is based on an Intel proprietary protocol known  as
631              PSM,  performance scaled messaging.  PSMX is an extended version
632              of the PSM protocol to support the libfabric interfaces.
633
634       FI_PROTO_UDP
635              The protocol sends and receives UDP datagrams.  For example,  an
636              endpoint  using  FI_PROTO_UDP will be able to communicate with a
637              remote peer that is using Berkeley SOCK_DGRAM sockets using  IP‐
638              PROTO_UDP.
639
640       FI_PROTO_SOCK_TCP
641              The protocol is layered over TCP packets.
642
643       FI_PROTO_IWARP_RDM
644              Reliable-datagram  protocol implemented over iWarp reliable-con‐
645              nected queue pairs.
646
647       FI_PROTO_IB_RDM
648              Reliable-datagram protocol  implemented  over  InfiniBand  reli‐
649              able-connected queue pairs.
650
651       FI_PROTO_GNI
652              Protocol runs over Cray GNI low-level interface.
653
654       FI_PROTO_RXM
655              Reliable-datagram  protocol  implemented over message endpoints.
656              RXM is a libfabric utility component that adds RDM endpoint  se‐
657              mantics over MSG endpoint semantics.
658
659       FI_PROTO_RXD
660              Reliable-datagram  protocol implemented over datagram endpoints.
661              RXD is a libfabric utility component that adds RDM endpoint  se‐
662              mantics over DGRAM endpoint semantics.
663
664       FI_PROTO_NETWORKDIRECT
665              Protocol runs over Microsoft NetworkDirect service provider
666              interface.  This adds reliable-datagram semantics over the
667              NetworkDirect connection-oriented endpoint semantics.
668
669       FI_PROTO_PSMX2
670              The  protocol is based on an Intel proprietary protocol known as
671              PSM2, performance scaled messaging version 2.  PSMX2 is  an  ex‐
672              tended version of the PSM2 protocol to support the libfabric in‐
673              terfaces.
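
          The Protocol description above states that provider specific
          protocols have the upper bit of the protocol value set to one.  A
          minimal sketch of that test follows.  The helper name is
          hypothetical and not part of libfabric, and the upper bit is assumed
          to be bit 31 because the protocol field is a uint32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a libfabric call: reports whether the most
 * significant bit of a 32-bit protocol value is set, which this page
 * describes as the marker for a provider specific protocol. */
static bool proto_is_provider_specific(uint32_t protocol)
{
    return (protocol & (UINT32_C(1) << 31)) != 0;
}
```

          An application could apply such a check to the protocol field
          returned in struct fi_ep_attr before comparing it against the
          FI_PROTO_* values listed above.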
674
675   protocol_version - Protocol Version
676       Identifies which version of the protocol is employed by the provider.
677       The protocol version allows providers to extend an existing protocol
678       in a backward compatible manner, for example by adding support for
679       additional features or functionality.  Providers that support
680       different versions of the same protocol should interoperate, but only
681       when using the capabilities defined for the lower version.
682
683   max_msg_size - Max Message Size
684       Defines  the  maximum size for an application data transfer as a single
685       operation.
686
687   msg_prefix_size - Message Prefix Size
688       Specifies the size of any required message prefix buffer space.  This
689       field will be 0 unless the FI_MSG_PREFIX mode is enabled.  If nonzero,
690       msg_prefix_size will be a multiple of 8 bytes.
691
692   Max RMA Ordered Size
693       The maximum ordered size specifies the delivery order of transport data
694       into target memory for RMA and atomic operations.  Data ordering is
695       separate from, but dependent on, message ordering (defined below).
696       Data ordering is unspecified where message order is not defined.
697
698       Data ordering refers to the access of target memory by subsequent
699       operations.  When back-to-back RMA read or write operations access the
700       same registered memory location, data ordering indicates whether the
701       second operation reads or writes the target memory after the first
702       operation has completed.  Because RMA ordering applies between two
703       operations, and not within a single data transfer, ordering is defined
704       per byte-addressable memory location.  That is, ordering specifies
705       whether location X is accessed by the second operation after the first
706       operation.  Nothing is implied about the completion of the first
707       operation before the second operation is initiated.
708
709       In order to support large data transfers  being  broken  into  multiple
710       packets and sent using multiple paths through the fabric, data ordering
711       may be limited to transfers of a  specific  size  or  less.   Providers
712       specify  when data ordering is maintained through the following values.
713       Note that even if data ordering is not maintained, message ordering may
714       be.
715
716       max_order_raw_size
717              Read  after write size.  If set, an RMA or atomic read operation
718              issued after an RMA or atomic write operation, both of which are
719              smaller than the size, will be ordered.  Where the target memory
720              locations overlap, the RMA or atomic read operation will see the
721              results of the previous RMA or atomic write.
722
723       max_order_war_size
724              Write after read size.  If set, an RMA or atomic write operation
725              issued after an RMA or atomic read operation, both of which  are
726              smaller  than the size, will be ordered.  The RMA or atomic read
727              operation will see the initial value of the target memory  loca‐
728              tion before a subsequent RMA or atomic write updates the value.
729
730       max_order_waw_size
731              Write  after  write size.  If set, an RMA or atomic write opera‐
732              tion issued after an RMA or  atomic  write  operation,  both  of
733              which  are  smaller  than the size, will be ordered.  The target
734              memory location will reflect the results of the  second  RMA  or
735              atomic write.
736
737       An  order size value of 0 indicates that ordering is not guaranteed.  A
738       value of -1 guarantees ordering for any data size.
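
          The interpretation of the three max_order_*_size fields can be
          sketched as a single predicate.  This is an illustration only: the
          function name is hypothetical, and the inclusive comparison is an
          assumption based on the phrase "a specific size or less" used above
          (the per-field descriptions say "smaller than the size").

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decides whether a max_order_raw_size,
 * max_order_war_size, or max_order_waw_size limit covers a pair of
 * operations of the given lengths. */
static bool data_order_guaranteed(size_t max_order_size,
                                  size_t first_len, size_t second_len)
{
    /* A limit of 0 means ordering is not guaranteed at all. */
    if (max_order_size == 0)
        return false;

    /* A limit of -1 converts to the largest size_t value, so the
     * comparison below succeeds for any length without a special case. */
    return first_len <= max_order_size && second_len <= max_order_size;
}
```

          With a limit of 4096, for example, a 4096-byte write followed by a
          64-byte read would be treated as ordered by this sketch, while a
          4097-byte write would not.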
739
740   mem_tag_format - Memory Tag Format
741       The memory tag format is a bit array  used  to  convey  the  number  of
742       tagged  bits  supported by a provider.  Additionally, it may be used to
743       divide the bit array into separate fields.  The mem_tag_format  option‐
744       ally  begins  with a series of bits set to 0, to signify bits which are
745       ignored by the provider.  Following the initial prefix of ignored bits,
746       the  array will consist of alternating groups of bits set to all 1's or
747       all 0's.  Each group of bits corresponds to a tagged field.  The impli‐
748       cation of defining a tagged field is that when a mask is applied to the
749       tagged bit array, all bits belonging to a single field will  either  be
750       set to 1 or 0, collectively.
751
752       For example, a mem_tag_format of 0x30FF indicates support for 14 tagged
753       bits, separated into 3 fields.  The first field consists of 2 bits, the
754       second field 4 bits, and the final field 8 bits.  Valid masks for such
755       a tag format are the bitwise OR of zero or more of the following
756       values: 0x3000, 0x0F00, and 0x00FF.  For performance reasons, the
757       provider may not validate the mask provided by the application.
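
          Because the provider may not validate masks, an application might
          check them itself.  The sketch below derives the fields from a
          mem_tag_format value as described above (leading 0 bits are the
          ignored prefix, then each run of identical bits is one field) and
          verifies that a mask sets each field to all 0 or all 1.  All
          function names are hypothetical, and mask bits that fall in the
          ignored prefix are deliberately not checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Index of the highest set bit, or -1 when value is 0. */
static int top_bit(uint64_t value)
{
    int bit = -1;

    while (value) {
        value >>= 1;
        bit++;
    }
    return bit;
}

/* Number of tagged fields: runs of identical bits below the ignored prefix. */
static unsigned tag_field_count(uint64_t format)
{
    unsigned fields = 0;
    int prev = -1;

    for (int bit = top_bit(format); bit >= 0; bit--) {
        int cur = (int)((format >> bit) & 1);

        if (cur != prev)
            fields++;
        prev = cur;
    }
    return fields;
}

/* True when every field of the format is entirely set or entirely clear
 * in the mask. */
static bool tag_mask_is_valid(uint64_t format, uint64_t mask)
{
    int bit = top_bit(format);

    while (bit >= 0) {
        int group = (int)((format >> bit) & 1);
        int first = (int)((mask >> bit) & 1);

        /* Walk one field; every mask bit in it must equal the first. */
        for (; bit >= 0 && (int)((format >> bit) & 1) == group; bit--) {
            if ((int)((mask >> bit) & 1) != first)
                return false;
        }
    }
    return true;
}
```

          For the 0x30FF example, tag_field_count() reports 3 fields, the
          mask 0x3000 | 0x00FF is accepted, and the mask 0x1000 is rejected
          because it sets only half of the 2-bit field.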
758
759       By identifying fields within a tag, a provider may be able to optimize
760       its search routines.  An application that requests tag fields must
761       provide tag masks that set the mask bits corresponding to each field
762       to either all 0 or all 1.  When negotiating tag fields, an application
763       can request a specific number of fields of a given size.  A provider
764       must return a tag format that supports the requested number of fields,
765       with each field being at least the size requested, or fail the request.
766       A provider may increase the size of the fields.  When reporting
767       completions (see FI_CQ_FORMAT_TAGGED), the provider is not guaranteed
768       to clear any unsupported tag bits in the tag field of the completion
769       entry.
770
771       It is recommended that field sizes be ordered from smallest to largest.
772       A  generic,  unstructured  tag and mask can be achieved by requesting a
773       bit array consisting of alternating 1's and 0's.
774
775   tx_ctx_cnt - Transmit Context Count
776       Number of transmit contexts to associate with  the  endpoint.   If  not
777       specified (0), 1 context will be assigned if the endpoint supports out‐
778       bound transfers.  Transmit contexts  are  independent  transmit  queues
779       that  may be separately configured.  Each transmit context may be bound
780       to a separate CQ, and no ordering is defined between  contexts.   Addi‐
781       tionally,  no synchronization is needed when accessing contexts in par‐
782       allel.
783
784       If the count is set to the value FI_SHARED_CONTEXT, the  endpoint  will
785       be  configured  to  use  a shared transmit context, if supported by the
786       provider.  Providers that do not support shared transmit contexts  will
787       fail the request.
788
789       See  the  scalable endpoint and shared contexts sections for additional
790       details.
791
792   rx_ctx_cnt - Receive Context Count
793       Number of receive contexts to associate  with  the  endpoint.   If  not
794       specified,  1 context will be assigned if the endpoint supports inbound
795       transfers.  Receive contexts are independent processing queues that may
796       be separately configured.  Each receive context may be bound to a sepa‐
797       rate CQ, and no ordering is defined between contexts.  Additionally, no
798       synchronization is needed when accessing contexts in parallel.
799
800       If  the  count is set to the value FI_SHARED_CONTEXT, the endpoint will
801       be configured to use a shared receive  context,  if  supported  by  the
802       provider.   Providers  that do not support shared receive contexts will
803       fail the request.
804
805       See the scalable endpoint and shared contexts sections  for  additional
806       details.
807
808   auth_key_size - Authorization Key Length
809       The  length of the authorization key in bytes.  This field will be 0 if
810       authorization keys are not available or used.  This  field  is  ignored
811       unless the fabric is opened with API version 1.5 or greater.
812
813   auth_key - Authorization Key
814       If  supported  by the fabric, an authorization key (a.k.a.  job key) to
815       associate with the endpoint.  An authorization key  is  used  to  limit
816       communication  between  endpoints.   Only  peer endpoints that are pro‐
817       grammed to use the same authorization key may communicate.   Authoriza‐
818       tion keys are often used to implement job keys, to ensure that process‐
819       es running in different jobs do not accidentally  cross  traffic.   The
820       domain  authorization  key  will  be used if auth_key_size is set to 0.
821       This field is ignored unless the fabric is opened with API version  1.5
822       or greater.
823

TRANSMIT CONTEXT ATTRIBUTES

825       Attributes  specific  to  the  transmit capabilities of an endpoint are
826       specified using struct fi_tx_attr.
827
828              struct fi_tx_attr {
829                  uint64_t  caps;
830                  uint64_t  mode;
831                  uint64_t  op_flags;
832                  uint64_t  msg_order;
833                  uint64_t  comp_order;
834                  size_t    inject_size;
835                  size_t    size;
836                  size_t    iov_limit;
837                  size_t    rma_iov_limit;
838              };
839
840   caps - Capabilities
841       The requested capabilities of the context.  The capabilities must be  a
842       subset of those requested of the associated endpoint.  See the CAPABIL‐
843       ITIES section of fi_getinfo(3) for capability  details.   If  the  caps
844       field  is  0 on input to fi_getinfo(3), the caps value from the fi_info
845       structure will be used.
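
          The two rules in the caps description (a value of 0 falls back to
          the fi_info caps, and context caps must be a subset of the endpoint
          caps) can be sketched as follows.  Both helper names are
          hypothetical, and the capability bits are treated as opaque 64-bit
          masks rather than real FI_* capability values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: caps that apply to a context on input to
 * fi_getinfo(3).  A value of 0 means "use the caps from struct fi_info". */
static uint64_t effective_ctx_caps(uint64_t ctx_caps, uint64_t info_caps)
{
    return ctx_caps ? ctx_caps : info_caps;
}

/* Hypothetical helper: true when the context requests no capability bit
 * that the associated endpoint did not also request. */
static bool ctx_caps_is_subset(uint64_t ctx_caps, uint64_t ep_caps)
{
    return (ctx_caps & ~ep_caps) == 0;
}
```

          The same pattern applies to the mode field described next, which
          follows the same fallback and subset rules.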
846
847   mode
848       The operational mode bits of the context.  The mode bits will be a sub‐
849       set  of  those  associated  with the endpoint.  See the MODE section of
850       fi_getinfo(3) for details.  A mode value of 0 will be ignored on  input
851       to fi_getinfo(3), with the mode value of the fi_info structure used in‐
852       stead.  On return from fi_getinfo(3), the mode  will  be  set  only  to
853       those constraints specific to transmit operations.
854
855   op_flags - Default transmit operation flags
856       Flags that control the behavior of operations submitted against the
857       context.  Applicable flags are listed in the Operation Flags section.
858
859   msg_order - Message Ordering
860       Message ordering refers to the order in which transport  layer  headers
861       (as  viewed  by  the application) are processed.  Relaxed message order
862       enables data transfers to be sent and received out of order, which  may
863       improve performance by utilizing multiple paths through the fabric from
864       the initiating endpoint to a target endpoint.   Message  order  applies
865       only  between  a single source and destination endpoint pair.  Ordering
866       between different target endpoints is not defined.
867
868       Message order is determined using a set of ordering bits.  Each set bit
869       indicates  that  ordering  is  maintained between data transfers of the
870       specified type.  Message order is defined for [read | write | send] op‐
871       erations submitted by an application after [read | write | send] opera‐
872       tions.
873
874       Message ordering only applies to the end-to-end transmission of
875       transport headers.  Message ordering is necessary for, but does not
876       guarantee, the order in which message data is sent or received by the
877       transport layer.  Message ordering requires matching ordering semantics
878       on the receiving side of a data transfer operation in order to
879       guarantee that ordering is met.
880
881       FI_ORDER_NONE
882              No  ordering  is  specified.  This value may be used as input in
883              order to obtain the  default  message  order  supported  by  the
884              provider.  FI_ORDER_NONE is an alias for the value 0.
885
886       FI_ORDER_RAR
887              Read  after  read.   If  set, RMA and atomic read operations are
888              transmitted in the order submitted relative  to  other  RMA  and
889              atomic read operations.  If not set, RMA and atomic reads may be
890              transmitted out of order from their submission.
891
892       FI_ORDER_RAW
893              Read after write.  If set, RMA and atomic  read  operations  are
894              transmitted  in  the  order submitted relative to RMA and atomic
895              write operations.  If not set,  RMA  and  atomic  reads  may  be
896              transmitted ahead of RMA and atomic writes.
897
898       FI_ORDER_RAS
899              Read  after  send.   If  set, RMA and atomic read operations are
900              transmitted in the order submitted relative to message send  op‐
901              erations,  including  tagged  sends.  If not set, RMA and atomic
902              reads may be transmitted ahead of sends.
903
904       FI_ORDER_WAR
905              Write after read.  If set, RMA and atomic write  operations  are
906              transmitted  in  the  order submitted relative to RMA and atomic
907              read operations.  If not set,  RMA  and  atomic  writes  may  be
908              transmitted ahead of RMA and atomic reads.
909
910       FI_ORDER_WAW
911              Write  after write.  If set, RMA and atomic write operations are
912              transmitted in the order submitted relative  to  other  RMA  and
913              atomic  write operations.  If not set, RMA and atomic writes may
914              be transmitted out of order from their submission.
915
916       FI_ORDER_WAS
917              Write after send.  If set, RMA and atomic write  operations  are
918              transmitted  in the order submitted relative to message send op‐
919              erations, including tagged sends.  If not set,  RMA  and  atomic
920              writes may be transmitted ahead of sends.
921
922       FI_ORDER_SAR
923              Send after read.  If set, message send operations, including
924              tagged sends, are transmitted in the order submitted relative to
925              RMA and atomic read operations.  If not set, message sends may be
926              transmitted ahead of RMA and atomic reads.
927
928       FI_ORDER_SAW
929              Send after write.  If set, message send operations, including
930              tagged sends, are transmitted in the order submitted relative to
931              RMA and atomic write operations.  If not set, message sends may
932              be transmitted ahead of RMA and atomic writes.
933
934       FI_ORDER_SAS
935              Send after send.  If set, message send operations, including
936              tagged sends, are transmitted in the order submitted relative to
937              other message sends.  If not set, message sends may be
938              transmitted out of order from their submission.
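
          The nine ordering bits above form a 3x3 grid: the first letter
          names the later operation and the last letter names the earlier
          operation it is ordered against.  The sketch below makes that
          lookup explicit.  Everything prefixed DEMO_ or demo_ is
          hypothetical, and the bit values are placeholders chosen for the
          example; real code would use the FI_ORDER_* definitions from
          <rdma/fabric.h> instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Operation classes used by the ordering bits (hypothetical enum). */
enum demo_op { DEMO_READ, DEMO_WRITE, DEMO_SEND };

/* Placeholder bit values for illustration only. */
#define DEMO_ORDER_RAR (UINT64_C(1) << 0)
#define DEMO_ORDER_RAW (UINT64_C(1) << 1)
#define DEMO_ORDER_RAS (UINT64_C(1) << 2)
#define DEMO_ORDER_WAR (UINT64_C(1) << 3)
#define DEMO_ORDER_WAW (UINT64_C(1) << 4)
#define DEMO_ORDER_WAS (UINT64_C(1) << 5)
#define DEMO_ORDER_SAR (UINT64_C(1) << 6)
#define DEMO_ORDER_SAW (UINT64_C(1) << 7)
#define DEMO_ORDER_SAS (UINT64_C(1) << 8)

/* Bit that governs "later after earlier", e.g. (WRITE, SEND) -> WAS. */
static uint64_t demo_order_bit(enum demo_op later, enum demo_op earlier)
{
    static const uint64_t table[3][3] = {
        /* later = read  */ { DEMO_ORDER_RAR, DEMO_ORDER_RAW, DEMO_ORDER_RAS },
        /* later = write */ { DEMO_ORDER_WAR, DEMO_ORDER_WAW, DEMO_ORDER_WAS },
        /* later = send  */ { DEMO_ORDER_SAR, DEMO_ORDER_SAW, DEMO_ORDER_SAS },
    };

    return table[later][earlier];
}

/* True when msg_order keeps 'later' behind a previously submitted
 * 'earlier' operation to the same destination endpoint. */
static bool demo_is_ordered(uint64_t msg_order,
                            enum demo_op later, enum demo_op earlier)
{
    return (msg_order & demo_order_bit(later, earlier)) != 0;
}
```

          With msg_order set to DEMO_ORDER_SAS | DEMO_ORDER_WAW, sends stay
          ordered against earlier sends and writes against earlier writes,
          while a read submitted after a write may still be transmitted
          ahead of it.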
939
940   comp_order - Completion Ordering
941       Completion ordering refers to the order in which completed requests are
942       written  into  the completion queue.  Completion ordering is similar to
943       message order.  Relaxed completion order may enable faster reporting of
944       completed  transfers,  allow  acknowledgments to be sent over different
945       fabric paths, and support more sophisticated  retry  mechanisms.   This
946       can result in lower-latency completions, particularly when using uncon‐
947       nected  endpoints.   Strict  completion  ordering  may   require   that
948       providers queue completed operations or limit available optimizations.
949
950       For transmit requests, completion ordering depends on the endpoint com‐
951       munication type.  For unreliable communication, completion ordering ap‐
952       plies  to all data transfer requests submitted to an endpoint.  For re‐
953       liable communication, completion ordering only applies to requests that
954       target  a single destination endpoint.  Completion ordering of requests
955       that target different endpoints over a reliable transport  is  not  de‐
956       fined.
957
958       Applications should specify the completion ordering that they support
959       or require.  Providers should return the completion order that they
960       actually provide, with the constraint that the returned ordering is at
961       least as strict as that specified by the application.  Supported
962       completion order values are:
963
964       FI_ORDER_NONE
965              No  ordering is defined for completed operations.  Requests sub‐
966              mitted to the transmit context may complete in any order.
967
968       FI_ORDER_STRICT
969              Requests complete in the order in which they  are  submitted  to
970              the transmit context.
971
972   inject_size
973       The  requested  inject operation size (see the FI_INJECT flag) that the
974       context will support.  This is the maximum size data transfer that  can
975       be  associated  with  an inject operation (such as fi_inject) or may be
976       used with the FI_INJECT data transfer flag.
977
978   size
979       The size of the context.  The size is specified as the  minimum  number
980       of  transmit  operations that may be posted to the endpoint without the
981       operation returning -FI_EAGAIN.
982
983   iov_limit
984       This is the maximum number of IO vectors (scatter-gather elements) that
985       a single posted operation may reference.
986
987   rma_iov_limit
988       This is the maximum number of RMA IO vectors (scatter-gather elements)
989       that an RMA or atomic operation may reference.  The rma_iov_limit
990       corresponds to the rma_iov_count values in RMA and atomic operations.
991       See struct fi_msg_rma and struct fi_msg_atomic in fi_rma(3) and
992       fi_atomic(3) for additional details.  This limit applies both to the
993       number of RMA IO vectors that may be specified when initiating an
994       operation from the local endpoint and to the maximum number of IO
995       vectors that may be carried in a single request from a remote endpoint.
996

RECEIVE CONTEXT ATTRIBUTES

998       Attributes specific to the receive  capabilities  of  an  endpoint  are
999       specified using struct fi_rx_attr.
1000
1001              struct fi_rx_attr {
1002                  uint64_t  caps;
1003                  uint64_t  mode;
1004                  uint64_t  op_flags;
1005                  uint64_t  msg_order;
1006                  uint64_t  comp_order;
1007                  size_t    total_buffered_recv;
1008                  size_t    size;
1009                  size_t    iov_limit;
1010              };
1011
1012   caps - Capabilities
1013       The requested capabilities of the context.  The capabilities must be a
1014       subset of those requested of the associated endpoint.  See the
1015       CAPABILITIES section of fi_getinfo(3) for capability details.  If the
1016       caps field is 0 on input to fi_getinfo(3), the caps value from the
1017       fi_info structure will be used.
1018
1019   mode
1020       The operational mode bits of the context.  The mode bits will be a sub‐
1021       set of those associated with the endpoint.  See  the  MODE  section  of
1022       fi_getinfo(3)  for details.  A mode value of 0 will be ignored on input
1023       to fi_getinfo(3), with the mode value of the fi_info structure used in‐
1024       stead.   On  return  from  fi_getinfo(3),  the mode will be set only to
1025       those constraints specific to receive operations.
1026
1027   op_flags - Default receive operation flags
1028       Flags that control the behavior of operations submitted against the
1029       context.  Applicable flags are listed in the Operation Flags section.
1030
1031   msg_order - Message Ordering
1032       For  a  description of message ordering, see the msg_order field in the
1033       Transmit Context Attribute section.  Receive context  message  ordering
1034       defines  the order in which received transport message headers are pro‐
1035       cessed when received by an endpoint.
1036
1037       The following ordering flags, as defined for  transmit  ordering,  also
1038       apply  to  the processing of received operations: FI_ORDER_NONE, FI_OR‐
1039       DER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR, FI_ORDER_WAW, FI_OR‐
1040       DER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, and FI_ORDER_SAS.
1041
1042   comp_order - Completion Ordering
1043       For  a  description of completion ordering, see the comp_order field in
1044       the Transmit Context Attribute section.
1045
1046       FI_ORDER_NONE
1047              No ordering is defined for completed operations.  Receive opera‐
1048              tions  may complete in any order, regardless of their submission
1049              order.
1050
1051       FI_ORDER_STRICT
1052              Receive operations complete in the order in which they are  pro‐
1053              cessed by the receive context, based on the receive side msg_or‐
1054              der attribute.
1055
1056       FI_ORDER_DATA
1057              When set, this bit indicates that received data is written  into
1058              memory  in  order.   Data ordering applies to memory accessed as
1059              part of a single operation and between operations if message or‐
1060              dering is guaranteed.
1061
1062   total_buffered_recv
1063       This  field is supported for backwards compatibility purposes.  It is a
1064       hint to the provider of the total available space that may be needed to
1065       buffer  messages  that  are received for which there is no matching re‐
1066       ceive operation.  The provider may adjust or ignore  this  value.   The
1067       allocation of internal network buffering among received messages is
1068       provider specific.  For instance, a provider may limit the size of mes‐
1069       sages  which  can be buffered or the amount of buffering allocated to a
1070       single message.
1071
1072       If receive side buffering is disabled (total_buffered_recv = 0) and a
1073       message is received by an endpoint, then the behavior depends on
1074       whether resource management has been enabled (that is, whether
1075       FI_RM_ENABLED has been set).  See the Resource Management section of
1076       fi_domain(3) for further clarification.  It is recommended that
1077       applications enable resource management if they anticipate receiving
1078       unexpected messages, rather than modifying this value.
1079
1080   size
1081       The size of the context.  The size is specified as the  minimum  number
1082       of  receive  operations  that may be posted to the endpoint without the
1083       operation returning -FI_EAGAIN.
1084
1085   iov_limit
1086       This is the maximum number of IO vectors (scatter-gather elements) that
1087       a single posted operation may reference.
1088

SCALABLE ENDPOINTS

1090       A  scalable  endpoint  is a communication portal that supports multiple
1091       transmit and receive contexts.  Scalable endpoints are loosely  modeled
1092       after  the  networking  concept  of transmit/receive side scaling, also
1093       known as multi-queue.  Support for scalable endpoints is domain specif‐
1094       ic.   Scalable  endpoints may improve the performance of multi-threaded
1095       and parallel applications, by allowing threads  to  access  independent
1096       transmit  and  receive queues.  A scalable endpoint has a single trans‐
1097       port level address, which can reduce the memory requirements needed  to
1098       store  remote  addressing data, versus using standard endpoints.  Scal‐
1099       able endpoints cannot be used directly  for  communication  operations,
1100       and  require  the application to explicitly create transmit and receive
1101       contexts as described below.
1102
1103   fi_tx_context
1104       Transmit contexts are independent transmit queues.  Ordering  and  syn‐
1105       chronization between contexts are not defined.  Conceptually a transmit
1106       context behaves similar to a send-only endpoint.   A  transmit  context
1107       may  be  configured  with fewer capabilities than the base endpoint and
1108       with different attributes (such as  ordering  requirements  and  inject
1109       size)  than  other contexts associated with the same scalable endpoint.
1110       Each transmit context has its own  completion  queue.   The  number  of
1111       transmit  contexts associated with an endpoint is specified during end‐
1112       point creation.
1113
1114       The fi_tx_context call is used to retrieve a specific context,  identi‐
1115       fied  by  an  index  (see  above  for  details  on transmit context at‐
1116       tributes).  Providers may dynamically allocate contexts when fi_tx_con‐
1117       text  is called, or may statically create all contexts when fi_endpoint
1118       is invoked.  By default, a transmit context inherits the properties  of
1119       its  associated  endpoint.   However,  applications may request context
1120       specific attributes through the attr parameter.  Support for per trans‐
1121       mit  context  attributes  is  provider  specific  and  not  guaranteed.
1122       Providers will return the actual attributes  assigned  to  the  context
1123       through the attr parameter, if provided.
1124
1125   fi_rx_context
1126       Receive  contexts are independent receive queues for receiving incoming
1127       data.  Ordering and synchronization between contexts  are  not  guaran‐
1128       teed.  Conceptually a receive context behaves similar to a receive-only
1129       endpoint.  A receive context may be configured with fewer  capabilities
1130       than  the base endpoint and with different attributes (such as ordering
1131       requirements and inject size) than other contexts associated  with  the
1132       same  scalable  endpoint.   Each receive context has its own completion
1133       queue.  The number of receive contexts associated with an  endpoint  is
1134       specified during endpoint creation.
1135
1136       Receive contexts are often associated with steering flows that specify
1137       which incoming packets targeting a scalable endpoint to process.
1138       However, receive contexts may be targeted directly by the initiator, if
1139       supported by the underlying protocol.  Such contexts are referred to as
1140       'named'.  Support for named contexts must be indicated by setting the
1141       FI_NAMED_RX_CTX capability when the corresponding endpoint is created.
1142       Support for named receive contexts is coordinated with address
1143       vectors.  See fi_av(3) and fi_rx_addr(3).
1144
1145       The fi_rx_context call is used to retrieve a specific context,  identi‐
1146       fied by an index (see above for details on receive context attributes).
1147       Providers may  dynamically  allocate  contexts  when  fi_rx_context  is
1148       called,  or  may statically create all contexts when fi_endpoint is in‐
1149       voked.  By default, a receive context inherits the  properties  of  its
1150       associated endpoint.  However, applications may request context specif‐
1151       ic attributes through the attr parameter.  Support for per receive con‐
1152       text  attributes  is  provider  specific and not guaranteed.  Providers
1153       will return the actual attributes assigned to the context  through  the
1154       attr parameter, if provided.
1155

SHARED CONTEXTS

1157       Shared  contexts  are  transmit  and receive contexts explicitly shared
1158       among one or more endpoints.  A shareable context allows an application
1159       to  use  a  single dedicated provider resource among multiple transport
1160       addressable endpoints.  This can greatly reduce the resources needed to
1161       manage  communication  over multiple endpoints by multiplexing transmit
1162       and/or receive processing, with the potential cost of  serializing  ac‐
1163       cess  across multiple endpoints.  Support for shareable contexts is do‐
1164       main specific.
1165
1166       Conceptually, shareable transmit contexts are transmit queues that  may
1167       be accessed by many endpoints.  The use of a shared transmit context is
1168       mostly opaque to an application.  Applications must allocate  and  bind
1169       shared  transmit  contexts  to endpoints, but operations are posted di‐
1170       rectly to the endpoint.  Shared transmit contexts  are  not  associated
1171       with completion queues or counters.  Completed operations are posted to
1172       the CQs bound to the endpoint.  An endpoint may only be associated with
1173       a single shared transmit context.
1174
1175       Unlike  shared  transmit  contexts, applications interact directly with
1176       shared receive contexts.  Users post  receive  buffers  directly  to  a
1177       shared  receive  context, with the buffers usable by any endpoint bound
1178       to the shared receive context.  Shared receive contexts are not associ‐
1179       ated  with completion queues or counters.  Completed receive operations
1180       are posted to the CQs bound to the endpoint.  An endpoint may only be
1181       associated with a single shared receive context, and all connectionless
1182       endpoints associated with a shared receive context must also share the
1183       same address vector.
1184
1185       Endpoints  associated  with a shared transmit context may use dedicated
1186       receive contexts, and vice-versa.  Or an endpoint may use shared trans‐
1187       mit  and  receive  contexts.  And there is no requirement that the same
1188       group of endpoints sharing a context of one type also share the context
1189       of  an  alternate type.  Furthermore, an endpoint may use a shared con‐
1190       text of one type, but a scalable set of contexts of the alternate type.
1191
1192   fi_stx_context
1193       This call is used to open a shareable transmit context (see  above  for
1194       details on the transmit context attributes).  Endpoints associated with
1195       a shared transmit context must use a subset of the  transmit  context's
1196       attributes.   Note  that  this  is  the  reverse of the requirement for
1197       transmit contexts for scalable endpoints.
1198
1199   fi_srx_context
1200       This allocates a shareable receive context (see above  for  details  on
1201       the  receive  context  attributes).  Endpoints associated with a shared
1202       receive context must use a subset of the receive context's  attributes.
1203       Note  that  this is the reverse of the requirement for receive contexts
1204       for scalable endpoints.
1205

SOCKET ENDPOINTS

1207       The following feature and description should be  considered  experimen‐
1208       tal.  Until the experimental tag is removed, the interfaces, semantics,
1209       and data structures associated with socket endpoints may change between
1210       library versions.
1211
1212       This  section  applies  to  endpoints  of  type  FI_EP_SOCK_STREAM  and
1213       FI_EP_SOCK_DGRAM, commonly referred to as socket endpoints.
1214
1215       Socket endpoints are defined with semantics that  allow  them  to  more
1216       easily  be  adopted by developers familiar with the UNIX socket API, or
1217       by middleware that exposes the socket API, while still taking advantage
1218       of high-performance hardware features.
1219
1220       The key difference between socket endpoints and other active endpoints
1221       is that socket endpoints use synchronous data transfers.  Buffers passed
1222       into  send and receive operations revert to the control of the applica‐
1223       tion upon returning from the function  call.   As  a  result,  no  data
1224       transfer  completions  are reported to the application, and socket end‐
1225       points are not associated with completion queues or counters.
1226
1227       Socket endpoints support  a  subset  of  message  operations:  fi_send,
1228       fi_sendv,  fi_sendmsg,  fi_recv,  fi_recvv,  fi_recvmsg, and fi_inject.
1229       Because data transfers are synchronous, the return value from send  and
1230       receive operations indicates the number of bytes transferred on success,
1231       or a negative value on error, including -FI_EAGAIN if the endpoint can‐
1232       not  send  or receive any data because of full or empty queues, respec‐
1233       tively.
1234
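       The return-value convention described above can be handled with a small
       retry helper.  The following is a minimal sketch, not libfabric code:
       sync_send_fn, send_all, and demo_send are hypothetical names, the
       callback merely stands in for a synchronous send call, and the C
       library's EAGAIN is used as a stand-in for FI_EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a synchronous send call: returns the number of
 * bytes accepted, or a negative errno-style value such as -EAGAIN. */
typedef ssize_t (*sync_send_fn)(void *ep, const void *buf, size_t len);

/* Try to send the whole buffer, resuming after partial transfers.
 * Returns the number of bytes sent so far if the queue fills up part way,
 * -EAGAIN if nothing could be sent, or another negative value on error. */
ssize_t send_all(sync_send_fn send_fn, void *ep, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    assert(send_fn != NULL);
    while (done < len) {
        ssize_t ret = send_fn(ep, p + done, len - done);
        if (ret == -EAGAIN)
            return done ? (ssize_t)done : ret;  /* queue full */
        if (ret < 0)
            return ret;                         /* hard error */
        if (ret == 0)
            break;                              /* no progress; stop */
        done += (size_t)ret;
    }
    return (ssize_t)done;
}

/* Demo stand-in: accepts at most 4 bytes per call and reports a full queue
 * once its byte budget (an int pointed to by ep) is exhausted. */
ssize_t demo_send(void *ep, const void *buf, size_t len)
{
    int *budget = ep;
    size_t n = len < 4 ? len : 4;

    (void)buf;
    if (*budget <= 0)
        return -EAGAIN;
    if (n > (size_t)*budget)
        n = (size_t)*budget;
    *budget -= (int)n;
    return (ssize_t)n;
}
```

       A caller that receives a short count or -EAGAIN would typically wait
       on the endpoint's file descriptor before calling the helper again.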
1235       Socket endpoints are associated with event queues and address  vectors,
1236       and  process  connection  management  events asynchronously, similar to
1237       other endpoints.  Unlike UNIX sockets, socket endpoints must still be
1238       declared as either active or passive.
1239
1240       Socket endpoints behave like non-blocking sockets.  In order to support
1241       select and poll semantics, active socket endpoints are associated  with
1242       a  file  descriptor  that is signaled whenever the endpoint is ready to
1243       send and/or receive data.  The file descriptor may be  retrieved  using
1244       fi_control.
1245

OPERATION FLAGS

1247       Operation  flags  are  obtained by OR-ing the following flags together.
1248       Operation flags define the default flags applied to an endpoint's  data
1249       transfer  operations,  where  a flags parameter is not available.  Data
1250       transfer operations that take flags as input override the op_flags val‐
1251       ue of transmit or receive context attributes of an endpoint.
1252
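       The override rule described above can be modeled in a few lines of C.
       This is an illustrative sketch only: demo_ctx_attr, effective_flags,
       and the DEMO_ bit values are hypothetical placeholders, not libfabric
       definitions.  Real code would use the FI_* constants and the op_flags
       field of the transmit or receive context attributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define DEMO_INJECT      (UINT64_C(1) << 0)
#define DEMO_COMPLETION  (UINT64_C(1) << 1)

/* Models the op_flags field of a transmit or receive context. */
struct demo_ctx_attr {
    uint64_t op_flags;
};

/* Flags that govern one data transfer call: the context's op_flags when the
 * call has no flags parameter, otherwise exactly the flags the caller passed. */
uint64_t effective_flags(const struct demo_ctx_attr *attr,
                         bool call_takes_flags, uint64_t call_flags)
{
    assert(attr != NULL);
    return call_takes_flags ? call_flags : attr->op_flags;
}
```

       In this model a call that accepts flags but passes 0 gets no flags at
       all, since the passed value replaces the default rather than being
       merged with it.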
1253       FI_INJECT
1254              Indicates  that  all outbound data buffers should be returned to
1255              the user's control immediately after a data  transfer  call  re‐
1256              turns,  even  if  the operation is handled asynchronously.  This
1257              may require that the provider copy the data into a local  buffer
1258              and transfer out of that buffer.  A provider can limit the total
1259              amount of send data that may be buffered and/or the  size  of  a
1260              single send that can use this flag.  This limit is indicated us‐
1261              ing inject_size (see inject_size above).
1262
1263       FI_MULTI_RECV
1264              Applies to posted receive operations.  This flag allows the user
1265              to post a single buffer that will receive multiple incoming mes‐
1266              sages.  Received messages will be packed into the receive buffer
1267              until  the buffer has been consumed.  Use of this flag may cause
1268              a single posted receive operation to generate  multiple  comple‐
1269              tions  as messages are placed into the buffer.  The placement of
1270              received data into the buffer may be subject to provider-specific
1271              alignment restrictions.  The buffer will be released by
1272              the provider when the available buffer  space  falls  below  the
1273              specified minimum (see FI_OPT_MIN_MULTI_RECV).
1274
1275       FI_COMPLETION
1276              Indicates  that  a completion entry should be generated for data
1277              transfer operations.  This flag only applies to  operations  is‐
1278              sued  on  endpoints  that were bound to a CQ or counter with the
1279              FI_SELECTIVE_COMPLETION flag.  See the fi_ep_bind section  above
1280              for more detail.
1281
1282       FI_INJECT_COMPLETE
1283              Indicates  that a completion should be generated when the source
1284              buffer(s) may be reused.  See fi_cq(3) for additional details on
1285              completion semantics.
1286
1287       FI_TRANSMIT_COMPLETE
1288              Indicates  that a completion should be generated when the trans‐
1289              mit operation has completed relative to the local provider.  See
1290              fi_cq(3) for additional details on completion semantics.
1291
1292       FI_DELIVERY_COMPLETE
1293              Indicates  that a completion should be generated when the opera‐
1294              tion has been processed by  the  destination  endpoint(s).   See
1295              fi_cq(3) for additional details on completion semantics.
1296
1297       FI_COMMIT_COMPLETE
1298              Indicates  that a completion should not be generated (locally or
1299              at the peer) until the result of an operation has been made
1300              persistent.   See  fi_cq(3) for additional details on completion
1301              semantics.
1302
1303       FI_MULTICAST
1304              Indicates that data transfers will target multicast addresses by
1305              default.   Any  fi_addr_t  passed into a data transfer operation
1306              will be treated as a multicast address.
1307
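       The FI_MULTI_RECV behavior described above (messages packed into one
       posted buffer, one completion per message, release once free space
       drops below a minimum) can be pictured with a toy model.  The sketch
       below is not provider code: the structure and function names are
       hypothetical, min_free only stands in for the FI_OPT_MIN_MULTI_RECV
       setting, and alignment restrictions are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of one posted multi-receive buffer (all names hypothetical). */
struct multi_recv_buf {
    char   *base;      /* start of the posted buffer */
    size_t  size;      /* total size of the posted buffer */
    size_t  used;      /* bytes consumed by earlier messages */
    size_t  min_free;  /* models the FI_OPT_MIN_MULTI_RECV threshold */
    int     released;  /* nonzero once handed back to the application */
};

/* Place one incoming message in the buffer.  Returns the offset at which the
 * message was stored (one completion per message in this model), or -1 if
 * the buffer was already released or the message does not fit. */
long multi_recv_place(struct multi_recv_buf *b, const void *msg, size_t len)
{
    size_t off;

    assert(b != NULL && b->used <= b->size);
    if (b->released || len > b->size - b->used)
        return -1;
    off = b->used;
    memcpy(b->base + off, msg, len);
    b->used += len;
    /* Release the buffer once the remaining space drops below the minimum. */
    if (b->size - b->used < b->min_free)
        b->released = 1;
    return (long)off;
}
```

       With a 32-byte buffer and a minimum of 8 bytes, two 10-byte messages
       leave 12 bytes free and the buffer stays posted; a further 6-byte
       message leaves 6 bytes free and the model marks the buffer released.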

NOTES

1309       Users should call fi_close to release all resources  allocated  to  the
1310       fabric endpoint.
1311
1312       Endpoints allocated with the FI_CONTEXT mode set must typically provide
1313       struct fi_context as  their  per  operation  context  parameter.   (See
1314       fi_getinfo(3) for details.)  However, when FI_SELECTIVE_COMPLETION is
1315       enabled to suppress completion entries, and an operation is initiated
1316       without the FI_COMPLETION flag set, then the context parameter is ignored.
1317       An application does not need to pass in a valid struct fi_context  into
1318       such data transfers.
1319
1320       Operations that complete in error and are not associated with a valid
1321       operation context will use the endpoint context in any error reporting
1322       structures.
1323
1324       Although  applications  typically associate individual completions with
1325       either completion queues or counters, an endpoint can  be  attached  to
1326       both  a  counter and completion queue.  When combined with using selec‐
1327       tive completions, this allows an application to use counters  to  track
1328       successful  completions,  with  a CQ used to report errors.  Operations
1329       that complete with an error increment the error counter and generate  a
1330       completion  event.   The generation of entries going to the CQ can then
1331       be controlled using FI_SELECTIVE_COMPLETION.
1332
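       The counter-plus-CQ arrangement described above can be summarized with
       a small bookkeeping model.  This is a sketch under stated assumptions,
       not libfabric code: demo_ep_stats and demo_complete are hypothetical
       names, `selective` models a CQ bound with FI_SELECTIVE_COMPLETION, and
       `want_completion` models FI_COMPLETION being set on the operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bookkeeping for an endpoint bound to both a counter and a CQ
 * (all names hypothetical). */
struct demo_ep_stats {
    uint64_t cntr_success;    /* counter: successful completions */
    uint64_t cntr_error;      /* counter: operations completed in error */
    uint64_t cq_entries;      /* normal completion entries written to the CQ */
    uint64_t cq_err_entries;  /* error entries written to the CQ */
};

/* Record one finished operation in the model. */
void demo_complete(struct demo_ep_stats *s, bool failed,
                   bool selective, bool want_completion)
{
    assert(s != NULL);
    if (failed) {
        /* Errors bump the error counter and always generate an event. */
        s->cntr_error++;
        s->cq_err_entries++;
        return;
    }
    s->cntr_success++;
    /* Successful operations reach the CQ only when not suppressed. */
    if (!selective || want_completion)
        s->cq_entries++;
}
```

       In this model an application that never sets the completion flag sees
       successes only through the counter, while every failure still appears
       on the CQ.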
1333       As mentioned in fi_getinfo(3), the ep_attr structure  can  be  used  to
1334       query  providers  that support various endpoint attributes.  fi_getinfo
1335       can return provider info structures that can support the minimal set of
1336       requirements (such that the application maintains correctness).  Howev‐
1337       er, it can also return provider info structures that exceed application
1338       requirements.   As  an  example,  consider  an  application  requesting
1339       msg_order as FI_ORDER_NONE.  The resulting output from  fi_getinfo  may
1340       have all the ordering bits set.  The application can reset the ordering
1341       bits it does not require before creating the endpoint.  The provider is
1342       free  to implement a stricter ordering than is required by the applica‐
1343       tion.
1344

RETURN VALUES

1346       Returns 0 on success.  On error, a negative value corresponding to fab‐
1347       ric  errno  is  returned.  For fi_cancel, a return value of 0 indicates
1348       that the cancel request was submitted for processing.
1349
1350       Fabric errno values are defined in rdma/fi_errno.h.
1351

ERRORS

1353       -FI_EDOMAIN
1354              A resource domain was not bound to the endpoint  or  an  attempt
1355              was made to bind multiple domains.
1356
1357       -FI_ENOCQ
1358              The endpoint has not been configured with the necessary event queue.
1359
1360       -FI_EOPBADSTATE
1361              The endpoint's state does not permit the requested operation.
1362

AUTHORS

1368       OpenFabrics.
1369
1370
1371
1372Libfabric Programmer's Manual     2018-11-30                    fi_endpoint(3)