fi_psm2(7)

fi_psm2(7)                     Libfabric v1.6.1                     fi_psm2(7)

NAME

       fi_psm2 - The PSM2 Fabric Provider

OVERVIEW

       The psm2 provider runs over the PSM 2.x interface that is supported by
       the Intel Omni-Path Fabric.  PSM 2.x has all the PSM 1.x features plus
       a set of new functions with enhanced capabilities.  Because PSM 1.x and
       PSM 2.x are not ABI compatible, the psm2 provider works only with PSM
       2.x and doesn't support the Intel TrueScale Fabric.

LIMITATIONS

       The psm2 provider doesn't support all the features defined in the
       libfabric API.  Here are some of the limitations:

       Endpoint types : Only the non-connection-based types FI_DGRAM and
       FI_RDM are supported.

       Endpoint capabilities : Endpoints can support any combination of the
       data transfer capabilities FI_TAGGED, FI_MSG, FI_ATOMICS, and FI_RMA.
       These capabilities can be further refined by FI_SEND, FI_RECV, FI_READ,
       FI_WRITE, FI_REMOTE_READ, and FI_REMOTE_WRITE to limit the direction of
       operations.

       FI_MULTI_RECV is supported for the non-tagged message queue only.

       Scalable endpoints are supported if the underlying PSM2 library
       supports multiple endpoints.  This condition must be satisfied both
       when the provider is built and when the provider is used.  See
       Scalable endpoints below for more information.

       Other supported capabilities include FI_TRIGGER, FI_REMOTE_CQ_DATA,
       FI_RMA_EVENT, FI_SOURCE, and FI_SOURCE_ERR.  Furthermore,
       FI_NAMED_RX_CTX is supported when scalable endpoints are enabled.

       Modes : FI_CONTEXT is required for the FI_TAGGED and FI_MSG
       capabilities.  That is, any request in these two categories that
       generates a completion must pass, as the operation context, a valid
       pointer to a struct fi_context, and the space referenced by the
       pointer must remain untouched until the request has completed.  If
       neither FI_TAGGED nor FI_MSG is requested, the FI_CONTEXT mode is not
       required.

       Progress : The psm2 provider requires manual progress.  The application
       is expected to call fi_cq_read or fi_cntr_read from time to time when
       no other libfabric function is called, to ensure that progress is made
       in a timely manner.  The provider also supports auto progress mode;
       however, performance can be significantly impacted if the application
       depends purely on the provider to make auto progress.

       Scalable endpoints : Scalable endpoint support depends on the multi-EP
       feature of the PSM2 library.  If the PSM2 library supports this
       feature, its availability is further controlled by the environment
       variable PSM2_MULTI_EP.  The psm2 provider automatically sets this
       variable to 1 if it is not set.  The feature can be disabled explicitly
       by setting PSM2_MULTI_EP to 0.
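
       The "set to 1 if it is not set" behavior described above corresponds
       to calling setenv(3) with its overwrite argument set to 0.  The
       following is a minimal sketch of that behavior as seen from an
       application, not the provider's actual code; both helper names are
       hypothetical, and treating every value other than "0" as enabled is an
       assumption.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: set PSM2_MULTI_EP to "1" only when it is not
 * already in the environment. With overwrite == 0, a value chosen by
 * the user (including an explicit "0") is left untouched. */
int multi_ep_apply_default(void)
{
    return setenv("PSM2_MULTI_EP", "1", 0);
}

/* Hypothetical helper: report whether the feature would be enabled.
 * Assumption: unset counts as enabled, and only "0" disables it. */
int multi_ep_enabled(void)
{
    const char *v = getenv("PSM2_MULTI_EP");
    return v == NULL || strcmp(v, "0") != 0;
}
```

       An application that wants to disable the feature would set
       PSM2_MULTI_EP to 0 before libfabric is initialized.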

       When creating a scalable endpoint, the exact number of contexts
       requested should be set in the "fi_info" structure passed to the
       fi_scalable_ep function.  This number should be set in
       "fi_info->ep_attr->tx_ctx_cnt", "fi_info->ep_attr->rx_ctx_cnt", or
       both; if the two differ, the greater value is used.  The psm2 provider
       allocates all requested contexts upfront when the scalable endpoint is
       created.  The same context is used for both Tx and Rx.

       For optimal performance, avoid having multiple threads access the same
       context, either directly by posting send/recv/read/write requests, or
       indirectly by polling the associated completion queues or counters.

       Shared Tx contexts : To actually save PSM contexts by using a shared
       Tx context, the endpoints bound to the shared Tx context need to be Tx
       only.  The reason is that Rx capability always requires a PSM context,
       which can also be used automatically for Tx.  As a result, allocating
       a shared Tx context for Rx-capable endpoints consumes one extra
       context instead of saving any.

       Unsupported features : These features are unsupported: connection
       management, passive endpoints, and shared receive contexts.

RUNTIME PARAMETERS

       The psm2 provider checks for the following environment variables:

       FI_PSM2_UUID : PSM requires that each job has a unique ID (UUID).  All
       the processes in the same job need to use the same UUID in order to be
       able to talk to each other.  The PSM reference manual advises keeping
       the UUID unique to each job.  In practice, it generally works fine to
       reuse a UUID as long as (1) no two jobs with the same UUID are running
       at the same time; and (2) previous jobs with the same UUID have exited
       normally.  If you run into "resource busy" or "connection failure"
       issues for no apparent reason, it is advisable to manually set the
       UUID to a value different from the default.

       The default UUID is 00FF00FF-0000-0000-0000-00FF0F0F00FF.
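
       As an illustration of setting a per-job UUID, the sketch below formats
       16 raw bytes in the same 8-4-4-4-12 hexadecimal layout as the default
       value and exports the result.  Both function names are hypothetical,
       and it is an assumption that the provider accepts any value written in
       the same textual form as the default.  How the 16 bytes are chosen and
       shared among the processes of a job is left to the launcher.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>

/* Format 16 raw bytes as an 8-4-4-4-12 uppercase hexadecimal string.
 * `out` must provide at least 37 bytes (36 characters plus NUL).
 * Returns 0 on success, -1 if the buffer is too small. */
int format_uuid(const unsigned char id[16], char *out, size_t len)
{
    int n = snprintf(out, len,
        "%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-"
        "%02X%02X%02X%02X%02X%02X",
        id[0], id[1], id[2], id[3], id[4], id[5], id[6], id[7],
        id[8], id[9], id[10], id[11], id[12], id[13], id[14], id[15]);
    return (n == 36 && (size_t)n < len) ? 0 : -1;
}

/* Hypothetical helper: export the UUID through FI_PSM2_UUID. Every
 * process of the job must call this with the same bytes, and must do
 * so before libfabric is initialized. */
int export_job_uuid(const unsigned char id[16])
{
    char buf[37];

    if (format_uuid(id, buf, sizeof buf) != 0)
        return -1;
    return setenv("FI_PSM2_UUID", buf, 1);
}
```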

       FI_PSM2_NAME_SERVER : The psm2 provider has a simple built-in name
       server that can be used to resolve an IP address or host name into a
       transport address needed by the fi_av_insert call.  The main purpose
       of this name server is to allow simple client-server type applications
       (such as those in fabtests) to be written purely with libfabric,
       without using any out-of-band communication mechanism.  For such
       applications, the server would run first so that endpoints can be
       created and registered with the name server, and then the client would
       call fi_getinfo with the node parameter set to the IP address or host
       name of the server.  The resulting fi_info structure would have the
       transport address of the endpoint created by the server in the
       dest_addr field.  Optionally, the service parameter can be used in
       addition to node.  Notice that the service number is interpreted by
       the provider and is not a TCP/IP port number.

       The name server is on by default.  It can be turned off by setting the
       variable to 0.  This may save a small amount of resources, since a
       separate thread is created when the name server is on.

       The provider detects OpenMPI and MPICH runs and changes the default
       setting to off.

       FI_PSM2_TAGGED_RMA : The RMA functions are implemented on top of the
       PSM Active Message functions.  The Active Message functions have a
       limit on the size of data that can be transferred in a single message.
       Large transfers can be divided into small chunks and pipelined.
       However, the bandwidth achieved this way is suboptimal.

       The psm2 provider uses the PSM tag-matching message queue functions to
       achieve higher bandwidth for large RMA transfers.  It takes advantage
       of the extra tag bits available in PSM2 to separate the RMA traffic
       from the regular tagged message queue.

       The option is on by default.  To turn it off, set the variable to 0.

       FI_PSM2_DELAY : Time (seconds) to sleep before closing PSM endpoints.
       This is a workaround for a bug in some versions of the PSM library.

       The default setting is 0.

       FI_PSM2_TIMEOUT : Timeout (seconds) for gracefully closing PSM
       endpoints.  A forced close is issued if the timeout expires.

       The default setting is 5.

       FI_PSM2_PROG_INTERVAL : When auto progress is enabled (asked for via
       the hints to fi_getinfo), a progress thread is created to make
       progress calls from time to time.  This option sets the interval
       (microseconds) between progress calls.

       The default setting is 1 if affinity is set, or 1000 if not.  See
       FI_PSM2_PROG_AFFINITY.

       FI_PSM2_PROG_AFFINITY : When set, specifies the set of CPU cores to
       set the progress thread affinity to.  The format is
       <start>[:<end>[:<stride>]][,<start>[:<end>[:<stride>]]]*, where each
       triplet <start>:<end>:<stride> defines a block of core_ids.  Both
       <start> and <end> can be either the core_id (when >=0) or
       core_id - num_cores (when <0).

       By default affinity is not set.
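
       To make the FI_PSM2_PROG_AFFINITY format concrete, the sketch below
       expands such a string into a per-core mask.  It is an illustration
       only, not the provider's parser.  The function name is hypothetical,
       and three details are assumptions that the text above does not state:
       an omitted <end> equals <start>, an omitted <stride> is 1, and <end>
       is inclusive.

```c
#include <stdlib.h>
#include <string.h>

/* Expand an affinity string into `mask`, an array of `num_cores`
 * bytes where mask[i] == 1 means core i is selected. Returns the
 * number of selected cores, or -1 if the string is malformed or
 * refers to a core outside 0..num_cores-1. */
int parse_affinity(const char *spec, int num_cores, unsigned char *mask)
{
    int count = 0;

    memset(mask, 0, (size_t)num_cores);
    while (*spec) {
        char *end;
        const char *p;
        long start = strtol(spec, &end, 10);
        long stop, stride = 1;

        if (end == spec)
            return -1;                 /* no number where one is needed */
        stop = start;                  /* assumed default: single core */
        if (*end == ':') {
            p = end + 1;
            stop = strtol(p, &end, 10);
            if (end == p)
                return -1;
            if (*end == ':') {
                p = end + 1;
                stride = strtol(p, &end, 10);
                if (end == p || stride <= 0)
                    return -1;
            }
        }
        /* Negative values count back from num_cores. */
        if (start < 0)
            start += num_cores;
        if (stop < 0)
            stop += num_cores;
        if (start < 0 || stop >= num_cores)
            return -1;
        for (long c = start; c <= stop; c += stride) {
            if (!mask[c]) {
                mask[c] = 1;
                count++;
            }
        }
        if (*end == ',')
            end++;                     /* move on to the next triplet */
        else if (*end != '\0')
            return -1;
        spec = end;
    }
    return count;
}
```

       Under these assumptions, on an 8-core system "0:3" selects cores 0 to
       3, "-2:-1" selects cores 6 and 7, and "0:7:2,1" selects cores 0, 2, 4,
       6, and 1.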

       FI_PSM2_INJECT_SIZE : Maximum message size allowed for fi_inject and
       fi_tinject calls.  This is an experimental feature that allows some
       applications to override the default inject size limit.  When the
       inject size is larger than the default value, some inject calls might
       block.

       The default setting is 64.

       FI_PSM2_LOCK_LEVEL : When set, dictates the level of locking used by
       the provider.  Level 2 means all locks are enabled.  Level 1 disables
       some locks and is suitable for runs that limit the access to each PSM2
       context to a single thread.  Level 0 disables all locks and thus is
       only suitable for single-threaded runs.

       To use level 0 or level 1, wait objects and auto progress mode cannot
       be used, because they introduce internal threads that may break the
       conditions needed for these levels.

       The default setting is 2.
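
       The rules for FI_PSM2_LOCK_LEVEL can be summarized as a small decision
       function.  This is only a sketch of the conditions stated above; the
       structure and function names are hypothetical and are not part of
       libfabric.

```c
/* Hypothetical description of how an application uses the provider. */
struct psm2_usage {
    int num_threads;            /* threads that call into libfabric */
    int one_thread_per_context; /* each PSM2 context used by one thread */
    int uses_wait_objects;      /* application requests wait objects */
    int uses_auto_progress;     /* application requests auto progress */
};

/* Return the lowest lock level that the stated conditions permit. */
int pick_lock_level(const struct psm2_usage *u)
{
    /* Wait objects and auto progress add internal threads, so only
     * the default level 2 is safe. */
    if (u->uses_wait_objects || u->uses_auto_progress)
        return 2;
    if (u->num_threads <= 1)
        return 0;               /* single-threaded run: no locks */
    if (u->one_thread_per_context)
        return 1;               /* one thread per PSM2 context */
    return 2;
}
```

       The returned value would then be exported through FI_PSM2_LOCK_LEVEL
       before libfabric is initialized.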

       FI_PSM2_LAZY_CONN : Controls when connections are established between
       the PSM2 endpoints that OFI endpoints are built on top of.  When set
       to 0, connections are established when addresses are inserted into the
       address vector.  This is the eager connection mode.  When set to 1,
       connections are established when addresses are used for the first time
       in communication.  This is the lazy connection mode.

       Lazy connection mode may reduce the start-up time on large systems at
       the expense of slightly higher data path overhead.  For applications
       that use multiple endpoints, lazy connection mode can be especially
       helpful, potentially greatly reducing the time needed to set up
       address vectors and to close endpoints.

       The default setting is 0.

       FI_PSM2_DISCONNECT : The provider has a mechanism to automatically
       send disconnection notifications to all connected peers before the
       local endpoint is closed.  In response, the peers call
       psm2_ep_disconnect to clean up the connection state on their side.
       This allows the same PSM2 epid to be used by different dynamically
       started processes (clients) to communicate with the same peer
       (server).  This mechanism, however, introduces extra overhead in the
       finalization phase.  For applications that never reuse epids within
       the same session, such overhead is unnecessary.

       This option controls whether the automatic disconnection notification
       mechanism should be enabled.  For the client-server applications
       mentioned above, the client side should set this option to 1, but the
       server should set it to 0.

       The default setting is 0.

       FI_PSM2_TAG_LAYOUT : Selects how the 96 PSM2 tag bits are organized.
       Currently three choices are available: tag60 means 32-4-60
       partitioning for CQ data, internal protocol flags, and application
       tag.  tag64 means 4-28-64 partitioning for internal protocol flags, CQ
       data, and application tag.  auto means to choose either tag60 or tag64
       based on the hints passed to fi_getinfo: tag60 is used if remote CQ
       data support is requested explicitly, either by passing a non-zero
       value via hints->domain_attr->cq_data_size or by including
       FI_REMOTE_CQ_DATA in hints->caps; otherwise tag64 is used.  If tag64
       is the result of automatic selection, fi_getinfo also returns a second
       instance of the provider with the tag60 layout.

       The default setting is auto.

       Notice that if the provider is compiled with the macro
       PSMX2_TAG_LAYOUT defined to 1 (meaning tag60) or 2 (meaning tag64),
       the choice is fixed at compile time and this runtime option is
       disabled.
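
       The two partitionings can be pictured with the bit-packing sketch
       below.  It only illustrates the field widths given above.  Reading
       each partitioning from most significant to least significant bits,
       and splitting the 96 bits into a 32-bit high part and a 64-bit low
       part, are assumptions; the type and function names are hypothetical
       and do not reflect how the provider fills the actual PSM2 tag.

```c
#include <stdint.h>

/* A 96-bit tag viewed as a 32-bit high part and a 64-bit low part.
 * Assumption: this split is for illustration only. */
struct tag96 {
    uint32_t hi;
    uint64_t lo;
};

#define TAG60_MASK  ((UINT64_C(1) << 60) - 1)
#define DATA28_MASK ((UINT32_C(1) << 28) - 1)

/* tag60: 32 bits CQ data | 4 bits flags | 60 bits application tag.
 * The top 4 bits of the application tag are dropped. */
struct tag96 pack_tag60(uint32_t cq_data, unsigned flags, uint64_t tag)
{
    struct tag96 t;

    t.hi = cq_data;
    t.lo = ((uint64_t)(flags & 0xF) << 60) | (tag & TAG60_MASK);
    return t;
}

/* tag64: 4 bits flags | 28 bits CQ data | 64 bits application tag.
 * The top 4 bits of the CQ data are dropped. */
struct tag96 pack_tag64(uint32_t cq_data, unsigned flags, uint64_t tag)
{
    struct tag96 t;

    t.hi = ((uint32_t)(flags & 0xF) << 28) | (cq_data & DATA28_MASK);
    t.lo = tag;
    return t;
}
```

       The sketch shows the trade-off between the layouts: tag60 keeps all 32
       bits of CQ data but leaves 60 bits for the application tag, while
       tag64 keeps the full 64-bit application tag but leaves 28 bits for CQ
       data.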

AUTHORS

       OpenFabrics.

Libfabric Programmer's Manual     2018-02-21                        fi_psm2(7)
