fi_psm2(7)

fi_psm2(7)                     Libfabric v1.7.0                     fi_psm2(7)

NAME

       fi_psm2 - The PSM2 Fabric Provider

OVERVIEW

       The psm2 provider runs over the PSM 2.x interface that is supported by
       the Intel Omni-Path Fabric. PSM 2.x has all the PSM 1.x features plus a
       set of new functions with enhanced capabilities. Since PSM 1.x and
       PSM 2.x are not ABI compatible, the psm2 provider only works with
       PSM 2.x and doesn't support the Intel TrueScale Fabric.

LIMITATIONS

       The psm2 provider doesn't support all the features defined in the
       libfabric API. Here are some of the limitations:

       Endpoint types
              Only the connectionless endpoint types FI_DGRAM and FI_RDM are
              supported.

       Endpoint capabilities
              Endpoints can support any combination of the data transfer
              capabilities FI_TAGGED, FI_MSG, FI_ATOMICS, and FI_RMA. These
              capabilities can be further refined by FI_SEND, FI_RECV,
              FI_READ, FI_WRITE, FI_REMOTE_READ, and FI_REMOTE_WRITE to limit
              the direction of operations.

              FI_MULTI_RECV is supported for the non-tagged message queue
              only.

              Scalable endpoints are supported if the underlying PSM2 library
              supports multiple endpoints. This condition must be satisfied
              both when the provider is built and when it is used. See the
              Scalable endpoints entry for more information.

              Other supported capabilities include FI_TRIGGER,
              FI_REMOTE_CQ_DATA, FI_RMA_EVENT, FI_SOURCE, and FI_SOURCE_ERR.
              Furthermore, FI_NAMED_RX_CTX is supported when scalable
              endpoints are enabled.

       Modes
              FI_CONTEXT is required for the FI_TAGGED and FI_MSG
              capabilities. That means any request belonging to these two
              categories that generates a completion must pass, as the
              operation context, a valid pointer to a struct fi_context, and
              the memory referenced by the pointer must remain untouched until
              the request has completed. If neither FI_TAGGED nor FI_MSG is
              requested, the FI_CONTEXT mode is not required.

       Progress
              The psm2 provider requires manual progress. The application is
              expected to call fi_cq_read or fi_cntr_read from time to time,
              during periods when it makes no other libfabric calls, to ensure
              progress is made in a timely manner. The provider does support
              auto progress mode; however, performance can be significantly
              impacted if the application relies purely on the provider to
              make auto progress.

       Scalable endpoints
              Scalable endpoint support depends on the multi-EP feature of the
              PSM2 library. If the PSM2 library supports this feature, its
              availability is further controlled by the environment variable
              PSM2_MULTI_EP. The psm2 provider automatically sets this
              variable to 1 if it is not set. The feature can be disabled
              explicitly by setting PSM2_MULTI_EP to 0.

              When creating a scalable endpoint, the exact number of contexts
              requested should be set in the "fi_info" structure passed to the
              fi_scalable_ep function. This number should be set in
              "fi_info->ep_attr->tx_ctx_cnt", "fi_info->ep_attr->rx_ctx_cnt",
              or both; whichever is greater is used. The psm2 provider
              allocates all requested contexts upfront when the scalable
              endpoint is created. The same context is used for both Tx and
              Rx.

              For optimal performance, avoid having multiple threads access
              the same context, either directly by posting
              send/recv/read/write requests, or indirectly by polling the
              associated completion queues or counters.

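              The "whichever is greater" rule for scalable endpoints can be
              illustrated with a small C sketch. The struct below is a
              hypothetical stand-in for the two fi_info->ep_attr fields
              involved (it is not the real struct fi_ep_attr), and the helper
              name is likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant fi_info->ep_attr fields;
 * the real struct fi_ep_attr is declared in <rdma/fabric.h>. */
struct sep_ctx_request {
    size_t tx_ctx_cnt;
    size_t rx_ctx_cnt;
};

/* Number of contexts the provider is documented to allocate upfront for
 * a scalable endpoint: the larger of the Tx and Rx counts, because each
 * context serves both Tx and Rx. */
static size_t sep_context_count(const struct sep_ctx_request *req)
{
    assert(req != NULL);
    return req->tx_ctx_cnt > req->rx_ctx_cnt ? req->tx_ctx_cnt
                                             : req->rx_ctx_cnt;
}
```

              For example, tx_ctx_cnt = 4 and rx_ctx_cnt = 2 would lead to
              four contexts, each usable for both directions.
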
       Shared Tx contexts
              To save PSM contexts by using shared Tx contexts, the endpoints
              bound to the shared Tx contexts need to be Tx only. The reason
              is that Rx capability always requires a PSM context, which can
              also be used automatically for Tx. As a result, allocating a
              shared Tx context for Rx-capable endpoints actually consumes
              one extra context instead of saving any.

       Unsupported features
              The following features are unsupported: connection management,
              passive endpoints, and shared receive contexts.

RUNTIME PARAMETERS

       The psm2 provider checks for the following environment variables:

       FI_PSM2_UUID
              PSM requires that each job have a unique ID (UUID). All the
              processes in the same job need to use the same UUID in order to
              be able to talk to each other. The PSM reference manual advises
              keeping the UUID unique to each job. In practice, reusing a UUID
              generally works as long as (1) no two jobs with the same UUID
              are running at the same time, and (2) previous jobs with the
              same UUID have exited normally. If you run into "resource busy"
              or "connection failure" issues for no apparent reason, it is
              advisable to manually set the UUID to a value different from the
              default.

              The default UUID is 00FF00FF-0000-0000-0000-00FF0F0F00FF.

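              As a sketch of setting a non-default UUID, the C fragment below
              formats a per-job value in the same 8-4-4-4-12 hexadecimal
              layout as the default and exports it as FI_PSM2_UUID with POSIX
              setenv(). The helper name, the choice to encode a numeric job ID
              in the last group, and the assumption that the provider accepts
              any UUID written in this layout are all illustrative. Every
              process of the job must compute the same value, and it should be
              set before the first libfabric call.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "00FF00FF-0000-0000-0000-XXXXXXXXXXXX" with the
 * low 48 bits of job_id in the last group, store it in buf, and export it as
 * FI_PSM2_UUID. Returns 0 on success, -1 if buf is too small or setenv fails.
 * Note: job_id 0xFF0F0F00FF would reproduce the default UUID. */
static int set_job_uuid(unsigned long long job_id, char *buf, size_t len)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "00FF00FF-0000-0000-0000-%012llX",
                     job_id & 0xFFFFFFFFFFFFULL);
    if (n < 0 || (size_t)n >= len)      /* needs 36 characters plus NUL */
        return -1;
    return setenv("FI_PSM2_UUID", buf, 1);
}
```
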
       FI_PSM2_NAME_SERVER
              The psm2 provider has a simple built-in name server that can be
              used to resolve an IP address or host name into the transport
              address needed by the fi_av_insert call. The main purpose of
              this name server is to allow simple client-server applications
              (such as those in fabtests) to be written purely with libfabric,
              without any out-of-band communication mechanism. For such
              applications, the server runs first so that its endpoints are
              created and registered with the name server; the client then
              calls fi_getinfo with the node parameter set to the IP address
              or host name of the server. The resulting fi_info structure
              contains the transport address of the endpoint created by the
              server in its dest_addr field. Optionally, the service parameter
              can be used in addition to node. Note that the service number is
              interpreted by the provider and is not a TCP/IP port number.

              The name server is on by default. It can be turned off by
              setting the variable to 0. This may save a small amount of
              resources, since a separate thread is created when the name
              server is on.

              The provider detects Open MPI and MPICH runs and changes the
              default setting to off in that case.

       FI_PSM2_TAGGED_RMA
              The RMA functions are implemented on top of the PSM Active
              Message functions. The Active Message functions limit the size
              of data that can be transferred in a single message. Large
              transfers can be divided into small chunks and pipelined, but
              the resulting bandwidth is sub-optimal.

              The psm2 provider instead uses the PSM tag-matching message
              queue functions to achieve higher bandwidth for large RMA
              transfers. It takes advantage of the extra tag bits available
              in PSM2 to separate the RMA traffic from the regular tagged
              message queue.

              The option is on by default. To turn it off, set the variable
              to 0.

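              The chunking required by the Active Message path can be
              sketched as follows. The per-message limit am_max is a
              hypothetical parameter; the actual Active Message size limit is
              defined by the PSM2 library and is not given here. The sketch
              only computes how a transfer would be split and issues no PSM2
              calls.

```c
#include <assert.h>
#include <stddef.h>

/* Number of messages needed for a len-byte transfer when each message can
 * carry at most am_max bytes, i.e. ceil(len / am_max). */
static size_t rma_chunk_count(size_t len, size_t am_max)
{
    assert(am_max > 0);
    return len == 0 ? 0 : (len - 1) / am_max + 1;
}

/* Size of chunk i (0-based) of a len-byte transfer. */
static size_t rma_chunk_len(size_t len, size_t am_max, size_t i)
{
    assert(am_max > 0 && i < rma_chunk_count(len, am_max));
    size_t off = i * am_max;
    return len - off < am_max ? len - off : am_max;
}
```

              With am_max = 4096, a 10000-byte transfer would be sent as
              chunks of 4096, 4096, and 1808 bytes.
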
       FI_PSM2_DELAY
              Time (seconds) to sleep before closing PSM endpoints. This is a
              workaround for a bug in some versions of the PSM library.

              The default setting is 0.

       FI_PSM2_TIMEOUT
              Timeout (seconds) for gracefully closing PSM endpoints. A forced
              close is issued if the timeout expires.

              The default setting is 5.

       FI_PSM2_PROG_INTERVAL
              When auto progress is enabled (requested via the hints to
              fi_getinfo), a progress thread is created to make progress calls
              from time to time. This option sets the interval (microseconds)
              between progress calls.

              The default setting is 1 if affinity is set, or 1000 if not. See
              FI_PSM2_PROG_AFFINITY.

       FI_PSM2_PROG_AFFINITY
              When set, specifies the set of CPU cores to which the progress
              thread is bound. The format is
              <start>[:<end>[:<stride>]][,<start>[:<end>[:<stride>]]]*, where
              each triplet <start>:<end>:<stride> defines a block of core_ids.
              Both <start> and <end> can be either the core_id (when >= 0) or
              core_id - num_cores (when < 0).

              By default affinity is not set.

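              To make the affinity format concrete, the C sketch below parses
              a string in that format into a per-core selection array. It is
              an independent illustration, not the provider's parser; it
              assumes <end> is inclusive, defaults <end> to <start> and
              <stride> to 1 when omitted, and rejects blocks whose normalized
              <start> is greater than <end>. The hypothetical helper
              default_prog_interval encodes the FI_PSM2_PROG_INTERVAL default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* A negative id means core_id - num_cores, so -1 is the last core. */
static long normalize_core(long v, int num_cores)
{
    return v < 0 ? v + num_cores : v;
}

/* Parse e.g. "0:3,-2:-1" or "0:7:2" into selected[0..num_cores-1].
 * Returns the number of distinct cores selected, or -1 on error. */
static int parse_affinity(const char *s, int num_cores, bool *selected)
{
    assert(s != NULL && selected != NULL && num_cores > 0);
    for (int i = 0; i < num_cores; i++)
        selected[i] = false;

    int count = 0;
    const char *p = s;
    for (;;) {
        char *end;
        long start = strtol(p, &end, 10);
        if (end == p)
            return -1;                          /* missing <start> */
        long stop = start, stride = 1;          /* defaults */
        p = end;
        if (*p == ':') {
            stop = strtol(p + 1, &end, 10);
            if (end == p + 1)
                return -1;                      /* missing <end> */
            p = end;
            if (*p == ':') {
                stride = strtol(p + 1, &end, 10);
                if (end == p + 1 || stride <= 0)
                    return -1;                  /* missing or bad <stride> */
                p = end;
            }
        }
        long a = normalize_core(start, num_cores);
        long b = normalize_core(stop, num_cores);
        if (a < 0 || b >= num_cores || a > b)
            return -1;                          /* out of range or reversed */
        if (stride > num_cores)
            stride = num_cores;                 /* same result, no overflow */
        for (long c = a; c <= b; c += stride) {
            if (!selected[c]) {
                selected[c] = true;
                count++;
            }
        }
        if (*p == '\0')
            return count;
        if (*p != ',')
            return -1;                          /* junk after a block */
        p++;
    }
}

/* Documented default for FI_PSM2_PROG_INTERVAL, in microseconds. */
static int default_prog_interval(bool affinity_set)
{
    return affinity_set ? 1 : 1000;
}
```

              On an 8-core machine, "0:3,-2:-1" would select cores 0 through
              3 plus cores 6 and 7.
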
       FI_PSM2_INJECT_SIZE
              Maximum message size (bytes) allowed for fi_inject and
              fi_tinject calls. This is an experimental feature that allows
              some applications to override the default inject size limit.
              When the inject size is larger than the default value, some
              inject calls might block.

              The default setting is 64.

       FI_PSM2_LOCK_LEVEL
              When set, dictates the level of locking used by the provider.
              Level 2 means all locks are enabled. Level 1 disables some locks
              and is suitable for runs that limit access to each PSM2 context
              to a single thread. Level 0 disables all locks and thus is only
              suitable for single-threaded runs.

              Level 0 and level 1 cannot be used together with wait objects or
              auto progress mode, because these introduce internal threads
              that may break the conditions needed for these levels.

              The default setting is 2.

       FI_PSM2_DISCONNECT
              The provider has a mechanism to automatically send disconnection
              notifications to all connected peers before the local endpoint
              is closed. In response, the peers call psm2_ep_disconnect to
              clean up the connection state on their side. This allows the
              same PSM2 epid to be used by different dynamically started
              processes (clients) to communicate with the same peer (server).
              This mechanism, however, introduces extra overhead to the
              finalization phase. For applications that never reuse epids
              within the same session, such overhead is unnecessary.

              This option controls whether the automatic disconnection
              notification mechanism is enabled. For the client-server
              application described above, the client side should set this
              option to 1, but the server should set it to 0.

              The default setting is 0.

       FI_PSM2_TAG_LAYOUT
              Selects how the 96 PSM2 tag bits are organized. Currently three
              choices are available:

              tag60   32-4-60 partitioning for CQ data, internal protocol
                      flags, and application tag.

              tag64   4-28-64 partitioning for internal protocol flags, CQ
                      data, and application tag.

              auto    Choose either tag60 or tag64 based on the hints passed
                      to fi_getinfo: tag60 is used if remote CQ data support
                      is requested explicitly, either by passing a non-zero
                      value via hints->domain_attr->cq_data_size or by
                      including FI_REMOTE_CQ_DATA in hints->caps; otherwise
                      tag64 is used. If tag64 is the result of automatic
                      selection, fi_getinfo also returns a second instance of
                      the provider with the tag60 layout.

              The default setting is auto.

              Note that if the provider is compiled with the macro
              PSMX2_TAG_LAYOUT defined to 1 (tag60) or 2 (tag64), the choice
              is fixed at compile time and this runtime option is disabled.
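
              The two layouts can be visualized by packing the fields into a
              96-bit value. The C sketch below assumes the fields are laid out
              from the most to the least significant bit in the order listed
              for each layout, with the 96 bits split into a 32-bit high word
              and a 64-bit low word; that placement and all names are
              hypothetical, and the provider's internal arrangement may
              differ. auto_selects_tag60 mirrors the documented auto rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 96-bit tag: hi holds bits 95..64, lo holds bits 63..0. */
struct tag96 {
    uint32_t hi;
    uint64_t lo;
};

#define TAG60_TAG_MASK  ((UINT64_C(1) << 60) - 1)
#define TAG64_DATA_MASK ((UINT32_C(1) << 28) - 1)
#define FLAG_MASK       UINT32_C(0xF)

/* tag60: 32-bit CQ data | 4-bit flags | 60-bit application tag */
static struct tag96 pack_tag60(uint32_t cq_data, uint32_t flags, uint64_t tag)
{
    assert(flags <= FLAG_MASK && tag <= TAG60_TAG_MASK);
    struct tag96 t = { cq_data, ((uint64_t)flags << 60) | tag };
    return t;
}

/* tag64: 4-bit flags | 28-bit CQ data | 64-bit application tag */
static struct tag96 pack_tag64(uint32_t cq_data, uint32_t flags, uint64_t tag)
{
    assert(flags <= FLAG_MASK && cq_data <= TAG64_DATA_MASK);
    struct tag96 t = { (flags << 28) | cq_data, tag };
    return t;
}

/* auto: tag60 if remote CQ data is requested explicitly in the hints
 * (non-zero cq_data_size or FI_REMOTE_CQ_DATA in caps), else tag64. */
static bool auto_selects_tag60(size_t cq_data_size, bool caps_remote_cq_data)
{
    return cq_data_size != 0 || caps_remote_cq_data;
}
```

              The sketch makes the trade-off visible: tag64 keeps a full
              64-bit application tag but only 28 bits of CQ data, while tag60
              keeps 32 bits of CQ data at the cost of 4 tag bits.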

AUTHORS

       OpenFabrics.



Libfabric Programmer's Manual     2018-10-23                        fi_psm2(7)
