fi_psm(7)                      Libfabric v1.6.1                      fi_psm(7)

NAME

       fi_psm - The PSM Fabric Provider

OVERVIEW

       The psm provider runs over the PSM 1.x interface that is currently
       supported by the Intel TrueScale Fabric.  PSM provides tag-matching
       message queue functions that are optimized for MPI implementations.
       PSM also has limited Active Message support, which is not officially
       published but is quite stable and well documented in the source code
       (part of the OFED release).  The psm provider uses both the
       tag-matching message queue functions and the Active Message functions
       to support a variety of libfabric data transfer APIs, including tagged
       message queue, message queue, RMA, and atomic operations.

       The psm provider can work with the psm2-compat library, which exposes
       a PSM 1.x interface over the Intel Omni-Path Fabric.

LIMITATIONS

       The psm provider doesn't support all the features defined in the
       libfabric API.  Here are some of the limitations:

       Endpoint types : Only the non-connection-based types FI_DGRAM and
       FI_RDM are supported.

       Endpoint capabilities : Endpoints can support any combination of the
       data transfer capabilities FI_TAGGED, FI_MSG, FI_ATOMICS, and FI_RMA.
       These capabilities can be further refined by FI_SEND, FI_RECV,
       FI_READ, FI_WRITE, FI_REMOTE_READ, and FI_REMOTE_WRITE to limit the
       direction of operations.  The limitation is that no two endpoints can
       have overlapping receive or RMA target capabilities in any of the
       above categories.  For example, it is fine to have two endpoints with
       FI_TAGGED | FI_SEND, one endpoint with FI_TAGGED | FI_RECV, one
       endpoint with FI_MSG, and one endpoint with FI_RMA | FI_ATOMICS.  But
       it is not allowed to have two endpoints with FI_TAGGED, or two
       endpoints with FI_RMA.
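
       The overlap rule for endpoint capabilities can be sketched as a small
       check.  This is only one reading of the rule, not the provider's
       code.  The X_* bits are stand-ins invented for this example (they are
       not the libfabric FI_* values), and the sketch assumes that a
       capability given without any direction modifier implies both
       directions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in capability bits for illustration only; these are NOT the
 * values of the libfabric FI_* macros. */
enum {
    X_MSG = 1u << 0, X_TAGGED = 1u << 1, X_RMA = 1u << 2, X_ATOMICS = 1u << 3,
    X_SEND = 1u << 4, X_RECV = 1u << 5,
    X_READ = 1u << 6, X_WRITE = 1u << 7,
    X_REMOTE_READ = 1u << 8, X_REMOTE_WRITE = 1u << 9
};

/* Return the receive-side and RMA-target-side categories an endpoint
 * occupies, as a mask of X_MSG, X_TAGGED, X_RMA and X_ATOMICS. */
static uint32_t claimed_roles(uint32_t caps)
{
    /* Assumption: no direction modifier means both directions. */
    bool has_msg_dir = (caps & (X_SEND | X_RECV)) != 0;
    bool has_rma_dir = (caps & (X_READ | X_WRITE |
                                X_REMOTE_READ | X_REMOTE_WRITE)) != 0;
    bool receives = !has_msg_dir || (caps & X_RECV) != 0;
    bool is_target = !has_rma_dir ||
                     (caps & (X_REMOTE_READ | X_REMOTE_WRITE)) != 0;
    uint32_t roles = 0;

    if (receives)
        roles |= caps & (X_MSG | X_TAGGED);
    if (is_target)
        roles |= caps & (X_RMA | X_ATOMICS);
    return roles;
}

/* True when no two endpoints in the set claim the same role. */
static bool endpoints_compatible(const uint32_t *caps, size_t n)
{
    uint32_t taken = 0;

    assert(caps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t roles = claimed_roles(caps[i]);
        if (roles & taken)
            return false;   /* overlapping receive or target role */
        taken |= roles;
    }
    return true;
}
```

       With these stand-ins, the five-endpoint example from the text passes
       the check, while a set containing two X_TAGGED endpoints or two X_RMA
       endpoints is rejected.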

       FI_MULTI_RECV is supported for the non-tagged message queue only.

       Other supported capabilities include FI_TRIGGER.

       Modes : FI_CONTEXT is required for the FI_TAGGED and FI_MSG
       capabilities.  That is, any request belonging to these two categories
       that generates a completion must pass, as the operation context, a
       valid pointer to a struct fi_context, and the space referenced by the
       pointer must remain untouched until the request has completed.  If
       neither FI_TAGGED nor FI_MSG is requested, the FI_CONTEXT mode is not
       required.

       Progress : The psm provider requires manual progress.  The application
       is expected to call the fi_cq_read or fi_cntr_read function from time
       to time when no other libfabric function is called, to ensure that
       progress is made in a timely manner.  The provider does support auto
       progress mode.  However, performance can be significantly impacted if
       the application depends purely on the provider to make auto progress.

       Unsupported features : These features are unsupported: connection
       management, scalable endpoint, passive endpoint, shared receive
       context, and send/inject with immediate data.

RUNTIME PARAMETERS

       The psm provider checks for the following environment variables:

       FI_PSM_UUID : PSM requires that each job has a unique ID (UUID).  All
       the processes in the same job need to use the same UUID in order to be
       able to talk to each other.  The PSM reference manual advises keeping
       the UUID unique to each job.  In practice, it generally works fine to
       reuse a UUID as long as (1) no two jobs with the same UUID are running
       at the same time; and (2) previous jobs with the same UUID have exited
       normally.  If you run into "resource busy" or "connection failure"
       issues for no known reason, it is advisable to manually set the UUID
       to a value different from the default.

       The default UUID is 0FFF0FFF-0000-0000-0000-0FFF0FFF0FFF.
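
       One possible way to give each job its own UUID is to derive the
       string from a job identifier that every process of the job already
       shares.  The sketch below is an illustration only: make_job_uuid and
       job_id are hypothetical names, and it assumes that any value with the
       same 8-4-4-4-12 hexadecimal layout as the default is accepted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Length of an 8-4-4-4-12 UUID string plus the terminating NUL. */
#define UUID_STR_LEN 37

/*
 * Build a per-job UUID string laid out like the default value.
 * job_id is a hypothetical identifier (for example one handed out by a
 * job launcher); every process of the job must pass the same value so
 * that all of them end up with the same UUID.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int make_job_uuid(uint64_t job_id, char *out, size_t out_len)
{
    /* The first three groups carry the 64 bits of job_id; the last two
     * groups are copied from the default UUID. */
    int n = snprintf(out, out_len, "%08lX-%04lX-%04lX-0000-0FFF0FFF0FFF",
                     (unsigned long)(job_id >> 32),
                     (unsigned long)((job_id >> 16) & 0xFFFF),
                     (unsigned long)(job_id & 0xFFFF));

    return (n == UUID_STR_LEN - 1 && (size_t)n < out_len) ? 0 : -1;
}
```

       The resulting string would then be exported as FI_PSM_UUID in the
       environment of every process of the job before libfabric is
       initialized.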

       FI_PSM_NAME_SERVER : The psm provider has a simple built-in name
       server that can be used to resolve an IP address or host name into a
       transport address needed by the fi_av_insert call.  The main purpose
       of this name server is to allow simple client-server type applications
       (such as those in fabtests) to be written purely with libfabric,
       without using any out-of-band communication mechanism.  For such
       applications, the server would run first to allow endpoints to be
       created and registered with the name server, and then the client would
       call fi_getinfo with the node parameter set to the IP address or host
       name of the server.  The resulting fi_info structure would have the
       transport address of the endpoint created by the server in the
       dest_addr field.  Optionally, the service parameter can be used in
       addition to node.  Note that the service number is interpreted by the
       provider and is not a TCP/IP port number.

       The name server is on by default.  It can be turned off by setting the
       variable to 0.  This may save a small amount of resources, since a
       separate thread is created when the name server is on.

       The provider detects OpenMPI and MPICH runs and changes the default
       setting to off.

       FI_PSM_TAGGED_RMA : The RMA functions are implemented on top of the
       PSM Active Message functions.  The Active Message functions have a
       limit on the size of data that can be transferred in a single message.
       Large transfers can be divided into small chunks and pipelined.
       However, the bandwidth achieved this way is sub-optimal.

       The psm provider uses the PSM tag-matching message queue functions to
       achieve higher bandwidth for large RMA transfers.  For this purpose, a
       bit is reserved from the tag space to separate the RMA traffic from
       the regular tagged message queue.

       The option is on by default.  To turn it off, set the variable to 0.

       FI_PSM_AM_MSG : The psm provider implements the non-tagged message
       queue over the PSM tag-matching message queue.  One tag bit is
       reserved for this purpose.  Alternatively, the non-tagged message
       queue can be implemented over Active Message.  This experimental
       feature has slightly higher latency.

       This option is off by default.  To turn it on, set the variable to 1.

       FI_PSM_DELAY : Time (seconds) to sleep before closing PSM endpoints.
       This is a workaround for a bug in some versions of the PSM library.

       The default setting is 1.

       FI_PSM_TIMEOUT : Timeout (seconds) for gracefully closing PSM
       endpoints.  A forced close will be issued if the timeout expires.

       The default setting is 5.

       FI_PSM_PROG_INTERVAL : When auto progress is enabled (requested via
       the hints to fi_getinfo), a progress thread is created to make
       progress calls from time to time.  This option sets the interval
       (microseconds) between progress calls.

       The default setting is 1 if affinity is set, or 1000 if not.  See
       FI_PSM_PROG_AFFINITY.
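
       The interplay between the two variables can be sketched as follows.
       This is an illustration of the documented defaults, not the
       provider's code: prog_interval_usec is a hypothetical helper, and the
       choice to ignore malformed or non-positive values is an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Pick the progress interval in microseconds following the documented
 * defaults: an explicit FI_PSM_PROG_INTERVAL wins; otherwise the result
 * is 1 when an affinity is configured and 1000 when it is not.
 * Both arguments are the raw environment strings (NULL when unset).
 */
static long prog_interval_usec(const char *interval_env,
                               const char *affinity_env)
{
    if (interval_env != NULL && *interval_env != '\0') {
        char *end;
        long v = strtol(interval_env, &end, 10);

        /* Assumption: malformed or non-positive values are ignored. */
        if (*end == '\0' && v > 0)
            return v;
    }
    return (affinity_env != NULL && *affinity_env != '\0') ? 1 : 1000;
}
```

       An application that wants to report the effective setting could call
       prog_interval_usec(getenv("FI_PSM_PROG_INTERVAL"),
       getenv("FI_PSM_PROG_AFFINITY")).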

       FI_PSM_PROG_AFFINITY : When set, specifies the set of CPU cores to
       which the progress thread affinity is set.  The format is
       <start>[:<end>[:<stride>]][,<start>[:<end>[:<stride>]]]*, where each
       triplet <start>:<end>:<stride> defines a block of core_ids.  Both
       <start> and <end> can be either the core_id (when >=0) or
       core_id - num_cores (when <0).

       By default affinity is not set.
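
       To make the affinity format concrete, the following sketch expands a
       specification into a list of core_ids.  It is not the provider's
       parser: expand_affinity and normalize_core are hypothetical helpers,
       and the sketch assumes that an omitted <end> equals <start>, that an
       omitted <stride> is 1, and that <end> is inclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Map a <start> or <end> value to a core_id: negative values are
 * written as core_id - num_cores, so add num_cores back. */
static long normalize_core(long v, long num_cores)
{
    return v < 0 ? v + num_cores : v;
}

/*
 * Expand "<start>[:<end>[:<stride>]][,...]" into out[].
 * Returns the number of core_ids written, or -1 on malformed input,
 * an out-of-range core_id, or when out[] is too small.
 */
static int expand_affinity(const char *spec, long num_cores,
                           long *out, int max_out)
{
    const char *p = spec;
    int count = 0;

    assert(num_cores > 0);
    while (*p != '\0') {
        char *end;
        long start = strtol(p, &end, 10);
        if (end == p)
            return -1;                  /* no number where one is needed */
        long stop = start, stride = 1;  /* assumed defaults */
        p = end;
        if (*p == ':') {
            stop = strtol(p + 1, &end, 10);
            if (end == p + 1)
                return -1;
            p = end;
            if (*p == ':') {
                stride = strtol(p + 1, &end, 10);
                if (end == p + 1 || stride <= 0)
                    return -1;
                p = end;
            }
        }
        start = normalize_core(start, num_cores);
        stop = normalize_core(stop, num_cores);
        if (start < 0 || start >= num_cores || stop < 0 || stop >= num_cores)
            return -1;
        for (long c = start; c <= stop; c += stride) {
            if (count == max_out)
                return -1;
            out[count++] = c;
        }
        if (*p == ',') {
            p++;
            if (*p == '\0')
                return -1;              /* trailing comma */
        } else if (*p != '\0') {
            return -1;                  /* unexpected character */
        }
    }
    return count;
}
```

       Under these assumptions, on a machine with 8 cores the specification
       0:3,-2 expands to the core_ids 0, 1, 2, 3 and 6, and 0:6:2 expands to
       0, 2, 4 and 6.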

SEE ALSO

       fabric(7), fi_provider(7), fi_psm2(7)

AUTHORS

       OpenFabrics.



Libfabric Programmer's Manual     2018-02-13                         fi_psm(7)