fi_psm(7) Libfabric v1.15.1 fi_psm(7)

NAME
fi_psm - The PSM Fabric Provider

OVERVIEW
The psm provider runs over the PSM 1.x interface that is currently
supported by the Intel TrueScale Fabric. PSM provides tag-matching
message queue functions that are optimized for MPI implementations.
PSM also has limited Active Message support, which is not officially
published but is quite stable and well documented in the source code
(part of the OFED release). The psm provider makes use of both the
tag-matching message queue functions and the Active Message functions
to support a variety of libfabric data transfer APIs, including tagged
message queue, message queue, RMA, and atomic operations.

The psm provider can work with the psm2-compat library, which exposes a
PSM 1.x interface over the Intel Omni-Path Fabric.

LIMITATIONS
The psm provider doesn’t support all the features defined in the
libfabric API. Here are some of the limitations:

Endpoint types
Only the connectionless endpoint types FI_DGRAM and FI_RDM are
supported.

Endpoint capabilities
Endpoints can support any combination of the data transfer
capabilities FI_TAGGED, FI_MSG, FI_ATOMICS, and FI_RMA. These
capabilities can be further refined by FI_SEND, FI_RECV, FI_READ,
FI_WRITE, FI_REMOTE_READ, and FI_REMOTE_WRITE to limit the direction
of operations. The limitation is that no two endpoints can have
overlapping receive or RMA target capabilities in any of the above
categories. For example, it is fine to have two endpoints with
FI_TAGGED | FI_SEND, one endpoint with FI_TAGGED | FI_RECV, one
endpoint with FI_MSG, and one endpoint with FI_RMA | FI_ATOMICS. But
it is not allowed to have two endpoints with FI_TAGGED, or two
endpoints with FI_RMA.

FI_MULTI_RECV is supported for the non-tagged message queue only.

Other supported capabilities include FI_TRIGGER.
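The overlap rule can be expressed as a small check over a set of endpoint capability masks. This is a minimal sketch, not provider code: the CAP_* bit values are hypothetical stand-ins for the real FI_* constants in <rdma/fabric.h>, an endpoint that names no direction flag is assumed to get all directions (which is why two plain FI_TAGGED endpoints collide), and FI_REMOTE_READ and FI_REMOTE_WRITE are conservatively treated as a single target resource per category.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; the real FI_* constants
 * are defined in <rdma/fabric.h> with different values. */
enum {
    CAP_TAGGED       = 1 << 0,
    CAP_MSG          = 1 << 1,
    CAP_RMA          = 1 << 2,
    CAP_ATOMICS      = 1 << 3,
    CAP_SEND         = 1 << 4,
    CAP_RECV         = 1 << 5,
    CAP_READ         = 1 << 6,
    CAP_WRITE        = 1 << 7,
    CAP_REMOTE_READ  = 1 << 8,
    CAP_REMOTE_WRITE = 1 << 9,
};

#define CAP_DIRS (CAP_SEND | CAP_RECV | CAP_READ | CAP_WRITE | \
                  CAP_REMOTE_READ | CAP_REMOTE_WRITE)

/* Receive/target-side resources claimed by one endpoint, as a mask of the
 * category bits (CAP_TAGGED, CAP_MSG, CAP_RMA, CAP_ATOMICS). */
static uint32_t target_claims(uint32_t caps)
{
    uint32_t dirs = caps & CAP_DIRS;
    if (dirs == 0)
        dirs = CAP_DIRS;  /* assumption: no direction flag means all directions */

    uint32_t remote = CAP_REMOTE_READ | CAP_REMOTE_WRITE;
    uint32_t claims = 0;
    if ((caps & CAP_TAGGED) && (dirs & CAP_RECV))
        claims |= CAP_TAGGED;
    if ((caps & CAP_MSG) && (dirs & CAP_RECV))
        claims |= CAP_MSG;
    if ((caps & CAP_RMA) && (dirs & remote))
        claims |= CAP_RMA;
    if ((caps & CAP_ATOMICS) && (dirs & remote))
        claims |= CAP_ATOMICS;
    return claims;
}

/* True if no two of the n endpoints claim the same receive/target category. */
static bool caps_compatible(const uint32_t *eps, int n)
{
    uint32_t used = 0;
    for (int i = 0; i < n; i++) {
        uint32_t c = target_claims(eps[i]);
        if (c & used)
            return false;
        used |= c;
    }
    return true;
}
```

Send-only endpoints claim nothing, so any number of them can coexist; each receive or target category can be claimed by at most one endpoint.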

Modes
FI_CONTEXT is required for the FI_TAGGED and FI_MSG capabilities. That
means any request in these two categories that generates a completion
must pass, as the operation context, a valid pointer to a struct
fi_context, and the memory referenced by that pointer must remain
untouched until the request has completed. If neither FI_TAGGED nor
FI_MSG is requested, the FI_CONTEXT mode is not required.
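One common way to satisfy the FI_CONTEXT requirement is to embed the context at the start of a per-request structure that stays allocated until its completion is read, then recover the request from the op_context field of the completion entry. The sketch assumes struct fi_context consists of four opaque pointers and uses a local stand-in type so that it compiles without libfabric. app_request and the helper functions are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Local stand-in for libfabric's struct fi_context, assumed here to be four
 * opaque pointers that the provider may use while a request is in flight. */
struct fi_context_standin {
    void *internal[4];
};

/* Hypothetical per-request state owned by the application. */
struct app_request {
    struct fi_context_standin ctx;  /* passed as the operation context */
    int id;
    size_t len;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static struct app_request *request_new(int id, size_t len)
{
    /* Heap allocation keeps the context valid until the completion arrives. */
    struct app_request *req = calloc(1, sizeof(*req));
    if (req != NULL) {
        req->id = id;
        req->len = len;
    }
    return req;
}

/* Pointer to hand to the data transfer call as its context argument. */
static void *request_context(struct app_request *req)
{
    return &req->ctx;
}

/* Map the op_context of a completion entry back to its request. */
static struct app_request *request_from_context(void *op_context)
{
    return container_of(op_context, struct app_request, ctx);
}

/* Only safe once the completion for this request has been read. */
static void request_free(struct app_request *req)
{
    free(req);
}
```

In real code, request_context(req) would be the context argument of a call such as fi_tsend() or fi_recv(), and request_from_context() would be applied to the op_context field of the entry returned by fi_cq_read(). The request must not be freed or reused before that completion has been read.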

Progress
The psm provider relies on manual progress. The application is
expected to call fi_cq_read or fi_cntr_read periodically, whenever no
other libfabric function is being called, to ensure that progress is
made in a timely manner. The provider does support auto progress
mode; however, performance can be significantly impacted if the
application relies solely on the provider’s auto progress.
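A manual-progress application typically spins on its completion queue while it waits. The sketch below shows the shape of such a loop. To keep it runnable without libfabric it polls a hypothetical fake_cq whose fake_cq_read() stands in for fi_cq_read(). The real call takes a buffer and count and returns -FI_EAGAIN when nothing is ready, which this simulation reduces to returning 0.

```c
/* Hypothetical stand-in for a completion queue, so the sketch runs without
 * libfabric: fake_cq_read() reports one completion on every period-th call,
 * much as fi_cq_read() returns a positive count once work has completed. */
struct fake_cq {
    int remaining;  /* completions still outstanding */
    int period;     /* one completion every `period` polls */
    int tick;       /* number of polls so far */
};

static int fake_cq_read(struct fake_cq *cq)
{
    if (cq->remaining == 0 || ++cq->tick % cq->period != 0)
        return 0;
    cq->remaining--;
    return 1;
}

/* Manual-progress wait loop: keep polling the CQ, which is also what lets the
 * provider make progress, until `expected` completions have been reaped or
 * `max_polls` polls have been spent. Returns the number of completions seen. */
static int wait_for_completions(struct fake_cq *cq, int expected, long max_polls)
{
    int done = 0;
    for (long i = 0; i < max_polls && done < expected; i++) {
        int n = fake_cq_read(cq);
        if (n > 0)
            done += n;
        /* Application work could be interleaved here, provided the CQ (or a
         * counter via fi_cntr_read()) keeps being polled regularly. */
    }
    return done;
}
```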

Unsupported features
These features are unsupported: connection management, scalable
endpoints, passive endpoints, shared receive contexts, and
send/inject with immediate data.

RUNTIME PARAMETERS
The psm provider checks for the following environment variables:

FI_PSM_UUID
PSM requires that each job have a unique ID (UUID). All the processes
in the same job need to use the same UUID in order to be able to talk
to each other. The PSM reference manual advises keeping the UUID
unique to each job. In practice, it generally works fine to reuse a
UUID as long as (1) no two jobs with the same UUID are running at the
same time, and (2) previous jobs with the same UUID have exited
normally. If you run into “resource busy” or “connection failure”
issues for no apparent reason, it is advisable to manually set the
UUID to a value different from the default.

The default UUID is 0FFF0FFF-0000-0000-0000-0FFF0FFF0FFF.
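One way to follow the "same within a job, different across jobs" rule is to derive the UUID from a job identifier that every process of the job can see. The sketch assumes such an ID is available from the launcher or batch system (job_id is hypothetical). It keeps the prefix of the documented default and places the job ID, truncated to 48 bits, in the last field of the standard 8-4-4-4-12 layout.

```c
#include <stdio.h>

/* Formats a UUID string that is identical for all processes of one job and
 * differs across jobs whose IDs differ in the low 48 bits. Deriving it from
 * a job ID is an assumption of this sketch. Returns 0 on success. */
static int format_job_uuid(unsigned long long job_id, char out[37])
{
    /* 8-4-4-4-12 hex digits; the last field holds the low 48 bits of job_id. */
    int n = snprintf(out, 37, "0FFF0FFF-0000-0000-0000-%012llX",
                     job_id & 0xFFFFFFFFFFFFULL);
    return n == 36 ? 0 : -1;
}
```

A launcher wrapper could then export the result, for example with setenv("FI_PSM_UUID", buf, 1), before libfabric is initialized. This is assumed here to mean before the first fi_getinfo call.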

FI_PSM_NAME_SERVER
The psm provider has a simple built-in name server that can be used to
resolve an IP address or host name into the transport address needed
by the fi_av_insert call. The main purpose of this name server is to
allow simple client-server applications (such as those in fabtests) to
be written purely with libfabric, without any out-of-band
communication mechanism. For such applications, the server runs first
so that its endpoints are created and registered with the name server.
The client then calls fi_getinfo with the node parameter set to the IP
address or host name of the server. The resulting fi_info structure
has the transport address of the endpoint created by the server in its
dest_addr field. Optionally, the service parameter can be used in
addition to node. Note that the service number is interpreted by the
provider and is not a TCP/IP port number.

The name server is on by default. It can be turned off by setting the
variable to 0. This may save a small amount of resources, since a
separate thread is created when the name server is on.

The provider detects OpenMPI and MPICH runs and changes the default
setting to off.
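The documented defaults combine into a simple rule: an explicit setting wins, and otherwise the name server is on unless an Open MPI or MPICH run was detected. The sketch models that rule. mpi_detected is a hypothetical input standing in for the provider's own detection, which this page does not describe. Values are parsed as plain integers, whereas libfabric's own boolean parsing may accept other spellings.

```c
#include <stdbool.h>
#include <stdlib.h>

/* env_value is what getenv("FI_PSM_NAME_SERVER") returned (NULL if unset).
 * In this sketch a nonzero integer means on; "0" or a non-numeric value
 * means off. Without an explicit value, MPI runs default to off. */
static bool name_server_enabled(const char *env_value, bool mpi_detected)
{
    if (env_value != NULL && *env_value != '\0')
        return strtol(env_value, NULL, 10) != 0;
    return !mpi_detected;
}
```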

FI_PSM_TAGGED_RMA
The RMA functions are implemented on top of the PSM Active Message
functions. The Active Message functions limit the size of data that
can be transferred in a single message. Large transfers can be divided
into small chunks and pipelined, but this yields sub-optimal
bandwidth.

The psm provider uses the PSM tag-matching message queue functions to
achieve higher bandwidth for large RMA transfers. For this purpose, a
bit is reserved from the tag space to separate the RMA traffic from
the regular tagged message queue.

The option is on by default. To turn it off, set the variable to 0.
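To see why the Active Message path costs bandwidth for large RMA, consider how a transfer is split when each message carries at most a fixed payload. The sketch computes the chunk count and each chunk's offset and length. max_payload is a hypothetical parameter because this page does not state the actual PSM Active Message limit.

```c
#include <stddef.h>

/* Number of Active Message chunks needed for a transfer of len bytes
 * (ceiling division written to avoid overflow for very large len). */
static size_t am_chunk_count(size_t len, size_t max_payload)
{
    return len / max_payload + (len % max_payload != 0);
}

/* Length of chunk i, storing its byte offset in *offset; returns 0 when
 * i is past the end of the transfer. */
static size_t am_chunk_len(size_t len, size_t max_payload, size_t i,
                           size_t *offset)
{
    size_t off = i * max_payload;
    if (off >= len)
        return 0;
    *offset = off;
    size_t rem = len - off;
    return rem < max_payload ? rem : max_payload;
}
```

With FI_PSM_TAGGED_RMA left on, a large transfer is instead carried by the tag-matching message queue, which the text above describes as achieving higher bandwidth.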

FI_PSM_AM_MSG
The psm provider implements the non-tagged message queue over the PSM
tag-matching message queue. One tag bit is reserved for this purpose.
Alternatively, the non-tagged message queue can be implemented over
Active Message. This experimental feature has slightly higher latency.

This option is off by default. To turn it on, set the variable to 1.
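Reserving tag bits keeps the RMA and non-tagged traffic from ever matching an application's tagged receives, even wildcard ones. The sketch below illustrates this using libfabric's tag-matching rule, under which bits set in the ignore mask are not compared. The page does not say which bits are reserved, so placing them at bits 63 and 62 is hypothetical, as are all helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag layout: one reserved bit for RMA traffic
 * (FI_PSM_TAGGED_RMA) and one for the non-tagged message queue. */
#define TAG_RMA_BIT   (UINT64_C(1) << 63)
#define TAG_MSG_BIT   (UINT64_C(1) << 62)
#define TAG_USER_MASK (~(TAG_RMA_BIT | TAG_MSG_BIT))

/* Tag-matching rule: bits set in `ignore` are not compared. */
static bool tag_matches(uint64_t recv_tag, uint64_t ignore, uint64_t wire_tag)
{
    return (recv_tag | ignore) == (wire_tag | ignore);
}

/* Wire tag of a non-tagged (fi_send-style) message. */
static uint64_t msg_wire_tag(void) { return TAG_MSG_BIT; }

/* Wire tag of a tagged (fi_tsend-style) message: reserved bits cleared. */
static uint64_t tagged_wire_tag(uint64_t user_tag)
{
    return user_tag & TAG_USER_MASK;
}

/* Ignore mask of a tagged receive: users may only wildcard user bits,
 * so the reserved bits are always compared. */
static uint64_t tagged_ignore(uint64_t user_ignore)
{
    return user_ignore & TAG_USER_MASK;
}

/* A non-tagged receive matches any message whose MSG bit is set. */
static bool msg_recv_matches(uint64_t wire_tag)
{
    return tag_matches(TAG_MSG_BIT, TAG_USER_MASK, wire_tag);
}
```

A tagged receive that wildcards every user bit still compares the reserved bits, so it cannot consume a non-tagged or RMA message.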

FI_PSM_DELAY
Time (in seconds) to sleep before closing PSM endpoints. This is a
workaround for a bug in some versions of the PSM library.

The default setting is 1.

FI_PSM_TIMEOUT
Timeout (in seconds) for gracefully closing PSM endpoints. A forced
close is issued if the timeout expires.

The default setting is 5.

FI_PSM_PROG_INTERVAL
When auto progress is enabled (requested via the hints to fi_getinfo),
a progress thread is created to make progress calls periodically. This
option sets the interval (in microseconds) between progress calls.

The default setting is 1 if affinity is set, or 1000 if not. See
FI_PSM_PROG_AFFINITY.
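The interval default depends on whether affinity is set, so the effective value follows a short rule, sketched below. Treating an empty, non-numeric, or non-positive FI_PSM_PROG_INTERVAL value as "unset" is an assumption of this sketch, not documented provider behavior.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Effective progress-thread interval in microseconds. env_value is what
 * getenv("FI_PSM_PROG_INTERVAL") returned (NULL if unset); affinity_set
 * says whether FI_PSM_PROG_AFFINITY was given. */
static long prog_interval_us(const char *env_value, bool affinity_set)
{
    if (env_value != NULL) {
        char *end;
        long v = strtol(env_value, &end, 10);
        if (end != env_value && *end == '\0' && v > 0)
            return v;  /* explicit, well-formed setting wins */
    }
    return affinity_set ? 1 : 1000;  /* documented defaults */
}
```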

FI_PSM_PROG_AFFINITY
When set, specifies the set of CPU cores to which the progress thread
is bound. The format is
<start>[:<end>[:<stride>]][,<start>[:<end>[:<stride>]]]*, where each
triplet <start>:<end>:<stride> defines a block of core_ids. Both
<start> and <end> can be either the core_id (when >=0) or
core_id - num_cores (when <0).

By default affinity is not set.
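The affinity format can be parsed into a per-core mask with a small scanner. This is a sketch under stated assumptions, not the provider's parser. It assumes <end> is inclusive, that an omitted <end> defaults to <start>, and that an omitted <stride> defaults to 1. Negative values are mapped by adding num_cores, as the text describes.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parses spec into mask[0..num_cores-1]. Returns the number of cores
 * selected, or -1 if the spec is malformed or out of range. */
static int parse_affinity(const char *spec, int num_cores, bool *mask)
{
    if (num_cores <= 0)
        return -1;
    memset(mask, 0, (size_t)num_cores * sizeof *mask);
    int count = 0;
    const char *p = spec;
    while (*p) {
        int v[3], n = 0, used = 0;
        /* Read one triplet: up to three ':'-separated integers. */
        for (;;) {
            if (sscanf(p, "%d%n", &v[n], &used) != 1)
                return -1;
            p += used;
            n++;
            if (*p != ':' || n == 3)
                break;
            p++;  /* skip ':' and require another number */
        }
        int start = v[0];
        int end = n >= 2 ? v[1] : start;    /* assumed default */
        int stride = n >= 3 ? v[2] : 1;     /* assumed default */
        if (start < 0)
            start += num_cores;             /* core_id - num_cores form */
        if (end < 0)
            end += num_cores;
        if (stride <= 0 || start < 0 || end >= num_cores)
            return -1;
        for (int c = start; c <= end; c += stride)  /* end assumed inclusive */
            if (!mask[c]) {
                mask[c] = true;
                count++;
            }
        if (*p == ',')
            p++;
        else if (*p != '\0')
            return -1;
    }
    return count;
}
```

For example, on an 8-core machine "0:7:2" selects cores 0, 2, 4, and 6, and "-2:-1" selects the last two cores. The resulting mask could then be applied to the progress thread, for instance with the GNU CPU_SET macros.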

SEE ALSO
fabric(7), fi_provider(7), fi_psm2(7), fi_psm3(7)

AUTHORS
OpenFabrics.

Libfabric Programmer’s Manual 2021-03-22 fi_psm(7)