fi_psm2(7) Libfabric v1.6.1 fi_psm2(7)

fi_psm2 - The PSM2 Fabric Provider

The psm2 provider runs over the PSM 2.x interface that is supported by
the Intel Omni-Path Fabric. PSM 2.x has all the PSM 1.x features plus
a set of new functions with enhanced capabilities. Because PSM 1.x and
PSM 2.x are not ABI compatible, the psm2 provider works only with PSM
2.x and does not support the Intel TrueScale Fabric.

The psm2 provider doesn't support all the features defined in the
libfabric API. Here are some of the limitations:

Endpoint types : Only the non-connection-based types FI_DGRAM and
FI_RDM are supported.

Endpoint capabilities : Endpoints can support any combination of the
data transfer capabilities FI_TAGGED, FI_MSG, FI_ATOMICS, and FI_RMA.
These capabilities can be further refined by FI_SEND, FI_RECV, FI_READ,
FI_WRITE, FI_REMOTE_READ, and FI_REMOTE_WRITE to limit the direction of
operations.

FI_MULTI_RECV is supported for the non-tagged message queue only.

Scalable endpoints are supported if the underlying PSM2 library
supports multiple endpoints. This condition must be satisfied both when
the provider is built and when the provider is used. See the Scalable
endpoints entry below for more information.

Other supported capabilities include FI_TRIGGER, FI_REMOTE_CQ_DATA,
FI_RMA_EVENT, FI_SOURCE, and FI_SOURCE_ERR. Furthermore,
FI_NAMED_RX_CTX is supported when scalable endpoints are enabled.

Modes : FI_CONTEXT is required for the FI_TAGGED and FI_MSG
capabilities. That is, any request in these two categories that
generates a completion must pass, as the operation context, a valid
pointer to a struct fi_context, and the space referenced by the pointer
must remain untouched until the request has completed. If neither
FI_TAGGED nor FI_MSG is requested, the FI_CONTEXT mode is not required.

Progress : The psm2 provider requires manual progress. The application
is expected to call fi_cq_read or fi_cntr_read from time to time when
no other libfabric function is called, to ensure progress is made in a
timely manner. Auto progress mode is also supported, but performance
can be significantly impacted if the application depends purely on the
provider to make auto progress.

Scalable endpoints : Scalable endpoint support depends on the multi-EP
feature of the PSM2 library. If the PSM2 library supports this
feature, its availability is further controlled by the environment
variable PSM2_MULTI_EP. The psm2 provider automatically sets this
variable to 1 if it is not set. The feature can be disabled explicitly
by setting PSM2_MULTI_EP to 0.
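
The "set to 1 unless already set" behavior described above can be
sketched with the POSIX setenv call and its overwrite flag. This is
only an illustration of the rule, not the provider's actual code; the
function name psm2_multi_ep_default is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: give PSM2_MULTI_EP a default of "1" without
 * overriding a value the user has already exported.
 * Returns 1 if multi-EP ends up enabled, 0 if it was explicitly
 * disabled with "0", and -1 if the environment could not be changed. */
int psm2_multi_ep_default(void)
{
    const char *val;

    /* overwrite == 0: an existing value is left untouched */
    if (setenv("PSM2_MULTI_EP", "1", 0) != 0)
        return -1;

    val = getenv("PSM2_MULTI_EP");
    return (val != NULL && strcmp(val, "0") != 0) ? 1 : 0;
}
```

An application that wants to disable the feature would export
PSM2_MULTI_EP=0 before libfabric is initialized; the sketch then
leaves that value in place and returns 0.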

When creating a scalable endpoint, the exact number of contexts
requested should be set in the "fi_info" structure passed to the
fi_scalable_ep function. This number should be set in
"fi_info->ep_attr->tx_ctx_cnt" or "fi_info->ep_attr->rx_ctx_cnt" or
both; if the two differ, the greater value is used. The psm2 provider
allocates all requested contexts upfront when the scalable endpoint is
created. The same context is used for both Tx and Rx.
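
As a rough illustration of the counting rule above, the following
sketch computes how many contexts a request would consume. The struct
ctx_request type is a stand-in invented for this example; in a real
application the two counts live in fi_info->ep_attr.

```c
#include <stddef.h>

/* Stand-in for the two fields of fi_info->ep_attr that matter here
 * (hypothetical type, used so the sketch compiles without libfabric). */
struct ctx_request {
    size_t tx_ctx_cnt;
    size_t rx_ctx_cnt;
};

/* Each context serves both Tx and Rx, so the number of contexts
 * allocated is the greater of the two requested counts. */
size_t psm2_contexts_allocated(const struct ctx_request *req)
{
    return req->tx_ctx_cnt > req->rx_ctx_cnt ? req->tx_ctx_cnt
                                             : req->rx_ctx_cnt;
}
```

For example, requesting 4 Tx contexts and 2 Rx contexts consumes 4
contexts under this rule, not 6.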

For optimal performance, avoid having multiple threads access the same
context, either directly by posting send/recv/read/write requests or
indirectly by polling the associated completion queues or counters.

Shared Tx contexts : To save PSM contexts by using a shared Tx
context, the endpoints bound to the shared Tx context need to be Tx
only. The reason is that Rx capability always requires a PSM context,
which can also be used automatically for Tx. As a result, allocating a
shared Tx context for Rx-capable endpoints actually consumes one extra
context instead of saving any.

Unsupported features : These features are unsupported: connection
management, passive endpoints, and shared receive contexts.

The psm2 provider checks for the following environment variables:

FI_PSM2_UUID : PSM requires that each job has a unique ID (UUID). All
the processes in the same job need to use the same UUID in order to be
able to talk to each other. The PSM reference manual advises keeping
the UUID unique to each job. In practice, it generally works fine to
reuse a UUID as long as (1) no two jobs with the same UUID are running
at the same time; and (2) previous jobs with the same UUID have exited
normally. If "resource busy" or "connection failure" errors occur for
no apparent reason, it is advisable to manually set the UUID to a value
different from the default.

The default UUID is 00FF00FF-0000-0000-0000-00FF0F0F00FF.
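
A job launcher that wants a per-job UUID needs to produce a string in
the same shape as the default shown above. The sketch below formats 16
bytes that way and checks whether a string equals the default. Both
function names are hypothetical, and where the 16 bytes come from (job
ID, random source) is left to the caller.

```c
#include <stddef.h>
#include <string.h>

#define PSM2_DEFAULT_UUID "00FF00FF-0000-0000-0000-00FF0F0F00FF"

/* Format 16 bytes as XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX (uppercase
 * hex). out must hold at least 37 bytes (36 characters plus NUL).
 * Returns 0 on success, -1 if the buffer is too small. */
int format_job_uuid(const unsigned char id[16], char *out, size_t out_len)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t pos = 0;
    int i;

    if (out_len < 37)
        return -1;

    for (i = 0; i < 16; i++) {
        /* dashes go before bytes 4, 6, 8 and 10 */
        if (i == 4 || i == 6 || i == 8 || i == 10)
            out[pos++] = '-';
        out[pos++] = hex[id[i] >> 4];
        out[pos++] = hex[id[i] & 0x0F];
    }
    out[pos] = '\0';
    return 0;
}

/* Returns nonzero if the string is the provider's default UUID. */
int uuid_is_default(const char *uuid)
{
    return strcmp(uuid, PSM2_DEFAULT_UUID) == 0;
}
```

The resulting string would be exported as FI_PSM2_UUID, with the same
value in every process of the job, before libfabric is initialized.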

FI_PSM2_NAME_SERVER : The psm2 provider has a simple built-in name
server that can be used to resolve an IP address or host name into a
transport address needed by the fi_av_insert call. The main purpose of
this name server is to allow simple client-server type applications
(such as those in fabtests) to be written purely with libfabric,
without using any out-of-band communication mechanism. For such
applications, the server runs first so that its endpoints are created
and registered with the name server, and then the client calls
fi_getinfo with the node parameter set to the IP address or host name
of the server. The resulting fi_info structure has the transport
address of the endpoint created by the server in the dest_addr field.
Optionally, the service parameter can be used in addition to node.
Note that the service number is interpreted by the provider and is not
a TCP/IP port number.

The name server is on by default. It can be turned off by setting the
variable to 0. This may save a small amount of resources, since a
separate thread is created when the name server is on.

The provider detects OpenMPI and MPICH runs and changes the default
setting to off.

FI_PSM2_TAGGED_RMA : The RMA functions are implemented on top of the
PSM Active Message functions. The Active Message functions limit the
amount of data that can be transferred in a single message. Large
transfers can be divided into small chunks and pipelined, but the
resulting bandwidth is sub-optimal.

When this option is on, the psm2 provider uses the PSM tag-matching
message queue functions to achieve higher bandwidth for large RMA
transfers. It takes advantage of the extra tag bits available in PSM2
to separate the RMA traffic from the regular tagged message queue.

The option is on by default. To turn it off, set the variable to 0.

FI_PSM2_DELAY : Time (seconds) to sleep before closing PSM endpoints.
This is a workaround for a bug in some versions of the PSM library.

The default setting is 0.

FI_PSM2_TIMEOUT : Timeout (seconds) for gracefully closing PSM
endpoints. A forced close is issued if the timeout expires.

The default setting is 5.

FI_PSM2_PROG_INTERVAL : When auto progress is enabled (requested via
the hints to fi_getinfo), a progress thread is created to make progress
calls from time to time. This option sets the interval (microseconds)
between progress calls.

The default setting is 1 if affinity is set, or 1000 if not. See
FI_PSM2_PROG_AFFINITY.

FI_PSM2_PROG_AFFINITY : When set, specifies the set of CPU cores to
which the progress thread affinity is set. The format is
<start>[:<end>[:<stride>]][,<start>[:<end>[:<stride>]]]*, where each
triplet <start>:<end>:<stride> defines a block of core_ids. Both
<start> and <end> can be either the core_id (when >=0) or
core_id - num_cores (when <0).

By default affinity is not set.
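
To make the FI_PSM2_PROG_AFFINITY format concrete, here is a minimal
parser sketch. It is not the provider's parser. It assumes that a
missing <end> means a single core, that a missing <stride> means 1,
that <end> is inclusive, and that out-of-range values are errors; none
of these details are stated above. The function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* v >= 0 is the core_id itself; v < 0 stands for core_id - num_cores,
 * so the core_id is v + num_cores (-1 is the last core). */
static int resolve_core(int v, int num_cores)
{
    return v >= 0 ? v : v + num_cores;
}

/* Read one decimal integer at *p and advance *p past it. */
static int parse_int(const char **p, int *out)
{
    char *end;
    long v = strtol(*p, &end, 10);

    if (end == *p)
        return -1;              /* no digits found */
    *out = (int)v;
    *p = end;
    return 0;
}

/* Parse "<start>[:<end>[:<stride>]][,...]" and set selected[i] = 1 for
 * every chosen core (selected must have num_cores entries).
 * Returns the number of distinct cores selected, or -1 on error. */
int parse_affinity(const char *spec, int num_cores, unsigned char *selected)
{
    const char *p = spec;
    int count = 0;

    memset(selected, 0, (size_t)num_cores);
    for (;;) {
        int start, end, stride = 1, c;

        if (parse_int(&p, &start) != 0)
            return -1;
        end = start;            /* assumed default: a single core */
        if (*p == ':') {
            p++;
            if (parse_int(&p, &end) != 0)
                return -1;
            if (*p == ':') {
                p++;
                if (parse_int(&p, &stride) != 0)
                    return -1;
            }
        }
        start = resolve_core(start, num_cores);
        end = resolve_core(end, num_cores);
        if (stride <= 0 || start < 0 || end >= num_cores || start > end)
            return -1;
        for (c = start; c <= end; c += stride) {
            if (!selected[c]) {
                selected[c] = 1;
                count++;
            }
        }
        if (*p == '\0')
            break;
        if (*p != ',')
            return -1;
        p++;
    }
    return count;
}
```

Under these assumptions, on an 8-core machine the value 0:3:2,-1
selects cores 0, 2, and 7, and -4:-1 selects cores 4 through 7.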

FI_PSM2_INJECT_SIZE : Maximum message size allowed for fi_inject and
fi_tinject calls. This is an experimental feature that allows some
applications to override the default inject size limit. When the
inject size is larger than the default value, some inject calls might
block.

The default setting is 64.

FI_PSM2_LOCK_LEVEL : When set, dictates the level of locking used by
the provider. Level 2 means all locks are enabled. Level 1 disables
some locks and is suitable for runs that limit the access to each PSM2
context to a single thread. Level 0 disables all locks and thus is
only suitable for single-threaded runs.

Levels 0 and 1 cannot be combined with wait objects or auto progress
mode, because these introduce internal threads that may break the
conditions needed for these levels.

The default setting is 2.
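
The rules for choosing a lock level can be summarized as a small
decision function. This is a sketch of the guidance above from the
application's point of view, not something the provider exposes; the
struct run_profile type and pick_lock_level are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical description of how an application uses the provider. */
struct run_profile {
    bool single_threaded;        /* the whole run uses one thread */
    bool one_thread_per_context; /* each PSM2 context is touched by
                                    exactly one thread */
    bool uses_wait_object;       /* wait objects are requested */
    bool uses_auto_progress;     /* auto progress mode is requested */
};

/* Return the lowest FI_PSM2_LOCK_LEVEL value that is safe for the
 * given profile according to the rules described above. */
int pick_lock_level(const struct run_profile *p)
{
    /* Wait objects and auto progress add internal threads, which
     * rules out levels 0 and 1. */
    if (p->uses_wait_object || p->uses_auto_progress)
        return 2;
    if (p->single_threaded)
        return 0;
    if (p->one_thread_per_context)
        return 1;
    return 2;
}
```

When in doubt, leaving the variable unset keeps the default of 2,
which is safe for every profile.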

FI_PSM2_LAZY_CONN : Controls when connections are established between
the PSM2 endpoints that OFI endpoints are built on top of. When set to
0, connections are established when addresses are inserted into the
address vector. This is the eager connection mode. When set to 1,
connections are established when addresses are first used in
communication. This is the lazy connection mode.

Lazy connection mode may reduce the start-up time on large systems at
the expense of slightly higher data path overhead. For applications
that use multiple endpoints, lazy connection mode can be especially
helpful, potentially greatly reducing the time needed to set up address
vectors and to close endpoints.

The default setting is 0.

FI_PSM2_DISCONNECT : The provider has a mechanism to automatically send
disconnection notifications to all connected peers before the local
endpoint is closed. In response, the peers call psm2_ep_disconnect to
clean up the connection state on their side. This allows the same PSM2
epid to be used by different dynamically started processes (clients) to
communicate with the same peer (server). This mechanism, however,
introduces extra overhead in the finalization phase. For applications
that never reuse epids within the same session, such overhead is
unnecessary.

This option controls whether the automatic disconnection notification
mechanism is enabled. For the client-server applications mentioned
above, the client side should set this option to 1, but the server
should set it to 0.

The default setting is 0.

FI_PSM2_TAG_LAYOUT : Selects how the 96 PSM2 tag bits are organized.
Currently three choices are available: tag60 means 32-4-60 partitioning
for CQ data, internal protocol flags, and application tag. tag64 means
4-28-64 partitioning for internal protocol flags, CQ data, and
application tag. auto means to choose either tag60 or tag64 based on
the hints passed to fi_getinfo: tag60 is used if remote CQ data support
is requested explicitly, either by passing a non-zero value via
hints->domain_attr->cq_data_size or by including FI_REMOTE_CQ_DATA in
hints->caps; otherwise tag64 is used. If tag64 is the result of
automatic selection, fi_getinfo also returns a second instance of the
provider with the tag60 layout.

The default setting is auto.

Note that if the provider is compiled with the macro PSMX2_TAG_LAYOUT
defined to 1 (meaning tag60) or 2 (meaning tag64), the choice is fixed
at compile time and this runtime option is disabled.
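
The two partitionings and the auto selection rule can be illustrated
as follows. The sketch assumes the 96 bits are held as three 32-bit
words with the fields laid out most-significant first in the order
listed above; the provider's real bit positions may differ. All type
and function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 96-bit tag: word[0] holds the most significant bits. */
struct tag96 {
    uint32_t word[3];
};

enum tag_layout { TAG60, TAG64 };

/* tag60: 32 bits CQ data | 4 bits flags | 60 bits application tag */
struct tag96 pack_tag60(uint32_t cq_data, uint8_t flags, uint64_t app_tag)
{
    struct tag96 t;

    app_tag &= (UINT64_C(1) << 60) - 1;      /* keep the low 60 bits */
    t.word[0] = cq_data;
    t.word[1] = ((uint32_t)(flags & 0xF) << 28) | (uint32_t)(app_tag >> 32);
    t.word[2] = (uint32_t)(app_tag & 0xFFFFFFFFu);
    return t;
}

/* tag64: 4 bits flags | 28 bits CQ data | 64 bits application tag */
struct tag96 pack_tag64(uint32_t cq_data, uint8_t flags, uint64_t app_tag)
{
    struct tag96 t;

    t.word[0] = ((uint32_t)(flags & 0xF) << 28) | (cq_data & 0x0FFFFFFFu);
    t.word[1] = (uint32_t)(app_tag >> 32);
    t.word[2] = (uint32_t)(app_tag & 0xFFFFFFFFu);
    return t;
}

/* The auto rule: tag60 when remote CQ data is requested explicitly
 * (non-zero cq_data_size or FI_REMOTE_CQ_DATA in caps), else tag64. */
enum tag_layout auto_layout(size_t cq_data_size, int remote_cq_data_in_caps)
{
    return (cq_data_size != 0 || remote_cq_data_in_caps) ? TAG60 : TAG64;
}
```

The trade-off is visible in the field widths: tag60 carries a full 32
bits of CQ data but only 60 tag bits, while tag64 gives the
application all 64 tag bits and leaves 28 bits for CQ data.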

fabric(7), fi_provider(7), fi_psm(7)

OpenFabrics.

Libfabric Programmer's Manual 2018-02-21 fi_psm2(7)