fi_shm(7)                    Libfabric v1.17.0                    fi_shm(7)

NAME
fi_shm - The SHM Fabric Provider

OVERVIEW
The SHM provider is a complete provider that can be used on Linux
systems supporting shared memory and process_vm_readv/process_vm_writev
calls. The provider is intended to provide high-performance
communication between processes on the same system.

SUPPORTED FEATURES
This release contains an initial implementation of the SHM provider
that offers the following support:

Endpoint types
       The provider supports only endpoint type FI_EP_RDM.

Endpoint capabilities
       Endpoints can support any combination of the following data
       transfer capabilities: FI_MSG, FI_TAGGED, FI_RMA, and
       FI_ATOMICS. These capabilities can be further defined by
       FI_SEND, FI_RECV, FI_READ, FI_WRITE, FI_REMOTE_READ, and
       FI_REMOTE_WRITE to limit the direction of operations.

Modes
       The provider does not require the use of any mode bits.

Progress
       The SHM provider supports FI_PROGRESS_MANUAL. Receive side data
       buffers are not modified outside of completion processing
       routines. The provider processes messages using three different
       methods, based on the size of the message. For messages smaller
       than 4096 bytes, tx completions are generated immediately after
       the send. For larger messages, tx completions are not generated
       until the receiving side has processed the message.

Address Format
       The SHM provider uses the address format FI_ADDR_STR, which
       follows the general format pattern “[prefix]://[addr]”. The
       application can provide addresses through the node or hints
       parameter. As long as the address is in a valid FI_ADDR_STR
       format (contains “://”), the address will be used as is. If the
       application input is incorrectly formatted or no input was
       provided, the SHM provider will resolve it according to the
       following SHM provider standards:

(flags & FI_SOURCE) ? src_addr : dest_addr =
  - if (node && service) : “fi_ns://node:service”
  - if (service) : “fi_ns://service”
  - if (node && !service) : “fi_shm://node”
  - if (!node && !service) : “fi_shm://PID”

!(flags & FI_SOURCE)
  - src_addr = “fi_shm://PID”

In other words, if the application provides a source and/or destination
address in an acceptable FI_ADDR_STR format (contains “://”), the call
to util_getinfo will successfully fill in src_addr and dest_addr with
the provided input. If the input is not in FI_ADDR_STR format, the
shared memory provider creates a proper FI_ADDR_STR address with either
the “fi_ns://” (node/service format) or the “fi_shm://” (shm format)
prefix. The prefix signals whether the address is already unique or
still needs an extra endpoint name identifier appended to make it
unique. For the shared memory provider, we assume that the service
(with or without a node) is enough to make it unique, but a node alone
is not sufficient. If only a node is provided, the “fi_shm://” prefix
is used to signify that it is not a unique address. If neither a node
nor a service is provided (and in the case of setting the src address
without FI_SOURCE and no hints), the process ID is used as a default
address.

On endpoint creation, if the src_addr has the “fi_shm://” prefix, the
provider will append “:[uid]:[ep_idx]” as a unique endpoint name
(essentially, in place of a service). In the case of the “fi_ns://”
prefix (or any other prefix if one was provided by the application), no
supplemental information is required to make it unique and it will
remain with only the application-defined address.

Note that the actual endpoint name will not include the FI_ADDR_STR
“*://” prefix since it cannot be included in any shared memory region
names. The provider will strip off the prefix before setting the
endpoint name. As a result, the addresses
“fi_prefix1://my_node:my_service” and “fi_prefix2://my_node:my_service”
would result in endpoints and regions of the same name. The application
can also override the endpoint name after creating an endpoint using
setname() without any address format restrictions.
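
The address rules above can be sketched in a few lines of C. This is an
illustration only and not the provider's code: the function names
sketch_resolve_addr and sketch_ep_name are hypothetical, and the uid and
endpoint index are simply passed in by the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch of the default address rules. Writes an FI_ADDR_STR style
 * address into buf. Returns 0 on success, -1 if the result does not fit.
 * Hypothetical helper, not part of libfabric.
 */
static int sketch_resolve_addr(const char *node, const char *service,
			       long pid, char *buf, size_t len)
{
	int n;

	assert(buf && len > 0);
	if (node && service)
		n = snprintf(buf, len, "fi_ns://%s:%s", node, service);
	else if (service)
		n = snprintf(buf, len, "fi_ns://%s", service);
	else if (node)
		n = snprintf(buf, len, "fi_shm://%s", node);
	else
		n = snprintf(buf, len, "fi_shm://%ld", pid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Sketch of the endpoint naming step: drop everything up to and
 * including "://", and append ":[uid]:[ep_idx]" only when the prefix
 * was "fi_shm://". Hypothetical helper, not part of libfabric.
 */
static int sketch_ep_name(const char *addr, unsigned int uid, int ep_idx,
			  char *buf, size_t len)
{
	static const char shm_prefix[] = "fi_shm://";
	const char *sep, *rest;
	int n;

	assert(addr && buf && len > 0);
	sep = strstr(addr, "://");
	rest = sep ? sep + 3 : addr;
	if (strncmp(addr, shm_prefix, sizeof(shm_prefix) - 1) == 0)
		n = snprintf(buf, len, "%s:%u:%d", rest, uid, ep_idx);
	else
		n = snprintf(buf, len, "%s", rest);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With no node and no service and a PID of 1234, sketch_resolve_addr
produces “fi_shm://1234”. Passing that address to sketch_ep_name with
uid 1000 and endpoint index 0 gives “1234:1000:0”, while
“fi_prefix1://my_node:my_service” is reduced to “my_node:my_service”.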

Msg flags
       The provider currently only supports the FI_REMOTE_CQ_DATA msg
       flag.

MR registration mode
       The provider implements FI_MR_VIRT_ADDR memory mode.

Atomic operations
       The provider supports all combinations of datatype and
       operations as long as the message is less than 4096 bytes (or
       2048 for compare operations).
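
An application can check the atomic size limit before posting an
operation. The following is a minimal sketch under stated assumptions:
the helper name and macros are hypothetical, “less than” is read
literally as a strict bound, and the message size is taken to be the
element size multiplied by the element count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limits quoted in the text above; the macro names are hypothetical. */
#define SKETCH_ATOMIC_MAX     4096
#define SKETCH_ATOMIC_CMP_MAX 2048

/*
 * Returns true when count elements of elem_size bytes stay strictly
 * below the limit that applies to the operation.
 */
static bool sketch_atomic_fits(size_t elem_size, size_t count,
			       bool is_compare)
{
	size_t limit = is_compare ? SKETCH_ATOMIC_CMP_MAX : SKETCH_ATOMIC_MAX;

	assert(elem_size > 0);
	/* Divide instead of multiplying so the check cannot overflow. */
	return count <= (limit - 1) / elem_size;
}
```

Under these assumptions, 511 elements of 8 bytes (4088 bytes) pass the
check for a non-compare operation and 512 elements (4096 bytes) do not.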

DSA
Intel Data Streaming Accelerator (DSA) is an integrated accelerator in
Intel Xeon processors starting with Sapphire Rapids generation. One of
the capabilities of DSA is to offload memory copy operations from the
CPU. A system may have one or more DSA devices. Each DSA device may
have one or more work queues. The DSA specification can be found here.

The SAR protocol of the SHM provider is enabled to take advantage of
DSA to offload memory copy operations into and out of SAR buffers in
shared memory regions. To fully take advantage of the DSA offload
capability, memory copy operations are performed asynchronously. The
copy initiator thread constructs the DSA commands and submits them to
work queues. A copy operation may consist of more than one DSA command.
In such a case, commands are spread across all available work queues in
round-robin fashion. The progress thread checks for DSA command
completions. If the copy command completes successfully, it then
notifies the peer to consume the data. If DSA encounters a page fault
during command execution, the page fault is reported via completion
records. In such a case, the progress thread accesses the page to
resolve the page fault and resubmits the command after adjusting for
partial completions. One of the benefits of making memory copy
operations asynchronous is that data transfers between different target
endpoints can now be initiated in parallel. Use of Intel DSA in the SAR
protocol is disabled by default and can be enabled using an environment
variable. Note that CMA must be disabled, e.g. FI_SHM_DISABLE_CMA=1, in
order for DSA to be used. See the RUNTIME PARAMETERS section.
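
The two bookkeeping steps described above, round-robin distribution of
commands and adjustment after a partial completion, can be sketched as
follows. This is a simplified illustration, not the provider's code and
not the DSA descriptor format: struct sketch_copy_cmd and both functions
are hypothetical, and the fault is assumed to lie at the first byte that
was not yet copied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one copy command. */
struct sketch_copy_cmd {
	const char *src;
	char *dst;
	size_t len;
	unsigned int wq;	/* index of the work queue used for submission */
};

/*
 * Spread ncmds commands across num_wq work queues in round-robin order,
 * continuing from *next_wq so that later copies keep rotating.
 */
static void sketch_assign_wqs(struct sketch_copy_cmd *cmds, size_t ncmds,
			      unsigned int num_wq, unsigned int *next_wq)
{
	size_t i;

	assert(num_wq > 0 && *next_wq < num_wq);
	for (i = 0; i < ncmds; i++) {
		cmds[i].wq = *next_wq;
		*next_wq = (*next_wq + 1) % num_wq;
	}
}

/*
 * After a page fault, shrink the command to the part that still has to
 * be copied, then copy the first remaining byte on the CPU so that both
 * the source and the destination page are accessed before resubmission.
 */
static void sketch_adjust_after_fault(struct sketch_copy_cmd *cmd,
				      size_t bytes_completed)
{
	assert(bytes_completed < cmd->len);
	cmd->src += bytes_completed;
	cmd->dst += bytes_completed;
	cmd->len -= bytes_completed;
	*(volatile char *)cmd->dst = *(volatile const char *)cmd->src;
}
```

With two work queues, five commands are assigned to queues 0, 1, 0, 1,
0, and the next copy operation starts at queue 1.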

Compiling with DSA capabilities depends on the accel-config library
which can be found here. Running with DSA requires using Linux Kernel
5.19.0-rc3 or later.

DSA devices need to be set up just once before runtime. This
configuration file
(https://github.com/intel/idxd-config/blob/stable/contrib/configs/os_profile.conf)
can be used as a template with the accel-config utility to configure
the DSA devices.
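
An application that wants DSA offload without relying on the launching
shell can export the two variables itself. The sketch below uses only
the POSIX setenv() call; the helper name is hypothetical, the value “1”
is assumed to be accepted as true, and the call is assumed to be made
before the first libfabric call that initializes the provider.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/*
 * Hypothetical helper: disable CMA and enable DSA in the SAR protocol
 * for this process. Existing settings are overwritten because the third
 * argument of setenv() is non-zero.
 * Returns 0 on success, -1 if setenv() fails.
 */
static int sketch_enable_dsa_sar(void)
{
	if (setenv("FI_SHM_DISABLE_CMA", "1", 1) != 0)
		return -1;
	if (setenv("FI_SHM_USE_DSA_SAR", "1", 1) != 0)
		return -1;
	return 0;
}
```

Setting the same variables in the environment of the launching shell
has the same effect and needs no code changes.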

LIMITATIONS
The SHM provider has hard-coded maximums for supported queue sizes and
data transfers. These values are reflected in the related fabric
attribute structures.

EPs must be bound to both RX and TX CQs.

No support for counters.

RUNTIME PARAMETERS
The SHM provider checks for the following environment variables:

FI_SHM_SAR_THRESHOLD
       Maximum message size to use segmentation protocol before
       switching to mmap (only valid when CMA is not available).
       Default: SIZE_MAX (18446744073709551615)

FI_SHM_TX_SIZE
       Maximum number of outstanding tx operations. Default: 1024

FI_SHM_RX_SIZE
       Maximum number of outstanding rx operations. Default: 1024

FI_SHM_DISABLE_CMA
       Manually disables CMA. Default: false

FI_SHM_USE_DSA_SAR
       Enables memory copy offload to Intel DSA in SAR protocol.
       Default: false

FI_SHM_ENABLE_DSA_PAGE_TOUCH
       Enables CPU touching of memory pages in a DSA command descriptor
       when the page fault is reported, so that there is valid address
       translation for the remaining addresses in the command. This
       minimizes DSA page faults. Default: false

SEE ALSO
fabric(7), fi_provider(7), fi_getinfo(3)

AUTHOR
OpenFabrics.

Libfabric Programmer’s Manual        2022-12-11        fi_shm(7)