fi_shm(7)                      Libfabric v1.18.1                     fi_shm(7)

NAME

       fi_shm - The SHM Fabric Provider

OVERVIEW

       The SHM provider is a complete provider that can be used on Linux
       systems supporting shared memory and the process_vm_readv/
       process_vm_writev calls.  The provider is intended to provide
       high-performance communication between processes on the same
       system.

SUPPORTED FEATURES

       This release contains an initial implementation of the SHM
       provider that offers the following support:
       Endpoint types
              The provider supports only endpoint type FI_EP_RDM.
       Endpoint capabilities
              Endpoints can support any combination of the following
              data transfer capabilities: FI_MSG, FI_TAGGED, FI_RMA, and
              FI_ATOMICS.  These capabilities can be further defined by
              FI_SEND, FI_RECV, FI_READ, FI_WRITE, FI_REMOTE_READ, and
              FI_REMOTE_WRITE to limit the direction of operations.
       Modes  The provider does not require the use of any mode bits.
       Progress
              The SHM provider supports FI_PROGRESS_MANUAL.  Receive
              side data buffers are not modified outside of completion
              processing routines.  The provider processes messages
              using three different methods, based on the size of the
              message.  For messages smaller than 4096 bytes, tx
              completions are generated immediately after the send.  For
              larger messages, tx completions are not generated until
              the receiving side has processed the message.
       Address Format
              The SHM provider uses the address format FI_ADDR_STR,
              which follows the general format pattern
              “[prefix]://[addr]”.  The application can provide
              addresses through the node or hints parameter.  As long as
              the address is in a valid FI_ADDR_STR format (contains
              “://”), the address will be used as is.  If the
              application input is incorrectly formatted or no input was
              provided, the SHM provider will resolve it according to
              the following SHM provider standards:

       (flags & FI_SOURCE) ? src_addr : dest_addr =
              - if (node && service)   : “fi_ns://node:service”
              - if (service)           : “fi_ns://service”
              - if (node && !service)  : “fi_shm://node”
              - if (!node && !service) : “fi_shm://PID”

       !(flags & FI_SOURCE) : src_addr = “fi_shm://PID”
       In other words, if the application provides a source and/or
       destination address in an acceptable FI_ADDR_STR format (contains
       “://”), the call to util_getinfo will successfully fill in
       src_addr and dest_addr with the provided input.  If the input is
       not in an FI_ADDR_STR format, the shared memory provider will
       then create a proper FI_ADDR_STR address with either the
       “fi_ns://” (node/service format) or “fi_shm://” (shm format)
       prefix, signaling whether the addr is a “unique” address and does
       or does not need an extra endpoint name identifier appended in
       order to make it unique.  For the shared memory provider, we
       assume that the service (with or without a node) is enough to
       make it unique, but a node alone is not sufficient.  If only a
       node is provided, the “fi_shm://” prefix is used to signify that
       it is not a unique address.  If no node or service is provided
       (and in the case of setting the src address without FI_SOURCE and
       no hints), the process ID will be used as a default address.  On
       endpoint creation, if the src_addr has the “fi_shm://” prefix,
       the provider will append “:[uid]:[ep_idx]” as a unique endpoint
       name (essentially, in place of a service).  In the case of the
       “fi_ns://” prefix (or any other prefix if one was provided by the
       application), no supplemental information is required to make it
       unique and it will remain with only the application-defined
       address.  Note that the actual endpoint name will not include the
       FI_ADDR_STR "*://" prefix since it cannot be included in any
       shared memory region names.  The provider will strip off the
       prefix before setting the endpoint name.  As a result, the
       addresses “fi_prefix1://my_node:my_service” and
       “fi_prefix2://my_node:my_service” would result in endpoints and
       regions of the same name.  The application can also override the
       endpoint name after creating an endpoint using setname() without
       any address format restrictions.
       Msg flags
              The provider currently only supports the FI_REMOTE_CQ_DATA
              msg flag.
       MR registration mode
              The provider implements FI_MR_VIRT_ADDR memory mode.
       Atomic operations
              The provider supports all combinations of datatype and
              operations as long as the message is less than 4096 bytes
              (or 2048 for compare operations).

DSA

       Intel Data Streaming Accelerator (DSA) is an integrated
       accelerator in Intel Xeon processors starting with the Sapphire
       Rapids generation.  One of the capabilities of DSA is to offload
       memory copy operations from the CPU.  A system may have one or
       more DSA devices.  Each DSA device may have one or more work
       queues.  The DSA specification can be found here.
       The SAR protocol of the SHM provider can take advantage of DSA to
       offload memory copy operations into and out of SAR buffers in
       shared memory regions.  To fully take advantage of the DSA
       offload capability, memory copy operations are performed
       asynchronously.  The copy initiator thread constructs the DSA
       commands and submits them to work queues.  A copy operation may
       consist of more than one DSA command.  In that case, commands are
       spread across all available work queues in round-robin fashion.
       The progress thread checks for DSA command completions.  If the
       copy command completes successfully, it then notifies the peer to
       consume the data.  If DSA encounters a page fault during command
       execution, the page fault is reported via completion records.  In
       that case, the progress thread accesses the page to resolve the
       page fault and resubmits the command after adjusting for partial
       completions.  One of the benefits of making memory copy
       operations asynchronous is that data transfers to different
       target endpoints can now be initiated in parallel.  Use of Intel
       DSA in the SAR protocol is disabled by default and can be enabled
       using an environment variable.  Note that CMA must be disabled,
       e.g. FI_SHM_DISABLE_CMA=1, in order for DSA to be used.  See the
       RUNTIME PARAMETERS section.
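As an example, DSA offload in the SAR protocol might be enabled at
launch like this (./my_app is a placeholder for your libfabric
application binary):

```shell
# Select the shm provider, enable DSA offload for the SAR protocol,
# and disable CMA, which is required for DSA to be used.
FI_PROVIDER=shm FI_SHM_USE_DSA_SAR=1 FI_SHM_DISABLE_CMA=1 ./my_app
```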
       Compiling with DSA capabilities depends on the accel-config
       library, which can be found here.  Running with DSA requires
       Linux kernel 5.19.0-rc3 or later.
       DSA devices need to be set up just once before runtime.  This
       configuration file
       (https://github.com/intel/idxd-config/blob/stable/contrib/configs/os_profile.conf)
       can be used as a template with the accel-config utility to
       configure the DSA devices.

LIMITATIONS

       The SHM provider has hard-coded maximums for supported queue
       sizes and data transfers.  These values are reflected in the
       related fabric attribute structures.

       EPs must be bound to both RX and TX CQs.

       No support for counters.

RUNTIME PARAMETERS

       The shm provider checks for the following environment variables:
       FI_SHM_SAR_THRESHOLD
              Maximum message size to use the segmentation protocol
              before switching to mmap (only valid when CMA is not
              available).  Default: SIZE_MAX (18446744073709551615)

       FI_SHM_TX_SIZE
              Maximum number of outstanding tx operations.  Default: 1024

       FI_SHM_RX_SIZE
              Maximum number of outstanding rx operations.  Default: 1024

       FI_SHM_DISABLE_CMA
              Manually disables CMA.  Default: false

       FI_SHM_USE_DSA_SAR
              Enables memory copy offload to Intel DSA in the SAR
              protocol.  Default: false

       FI_SHM_ENABLE_DSA_PAGE_TOUCH
              Enables CPU touching of memory pages in a DSA command
              descriptor when a page fault is reported, so that there is
              valid address translation for the remaining addresses in
              the command.  This minimizes DSA page faults.  Default:
              false


SEE ALSO

       fabric(7), fi_provider(7), fi_getinfo(3)

AUTHORS

       OpenFabrics.

Libfabric Programmer’s Manual     2022-12-09                         fi_shm(7)