fi_rxm(7)

fi_rxm(7)                      Libfabric v1.14.0                     fi_rxm(7)

NAME

       fi_rxm - The RxM (RDM over MSG) Utility Provider

OVERVIEW

       The RxM provider (ofi_rxm) is a utility provider that supports
       FI_EP_RDM type endpoints emulated over FI_EP_MSG type endpoint(s) of an
       underlying core provider. FI_EP_RDM endpoints have a reliable datagram
       interface, and RxM emulates this by hiding the connection management of
       the underlying FI_EP_MSG endpoints from the user. Additionally, RxM can
       hide the memory registration requirements of a core provider such as
       verbs from applications that do not support memory registration.

REQUIREMENTS

   Requirements for core provider
       The RxM provider requires the core provider to support the following
       features:

       • MSG endpoints (FI_EP_MSG)

       • RMA read/write (FI_RMA), used to implement the rendezvous protocol
         for large messages.

       • FI_OPT_CM_DATA_SIZE of at least 24 bytes.

   Requirements for applications
       Because RxM emulates RDM endpoints by hiding connection management, and
       connections are established only on demand (when the application first
       tries to send data), the first several data transfer calls may return
       -FI_EAGAIN. Applications should expect this and retry until the
       operation succeeds.

       If an application has chosen manual progress for data progress, it
       should also read the CQ so that connection establishment progresses.
       Not doing so would result in a stall. See also the ERRORS section in
       fi_msg(3).
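
       The retry requirement above can be sketched as follows. This is a
       minimal illustration, not RxM or libfabric code: the callback types,
       send_with_retry, and the fake_link stand-in are hypothetical names
       introduced here. In a real application the send callback would wrap a
       data transfer call such as fi_send, and the progress callback would
       read the CQ, for example with fi_cq_read. The sketch compares against
       -EAGAIN from errno.h on the assumption that it has the same value as
       -FI_EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback types standing in for a libfabric data transfer
 * call (for example fi_send) and a CQ read (for example fi_cq_read). */
typedef int (*try_send_fn)(void *state);
typedef void (*progress_fn)(void *state);

/*
 * Retry a send until it stops returning -EAGAIN or max_attempts is used up.
 * After every -EAGAIN the progress callback runs, so that connection
 * establishment can advance under manual progress.
 * Returns 0 on success, or the last negative error code.
 */
static int send_with_retry(try_send_fn try_send, progress_fn progress,
                           void *state, int max_attempts)
{
    int ret = -EAGAIN;

    assert(try_send != NULL && progress != NULL);
    for (int i = 0; i < max_attempts; i++) {
        ret = try_send(state);
        if (ret != -EAGAIN)
            break;          /* success or a hard error */
        progress(state);    /* e.g. read the CQ once */
    }
    return ret;
}

/* Stand-in transport (assumption, for illustration only): the connection
 * completes after a fixed number of progress calls; until then every send
 * attempt returns -EAGAIN. */
struct fake_link {
    int progress_needed;
    int progress_calls;
};

static int fake_send(void *state)
{
    struct fake_link *link = state;

    return link->progress_calls >= link->progress_needed ? 0 : -EAGAIN;
}

static void fake_progress(void *state)
{
    struct fake_link *link = state;

    link->progress_calls++;
}
```

       Bounding the number of attempts keeps a caller from spinning forever
       if the peer never becomes reachable; an application may prefer a
       timeout instead.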

SUPPORTED FEATURES

       The RxM provider currently supports the FI_MSG, FI_TAGGED, FI_RMA and
       FI_ATOMIC capabilities.

       Endpoint types
              The provider supports only FI_EP_RDM.

       Endpoint capabilities
              The following data transfer interfaces are supported: FI_MSG,
              FI_TAGGED, FI_RMA, FI_ATOMIC.

       Progress
              The RxM provider supports both FI_PROGRESS_MANUAL and
              FI_PROGRESS_AUTO. Manual progress generally scales to more
              connections and uses less CPU because there is no separate
              auto-progress thread.

       Addressing Formats
              FI_SOCKADDR, FI_SOCKADDR_IN

       Memory Region
              The FI_MR_VIRT_ADDR, FI_MR_ALLOCATED and FI_MR_PROV_KEY MR mode
              bits are required from the application if the core provider
              requires them.

LIMITATIONS

       When using the RxM provider, some limitations of the underlying MSG
       provider may also apply. Refer to the corresponding MSG provider man
       pages for details on those limitations.

   Unsupported features
       The RxM provider does not support the following features:

       • op_flags: FI_FENCE.

       • Scalable endpoints

       • Shared contexts

       • FABRIC_DIRECT

       • FI_MR_SCALABLE

       • Authorization keys

       • Application error data buffers

       • Multicast

       • FI_SYNC_ERR

       • Reporting unknown source addr data as part of completions

       • Triggered operations

   Progress limitations
       When sending large messages, an application that calls sread or waits
       on the CQ file descriptor may not get a completion when it reads the CQ
       after being woken up. The application must then call sread or wait on
       the file descriptor again. This is needed because RxM uses a rendezvous
       protocol for large message sends: the application is woken from waiting
       on the CQ fd when the rendezvous protocol request completes, but it
       must wait again for the ACK from the receiver, which indicates that the
       large message transfer by remote RMA read has completed.

   FI_ATOMIC limitations
       The FI_ATOMIC capability is listed in fi_info only if the fi_info hints
       parameter specifies FI_ATOMIC. If FI_ATOMIC is requested, the message
       orders FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_WAR, FI_ORDER_WAW,
       FI_ORDER_SAR, and FI_ORDER_SAW cannot be supported.

   Miscellaneous limitations
       • RxM protocol peers must have the same endianness; otherwise
         connections will not complete successfully. This restriction enables
         better run-time performance because byte order translations are
         avoided.
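
       The wait-again behavior described under Progress limitations can be
       sketched as a loop. This is an illustration under stated assumptions,
       not provider code: the callback types, wait_for_completion, and the
       fake_cq stand-in are hypothetical. In a real application the wait step
       would be a blocking call such as fi_cq_sread or a poll on the CQ file
       descriptor, and the read step a CQ read that reports "nothing yet" as
       -FI_EAGAIN (modeled here with -EAGAIN from errno.h).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callbacks: wait blocks until the CQ signals and returns 0
 * (or a negative error); read returns the number of completions read, or
 * -EAGAIN if the CQ is still empty. */
typedef int (*cq_wait_fn)(void *cq);
typedef int (*cq_read_fn)(void *cq);

/*
 * Wait for a completion, tolerating wakeups that deliver none.
 * Returns the number of completions read, -EAGAIN if max_wakeups passed
 * without one, or another negative error code.
 */
static int wait_for_completion(cq_wait_fn wait_cq, cq_read_fn read_cq,
                               void *cq, int max_wakeups)
{
    assert(wait_cq != NULL && read_cq != NULL);
    for (int i = 0; i < max_wakeups; i++) {
        int ret = wait_cq(cq);

        if (ret < 0)
            return ret;
        ret = read_cq(cq);
        if (ret != -EAGAIN)
            return ret;
        /* Woken without a completion (for example, only the rendezvous
         * request finished): go back to waiting. */
    }
    return -EAGAIN;
}

/* Stand-in CQ (assumption, for illustration only): a completion becomes
 * readable only after a given number of wakeups. */
struct fake_cq {
    int wakeups;
    int wakeups_before_ack;
};

static int fake_wait(void *cq)
{
    struct fake_cq *c = cq;

    c->wakeups++;
    return 0;
}

static int fake_read(void *cq)
{
    struct fake_cq *c = cq;

    return c->wakeups >= c->wakeups_before_ack ? 1 : -EAGAIN;
}
```

       With wakeups_before_ack set to 2, the first wakeup models the
       rendezvous request completing and the second models the receiver's
       ACK arriving.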

RUNTIME PARAMETERS

       The ofi_rxm provider checks for the following environment variables.

       FI_OFI_RXM_BUFFER_SIZE
              Defines the transmit buffer size / inject size. Messages
              smaller than this size are sent using an eager protocol; larger
              messages are sent using a rendezvous or SAR (Segmentation And
              Reassembly) protocol. Transmit data is copied up to this size
              (default: ~16k).

       FI_OFI_RXM_COMP_PER_PROGRESS
              Defines the maximum number of MSG provider CQ entries that are
              read per progress call (RxM CQ read) (default: 1).

       FI_OFI_RXM_ENABLE_DYN_RBUF
              Enables support for dynamic receive buffering, if supported by
              the message endpoint provider. This feature allows direct
              placement of received message data into application buffers,
              bypassing RxM bounce buffers. This feature targets providers
              that provide internal network buffering, such as the tcp
              provider (default: false).

       FI_OFI_RXM_SAR_LIMIT
              Controls the RxM SAR (Segmentation And Reassembly) protocol.
              Messages larger than this size (default: 128 KB) are sent
              using the rendezvous protocol.

       FI_OFI_RXM_USE_SRX
              Set this to 1 to use a shared receive context from the MSG
              provider, or to 0 to disable it. Shared receive contexts reduce
              overall memory usage, but may increase message latency. If not
              set, verbs will not use shared receive contexts by default, but
              the tcp provider will.

       FI_OFI_RXM_TX_SIZE
              Defines the default TX context size (default: 1024).

       FI_OFI_RXM_RX_SIZE
              Defines the default RX context size (default: 1024).

       FI_OFI_RXM_MSG_TX_SIZE
              Defines the FI_EP_MSG TX size that is requested (default: 128).

       FI_OFI_RXM_MSG_RX_SIZE
              Defines the FI_EP_MSG RX size that is requested (default: 128).

       FI_UNIVERSE_SIZE
              Defines the expected number of ranks / peers an endpoint will
              communicate with (default: 256).

       FI_OFI_RXM_CM_PROGRESS_INTERVAL
              Defines the time in microseconds between calls to the RxM CM
              progression functions when using manual progress. Higher values
              may reduce noise in calls to the fi_cq read functions, but may
              increase connection setup time (default: 10000).

       FI_OFI_RXM_CQ_EQ_FAIRNESS
              Defines the maximum number of message provider CQ entries that
              can be read consecutively across progress calls without checking
              whether the CM progress interval has been reached (default:
              128).
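
       How FI_OFI_RXM_BUFFER_SIZE and FI_OFI_RXM_SAR_LIMIT together partition
       messages by size can be sketched as below. This is a reading of the
       descriptions above, not the provider's implementation: the enum, the
       function name, and the two constants are hypothetical. The constants
       assume that "~16k" means 16384 bytes and that the SAR limit default is
       131072 bytes; whether each boundary is inclusive is also an assumption,
       as is the premise that SAR is in use for the middle range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three protocols named in this page. */
enum sketch_proto {
    SKETCH_EAGER,       /* data copied and sent immediately */
    SKETCH_SAR,         /* Segmentation And Reassembly */
    SKETCH_RENDEZVOUS   /* receiver pulls data with RMA read */
};

/* Assumed byte values for the documented defaults. */
#define SKETCH_BUFFER_SIZE ((size_t)16384)
#define SKETCH_SAR_LIMIT   ((size_t)131072)

/*
 * Pick a protocol from the message length:
 *   len <= buffer_size              -> eager
 *   buffer_size < len <= sar_limit  -> SAR
 *   len > sar_limit                 -> rendezvous
 */
static enum sketch_proto sketch_pick_proto(size_t len, size_t buffer_size,
                                           size_t sar_limit)
{
    assert(buffer_size <= sar_limit);
    if (len <= buffer_size)
        return SKETCH_EAGER;
    if (len <= sar_limit)
        return SKETCH_SAR;
    return SKETCH_RENDEZVOUS;
}
```

       Under these assumptions, raising the SAR limit moves more mid-sized
       messages from rendezvous to SAR, and setting it equal to the buffer
       size leaves no SAR range at all.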

TUNING

   Bandwidth
       To optimize for bandwidth, use values higher than the defaults for
       FI_OFI_RXM_TX_SIZE, FI_OFI_RXM_RX_SIZE, FI_OFI_RXM_MSG_TX_SIZE and
       FI_OFI_RXM_MSG_RX_SIZE, subject to the memory limits of the system and
       the tx and rx sizes supported by the MSG provider.

       FI_OFI_RXM_SAR_LIMIT is another knob that can be experimented with to
       optimize for bandwidth.

   Memory
       To conserve memory, ensure that FI_UNIVERSE_SIZE is set to only what is
       required. Similarly, check that the FI_OFI_RXM_TX_SIZE,
       FI_OFI_RXM_RX_SIZE, FI_OFI_RXM_MSG_TX_SIZE and FI_OFI_RXM_MSG_RX_SIZE
       environment variables are set to only the required values.

NOTES

       The data transfer API may return -FI_EAGAIN during on-demand connection
       setup of the core provider FI_EP_MSG endpoints. See fi_msg(3) for a
       detailed description of handling FI_EAGAIN.

TROUBLESHOOTING / KNOWN ISSUES

       If an RxM endpoint is expected to communicate with more peers than the
       default value of FI_UNIVERSE_SIZE (256), CQ overruns can happen. To
       avoid this, set a higher value for FI_UNIVERSE_SIZE. A CQ overrun can
       make a MSG endpoint unusable.

       At higher rank counts, there may be connection errors due to a node
       running out of memory. The workaround is to use shared receive contexts
       for the MSG provider (FI_OFI_RXM_USE_SRX=1) or to reduce the eager
       message size (FI_OFI_RXM_BUFFER_SIZE) and the MSG provider TX/RX queue
       sizes (FI_OFI_RXM_MSG_TX_SIZE / FI_OFI_RXM_MSG_RX_SIZE).
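
       The workarounds above can be applied from within a program by setting
       the environment variables before libfabric is initialized. The sketch
       below is illustrative only: the helper names are hypothetical, it
       relies on POSIX setenv, and it assumes the provider reads these
       variables no earlier than the first libfabric call (for example
       fi_getinfo), so the helper has to run before that point. Suitable
       values depend on the job and are not recommendations.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: set one numeric environment variable.
 * Returns 0 on success, -1 on failure (as setenv does). */
static int sketch_set_size_var(const char *name, unsigned long value)
{
    char text[32];

    assert(name != NULL);
    snprintf(text, sizeof(text), "%lu", value);
    return setenv(name, text, 1);
}

/*
 * Hypothetical helper: size RxM for `peers` ranks, enable shared receive
 * contexts, and shrink the MSG provider queues to `msg_queue_size`.
 * Call before the first libfabric call (assumption, see lead-in).
 */
static int sketch_apply_scale_settings(unsigned long peers,
                                       unsigned long msg_queue_size)
{
    if (sketch_set_size_var("FI_UNIVERSE_SIZE", peers) != 0)
        return -1;
    if (setenv("FI_OFI_RXM_USE_SRX", "1", 1) != 0)
        return -1;
    if (sketch_set_size_var("FI_OFI_RXM_MSG_TX_SIZE", msg_queue_size) != 0)
        return -1;
    if (sketch_set_size_var("FI_OFI_RXM_MSG_RX_SIZE", msg_queue_size) != 0)
        return -1;
    return 0;
}
```

       The same settings can equally be exported in the job's launch
       environment, which avoids any dependence on call ordering inside the
       application.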

AUTHORS

       OpenFabrics.

Libfabric Programmer’s Manual     2021-03-22                         fi_rxm(7)