fi_rxm(7)                      Libfabric v1.10.0                     fi_rxm(7)

NAME

       fi_rxm - The RxM (RDM over MSG) Utility Provider

OVERVIEW

       The RxM provider (ofi_rxm) is a utility provider that supports
       FI_EP_RDM endpoints emulated over the FI_EP_MSG endpoint(s) of an
       underlying core provider.  FI_EP_RDM endpoints have a reliable
       datagram interface, and RxM emulates this by hiding the connection
       management of the underlying FI_EP_MSG endpoints from the user.
       Additionally, RxM can hide the memory registration requirement of a
       core provider such as verbs if the application does not support it.

REQUIREMENTS

   Requirements for core provider
       The RxM provider requires the core provider to support the following
       features:

       · MSG endpoints (FI_EP_MSG)

       · RMA read/write (FI_RMA) - Used for implementing the rendezvous
         protocol for large messages.

       · FI_OPT_CM_DATA_SIZE of at least 24 bytes.

   Requirements for applications
       Since RxM emulates RDM endpoints by hiding connection management, and
       connections are established only on demand (when the application
       first tries to send data), the first several data transfer calls may
       return -FI_EAGAIN.  Applications should be aware of this and retry
       until the operation succeeds.

       If an application has chosen manual progress for data progress, it
       should also read the CQ so that connection establishment progresses.
       Not doing so would result in a stall.  See also the ERRORS section in
       fi_msg(3).
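
       The retry requirement above can be sketched as a small loop.  This
       is only an illustration under stated assumptions, not RxM or
       libfabric code: post_fn, progress_fn, retry_until_posted() and the
       demo_* helpers are hypothetical names, and errno's EAGAIN is used as
       a stand-in for FI_EAGAIN.  In a real application the post callback
       might wrap a data transfer call and the progress callback might wrap
       a CQ read.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callbacks (not part of libfabric). */
typedef int (*post_fn)(void *arg);      /* 0 on success, negative errno on failure */
typedef void (*progress_fn)(void *arg); /* drives progress, e.g. by reading the CQ */

/* Retry a post while it reports -EAGAIN, driving progress between attempts.
 * Returns the first result other than -EAGAIN, or -EAGAIN once max_tries
 * attempts have been made. */
static int retry_until_posted(post_fn post, progress_fn progress,
                              void *arg, long max_tries)
{
    long i;

    for (i = 0; i < max_tries; i++) {
        int ret = post(arg);

        if (ret != -EAGAIN)
            return ret;
        if (progress != NULL)
            progress(arg);
    }
    return -EAGAIN;
}

/* Demo stand-ins: the "connection" only advances when progress is driven. */
struct demo_state {
    int eagain_left;     /* progress steps still needed before a post succeeds */
    int progress_calls;  /* how many times progress was driven */
};

static int demo_post(void *arg)
{
    struct demo_state *s = arg;

    return s->eagain_left > 0 ? -EAGAIN : 0;
}

static void demo_progress(void *arg)
{
    struct demo_state *s = arg;

    s->progress_calls++;
    s->eagain_left--;
}
```

       With the demo helpers, a state that needs three progress steps
       succeeds when demo_progress is passed in, and keeps returning
       -EAGAIN when the progress callback is NULL, which mirrors the stall
       described above for manual progress.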

SUPPORTED FEATURES

       The RxM provider currently supports the FI_MSG, FI_TAGGED, FI_RMA and
       FI_ATOMIC capabilities.

       Endpoint types
              The provider supports only FI_EP_RDM.

       Endpoint capabilities
              The following data transfer interfaces are supported: FI_MSG,
              FI_TAGGED, FI_RMA, FI_ATOMIC.

       Progress
              The RxM provider supports both FI_PROGRESS_MANUAL and
              FI_PROGRESS_AUTO.  Manual progress in general has better
              connection scale-up and lower CPU utilization since there is
              no separate auto-progress thread.

       Addressing Formats
              FI_SOCKADDR, FI_SOCKADDR_IN

       Memory Region
              The FI_MR_VIRT_ADDR, FI_MR_ALLOCATED and FI_MR_PROV_KEY MR
              mode bits are required from the application if the core
              provider requires them.

LIMITATIONS

       When using the RxM provider, some limitations of the underlying MSG
       provider could also show up.  Please refer to the corresponding MSG
       provider man pages to find out about those limitations.

   Unsupported features
       The RxM provider does not support the following features:

       · op_flags: FI_FENCE.

       · Scalable endpoints

       · Shared contexts

       · FABRIC_DIRECT

       · FI_MR_SCALABLE

       · Authorization keys

       · Application error data buffers

       · Multicast

       · FI_SYNC_ERR

       · Reporting unknown source address data as part of completions

       · Triggered operations

   Progress limitations
       When sending large messages, an application that calls sread or waits
       on the CQ file descriptor may not get a completion when it reads the
       CQ after being woken up from the wait.  The application has to call
       sread or wait on the file descriptor again.  This is because RxM uses
       a rendezvous protocol for large message sends: the application is
       woken up from waiting on the CQ fd when the rendezvous protocol
       request completes, but it has to wait again for the ACK from the
       receiver, which indicates that the remote RMA read, and therefore the
       large message transfer, has completed.

   FI_ATOMIC limitations
       The FI_ATOMIC capability is listed in the fi_info only if the fi_info
       hints parameter specifies FI_ATOMIC.  If FI_ATOMIC is requested, the
       message orders FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_WAR,
       FI_ORDER_WAW, FI_ORDER_SAR, and FI_ORDER_SAW cannot be supported.

   Miscellaneous limitations
       · RxM protocol peers must have the same endianness; otherwise
         connections will not complete successfully.  This enables better
         run-time performance because byte order translations are avoided.

RUNTIME PARAMETERS

       The ofi_rxm provider checks for the following environment variables.

       FI_OFI_RXM_BUFFER_SIZE
              Defines the transmit buffer size / inject size.  Messages
              smaller than this are transmitted via an eager protocol, and
              larger ones via the SAR (Segmentation And Reassembly) or
              rendezvous protocol.  Transmit data is copied up to this size
              (default: ~16k).

       FI_OFI_RXM_COMP_PER_PROGRESS
              Defines the maximum number of MSG provider CQ entries that
              are read per progress call, that is, per RxM CQ read
              (default: 1).

       FI_OFI_RXM_SAR_LIMIT
              Controls the RxM SAR (Segmentation And Reassembly) protocol.
              Messages larger than this (default: 128 KB) are transmitted
              via the rendezvous protocol.

       FI_OFI_RXM_USE_SRX
              Set this to 1 to use the shared receive context from the MSG
              provider.  This reduces overall memory usage, but there may
              be a slight increase in latency (default: 0).

       FI_OFI_RXM_TX_SIZE
              Defines the default TX context size (default: 1024).

       FI_OFI_RXM_RX_SIZE
              Defines the default RX context size (default: 1024).

       FI_OFI_RXM_MSG_TX_SIZE
              Defines the FI_EP_MSG TX size that is requested (default:
              128).

       FI_OFI_RXM_MSG_RX_SIZE
              Defines the FI_EP_MSG RX size that is requested (default:
              128).

       FI_UNIVERSE_SIZE
              Defines the expected number of ranks / peers an endpoint
              communicates with (default: 256).

       FI_OFI_RXM_CM_PROGRESS_INTERVAL
              Defines the time in microseconds between calls to the RxM CM
              progression functions when using manual progress.  Higher
              values may provide less noise for calls to fi_cq read
              functions, but may increase connection setup time (default:
              10000).

       FI_OFI_RXM_CQ_EQ_FAIRNESS
              Defines the maximum number of message provider CQ entries
              that can be read consecutively across progress calls without
              checking whether the CM progress interval has been reached
              (default: 128).
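
       The size thresholds described under FI_OFI_RXM_BUFFER_SIZE and
       FI_OFI_RXM_SAR_LIMIT can be pictured with a small sketch.  This is
       not the RxM implementation: the enum, env_size() and pick_proto()
       are hypothetical names, and treating a message whose size exactly
       equals a threshold as belonging to the smaller class is an
       assumption, since the text above does not specify the boundary
       behavior.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical labels for the three transfer protocols named above. */
enum rxm_proto { RXM_PROTO_EAGER, RXM_PROTO_SAR, RXM_PROTO_RENDEZVOUS };

/* Read a byte count from the environment; fall back to def when the
 * variable is unset, empty, or not a plain decimal number. */
static size_t env_size(const char *name, size_t def)
{
    const char *val = getenv(name);
    char *end;
    unsigned long long n;

    if (val == NULL || !isdigit((unsigned char)*val))
        return def;
    errno = 0;
    n = strtoull(val, &end, 10);
    if (errno != 0 || *end != '\0')
        return def;
    return (size_t)n;
}

/* Classify a message by length (boundary handling is an assumption):
 *   len <= buffer_size              -> eager
 *   buffer_size < len <= sar_limit  -> SAR
 *   len > sar_limit                 -> rendezvous */
static enum rxm_proto pick_proto(size_t len, size_t buffer_size,
                                 size_t sar_limit)
{
    if (len <= buffer_size)
        return RXM_PROTO_EAGER;
    if (len <= sar_limit)
        return RXM_PROTO_SAR;
    return RXM_PROTO_RENDEZVOUS;
}
```

       A caller could combine the two, for example
       pick_proto(len, env_size("FI_OFI_RXM_BUFFER_SIZE", 16384),
       env_size("FI_OFI_RXM_SAR_LIMIT", 131072)), where 16384 and 131072
       are assumed readings of the documented defaults.  Raising the SAR
       limit in this model moves more mid-sized messages from rendezvous to
       SAR, which is the trade-off the limit exposes.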

Tuning

   Bandwidth
       To optimize for bandwidth, use higher values than the defaults for
       FI_OFI_RXM_TX_SIZE, FI_OFI_RXM_RX_SIZE, FI_OFI_RXM_MSG_TX_SIZE and
       FI_OFI_RXM_MSG_RX_SIZE, subject to the memory limits of the system
       and the TX and RX sizes supported by the MSG provider.

       FI_OFI_RXM_SAR_LIMIT is another knob that can be experimented with
       to optimize for bandwidth.

   Memory
       To conserve memory, ensure FI_UNIVERSE_SIZE is set to what is
       required.  Similarly, check that the FI_OFI_RXM_TX_SIZE,
       FI_OFI_RXM_RX_SIZE, FI_OFI_RXM_MSG_TX_SIZE and
       FI_OFI_RXM_MSG_RX_SIZE environment variables are set to only the
       required values.

NOTES

       The data transfer API may return -FI_EAGAIN during on-demand
       connection setup of the core provider FI_EP_MSG endpoints.  See
       fi_msg(3) for a detailed description of handling FI_EAGAIN.

Troubleshooting / Known issues

       If an RxM endpoint is expected to communicate with more peers than
       the default value of FI_UNIVERSE_SIZE (256), CQ overruns can happen.
       To avoid this, set a higher value for FI_UNIVERSE_SIZE.  A CQ overrun
       can make a MSG endpoint unusable.

       At higher rank counts, there may be connection errors due to a node
       running out of memory.  The workaround is to use shared receive
       contexts for the MSG provider (FI_OFI_RXM_USE_SRX=1), or to reduce
       the eager message size (FI_OFI_RXM_BUFFER_SIZE) and the MSG provider
       TX/RX queue sizes (FI_OFI_RXM_MSG_TX_SIZE / FI_OFI_RXM_MSG_RX_SIZE).

SEE ALSO

       fabric(7), fi_provider(7), fi_getinfo(3)

AUTHORS

       OpenFabrics.



Libfabric Programmer's Manual     2020-04-14                         fi_rxm(7)