RSOCKET(7) Librdmacm Programmer's Manual RSOCKET(7)

NAME
rsocket - RDMA socket API

SYNOPSIS
#include <rdma/rsocket.h>

DESCRIPTION
RDMA socket API and protocol

NOTES
Rsockets is a protocol over RDMA that supports a socket-level API for
applications. Rsocket APIs are intended to match the behavior of the
corresponding socket calls, except where noted. Rsocket functions match
the names and function signatures of the socket calls, except that every
function name is prefixed with an 'r'.

The following functions are defined:

rsocket

rbind, rlisten, raccept, rconnect

rshutdown, rclose

rrecv, rrecvfrom, rrecvmsg, rread, rreadv

rsend, rsendto, rsendmsg, rwrite, rwritev

rpoll, rselect

rgetpeername, rgetsockname

rsetsockopt, rgetsockopt, rfcntl

Functions take the same parameters as those used for sockets. The
following capabilities and flags are supported at this time:

PF_INET, PF_INET6, SOCK_STREAM, SOCK_DGRAM

SOL_SOCKET - SO_ERROR, SO_KEEPALIVE (flag supported, but ignored),
SO_LINGER, SO_OOBINLINE, SO_RCVBUF, SO_REUSEADDR, SO_SNDBUF

IPPROTO_TCP - TCP_NODELAY, TCP_MAXSEG

IPPROTO_IPV6 - IPV6_V6ONLY

MSG_DONTWAIT, MSG_PEEK, O_NONBLOCK

Rsockets provides extensions beyond the normal socket routines that
allow data to be placed directly into an application's buffer. This is
also known as zero-copy support, since data is sent and received
directly, bypassing copies into network-controlled buffers. The
following calls and options support direct data placement.

riomap, riounmap, riowrite

off_t riomap(int socket, void *buf, size_t len, int prot, int flags,
off_t offset)

Riomap registers an application buffer with the RDMA hardware
associated with an rsocket. The buffer is registered either for
local-only access (PROT_NONE) or for remote write access
(PROT_WRITE). When registered for remote access, the buffer is
mapped to a given offset. The offset is either provided by the
user or, if the user passes -1 for the offset, selected by rsockets.
The remote peer may access an iomapped buffer directly by
specifying the correct offset. The mapping is not guaranteed to
be available until after the remote peer receives a data transfer
initiated after riomap has completed.

In order to enable the use of remote IO mapping calls on an rsocket, an
application must set the number of IO mappings that are available to
the remote peer. This may be done using the RDMA_IOMAPSIZE option of
rsetsockopt. By default, an rsocket does not support remote IO mappings.

riounmap

int riounmap(int socket, void *buf, size_t len)

Riounmap removes the mapping between a buffer and an rsocket.

riowrite

size_t riowrite(int socket, const void *buf, size_t count, off_t offset,
int flags)

Riowrite allows an application to transfer data over an rsocket
directly into a remotely iomapped buffer. The remote buffer is
specified through an offset parameter, which corresponds to a
remote iomapped buffer. From the sender's perspective, riowrite
behaves similarly to rwrite. From the receiver's view, riowrite
transfers are silently redirected into a predetermined data
buffer. Data is received automatically, and the receiver is not
informed of the transfer. However, riowrite data is still
considered part of the data stream, so riowrite data will be
written before any subsequent transfer is received. A message
sent immediately after initiating a riowrite may therefore be used
to notify the receiver of the riowrite.

In addition to standard socket options, rsockets supports options
specific to RDMA devices and protocols. These options are accessible
through rsetsockopt using the SOL_RDMA option level.

RDMA_SQSIZE - Integer size of the underlying send queue.

RDMA_RQSIZE - Integer size of the underlying receive queue.

RDMA_INLINE - Integer size of inline data.

RDMA_IOMAPSIZE - Integer number of remote IO mappings supported.

RDMA_ROUTE - struct ibv_path_data of the path record for the connection.

Note that rsocket fds cannot be passed into non-rsocket calls. For
applications that must mix rsocket fds with standard socket fds or
opened files, rpoll and rselect support polling both rsockets and
normal fds.

Existing applications can make use of rsockets through a preload
library. Because rsockets implements an end-to-end protocol, both
sides of a connection must use rsockets. The rdma_cm library provides
such a preload library, librspreload. To reduce the chance of the
preload library intercepting calls without the user's explicit
knowledge, librspreload is installed into the %libdir%/rsocket
subdirectory.

The preload library can be used by setting LD_PRELOAD when running an
application. Note that not all applications will work with rsockets;
support is limited by the socket options the application uses. Support
for fork() is limited, but available. To use rsockets with the preload
library for applications that call fork(), users must set the
environment variable RDMAV_FORK_SAFE=1 on both the client and server
side of the connection. In general, fork is supported for server
applications that accept a connection, then fork off a process to
handle the new connection.

rsockets uses configuration files that give an administrator control
over the default settings used by rsockets. Use files under
%sysconfig%/rdma/rsocket as shown:

mem_default - default size of receive buffer(s)

wmem_default - default size of send buffer(s)

sqsize_default - default size of send queue

rqsize_default - default size of receive queue

inline_default - default size of inline data

iomap_size - default size of remote iomapping table

polling_time - default number of microseconds to poll for data before
waiting

All configuration files should contain a single integer value. Values
may be set by issuing a command similar to the following example.

echo 1000000 > /etc/rdma/rsocket/mem_default

If configuration files are not available, rsockets uses internal
defaults. Applications can override default values programmatically
through the rsetsockopt routine.
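
To inspect an administrator's setting, a program can read one of these files directly. The helper below is hypothetical and not part of librdmacm, whose own parsing may differ. It takes the full path because the location behind %sysconfig% depends on how librdmacm was built (the example above uses /etc), and it returns 0 when the file is missing or unparsable, mirroring the fallback to internal defaults.

```c
#include <stdio.h>

/* Hypothetical helper: read the single integer stored in an rsockets
 * configuration file such as .../rdma/rsocket/mem_default.
 * Returns 1 and stores the value on success; returns 0 and leaves
 * *value untouched if the file is absent or does not parse. */
int read_rsocket_default(const char *path, long *value)
{
	FILE *f = fopen(path, "r");
	long v;
	int ok;

	if (f == NULL)
		return 0;           /* caller keeps its built-in default */

	ok = (fscanf(f, "%ld", &v) == 1);
	fclose(f);

	if (ok)
		*value = v;
	return ok;
}
```

For instance, after the echo command above, read_rsocket_default("/etc/rdma/rsocket/mem_default", &v) would store 1000000 in v.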

SEE ALSO
rdma_cm(7)

librdmacm 2013-01-21 RSOCKET(7)