LIBRPMEM(7)              PMDK Programmer's Manual              LIBRPMEM(7)

NAME
librpmem - remote persistent memory support library (EXPERIMENTAL)

SYNOPSIS
    #include <librpmem.h>
    cc ... -lrpmem

Library API versioning:
    const char *rpmem_check_version(
        unsigned major_required,
        unsigned minor_required);

Error handling:
    const char *rpmem_errormsg(void);

Other library functions:
A description of other librpmem functions can be found on the following
manual pages:

· rpmem_create(3), rpmem_persist(3)

DESCRIPTION
librpmem provides low-level support for remote access to persistent
memory (pmem) utilizing RDMA-capable RNICs. The library can be used to
remotely replicate a memory region over the RDMA protocol. It utilizes
an appropriate persistency mechanism based on the remote node's platform
capabilities. librpmem utilizes the ssh(1) client to authenticate a user
on the remote node, and for encryption of the connection's out-of-band
configuration data. See SSH, below, for details.

The maximum replicated memory region size cannot be bigger than the
maximum locked-in-memory address space limit. See memlock in
limits.conf(5) for more details.

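The locked-memory limit mentioned above can be inspected from the
application before a pool is created. The following is a minimal sketch,
assuming a POSIX system that provides RLIMIT_MEMLOCK; the helper name
fits_memlock_limit is hypothetical and is not part of librpmem. How
librpmem and the RDMA stack account for locked memory is not described on
this page, so treat this only as a pre-flight check.

```c
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper, not a librpmem function.
 * Returns 1 if a region of pool_size bytes fits within the soft
 * RLIMIT_MEMLOCK limit of the calling process, 0 if it does not,
 * and -1 if the limit could not be queried.
 */
int
fits_memlock_limit(size_t pool_size)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;

    /* an unlimited soft limit accepts any size */
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;

    return (rlim_t)pool_size <= rl.rlim_cur ? 1 : 0;
}
```

An application could call fits_memlock_limit() with its intended pool
size and report a configuration hint when the result is 0.
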
This library is for applications that use remote persistent memory
directly, without the help of any library-supplied transactions or memory
allocation. Higher-level libraries that build on libpmem(7) are available
and are recommended for most applications, see:

· libpmemobj(7), a general use persistent memory API, providing memory
  allocation and transactional operations on variable-sized objects.

TARGET NODE ADDRESS FORMAT
    [<user>@]<hostname>[:<port>]

The target node address is described by the hostname which the client
connects to, with an optional user name. The user must be authorized to
authenticate to the remote machine without being prompted for a password
or passphrase. The optional port number is used to establish the SSH
connection. The default port number is 22.

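To make the address format concrete, the sketch below splits a target
string into its user, host and port parts and applies the default port
of 22. It is an illustration only: librpmem parses the target itself,
and struct target_addr and parse_target are hypothetical names that do
not exist in the library. IPv6 literal addresses are not handled.

```c
#include <stdlib.h>
#include <string.h>

#define DEFAULT_SSH_PORT 22

/* Hypothetical holder for the parsed parts; not a librpmem type. */
struct target_addr {
    char user[64];      /* empty string when no user was given */
    char host[256];
    unsigned port;      /* DEFAULT_SSH_PORT when no port was given */
};

/*
 * Split "[<user>@]<hostname>[:<port>]" into its parts.
 * Returns 0 on success, -1 if a part is empty, too long or malformed.
 */
int
parse_target(const char *target, struct target_addr *out)
{
    const char *at = strchr(target, '@');
    const char *host = at ? at + 1 : target;
    const char *colon = strrchr(host, ':');
    size_t user_len = at ? (size_t)(at - target) : 0;
    size_t host_len = colon ? (size_t)(colon - host) : strlen(host);

    if ((at && user_len == 0) || user_len >= sizeof(out->user))
        return -1;
    if (host_len == 0 || host_len >= sizeof(out->host))
        return -1;

    memcpy(out->user, target, user_len);
    out->user[user_len] = '\0';
    memcpy(out->host, host, host_len);
    out->host[host_len] = '\0';
    out->port = DEFAULT_SSH_PORT;

    if (colon) {
        char *end;
        unsigned long port = strtoul(colon + 1, &end, 10);

        /* reject an empty, non-numeric or out-of-range port */
        if (end == colon + 1 || *end != '\0' || port == 0 || port > 65535)
            return -1;
        out->port = (unsigned)port;
    }

    return 0;
}
```

For example, "user@node1:2222" yields user "user", host "node1" and port
2222, while "localhost" yields an empty user and port 22.
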
REMOTE POOL ATTRIBUTES
The rpmem_pool_attr structure describes a remote pool and is stored in
the remote pool's metadata. The caller must pass this structure to the
rpmem_create(3) function when creating a pool on a remote node. When a
pool is opened using the rpmem_open(3) function, the appropriate fields
are read from the pool's metadata and returned to the caller.

    #define RPMEM_POOL_HDR_SIG_LEN    8
    #define RPMEM_POOL_HDR_UUID_LEN   16
    #define RPMEM_POOL_USER_FLAGS_LEN 16

    struct rpmem_pool_attr {
        char signature[RPMEM_POOL_HDR_SIG_LEN];
        uint32_t major;
        uint32_t compat_features;
        uint32_t incompat_features;
        uint32_t ro_compat_features;
        unsigned char poolset_uuid[RPMEM_POOL_HDR_UUID_LEN];
        unsigned char uuid[RPMEM_POOL_HDR_UUID_LEN];
        unsigned char next_uuid[RPMEM_POOL_HDR_UUID_LEN];
        unsigned char prev_uuid[RPMEM_POOL_HDR_UUID_LEN];
        unsigned char user_flags[RPMEM_POOL_USER_FLAGS_LEN];
    };

The signature field is an 8-byte field which describes the pool's
on-media format.

The major field is the major version number of the pool's on-media
format.

The compat_features field is a mask describing compatibility of the
optional features of the pool's on-media format.

The incompat_features field is a mask describing compatibility of the
required features of the pool's on-media format.

The ro_compat_features field is a mask describing compatibility of the
features of the pool's on-media format. If these features are not
available, the pool shall be opened in read-only mode.

The poolset_uuid field is a UUID of the pool which the remote pool is
associated with.

The uuid field is a UUID of the first part of the remote pool. This field
can be used to connect the remote pool with other pools in a list.

The next_uuid and prev_uuid fields are UUIDs of the next and previous
replicas, respectively. These fields can be used to connect the remote
pool with other pools in a list.

The user_flags field is a 16-byte field of user-defined flags.

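As an illustration of the fixed-size fields described above, the sketch
below zeroes an attribute structure and then sets the signature and major
version. To stay self-contained it uses a local mirror of the layout
shown earlier; struct pool_attr_sketch and fill_pool_attr are
hypothetical names, and the signature value a caller chooses is an
application decision that this page does not prescribe.

```c
#include <stdint.h>
#include <string.h>

#define SIG_LEN        8    /* mirrors RPMEM_POOL_HDR_SIG_LEN */
#define UUID_LEN       16   /* mirrors RPMEM_POOL_HDR_UUID_LEN */
#define USER_FLAGS_LEN 16   /* mirrors RPMEM_POOL_USER_FLAGS_LEN */

/* Local mirror of struct rpmem_pool_attr, for illustration only. */
struct pool_attr_sketch {
    char signature[SIG_LEN];
    uint32_t major;
    uint32_t compat_features;
    uint32_t incompat_features;
    uint32_t ro_compat_features;
    unsigned char poolset_uuid[UUID_LEN];
    unsigned char uuid[UUID_LEN];
    unsigned char next_uuid[UUID_LEN];
    unsigned char prev_uuid[UUID_LEN];
    unsigned char user_flags[USER_FLAGS_LEN];
};

/*
 * Zero the structure, then set the signature and major version.
 * The signature is a fixed 8-byte field, so it is copied without a
 * terminating NUL; shorter strings are zero-padded, longer ones are
 * truncated to 8 bytes.
 */
void
fill_pool_attr(struct pool_attr_sketch *attr, const char *sig,
    uint32_t major)
{
    size_t len = strlen(sig);

    memset(attr, 0, sizeof(*attr));
    if (len > SIG_LEN)
        len = SIG_LEN;
    memcpy(attr->signature, sig, len);
    attr->major = major;
}
```

Because the signature is not NUL-terminated, it should be compared with
memcmp(3) over the full 8 bytes rather than with strcmp(3).
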
SSH
librpmem utilizes the ssh(1) client to log in and execute the rpmemd(1)
process on the remote node. By default, ssh(1) is executed with the -4
option, which forces using IPv4 addressing.

For debugging purposes, both the ssh client and the commands executed on
the remote node may be overridden by setting the RPMEM_SSH and RPMEM_CMD
environment variables, respectively. See ENVIRONMENT for details.

FORK
The ssh(1) client is executed by rpmem_open(3) and rpmem_create(3) after
forking a child process using fork(2). The application must take this
into account when using wait(2) and waitpid(2), which may return the PID
of the ssh(1) process executed by librpmem.

If fork(2) support is not enabled in libibverbs, rpmem_open(3) and
rpmem_create(3) will fail. By default, fabric(7) initializes libibverbs
with fork(2) support by calling the ibv_fork_init(3) function. See
fi_verbs(7) for more details.

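One way for an application to account for the extra child process is to
wait for its own children by PID instead of waiting for any child. The
sketch below shows that pattern with plain POSIX calls; the function
name run_and_wait_own_child is hypothetical, and the child here simply
exits as a stand-in for real work.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits with exit_code and wait for that child only.
 * Returns the child's exit status, or -1 on failure.
 */
int
run_and_wait_own_child(int exit_code)
{
    int status;
    pid_t ret;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);   /* child: stands in for real work */

    /*
     * Wait for this PID only. wait(2), or waitpid(2) with -1, could
     * instead return the PID of some other child of the process.
     */
    do {
        ret = waitpid(pid, &status, 0);
    } while (ret < 0 && errno == EINTR);

    if (ret != pid || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status);
}
```

Code that must use wait(2) can apply the same idea by comparing the
returned PID against the set of children it started itself.
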
CAVEATS
librpmem relies on the library destructor being called from the main
thread. For this reason, all functions that might trigger destruction
(e.g. dlclose(3)) should be called in the main thread. Otherwise some of
the resources associated with that thread might not be cleaned up
properly.

librpmem registers a pool as a single memory region. Chelsio T4 and T5
hardware cannot handle a memory region greater than or equal to 8GB due
to a hardware bug. The pool_size value passed to rpmem_create(3) and
rpmem_open(3) when using this hardware must therefore be less than 8GB.

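The size restriction above can be checked before a pool is created. The
sketch below assumes that "8GB" means 8 * 2^30 bytes, which this page
does not state explicitly; the macro and function names are
hypothetical. Detecting whether the affected hardware is actually in use
is outside the scope of this sketch.

```c
#include <stdint.h>

/* Assumption: "8GB" is read as 8 GiB (8 * 2^30 bytes). */
#define CHELSIO_T4_T5_MR_LIMIT (UINT64_C(8) << 30)

/*
 * Returns nonzero if pool_size is strictly below the limit and so can
 * be registered as a single memory region on the affected hardware.
 */
int
pool_size_ok_for_chelsio(uint64_t pool_size)
{
    return pool_size < CHELSIO_T4_T5_MR_LIMIT;
}
```
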
LIBRARY API VERSIONING
This section describes how the library API is versioned, allowing
applications to work with an evolving API.

The rpmem_check_version() function is used to see if the installed
librpmem supports the version of the library API required by an
application. The easiest way to do this is for the application to supply
the compile-time version information, supplied by defines in
<librpmem.h>, like this:

    reason = rpmem_check_version(RPMEM_MAJOR_VERSION,
                                 RPMEM_MINOR_VERSION);
    if (reason != NULL) {
        /* version check failed, reason string tells you why */
    }

Any mismatch in the major version number is considered a failure, but a
library with a newer minor version number will pass this check since
increasing minor versions imply backwards compatibility.

An application can also check specifically for the existence of an
interface by checking for the version where that interface was
introduced. These versions are documented in this man page as follows:
unless otherwise specified, all interfaces described here are available
in version 1.0 of the library. Interfaces added after version 1.0 will
contain the text "introduced in version x.y" in the section of this
manual describing the feature.

When the version check performed by rpmem_check_version() is successful,
the return value is NULL. Otherwise the return value is a static string
describing the reason for failing the version check. The string returned
by rpmem_check_version() must not be modified or freed.

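The compatibility rule described above can be written down as a small
pure function. This is a sketch of the rule only, not the library's
implementation: check_version_rule is a hypothetical name, the reason
strings are invented for illustration, and the treatment of an older
library minor version as a failure is inferred from the statement that a
newer minor version passes.

```c
#include <stddef.h>

/*
 * Returns NULL when a library at lib_major.lib_minor satisfies a request
 * for req_major.req_minor, otherwise a static string giving the reason.
 */
const char *
check_version_rule(unsigned lib_major, unsigned lib_minor,
    unsigned req_major, unsigned req_minor)
{
    /* any difference in the major version is a failure */
    if (lib_major != req_major)
        return "major version mismatch";

    /* a newer or equal library minor version is compatible */
    if (lib_minor < req_minor)
        return "library minor version is older than required";

    return NULL;
}
```

Under this rule a 1.2 library satisfies an application built against
1.0, while a 2.0 library does not.
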
ENVIRONMENT
librpmem can change its default behavior based on the following
environment variables. These are largely intended for testing and are not
normally required.

· RPMEM_SSH=ssh_client

  Setting this environment variable overrides the default ssh(1) client
  command name.

· RPMEM_CMD=cmd

  Setting this environment variable overrides the default command
  executed on the remote node using either ssh(1) or the alternative
  remote shell command specified by RPMEM_SSH.

  RPMEM_CMD can contain multiple commands separated by a vertical bar
  (|). Each consecutive command is executed on the remote node in the
  order read from a pool set file. This environment variable is read when
  the library is initialized, so RPMEM_CMD must be set prior to
  application launch (or prior to dlopen(3) if librpmem is being
  dynamically loaded).

· RPMEM_ENABLE_SOCKETS=0|1

  Setting this variable to 1 enables use of the fi_sockets(7) provider
  for the in-band RDMA connection. The sockets provider does not support
  IPv6. IPv6 must be disabled system-wide if RPMEM_ENABLE_SOCKETS is 1,
  the target is localhost (or any other loopback interface address), and
  the SSH_CONNECTION variable (see ssh(1) for more details) contains an
  IPv6 address after ssh to the loopback interface. By default the
  sockets provider is disabled.

· RPMEM_ENABLE_VERBS=0|1

  Setting this variable to 0 disables use of the fi_verbs(7) provider for
  the in-band RDMA connection. The verbs provider is enabled by default.

· RPMEM_MAX_NLANES=num

  Limit the maximum number of lanes to num. See LANES, in
  rpmem_create(3), for details.

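To illustrate the RPMEM_CMD format, the sketch below splits a value on
the vertical bar into its individual commands. It only demonstrates the
separator; split_cmds is a hypothetical helper, and how librpmem itself
treats whitespace or empty entries is not described on this page, so
neither is trimmed or rejected here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split buf in place on '|' and store pointers to the pieces in out[].
 * Returns the number of pieces found, or 0 if there are more than max.
 */
size_t
split_cmds(char *buf, char *out[], size_t max)
{
    size_t n = 0;
    char *cur = buf;

    while (cur != NULL) {
        char *bar = strchr(cur, '|');

        if (n == max)
            return 0;
        if (bar != NULL)
            *bar++ = '\0';  /* terminate this piece, step past '|' */
        out[n++] = cur;
        cur = bar;
    }

    return n;
}
```

With a value such as "rpmemd|/opt/rpmemd -d" (both commands are
placeholders), the function returns 2 and the pieces appear in the same
order in which they were written.
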
DEBUGGING AND ERROR HANDLING
If an error is detected during the call to a librpmem function, the
application may retrieve an error message describing the reason for the
failure from rpmem_errormsg(). This function returns a pointer to a
static buffer containing the last error message logged for the current
thread. If errno was set, the error message may include a description of
the corresponding error code as returned by strerror(3). The error
message buffer is thread-local; errors encountered in one thread do not
affect its value in other threads. The buffer is never cleared by any
library function; its content is significant only when the return value
of the immediately preceding call to a librpmem function indicated an
error, or if errno was set. The application must not modify or free the
error message string, but it may be modified by subsequent calls to other
library functions.

Two versions of librpmem are typically available on a development system.
The normal version, accessed when a program is linked using the -lrpmem
option, is optimized for performance. That version skips checks that
impact performance and never logs any trace information or performs any
run-time assertions.

A second version of librpmem, accessed when a program uses the libraries
under /usr/lib/pmdk_debug, contains run-time assertions and trace points.
The typical way to access the debug version is to set the environment
variable LD_LIBRARY_PATH to /usr/lib/pmdk_debug or /usr/lib64/pmdk_debug,
as appropriate. Debugging output is controlled using the following
environment variables. These variables have no effect on the non-debug
version of the library.

· RPMEM_LOG_LEVEL

  The value of RPMEM_LOG_LEVEL enables trace points in the debug version
  of the library, as follows:

  · 0 - This is the default level when RPMEM_LOG_LEVEL is not set. No log
    messages are emitted at this level.

  · 1 - Additional details on any errors detected are logged (in addition
    to returning the errno-based errors as usual). The same information
    may be retrieved using rpmem_errormsg().

  · 2 - A trace of basic operations is logged.

  · 3 - Enables a very verbose amount of function call tracing in the
    library.

  · 4 - Enables voluminous and fairly obscure tracing information that is
    likely only useful to the librpmem developers.

  Unless RPMEM_LOG_FILE is set, debugging output is written to stderr.

· RPMEM_LOG_FILE

  Specifies the name of a file where all logging information should be
  written. If the last character in the name is "-", the PID of the
  current process will be appended to the file name when the log file is
  created. If RPMEM_LOG_FILE is not set, logging output is written to
  stderr.

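The RPMEM_LOG_FILE naming rule can be sketched as follows, which may help
when a script needs to predict the name of the file to collect. The
helper log_file_name is hypothetical and is not part of librpmem, and
the assumption that the PID is appended in plain decimal form is not
confirmed by this page.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write the effective log file name for `name` into out.
 * A trailing '-' causes the PID of the calling process to be appended
 * (assumed here to be in decimal). Returns 0 on success, -1 if name is
 * empty or out is too small.
 */
int
log_file_name(const char *name, char *out, size_t out_size)
{
    size_t len = strlen(name);
    int n;

    if (len == 0)
        return -1;

    if (name[len - 1] == '-')
        n = snprintf(out, out_size, "%s%ld", name, (long)getpid());
    else
        n = snprintf(out, out_size, "%s", name);

    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Using a name that ends in "-" gives each process its own log file, which
keeps the output of concurrently running processes separate.
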
EXAMPLE
The following example uses librpmem to create a remote pool on a given
target node, identified by a given pool set name. The associated local
memory pool is zeroed and the data is made persistent on the remote node.
Upon success the remote pool is closed.

    #include <stdio.h>
    #include <string.h>

    #include <librpmem.h>

    #define POOL_SIZE (32 * 1024 * 1024)
    #define NLANES 4
    unsigned char pool[POOL_SIZE];

    int
    main(int argc, char *argv[])
    {
        int ret;
        unsigned nlanes = NLANES;

        /* fill pool_attributes */
        struct rpmem_pool_attr pool_attr;
        memset(&pool_attr, 0, sizeof(pool_attr));

        /* create a remote pool */
        RPMEMpool *rpp = rpmem_create("localhost", "pool.set",
            pool, POOL_SIZE, &nlanes, &pool_attr);
        if (!rpp) {
            fprintf(stderr, "rpmem_create: %s\n", rpmem_errormsg());
            return 1;
        }

        /* store data on local pool */
        memset(pool, 0, POOL_SIZE);

        /* make local data persistent on remote node */
        ret = rpmem_persist(rpp, 0, POOL_SIZE, 0, 0);
        if (ret) {
            fprintf(stderr, "rpmem_persist: %s\n", rpmem_errormsg());
            return 1;
        }

        /* close the remote pool */
        ret = rpmem_close(rpp);
        if (ret) {
            fprintf(stderr, "rpmem_close: %s\n", rpmem_errormsg());
            return 1;
        }

        return 0;
    }

NOTE
The librpmem API is experimental and may be subject to change in the
future. However, using the remote replication in libpmemobj(7) is safe
and backward compatibility will be preserved.

ACKNOWLEDGEMENTS
librpmem builds on the persistent memory programming model recommended
by the SNIA NVM Programming Technical Work Group:
<http://snia.org/nvmp>

SEE ALSO
rpmemd(1), ssh(1), fork(2), dlclose(3), dlopen(3), ibv_fork_init(3),
rpmem_create(3), rpmem_open(3), rpmem_persist(3), strerror(3),
limits.conf(5), fabric(7), fi_sockets(7), fi_verbs(7), libpmem(7),
libpmemblk(7), libpmemlog(7), libpmemobj(7) and <http://pmem.io>

PMDK - rpmem API version 1.2       2019-03-01                 LIBRPMEM(7)