LIBRPMEM(7) PMDK Programmer's Manual LIBRPMEM(7)

NAME
librpmem - remote persistent memory support library (EXPERIMENTAL)

SYNOPSIS
    #include <librpmem.h>
    cc ... -lrpmem

Library API versioning:
    const char *rpmem_check_version(
            unsigned major_required,
            unsigned minor_required);

Error handling:
    const char *rpmem_errormsg(void);

Other library functions:
A description of other librpmem functions can be found on the following
manual pages:

• rpmem_create(3), rpmem_persist(3)

DESCRIPTION
librpmem provides low-level support for remote access to persistent
memory (pmem) utilizing RDMA-capable RNICs. The library can be used to
remotely replicate a memory region over the RDMA protocol. It utilizes
an appropriate persistency mechanism based on the remote node’s platform
capabilities. librpmem utilizes the ssh(1) client to authenticate a user
on the remote node, and for encryption of the connection’s out-of-band
configuration data. See SSH, below, for details.

The maximum replicated memory region size cannot exceed the maximum
locked-in-memory address space limit. See memlock in limits.conf(5) for
more details.
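
The memlock constraint above can be checked before a pool is created. The following is a minimal sketch, not part of librpmem: the helper names fits_memlock() and pool_fits_memlock() are hypothetical, and the check only covers the RLIMIT_MEMLOCK soft limit of the calling process. The limit configured on the remote node is not visible from here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: true if pool_size fits within the given
 * soft memlock limit (RLIM_INFINITY means "no limit").
 */
static bool
fits_memlock(size_t pool_size, rlim_t soft_limit)
{
	if (soft_limit == RLIM_INFINITY)
		return true;
	return (rlim_t)pool_size <= soft_limit;
}

/*
 * Hypothetical helper: compare pool_size with the RLIMIT_MEMLOCK soft
 * limit of the calling process.
 * Returns 1 if it fits, 0 if it does not, -1 if getrlimit(2) failed.
 */
static int
pool_fits_memlock(size_t pool_size)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	return fits_memlock(pool_size, rl.rlim_cur) ? 1 : 0;
}
```

An application might call pool_fits_memlock(pool_size) before rpmem_create(3) and report a clear message when the result is 0, rather than relying on the failure of the memory registration.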

This library is for applications that use remote persistent memory
directly, without the help of any library-supplied transactions or
memory allocation. Higher-level libraries that build on libpmem(7) are
available and are recommended for most applications, see:

• libpmemobj(7), a general use persistent memory API, providing memory
  allocation and transactional operations on variable-sized objects.

TARGET NODE ADDRESS FORMAT
    [<user>@]<hostname>[:<port>]

The target node address is described by the hostname to which the client
connects, with an optional user name. The user must be able to
authenticate to the remote machine without being prompted for a password
or passphrase. The optional port number is used to establish the SSH
connection. The default port number is 22.
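
To make the address format concrete, here is a minimal sketch of a parser for it. This is not the parser used by librpmem: struct target_addr, parse_target() and the buffer sizes are hypothetical, and host names that themselves contain a colon (such as IPv6 literals) are not handled.

```c
#include <stdlib.h>
#include <string.h>

#define TARGET_DEFAULT_PORT 22	/* default SSH port, as stated above */

struct target_addr {		/* hypothetical type for this sketch */
	char user[64];		/* empty string when no user was given */
	char host[256];
	int port;
};

/*
 * Split "[<user>@]<hostname>[:<port>]" into its parts.
 * Returns 0 on success, -1 on malformed input.
 */
static int
parse_target(const char *target, struct target_addr *out)
{
	const char *at = strchr(target, '@');
	const char *host = at ? at + 1 : target;
	const char *colon = strrchr(host, ':');
	size_t user_len = at ? (size_t)(at - target) : 0;
	size_t host_len = colon ? (size_t)(colon - host) : strlen(host);

	/* "@host" and an empty host name are rejected */
	if ((at && user_len == 0) || host_len == 0)
		return -1;
	if (user_len >= sizeof(out->user) || host_len >= sizeof(out->host))
		return -1;

	memcpy(out->user, target, user_len);
	out->user[user_len] = '\0';
	memcpy(out->host, host, host_len);
	out->host[host_len] = '\0';

	out->port = TARGET_DEFAULT_PORT;
	if (colon) {
		char *end;
		long port = strtol(colon + 1, &end, 10);

		if (end == colon + 1 || *end != '\0' ||
		    port < 1 || port > 65535)
			return -1;
		out->port = (int)port;
	}
	return 0;
}
```

With this sketch, "user@node1:2222" yields user "user", host "node1" and port 2222, while a bare "node1" yields an empty user and the default port.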

REMOTE POOL ATTRIBUTES
The rpmem_pool_attr structure describes a remote pool and is stored in
the remote pool’s metadata. The caller must pass this structure to the
rpmem_create(3) function when creating a pool on the remote node. When a
pool is opened with the rpmem_open(3) function, the appropriate fields
are read from the pool’s metadata and returned to the caller.

    #define RPMEM_POOL_HDR_SIG_LEN    8
    #define RPMEM_POOL_HDR_UUID_LEN   16
    #define RPMEM_POOL_USER_FLAGS_LEN 16

    struct rpmem_pool_attr {
        char signature[RPMEM_POOL_HDR_SIG_LEN];
        uint32_t major;
        uint32_t compat_features;
        uint32_t incompat_features;
        uint32_t ro_compat_features;
        unsigned char poolset_uuid[RPMEM_POOL_HDR_UUID_LEN];
        unsigned char uuid[RPMEM_POOL_HDR_UUID_LEN];
        unsigned char next_uuid[RPMEM_POOL_HDR_UUID_LEN];
        unsigned char prev_uuid[RPMEM_POOL_HDR_UUID_LEN];
        unsigned char user_flags[RPMEM_POOL_USER_FLAGS_LEN];
    };

The signature field is an 8-byte field which describes the pool’s
on-media format.

The major field is the major version number of the pool’s on-media
format.

The compat_features field is a mask describing the compatibility of the
optional features of the pool’s on-media format.

The incompat_features field is a mask describing the compatibility of
the required features of the pool’s on-media format.

The ro_compat_features field is a mask describing the compatibility of
the features of the pool’s on-media format. If these features are not
available, the pool shall be opened in read-only mode.

The poolset_uuid field is a UUID of the pool with which the remote pool
is associated.

The uuid field is a UUID of the first part of the remote pool. This
field can be used to connect the remote pool with other pools in a list.

The next_uuid and prev_uuid fields are the UUIDs of the next and
previous replicas, respectively. These fields can be used to connect the
remote pool with other pools in a list.

The user_flags field holds 16 bytes of user-defined flags.
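
The three feature masks can be read as a small decision procedure. The sketch below shows one conventional interpretation and is an assumption, not librpmem's documented behavior: unknown bits in incompat_features cause a refusal, unknown bits in ro_compat_features force read-only mode (as described above), and compat_features is ignored. The enum and the function name are hypothetical.

```c
#include <stdint.h>

/* hypothetical result type for this sketch */
enum open_mode { OPEN_FAIL, OPEN_READ_ONLY, OPEN_READ_WRITE };

/*
 * Decide how a pool could be opened, given the masks stored in the pool
 * and the masks of features this (hypothetical) reader understands.
 */
static enum open_mode
check_features(uint32_t incompat, uint32_t ro_compat,
		uint32_t known_incompat, uint32_t known_ro_compat)
{
	/* assumption: an unknown required feature means "do not open" */
	if (incompat & ~known_incompat)
		return OPEN_FAIL;

	/* unknown read-only-compatible feature: open read-only */
	if (ro_compat & ~known_ro_compat)
		return OPEN_READ_ONLY;

	return OPEN_READ_WRITE;
}
```

The order of the two checks matters: a pool that carries both an unknown required feature and an unknown read-only-compatible feature is refused rather than opened read-only.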

SSH
librpmem utilizes the ssh(1) client to log in and execute the rpmemd(1)
process on the remote node. By default, ssh(1) is executed with the -4
option, which forces the use of IPv4 addressing.

For debugging purposes, both the ssh client and the commands executed on
the remote node may be overridden by setting the RPMEM_SSH and RPMEM_CMD
environment variables, respectively. See ENVIRONMENT for details.

FORK
The ssh(1) client is executed by rpmem_open(3) and rpmem_create(3) after
forking a child process using fork(2). The application must take this
into account when using wait(2) and waitpid(2), which may return the PID
of the ssh(1) process executed by librpmem.

If fork(2) support is not enabled in libibverbs, rpmem_open(3) and
rpmem_create(3) will fail. By default, fabric(7) initializes libibverbs
with fork(2) support by calling the ibv_fork_init(3) function. See
fi_verbs(7) for more details.
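
One way for an application to account for the extra ssh(1) child is to wait only for the PIDs it created itself, instead of waiting for "any child". The following is a minimal sketch of that pattern using plain POSIX calls. The helper run_and_reap() is hypothetical and does not involve librpmem.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start a trivial child that exits with `code`,
 * then reap exactly that child.
 * Returns the child's exit status, or -1 on error.
 */
static int
run_and_reap(int code)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);	/* child: exit immediately */

	int status;

	/* wait for this PID only, never for "any child" (-1) */
	if (waitpid(pid, &status, 0) != pid)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because waitpid(2) is given an explicit PID, a child process started by another library in the same process is never reaped by this call.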

CAVEATS
librpmem relies on the library destructor being called from the main
thread. For this reason, all functions that might trigger destruction
(e.g. dlclose(3)) should be called in the main thread. Otherwise some of
the resources associated with that thread might not be cleaned up
properly.

librpmem registers a pool as a single memory region. Chelsio T4 and T5
hardware cannot handle a memory region greater than or equal to 8GB due
to a hardware bug. Therefore, the pool_size value passed to
rpmem_create(3) and rpmem_open(3) on this hardware must be smaller than
8GB.
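
The size restriction for Chelsio T4 and T5 hardware can be expressed as a simple guard. This is a sketch under stated assumptions: "8GB" is taken to mean 8 GiB (8 * 2^30 bytes), and both the macro and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* assumption: the "8GB" limit above means 8 GiB */
#define CHELSIO_T4_T5_MR_LIMIT (8ULL << 30)

/*
 * Hypothetical helper: true if a pool of this size can be registered as
 * a single memory region on Chelsio T4/T5 hardware.
 */
static bool
pool_size_ok_for_chelsio(uint64_t pool_size)
{
	return pool_size < CHELSIO_T4_T5_MR_LIMIT;
}
```

Note the strict comparison: a pool of exactly 8 GiB is rejected, matching the "greater than or equal to" wording above.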

LIBRARY API VERSIONING
This section describes how the library API is versioned, allowing
applications to work with an evolving API.

The rpmem_check_version() function is used to see if the installed
librpmem supports the version of the library API required by an
application. The easiest way to do this is for the application to supply
the compile-time version information, supplied by defines in
<librpmem.h>, like this:

    reason = rpmem_check_version(RPMEM_MAJOR_VERSION,
                                 RPMEM_MINOR_VERSION);
    if (reason != NULL) {
        /* version check failed, reason string tells you why */
    }

Any mismatch in the major version number is considered a failure, but a
library with a newer minor version number will pass this check since
increasing minor versions imply backwards compatibility.

An application can also check specifically for the existence of an
interface by checking for the version where that interface was
introduced. These versions are documented in this man page as follows:
unless otherwise specified, all interfaces described here are available
in version 1.0 of the library. Interfaces added after version 1.0 will
contain the text introduced in version x.y in the section of this manual
describing the feature.

When the version check performed by rpmem_check_version() is successful,
the return value is NULL. Otherwise the return value is a static string
describing the reason for failing the version check. The string returned
by rpmem_check_version() must not be modified or freed.
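
The compatibility rule described in this section (major versions must match, a newer minor version is acceptable) can be sketched as follows. This is an illustration of the rule only, not the implementation of rpmem_check_version(): the function name and the reason strings are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical model of the version rule: returns NULL when a library
 * with version lib_major.lib_minor satisfies the requirement, otherwise
 * a static string describing the reason.
 */
static const char *
check_version_sketch(unsigned lib_major, unsigned lib_minor,
		unsigned major_required, unsigned minor_required)
{
	/* any major mismatch is a failure */
	if (lib_major != major_required)
		return "major version mismatch";

	/* the library may be newer, but not older, in its minor version */
	if (lib_minor < minor_required)
		return "minor version too old";

	return NULL;
}
```

Under this model, a library at API version 1.3 satisfies an application built against 1.0, but not one built against 1.4 or 2.0.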

ENVIRONMENT
librpmem can change its default behavior based on the following
environment variables. These are largely intended for testing and are
not normally required.

• RPMEM_SSH=ssh_client

Setting this environment variable overrides the default ssh(1) client
command name.

• RPMEM_CMD=cmd

Setting this environment variable overrides the default command executed
on the remote node using either ssh(1) or the alternative remote shell
command specified by RPMEM_SSH.

RPMEM_CMD can contain multiple commands separated by a vertical bar (|).
Each consecutive command is executed on the remote node in the order
read from the pool set file. This environment variable is read when the
library is initialized, so RPMEM_CMD must be set prior to application
launch (or prior to dlopen(3) if librpmem is being dynamically loaded).

• RPMEM_ENABLE_SOCKETS=0|1

Setting this variable to 1 enables the fi_sockets(7) provider for the
in-band RDMA connection. The sockets provider does not support IPv6.
IPv6 must be disabled system-wide if RPMEM_ENABLE_SOCKETS is 1, the
target is localhost (or any other loopback interface address), and the
SSH_CONNECTION variable (see ssh(1) for more details) contains an IPv6
address after connecting over ssh to the loopback interface. By default
the sockets provider is disabled.

• RPMEM_ENABLE_VERBS=0|1

Setting this variable to 0 disables the fi_verbs(7) provider for the
in-band RDMA connection. The verbs provider is enabled by default.

• RPMEM_MAX_NLANES=num

Limit the maximum number of lanes to num. See LANES, in rpmem_create(3),
for details.

• RPMEM_WORK_QUEUE_SIZE=size

Suggest the work queue size. The effective work queue size can be
greater than suggested if librpmem requires it, or smaller if the
underlying hardware does not support the suggested size. The work queue
size affects the performance of communication with the remote node.
rpmem_flush(3) operations can be added to the work queue up to the size
of this queue. When the work queue is full, any subsequent call must
wait until the work queue is drained. Among other things, rpmem_drain(3)
and rpmem_persist(3) also drain the work queue.
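
When librpmem is loaded with dlopen(3), a program can set these variables itself before the library is initialized. The following is a minimal sketch: configure_rpmem_env() is a hypothetical helper, and the values are chosen by the caller. This page states only for RPMEM_CMD that it is read at library initialization; treating the other variables the same way is a conservative assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/*
 * Hypothetical helper: export tuning variables for librpmem.
 * Must run before the library is loaded (e.g. before dlopen(3)).
 * Returns 0 on success, -1 if setenv(3) failed.
 */
static int
configure_rpmem_env(const char *max_nlanes, const char *wq_size)
{
	/* overwrite = 0 keeps any value the user already exported */
	if (setenv("RPMEM_MAX_NLANES", max_nlanes, 0) != 0)
		return -1;
	if (setenv("RPMEM_WORK_QUEUE_SIZE", wq_size, 0) != 0)
		return -1;
	return 0;
}
```

In a program that links with -lrpmem directly, the library is initialized before main() runs, so the variables would instead have to be exported by the shell or by a launcher process.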

DEBUGGING AND ERROR HANDLING
If an error is detected during the call to a librpmem function, the
application may retrieve an error message describing the reason for the
failure from rpmem_errormsg(). This function returns a pointer to a
static buffer containing the last error message logged for the current
thread. If errno was set, the error message may include a description of
the corresponding error code as returned by strerror(3). The error
message buffer is thread-local; errors encountered in one thread do not
affect its value in other threads. The buffer is never cleared by any
library function; its content is significant only when the return value
of the immediately preceding call to a librpmem function indicated an
error, or if errno was set. The application must not modify or free the
error message string, but it may be modified by subsequent calls to
other library functions.
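
The behavior of the error message buffer can be pictured with a small model. This sketch is an illustration only and is not taken from the librpmem sources: it uses a C11 _Thread_local buffer, and the names set_errormsg(), get_errormsg() and ERRMSG_LEN are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

#define ERRMSG_LEN 256	/* hypothetical buffer size */

/* one buffer per thread; zero-initialized, never cleared afterwards */
static _Thread_local char last_errormsg[ERRMSG_LEN];

/* Record a formatted message for the calling thread only. */
static void
set_errormsg(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(last_errormsg, sizeof(last_errormsg), fmt, ap);
	va_end(ap);
}

/* Return the calling thread's last message; callers must not free it. */
static const char *
get_errormsg(void)
{
	return last_errormsg;
}
```

The model shows why an application that wants to keep a message should copy it right after the failing call: the next call that records an error in the same thread overwrites the buffer.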

Two versions of librpmem are typically available on a development
system. The normal version, accessed when a program is linked using the
-lrpmem option, is optimized for performance. That version skips checks
that impact performance and never logs any trace information or performs
any run-time assertions.

A second version of librpmem, accessed when a program uses the libraries
under /usr/lib/pmdk_debug, contains run-time assertions and trace
points. The typical way to access the debug version is to set the
environment variable LD_LIBRARY_PATH to /usr/lib/pmdk_debug or
/usr/lib64/pmdk_debug, as appropriate. Debugging output is controlled
using the following environment variables. These variables have no
effect on the non-debug version of the library.

• RPMEM_LOG_LEVEL

The value of RPMEM_LOG_LEVEL enables trace points in the debug version
of the library, as follows:

• 0 - This is the default level when RPMEM_LOG_LEVEL is not set. No log
  messages are emitted at this level.

• 1 - Additional details on any errors detected are logged (in addition
  to returning the errno-based errors as usual). The same information
  may be retrieved using rpmem_errormsg().

• 2 - A trace of basic operations is logged.

• 3 - Enables a very verbose amount of function call tracing in the
  library.

• 4 - Enables voluminous and fairly obscure tracing information that is
  likely only useful to the librpmem developers.

Unless RPMEM_LOG_FILE is set, debugging output is written to stderr.

• RPMEM_LOG_FILE

Specifies the name of a file where all logging information should be
written. If the last character in the name is “-”, the PID of the
current process will be appended to the file name when the log file is
created. If RPMEM_LOG_FILE is not set, logging output is written to
stderr.
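
The PID suffix rule for RPMEM_LOG_FILE can be sketched as a small helper. This is a model of the rule as described above, not code from librpmem: build_log_name() is a hypothetical function, and the PID is passed in by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: compute the effective log file name.
 * If `name` ends with '-', the PID is appended; otherwise the name is
 * used unchanged. Returns 0 on success, -1 if `out` is too small.
 */
static int
build_log_name(const char *name, pid_t pid, char *out, size_t out_len)
{
	size_t len = strlen(name);
	int n;

	if (len > 0 && name[len - 1] == '-')
		n = snprintf(out, out_len, "%s%ld", name, (long)pid);
	else
		n = snprintf(out, out_len, "%s", name);

	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Called with the result of getpid(2), a name such as "/tmp/rpmem.log-" would give each process its own log file, which helps when several processes using the debug library run at the same time.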

EXAMPLE
The following example uses librpmem to create a remote pool on a given
target node, identified by a given pool set name. The associated local
memory pool is zeroed and the data is made persistent on the remote
node. Upon success the remote pool is closed.

    #include <assert.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #include <librpmem.h>

    #define POOL_SIGNATURE  "MANPAGE"
    #define POOL_SIZE       (32 * 1024 * 1024)
    #define NLANES          4

    #define DATA_OFF        4096
    #define DATA_SIZE       (POOL_SIZE - DATA_OFF)

    static void
    parse_args(int argc, char *argv[], const char **target, const char **poolset)
    {
        if (argc < 3) {
            fprintf(stderr, "usage:\t%s <target> <poolset>\n", argv[0]);
            exit(1);
        }

        *target = argv[1];
        *poolset = argv[2];
    }

    static void *
    alloc_memory(void)
    {
        long pagesize = sysconf(_SC_PAGESIZE);
        if (pagesize < 0) {
            perror("sysconf");
            exit(1);
        }

        /* allocate a page size aligned local memory pool */
        void *mem;
        int ret = posix_memalign(&mem, pagesize, POOL_SIZE);
        if (ret) {
            fprintf(stderr, "posix_memalign: %s\n", strerror(ret));
            exit(1);
        }

        assert(mem != NULL);

        return mem;
    }

    int
    main(int argc, char *argv[])
    {
        const char *target, *poolset;
        parse_args(argc, argv, &target, &poolset);

        unsigned nlanes = NLANES;
        void *pool = alloc_memory();
        int ret;

        /* fill pool_attributes */
        struct rpmem_pool_attr pool_attr;
        memset(&pool_attr, 0, sizeof(pool_attr));
        strncpy(pool_attr.signature, POOL_SIGNATURE, RPMEM_POOL_HDR_SIG_LEN);

        /* create a remote pool */
        RPMEMpool *rpp = rpmem_create(target, poolset, pool, POOL_SIZE,
                &nlanes, &pool_attr);
        if (!rpp) {
            fprintf(stderr, "rpmem_create: %s\n", rpmem_errormsg());
            return 1;
        }

        /* store data on local pool */
        memset(pool, 0, POOL_SIZE);

        /* make local data persistent on remote node */
        ret = rpmem_persist(rpp, DATA_OFF, DATA_SIZE, 0, 0);
        if (ret) {
            fprintf(stderr, "rpmem_persist: %s\n", rpmem_errormsg());
            return 1;
        }

        /* close the remote pool */
        ret = rpmem_close(rpp);
        if (ret) {
            fprintf(stderr, "rpmem_close: %s\n", rpmem_errormsg());
            return 1;
        }

        free(pool);

        return 0;
    }

NOTE
The librpmem API is experimental and may be subject to change in the
future. However, using the remote replication in libpmemobj(7) is safe
and backward compatibility will be preserved.

ACKNOWLEDGEMENTS
librpmem builds on the persistent memory programming model recommended
by the SNIA NVM Programming Technical Work Group:
<https://snia.org/nvmp>

SEE ALSO
rpmemd(1), ssh(1), fork(2), dlclose(3), dlopen(3), ibv_fork_init(3),
rpmem_create(3), rpmem_drain(3), rpmem_flush(3), rpmem_open(3),
rpmem_persist(3), strerror(3), limits.conf(5), fabric(7), fi_sockets(7),
fi_verbs(7), libpmem(7), libpmemblk(7), libpmemlog(7), libpmemobj(7)
and <https://pmem.io>

PMDK - rpmem API version 1.3 2021-01-26 LIBRPMEM(7)