LIBRPMEM(7)                PMDK Programmer's Manual                LIBRPMEM(7)

NAME

       librpmem - remote persistent memory support library (EXPERIMENTAL)

SYNOPSIS

              #include <librpmem.h>
              cc ... -lrpmem

   Library API versioning:
              const char *rpmem_check_version(
                  unsigned major_required,
                  unsigned minor_required);

   Error handling:
              const char *rpmem_errormsg(void);

   Other library functions:
       A description of other librpmem functions can be found on the following
       manual pages:

       · rpmem_create(3), rpmem_persist(3)

DESCRIPTION

       librpmem provides low-level support for remote access to persistent
       memory (pmem) utilizing RDMA-capable RNICs.  The library can be used to
       remotely replicate a memory region over the RDMA protocol.  It uses a
       persistency mechanism appropriate to the remote node's platform
       capabilities.  librpmem uses the ssh(1) client to authenticate a user
       on the remote node and to encrypt the connection's out-of-band
       configuration data.  See SSH, below, for details.

       The replicated memory region cannot be larger than the maximum
       locked-in-memory address space limit.  See memlock in limits.conf(5)
       for more details.

       This library is for applications that use remote persistent memory
       directly, without the help of any library-supplied transactions or
       memory allocation.  Higher-level libraries that build on libpmem(7) are
       available and are recommended for most applications, see:

       · libpmemobj(7), a general use persistent memory API, providing memory
         allocation and transactional operations on variable-sized objects.
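
       The locked-memory limit mentioned above can be checked before a pool
       is created.  The following is a minimal sketch, assuming that the
       memlock setting in limits.conf(5) is visible to the process as the
       soft RLIMIT_MEMLOCK resource limit.  The helper name
       fits_memlock_limit() is hypothetical and is not part of librpmem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * fits_memlock_limit -- (hypothetical helper, not a librpmem function)
 *
 * Returns 1 if a region of pool_size bytes fits within the soft
 * RLIMIT_MEMLOCK limit, 0 if it does not, and -1 if the limit
 * cannot be read.
 */
static int
fits_memlock_limit(size_t pool_size)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;

    /* an unlimited soft limit accepts any size */
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;

    return (rlim_t)pool_size <= rl.rlim_cur ? 1 : 0;
}
```

       An application could call such a helper with its intended pool size
       and report a clear error instead of letting pool creation fail later.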

TARGET NODE ADDRESS FORMAT

              [<user>@]<hostname>[:<port>]

       The target node address consists of the hostname that the client
       connects to, with an optional user name.  The user must be able to
       authenticate to the remote machine without being prompted for a
       password or passphrase.  The optional port number is used to establish
       the SSH connection.  The default port number is 22.
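
       To illustrate the address format, the sketch below splits a target
       string into its three parts.  It is not the parser used by librpmem;
       the struct target_addr type, the parse_target() function and the
       buffer sizes are assumptions made for this example, and hostnames
       that themselves contain a colon (such as IPv6 literals) are not
       handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_SSH_PORT 22 /* default stated in this manual */

/* hypothetical container for the parsed address */
struct target_addr {
    char user[64];   /* empty string when no user name was given */
    char host[256];
    unsigned port;
};

/* returns 0 on success, -1 if the string is malformed or too long */
static int
parse_target(const char *spec, struct target_addr *out)
{
    assert(spec != NULL && out != NULL);

    memset(out, 0, sizeof(*out));
    out->port = DEFAULT_SSH_PORT;

    /* optional "<user>@" prefix */
    const char *host = spec;
    const char *at = strchr(spec, '@');
    if (at != NULL) {
        size_t ulen = (size_t)(at - spec);
        if (ulen == 0 || ulen >= sizeof(out->user))
            return -1;
        memcpy(out->user, spec, ulen);
        host = at + 1;
    }

    /* mandatory "<hostname>", ending at the last ':' if there is one */
    const char *colon = strrchr(host, ':');
    size_t hlen = colon ? (size_t)(colon - host) : strlen(host);
    if (hlen == 0 || hlen >= sizeof(out->host))
        return -1;
    memcpy(out->host, host, hlen);

    /* optional ":<port>" suffix */
    if (colon != NULL) {
        char *end;
        unsigned long port = strtoul(colon + 1, &end, 10);
        if (end == colon + 1 || *end != '\0' || port == 0 || port > 65535)
            return -1;
        out->port = (unsigned)port;
    }

    return 0;
}
```

       With this sketch, "user@node1:2222" yields user "user", host "node1"
       and port 2222, while a bare "node1" yields an empty user and the
       default port.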

REMOTE POOL ATTRIBUTES

       The rpmem_pool_attr structure describes a remote pool and is stored in
       the remote pool's metadata.  This structure must be passed to the
       rpmem_create(3) function by the caller when creating a pool on the
       remote node.  When a pool is opened with the rpmem_open(3) function,
       the appropriate fields are read from the pool's metadata and returned
       to the caller.

              #define RPMEM_POOL_HDR_SIG_LEN    8
              #define RPMEM_POOL_HDR_UUID_LEN   16
              #define RPMEM_POOL_USER_FLAGS_LEN 16

              struct rpmem_pool_attr {
                  char signature[RPMEM_POOL_HDR_SIG_LEN];
                  uint32_t major;
                  uint32_t compat_features;
                  uint32_t incompat_features;
                  uint32_t ro_compat_features;
                  unsigned char poolset_uuid[RPMEM_POOL_HDR_UUID_LEN];
                  unsigned char uuid[RPMEM_POOL_HDR_UUID_LEN];
                  unsigned char next_uuid[RPMEM_POOL_HDR_UUID_LEN];
                  unsigned char prev_uuid[RPMEM_POOL_HDR_UUID_LEN];
                  unsigned char user_flags[RPMEM_POOL_USER_FLAGS_LEN];
              };

       The signature field is an 8-byte field which describes the pool's
       on-media format.

       The major field is the major version number of the pool's on-media
       format.

       The compat_features field is a mask describing compatibility of the
       pool's on-media format optional features.

       The incompat_features field is a mask describing compatibility of the
       pool's on-media format required features.

       The ro_compat_features field is a mask describing compatibility of the
       pool's on-media format features.  If these features are not available,
       the pool shall be opened in read-only mode.

       The poolset_uuid field is the UUID of the pool which the remote pool is
       associated with.

       The uuid field is the UUID of the first part of the remote pool.  This
       field can be used to connect the remote pool with other pools in a
       list.

       The next_uuid and prev_uuid fields are the UUIDs of the next and
       previous replicas, respectively.  These fields can be used to connect
       the remote pool with other pools in a list.

       The user_flags field holds 16 bytes of user-defined flags.
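
       As an illustration of the fields described above, the sketch below
       fills in a rpmem_pool_attr structure before it would be passed to
       rpmem_create(3).  The structure and length macros are repeated here
       only to keep the example self-contained; real code gets them from
       <librpmem.h>.  The signature "MYPOOL", the major version 1 and the
       init_pool_attr() helper are hypothetical values chosen for this
       example, and leaving the remaining fields zeroed is an assumption,
       not a documented requirement.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* repeated from the listing in this section; normally from <librpmem.h> */
#define RPMEM_POOL_HDR_SIG_LEN    8
#define RPMEM_POOL_HDR_UUID_LEN   16
#define RPMEM_POOL_USER_FLAGS_LEN 16

struct rpmem_pool_attr {
    char signature[RPMEM_POOL_HDR_SIG_LEN];
    uint32_t major;
    uint32_t compat_features;
    uint32_t incompat_features;
    uint32_t ro_compat_features;
    unsigned char poolset_uuid[RPMEM_POOL_HDR_UUID_LEN];
    unsigned char uuid[RPMEM_POOL_HDR_UUID_LEN];
    unsigned char next_uuid[RPMEM_POOL_HDR_UUID_LEN];
    unsigned char prev_uuid[RPMEM_POOL_HDR_UUID_LEN];
    unsigned char user_flags[RPMEM_POOL_USER_FLAGS_LEN];
};

#define EXAMPLE_SIGNATURE "MYPOOL" /* hypothetical on-media signature */
#define EXAMPLE_MAJOR     1        /* hypothetical major version */

/* hypothetical helper: zero the attributes, then set the known fields */
static void
init_pool_attr(struct rpmem_pool_attr *attr,
        const unsigned char *poolset_uuid, const unsigned char *part_uuid)
{
    assert(attr != NULL && poolset_uuid != NULL && part_uuid != NULL);
    /* the signature, including its terminator, must fit in 8 bytes */
    assert(sizeof(EXAMPLE_SIGNATURE) <= RPMEM_POOL_HDR_SIG_LEN);

    memset(attr, 0, sizeof(*attr));
    memcpy(attr->signature, EXAMPLE_SIGNATURE, sizeof(EXAMPLE_SIGNATURE));
    attr->major = EXAMPLE_MAJOR;
    memcpy(attr->poolset_uuid, poolset_uuid, RPMEM_POOL_HDR_UUID_LEN);
    memcpy(attr->uuid, part_uuid, RPMEM_POOL_HDR_UUID_LEN);
    /* feature masks, next_uuid, prev_uuid and user_flags stay zeroed */
}
```

       Zeroing the whole structure first keeps any field that the caller
       does not set explicitly in a well-defined state.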

SSH

       librpmem uses the ssh(1) client to log in and execute the rpmemd(1)
       process on the remote node.  By default, ssh(1) is executed with the -4
       option, which forces IPv4 addressing.

       For debugging purposes, both the ssh client and the commands executed
       on the remote node may be overridden by setting the RPMEM_SSH and
       RPMEM_CMD environment variables, respectively.  See ENVIRONMENT for
       details.

FORK

       The ssh(1) client is executed by rpmem_open(3) and rpmem_create(3)
       after forking a child process using fork(2).  The application must take
       this into account when using wait(2) and waitpid(2), which may return
       the PID of the ssh(1) process executed by librpmem.

       If fork(2) support is not enabled in libibverbs, rpmem_open(3) and
       rpmem_create(3) will fail.  By default, fabric(7) initializes
       libibverbs with fork(2) support by calling the ibv_fork_init(3)
       function.  See fi_verbs(7) for more details.
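
       One way for an application to avoid collecting the ssh(1) child by
       accident is to wait only for the PIDs of children it started itself,
       rather than for any child.  The sketch below shows that pattern using
       standard POSIX calls only; run_child_and_wait() is a hypothetical
       helper and the child does nothing except exit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * run_child_and_wait -- (hypothetical helper) fork a child that exits with
 * exit_code and wait for that specific child only.
 *
 * Returns the child's exit status, or -1 on error.
 */
static int
run_child_and_wait(int exit_code)
{
    assert(exit_code >= 0 && exit_code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code); /* child: the application's own work goes here */

    /*
     * Passing the child's own PID (instead of -1, or using wait(2))
     * means that an unrelated child, such as an ssh(1) process started
     * by a library, is never reaped here.
     */
    int status;
    pid_t ret;
    do {
        ret = waitpid(pid, &status, 0);
    } while (ret < 0 && errno == EINTR);

    if (ret != pid || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status);
}
```

       Code that must use wait(2) or waitpid(-1, ...) can instead compare the
       returned PID against the set of children it created and ignore any
       other value.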

CAVEATS

       librpmem relies on the library destructor being called from the main
       thread.  For this reason, all functions that might trigger destruction
       (e.g. dlclose(3)) should be called in the main thread.  Otherwise some
       of the resources associated with that thread might not be cleaned up
       properly.

       librpmem registers a pool as a single memory region.  Due to a hardware
       bug, Chelsio T4 and T5 hardware cannot handle a memory region greater
       than or equal to 8GB.  The pool_size value passed to rpmem_create(3)
       and rpmem_open(3) on this hardware must therefore be less than 8GB.
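
       The size restriction for Chelsio T4 and T5 hardware can be enforced
       with a simple guard before the pool is created or opened.  This is a
       sketch only: it assumes that 8GB means 8 * 2^30 bytes, and both the
       macro and the function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* assumption: "8GB" in the text above is taken as 8 * 2^30 bytes */
#define CHELSIO_T4_T5_MR_LIMIT (UINT64_C(8) << 30)

/* returns 1 if pool_size is non-zero and strictly below the limit, else 0 */
static int
pool_size_ok_for_chelsio_t4_t5(uint64_t pool_size)
{
    return pool_size > 0 && pool_size < CHELSIO_T4_T5_MR_LIMIT;
}
```

       A 64-bit type is used so that the comparison is also correct on
       platforms where size_t is narrower than 64 bits.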

LIBRARY API VERSIONING

       This section describes how the library API is versioned, allowing
       applications to work with an evolving API.

       The rpmem_check_version() function is used to see if the installed
       librpmem supports the version of the library API required by an
       application.  The easiest way to do this is for the application to
       supply the compile-time version information, supplied by defines in
       <librpmem.h>, like this:

              const char *reason = rpmem_check_version(RPMEM_MAJOR_VERSION,
                                                       RPMEM_MINOR_VERSION);
              if (reason != NULL) {
                  /* version check failed, reason string tells you why */
              }

       Any mismatch in the major version number is considered a failure, but a
       library with a newer minor version number will pass this check since
       increasing minor versions imply backwards compatibility.

       An application can also check specifically for the existence of an
       interface by checking for the version where that interface was
       introduced.  These versions are documented in this man page as follows:
       unless otherwise specified, all interfaces described here are available
       in version 1.0 of the library.  Interfaces added after version 1.0 will
       contain the text introduced in version x.y in the section of this
       manual describing the feature.

       When the version check performed by rpmem_check_version() is
       successful, the return value is NULL.  Otherwise the return value is a
       static string describing the reason for failing the version check.  The
       string returned by rpmem_check_version() must not be modified or freed.

ENVIRONMENT

       librpmem can change its default behavior based on the following
       environment variables.  These are largely intended for testing and are
       not normally required.

       · RPMEM_SSH=ssh_client

       Setting this environment variable overrides the default ssh(1) client
       command name.

       · RPMEM_CMD=cmd

       Setting this environment variable overrides the default command
       executed on the remote node using either ssh(1) or the alternative
       remote shell command specified by RPMEM_SSH.

       RPMEM_CMD can contain multiple commands separated by a vertical bar
       (|).  Each consecutive command is executed on the remote node in the
       order read from the pool set file.  This environment variable is read
       when the library is initialized, so RPMEM_CMD must be set prior to
       application launch (or prior to dlopen(3) if librpmem is being
       dynamically loaded).

       · RPMEM_ENABLE_SOCKETS=0|1

       Setting this variable to 1 enables the fi_sockets(7) provider for the
       in-band RDMA connection.  The sockets provider does not support IPv6.
       IPv6 must be disabled system-wide if RPMEM_ENABLE_SOCKETS is 1, the
       target is localhost (or any other loopback interface address), and the
       SSH_CONNECTION variable (see ssh(1) for more details) contains an IPv6
       address after connecting over ssh to the loopback interface.  By
       default the sockets provider is disabled.

       · RPMEM_ENABLE_VERBS=0|1

       Setting this variable to 0 disables the fi_verbs(7) provider for the
       in-band RDMA connection.  The verbs provider is enabled by default.

       · RPMEM_MAX_NLANES=num

       Limit the maximum number of lanes to num.  See LANES, in
       rpmem_create(3), for details.

       · RPMEM_WORK_QUEUE_SIZE=size

       Suggest the work queue size.  The effective work queue size can be
       greater than suggested if librpmem requires it, or smaller if the
       underlying hardware does not support the suggested size.  The work
       queue size affects the performance of communication with the remote
       node.  rpmem_flush(3) operations can be added to the work queue up to
       the size of this queue.  When the work queue is full, any subsequent
       call has to wait until the work queue is drained.  Among other things,
       rpmem_drain(3) and rpmem_persist(3) also drain the work queue.
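
       Because RPMEM_CMD is read when the library is initialized, a program
       that loads librpmem with dlopen(3) has to export the variable first.
       The sketch below builds a value from several commands separated by a
       vertical bar and exports it with setenv(3).  The set_rpmem_cmd()
       helper and the 1024-byte buffer are assumptions made for this
       example; the commands passed in are whatever the caller wants to run
       on the remote nodes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * set_rpmem_cmd -- (hypothetical helper) join n commands with '|' and
 * export the result as RPMEM_CMD.
 *
 * Must be called before librpmem is initialized, for example before the
 * dlopen(3) call that loads it.  Returns 0 on success, -1 on error.
 */
static int
set_rpmem_cmd(const char *const cmds[], size_t n)
{
    char buf[1024] = "";
    size_t used = 0;

    assert(cmds != NULL && n > 0);

    for (size_t i = 0; i < n; i++) {
        /* prefix every command except the first with the separator */
        int ret = snprintf(buf + used, sizeof(buf) - used, "%s%s",
                i == 0 ? "" : "|", cmds[i]);
        if (ret < 0 || (size_t)ret >= sizeof(buf) - used)
            return -1; /* the joined value would not fit */
        used += (size_t)ret;
    }

    return setenv("RPMEM_CMD", buf, 1);
}
```

       For a program that is linked against librpmem directly, the variable
       has to be present in the environment before the program starts, so
       it is normally set by the shell or launcher instead.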

DEBUGGING AND ERROR HANDLING

       If an error is detected during the call to a librpmem function, the
       application may retrieve an error message describing the reason for the
       failure from rpmem_errormsg().  This function returns a pointer to a
       static buffer containing the last error message logged for the current
       thread.  If errno was set, the error message may include a description
       of the corresponding error code as returned by strerror(3).  The error
       message buffer is thread-local; errors encountered in one thread do not
       affect its value in other threads.  The buffer is never cleared by any
       library function; its content is significant only when the return value
       of the immediately preceding call to a librpmem function indicated an
       error, or if errno was set.  The application must not modify or free
       the error message string, but it may be modified by subsequent calls to
       other library functions.

       Two versions of librpmem are typically available on a development
       system.  The normal version, accessed when a program is linked using
       the -lrpmem option, is optimized for performance.  That version skips
       checks that impact performance and never logs trace information or
       performs run-time assertions.

       A second version of librpmem, accessed when a program uses the
       libraries under /usr/lib/pmdk_debug, contains run-time assertions and
       trace points.  The typical way to access the debug version is to set
       the environment variable LD_LIBRARY_PATH to /usr/lib/pmdk_debug or
       /usr/lib64/pmdk_debug, as appropriate.  Debugging output is controlled
       using the following environment variables.  These variables have no
       effect on the non-debug version of the library.

       · RPMEM_LOG_LEVEL

       The value of RPMEM_LOG_LEVEL enables trace points in the debug version
       of the library, as follows:

       · 0 - This is the default level when RPMEM_LOG_LEVEL is not set.  No
         log messages are emitted at this level.

       · 1 - Additional details on any errors detected are logged (in addition
         to returning the errno-based errors as usual).  The same information
         may be retrieved using rpmem_errormsg().

       · 2 - A trace of basic operations is logged.

       · 3 - Enables a very verbose amount of function call tracing in the
         library.

       · 4 - Enables voluminous and fairly obscure tracing information that is
         likely only useful to the librpmem developers.

       Unless RPMEM_LOG_FILE is set, debugging output is written to stderr.

       · RPMEM_LOG_FILE

       Specifies the name of a file where all logging information should be
       written.  If the last character in the name is "-", the PID of the
       current process will be appended to the file name when the log file is
       created.  If RPMEM_LOG_FILE is not set, logging output is written to
       stderr.
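
       The naming rule for RPMEM_LOG_FILE can be mirrored by a tool that
       needs to locate the log afterwards.  The sketch below is not library
       code; it only reproduces the rule as described above (a trailing "-"
       is followed by the PID), and the expected_log_name() helper is
       hypothetical.  It has to run in the process whose log is wanted,
       since it uses getpid(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * expected_log_name -- (hypothetical helper) compute the file name implied
 * by a given RPMEM_LOG_FILE value for the calling process.
 *
 * Returns 0 on success, -1 if buf is too small.
 */
static int
expected_log_name(const char *value, char *buf, size_t len)
{
    assert(value != NULL && buf != NULL && len > 0);

    size_t vlen = strlen(value);
    int ret;

    if (vlen > 0 && value[vlen - 1] == '-')
        /* trailing '-': the PID of the current process is appended */
        ret = snprintf(buf, len, "%s%ld", value, (long)getpid());
    else
        /* otherwise the name is used unchanged */
        ret = snprintf(buf, len, "%s", value);

    return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```

       Using a name that ends in "-" is convenient when several processes
       share the same environment, because each one then writes to its own
       file.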

EXAMPLE

       The following example uses librpmem to create a remote pool on a given
       target node, identified by a given pool set name.  The associated local
       memory pool is zeroed and the data is made persistent on the remote
       node.  Upon success the remote pool is closed.

              #include <assert.h>
              #include <unistd.h>
              #include <stdio.h>
              #include <stdlib.h>
              #include <string.h>

              #include <librpmem.h>

              #define POOL_SIGNATURE  "MANPAGE"
              #define POOL_SIZE   (32 * 1024 * 1024)
              #define NLANES      4

              #define DATA_OFF    4096
              #define DATA_SIZE   (POOL_SIZE - DATA_OFF)

              static void
              parse_args(int argc, char *argv[], const char **target, const char **poolset)
              {
                  if (argc < 3) {
                      fprintf(stderr, "usage:\t%s <target> <poolset>\n", argv[0]);
                      exit(1);
                  }

                  *target = argv[1];
                  *poolset = argv[2];
              }

              static void *
              alloc_memory(void)
              {
                  long pagesize = sysconf(_SC_PAGESIZE);
                  if (pagesize < 0) {
                      perror("sysconf");
                      exit(1);
                  }

                  /* allocate a page size aligned local memory pool */
                  void *mem;
                  int ret = posix_memalign(&mem, (size_t)pagesize, POOL_SIZE);
                  if (ret) {
                      fprintf(stderr, "posix_memalign: %s\n", strerror(ret));
                      exit(1);
                  }

                  assert(mem != NULL);

                  return mem;
              }

              int
              main(int argc, char *argv[])
              {
                  const char *target, *poolset;
                  parse_args(argc, argv, &target, &poolset);

                  unsigned nlanes = NLANES;
                  void *pool = alloc_memory();
                  int ret;

                  /* fill pool_attributes */
                  struct rpmem_pool_attr pool_attr;
                  memset(&pool_attr, 0, sizeof(pool_attr));
                  strncpy(pool_attr.signature, POOL_SIGNATURE, RPMEM_POOL_HDR_SIG_LEN);

                  /* create a remote pool */
                  RPMEMpool *rpp = rpmem_create(target, poolset, pool, POOL_SIZE,
                          &nlanes, &pool_attr);
                  if (!rpp) {
                      fprintf(stderr, "rpmem_create: %s\n", rpmem_errormsg());
                      return 1;
                  }

                  /* store data on local pool */
                  memset(pool, 0, POOL_SIZE);

                  /* make local data persistent on remote node */
                  ret = rpmem_persist(rpp, DATA_OFF, DATA_SIZE, 0, 0);
                  if (ret) {
                      fprintf(stderr, "rpmem_persist: %s\n", rpmem_errormsg());
                      return 1;
                  }

                  /* close the remote pool */
                  ret = rpmem_close(rpp);
                  if (ret) {
                      fprintf(stderr, "rpmem_close: %s\n", rpmem_errormsg());
                      return 1;
                  }

                  free(pool);

                  return 0;
              }

NOTE

       The librpmem API is experimental and may be subject to change in the
       future.  However, using the remote replication in libpmemobj(7) is safe
       and backward compatibility will be preserved.

ACKNOWLEDGEMENTS

       librpmem builds on the persistent memory programming model recommended
       by the SNIA NVM Programming Technical Work Group:
       <https://snia.org/nvmp>

SEE ALSO

       rpmemd(1), ssh(1), fork(2), dlclose(3), dlopen(3), ibv_fork_init(3),
       rpmem_create(3), rpmem_drain(3), rpmem_flush(3), rpmem_open(3),
       rpmem_persist(3), strerror(3), limits.conf(5), fabric(7), fi_sockets(7),
       fi_verbs(7), libpmem(7), libpmemblk(7), libpmemlog(7), libpmemobj(7)
       and <https://pmem.io>



PMDK - rpmem API version 1.3      2020-01-31                       LIBRPMEM(7)