PMEMOBJ_TX_BEGIN(3)        PMDK Programmer's Manual        PMEMOBJ_TX_BEGIN(3)

NAME

       pmemobj_tx_stage(), pmemobj_tx_begin(), pmemobj_tx_lock(),
       pmemobj_tx_abort(), pmemobj_tx_commit(), pmemobj_tx_end(),
       pmemobj_tx_errno(), pmemobj_tx_process(),

       TX_BEGIN_PARAM(), TX_BEGIN_CB(), TX_BEGIN(), TX_ONABORT, TX_ONCOMMIT,
       TX_FINALLY, TX_END,

       pmemobj_tx_log_append_buffer(), pmemobj_tx_log_auto_alloc(),
       pmemobj_tx_log_snapshots_max_size(),
       pmemobj_tx_log_intents_max_size() - transactional object manipulation

SYNOPSIS

              #include <libpmemobj.h>

              enum tx_stage pmemobj_tx_stage(void);

              int pmemobj_tx_begin(PMEMobjpool *pop, jmp_buf *env, enum pobj_tx_param, ...);
              int pmemobj_tx_lock(enum tx_lock lock_type, void *lockp);
              void pmemobj_tx_abort(int errnum);
              void pmemobj_tx_commit(void);
              int pmemobj_tx_end(void);
              int pmemobj_tx_errno(void);
              void pmemobj_tx_process(void);

              TX_BEGIN_PARAM(PMEMobjpool *pop, ...)
              TX_BEGIN_CB(PMEMobjpool *pop, cb, arg, ...)
              TX_BEGIN(PMEMobjpool *pop)
              TX_ONABORT
              TX_ONCOMMIT
              TX_FINALLY
              TX_END

              int pmemobj_tx_log_append_buffer(enum pobj_log_type type, void *addr, size_t size);
              int pmemobj_tx_log_auto_alloc(enum pobj_log_type type, int on_off);
              size_t pmemobj_tx_log_snapshots_max_size(size_t *sizes, size_t nsizes);
              size_t pmemobj_tx_log_intents_max_size(size_t nintents);

DESCRIPTION

       The non-transactional functions and macros described in
       pmemobj_alloc(3), pmemobj_list_insert(3) and POBJ_LIST_HEAD(3) only
       guarantee the atomicity of a single operation on an object.  For more
       complex changes involving multiple operations on an object, or
       allocation and modification of multiple objects, data consistency and
       fail-safety can be provided only by using atomic transactions.

       A transaction is defined as a series of operations on persistent
       memory objects that either all occur, or none occur.  In particular,
       if the execution of a transaction is interrupted by a power failure or
       a system crash, it is guaranteed that after system restart, all the
       changes made as part of the uncompleted transaction will be rolled
       back, restoring the consistent state of the memory pool from the
       moment when the transaction was started.

       Note that transactions do not provide atomicity with respect to other
       threads.  All the modifications performed within a transaction are
       immediately visible to other threads.  Therefore, it is the
       responsibility of the application to implement a proper thread
       synchronization mechanism.

       Each thread may have only one transaction open at a time, but that
       transaction may be nested.  Nested transactions are flattened.
       Committing the nested transaction does not commit the outer
       transaction; however, errors in the nested transaction are propagated
       up to the outermost level, resulting in the interruption of the entire
       transaction.

       Each transaction is visible only to the thread that started it.  No
       other thread can add operations to, commit, or abort a transaction
       initiated by another thread.  Multiple threads may have transactions
       open on a given memory pool at the same time.

       Please see the CAVEATS section below for known limitations of the
       transactional API.

       The pmemobj_tx_stage() function returns the current transaction stage
       for a thread.  Stages are changed only by the pmemobj_tx_*() functions.
       Transaction stages are defined as follows:

       · TX_STAGE_NONE - no open transaction in this thread

       · TX_STAGE_WORK - transaction in progress

       · TX_STAGE_ONCOMMIT - successfully committed

       · TX_STAGE_ONABORT - starting the transaction failed or transaction
         aborted

       · TX_STAGE_FINALLY - ready for clean up

       The pmemobj_tx_begin() function starts a new transaction in the
       current thread.  If called within an open transaction, it starts a
       nested transaction.  The caller may use the env argument to provide a
       pointer to a calling environment to be restored in case of transaction
       abort.  This information must be provided by the caller using the
       setjmp(3) macro.

       A new transaction may be started only if the current stage is
       TX_STAGE_NONE or TX_STAGE_WORK.  If successful, the transaction stage
       changes to TX_STAGE_WORK.  Otherwise, the stage is changed to
       TX_STAGE_ONABORT.

       Optionally, a list of parameters for the transaction may be provided.
       Each parameter consists of a type followed by a type-specific number
       of values.  Currently there are 4 types:

       · TX_PARAM_NONE, used as a termination marker.  No following value.

       · TX_PARAM_MUTEX, followed by one value, a pmem-resident PMEMmutex

       · TX_PARAM_RWLOCK, followed by one value, a pmem-resident PMEMrwlock

       · TX_PARAM_CB, followed by two values: a callback function of type
         pmemobj_tx_callback, and a void pointer

       Using TX_PARAM_MUTEX or TX_PARAM_RWLOCK causes the specified lock to
       be acquired at the beginning of the transaction.  TX_PARAM_RWLOCK
       acquires the lock for writing.  It is guaranteed that
       pmemobj_tx_begin() will acquire all locks prior to successful
       completion, and they will be held by the current thread until the
       outermost transaction is finished.  Locks are taken in order from left
       to right.  To avoid deadlocks, the user is responsible for proper lock
       ordering.

       TX_PARAM_CB registers the specified callback function to be executed
       at each transaction stage.  For TX_STAGE_WORK, the callback is
       executed prior to commit.  For all other stages, the callback is
       executed as the first operation after a stage change.  It will also be
       called after each transaction; in this case the stage parameter will
       be set to TX_STAGE_NONE.  pmemobj_tx_callback must be compatible with:

       void func(PMEMobjpool *pop, enum pobj_tx_stage stage, void *arg)

       pop is the pool identifier used in pmemobj_tx_begin(), stage is the
       current transaction stage and arg is the second parameter of
       TX_PARAM_CB.  Without considering transaction nesting, this mechanism
       can be considered an alternative method for executing code between
       stages (instead of TX_ONCOMMIT, TX_ONABORT, etc).  However, there are
       2 significant differences when nested transactions are used:

       · The registered function is executed only in the outermost
         transaction, even if registered in an inner transaction.

       · There can be only one callback in the entire transaction, that is,
         the callback cannot be changed in an inner transaction.

       Note that TX_PARAM_CB does not replace the TX_ONCOMMIT, TX_ONABORT,
       etc. macros.  They can be used together: the callback will be executed
       before a TX_ONCOMMIT, TX_ONABORT, etc. section.

       TX_PARAM_CB can be used when the code dealing with transaction stage
       changes is shared between multiple users or when it must be executed
       only in the outer transaction.  For example, it can be very useful
       when the application must synchronize persistent and transient state.

       The pmemobj_tx_lock() function acquires the lock lockp of type
       lock_type and adds it to the current transaction.  lock_type may be
       TX_LOCK_MUTEX or TX_LOCK_RWLOCK; lockp must be of type PMEMmutex or
       PMEMrwlock, respectively.  If lock_type is TX_LOCK_RWLOCK the lock is
       acquired for writing.  If the lock is not successfully acquired, the
       function returns an error number.  This function must be called during
       TX_STAGE_WORK.

       pmemobj_tx_abort() aborts the current transaction and causes a
       transition to TX_STAGE_ONABORT.  If errnum is equal to 0, the
       transaction error code is set to ECANCELED; otherwise, it is set to
       errnum.  This function must be called during TX_STAGE_WORK.

       The pmemobj_tx_commit() function commits the current open transaction
       and causes a transition to TX_STAGE_ONCOMMIT.  If called in the
       context of the outermost transaction, all the changes may be
       considered as durably written upon successful completion.  This
       function must be called during TX_STAGE_WORK.

       The pmemobj_tx_end() function performs a cleanup of the current
       transaction.  If called in the context of the outermost transaction,
       it releases all the locks acquired by pmemobj_tx_begin() for outer and
       nested transactions.  If called in the context of a nested
       transaction, it returns to the context of the outer transaction in
       TX_STAGE_WORK, without releasing any locks.  The pmemobj_tx_end()
       function can be called during TX_STAGE_NONE if transitioned to this
       stage using pmemobj_tx_process().  If not already in TX_STAGE_NONE, it
       causes the transition to TX_STAGE_NONE.  pmemobj_tx_end() must always
       be called for each pmemobj_tx_begin(), even if starting the
       transaction failed.  This function must not be called during
       TX_STAGE_WORK.

       The pmemobj_tx_errno() function returns the error code of the last
       transaction.

       The pmemobj_tx_process() function performs the actions associated with
       the current stage of the transaction, and makes the transition to the
       next stage.  It must be called in a transaction.  The current stage
       must always be obtained by a call to pmemobj_tx_stage().
       pmemobj_tx_process() performs the following transitions in the
       transaction stage flow:

       · TX_STAGE_WORK -> TX_STAGE_ONCOMMIT

       · TX_STAGE_ONABORT -> TX_STAGE_FINALLY

       · TX_STAGE_ONCOMMIT -> TX_STAGE_FINALLY

       · TX_STAGE_FINALLY -> TX_STAGE_NONE

       · TX_STAGE_NONE -> TX_STAGE_NONE
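
       The transition list above can be read as a small state machine.  The
       following sketch is only a model of that list in plain C; it does not
       call libpmemobj, and the names model_stage, model_next_stage() and
       model_steps_to_none() are hypothetical stand-ins, not part of the
       library.

```c
#include <assert.h>

/* Hypothetical stand-ins for the TX_STAGE_* values; not the library enum. */
enum model_stage {
    MODEL_NONE,
    MODEL_WORK,
    MODEL_ONCOMMIT,
    MODEL_ONABORT,
    MODEL_FINALLY
};

/* Returns the stage that follows s in the transition list above. */
enum model_stage model_next_stage(enum model_stage s)
{
    switch (s) {
    case MODEL_WORK:     return MODEL_ONCOMMIT;
    case MODEL_ONABORT:  return MODEL_FINALLY;
    case MODEL_ONCOMMIT: return MODEL_FINALLY;
    case MODEL_FINALLY:  return MODEL_NONE;
    case MODEL_NONE:     return MODEL_NONE;
    }
    return MODEL_NONE; /* not reached for valid input */
}

/* Counts the transitions needed to reach MODEL_NONE starting from s. */
int model_steps_to_none(enum model_stage s)
{
    int steps = 0;

    while (s != MODEL_NONE) {
        s = model_next_stage(s);
        steps++;
    }
    return steps;
}
```

       In this model a transaction in the work stage needs three transitions
       to finish (commit, finally, none), while one that is already in the
       abort stage needs two.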

       pmemobj_tx_process() must not be called after calling pmemobj_tx_end()
       for the outermost transaction.

       In addition to the above API, libpmemobj(7) offers a more intuitive
       method of building transactions using the set of macros described
       below.  When using these macros, the complete transaction flow looks
       like this:

              TX_BEGIN(Pop) {
                  /* the actual transaction code goes here... */
              } TX_ONCOMMIT {
                  /*
                   * optional - executed only if the above block
                   * successfully completes
                   */
              } TX_ONABORT {
                  /*
                   * optional - executed only if starting the transaction fails,
                   * or if transaction is aborted by an error or a call to
                   * pmemobj_tx_abort()
                   */
              } TX_FINALLY {
                  /*
                   * optional - if exists, it is executed after
                   * TX_ONCOMMIT or TX_ONABORT block
                   */
              } TX_END /* mandatory */

              TX_BEGIN_PARAM(PMEMobjpool *pop, ...)
              TX_BEGIN_CB(PMEMobjpool *pop, cb, arg, ...)
              TX_BEGIN(PMEMobjpool *pop)

       The TX_BEGIN_PARAM(), TX_BEGIN_CB() and TX_BEGIN() macros start a new
       transaction in the same way as pmemobj_tx_begin(), except that instead
       of the environment buffer provided by a caller, they set up the local
       jmp_buf buffer and use it to catch the transaction abort.  The
       TX_BEGIN() macro starts a transaction without any options.
       TX_BEGIN_PARAM may be used when there is a need to acquire locks prior
       to starting a transaction (such as for a multi-threaded program) or
       set up a transaction stage callback.  TX_BEGIN_CB is just a wrapper
       around TX_BEGIN_PARAM that validates the callback signature.  (For
       compatibility there is also a TX_BEGIN_LOCK macro, which is an alias
       for TX_BEGIN_PARAM).  Each of these macros must be followed by a block
       of code with all the operations that are to be performed atomically.

       The TX_ONABORT macro starts a block of code that will be executed only
       if starting the transaction fails due to an error in
       pmemobj_tx_begin(), or if the transaction is aborted.  This block is
       optional, but in practice it should not be omitted.  If it is
       desirable to crash the application when a transaction aborts and there
       is no TX_ONABORT section, the application can define the
       POBJ_TX_CRASH_ON_NO_ONABORT macro before inclusion of <libpmemobj.h>.
       This provides a default TX_ONABORT section which just calls abort(3).

       The TX_ONCOMMIT macro starts a block of code that will be executed
       only if the transaction is successfully committed, which means that
       the execution of code in the TX_BEGIN() block has not been interrupted
       by an error or by a call to pmemobj_tx_abort().  This block is
       optional.

       The TX_FINALLY macro starts a block of code that will be executed
       regardless of whether the transaction is committed or aborted.  This
       block is optional.

       The TX_END macro cleans up and closes the transaction started by the
       TX_BEGIN() / TX_BEGIN_PARAM() / TX_BEGIN_CB() macros.  It is mandatory
       to terminate each transaction with this macro.  If the transaction was
       aborted, errno is set appropriately.

   TRANSACTION LOG TUNING
       From the libpmemobj implementation perspective there are two types of
       operations in a transaction:

       · snapshots, where action must be persisted immediately,

       · intents, where action can be persisted at the transaction commit
         phase

       pmemobj_tx_add_range(3) and all its variants belong to the snapshots
       group.

       pmemobj_tx_alloc(3) (with its variants), pmemobj_tx_free(3),
       pmemobj_tx_realloc(3) (with its variants) and pmemobj_tx_publish(3)
       belong to the intents group.  Even though pmemobj_tx_alloc() allocates
       memory immediately, it modifies only the runtime state and postpones
       persistent memory modifications to the commit phase.
       pmemobj_tx_free(3) cannot free the object immediately, because of
       possible transaction rollback, so it postpones both the action and
       persistent memory modifications to the commit phase.
       pmemobj_tx_realloc(3) is just a combination of those two.
       pmemobj_tx_publish(3) postpones reservations and deferred frees to the
       commit phase.

       Those two types of operations (snapshots and intents) require that
       libpmemobj builds a persistent log of operations.  The intent log
       (also known as a “redo log”) is applied on commit and the snapshot log
       (also known as an “undo log”) is applied on abort.
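
       To make the role of the snapshot (undo) log more concrete, the
       following sketch models one in ordinary volatile memory.  It is only
       an illustration of the idea that saved copies are applied on abort
       and discarded on commit; it is not how libpmemobj stores its logs, it
       provides no persistence, and every name in it (model_undo_log,
       model_snapshot_range() and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_SNAPSHOTS 8
#define MODEL_MAX_SNAPSHOT_SIZE 64

/* One saved copy of a memory range, taken before the range is modified. */
struct model_snapshot {
    void *addr;
    size_t size;
    unsigned char saved[MODEL_MAX_SNAPSHOT_SIZE];
};

struct model_undo_log {
    struct model_snapshot entries[MODEL_MAX_SNAPSHOTS];
    size_t count;
};

/* Records the current contents of [addr, addr + size); returns 0 on success. */
int model_snapshot_range(struct model_undo_log *log, void *addr, size_t size)
{
    if (log->count == MODEL_MAX_SNAPSHOTS || size > MODEL_MAX_SNAPSHOT_SIZE)
        return -1; /* the model log is full or the range is too large */

    struct model_snapshot *e = &log->entries[log->count++];
    e->addr = addr;
    e->size = size;
    memcpy(e->saved, addr, size);
    return 0;
}

/* Abort: restores the saved contents, newest entry first, and empties the log. */
void model_abort(struct model_undo_log *log)
{
    while (log->count > 0) {
        struct model_snapshot *e = &log->entries[--log->count];
        memcpy(e->addr, e->saved, e->size);
    }
}

/* Commit: the new contents stay, so the saved copies are simply discarded. */
void model_commit(struct model_undo_log *log)
{
    log->count = 0;
}
```

       A caller of this model snapshots a range, modifies it, and then either
       calls model_abort() to get the old contents back or model_commit() to
       keep the new ones.  The fixed capacity of the model log also hints at
       why log space matters: once it is exhausted, further snapshots fail.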

       When a libpmemobj transaction starts, it is not possible to predict
       how much persistent memory space will be needed for those logs.  This
       means that libpmemobj must internally allocate this space whenever it
       is needed.  This has two downsides:

       · when a transaction snapshots a lot of memory or does a lot of
         allocations, libpmemobj may need to do many internal allocations,
         which must be freed when the transaction ends, adding time overhead
         when big transactions are frequent,

       · transactions can start to fail due to insufficient space for logs -
         this can be especially problematic for transactions that want to
         deallocate objects, as those might also fail

       To solve both of these problems libpmemobj exposes the following
       functions:

       · pmemobj_tx_log_append_buffer(),

       · pmemobj_tx_log_auto_alloc()

       pmemobj_tx_log_append_buffer() appends a given range of memory [addr,
       addr + size) to the log of the given type in the current transaction.
       type can be one of two values (with meanings described above):

       · TX_LOG_TYPE_SNAPSHOT,

       · TX_LOG_TYPE_INTENT

       The range of memory must belong to the same pool the transaction is on
       and must not be used by more than one thread at the same time.  The
       latter condition can be verified with the tx.debug.verify_user_buffers
       ctl (see pmemobj_ctl_get(3)).

       pmemobj_tx_log_snapshots_max_size() calculates the maximum size of a
       buffer which will be able to hold nsizes snapshots, each of size
       sizes[i].  Applications should not expect this function to return the
       same value between restarts.  In future versions of libpmemobj this
       function may return a smaller (because of better accuracy or space
       optimizations) or larger (because of higher alignment required for
       better performance) value.  This function is independent of the
       transaction stage and can be called both inside and outside of a
       transaction.  If the returned value S is greater than
       PMEMOBJ_MAX_ALLOC_SIZE, the buffer should be split into N chunks of
       size PMEMOBJ_MAX_ALLOC_SIZE, where N is equal to (S /
       PMEMOBJ_MAX_ALLOC_SIZE) (rounded down), and a last chunk of size (S -
       (N * PMEMOBJ_MAX_ALLOC_SIZE)).
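
       The chunk arithmetic described above can be sketched as follows.  This
       is plain C with no libpmemobj calls: total stands for the returned
       value S and max_chunk for PMEMOBJ_MAX_ALLOC_SIZE, both passed in as
       ordinary parameters.  The names model_chunk_plan and
       model_plan_chunks() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Result of splitting a buffer of `total` bytes into chunks of `max_chunk`. */
struct model_chunk_plan {
    size_t full_chunks; /* N: number of chunks of exactly max_chunk bytes */
    size_t last_chunk;  /* S - N * max_chunk; 0 when there is no remainder */
};

struct model_chunk_plan model_plan_chunks(size_t total, size_t max_chunk)
{
    struct model_chunk_plan plan = {0, 0};

    if (max_chunk == 0)
        return plan; /* guard against division by zero */

    plan.full_chunks = total / max_chunk; /* integer division rounds down */
    plan.last_chunk = total - plan.full_chunks * max_chunk;
    return plan;
}
```

       For example, with total equal to 2500 and max_chunk equal to 1000 the
       plan is two full chunks followed by a last chunk of 500 bytes.  When
       total is an exact multiple of max_chunk, last_chunk is 0 and no
       trailing chunk is needed.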

       pmemobj_tx_log_intents_max_size() calculates the maximum size of a
       buffer which will be able to hold nintents intents.  Just like with
       pmemobj_tx_log_snapshots_max_size(), applications should not expect
       this function to return the same value between restarts, for the same
       reasons.  This function is independent of the transaction stage and
       can be called both inside and outside of a transaction.

       pmemobj_tx_log_auto_alloc() disables (on_off set to 0) or enables
       (on_off set to 1) automatic allocation of internal logs of the given
       type.  It can be used to verify that the buffer set with
       pmemobj_tx_log_append_buffer() is big enough to hold the log, without
       reaching an out-of-space scenario.

RETURN VALUE

       The pmemobj_tx_stage() function returns the current transaction stage
       for a thread.

       On success, pmemobj_tx_begin() returns 0.  Otherwise, an error number
       is returned.

       The pmemobj_tx_lock() function returns zero if lockp is successfully
       added to the transaction.  Otherwise, an error number is returned.

       The pmemobj_tx_abort() and pmemobj_tx_commit() functions return no
       value.

       The pmemobj_tx_end() function returns 0 if the transaction was
       successful.  Otherwise it returns the error code set by
       pmemobj_tx_abort().  Note that pmemobj_tx_abort() can be called
       internally by the library.

       The pmemobj_tx_errno() function returns the error code of the last
       transaction.

       The pmemobj_tx_process() function returns no value.

       On success, pmemobj_tx_log_append_buffer() returns 0.  Otherwise, the
       transaction is aborted and an error number is returned.

       On success, pmemobj_tx_log_auto_alloc() returns 0.  Otherwise, the
       transaction is aborted and an error number is returned.

       On success, pmemobj_tx_log_snapshots_max_size() returns the size of
       the buffer.  On failure it returns SIZE_MAX and sets errno
       appropriately.

       On success, pmemobj_tx_log_intents_max_size() returns the size of the
       buffer.  On failure it returns SIZE_MAX and sets errno appropriately.

CAVEATS

       Transaction flow control is governed by the setjmp(3) and longjmp(3)
       macros, and they are used in both the macro and function flavors of
       the API.  The transaction will longjmp on transaction abort.  This has
       one major drawback, which is described in the ISO C standard
       subsection 7.13.2.1.  It says that the values of objects of automatic
       storage duration that are local to the function containing the setjmp
       invocation that do not have volatile-qualified type and have been
       changed between the setjmp invocation and longjmp call are
       indeterminate.

       The following example illustrates the issue described above.

              int *bad_example_1 = (int *)0xBAADF00D;
              int *bad_example_2 = (int *)0xBAADF00D;
              int *bad_example_3 = (int *)0xBAADF00D;
              int * volatile good_example = (int *)0xBAADF00D;

              TX_BEGIN(pop) {
                  bad_example_1 = malloc(sizeof(int));
                  bad_example_2 = malloc(sizeof(int));
                  bad_example_3 = malloc(sizeof(int));
                  good_example = malloc(sizeof(int));

                  /* manual or library abort called here */
                  pmemobj_tx_abort(EINVAL);
              } TX_ONCOMMIT {
                  /*
                   * This section is longjmp-safe
                   */
              } TX_ONABORT {
                  /*
                   * This section is not longjmp-safe
                   */
                  free(good_example); /* OK */
                  free(bad_example_1); /* undefined behavior */
              } TX_FINALLY {
                  /*
                   * This section is not longjmp-safe on transaction abort only
                   */
                  free(bad_example_2); /* undefined behavior */
              } TX_END

              free(bad_example_3); /* undefined behavior */

       Objects which are not volatile-qualified, are of automatic storage
       duration and have been changed between the invocations of setjmp(3)
       and longjmp(3) (that also means within the work section of the
       transaction after TX_BEGIN()) should not be used after a transaction
       abort, or should be used with utmost care.  This also includes code
       after the TX_END macro.
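
       The safe half of that rule can be demonstrated without libpmemobj.
       The sketch below uses setjmp(3) and longjmp(3) directly, with
       longjmp() standing in for a transaction abort; the names
       model_abort_jump() and model_read_after_abort() are hypothetical.
       Only the volatile-qualified case is shown, because reading a
       non-volatile local in the same position would have an indeterminate
       value.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf model_env;

/* Stands in for a transaction abort: jumps back to the setjmp() call. */
static void model_abort_jump(void)
{
    longjmp(model_env, 1);
}

/*
 * Modifies a volatile-qualified local after setjmp() and reads it after
 * longjmp().  Because of the qualifier, the read is well defined.
 */
int model_read_after_abort(void)
{
    volatile int safe = 0;

    if (setjmp(model_env) == 0) {
        safe = 42;          /* changed between setjmp() and longjmp() */
        model_abort_jump(); /* does not return */
    }

    /* Reached only through longjmp(); safe still holds the stored value. */
    return safe;
}
```

       This mirrors the good_example pointer in the example above: declaring
       the object volatile is what makes it usable in the code that runs
       after the jump.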

       libpmemobj(7) is not cancellation-safe.  The pool will never be
       corrupted because of a canceled thread, but other threads may stall
       waiting on locks taken by that thread.  If the application wants to
       use pthread_cancel(3), it must disable cancellation before calling any
       libpmemobj(7) APIs (see pthread_setcancelstate(3) with
       PTHREAD_CANCEL_DISABLE), and re-enable it afterwards.  Deferring
       cancellation (pthread_setcanceltype(3) with PTHREAD_CANCEL_DEFERRED)
       is not safe enough, because libpmemobj(7) internally may call
       functions that are specified as cancellation points in POSIX.

       libpmemobj(7) relies on the library destructor being called from the
       main thread.  For this reason, all functions that might trigger
       destruction (e.g. dlclose(3)) should be called in the main thread.
       Otherwise some of the resources associated with that thread might not
       be cleaned up properly.

SEE ALSO

       dlclose(3), longjmp(3), pmemobj_tx_add_range(3), pmemobj_tx_alloc(3),
       pthread_setcancelstate(3), pthread_setcanceltype(3), setjmp(3),
       libpmemobj(7) and <http://pmem.io>

PMDK - pmemobj API version 2.3    2019-09-23               PMEMOBJ_TX_BEGIN(3)