PMEMOBJ_TX_BEGIN(3) PMDK Programmer's Manual PMEMOBJ_TX_BEGIN(3)

NAME
pmemobj_tx_stage(),

pmemobj_tx_begin(), pmemobj_tx_lock(), pmemobj_tx_xlock(),
pmemobj_tx_abort(), pmemobj_tx_commit(), pmemobj_tx_end(),
pmemobj_tx_errno(), pmemobj_tx_process(),

TX_BEGIN_PARAM(), TX_BEGIN_CB(), TX_BEGIN(), TX_ONABORT, TX_ONCOMMIT,
TX_FINALLY, TX_END,

pmemobj_tx_log_append_buffer(), pmemobj_tx_xlog_append_buffer(),
pmemobj_tx_log_auto_alloc(), pmemobj_tx_log_snapshots_max_size(),
pmemobj_tx_log_intents_max_size(),

pmemobj_tx_set_user_data(), pmemobj_tx_get_user_data() - transactional
object manipulation

SYNOPSIS
#include <libpmemobj.h>

enum pobj_tx_stage pmemobj_tx_stage(void);

int pmemobj_tx_begin(PMEMobjpool *pop, jmp_buf *env, enum pobj_tx_param, ...);
int pmemobj_tx_lock(enum tx_lock lock_type, void *lockp);
int pmemobj_tx_xlock(enum tx_lock lock_type, void *lockp, uint64_t flags);
void pmemobj_tx_abort(int errnum);
void pmemobj_tx_commit(void);
int pmemobj_tx_end(void);
int pmemobj_tx_errno(void);
void pmemobj_tx_process(void);

TX_BEGIN_PARAM(PMEMobjpool *pop, ...)
TX_BEGIN_CB(PMEMobjpool *pop, cb, arg, ...)
TX_BEGIN(PMEMobjpool *pop)
TX_ONABORT
TX_ONCOMMIT
TX_FINALLY
TX_END

int pmemobj_tx_log_append_buffer(enum pobj_log_type type, void *addr, size_t size);
int pmemobj_tx_xlog_append_buffer(enum pobj_log_type type, void *addr, size_t size, uint64_t flags);
int pmemobj_tx_log_auto_alloc(enum pobj_log_type type, int on_off);
size_t pmemobj_tx_log_snapshots_max_size(size_t *sizes, size_t nsizes);
size_t pmemobj_tx_log_intents_max_size(size_t nintents);

void pmemobj_tx_set_user_data(void *data);
void *pmemobj_tx_get_user_data(void);

DESCRIPTION
The non-transactional functions and macros described in
pmemobj_alloc(3), pmemobj_list_insert(3) and POBJ_LIST_HEAD(3) only
guarantee the atomicity of a single operation on an object. For more
complex changes involving multiple operations on an object, or the
allocation and modification of multiple objects, data consistency and
fail-safety can be provided only by using atomic transactions.

A transaction is defined as a series of operations on persistent memory
objects that either all occur, or none occur. In particular, if the
execution of a transaction is interrupted by a power failure or a
system crash, it is guaranteed that after system restart, all the
changes made as part of the uncompleted transaction will be rolled
back, restoring the consistent state of the memory pool from the moment
when the transaction was started.

Note that transactions do not provide atomicity with respect to other
threads. All the modifications performed within a transaction are
immediately visible to other threads. Therefore it is the
responsibility of the application to implement a proper thread
synchronization mechanism.

Each thread may have only one transaction open at a time, but that
transaction may be nested. Nested transactions are flattened.
Committing the nested transaction does not commit the outer
transaction; however, errors in the nested transaction are propagated
up to the outermost level, resulting in the interruption of the entire
transaction.

Each transaction is visible only to the thread that started it. No
other thread can add operations to, commit or abort a transaction
initiated by another thread. Multiple threads may have transactions
open on a given memory pool at the same time.

Please see the CAVEATS section below for known limitations of the
transactional API.

The pmemobj_tx_stage() function returns the current transaction stage
for a thread. Stages are changed only by the pmemobj_tx_*() functions.
Transaction stages are defined as follows:

· TX_STAGE_NONE - no open transaction in this thread

· TX_STAGE_WORK - transaction in progress

· TX_STAGE_ONCOMMIT - successfully committed

· TX_STAGE_ONABORT - starting the transaction failed or transaction
  aborted

· TX_STAGE_FINALLY - ready for clean up

The pmemobj_tx_begin() function starts a new transaction in the current
thread. If called within an open transaction, it starts a nested
transaction. The caller may use the env argument to provide a pointer
to a calling environment to be restored in case of transaction abort.
This information must be provided by the caller using the setjmp(3)
macro.

A new transaction may be started only if the current stage is
TX_STAGE_NONE or TX_STAGE_WORK. If successful, the transaction stage
changes to TX_STAGE_WORK. Otherwise, the stage is changed to
TX_STAGE_ONABORT.

Optionally, a list of parameters for the transaction may be provided.
Each parameter consists of a type followed by a type-specific number of
values. Currently there are 4 types:

· TX_PARAM_NONE, used as a termination marker. No following value.

· TX_PARAM_MUTEX, followed by one value, a pmem-resident PMEMmutex

· TX_PARAM_RWLOCK, followed by one value, a pmem-resident PMEMrwlock

· TX_PARAM_CB, followed by two values: a callback function of type
  pmemobj_tx_callback, and a void pointer

Using TX_PARAM_MUTEX or TX_PARAM_RWLOCK causes the specified lock to be
acquired at the beginning of the transaction. TX_PARAM_RWLOCK acquires
the lock for writing. It is guaranteed that pmemobj_tx_begin() will
acquire all locks prior to successful completion, and they will be held
by the current thread until the outermost transaction is finished.
Locks are taken in order from left to right. To avoid deadlocks, the
user is responsible for proper lock ordering.

TX_PARAM_CB registers the specified callback function to be executed at
each transaction stage. For TX_STAGE_WORK, the callback is executed
prior to commit. For all other stages, the callback is executed as the
first operation after a stage change. It will also be called after
each transaction; in this case the stage parameter will be set to
TX_STAGE_NONE. pmemobj_tx_callback must be compatible with:

    void func(PMEMobjpool *pop, enum pobj_tx_stage stage, void *arg)

pop is the pool identifier used in pmemobj_tx_begin(), stage is the
current transaction stage and arg is the second parameter of
TX_PARAM_CB. Without considering transaction nesting, this mechanism
can be considered an alternative method for executing code between
stages (instead of TX_ONCOMMIT, TX_ONABORT, etc). However, there are 2
significant differences when nested transactions are used:

· The registered function is executed only in the outermost
  transaction, even if registered in an inner transaction.

· There can be only one callback in the entire transaction, that is,
  the callback cannot be changed in an inner transaction.

Note that TX_PARAM_CB does not replace the TX_ONCOMMIT, TX_ONABORT,
etc. macros. They can be used together: the callback will be executed
before a TX_ONCOMMIT, TX_ONABORT, etc. section.

TX_PARAM_CB can be used when the code dealing with transaction stage
changes is shared between multiple users or when it must be executed
only in the outer transaction. For example, it can be very useful when
the application must synchronize persistent and transient state.

The pmemobj_tx_lock() function acquires the lock lockp of type
lock_type and adds it to the current transaction. lock_type may be
TX_LOCK_MUTEX or TX_LOCK_RWLOCK; lockp must be of type PMEMmutex or
PMEMrwlock, respectively. If lock_type is TX_LOCK_RWLOCK the lock is
acquired for writing. If the lock is not successfully acquired, the
function returns an error number. This function must be called during
TX_STAGE_WORK.

The pmemobj_tx_xlock() function behaves exactly the same as
pmemobj_tx_lock() when flags equals POBJ_XLOCK_NO_ABORT. When flags
equals 0 and the lock is not successfully acquired, the transaction is
aborted. flags is a bitmask of the following values:

· POBJ_XLOCK_NO_ABORT - if the function does not end successfully, do
  not abort the transaction.

pmemobj_tx_abort() aborts the current transaction and causes a
transition to TX_STAGE_ONABORT. If errnum is equal to 0, the
transaction error code is set to ECANCELED; otherwise, it is set to
errnum. This function must be called during TX_STAGE_WORK.

The pmemobj_tx_commit() function commits the current open transaction
and causes a transition to TX_STAGE_ONCOMMIT. If called in the context
of the outermost transaction, all the changes may be considered as
durably written upon successful completion. This function must be
called during TX_STAGE_WORK.

The pmemobj_tx_end() function performs a cleanup of the current
transaction. If called in the context of the outermost transaction, it
releases all the locks acquired by pmemobj_tx_begin() for outer and
nested transactions. If called in the context of a nested transaction,
it returns to the context of the outer transaction in TX_STAGE_WORK,
without releasing any locks. The pmemobj_tx_end() function can be
called during TX_STAGE_NONE if transitioned to this stage using
pmemobj_tx_process(). If not already in TX_STAGE_NONE, it causes the
transition to TX_STAGE_NONE. pmemobj_tx_end() must always be called
for each pmemobj_tx_begin(), even if starting the transaction failed.
This function must not be called during TX_STAGE_WORK.

The pmemobj_tx_errno() function returns the error code of the last
transaction.

The pmemobj_tx_process() function performs the actions associated with
the current stage of the transaction, and makes the transition to the
next stage. It must be called in a transaction. The current stage
must always be obtained by a call to pmemobj_tx_stage().
pmemobj_tx_process() performs the following transitions in the
transaction stage flow:

· TX_STAGE_WORK -> TX_STAGE_ONCOMMIT

· TX_STAGE_ONABORT -> TX_STAGE_FINALLY

· TX_STAGE_ONCOMMIT -> TX_STAGE_FINALLY

· TX_STAGE_FINALLY -> TX_STAGE_NONE

· TX_STAGE_NONE -> TX_STAGE_NONE

pmemobj_tx_process() must not be called after calling pmemobj_tx_end()
for the outermost transaction.
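
The transition list above can be read as a small state machine. The
following sketch models it in plain C without calling the library; the
sketch_stage enum and both functions are hypothetical stand-ins used
only to illustrate the flow, not part of libpmemobj.

```c
/* Hypothetical mirror of the stage names; not the library's enum. */
enum sketch_stage {
	SKETCH_NONE,
	SKETCH_WORK,
	SKETCH_ONCOMMIT,
	SKETCH_ONABORT,
	SKETCH_FINALLY
};

/* Returns the stage that follows `s` in the flow listed above. */
static enum sketch_stage
sketch_next_stage(enum sketch_stage s)
{
	switch (s) {
	case SKETCH_WORK:
		return SKETCH_ONCOMMIT;
	case SKETCH_ONABORT:
	case SKETCH_ONCOMMIT:
		return SKETCH_FINALLY;
	case SKETCH_FINALLY:
	case SKETCH_NONE:
	default:
		return SKETCH_NONE;
	}
}

/* Counts the transitions needed to reach SKETCH_NONE from `s`. */
static int
sketch_steps_to_none(enum sketch_stage s)
{
	int steps = 0;

	while (s != SKETCH_NONE) {
		s = sketch_next_stage(s);
		steps++;
	}
	return steps;
}
```

In this model a transaction that commits passes through three
transitions (work, on-commit, finally) before reaching the "none"
stage, while one that is already aborted needs two.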

In addition to the above API, libpmemobj(7) offers a more intuitive
method of building transactions using the set of macros described
below. When using these macros, the complete transaction flow looks
like this:

    TX_BEGIN(Pop) {
        /* the actual transaction code goes here... */
    } TX_ONCOMMIT {
        /*
         * optional - executed only if the above block
         * successfully completes
         */
    } TX_ONABORT {
        /*
         * optional - executed only if starting the transaction fails,
         * or if transaction is aborted by an error or a call to
         * pmemobj_tx_abort()
         */
    } TX_FINALLY {
        /*
         * optional - if exists, it is executed after
         * TX_ONCOMMIT or TX_ONABORT block
         */
    } TX_END /* mandatory */

    TX_BEGIN_PARAM(PMEMobjpool *pop, ...)
    TX_BEGIN_CB(PMEMobjpool *pop, cb, arg, ...)
    TX_BEGIN(PMEMobjpool *pop)

The TX_BEGIN_PARAM(), TX_BEGIN_CB() and TX_BEGIN() macros start a new
transaction in the same way as pmemobj_tx_begin(), except that instead
of the environment buffer provided by a caller, they set up the local
jmp_buf buffer and use it to catch the transaction abort. The
TX_BEGIN() macro starts a transaction without any options.
TX_BEGIN_PARAM() may be used when there is a need to acquire locks
prior to starting a transaction (such as for a multi-threaded program)
or set up a transaction stage callback. TX_BEGIN_CB() is just a
wrapper around TX_BEGIN_PARAM() that validates the callback signature.
(For compatibility there is also a TX_BEGIN_LOCK macro, which is an
alias for TX_BEGIN_PARAM). Each of these macros must be followed by a
block of code with all the operations that are to be performed
atomically.

The TX_ONABORT macro starts a block of code that will be executed only
if starting the transaction fails due to an error in
pmemobj_tx_begin(), or if the transaction is aborted. This block is
optional, but in practice it should not be omitted. If it is desirable
to crash the application when a transaction aborts and there is no
TX_ONABORT section, the application can define the
POBJ_TX_CRASH_ON_NO_ONABORT macro before inclusion of <libpmemobj.h>.
This provides a default TX_ONABORT section which just calls abort(3).

The TX_ONCOMMIT macro starts a block of code that will be executed only
if the transaction is successfully committed, which means that the
execution of code in the TX_BEGIN() block has not been interrupted by
an error or by a call to pmemobj_tx_abort(). This block is optional.

The TX_FINALLY macro starts a block of code that will be executed
regardless of whether the transaction is committed or aborted. This
block is optional.

The TX_END macro cleans up and closes the transaction started by the
TX_BEGIN() / TX_BEGIN_PARAM() / TX_BEGIN_CB() macros. It is mandatory
to terminate each transaction with this macro. If the transaction was
aborted, errno is set appropriately.

TRANSACTION LOG TUNING
From the libpmemobj implementation perspective there are two types of
operations in a transaction:

· snapshots, where the action must be persisted immediately,

· intents, where the action can be persisted at the transaction commit
  phase

pmemobj_tx_add_range(3) and all its variants belong to the snapshots
group.

pmemobj_tx_alloc(3) (with its variants), pmemobj_tx_free(3),
pmemobj_tx_realloc(3) (with its variants) and pmemobj_tx_publish(3)
belong to the intents group. Even though pmemobj_tx_alloc() allocates
memory immediately, it modifies only the runtime state and postpones
persistent memory modifications to the commit phase.
pmemobj_tx_free(3) cannot free the object immediately, because of
possible transaction rollback, so it postpones both the action and
persistent memory modifications to the commit phase.
pmemobj_tx_realloc(3) is just a combination of those two.
pmemobj_tx_publish(3) postpones reservations and deferred frees to the
commit phase.

Those two types of operations (snapshots and intents) require that
libpmemobj builds a persistent log of operations. The intent log (also
known as a “redo log”) is applied on commit and the snapshot log (also
known as an “undo log”) is applied on abort.

When a libpmemobj transaction starts, it is not possible to predict how
much persistent memory space will be needed for those logs. This means
that libpmemobj must internally allocate this space whenever it is
needed. This has two downsides:

· when a transaction snapshots a lot of memory or does a lot of
  allocations, libpmemobj may need to do many internal allocations,
  which must be freed when the transaction ends, adding time overhead
  when big transactions are frequent,

· transactions can start to fail because there is not enough space for
  logs - this can be especially problematic for transactions that want
  to deallocate objects, as those might also fail

To solve both of these problems libpmemobj exposes the following
functions:

· pmemobj_tx_log_append_buffer(),

· pmemobj_tx_xlog_append_buffer(),

· pmemobj_tx_log_auto_alloc()

pmemobj_tx_log_append_buffer() appends a given range of memory [addr,
addr + size) to the log of the given type in the current transaction.
type can be one of the two values (with meanings described above):

· TX_LOG_TYPE_SNAPSHOT,

· TX_LOG_TYPE_INTENT

The range of memory must belong to the same pool the transaction is on
and must not be used by more than one thread at the same time. The
latter condition can be verified with the
tx.debug.verify_user_buffers ctl (see pmemobj_ctl_get(3)).

The pmemobj_tx_xlog_append_buffer() function behaves exactly the same
as pmemobj_tx_log_append_buffer() when flags equals zero. flags is a
bitmask of the following values:

· POBJ_XLOG_APPEND_BUFFER_NO_ABORT - if the function does not end
  successfully, do not abort the transaction.

pmemobj_tx_log_snapshots_max_size() calculates the maximum size of a
buffer which will be able to hold nsizes snapshots, each of size
sizes[i]. Applications should not expect this function to return the
same value between restarts. In future versions of libpmemobj this
function can return a smaller (because of better accuracy or space
optimizations) or larger (because of higher alignment required for
better performance) value. This function is independent of the
transaction stage and can be called both inside and outside of a
transaction. If the returned value S is greater than
PMEMOBJ_MAX_ALLOC_SIZE, the buffer should be split into N chunks of
size PMEMOBJ_MAX_ALLOC_SIZE, where N is equal to
(S / PMEMOBJ_MAX_ALLOC_SIZE) (rounded down), and a last chunk of size
(S - (N * PMEMOBJ_MAX_ALLOC_SIZE)).
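
The splitting rule can be written down as a few lines of arithmetic.
In the sketch below, chunk_plan and plan_chunks are hypothetical names,
and max_alloc is a stand-in parameter for PMEMOBJ_MAX_ALLOC_SIZE so
that the example does not depend on the library header; it must be
non-zero.

```c
#include <stddef.h>

/* Result of splitting a log buffer of `total` bytes into chunks. */
struct chunk_plan {
	size_t full_chunks; /* N: chunks of exactly `max_alloc` bytes */
	size_t last_chunk;  /* S - N * max_alloc; 0 if S divides evenly */
};

/*
 * Computes N and the size of the last chunk for a buffer of `total`
 * bytes (S in the text above). `max_alloc` must not be zero.
 */
static struct chunk_plan
plan_chunks(size_t total, size_t max_alloc)
{
	struct chunk_plan plan;

	plan.full_chunks = total / max_alloc; /* integer division rounds down */
	plan.last_chunk = total - plan.full_chunks * max_alloc;
	return plan;
}
```

When last_chunk is zero the buffer divides evenly and no trailing
chunk is needed.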

pmemobj_tx_log_intents_max_size() calculates the maximum size of a
buffer which will be able to hold nintents intents. Just like with
pmemobj_tx_log_snapshots_max_size(), applications should not expect
this function to return the same value between restarts, for the same
reasons. This function is independent of the transaction stage and can
be called both inside and outside of a transaction.

pmemobj_tx_log_auto_alloc() disables (on_off set to 0) or enables
(on_off set to 1) automatic allocation of internal logs of the given
type. It can be used to verify that the buffer set with
pmemobj_tx_log_append_buffer() is big enough to hold the log, without
reaching an out-of-space scenario.

The pmemobj_tx_set_user_data() function associates custom volatile
state, represented by the pointer data, with the current transaction.
This state can later be retrieved using the pmemobj_tx_get_user_data()
function. If pmemobj_tx_set_user_data() was not called for the current
transaction, pmemobj_tx_get_user_data() will return NULL. These
functions must be called during TX_STAGE_WORK, TX_STAGE_ONABORT,
TX_STAGE_ONCOMMIT or TX_STAGE_FINALLY.

RETURN VALUE
The pmemobj_tx_stage() function returns the current transaction stage
for a thread.

On success, pmemobj_tx_begin() returns 0. Otherwise, an error number
is returned.

The pmemobj_tx_lock() function returns zero if lockp is successfully
added to the transaction. Otherwise, an error number is returned.

The pmemobj_tx_xlock() function returns zero if lockp is successfully
added to the transaction. Otherwise, the error number is returned,
errno is set and, when flags do not contain POBJ_XLOCK_NO_ABORT, the
transaction is aborted.

The pmemobj_tx_abort() and pmemobj_tx_commit() functions return no
value.

The pmemobj_tx_end() function returns 0 if the transaction was
successful. Otherwise it returns the error code set by
pmemobj_tx_abort(). Note that pmemobj_tx_abort() can be called
internally by the library.

The pmemobj_tx_errno() function returns the error code of the last
transaction.

The pmemobj_tx_process() function returns no value.

On success, pmemobj_tx_log_append_buffer() returns 0. Otherwise, the
stage is changed to TX_STAGE_ONABORT, errno is set appropriately and
the transaction is aborted.

On success, pmemobj_tx_xlog_append_buffer() returns 0. Otherwise, the
error number is returned, errno is set and, when flags do not contain
POBJ_XLOG_APPEND_BUFFER_NO_ABORT, the transaction is aborted.

On success, pmemobj_tx_log_auto_alloc() returns 0. Otherwise, the
transaction is aborted and an error number is returned.

On success, pmemobj_tx_log_snapshots_max_size() returns the size of the
buffer. On failure it returns SIZE_MAX and sets errno appropriately.

On success, pmemobj_tx_log_intents_max_size() returns the size of the
buffer. On failure it returns SIZE_MAX and sets errno appropriately.

CAVEATS
Transaction flow control is governed by the setjmp(3) and longjmp(3)
macros, and they are used in both the macro and function flavors of the
API. The transaction will longjmp on transaction abort. This has one
major drawback, which is described in the ISO C standard subsection
7.13.2.1. It says that the values of objects of automatic storage
duration that are local to the function containing the setjmp
invocation that do not have volatile-qualified type and have been
changed between the setjmp invocation and longjmp call are
indeterminate.

The following example illustrates the issue described above.

    int *bad_example_1 = (int *)0xBAADF00D;
    int *bad_example_2 = (int *)0xBAADF00D;
    int *bad_example_3 = (int *)0xBAADF00D;
    int * volatile good_example = (int *)0xBAADF00D;

    TX_BEGIN(pop) {
        bad_example_1 = malloc(sizeof(int));
        bad_example_2 = malloc(sizeof(int));
        bad_example_3 = malloc(sizeof(int));
        good_example = malloc(sizeof(int));

        /* manual or library abort called here */
        pmemobj_tx_abort(EINVAL);
    } TX_ONCOMMIT {
        /*
         * This section is longjmp-safe
         */
    } TX_ONABORT {
        /*
         * This section is not longjmp-safe
         */
        free(good_example); /* OK */
        free(bad_example_1); /* undefined behavior */
    } TX_FINALLY {
        /*
         * This section is not longjmp-safe on transaction abort only
         */
        free(bad_example_2); /* undefined behavior */
    } TX_END

    free(bad_example_3); /* undefined behavior */

Objects which are not volatile-qualified, are of automatic storage
duration and have been changed between the invocations of setjmp(3) and
longjmp(3) (that also means within the work section of the transaction
after TX_BEGIN()) should not be used after a transaction abort, or
should be used with utmost care. This also includes code after the
TX_END macro.
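
The same rule can be demonstrated without libpmemobj. The sketch below
uses setjmp(3) and longjmp(3) directly; the names sketch_env,
sketch_fail and volatile_survives_longjmp are hypothetical, and the
longjmp stands in for a transaction abort.

```c
#include <setjmp.h>

static jmp_buf sketch_env;

/* Stands in for an abort: jumps back to the setjmp() call site. */
static void
sketch_fail(void)
{
	longjmp(sketch_env, 1);
}

/*
 * Modifies a volatile-qualified local between setjmp() and longjmp()
 * and reads it afterwards. Returns 42.
 */
static int
volatile_survives_longjmp(void)
{
	volatile int counter = 0;

	if (setjmp(sketch_env) == 0) {
		counter = 42;  /* changed after setjmp(), before longjmp() */
		sketch_fail(); /* does not return */
	}

	/*
	 * Reached only after the jump. Reading counter is well-defined
	 * because it is volatile-qualified; without the qualifier its
	 * value here would be indeterminate.
	 */
	return counter;
}
```

Declaring such locals volatile, as good_example does above, is the
usual way to keep them usable in the abort path.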

libpmemobj(7) is not cancellation-safe. The pool will never be
corrupted because of a canceled thread, but other threads may stall
waiting on locks taken by that thread. If the application wants to use
pthread_cancel(3), it must disable cancellation before calling any
libpmemobj(7) APIs (see pthread_setcancelstate(3) with
PTHREAD_CANCEL_DISABLE), and re-enable it afterwards. Deferring
cancellation (pthread_setcanceltype(3) with PTHREAD_CANCEL_DEFERRED) is
not safe enough, because libpmemobj(7) internally may call functions
that are specified as cancellation points in POSIX.
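
One possible way to follow this advice is a small wrapper that
disables cancellation around a unit of work and then restores the
previous state. The names call_uncancelable and sketch_work are
hypothetical; sketch_work merely stands in for a function that would
call libpmemobj(7).

```c
#include <pthread.h>

/* Hypothetical stand-in for work that would call libpmemobj(7). */
static void
sketch_work(void *arg)
{
	++*(int *)arg;
}

/*
 * Calls fn(arg) with thread cancellation disabled and restores the
 * previous cancelability state afterwards. Returns 0 on success or
 * the error number reported by pthread_setcancelstate(3).
 */
static int
call_uncancelable(void (*fn)(void *), void *arg)
{
	int old_state;
	int ignored;
	int err = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);

	if (err != 0)
		return err;

	fn(arg);

	/* restore whatever state the caller had, not blindly "enabled" */
	return pthread_setcancelstate(old_state, &ignored);
}
```

A cancellation request that arrives while fn runs stays pending and is
acted upon only after the previous state has been restored.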

libpmemobj(7) relies on the library destructor being called from the
main thread. For this reason, all functions that might trigger
destruction (e.g. dlclose(3)) should be called in the main thread.
Otherwise some of the resources associated with that thread might not
be cleaned up properly.

SEE ALSO
dlclose(3), longjmp(3), pmemobj_tx_add_range(3), pmemobj_tx_alloc(3),
pthread_setcancelstate(3), pthread_setcanceltype(3), setjmp(3),
libpmemobj(7) and <https://pmem.io>

PMDK - pmemobj API version 2.3 2020-01-31 PMEMOBJ_TX_BEGIN(3)