PMEMOBJ_TX_BEGIN(3) PMDK Programmer's Manual PMEMOBJ_TX_BEGIN(3)

NAME
pmemobj_tx_stage(),

pmemobj_tx_begin(), pmemobj_tx_lock(), pmemobj_tx_abort(),
pmemobj_tx_commit(), pmemobj_tx_end(), pmemobj_tx_errno(),
pmemobj_tx_process(),

TX_BEGIN_PARAM(), TX_BEGIN_CB(), TX_BEGIN(), TX_ONABORT, TX_ONCOMMIT,
TX_FINALLY, TX_END,

pmemobj_tx_log_append_buffer(), pmemobj_tx_log_auto_alloc(),
pmemobj_tx_log_snapshots_max_size(), pmemobj_tx_log_intents_max_size() -
transactional object manipulation

SYNOPSIS
#include <libpmemobj.h>

enum pobj_tx_stage pmemobj_tx_stage(void);

int pmemobj_tx_begin(PMEMobjpool *pop, jmp_buf *env, enum pobj_tx_param, ...);
int pmemobj_tx_lock(enum tx_lock lock_type, void *lockp);
void pmemobj_tx_abort(int errnum);
void pmemobj_tx_commit(void);
int pmemobj_tx_end(void);
int pmemobj_tx_errno(void);
void pmemobj_tx_process(void);

TX_BEGIN_PARAM(PMEMobjpool *pop, ...)
TX_BEGIN_CB(PMEMobjpool *pop, cb, arg, ...)
TX_BEGIN(PMEMobjpool *pop)
TX_ONABORT
TX_ONCOMMIT
TX_FINALLY
TX_END

int pmemobj_tx_log_append_buffer(enum pobj_log_type type, void *addr, size_t size);
int pmemobj_tx_log_auto_alloc(enum pobj_log_type type, int on_off);
size_t pmemobj_tx_log_snapshots_max_size(size_t *sizes, size_t nsizes);
size_t pmemobj_tx_log_intents_max_size(size_t nintents);

DESCRIPTION
The non-transactional functions and macros described in pmemobj_alloc(3),
pmemobj_list_insert(3) and POBJ_LIST_HEAD(3) only guarantee the atomicity
of a single operation on an object. For more complex changes involving
multiple operations on an object, or the allocation and modification of
multiple objects, data consistency and fail-safety can be provided only
by using atomic transactions.

A transaction is defined as a series of operations on persistent memory
objects that either all occur or none occur. In particular, if the
execution of a transaction is interrupted by a power failure or a system
crash, it is guaranteed that after the system restarts, all the changes
made as part of the uncompleted transaction will be rolled back,
restoring the memory pool to the consistent state it was in when the
transaction was started.

Note that transactions do not provide atomicity with respect to other
threads. All modifications performed within a transaction are
immediately visible to other threads, so it is the responsibility of the
application to implement a proper thread synchronization mechanism.

Each thread may have only one transaction open at a time, but that
transaction may be nested. Nested transactions are flattened: committing
a nested transaction does not commit the outer transaction; however,
errors in a nested transaction are propagated up to the outermost level,
interrupting the entire transaction.
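
The flattening rule can be pictured with a small single-threaded model.
This is a hypothetical sketch, not libpmemobj code. struct tx_model and
the model_*() functions are invented for illustration, and the model
ignores persistence and locks entirely. It shows only two things:

· an inner commit does not make anything durable

· an error recorded at any depth becomes the result of the whole
  transaction (0 is mapped to ECANCELED, as pmemobj_tx_abort() does)

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of one per-thread, flattened transaction. */
struct tx_model {
    int depth;     /* 0 = no open transaction */
    int err;       /* first error seen at any nesting level */
    int committed; /* set only when the outermost level commits */
};

static void model_begin(struct tx_model *tx)
{
    if (tx->depth == 0) { /* outermost begin resets the state */
        tx->err = 0;
        tx->committed = 0;
    }
    tx->depth++;
}

/* An error at any level is recorded for the whole transaction. */
static void model_abort(struct tx_model *tx, int errnum)
{
    if (tx->err == 0)
        tx->err = errnum ? errnum : ECANCELED;
}

/* Committing a nested level only hands control back to the outer one. */
static void model_commit(struct tx_model *tx)
{
    if (tx->depth == 1 && tx->err == 0)
        tx->committed = 1;
}

/* Leaves one nesting level and reports the shared error code. */
static int model_end(struct tx_model *tx)
{
    assert(tx->depth > 0);
    tx->depth--;
    return tx->err;
}
```

For example, calling model_abort(&tx, 0) at depth 2 makes both the
inner and the outer model_end() return ECANCELED, and the outer
model_commit() never sets committed.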

Each transaction is visible only to the thread that started it. No other
thread can add operations to it, commit it, or abort it. Multiple
threads may have transactions open on a given memory pool at the same
time.

Please see the CAVEATS section below for known limitations of the
transactional API.

The pmemobj_tx_stage() function returns the current transaction stage
for a thread. Stages are changed only by the pmemobj_tx_*() functions.
Transaction stages are defined as follows:

· TX_STAGE_NONE - no open transaction in this thread

· TX_STAGE_WORK - transaction in progress

· TX_STAGE_ONCOMMIT - successfully committed

· TX_STAGE_ONABORT - starting the transaction failed or the transaction
  was aborted

· TX_STAGE_FINALLY - ready for cleanup

The pmemobj_tx_begin() function starts a new transaction in the current
thread. If called within an open transaction, it starts a nested
transaction. The caller may use the env argument to provide a pointer to
a calling environment to be restored in case of transaction abort. This
environment must be saved by the caller using the setjmp(3) macro.

A new transaction may be started only if the current stage is
TX_STAGE_NONE or TX_STAGE_WORK. If successful, the transaction stage
changes to TX_STAGE_WORK. Otherwise, the stage is changed to
TX_STAGE_ONABORT.

Optionally, a list of parameters for the transaction may be provided.
Each parameter consists of a type followed by a type-specific number of
values. Currently there are four types:

· TX_PARAM_NONE, used as a termination marker. No following value.

· TX_PARAM_MUTEX, followed by one value, a pmem-resident PMEMmutex

· TX_PARAM_RWLOCK, followed by one value, a pmem-resident PMEMrwlock

· TX_PARAM_CB, followed by two values: a callback function of type
  pmemobj_tx_callback, and a void pointer

Using TX_PARAM_MUTEX or TX_PARAM_RWLOCK causes the specified lock to be
acquired at the beginning of the transaction. TX_PARAM_RWLOCK acquires
the lock for writing. It is guaranteed that pmemobj_tx_begin() will
acquire all locks prior to successful completion, and they will be held
by the current thread until the outermost transaction is finished.
Locks are taken in order from left to right. To avoid deadlocks, the
user is responsible for proper lock ordering.
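
The parameter list is an ordinary C variadic list, so its encoding can
be illustrated with <stdarg.h>. The sketch below is a hypothetical
model. enum model_param and model_collect_locks() are made-up stand-ins
for the TX_PARAM_* values and are not part of libpmemobj. The function
walks a list in which each type is followed by its own number of values
and MODEL_PARAM_NONE ends the list. It collects lock pointers in
left-to-right order, which is the order in which the text says locks are
taken.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TX_PARAM_* types. */
enum model_param {
    MODEL_PARAM_NONE,   /* terminator, no value */
    MODEL_PARAM_MUTEX,  /* one value: lock pointer */
    MODEL_PARAM_RWLOCK, /* one value: lock pointer */
    MODEL_PARAM_CB      /* two values: callback, void *arg */
};

typedef void (*model_cb)(void *arg);

/*
 * Walks a type/value list terminated by MODEL_PARAM_NONE and stores up
 * to `max` lock pointers in `locks`, left to right. Returns the number
 * of locks found, or (size_t)-1 for an unknown type (its values could
 * not be skipped safely).
 */
static size_t model_collect_locks(void **locks, size_t max, ...)
{
    va_list ap;
    size_t n = 0;

    va_start(ap, max);
    for (;;) {
        int type = va_arg(ap, int); /* enum constants are passed as int */
        if (type == MODEL_PARAM_NONE)
            break;
        switch (type) {
        case MODEL_PARAM_MUTEX:
        case MODEL_PARAM_RWLOCK: {
            void *lock = va_arg(ap, void *);
            if (n < max)
                locks[n] = lock;
            n++;
            break;
        }
        case MODEL_PARAM_CB:
            (void)va_arg(ap, model_cb); /* the callback */
            (void)va_arg(ap, void *);   /* its argument */
            break;
        default:
            va_end(ap);
            return (size_t)-1;
        }
    }
    va_end(ap);
    return n;
}
```

Omitting the terminator would make such a walker read past the last
argument, which is undefined behavior. This is why a termination marker
like TX_PARAM_NONE is needed in a variadic parameter list.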

TX_PARAM_CB registers the specified callback function to be executed at
each transaction stage. For TX_STAGE_WORK, the callback is executed
prior to commit. For all other stages, the callback is executed as the
first operation after a stage change. It is also called after each
transaction; in this case the stage parameter is set to TX_STAGE_NONE.
pmemobj_tx_callback must be compatible with:

void func(PMEMobjpool *pop, enum pobj_tx_stage stage, void *arg)

pop is the pool identifier passed to pmemobj_tx_begin(), stage is the
current transaction stage, and arg is the second value of TX_PARAM_CB.
Without considering transaction nesting, this mechanism can be
considered an alternative method for executing code between stages
(instead of TX_ONCOMMIT, TX_ONABORT, etc.). However, there are two
significant differences when nested transactions are used:

· The registered function is executed only in the outermost
  transaction, even if registered in an inner transaction.

· There can be only one callback in the entire transaction, that is,
  the callback cannot be changed in an inner transaction.

Note that TX_PARAM_CB does not replace the TX_ONCOMMIT, TX_ONABORT,
etc. macros. They can be used together: the callback will be executed
before a TX_ONCOMMIT, TX_ONABORT, etc. section.

TX_PARAM_CB can be used when the code dealing with transaction stage
changes is shared between multiple users or when it must be executed
only in the outer transaction. For example, it is useful when the
application must synchronize persistent and transient state.
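
A stage callback that keeps a transient cache in sync with persistent
data might look like the following. To keep the sketch self-contained,
it does not include <libpmemobj.h>. MODEL_POOL, enum model_stage and
model_drive() are hypothetical stand-ins for PMEMobjpool, enum
pobj_tx_stage and the library's own dispatch. model_drive() only
approximates the documented call order for a single, non-nested
transaction: WORK before commit, then ONCOMMIT or ONABORT, then FINALLY,
then a final call with NONE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles without libpmemobj. */
typedef struct model_pool MODEL_POOL; /* opaque, like PMEMobjpool */
enum model_stage { M_NONE, M_WORK, M_ONCOMMIT, M_ONABORT, M_FINALLY };

/* Transient (DRAM) state that must mirror the persistent state. */
struct cache {
    int pending;    /* value staged during the transaction */
    int visible;    /* value published to readers */
    int calls_none; /* number of end-of-transaction calls seen */
};

/* Same shape as the documented callback: (pool, stage, arg). */
static void sync_cache_cb(MODEL_POOL *pop, enum model_stage stage, void *arg)
{
    struct cache *c = arg;
    (void)pop;
    switch (stage) {
    case M_ONCOMMIT:
        c->visible = c->pending; /* publish only after a commit */
        break;
    case M_ONABORT:
        c->pending = c->visible; /* discard the staged value */
        break;
    case M_NONE:
        c->calls_none++;         /* called once after the transaction */
        break;
    default:
        break;
    }
}

/* Hypothetical driver approximating the order of callback invocations. */
static void model_drive(void (*cb)(MODEL_POOL *, enum model_stage, void *),
        void *arg, int aborted)
{
    if (!aborted)
        cb(NULL, M_WORK, arg); /* just before commit */
    cb(NULL, aborted ? M_ONABORT : M_ONCOMMIT, arg);
    cb(NULL, M_FINALLY, arg);
    cb(NULL, M_NONE, arg);     /* after the transaction */
}
```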

The pmemobj_tx_lock() function acquires the lock lockp of type
lock_type and adds it to the current transaction. lock_type may be
TX_LOCK_MUTEX or TX_LOCK_RWLOCK; lockp must point to a PMEMmutex or a
PMEMrwlock, respectively. If lock_type is TX_LOCK_RWLOCK, the lock is
acquired for writing. If the lock is not successfully acquired, the
function returns an error number. This function must be called during
TX_STAGE_WORK.

pmemobj_tx_abort() aborts the current transaction and causes a
transition to TX_STAGE_ONABORT. If errnum is equal to 0, the transaction
error code is set to ECANCELED; otherwise, it is set to errnum. This
function must be called during TX_STAGE_WORK.

The pmemobj_tx_commit() function commits the current open transaction
and causes a transition to TX_STAGE_ONCOMMIT. If called in the context
of the outermost transaction, all the changes may be considered durably
written upon successful completion. This function must be called during
TX_STAGE_WORK.

The pmemobj_tx_end() function performs a cleanup of the current
transaction. If called in the context of the outermost transaction, it
releases all the locks acquired by pmemobj_tx_begin() for outer and
nested transactions. If called in the context of a nested transaction,
it returns to the context of the outer transaction in TX_STAGE_WORK,
without releasing any locks. The pmemobj_tx_end() function can be called
during TX_STAGE_NONE if the transaction was brought to this stage by
pmemobj_tx_process(). If not already in TX_STAGE_NONE, it causes the
transition to TX_STAGE_NONE. pmemobj_tx_end() must always be called for
each pmemobj_tx_begin(), even if starting the transaction failed. This
function must not be called during TX_STAGE_WORK.

The pmemobj_tx_errno() function returns the error code of the last
transaction.

The pmemobj_tx_process() function performs the actions associated with
the current stage of the transaction and makes the transition to the
next stage. It must be called within a transaction. The current stage
must always be obtained by a call to pmemobj_tx_stage().
pmemobj_tx_process() performs the following transitions in the
transaction stage flow:

· TX_STAGE_WORK -> TX_STAGE_ONCOMMIT

· TX_STAGE_ONABORT -> TX_STAGE_FINALLY

· TX_STAGE_ONCOMMIT -> TX_STAGE_FINALLY

· TX_STAGE_FINALLY -> TX_STAGE_NONE

· TX_STAGE_NONE -> TX_STAGE_NONE

pmemobj_tx_process() must not be called after calling pmemobj_tx_end()
for the outermost transaction.
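
The transition table of pmemobj_tx_process() is small enough to express
as a pure function. The names below (enum model_stage, model_process(),
model_steps_to_none()) are hypothetical. They mirror TX_STAGE_* only for
illustration and do not perform any of the actions the real function
carries out at each stage.

```c
#include <assert.h>

/* Hypothetical mirror of the TX_STAGE_* values, for illustration only. */
enum model_stage { MS_NONE, MS_WORK, MS_ONCOMMIT, MS_ONABORT, MS_FINALLY };

/* Next stage after one pmemobj_tx_process()-style step. */
static enum model_stage model_process(enum model_stage s)
{
    switch (s) {
    case MS_WORK:     return MS_ONCOMMIT;
    case MS_ONABORT:  return MS_FINALLY;
    case MS_ONCOMMIT: return MS_FINALLY;
    case MS_FINALLY:  return MS_NONE;
    case MS_NONE:     return MS_NONE;
    }
    return MS_NONE;
}

/* Counts the process() steps needed to reach MS_NONE from stage s. */
static int model_steps_to_none(enum model_stage s)
{
    int steps = 0;
    while (s != MS_NONE) {
        s = model_process(s);
        steps++;
    }
    return steps;
}
```

For example, a committed transaction goes WORK -> ONCOMMIT -> FINALLY ->
NONE, so it needs three steps. An aborted one starts at ONABORT and
needs two.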

In addition to the above API, libpmemobj(7) offers a more intuitive
method of building transactions using the set of macros described below.
When using these macros, the complete transaction flow looks like this:

TX_BEGIN(pop) {
    /* the actual transaction code goes here... */
} TX_ONCOMMIT {
    /*
     * optional - executed only if the above block
     * successfully completes
     */
} TX_ONABORT {
    /*
     * optional - executed only if starting the transaction fails,
     * or if transaction is aborted by an error or a call to
     * pmemobj_tx_abort()
     */
} TX_FINALLY {
    /*
     * optional - if exists, it is executed after
     * TX_ONCOMMIT or TX_ONABORT block
     */
} TX_END /* mandatory */

TX_BEGIN_PARAM(PMEMobjpool *pop, ...)
TX_BEGIN_CB(PMEMobjpool *pop, cb, arg, ...)
TX_BEGIN(PMEMobjpool *pop)

The TX_BEGIN_PARAM(), TX_BEGIN_CB() and TX_BEGIN() macros start a new
transaction in the same way as pmemobj_tx_begin(), except that instead
of an environment buffer provided by the caller, they set up a local
jmp_buf buffer and use it to catch the transaction abort. The TX_BEGIN()
macro starts a transaction without any options. TX_BEGIN_PARAM may be
used when there is a need to acquire locks prior to starting a
transaction (such as in a multi-threaded program) or to set up a
transaction stage callback. TX_BEGIN_CB is just a wrapper around
TX_BEGIN_PARAM that validates the callback signature. (For compatibility
there is also a TX_BEGIN_LOCK macro, which is an alias for
TX_BEGIN_PARAM.) Each of these macros must be followed by a block of
code with all the operations that are to be performed atomically.

The TX_ONABORT macro starts a block of code that will be executed only
if starting the transaction fails due to an error in pmemobj_tx_begin(),
or if the transaction is aborted. This block is optional, but in
practice it should not be omitted. If it is desirable to crash the
application when a transaction aborts and there is no TX_ONABORT
section, the application can define the POBJ_TX_CRASH_ON_NO_ONABORT
macro before including <libpmemobj.h>. This provides a default
TX_ONABORT section which just calls abort(3).

The TX_ONCOMMIT macro starts a block of code that will be executed only
if the transaction is successfully committed, which means that the
execution of code in the TX_BEGIN() block has not been interrupted by an
error or by a call to pmemobj_tx_abort(). This block is optional.

The TX_FINALLY macro starts a block of code that will be executed
regardless of whether the transaction is committed or aborted. This
block is optional.

The TX_END macro cleans up and closes the transaction started by the
TX_BEGIN() / TX_BEGIN_PARAM() / TX_BEGIN_CB() macros. It is mandatory
to terminate each transaction with this macro. If the transaction was
aborted, errno is set appropriately.
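
The macros set up a local jmp_buf and use it to catch aborts. The
following is a simplified, hypothetical model of that control flow, not
the actual macro expansion. It has four parts:

· a global stage variable

· m_abort(), which longjmps back to the setjmp() point

· a loop that dispatches to the work, commit, abort and finally blocks
  until the stage returns to NONE

· a returned error code that plays the role of the errno value set by
  TX_END

```c
#include <assert.h>
#include <errno.h>
#include <setjmp.h>

/* Hypothetical single-thread transaction model; not libpmemobj's code. */
enum mstage { S_NONE, S_WORK, S_ONCOMMIT, S_ONABORT, S_FINALLY };

static enum mstage m_stage = S_NONE;
static int m_err;
static jmp_buf *m_env;

static void m_begin(jmp_buf *env)
{
    m_env = env;
    m_err = 0;
    m_stage = S_WORK;
}

/* Like pmemobj_tx_abort(): record the error, then jump back to setjmp. */
static void m_abort(int errnum)
{
    m_err = errnum ? errnum : ECANCELED;
    m_stage = S_ONABORT;
    longjmp(*m_env, 1);
}

static void m_process(void)
{
    switch (m_stage) {
    case S_WORK:     m_stage = S_ONCOMMIT; break;
    case S_ONCOMMIT:
    case S_ONABORT:  m_stage = S_FINALLY; break;
    default:         m_stage = S_NONE; break;
    }
}

/* Runs `work` in the stage loop and counts which blocks were executed. */
static int run_tx(void (*work)(void), int *ran_commit, int *ran_abort,
        int *ran_finally)
{
    jmp_buf env; /* local buffer, as the TX_BEGIN*() macros use */

    *ran_commit = *ran_abort = *ran_finally = 0;
    if (setjmp(env) == 0)
        m_begin(&env);
    /* after an abort, setjmp() returns 1 and m_stage is S_ONABORT */
    while (m_stage != S_NONE) {
        switch (m_stage) {
        case S_WORK:     work(); break; /* the TX_BEGIN block */
        case S_ONCOMMIT: (*ran_commit)++; break;
        case S_ONABORT:  (*ran_abort)++; break;
        case S_FINALLY:  (*ran_finally)++; break;
        default:         break;
        }
        m_process();
    }
    return m_err; /* the value TX_END would store in errno */
}

/* Example work blocks. */
static void work_ok(void) { }
static void work_cancel(void) { m_abort(0); }
static void work_einval(void) { m_abort(EINVAL); }
```

Note that run_tx() modifies no local of its own between setjmp() and
longjmp(). The counters live in the caller's frame and are reached
through unchanged pointers, which avoids the indeterminate-value hazard
described in CAVEATS.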

TRANSACTION LOG TUNING
From the libpmemobj implementation perspective, there are two types of
operations in a transaction:

· snapshots, where the action must be persisted immediately,

· intents, where the action can be persisted at the transaction commit
  phase

pmemobj_tx_add_range(3) and all its variants belong to the snapshots
group.

pmemobj_tx_alloc(3) (with its variants), pmemobj_tx_free(3),
pmemobj_tx_realloc(3) (with its variants) and pmemobj_tx_publish(3)
belong to the intents group. Even though pmemobj_tx_alloc() allocates
memory immediately, it modifies only the runtime state and postpones
persistent memory modifications to the commit phase. pmemobj_tx_free(3)
cannot free the object immediately, because of a possible transaction
rollback, so it postpones both the action and the persistent memory
modifications to the commit phase. pmemobj_tx_realloc(3) is just a
combination of those two. pmemobj_tx_publish(3) postpones reservations
and deferred frees to the commit phase.

Both types of operations (snapshots and intents) require libpmemobj to
build a persistent log of operations. The intent log (also known as a
“redo log”) is applied on commit, and the snapshot log (also known as an
“undo log”) is applied on abort.
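
The difference between the two log kinds can be illustrated with a toy,
in-memory model:

· a snapshot saves the old value before modifying memory in place, so
  an abort replays the saved values (undo)

· an intent only records the desired change, so a commit applies it
  (redo)

This is a conceptual sketch with hypothetical names (struct logs,
snapshot_set(), intent_set(), commit(), abort_tx()). libpmemobj's real
logs are persistent and considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-DRAM model of the two log kinds. */
#define LOG_CAP 8

struct entry { int *addr; int value; };

struct logs {
    struct entry undo[LOG_CAP]; /* snapshot log: old values */
    size_t nundo;
    struct entry redo[LOG_CAP]; /* intent log: values to apply later */
    size_t nredo;
};

/* Snapshot: save the old value first, then modify in place right away. */
static void snapshot_set(struct logs *l, int *addr, int value)
{
    assert(l->nundo < LOG_CAP);
    l->undo[l->nundo++] = (struct entry){ addr, *addr };
    *addr = value;
}

/* Intent: only record the change; memory is not touched until commit. */
static void intent_set(struct logs *l, int *addr, int value)
{
    assert(l->nredo < LOG_CAP);
    l->redo[l->nredo++] = (struct entry){ addr, value };
}

/* Commit applies the redo log; the undo log is no longer needed. */
static void commit(struct logs *l)
{
    for (size_t i = 0; i < l->nredo; i++)
        *l->redo[i].addr = l->redo[i].value;
    l->nundo = l->nredo = 0;
}

/* Abort applies the undo log newest-first and drops the intents. */
static void abort_tx(struct logs *l)
{
    while (l->nundo > 0) {
        l->nundo--;
        *l->undo[l->nundo].addr = l->undo[l->nundo].value;
    }
    l->nredo = 0;
}
```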

When a libpmemobj transaction starts, it is not possible to predict how
much persistent memory space will be needed for those logs. This means
that libpmemobj must internally allocate this space whenever it is
needed. This has two downsides:

· when a transaction snapshots a lot of memory or does a lot of
  allocations, libpmemobj may need to do many internal allocations,
  which must be freed when the transaction ends, adding time overhead
  when big transactions are frequent,

· transactions can start to fail due to insufficient space for logs -
  this can be especially problematic for transactions that want to
  deallocate objects, as those might also fail

To solve both of these problems, libpmemobj exposes the following
functions:

· pmemobj_tx_log_append_buffer(),

· pmemobj_tx_log_auto_alloc()

pmemobj_tx_log_append_buffer() appends a given range of memory [addr,
addr + size) to the log of the given type in the current transaction.
type can be one of two values (with meanings described above):

· TX_LOG_TYPE_SNAPSHOT,

· TX_LOG_TYPE_INTENT

The range of memory must belong to the same pool the transaction is on
and must not be used by more than one thread at the same time. The
latter condition can be verified with the tx.debug.verify_user_buffers
ctl (see pmemobj_ctl_get(3)).
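
The requirement that [addr, addr + size) belong to the transaction's
pool is a half-open range containment check. A hypothetical helper such
as range_in_pool() below (not a libpmemobj function) shows one
overflow-safe way to express it. It assumes the pool is a single
contiguous mapping of pool_size bytes starting at base. It illustrates
the condition only and does not reproduce the library's own checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if [addr, addr + size) lies entirely
 * inside [base, base + pool_size), else 0. Comparing offsets instead of
 * computing addr + size avoids overflow near the end of the address space.
 */
static int range_in_pool(const void *base, size_t pool_size,
        const void *addr, size_t size)
{
    uintptr_t b = (uintptr_t)base;
    uintptr_t a = (uintptr_t)addr;

    if (a < b)
        return 0;                  /* starts before the pool */
    uintptr_t off = a - b;
    return off <= pool_size && size <= pool_size - off;
}
```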

pmemobj_tx_log_snapshots_max_size() calculates the maximum size of a
buffer which will be able to hold nsizes snapshots, each of size
sizes[i]. An application should not expect this function to return the
same value between restarts. In future versions of libpmemobj this
function can return a smaller value (because of better accuracy or space
optimizations) or a larger one (because of higher alignment required for
better performance). This function is independent of the transaction
stage and can be called both inside and outside of a transaction. If the
returned value S is greater than PMEMOBJ_MAX_ALLOC_SIZE, the buffer
should be split into N chunks of size PMEMOBJ_MAX_ALLOC_SIZE, where N is
equal to (S / PMEMOBJ_MAX_ALLOC_SIZE) (rounded down), and a last chunk
of size (S - (N * PMEMOBJ_MAX_ALLOC_SIZE)).
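
The chunking rule can be written out directly. In this sketch,
split_log_buffer() is a hypothetical helper. max_alloc stands in for
PMEMOBJ_MAX_ALLOC_SIZE so that the arithmetic can be checked with small
numbers. The helper also skips the last chunk when S is an exact
multiple of the limit, since that chunk would have size zero.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Splits a total log size s into N = s / max_alloc full chunks plus one
 * remainder chunk of s - N * max_alloc (omitted when it would be 0).
 * Writes up to `cap` chunk sizes into `out`; returns the chunk count.
 */
static size_t split_log_buffer(size_t s, size_t max_alloc,
        size_t *out, size_t cap)
{
    assert(max_alloc > 0);
    size_t n = s / max_alloc;        /* full chunks, rounded down */
    size_t rest = s - n * max_alloc; /* size of the last chunk */
    size_t count = n + (rest != 0);

    for (size_t i = 0; i < count && i < cap; i++)
        out[i] = (i < n) ? max_alloc : rest;
    return count;
}
```

With S = 250 and a limit of 100, this gives chunks of 100, 100 and 50.
Each chunk would then be allocated separately and passed to
pmemobj_tx_log_append_buffer() with TX_LOG_TYPE_SNAPSHOT.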

pmemobj_tx_log_intents_max_size() calculates the maximum size of a
buffer which will be able to hold nintents intents. Just like
pmemobj_tx_log_snapshots_max_size(), an application should not expect
this function to return the same value between restarts, for the same
reasons. This function is independent of the transaction stage and can
be called both inside and outside of a transaction.

pmemobj_tx_log_auto_alloc() disables (on_off set to 0) or enables
(on_off set to 1) automatic allocation of internal logs of the given
type. It can be used to verify that the buffer set with
pmemobj_tx_log_append_buffer() is big enough to hold the log without
running into an out-of-space condition.

RETURN VALUE
The pmemobj_tx_stage() function returns the current transaction stage
for a thread.

On success, pmemobj_tx_begin() returns 0. Otherwise, an error number is
returned.

The pmemobj_tx_begin() and pmemobj_tx_lock() functions return zero if
the requested locks (lockp, in the case of pmemobj_tx_lock()) are
successfully added to the transaction. Otherwise, an error number is
returned.

The pmemobj_tx_abort() and pmemobj_tx_commit() functions return no
value.

The pmemobj_tx_end() function returns 0 if the transaction was
successful. Otherwise, it returns the error code set by
pmemobj_tx_abort(). Note that pmemobj_tx_abort() can be called
internally by the library.

The pmemobj_tx_errno() function returns the error code of the last
transaction.

The pmemobj_tx_process() function returns no value.

On success, pmemobj_tx_log_append_buffer() returns 0. Otherwise, the
transaction is aborted and an error number is returned.

On success, pmemobj_tx_log_auto_alloc() returns 0. Otherwise, the
transaction is aborted and an error number is returned.

On success, pmemobj_tx_log_snapshots_max_size() returns the size of the
buffer. On failure, it returns SIZE_MAX and sets errno appropriately.

On success, pmemobj_tx_log_intents_max_size() returns the size of the
buffer. On failure, it returns SIZE_MAX and sets errno appropriately.

CAVEATS
Transaction flow control is governed by the setjmp(3) and longjmp(3)
macros, and they are used in both the macro and function flavors of the
API. The transaction will longjmp on transaction abort. This has one
major drawback, which is described in subsection 7.13.2.1 of the ISO C
standard: the values of objects of automatic storage duration that are
local to the function containing the setjmp invocation, that do not have
volatile-qualified type, and that have been changed between the setjmp
invocation and the longjmp call are indeterminate.

The following example illustrates the issue described above.

int *bad_example_1 = (int *)0xBAADF00D;
int *bad_example_2 = (int *)0xBAADF00D;
int *bad_example_3 = (int *)0xBAADF00D;
int * volatile good_example = (int *)0xBAADF00D;

TX_BEGIN(pop) {
    bad_example_1 = malloc(sizeof(int));
    bad_example_2 = malloc(sizeof(int));
    bad_example_3 = malloc(sizeof(int));
    good_example = malloc(sizeof(int));

    /* manual or library abort called here */
    pmemobj_tx_abort(EINVAL);
} TX_ONCOMMIT {
    /*
     * This section is longjmp-safe
     */
} TX_ONABORT {
    /*
     * This section is not longjmp-safe
     */
    free(good_example); /* OK */
    free(bad_example_1); /* undefined behavior */
} TX_FINALLY {
    /*
     * This section is not longjmp-safe on transaction abort only
     */
    free(bad_example_2); /* undefined behavior */
} TX_END

free(bad_example_3); /* undefined behavior */

Objects which are not volatile-qualified, are of automatic storage
duration, and have been changed between the invocations of setjmp(3) and
longjmp(3) (which includes any change made within the work section of
the transaction after TX_BEGIN()) should not be used after a transaction
abort, or should be used only with the utmost care. This also applies to
code after the TX_END macro.
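
The safe pattern from the example above can be reproduced with plain
setjmp(3) and longjmp(3), without libpmemobj. The helper names fail()
and volatile_survives_longjmp() are hypothetical. The pointer is
volatile-qualified, so its value after the jump is well defined and the
memory can be released. Dropping the volatile qualifier would make the
read after the jump indeterminate, which is exactly the hazard described
above.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf env;

/* Stands in for a transaction abort: jumps back to the setjmp() point. */
static void fail(void)
{
    longjmp(env, 1);
}

/*
 * Allocates after setjmp(), then "aborts" with longjmp(). Because buf
 * is volatile-qualified, reading it after the jump is well defined.
 * Returns 1 if the pointer was still available after the jump.
 */
static int volatile_survives_longjmp(void)
{
    int *volatile buf = NULL;

    if (setjmp(env) == 0) {
        buf = malloc(sizeof(int));
        fail(); /* does not return */
    }
    /* second return from setjmp(): buf holds the malloc() result */
    int ok = (buf != NULL);
    free(buf);
    return ok;
}
```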

libpmemobj(7) is not cancellation-safe. The pool will never be
corrupted because of a canceled thread, but other threads may stall
waiting on locks taken by that thread. If the application wants to use
pthread_cancel(3), it must disable cancellation before calling any
libpmemobj(7) APIs (see pthread_setcancelstate(3) with
PTHREAD_CANCEL_DISABLE), and re-enable it afterwards. Deferring
cancellation (pthread_setcanceltype(3) with PTHREAD_CANCEL_DEFERRED) is
not safe enough, because libpmemobj(7) internally may call functions
that are specified as cancellation points in POSIX.
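
A small wrapper can disable cancellation around a section of code that
uses libpmemobj and then restore the caller's previous cancel state.
Restoring the previous state is safer than unconditionally re-enabling
cancellation, which would be wrong if it was already disabled.
call_without_cancel() and observe_state() are hypothetical helpers. Only
pthread_setcancelstate(3) and its constants come from POSIX.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Runs fn(arg) with cancellation disabled, then restores the previous
 * cancel state. Returns 0 on success or an error number from
 * pthread_setcancelstate().
 */
static int call_without_cancel(void (*fn)(void *), void *arg)
{
    int old, unused, ret;

    ret = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
    if (ret != 0)
        return ret;
    fn(arg); /* e.g. a function running a whole TX_BEGIN ... TX_END */
    return pthread_setcancelstate(old, &unused);
}

/* Example payload: records the cancel state in effect while it runs. */
static void observe_state(void *arg)
{
    int *seen = arg;
    int cur;

    /* there is no query-only call, so set DISABLE (already in effect)
     * and read back the previous state */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cur);
    *seen = cur;
}
```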

libpmemobj(7) relies on the library destructor being called from the
main thread. For this reason, all functions that might trigger
destruction (e.g., dlclose(3)) should be called in the main thread.
Otherwise, some of the resources associated with that thread might not
be cleaned up properly.

SEE ALSO
dlclose(3), longjmp(3), pmemobj_tx_add_range(3), pmemobj_tx_alloc(3),
pthread_setcancelstate(3), pthread_setcanceltype(3), setjmp(3),
libpmemobj(7) and <http://pmem.io>

PMDK - pmemobj API version 2.3 2019-09-23 PMEMOBJ_TX_BEGIN(3)