PMEMOBJ_CTL_GET(3)          PMDK Programmer's Manual          PMEMOBJ_CTL_GET(3)

NAME
6 pmemobj_ctl_get(), pmemobj_ctl_set(), pmemobj_ctl_exec() -- Query and
7 modify libpmemobj internal behavior

SYNOPSIS
#include <libpmemobj.h>

int pmemobj_ctl_get(PMEMobjpool *pop, const char *name, void *arg); (EXPERIMENTAL)
int pmemobj_ctl_set(PMEMobjpool *pop, const char *name, void *arg); (EXPERIMENTAL)
int pmemobj_ctl_exec(PMEMobjpool *pop, const char *name, void *arg); (EXPERIMENTAL)

DESCRIPTION
17 The pmemobj_ctl_get(), pmemobj_ctl_set() and pmemobj_ctl_exec() func‐
18 tions provide a uniform interface for querying and modifying the inter‐
19 nal behavior of libpmemobj through the control (CTL) namespace.
20
21 The CTL namespace is organized in a tree structure. Starting from the
22 root, each node can be either internal, containing other elements, or a
23 leaf. Internal nodes themselves can only contain other nodes and can‐
24 not be entry points. There are two types of those nodes: named and in‐
25 dexed. Named nodes have string identifiers. Indexed nodes represent
26 an abstract array index and have an associated string identifier. The
27 index itself is provided by the user. A collection of indexes present
28 on the path of an entry point is provided to the handler functions as
29 name and index pairs.
30
31 The name argument specifies an entry point as defined in the CTL names‐
32 pace specification. The entry point description specifies whether the
33 extra arg is required. Those two parameters together create a CTL
34 query. The pop argument is optional if the entry point resides in a
35 global namespace (i.e., is shared for all the pools). The functions
36 and the entry points are thread-safe unless indicated otherwise below.
37 If there are special conditions for calling an entry point, they are
38 explicitly stated in its description. The functions propagate the re‐
39 turn value of the entry point. If either name or arg is invalid, -1 is
40 returned.
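
As an illustration of a CTL query, the following minimal sketch sets the
global prefault.at_open entry point. To keep the sketch compilable without
the library, the call goes through a function pointer that matches the
documented signature of pmemobj_ctl_set(); the forward declaration of
PMEMobjpool, the helper set_prefault_at_open() and the stand-in
fake_ctl_set() are assumptions made for this example and are not part of
libpmemobj.

```c
#include <stddef.h>

/* Opaque pool handle; assumed to mirror the declaration in <libpmemobj.h>. */
typedef struct pmemobjpool PMEMobjpool;

/* Same shape as the documented pmemobj_ctl_set() prototype. */
typedef int (*ctl_set_fn)(PMEMobjpool *pop, const char *name, void *arg);

/*
 * Enable or disable prefaulting on pool open. The entry point is global,
 * so no pool handle is needed and NULL is passed as pop. The return value
 * of the entry point is propagated unchanged.
 */
int set_prefault_at_open(ctl_set_fn ctl_set, int enable)
{
	int arg = enable ? 1 : 0;	/* the write argument type is int */

	return ctl_set(NULL, "prefault.at_open", &arg);
}

/* Hypothetical stand-in that only records what it was called with. */
static const char *last_name;
static int last_value;

static int fake_ctl_set(PMEMobjpool *pop, const char *name, void *arg)
{
	(void)pop;
	last_name = name;
	last_value = *(int *)arg;
	return 0;
}
```

In a program linked against libpmemobj, the helper would presumably be
called as set_prefault_at_open(pmemobj_ctl_set, 1) before the pool is
opened.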
41
Entry points are the leaves of the CTL namespace structure. Each entry
point can read from the internal state, write to the internal state,
execute a function, or perform a combination of these operations.
45
46 The entry points are listed in the following format:
47
48 name | r(ead)w(rite)x(ecute) | global/- | read argument type | write
49 argument type | exec argument type | config argument type
50
51 description...

CTL NAMESPACE
54 prefault.at_create | rw | global | int | int | - | boolean
55
56 If set, every page of the pool will be touched and written to when the
57 pool is created, in order to trigger page allocation and minimize the
58 performance impact of pagefaults. Affects only the pmemobj_create()
59 function.
60
61 Always returns 0.
62
63 prefault.at_open | rw | global | int | int | - | boolean
64
65 If set, every page of the pool will be touched and written to when the
66 pool is opened, in order to trigger page allocation and minimize the
67 performance impact of pagefaults. Affects only the pmemobj_open()
68 function.
69
70 Always returns 0.
71
72 tx.debug.skip_expensive_checks | rw | - | int | int | - | boolean
73
74 Turns off some expensive checks performed by the transaction module in
75 "debug" builds. Ignored in "release" builds.
76
77 tx.cache.size | rw | - | long long | long long | - | integer
78
Size in bytes of the transaction snapshot cache. A larger cache lowers
the frequency of persistent allocations, but carries a higher fixed
cost.
82
83 This should be set to roughly the sum of sizes of the snapshotted re‐
84 gions in an average transaction in the pool.
85
This value must be in the range between 0 and PMEMOBJ_MAX_ALLOC_SIZE.
If the current threshold (tx.cache.threshold) is larger than the new
cache size, the threshold will be made equal to the new size.
89
90 This entry point is not thread safe and should not be modified if there
91 are any transactions currently running.
92
93 Returns 0 if successful, -1 otherwise.
94
95 tx.cache.threshold | rw | - | long long | long long | - | integer
96
97 Threshold in bytes, below which snapshots will use the cache. All
98 larger snapshots will trigger a persistent allocation.
99
This value must be in the range between 0 and tx.cache.size.
101
102 This entry point is not thread safe and should not be modified if there
103 are any transactions currently running.
104
105 Returns 0 if successful, -1 otherwise.
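
The relationship between tx.cache.size and tx.cache.threshold described
above can be modelled in a few lines. This is a sketch of the documented
rules only, not the library's code: struct tx_cache_model and both
functions are hypothetical, the range bounds are assumed to be inclusive,
and the upper limit is passed in by the caller instead of relying on the
value of PMEMOBJ_MAX_ALLOC_SIZE.

```c
/* Hypothetical model of the two cache settings. */
struct tx_cache_model {
	long long size;		/* tx.cache.size, in bytes */
	long long threshold;	/* tx.cache.threshold, in bytes */
};

/*
 * Apply a new cache size. A threshold larger than the new size is pulled
 * down to the new size. Returns 0 on success, -1 if out of range.
 */
int model_set_cache_size(struct tx_cache_model *c, long long new_size,
	long long max_alloc_size)
{
	if (new_size < 0 || new_size > max_alloc_size)
		return -1;

	c->size = new_size;
	if (c->threshold > new_size)
		c->threshold = new_size;
	return 0;
}

/* Apply a new threshold; it must lie between 0 and the cache size. */
int model_set_cache_threshold(struct tx_cache_model *c, long long threshold)
{
	if (threshold < 0 || threshold > c->size)
		return -1;

	c->threshold = threshold;
	return 0;
}
```

Under this model, shrinking the cache below the current threshold silently
lowers the threshold, so an application that changes both values would set
the size first and the threshold second.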
106
107 tx.post_commit.queue_depth | rw | - | int | int | - | integer
108
109 Controls the depth of the post-commit tasks queue. A post-commit task
110 is the collection of work items that need to be performed on the per‐
111 sistent state after a successfully completed transaction. This in‐
112 cludes freeing no longer needed objects and cleaning up various caches.
113 By default, this queue does not exist and the post-commit task is exe‐
114 cuted synchronously in the same thread that ran the transaction. By
115 changing this parameter, one can offload this task to a separate work‐
116 er. If the queue is full, the algorithm, instead of waiting, performs
117 the post-commit in the current thread.
118
119 The task is performed on a finite resource (lanes, of which there are
120 1024), and if the worker threads that process this queue are unable to
121 keep up with the demand, regular threads might start to block waiting
122 for that resource. This will happen if the queue depth value is too
123 large.
124
125 As a general rule, this value should be set to approximately 1024 minus
126 the average number of threads in the application (not counting the
127 post-commit workers); however, this may vary from workload to workload.
128
129 The queue depth value must also be a power of two.
130
131 This entry point is not thread-safe and must be called when no transac‐
132 tions are currently being executed.
133
134 Returns 0 if successful, -1 otherwise.
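
The sizing rule above (roughly 1024 minus the average number of threads,
and a power of two) can be turned into a small calculation. The helper
names below are hypothetical, and rounding down to the nearest power of
two is an assumption of this sketch, chosen because the text warns about
values that are too large.

```c
#define LANE_COUNT 1024u	/* number of lanes stated above */

/* Largest power of two that is less than or equal to v (0 for v == 0). */
unsigned floor_pow2(unsigned v)
{
	unsigned p = 1;

	if (v == 0)
		return 0;
	while (p <= v / 2)
		p *= 2;
	return p;
}

/*
 * Suggested tx.post_commit.queue_depth for a given average number of
 * application threads (post-commit workers not counted). Returns 0 when
 * the threads alone already use up every lane.
 */
unsigned suggest_queue_depth(unsigned avg_threads)
{
	if (avg_threads >= LANE_COUNT)
		return 0;
	return floor_pow2(LANE_COUNT - avg_threads);
}
```

For example, with 8 application threads the rule gives 1024 - 8 = 1016,
which this sketch rounds down to 512. As the text notes, the best value
may still vary from workload to workload.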
135
136 tx.post_commit.worker | r- | - | void * | - | - | -
137
138 The worker function launched in a thread to perform asynchronous pro‐
139 cessing of post-commit tasks. This function returns only after a stop
140 entry point is called. There may be many worker threads at a time. If
141 there is no work to be done, this function sleeps instead of polling.
142
143 Always returns 0.
144
145 tx.post_commit.stop | r- | - | void * | - | - | -
146
This function forces all the post-commit worker functions to exit and
return control back to the calling thread. It should be called before
the application terminates, when the post-commit worker threads need
to be shut down.
151
152 After the invocation of this entry point, the post-commit task queue
153 can no longer be used. If worker threads must be restarted after a
154 stop, the tx.post_commit.queue_depth needs to be set again.
155
156 This entry point must be called when no transactions are currently be‐
157 ing executed.
158
159 Always returns 0.
160
heap.alloc_class.[class_id].desc | rw | - | struct pobj_alloc_class_desc |
struct pobj_alloc_class_desc | - | integer, integer, integer, string
163
164 Describes an allocation class. Allows one to create or view the inter‐
165 nal data structures of the allocator.
166
Creating custom allocation classes can be beneficial for raw allocation
throughput, scalability and, most importantly, fragmentation. By
carefully constructing allocation classes that match the application
workload, one can entirely eliminate external and internal
fragmentation. For example, it is possible to easily construct a
slab-like allocation mechanism for any data structure.
173
The [class_id] is an index field. Only values between 0 and 254 are
valid. Setting an allocation class whose class_id is already taken
fails, and the function returns -1. The values between 0 and 127 are
reserved for the default allocation classes of the library and can be
used only for reading.
179
180 The recommended method for retrieving information about all allocation
181 classes is to call this entry point for all class ids between 0 and 254
182 and discard those results for which the function returns an error.
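
The enumeration method recommended above could be sketched as follows. As
an assumption of this example, the read call is passed in as a function
pointer with the documented pmemobj_ctl_get() signature, so that the
sketch compiles without the library; probe_alloc_classes() and the
stand-in fake_ctl_get() are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* Opaque pool handle; assumed to mirror the declaration in <libpmemobj.h>. */
typedef struct pmemobjpool PMEMobjpool;

/* Same shape as the documented pmemobj_ctl_get() prototype. */
typedef int (*ctl_get_fn)(PMEMobjpool *pop, const char *name, void *arg);

#define MAX_CLASS_ID 254

/*
 * Query every class id from 0 to 254 and record which ones exist.
 * desc must point to storage for one struct pobj_alloc_class_desc and is
 * overwritten by each successful read. Ids for which the read fails are
 * simply marked as absent. Returns the number of classes found.
 */
size_t probe_alloc_classes(ctl_get_fn ctl_get, PMEMobjpool *pop, void *desc,
	unsigned char present[MAX_CLASS_ID + 1])
{
	char name[64];
	size_t found = 0;

	for (unsigned id = 0; id <= MAX_CLASS_ID; id++) {
		snprintf(name, sizeof(name), "heap.alloc_class.%u.desc", id);
		present[id] = (ctl_get(pop, name, desc) == 0);
		found += present[id];
	}
	return found;
}

/* Hypothetical stand-in: pretends that only classes 0, 1 and 128 exist. */
static int fake_ctl_get(PMEMobjpool *pop, const char *name, void *arg)
{
	unsigned id;

	(void)pop;
	(void)arg;
	if (sscanf(name, "heap.alloc_class.%u.desc", &id) != 1)
		return -1;
	return (id == 0 || id == 1 || id == 128) ? 0 : -1;
}
```

With the real library, ctl_get would be pmemobj_ctl_get and desc would
point to a struct pobj_alloc_class_desc; a caller interested in the
contents of each class would copy the descriptor out inside the loop.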
183
184 This entry point takes a complex argument.
185
struct pobj_alloc_class_desc {
        size_t unit_size;
        size_t alignment;
        unsigned units_per_block;
        enum pobj_header_type header_type;
        unsigned class_id;
};
193
The first field, unit_size, is an 8-byte unsigned integer that defines
the allocation class size. While theoretically limited only by
PMEMOBJ_MAX_ALLOC_SIZE, for most workloads this value should be between
8 bytes and 2 megabytes.
198
199 The alignment field is currently unsupported and must be set to 0. All
200 objects have default alignment of 64 bytes, but the user data alignment
201 is affected by the size of the chosen header.
202
203 The units_per_block field defines how many units a single block of mem‐
204 ory contains. This value will be rounded up to match the internal size
205 of the block (256 kilobytes or a multiple thereof). For example, given
206 a class with a unit_size of 512 bytes and a units_per_block of 1000, a
207 single block of memory for that class will have 512 kilobytes. This is
208 relevant because the bigger the block size, the less frequently blocks
209 need to be fetched, resulting in lower contention on global heap state.
210 Keep in mind that object allocation is tracked in a bitmap with a lim‐
211 ited number of entries, making it inefficient to create allocation
212 classes smaller than 128 bytes.
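
The rounding in the example above can be checked with a short
calculation. The sketch below only reproduces the arithmetic described in
the text (round the requested size up to a multiple of 256 kilobytes); it
ignores any header or bitmap overhead the allocator may add, and the
function names are hypothetical.

```c
#include <stddef.h>

#define BLOCK_GRANULE ((size_t)256 * 1024)	/* 256 kilobytes */

/*
 * Size in bytes of one block for the given class parameters:
 * unit_size * units_per_block, rounded up to a multiple of 256 KB.
 */
size_t block_bytes(size_t unit_size, size_t units_per_block)
{
	size_t raw = unit_size * units_per_block;
	size_t granules = (raw + BLOCK_GRANULE - 1) / BLOCK_GRANULE;

	return granules * BLOCK_GRANULE;
}

/* Number of whole units that fit in the rounded-up block (unit_size > 0). */
size_t effective_units(size_t unit_size, size_t units_per_block)
{
	return block_bytes(unit_size, units_per_block) / unit_size;
}
```

For a unit_size of 512 and a units_per_block of 1000, the requested
512000 bytes round up to 524288 bytes (512 kilobytes), which by this
arithmetic leaves room for 1024 units.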
213
214 The header_type field defines the header of objects from the allocation
215 class. There are three types:
216
217 · POBJ_HEADER_LEGACY, string value: legacy. Used for allocation class‐
218 es prior to version 1.3 of the library. Not recommended for use.
219 Incurs a 64 byte metadata overhead for every object. Fully supports
220 all features.
221
222 · POBJ_HEADER_COMPACT, string value: compact. Used as default for all
223 predefined allocation classes. Incurs a 16 byte metadata overhead
224 for every object. Fully supports all features.
225
· POBJ_HEADER_NONE, string value: none. Header type that incurs no
metadata overhead beyond a single bitmap entry. Can be used for very
small allocation classes or when objects must be adjacent to each
other. This header type does not support type numbers (type number
is always 0) or allocations that span more than one unit.
233
234 The class_id field is an optional, runtime-only variable that allows
235 the user to retrieve the identifier of the class. This will be equiva‐
236 lent to the provided [class_id]. This field cannot be set from a con‐
237 fig file.
238
239 The allocation classes are a runtime state of the library and must be
240 created after every open. It is highly recommended to use the configu‐
241 ration file to store the classes.
242
243 This structure is declared in the libpmemobj/ctl.h header file. Please
244 refer to this file for an in-depth explanation of the allocation class‐
245 es and relevant algorithms.
246
Allocation classes constructed in this way can be leveraged by
explicitly specifying the class using the POBJ_CLASS_ID(id) flag in the
pmemobj_tx_xalloc() and pmemobj_xalloc() functions.
250
251 Example of a valid alloc class query string:
252
253 heap.alloc_class.128.desc=500,0,1000,compact
254
This query, if executed, will create an allocation class with an id of
128 that has a unit size of 500 bytes, an alignment of 0 (the only
supported value), at least 1000 units per block, and a compact header.
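
A query string of this form can be assembled programmatically, for
example when generating a configuration file. The following sketch uses a
hypothetical helper, format_alloc_class_query(); it always emits an
alignment of 0 and rejects class ids outside the 128-254 range left to
applications, as described above.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write "heap.alloc_class.<id>.desc=<unit_size>,0,<units>,<header>" into
 * buf. header is one of the string values: legacy, compact or none.
 * Returns 0 on success, -1 if class_id is reserved or out of range, or if
 * buf is too small.
 */
int format_alloc_class_query(char *buf, size_t len, unsigned class_id,
	size_t unit_size, unsigned units_per_block, const char *header)
{
	int n;

	if (class_id < 128 || class_id > 254)
		return -1;

	n = snprintf(buf, len, "heap.alloc_class.%u.desc=%zu,0,%u,%s",
		class_id, unit_size, units_per_block, header);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling the helper with class id 128, unit size 500, 1000 units per block
and the compact header reproduces the example query shown above.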
258
For reading, the function returns 0 if successful. If the allocation
class does not exist, it sets errno to ENOENT and returns -1.

For writing, the function returns 0 if the allocation class has been
successfully created, -1 otherwise.
264
heap.alloc_class.new.desc | -w | - | - | struct pobj_alloc_class_desc |
- | integer, integer, integer, string
267
268 Same as heap.alloc_class.[class_id].desc, but instead of requiring the
269 user to provide the class_id, it automatically creates the allocation
270 class with the first available identifier.
271
272 This should be used when it's impossible to guarantee unique allocation
273 class naming in the application (e.g. when writing a library that uses
274 libpmemobj).
275
The identifier assigned to the new class will be stored in the class_id
field of the struct pobj_alloc_class_desc.
278
279 This function returns 0 if the allocation class has been successfully
280 created, -1 otherwise.
281
282 stats.enabled | rw | - | int | int | - | boolean
283
284 Enables or disables runtime collection of statistics. Statistics are
285 not recalculated after enabling; any operations that occur between dis‐
286 abling and re-enabling will not be reflected in subsequent values.
287
288 Statistics are disabled by default. Enabling them may have non-trivial
289 performance impact.
290
291 Always returns 0.
292
293 stats.heap.curr_allocated | r- | - | int | - | - | -
294
295 Returns the number of bytes currently allocated in the heap. If sta‐
296 tistics were disabled at any time in the lifetime of the heap, this
297 value may be inaccurate.
298
299 heap.size.granularity | rw- | - | uint64_t | uint64_t | - | long long
300
Reads or modifies the granularity with which the heap grows when it
runs out of memory. Valid only if the poolset has been defined with
directories.
303
304 A granularity of 0 specifies that the pool will not grow automatically.
305
306 This function returns 0 if the granularity value is 0, or is larger
307 than PMEMOBJ_MIN_PART, -1 otherwise.
308
309 heap.size.extend | --x | - | - | - | uint64_t | -
310
Extends the heap by the given size. Must be larger than
PMEMOBJ_MIN_PART.
313
314 This function returns 0 if successful, -1 otherwise.

CTL EXTERNAL CONFIGURATION
In addition to a direct function call, each write entry point can also
be set using two alternative methods.
319
The first method is to load a configuration directly from the
PMEMOBJ_CONF environment variable. A properly formatted ctl config
string is a single-line sequence of queries separated by ';':
323
324 query0;query1;...;queryN
325
326 A single query is constructed from the name of the ctl write entry
327 point and the argument, separated by '=':
328
329 entry_point=entry_point_argument
330
331 The entry point argument type is defined by the entry point itself, but
332 there are three predefined primitives:
333
334 *) integer: represented by a sequence of [0-9] characters that form
335 a single number.
336 *) boolean: represented by a single character: y/n/Y/N/0/1, each
337 corresponds to true or false. If the argument contains any
338 trailing characters, they are ignored.
339 *) string: a simple sequence of characters.
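
The boolean primitive above looks only at the first character of the
argument. A minimal sketch of that rule, using a hypothetical function
name and an assumed -1 return value for unrecognized input:

```c
/*
 * Interpret a ctl boolean argument. Only the first character matters;
 * anything after it is ignored. Returns 1 for true, 0 for false and -1
 * if the first character is not one of y/n/Y/N/0/1.
 */
int parse_ctl_boolean(const char *s)
{
	switch (s[0]) {
	case 'y':
	case 'Y':
	case '1':
		return 1;
	case 'n':
	case 'N':
	case '0':
		return 0;
	default:
		return -1;
	}
}
```

Under this rule "yes", "Y" and "1" all read as true, and "0abc" reads as
false because the trailing characters are skipped.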
340
341 There are also complex argument types that are formed from the primi‐
342 tives separated by a ',':
343
344 first_arg,second_arg
345
346 In summary, a full configuration sequence looks like this:
347
348 (first_entry_point)=(arguments, ...);...;(last_entry_point)=(arguments, ...);
349
350 As an example, to set both prefault at_open and at_create variables:
351
352 PMEMOBJ_CONF="prefault.at_open=1;prefault.at_create=1"
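
To make the query syntax concrete, the sketch below extracts the n-th
"entry_point=argument" pair from a configuration string. It illustrates
the format only and is not the parser used by the library; get_query() is
a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the name and argument of the n-th query (counting from 0) of a
 * ';'-separated config string into name and arg. Returns 0 on success,
 * -1 if there is no such query, the query has no '=', the name is empty,
 * or an output buffer is too small.
 */
int get_query(const char *conf, unsigned n, char *name, size_t name_len,
	char *arg, size_t arg_len)
{
	const char *p = conf;
	const char *eq;
	size_t qlen, nlen, alen;

	/* skip the first n queries */
	while (n-- > 0) {
		p = strchr(p, ';');
		if (p == NULL)
			return -1;
		p++;
	}

	qlen = strcspn(p, ";");		/* length of this query */
	eq = memchr(p, '=', qlen);
	if (eq == NULL)
		return -1;

	nlen = (size_t)(eq - p);
	alen = qlen - nlen - 1;
	if (nlen == 0 || nlen >= name_len || alen >= arg_len)
		return -1;

	memcpy(name, p, nlen);
	name[nlen] = '\0';
	memcpy(arg, eq + 1, alen);
	arg[alen] = '\0';
	return 0;
}
```

Applied to the example string above, query 0 yields the name
prefault.at_open with the argument 1, query 1 yields prefault.at_create
with the argument 1, and query 2 does not exist.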
353
354 The second method of loading an external configuration is to set the
355 PMEMOBJ_CONF_FILE environment variable to point to a file that contains
356 a sequence of ctl queries. The parsing rules are all the same, but the
357 file can also contain white-spaces and comments.
358
To create a comment, simply use '#' anywhere in a line and everything
afterwards, until a newline '\n', will be ignored.
361
362 An example configuration file:
363
#########################
# My pmemobj configuration
#########################
#
# Global settings:
prefault. # modify the behavior of pre-faulting
        at_open = 1; # prefault when the pool is opened

prefault.
        at_create = 0; # but don't prefault when it's created

# Per-pool settings:
# ...
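
The effect of the comment and white-space rules can be sketched as a
function that flattens the text of a configuration file into the
single-line form accepted by PMEMOBJ_CONF. This is an illustration under
the assumption that white-space is never significant inside a query; it
is not the library's parser, and flatten_ctl_config() is a hypothetical
name.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Copy src to dst while dropping '#' comments (up to the end of the line)
 * and all white-space. Returns 0 on success, -1 if dst is too small.
 */
int flatten_ctl_config(const char *src, char *dst, size_t dst_len)
{
	size_t out = 0;
	int in_comment = 0;

	if (dst_len == 0)
		return -1;

	for (; *src != '\0'; src++) {
		if (*src == '\n') {
			in_comment = 0;		/* a comment ends with its line */
			continue;
		}
		if (*src == '#')
			in_comment = 1;
		if (in_comment || isspace((unsigned char)*src))
			continue;
		if (out + 1 >= dst_len)
			return -1;
		dst[out++] = *src;
	}
	dst[out] = '\0';
	return 0;
}
```

Given the example file above, this sketch produces
prefault.at_open=1;prefault.at_create=0; as the flattened sequence of
queries.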

SEE ALSO
379 libpmemobj(7) and <http://pmem.io>

PMDK - pmemobj API version 2.3          2018-03-13          PMEMOBJ_CTL_GET(3)