LVMCACHE(7)                                                  LVMCACHE(7)

lvmcache — LVM caching

lvm(8) includes two kinds of caching that can be used to improve the
performance of a Logical Volume (LV). When caching, varying subsets of
an LV's data are temporarily stored on a smaller, faster device (e.g.
an SSD) to improve the performance of the LV.

To do this with lvm, a new special LV is first created from the faster
device. This LV will hold the cache. Then the new fast LV is attached
to the main LV with an lvconvert command. lvconvert inserts one of the
device mapper caching targets into the main LV's I/O path. The device
mapper target combines the main LV and fast LV into a hybrid device
that looks like the main LV but has better performance. While the main
LV is in use, portions of its data are temporarily and transparently
stored on the special fast LV.

The two kinds of caching are:

• A read and write hot-spot cache, using the dm-cache kernel module.
  This cache tracks access patterns and adjusts its content deliberately
  so that commonly used parts of the main LV are likely to be found on
  the fast storage. LVM refers to this using the LV type cache.

• A write cache, using the dm-writecache kernel module. This cache can
  be used with SSD or PMEM devices to speed up all writes to the main
  LV. Data read from the main LV is not stored in the cache, only newly
  written data. LVM refers to this using the LV type writecache.

1. Identify the main LV that needs caching
The main LV may already exist, and is located on larger, slower
devices. A main LV would be created with a command like:

# lvcreate -n main -L Size vg /dev/slow_hdd

2. Identify the fast LV to use as the cache
A fast LV is created using one or more fast devices, like an SSD. This
special LV will be used to hold the cache:

# lvcreate -n fast -L Size vg /dev/fast_ssd

# lvs -a
LV   Attr       Type   Devices
fast -wi------- linear /dev/fast_ssd
main -wi------- linear /dev/slow_hdd

3. Start caching the main LV
To start caching the main LV, convert the main LV to the desired
caching type, and specify the fast LV to use as the cache:

using dm-cache (with cachepool):

# lvconvert --type cache --cachepool fast vg/main

using dm-cache (with cachevol):

# lvconvert --type cache --cachevol fast vg/main

using dm-writecache (with cachevol):

# lvconvert --type writecache --cachevol fast vg/main

For more alternatives see:
dm-cache command shortcut
dm-cache with separate data and metadata LVs
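The three variants above differ only in the --type value and in whether the fast LV is passed with --cachepool or --cachevol. The following is an illustrative sketch only. It defines a hypothetical shell function, attach_cmd, that prints the matching lvconvert command but does not run it. The function rejects writecache combined with --cachepool. That restriction is an assumption based on the option descriptions in this page, which list --cachepool for dm-cache only.

```sh
# attach_cmd TYPE METHOD FAST_LV VG/MAIN_LV
# Hypothetical helper: prints, but does not run, the lvconvert command
# that attaches FAST_LV to the main LV.
#   TYPE:   cache | writecache
#   METHOD: cachepool | cachevol
attach_cmd() {
    local ctype="$1" method="$2" fast="$3" main="$4"
    case "$ctype" in
        cache|writecache) ;;
        *) echo "unknown cache type: $ctype" >&2; return 1 ;;
    esac
    case "$method" in
        cachevol) ;;
        cachepool)
            # Cache pools are described for dm-cache only; writecache
            # is attached with a cachevol.
            if [ "$ctype" = writecache ]; then
                echo "writecache requires --cachevol" >&2
                return 1
            fi
            ;;
        *) echo "unknown attach method: $method" >&2; return 1 ;;
    esac
    echo "lvconvert --type $ctype --$method $fast $main"
}
```

For example, attach_cmd writecache cachevol fast vg/main prints the third command above. You can review the output before running it as root.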

4. Display LVs
Once the fast LV has been attached to the main LV, lvm reports the main
LV type as either cache or writecache depending on the type used.
While attached, the fast LV is hidden, and renamed with a _cvol or
_cpool suffix. It is displayed by lvs -a. The _corig or _wcorig LV
represents the original LV without the cache.

using dm-cache (with cachepool):

# lvs -ao+devices
LV                 Pool         Type       Devices
main               [fast_cpool] cache      main_corig(0)
[fast_cpool]                    cache-pool fast_cpool_cdata(0)
[fast_cpool_cdata]              linear     /dev/fast_ssd
[fast_cpool_cmeta]              linear     /dev/fast_ssd
[main_corig]                    linear     /dev/slow_hdd

using dm-cache (with cachevol):

# lvs -ao+devices
LV           Pool        Type   Devices
main         [fast_cvol] cache  main_corig(0)
[fast_cvol]              linear /dev/fast_ssd
[main_corig]             linear /dev/slow_hdd

using dm-writecache (with cachevol):

# lvs -ao+devices
LV            Pool        Type       Devices
main          [fast_cvol] writecache main_wcorig(0)
[fast_cvol]               linear     /dev/fast_ssd
[main_wcorig]             linear     /dev/slow_hdd
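The hidden names in these listings follow a fixed pattern derived from the fast and main LV names. The sketch below uses a hypothetical function name, hidden_names. It reproduces that pattern from the three listings above so that a script can predict which bracketed LVs lvs -a should show. It covers only the cases shown here.

```sh
# hidden_names TYPE METHOD FAST MAIN
# Hypothetical helper: prints the hidden LV names (without brackets)
# that lvs -a shows once FAST is attached to MAIN, following the
# naming used in the listings above.
hidden_names() {
    local ctype="$1" method="$2" fast="$3" main="$4"
    if [ "$method" = cachepool ]; then
        echo "${fast}_cpool"
        echo "${fast}_cpool_cdata"   # cache data sub-LV
        echo "${fast}_cpool_cmeta"   # cache metadata sub-LV
    else
        echo "${fast}_cvol"
    fi
    # The original, uncached LV appears under a _corig or _wcorig name.
    if [ "$ctype" = writecache ]; then
        echo "${main}_wcorig"
    else
        echo "${main}_corig"
    fi
}
```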

5. Use the main LV
Use the LV until the cache is no longer wanted, or needs to be changed.

6. Stop caching
To stop caching the main LV and also remove the unneeded cache pool,
use the --uncache option:

# lvconvert --uncache vg/main

# lvs -a
LV   VG Attr       Type   Devices
main vg -wi------- linear /dev/slow_hdd

To stop caching the main LV but keep the fast LV, separate the fast LV
from the main LV with --splitcache. This changes the type of the main
LV back to what it was before the cache was attached.

# lvconvert --splitcache vg/main

# lvs -a
LV   VG Attr       Type   Devices
fast vg -wi------- linear /dev/fast_ssd
main vg -wi------- linear /dev/slow_hdd

7. Create a new LV with caching
A new LV can be created with caching attached at the time of creation
using the following command:

# lvcreate --type cache|writecache -n Name -L Size \
    --cachedevice /dev/fast_ssd vg /dev/slow_hdd

The main LV is created with the specified Name and Size from the
slow_hdd. A hidden fast LV is created on the fast_ssd and is then
attached to the new main LV. If the fast_ssd is unused, the entire disk
will be used as the cache unless the --cachesize option is used to
specify a size for the fast LV. The --cachedevice option can be
repeated to use multiple disks for the fast LV.
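When several fast disks are used, the command gains one --cachedevice option per disk. The following sketch is a hypothetical dry-run helper, create_cached_cmd. It assembles such an lvcreate command from a list of fast PVs and adds --cachesize only when a size is given. The argument order is an assumption chosen for this example.

```sh
# create_cached_cmd TYPE NAME SIZE VG SLOW_PV CACHESIZE FAST_PV...
# Hypothetical helper: prints (does not run) an lvcreate command that
# creates a new LV with a cache attached, using one --cachedevice option
# per fast PV. Pass "" as CACHESIZE to omit --cachesize, in which case
# lvm uses the whole of each unused fast device.
create_cached_cmd() {
    if [ "$#" -lt 7 ]; then
        echo "usage: create_cached_cmd TYPE NAME SIZE VG SLOW_PV CACHESIZE FAST_PV..." >&2
        return 1
    fi
    local ctype="$1" name="$2" size="$3" vg="$4" slow="$5" csize="$6"
    shift 6
    local cmd="lvcreate --type $ctype -n $name -L $size"
    local dev
    for dev in "$@"; do
        cmd="$cmd --cachedevice $dev"
    done
    if [ -n "$csize" ]; then
        cmd="$cmd --cachesize $csize"
    fi
    echo "$cmd $vg $slow"
}
```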

option args
--cachepool CachePoolLV|LV

Pass this option a cachepool LV or a standard LV. When using a cache
pool, lvm places cache data and cache metadata on different LVs. The
two LVs together are called a cache pool. This gives slightly better
performance for dm-cache and permits specific placement and segment
type selection for the data and metadata volumes. A cache pool is
represented as a special type of LV that cannot be used directly. If a
standard LV is passed with this option, lvm first converts it to a
cache pool by combining it with another LV to use for metadata. This
option can be used with dm-cache.

--cachevol LV

Pass this option a fast LV that should be used to hold the cache. With
a cachevol, cache data and metadata are stored in different parts of
the same fast LV. This option can be used with dm-writecache or
dm-cache.

--cachedevice PV

This option can be used in place of --cachevol, in which case a
cachevol LV will be created using the specified device. This option
can be repeated to create a cachevol using multiple devices, or a tag
name can be specified, in which case the cachevol will be created
using any of the devices with the given tag. If a named cache device
is unused, the entire device will be used to create the cachevol. To
create a cachevol of a specific size from the cache devices, include
the --cachesize option.

dm-cache block size
A cache pool will have a logical block size of 4096 bytes if it is
created on a device with a logical block size of 4096 bytes.

If a main LV has a logical block size of 512 (with an existing xfs
file system using that size), then it cannot use a cache pool with a
4096 logical block size. If the cache pool is attached, the main LV
will likely fail to mount.

To avoid this problem, use a mkfs option to specify a 4096 block size
for the file system, or attach the cache pool before running mkfs.
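The rule above can be checked before a cache pool is attached. This is a minimal sketch with a hypothetical function name, pool_fits. It assumes the caller already knows two values:

- the file system block size (for xfs, the sectsz value reported by xfs_info);
- the logical block size of the fast device, which blockdev --getss prints.

```sh
# pool_fits FS_BLOCK_SIZE POOL_LOGICAL_BLOCK_SIZE
# Hypothetical check of the rule above: a file system using a 512-byte
# block size is expected to fail to mount once a cache pool with a
# 4096-byte logical block size is attached.
# Returns 0 when the combination is expected to work.
pool_fits() {
    local fs_bs="$1" pool_bs="$2"
    if [ "$fs_bs" -eq 512 ] && [ "$pool_bs" -eq 4096 ]; then
        echo "block size mismatch: use a 4096 block size in mkfs," \
             "or attach the cache pool before running mkfs" >&2
        return 1
    fi
    return 0
}
```

For example, pool_fits 512 "$(blockdev --getss /dev/fast_ssd)" fails when the fast device reports 4096.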

dm-writecache block size
The dm-writecache block size can be 4096 bytes (the default) or 512
bytes. The default 4096 has better performance and should be used
except when 512 is necessary for compatibility. The dm-writecache
block size is specified with --cachesettings block_size=4096|512 when
caching is started.

When a file system like xfs already exists on the main LV prior to
caching, and the file system is using a block size of 512, then the
writecache block size should be set to 512. (The file system will
likely fail to mount if a writecache block size of 4096 is used in
this case.)

Check the xfs sector size while the file system is mounted:

# xfs_info /dev/vg/main
Look for sectsz=512 or sectsz=4096

The writecache block size should be chosen to match the xfs sectsz
value.

It is also possible to specify a sector size of 4096 to mkfs.xfs when
creating the file system. In this case the writecache block size of
4096 can be used.

The writecache block size is displayed by the command:
lvs -o writecacheblocksize VG/LV
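The choice of block size can be scripted from the xfs_info output. The following sketch defines a hypothetical function, writecache_bs_from_xfs, which works as follows:

- it reads xfs_info output on standard input;
- it takes the first sectsz= field, assumed here to be the one in the meta-data section at the top of the output, before the log section's own sectsz;
- it prints the matching block_size setting for --cachesettings.

```sh
# writecache_bs_from_xfs: read xfs_info output on stdin and print the
# block_size setting that matches the file system's sector size.
# Hypothetical helper; it uses the first sectsz= field in the output.
writecache_bs_from_xfs() {
    local sectsz
    sectsz=$(grep -o 'sectsz=[0-9]*' | head -n 1 | cut -d= -f2)
    case "$sectsz" in
        512|4096) echo "block_size=$sectsz" ;;
        *) echo "unexpected or missing sectsz: '$sectsz'" >&2; return 1 ;;
    esac
}
```

Its output, for example from xfs_info /dev/vg/main | writecache_bs_from_xfs, can be passed as the --cachesettings value when caching is started.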

dm-writecache memory usage
The amount of main system memory used by dm-writecache can be a factor
when selecting the writecache cachevol size and the writecache block
size.

• writecache block size 4096: each 100 GiB of writecache cachevol uses
  slightly over 2 GiB of system memory.

• writecache block size 512: each 100 GiB of writecache cachevol uses a
  little over 16 GiB of system memory.
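The figures above scale roughly linearly with the cachevol size, so a rough memory budget can be computed before the cachevol is created. The sketch below uses a hypothetical function name, wc_mem_floor_mib. It multiplies by exactly 2 GiB or 16 GiB per 100 GiB. The text says real usage is slightly higher, so treat the result as a lower bound, not a measurement.

```sh
# wc_mem_floor_mib CACHEVOL_GIB BLOCK_SIZE
# Hypothetical estimate: lower bound, in MiB, of system memory used by
# dm-writecache, scaled linearly from the per-100-GiB figures above.
# Integer arithmetic; the result is rounded down.
wc_mem_floor_mib() {
    local gib="$1" bs="$2" per100
    case "$gib" in
        ''|*[!0-9]*) echo "cachevol size must be an integer in GiB" >&2; return 1 ;;
    esac
    case "$bs" in
        4096) per100=2048 ;;   # about 2 GiB per 100 GiB of cachevol
        512)  per100=16384 ;;  # about 16 GiB per 100 GiB of cachevol
        *) echo "block size must be 4096 or 512" >&2; return 1 ;;
    esac
    echo $((gib * per100 / 100))
}
```

For example, a 400 GiB cachevol with the default 4096 block size gives 8192, i.e. a little over 8 GiB of RAM.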

dm-writecache settings
To specify dm-writecache tunable settings on the command line, use:
--cachesettings 'option=N' or
--cachesettings 'option1=N option2=N ...'

For example, --cachesettings 'high_watermark=90 writeback_jobs=4'.

To include settings when caching is started, run:

# lvconvert --type writecache --cachevol fast \
    --cachesettings 'option=N' vg/main

To change settings for an existing writecache, run:

# lvchange --cachesettings 'option=N' vg/main

To clear all settings that have been applied, run:

# lvchange --cachesettings '' vg/main

To view the settings that are applied to a writecache LV, run:

# lvs -o cachesettings vg/main

Tunable settings are:

high_watermark = <percent>
    Start writeback when the writecache usage reaches this percent
    (0-100).

low_watermark = <percent>
    Stop writeback when the writecache usage reaches this percent
    (0-100).

writeback_jobs = <count>
    Limit the number of blocks that are in flight during writeback.
    Setting this value reduces writeback throughput, but it may
    improve the latency of read requests.

autocommit_blocks = <count>
    When the application writes this number of blocks without issuing
    a FLUSH request, the blocks are automatically committed.

autocommit_time = <milliseconds>
    The data is automatically committed if this time passes and no
    FLUSH request is received.

fua = 0|1
    Use the FUA flag when writing data from persistent memory back to
    the underlying device. Applicable only to persistent memory.

nofua = 0|1
    Don't use the FUA flag when writing back data, and send the FLUSH
    request afterwards. Some underlying devices perform better with
    fua, some with nofua. Testing is necessary to determine which.
    Applicable only to persistent memory.

cleaner = 0|1
    Setting cleaner=1 enables the writecache cleaner mode, in which
    data is gradually flushed from the cache. If this is done prior
    to detaching the writecache, then the splitcache command will
    have little or no flushing to perform. If not done beforehand,
    the splitcache command enables the cleaner mode and waits for
    flushing to complete before detaching the writecache. Adding
    cleaner=0 to the splitcache command will skip the cleaner mode,
    and any required flushing is performed in device suspend.

max_age = <milliseconds>
    Specifies the maximum age of a block in milliseconds. If a block
    is stored in the cache for too long, it will be written to the
    underlying device and cleaned up.

metadata_only = 0|1
    Only metadata is promoted to the cache. This option improves
    performance for heavier REQ_META workloads.

pause_writeback = <milliseconds>
    Pause writeback if there was some write I/O redirected to the
    origin volume within the specified number of milliseconds.
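A mistyped setting name or an out-of-range percentage is easy to slip into a quoted --cachesettings string, so a script may want to check the string first. The following is a minimal sketch using a hypothetical function name, check_wc_settings. It accepts only the tunables listed above plus block_size. It enforces the ranges stated there: 0-100 for the watermarks, and 0 or 1 for the on/off settings. lvm and the kernel still perform their own validation; this only catches obvious mistakes earlier.

```sh
# check_wc_settings 'key=val key=val ...'
# Hypothetical pre-flight check of a dm-writecache --cachesettings
# string against the tunables listed above. An empty string is
# accepted, since '' clears all settings.
check_wc_settings() {
    local pair key val
    for pair in $1; do
        key=${pair%%=*}
        val=${pair#*=}
        # Every listed tunable takes a non-negative integer value.
        case "$val" in
            ''|*[!0-9]*)
                echo "$key: value must be a non-negative integer" >&2
                return 1 ;;
        esac
        case "$key" in
            high_watermark|low_watermark)
                if [ "$val" -gt 100 ]; then
                    echo "$key: percent must be 0-100" >&2; return 1
                fi ;;
            fua|nofua|cleaner|metadata_only)
                if [ "$val" -gt 1 ]; then
                    echo "$key: must be 0 or 1" >&2; return 1
                fi ;;
            block_size)
                if [ "$val" -ne 512 ] && [ "$val" -ne 4096 ]; then
                    echo "block_size: must be 512 or 4096" >&2; return 1
                fi ;;
            writeback_jobs|autocommit_blocks|autocommit_time|max_age|pause_writeback)
                ;;
            *)
                echo "unknown writecache setting: $key" >&2
                return 1 ;;
        esac
    done
    return 0
}
```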

dm-writecache using metadata profiles
In addition to specifying writecache settings on the command line, they
can also be set in lvm.conf, or in a profile file, using the
allocation/cache_settings/writecache config structure shown below.

It's possible to prepare a number of different profile files in the
/etc/lvm/profile directory and specify the profile name (the file name
without the .profile suffix) when starting writecache.

Example
# cat <<EOF > /etc/lvm/profile/cache_writecache.profile
allocation {
    cache_settings {
        writecache {
            high_watermark=60
            writeback_jobs=1024
        }
    }
}
EOF

# lvcreate -an -L10G --name fast vg /dev/fast_ssd
# lvcreate --type writecache -L10G --name main --cachevol fast \
    --metadataprofile cache_writecache vg /dev/slow_hdd
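Profiles like the one above can also be generated from the same key=value list used with --cachesettings. The sketch below is a hypothetical helper, write_wc_profile. It writes NAME.profile into a given directory using the allocation/cache_settings/writecache structure shown above. For real use, pass /etc/lvm/profile as the directory, then refer to the profile by NAME with --metadataprofile. The helper does not validate the settings.

```sh
# write_wc_profile DIR NAME 'key=val key=val ...'
# Hypothetical helper: writes DIR/NAME.profile containing the
# allocation/cache_settings/writecache structure shown above.
write_wc_profile() {
    local dir="$1" name="$2" pair
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    {
        echo "allocation {"
        echo "    cache_settings {"
        echo "        writecache {"
        for pair in $3; do
            echo "            $pair"
        done
        echo "        }"
        echo "    }"
        echo "}"
    } > "$dir/$name.profile"
}
```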

dm-cache with separate data and metadata LVs
The preferred way of using dm-cache is to place the cache metadata and
cache data on separate LVs. To do this, a "cache pool" is created,
which is a special LV that references two sub-LVs, one for data and one
for metadata.

To create a cache pool of a given data size and let lvm2 calculate an
appropriate metadata size:

# lvcreate --type cache-pool -L DataSize -n fast vg /dev/fast_ssd1

To create a cache pool from a separate LV and let lvm2 calculate an
appropriate cache metadata size:

# lvcreate -n fast -L DataSize vg /dev/fast_ssd1
# lvconvert --type cache-pool vg/fast /dev/fast_ssd1

To create a cache pool from two separate LVs:

# lvcreate -n fast -L DataSize vg /dev/fast_ssd1
# lvcreate -n fastmeta -L MetadataSize vg /dev/fast_ssd2
# lvconvert --type cache-pool --poolmetadata fastmeta vg/fast

Then use the cache pool LV to start caching the main LV:

# lvconvert --type cache --cachepool fast vg/main

A variation of the same procedure automatically creates a cache pool
when caching is started. To do this, use a standard LV as the
--cachepool (this will hold cache data), and use another standard LV as
the --poolmetadata (this will hold cache metadata). LVM will create a
cache pool LV from the two specified LVs, and use the cache pool to
start caching the main LV.

# lvcreate -n fast -L DataSize vg /dev/fast_ssd1
# lvcreate -n fastmeta -L MetadataSize vg /dev/fast_ssd2
# lvconvert --type cache --cachepool fast \
    --poolmetadata fastmeta vg/main

dm-cache cache modes
The default dm-cache cache mode is "writethrough". Writethrough
ensures that any data written will be stored both in the cache and on
the origin LV. The loss of a device associated with the cache in this
case would not mean the loss of any data.

A second cache mode is "writeback". Writeback delays writing data
blocks from the cache back to the origin LV. This mode will increase
performance, but the loss of a cache device can result in lost data.

With the --cachemode option, the cache mode can be set when caching is
started, or changed on an LV that is already cached. The current cache
mode can be displayed with the cache_mode reporting option:

lvs -o+cache_mode VG/LV

lvm.conf(5) allocation/cache_mode
defines the default cache mode.

# lvconvert --type cache --cachemode writethrough \
    --cachepool fast vg/main

# lvconvert --type cache --cachemode writethrough \
    --cachevol fast vg/main

dm-cache chunk size
The size of data blocks managed by dm-cache can be specified with the
--chunksize option when caching is started. The default unit is KiB.
The value must be a multiple of 32 KiB between 32 KiB and 1 GiB. Cache
chunks larger than 512 KiB should be used only when necessary.

Using a chunk size that is too large can result in wasteful use of the
cache, in which small reads and writes cause large sections of an LV to
be stored in the cache. It can also require increasing the migration
threshold, which defaults to 2048 sectors (1 MiB). lvm2 ensures that
the migration threshold is at least 8 chunks in size. In some cases
this may result in a very high bandwidth load from transferring data
between the cache LV and its cache origin LV. However, choosing a chunk
size that is too small can result in more overhead trying to manage
the numerous chunks that become mapped into the cache. Overhead can
include both excessive CPU time searching for chunks, and excessive
memory tracking chunks.

Command to display the chunk size:

lvs -o+chunksize VG/LV

lvm.conf(5) allocation/cache_pool_chunk_size
controls the default chunk size.

The default value is shown by:

lvmconfig --type default allocation/cache_pool_chunk_size

To check the migration threshold (in sectors) of a running cached LV:
lvs -o+kernel_cache_settings VG/LV
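The chunk size rules and the 8-chunk floor on the migration threshold can be checked quickly before caching is started. The following sketch defines a hypothetical function, check_chunk, that takes a chunk size in KiB. It rejects values that break the rules stated above and warns above 512 KiB. On success it prints the smallest migration threshold consistent with the text, in 512-byte sectors: the larger of the 2048-sector default and 8 chunks. It models only the rules stated in this page, not lvm2's actual code.

```sh
# check_chunk CHUNK_KIB
# Hypothetical check of the --chunksize rules described above (value in
# KiB). On success it prints the minimum migration threshold, in
# 512-byte sectors, implied by the text: max(2048 sectors, 8 chunks).
check_chunk() {
    local kib="$1" min_sectors
    # Leading zeros are rejected so shell arithmetic never sees octal.
    case "$kib" in
        ''|0*|*[!0-9]*)
            echo "chunk size must be a positive integer in KiB" >&2
            return 1 ;;
    esac
    if [ "$kib" -lt 32 ] || [ "$kib" -gt 1048576 ] || [ $((kib % 32)) -ne 0 ]; then
        echo "chunk size must be a multiple of 32 KiB between 32 KiB and 1 GiB" >&2
        return 1
    fi
    if [ "$kib" -gt 512 ]; then
        echo "warning: chunks larger than 512 KiB should be used only when necessary" >&2
    fi
    min_sectors=$((8 * kib * 2))   # 8 chunks; 1 KiB = 2 sectors of 512 bytes
    if [ "$min_sectors" -lt 2048 ]; then
        min_sectors=2048           # default migration threshold (1 MiB)
    fi
    echo "$min_sectors"
}
```

For example, check_chunk 512 prints 8192, i.e. 4 MiB of migration per 8 chunks.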

dm-cache migration threshold
Migrating data between the origin and cache LV uses bandwidth. The
user can set a throttle to prevent more than a certain amount of
migration occurring at any one time. Currently dm-cache does not take
normal I/O traffic to the devices into account.

The user can set the migration threshold via cache policy settings as
"migration_threshold=<#sectors>" to set the maximum number of sectors
being migrated, the default being 2048 sectors (1 MiB).

Command to set the migration threshold to 2 MiB (4096 sectors):

lvchange --cachesettings 'migration_threshold=4096' VG/LV

Command to display the migration threshold:

lvs -o+kernel_cache_settings,cache_settings VG/LV
lvs -o+chunksize VG/LV
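The threshold is given in 512-byte sectors, so a size in MiB must be multiplied by 2048, as in the 2 MiB = 4096 sectors example above. Below is a minimal sketch, with the hypothetical function name migration_setting, that produces the setting string.

```sh
# migration_setting MIB
# Hypothetical helper: converts a migration threshold given in MiB into
# the migration_threshold=<#sectors> form, using 512-byte sectors
# (1 MiB = 2048 sectors).
migration_setting() {
    local mib="$1"
    case "$mib" in
        ''|0*|*[!0-9]*) echo "size must be a positive integer in MiB" >&2; return 1 ;;
    esac
    echo "migration_threshold=$((mib * 2048))"
}
```

The result can be quoted into a --cachesettings argument, for example lvchange --cachesettings "$(migration_setting 2)" vg/main.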

dm-cache cache policy
The dm-cache subsystem has additional per-LV parameters: the cache
policy to use, and possibly tunable parameters for the cache policy.
Three policies are currently available: "smq" is the default policy,
"mq" is an older implementation, and "cleaner" is used to force the
cache to write back (flush) all cached writes to the origin LV.

The older "mq" policy has a number of tunable parameters. The defaults
are chosen to be suitable for the majority of systems, but in special
circumstances, changing the settings can improve performance.

With the --cachepolicy and --cachesettings options, the cache policy
and settings can be set when caching is started, or changed on an
existing cached LV (both options can be used together). The current
cache policy and settings can be displayed with the cache_policy and
cache_settings reporting options:

lvs -o+cache_policy,cache_settings VG/LV

Change the cache policy and settings of an existing LV:
# lvchange --cachepolicy mq --cachesettings \
    'migration_threshold=2048 random_threshold=4' vg/main

lvm.conf(5) allocation/cache_policy
defines the default cache policy.

lvm.conf(5) allocation/cache_settings
defines the default cache settings.

dm-cache using metadata profiles
Cache pools allow a variety of options to be set. Many of these
settings can be specified in lvm.conf or in profile settings. You can
prepare a number of different profiles in the /etc/lvm/profile
directory and just specify the metadata profile file name when caching
an LV or creating a cache pool. Check the output of
lvmconfig --type default --withcomments for a detailed description of
all individual cache settings.

Example
# cat <<EOF > /etc/lvm/profile/cache_big_chunk.profile
allocation {
    cache_pool_metadata_require_separate_pvs=0
    cache_pool_chunk_size=512
    cache_metadata_format=2
    cache_mode="writethrough"
    cache_policy="smq"
    cache_settings {
        smq {
            migration_threshold=8192
            random_threshold=4096
        }
    }
}
EOF

# lvcreate --cache -L10G --metadataprofile cache_big_chunk vg/main \
    /dev/fast_ssd
# lvcreate --cache -L10G vg/main --config \
    'allocation/cache_pool_chunk_size=512' /dev/fast_ssd

dm-cache spare metadata LV
See lvmthin(7) for a description of the "pool metadata spare" LV. The
same concept is used for cache pools.

dm-cache metadata formats
There are two disk formats for dm-cache metadata. The metadata format
can be specified with --cachemetadataformat when caching is started,
and cannot be changed. Format 2 has better performance; it is more
compact, and stores dirty bits in a separate btree, which improves the
speed of shutting down the cache. With auto, lvm selects the best
option provided by the current dm-cache kernel module.

RAID1 cache device
RAID1 can be used to create the fast LV holding the cache so that it
can tolerate a device failure. (When using dm-cache with separate data
and metadata LVs, each of the sub-LVs can use RAID1.)

# lvcreate -n main -L Size vg /dev/slow
# lvcreate --type raid1 -m 1 -n fast -L Size vg /dev/ssd1 /dev/ssd2
# lvconvert --type cache --cachevol fast vg/main

dm-cache command shortcut
A single command can be used to cache the main LV with automatic
creation of a cache pool:

# lvcreate --cache --size CacheDataSize VG/LV [FastPVs]

or the longer variant

# lvcreate --type cache --size CacheDataSize \
    --name NameCachePool VG/LV [FastPVs]

In this command, the specified LV already exists, and is the main LV to
be cached. The command creates a new cache pool with the given size and
name (if no name is given, one is selected automatically from the
sequence lvolX_cpool), using the optionally specified fast PV(s)
(typically an SSD). Then it attaches the new cache pool to the existing
main LV to begin caching.

(Note: ensure that the specified main LV is a standard LV. If a cache
pool LV is mistakenly specified, then the command does something
different.)

(Note: the type option is interpreted differently by this command than
by normal lvcreate commands, in which --type specifies the type of the
newly created LV. In this case, an LV with type cache-pool is being
created, and the existing main LV is being converted to type cache.)

lvm.conf(5), lvchange(8), lvcreate(8), lvdisplay(8), lvextend(8),
lvremove(8), lvrename(8), lvresize(8), lvs(8),
vgchange(8), vgmerge(8), vgreduce(8), vgsplit(8)

cache_check(8), cache_dump(8), cache_repair(8)

Red Hat, Inc         LVM TOOLS 2.03.18(2)-git (2022-11-10)         LVMCACHE(7)