BTRFS-BALANCE(8) BTRFS BTRFS-BALANCE(8)

btrfs-balance - balance block groups on a btrfs filesystem

btrfs balance <subcommand> <args>

The primary purpose of the balance feature is to spread block groups
across all devices so they match constraints defined by the respective
profiles. See mkfs.btrfs(8) section PROFILES for more details. The
scope of the balancing process can be further tuned by use of filters
that can select the block groups to process. Balance works only on a
mounted filesystem. Extent sharing is preserved and reflinks are not
broken. Files are neither defragmented nor recompressed; file extents
are preserved but their physical location on devices will change.

The balance operation is cancellable by the user. The on-disk state of
the filesystem is always consistent, so an unexpected interruption
(e.g. system crash, reboot) does not corrupt the filesystem. The
progress of the balance operation is temporarily stored as an internal
state and will be resumed upon mount, unless the mount option
skip_balance is specified.

WARNING:
Running balance without filters will take a lot of time as it
basically moves data/metadata from the whole filesystem and needs to
update all block pointers.

The filters can be used to perform the following actions:

• convert block group profiles (filter convert)

• make block group usage more compact (filter usage)

• perform actions only on a given device (filters devid, drange)

The filters can be applied to a combination of block group types (data,
metadata, system). Note that changing only the system type needs the
force option. Otherwise the system type gets automatically converted
whenever the metadata profile is converted.

When metadata redundancy is reduced (e.g. from RAID1 to single) the
force option is also required, and this is noted in the system log.

NOTE:
The balance operation needs enough work space, i.e. space that is
completely unused in the filesystem, otherwise this may lead to
ENOSPC reports. See the section ENOSPC for more details.

NOTE:
The balance subcommand also exists under the btrfs filesystem
namespace. This still works for backward compatibility but is
deprecated and should not be used any more.

NOTE:
A short syntax btrfs balance <path> works due to backward
compatibility but is deprecated and should not be used any more. Use
the btrfs balance start command instead.

Balancing operations are very IO intensive and can also be quite CPU
intensive, impacting other ongoing filesystem operations. Typically
large amounts of data are copied from one location to another, with
corresponding metadata updates.

Depending upon the block group layout, it can also be seek heavy.
Performance on rotational devices is noticeably worse compared to SSDs
or fast arrays.

cancel <path>
cancel a running or paused balance; the command will block and
wait until the block group currently being processed completes

Since kernel 5.7 the response time of the cancellation is
significantly improved; on older kernels it might take a long time
until the currently processed chunk is completely finished.

pause <path>
pause a running balance operation; this will store the state of
the balance progress and the used filters to the filesystem

resume <path>
resume an interrupted balance; the balance status must be stored on
the filesystem from a previous run, e.g. after it was paused or
forcibly interrupted and mounted again with skip_balance

start [options] <path>
start the balance operation according to the specified filters;
without any filters the data and metadata from the whole
filesystem are moved. The process runs in the foreground.

NOTE:
The balance command without filters will basically move
everything in the filesystem to a new physical location on
devices (i.e. it does not affect the logical properties of file
extents like offsets within files and extent sharing). The
run time is potentially very long, depending on the filesystem
size. To prevent starting a full balance by accident, the
user is warned and has a few seconds to cancel the operation
before it starts. The warning and delay can be skipped with
the --full-balance option.

Please note that the filters must be written together with the
-d, -m and -s options (with no space in between), because the
filters are optional and bare -d and -m are also valid and mean
no filters.
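
To illustrate how a filter string attaches to its type option, here is
a minimal shell sketch. The helper name balance_filter_arg is
hypothetical (it is not part of btrfs-progs); it only joins the given
filters with commas and glues them to -d, -m or -s without a space.

```shell
# Hypothetical helper: build one balance option such as -dusage=50,limit=2.
# $1 is the block group type letter (d, m or s); the rest are filters.
balance_filter_arg() {
    type_letter=$1
    shift
    # Join the remaining arguments with commas.
    filters=$(IFS=,; printf '%s' "$*")
    # No space between the option and its filters.
    printf '%s\n' "-${type_letter}${filters}"
}

balance_filter_arg d usage=50 limit=2     # prints -dusage=50,limit=2
balance_filter_arg m                      # prints -m (bare option, no filters)
balance_filter_arg d 'profiles=raid1|raid10' soft
```

The resulting string would be passed as a single argument, for example
btrfs balance start "$(balance_filter_arg d usage=50)" /path. A profile
list containing | has to be quoted so that the shell does not treat it
as a pipe.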
112
NOTE:
When the target profile for the conversion filter is raid5 or
raid6, there's a safety timeout of 10 seconds to warn users
about the status of the feature.

Options

-d[<filters>]
act on data block groups, see FILTERS section for details
about filters

-m[<filters>]
act on metadata chunks, see FILTERS section for details
about filters

-s[<filters>]
act on system chunks (requires -f), see FILTERS section
for details about filters

-f force a reduction of metadata integrity, e.g. when going
from raid1 to single, or skip the safety timeout when the
target conversion profile is raid5 or raid6

--background|--bg
run the balance operation asynchronously in the
background, uses fork(2) to start the process that calls the
kernel ioctl

--enqueue
wait if there's another exclusive operation running,
otherwise continue

-v (deprecated) alias for the global -v option

status [-v] <path>
Show status of running or paused balance.

Options

-v (deprecated) alias for the global -v option

FILTERS

From kernel 3.3 onwards, btrfs balance can limit its action to a subset
of the whole filesystem, and can be used to change the replication
configuration (e.g. moving data from single to RAID1). This
functionality is accessed through the -d, -m or -s options to btrfs
balance start, which filter on data, metadata and system blocks
respectively.

A filter has the following structure: type[=params][,type=...]

The available types are:

profiles=<profiles>
Balances only block groups with the given profiles. Parameters
are a list of profile names separated by "|" (pipe).

usage=<percent>, usage=<range>
Balances only block groups with usage under the given
percentage. The value of 0 is allowed and will clean up completely
unused block groups; this should not require any new work space
to be allocated. You may want to use usage=0 in case balance is
returning ENOSPC and your filesystem is not too full.

The argument may be a single value or a range. The single value
N means at most N percent used, equivalent to ..N range syntax.
Kernels prior to 4.4 accept only the single value format. The
minimum range boundary is inclusive, maximum is exclusive.

devid=<id>
Balances only block groups which have at least one chunk on the
given device. To list devices with ids use btrfs filesystem
show.

drange=<range>
Balance only block groups which overlap with the given byte
range on any device. Use in conjunction with devid to filter on
a specific device. The parameter is a range specified as
start..end.

vrange=<range>
Balance only block groups which overlap with the given byte
range in the filesystem's internal virtual address space. This
is the address space that most reports from btrfs in the kernel
log use. The parameter is a range specified as start..end.

convert=<profile>
Convert each selected block group to the given profile name
identified by parameters.

NOTE:
Starting with kernel 4.5, the data chunks can be converted
to/from the DUP profile on a single device.

NOTE:
Starting with kernel 4.6, all profiles can be converted
to/from DUP on multi-device filesystems.

limit=<number>, limit=<range>
Process only the given number of chunks, after all filters are
applied. This can be used to specifically target a chunk in
connection with other filters (drange, vrange) or just simply limit
the amount of work done by a single balance run.

The argument may be a single value or a range. The single value
N means at most N chunks, equivalent to ..N range syntax.
Kernels prior to 4.4 accept only the single value format. The
range minimum and maximum are inclusive.

stripes=<range>
Balance only block groups which have the given number of
stripes. The parameter is a range specified as start..end. Makes
sense for block group profiles that utilize striping, i.e.
RAID0/10/5/6. The range minimum and maximum are inclusive.

soft Takes no parameters. Only has meaning when converting between
profiles. When doing convert from one profile to another and
soft mode is on, chunks that already have the target profile are
left untouched. This is useful e.g. when half of the filesystem
was converted earlier but got cancelled.

The soft mode switch is (like every other filter) per-type. For
example, this means that we can convert metadata chunks the
"hard" way while converting data chunks selectively with soft
switch.

Profile names, used in profiles and convert, are one of: raid0, raid1,
raid1c3, raid1c4, raid10, raid5, raid6, dup, single. The mixed
data/metadata profiles can be converted in the same way, but conversion
between mixed and non-mixed is not implemented. For the constraints
of the profiles please refer to mkfs.btrfs(8), section PROFILES.

ENOSPC

The way balance operates, it usually needs to temporarily create a new
block group and move the old data there, before the old block group can
be removed. For that it needs the work space, otherwise it fails for
ENOSPC reasons. This is not the same ENOSPC as if the free space is
exhausted. This refers to the space on the level of block groups, which
are bigger parts of the filesystem that contain many file extents.

The free work space can be calculated from the output of the btrfs
filesystem show command:

Label: 'BTRFS' uuid: 8a9d72cd-ead3-469d-b371-9c7203276265
Total devices 2 FS bytes used 77.03GiB
devid 1 size 53.90GiB used 51.90GiB path /dev/sdc2
devid 2 size 53.90GiB used 51.90GiB path /dev/sde1

size - used = free work space

53.90GiB - 51.90GiB = 2.00GiB
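
The same subtraction can be scripted. The following is a minimal
sketch, not an official tool: the function name free_work_space is made
up for this example, and it assumes the devid lines have exactly the
layout shown above, with size and used printed in the same unit (GiB
here).

```shell
# Hypothetical helper: print "path free" for each device, where
# free = size - used. Reads `btrfs filesystem show` output on stdin.
# Assumes the line layout: devid N size X used Y path P, with X and Y
# both in GiB (adding 0 in awk keeps only the leading number).
free_work_space() {
    awk '$1 == "devid" && $3 == "size" && $5 == "used" {
        printf "%s %.2fGiB\n", $8, ($4 + 0) - ($6 + 0)
    }'
}

# Sample input copied from the output above.
free_work_space <<'EOF'
Label: 'BTRFS' uuid: 8a9d72cd-ead3-469d-b371-9c7203276265
Total devices 2 FS bytes used 77.03GiB
devid 1 size 53.90GiB used 51.90GiB path /dev/sdc2
devid 2 size 53.90GiB used 51.90GiB path /dev/sde1
EOF
```

On a live system the input would come from a pipe instead, for example
btrfs filesystem show /path | free_work_space.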
264
An example of a filter that does not require work space is usage=0.
This will scan through all unused block groups of a given type and will
reclaim the space. After that it might be possible to run other
filters.

CONVERSIONS ON MULTIPLE DEVICES

Conversion to profiles based on striping (RAID0, RAID5/6) requires
work space on each device. An interrupted balance may leave partially
filled block groups that consume the work space.

A more comprehensive example when going from one to multiple devices,
and back, can be found in section TYPICAL USECASES of btrfs-device(8).

MAKING BLOCK GROUP LAYOUT MORE COMPACT
The layout of block groups is not normally visible; most tools report
only summarized numbers of free or used space, but there are still some
hints provided.

Let's use the following real life example and start with the output:

$ btrfs filesystem df /path
Data, single: total=75.81GiB, used=64.44GiB
System, RAID1: total=32.00MiB, used=20.00KiB
Metadata, RAID1: total=15.87GiB, used=8.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Roughly calculating for data, 75G - 64G = 11G, the used/total ratio is
about 85%. How can we interpret that:

• chunks are filled by 85% on average, i.e. the usage filter with
anything smaller than 85 will likely not affect anything

• in a more realistic scenario, the space is distributed unevenly, we
can assume there are completely used chunks and the remaining are
partially filled
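
The ratio estimated above can also be computed directly from the df
output. This is a rough sketch under the assumption that each line has
the total=..., used=... layout shown above; the function name
bg_usage_ratio is hypothetical.

```shell
# Hypothetical helper: print the used/total percentage for each block
# group type. Reads `btrfs filesystem df` output on stdin.
bg_usage_ratio() {
    awk '
    # Convert a size such as "75.81GiB" or "0.00B" to bytes.
    function to_bytes(s,    n, unit) {
        n = s + 0                            # leading number
        unit = s; gsub(/[0-9.]/, "", unit)   # remaining unit suffix
        if (unit == "KiB") return n * 1024
        if (unit == "MiB") return n * 1024 ^ 2
        if (unit == "GiB") return n * 1024 ^ 3
        if (unit == "TiB") return n * 1024 ^ 4
        return n
    }
    /total=.*used=/ {
        kind = $1; sub(/,$/, "", kind)
        tot = $0; sub(/.*total=/, "", tot); sub(/,.*/, "", tot)
        usd = $0; sub(/.*used=/, "", usd)
        if (to_bytes(tot) > 0)
            printf "%s %.0f%%\n", kind, 100 * to_bytes(usd) / to_bytes(tot)
    }'
}

# Sample line copied from the output above; prints "Data 85%".
echo 'Data, single: total=75.81GiB, used=64.44GiB' | bg_usage_ratio
```

On a live system the input would be btrfs filesystem df /path piped
into the function. The result is only the average over all block groups
of a type, so it says nothing about how unevenly they are filled.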
301
Compacting the layout could be used on both. In the former case it
would spread data of a given chunk to the others and remove it. Here
we can estimate that roughly 850 MiB of data have to be moved (85% of a
1 GiB chunk).

In the latter case, targeting the partially used chunks will have to
move less data and thus will be faster. A typical filter command would
look like:

# btrfs balance start -dusage=50 /path
Done, had to relocate 2 out of 97 chunks

$ btrfs filesystem df /path
Data, single: total=74.03GiB, used=64.43GiB
System, RAID1: total=32.00MiB, used=20.00KiB
Metadata, RAID1: total=15.87GiB, used=8.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

As you can see, the total space allocated to data decreased by just
under 2 GiB (75.81GiB to 74.03GiB), which is an expected result. Let's
see what will happen when we increase the estimated usage filter.

# btrfs balance start -dusage=85 /path
Done, had to relocate 13 out of 95 chunks

$ btrfs filesystem df /path
Data, single: total=68.03GiB, used=64.43GiB
System, RAID1: total=32.00MiB, used=20.00KiB
Metadata, RAID1: total=15.87GiB, used=8.85GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Now the used/total ratio is about 94% and we moved about 74G - 68G = 6G
of data to the remaining block groups, i.e. the 6GiB are now free of
filesystem structures, and can be reused for new data or metadata block
groups.
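
One possible way to apply this in practice is to raise the usage
threshold in steps, so that the cheapest block groups are relocated
first. The sketch below only illustrates the idea: the step values are
arbitrary, the function name compact_data is made up, and the BTRFS
variable is an assumption of this example that allows a dry run with
echo instead of touching a filesystem.

```shell
# Command to run; override with BTRFS=echo for a dry run (an assumption
# of this sketch, not a btrfs-progs feature).
BTRFS=${BTRFS:-btrfs}

# Hypothetical helper: compact data block groups of the filesystem
# mounted at $1, raising the usage filter step by step.
compact_data() {
    mnt=$1
    for pct in 0 25 50 85; do
        # Stop at the first failure, e.g. when balance reports ENOSPC.
        "$BTRFS" balance start "-dusage=$pct" "$mnt" || return 1
    done
}
```

With BTRFS=echo set, compact_data /path only prints the four balance
start command lines it would run, which is a cheap way to check the
filters before running them as root.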
337
We can do a similar exercise with the metadata block groups, but this
should not typically be necessary, unless the used/total ratio is
really off. Here the ratio is roughly 50% but the difference as an
absolute number is "a few gigabytes", which can be considered normal
for a workload with snapshots or reflinks updated frequently.

# btrfs balance start -musage=50 /path
Done, had to relocate 4 out of 89 chunks

$ btrfs filesystem df /path
Data, single: total=68.03GiB, used=64.43GiB
System, RAID1: total=32.00MiB, used=20.00KiB
Metadata, RAID1: total=14.87GiB, used=8.85GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Just 1 GiB decrease, which possibly means there are block groups with
good utilization. Making the metadata layout more compact would in turn
require updating more metadata structures, i.e. lots of IO. As running
out of metadata space is a more severe problem, it's not necessary to
keep the utilization ratio too high. For the purpose of this example,
let's see the effects of further compaction:

# btrfs balance start -musage=70 /path
Done, had to relocate 13 out of 88 chunks

$ btrfs filesystem df .
Data, single: total=68.03GiB, used=64.43GiB
System, RAID1: total=32.00MiB, used=20.00KiB
Metadata, RAID1: total=11.97GiB, used=8.83GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

GETTING RID OF COMPLETELY UNUSED BLOCK GROUPS
Normally the balance operation needs work space to temporarily move
the data before the old block groups get removed. If there's no work
space, it ends with no space left.

There's a special case when the block groups are completely unused,
possibly left after removing lots of files or deleting snapshots.
Removing empty block groups is automatic since kernel 3.18. The same
can be achieved manually, with the notable exception that this
operation does not require the work space. Thus it can be used to
reclaim unused block groups and make their space available.

# btrfs balance start -dusage=0 /path

This should lead to a decrease in the total numbers in the btrfs
filesystem df output.

Unless indicated otherwise below, all btrfs balance subcommands return
a zero exit status if they succeed, and non-zero in case of failure.

The pause, cancel, and resume subcommands exit with a status of 2 if
they fail because a balance operation was not running.

The status subcommand exits with a status of 0 if a balance operation
is not running, 1 if the command-line usage is incorrect or a balance
operation is still running, and 2 on other errors.
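
In a script, the exit status of the status subcommand can be mapped
back to these cases. A minimal sketch; the function name
describe_status_exit is hypothetical, and the messages simply restate
the paragraph above.

```shell
# Hypothetical helper: describe an exit status of `btrfs balance status`.
describe_status_exit() {
    case $1 in
        0) echo "no balance running" ;;
        1) echo "balance still running, or incorrect command-line usage" ;;
        2) echo "other error" ;;
        *) echo "unexpected exit status $1" ;;
    esac
}

# Example use on a mounted filesystem (the path is a placeholder):
#   btrfs balance status /path >/dev/null 2>&1
#   describe_status_exit $?
```

Note that status 1 is ambiguous by itself, so a script that needs to
tell a running balance from a usage error would also have to look at
the command output.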
396
btrfs is part of btrfs-progs. Please refer to the btrfs wiki
http://btrfs.wiki.kernel.org for further details.

mkfs.btrfs(8), btrfs-device(8)

2022

5.18 May 25, 2022 BTRFS-BALANCE(8)