BTRFS-BALANCE(8)                 Btrfs Manual                 BTRFS-BALANCE(8)

NAME

       btrfs-balance - balance block groups on a btrfs filesystem

SYNOPSIS

       btrfs balance <subcommand> <args>

DESCRIPTION

       The primary purpose of the balance feature is to spread block groups
       across all devices so they match constraints defined by the respective
       profiles. See mkfs.btrfs(8) section PROFILES for more details. The
       scope of the balancing process can be further tuned by use of filters
       that can select the block groups to process. Balance works only on a
       mounted filesystem. Extent sharing is preserved and reflinks are not
       broken. Files are neither defragmented nor recompressed; file extents
       are preserved, but their physical location on the devices changes.

       The balance operation is cancellable by the user. The on-disk state of
       the filesystem is always consistent, so an unexpected interruption (eg.
       system crash, reboot) does not corrupt the filesystem. The progress of
       the balance operation is temporarily stored as an internal state, and
       the balance is resumed upon the next mount unless the mount option
       skip_balance is specified.

           Warning
           running balance without filters will take a lot of time, as it
           basically moves data and metadata across the whole filesystem and
           needs to update all block pointers.

       The filters can be used to perform the following actions:

       •   convert block group profiles (filter convert)

       •   make block group usage more compact (filter usage)

       •   perform actions only on a given device (filters devid, drange)

       The filters can be applied to a combination of block group types (data,
       metadata, system). Note that changing only the system type requires
       the force option; otherwise, system block groups are automatically
       converted whenever the metadata profile is converted.

       When metadata redundancy is reduced (eg. from RAID1 to single), the
       force option is also required, and the reduction is noted in the
       system log.

           Note
           the balance operation needs enough work space, ie. space that is
           completely unused in the filesystem, otherwise this may lead to
           ENOSPC reports. See the section ENOSPC for more details.

COMPATIBILITY

           Note
           The balance subcommand also exists under the btrfs filesystem
           namespace. This still works for backward compatibility but is
           deprecated and should not be used any more.

           Note
           The short syntax btrfs balance <path> still works for backward
           compatibility but is deprecated and should not be used any more.
           Use the btrfs balance start command instead.

PERFORMANCE IMPLICATIONS

       Balancing operations are very IO intensive and can also be quite CPU
       intensive, impacting other ongoing filesystem operations. Typically
       large amounts of data are copied from one location to another, with
       corresponding metadata updates.

       Depending upon the block group layout, it can also be seek heavy.
       Performance on rotational devices is noticeably worse than on SSDs or
       fast arrays.

SUBCOMMAND

       cancel <path>
           cancels a running or paused balance; the command blocks and waits
           until the block group currently being processed completes

           Since kernel 5.7 the response time of the cancellation is
           significantly improved; on older kernels it might take a long time
           until the currently processed chunk is completely finished.

       pause <path>
           pauses a running balance operation; this stores the state of the
           balance progress and the filters used to the filesystem

       resume <path>
           resumes an interrupted balance; the balance status must be stored
           on the filesystem from a previous run, eg. after it was paused, or
           forcibly interrupted and the filesystem mounted again with
           skip_balance

       start [options] <path>
           starts the balance operation according to the specified filters;
           without any filters, the data and metadata from the whole
           filesystem are moved. The process runs in the foreground.

               Note
               the balance command without filters will basically move
               everything in the filesystem to a new physical location on
               devices (ie. it does not affect the logical properties of file
               extents like offsets within files and extent sharing). The run
               time is potentially very long, depending on the filesystem
               size. To prevent starting a full balance by accident, the user
               is warned and has a few seconds to cancel the operation before
               it starts. The warning and delay can be skipped with the
               --full-balance option.

           Please note that the filters must be written together with the -d,
           -m and -s options (eg. -dusage=50), because the filters are
           optional and a bare -d or -m also works and means no filters.

               Note
               when the target profile for the conversion filter is raid5 or
               raid6, there is a safety timeout of 10 seconds to warn users
               about the status of the feature

           Options

           -d[<filters>]
               act on data block groups, see FILTERS section for details about
               filters

           -m[<filters>]
               act on metadata block groups, see FILTERS section for details
               about filters

           -s[<filters>]
               act on system block groups (requires -f), see FILTERS section
               for details about filters

           -f
               force a reduction of metadata integrity, eg. when going from
               raid1 to single, or skip the safety timeout when the target
               conversion profile is raid5 or raid6

           --background|--bg
               run the balance operation asynchronously in the background;
               uses fork(2) to start the process that calls the kernel ioctl

           --enqueue
               wait if there is another exclusive operation running, otherwise
               continue

           -v
               (deprecated) alias for global -v option

       status [-v] <path>
           show the status of a running or paused balance

           Options

           -v
               (deprecated) alias for global -v option

FILTERS

       From kernel 3.3 onwards, btrfs balance can limit its action to a subset
       of the whole filesystem, and can be used to change the replication
       configuration (e.g. moving data from single to RAID1). This
       functionality is accessed through the -d, -m or -s options to btrfs
       balance start, which filter on data, metadata and system block groups
       respectively.

       A filter has the following structure: type[=params][,type=...]

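       As an illustration of this structure, the following Python sketch
       splits a filter string (the part written directly after -d, -m or -s)
       into (type, params) pairs. The function name parse_filters is
       hypothetical and this is not the parser used by btrfs-progs; it only
       models the documented syntax and keeps "|" separated profile lists and
       start..end ranges as opaque parameter strings.

```python
from typing import List, Optional, Tuple


def parse_filters(spec: str) -> List[Tuple[str, Optional[str]]]:
    """Split a filter string of the form type[=params][,type=...].

    Hypothetical helper: a filter without "=" (such as soft) is returned
    with params set to None.
    """
    if not spec:
        # A bare -d, -m or -s means no filters.
        return []
    pairs = []
    for item in spec.split(","):
        ftype, sep, params = item.partition("=")
        if not ftype:
            raise ValueError(f"empty filter type in {spec!r}")
        pairs.append((ftype, params if sep else None))
    return pairs


# Example: the filters of -dconvert=raid1,soft,profiles=raid0|single
print(parse_filters("convert=raid1,soft,profiles=raid0|single"))
# [('convert', 'raid1'), ('soft', None), ('profiles', 'raid0|single')]
```

       The pipe inside profiles= does not interfere with splitting, because
       only commas separate individual filters.
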
       The available types are:

       profiles=<profiles>
           Balances only block groups with the given profiles. Parameters are
           a list of profile names separated by "|" (pipe).

       usage=<percent>, usage=<range>
           Balances only block groups with usage under the given percentage.
           The value of 0 is allowed and will clean up completely unused block
           groups; this should not require any new work space to be allocated.
           You may want to use usage=0 in case balance is returning ENOSPC and
           your filesystem is not too full.

           The argument may be a single value or a range. The single value N
           means under N percent used, equivalent to the ..N range syntax.
           Kernels prior to 4.4 accept only the single value format. The
           minimum range boundary is inclusive, the maximum is exclusive.

       devid=<id>
           Balances only block groups which have at least one chunk on the
           given device. To list the devices with their ids, use btrfs
           filesystem show.

       drange=<range>
           Balances only block groups which overlap with the given byte range
           on any device. Use in conjunction with devid to filter on a
           specific device. The parameter is a range specified as start..end.

       vrange=<range>
           Balances only block groups which overlap with the given byte range
           in the filesystem’s internal virtual address space. This is the
           address space that most reports from btrfs in the kernel log use.
           The parameter is a range specified as start..end.

       convert=<profile>
           Converts each selected block group to the profile given as the
           parameter.

               Note
               starting with kernel 4.5, the data chunks can be converted
               to/from the DUP profile on a single device.

               Note
               starting with kernel 4.6, all profiles can be converted to/from
               DUP on multi-device filesystems.

       limit=<number>, limit=<range>
           Processes only the given number of chunks, after all filters are
           applied. This can be used to specifically target a chunk in
           connection with other filters (drange, vrange) or simply to limit
           the amount of work done by a single balance run.

           The argument may be a single value or a range. The single value N
           means at most N chunks, equivalent to the ..N range syntax. Kernels
           prior to 4.4 accept only the single value format. The range minimum
           and maximum are inclusive.

       stripes=<range>
           Balances only block groups which have the given number of stripes.
           The parameter is a range specified as start..end. This makes sense
           for block group profiles that utilize striping, ie. RAID0/10/5/6.
           The range minimum and maximum are inclusive.

       soft
           Takes no parameters. Only has meaning when converting between
           profiles. When converting from one profile to another with soft
           mode on, chunks that already have the target profile are left
           untouched. This is useful e.g. when half of the filesystem was
           converted earlier but the conversion got cancelled.

           The soft mode switch is (like every other filter) per-type. For
           example, this means that we can convert metadata chunks the "hard"
           way while converting data chunks selectively with the soft switch.

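       The range rules above differ between filters: for usage the maximum is
       exclusive, while for limit and stripes both bounds are inclusive, and a
       single value N stands for ..N. The following Python sketch models the
       documented rules for usage and stripes on plain numbers. The helper
       names are hypothetical, usage is compared as a percentage rather than
       with the kernel's exact internal computation, and the special meaning
       of usage=0 is taken from the usage description above.

```python
from typing import Optional, Tuple


def parse_range(spec: str) -> Tuple[Optional[int], Optional[int]]:
    """Parse "N", "min..max", "min.." or "..max" into (min, max).

    A single value N is treated as "..N"; a missing bound is None.
    The open-ended forms are accepted by this sketch for convenience.
    """
    if ".." not in spec:
        return None, int(spec)
    lo, hi = spec.split("..", 1)
    return (int(lo) if lo else None), (int(hi) if hi else None)


def usage_matches(used_pct: float, spec: str) -> bool:
    """Model of usage=: minimum inclusive, maximum exclusive."""
    if spec == "0":
        # usage=0 selects only completely unused block groups.
        return used_pct == 0
    lo, hi = parse_range(spec)
    if lo is not None and used_pct < lo:
        return False
    if hi is not None and used_pct >= hi:
        return False
    return True


def stripes_matches(num_stripes: int, spec: str) -> bool:
    """Model of stripes=: a start..end range with both bounds inclusive."""
    if ".." not in spec:
        raise ValueError("stripes expects a range written as start..end")
    lo, hi = parse_range(spec)
    return (lo is None or num_stripes >= lo) and (hi is None or num_stripes <= hi)


print(usage_matches(49.9, "50"), usage_matches(50, "50"))  # True False
print(stripes_matches(4, "2..4"), stripes_matches(5, "2..4"))  # True False
```

       The asymmetry matters at the boundary: a block group that is exactly
       50% used is not selected by usage=50, while a block group with 4
       stripes is selected by stripes=2..4.
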
       Profile names, used in the profiles and convert filters, are one of:
       raid0, raid1, raid1c3, raid1c4, raid10, raid5, raid6, dup, single. The
       mixed data/metadata profiles can be converted in the same way, but
       conversion between mixed and non-mixed is not implemented. For the
       constraints of the profiles please refer to mkfs.btrfs(8), section
       PROFILES.

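       The following Python sketch assembles an argument list for a profile
       conversion with btrfs balance start. It checks the names against the
       list above and writes each filter string directly after its -d or -m
       option. The helper name convert_command and the mount point /mnt are
       hypothetical; the sketch does not run anything and does not check
       whether the number of devices satisfies the profile constraints.

```python
from typing import List

# Profile names accepted by the profiles and convert filters.
PROFILES = {"raid0", "raid1", "raid1c3", "raid1c4", "raid10",
            "raid5", "raid6", "dup", "single"}


def convert_command(path: str, data: str, metadata: str,
                    soft: bool = True, force: bool = False) -> List[str]:
    """Build (but do not run) a btrfs balance start command for a conversion."""
    for profile in (data, metadata):
        if profile not in PROFILES:
            raise ValueError(f"unknown profile: {profile}")
    # soft leaves chunks that already have the target profile untouched.
    suffix = ",soft" if soft else ""
    cmd = ["btrfs", "balance", "start",
           # Filters are written together with the option, without a space.
           f"-dconvert={data}{suffix}",
           f"-mconvert={metadata}{suffix}"]
    if force:
        # Needed e.g. when reducing metadata redundancy (raid1 to single).
        cmd.append("-f")
    cmd.append(path)
    return cmd


print(" ".join(convert_command("/mnt", "raid1", "raid1")))
# btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt
```

       Keeping soft enabled by default makes re-running the same command after
       a cancelled conversion skip the chunks that were already converted.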

ENOSPC

       Because of the way balance operates, it usually needs to temporarily
       create a new block group and move the old data there before the old
       block group can be removed. For that it needs work space, otherwise it
       fails with ENOSPC. This is not the same ENOSPC as when the free space
       is exhausted: it refers to space at the level of block groups, which
       are larger parts of the filesystem that contain many file extents.

       The free work space can be calculated from the output of the btrfs
       filesystem show command:

              Label: 'BTRFS'  uuid: 8a9d72cd-ead3-469d-b371-9c7203276265
                      Total devices 2 FS bytes used 77.03GiB
                      devid    1 size 53.90GiB used 51.90GiB path /dev/sdc2
                      devid    2 size 53.90GiB used 51.90GiB path /dev/sde1

       size - used = free work space, here 53.90GiB - 51.90GiB = 2.00GiB on
       each device.

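       The same calculation can be scripted. The following Python sketch
       extracts the devid, size and used columns from btrfs filesystem show
       output in the format above and returns size - used per device. The
       function name is hypothetical, and since the command prints rounded
       values, the result is only approximate.

```python
import re
from typing import Dict

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3,
         "TiB": 1024 ** 4, "PiB": 1024 ** 5}

# Matches lines like: devid    1 size 53.90GiB used 51.90GiB path /dev/sdc2
DEVID_RE = re.compile(
    r"devid\s+(\d+)\s+size\s+([\d.]+)((?:[KMGTP]i)?B)"
    r"\s+used\s+([\d.]+)((?:[KMGTP]i)?B)")


def to_bytes(value: str, unit: str) -> int:
    return round(float(value) * UNITS[unit])


def free_work_space(show_output: str) -> Dict[int, int]:
    """Return {devid: size - used} in bytes (approximate, from rounded output)."""
    free = {}
    for m in DEVID_RE.finditer(show_output):
        size = to_bytes(m.group(2), m.group(3))
        used = to_bytes(m.group(4), m.group(5))
        free[int(m.group(1))] = size - used
    return free


SAMPLE = """\
Label: 'BTRFS'  uuid: 8a9d72cd-ead3-469d-b371-9c7203276265
        Total devices 2 FS bytes used 77.03GiB
        devid    1 size 53.90GiB used 51.90GiB path /dev/sdc2
        devid    2 size 53.90GiB used 51.90GiB path /dev/sde1
"""
for devid, free in free_work_space(SAMPLE).items():
    print(f"devid {devid}: {free / 1024 ** 3:.2f}GiB free work space")
```

       Reporting the value per device rather than as a sum keeps the
       information needed for conversions on multiple devices.
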
       An example of a filter that does not require work space is usage=0.
       This will scan through all unused block groups of a given type and
       will reclaim the space. After that it might be possible to run other
       filters.

   CONVERSIONS ON MULTIPLE DEVICES
       Conversion to profiles based on striping (RAID0, RAID5/6) requires work
       space on each device. An interrupted balance may leave partially
       filled block groups that consume the work space.

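       Because striped profiles need work space on every device, a pre-check
       has to look at each device separately rather than at the total. The
       following Python sketch takes per-device free work space in bytes, for
       example computed as size - used from btrfs filesystem show, and lists
       the devices below a threshold. The helper name and the 1 GiB default
       threshold are illustrative assumptions, not values defined by btrfs.

```python
from typing import Dict, List

GiB = 1024 ** 3


def devices_short_of_work_space(free_by_devid: Dict[int, int],
                                min_free: int = 1 * GiB) -> List[int]:
    """Return the devids whose free work space is below min_free.

    min_free is a hypothetical threshold; a conversion to a striped profile
    needs work space on each device, so the sum over devices is not enough.
    """
    return sorted(devid for devid, free in free_by_devid.items()
                  if free < min_free)


free = {1: 2 * GiB, 2: 0, 3: 5 * GiB}
print(sum(free.values()) // GiB, devices_short_of_work_space(free))  # 7 [2]
```

       Here the filesystem has 7 GiB of free work space in total, yet device 2
       has none, so a conversion to a striped profile may still run into
       ENOSPC.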

EXAMPLES

       A more comprehensive example when going from one to multiple devices,
       and back, can be found in section TYPICAL USECASES of btrfs-device(8).

   MAKING BLOCK GROUP LAYOUT MORE COMPACT
       The layout of block groups is not normally visible; most tools report
       only summarized numbers of free or used space, but there are still some
       hints provided.

       Let’s use the following real life example and start with the output:

           $ btrfs filesystem df /path
           Data, single: total=75.81GiB, used=64.44GiB
           System, RAID1: total=32.00MiB, used=20.00KiB
           Metadata, RAID1: total=15.87GiB, used=8.84GiB
           GlobalReserve, single: total=512.00MiB, used=0.00B

       Roughly calculating for data, 75G - 64G = 11G, the used/total ratio is
       about 85%. How can we interpret that:

       •   chunks are filled by 85% on average, ie. the usage filter with
           anything smaller than 85 will likely not affect anything

       •   in a more realistic scenario, the space is distributed unevenly; we
           can assume there are completely used chunks and the remaining ones
           are partially filled

       Compacting the layout could be used in both cases. In the former case
       it would spread the data of a given chunk to the others and remove it.
       Here we can estimate that roughly 870 MiB of data have to be moved (85%
       of a 1 GiB chunk).

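       The used/total ratio can also be computed from the btrfs filesystem df
       lines. The following Python sketch parses one such line and returns
       used divided by total. The function name is hypothetical; the values
       printed by the command are rounded, so the ratio is an estimate, just
       like the rough calculation above. The 1 GiB chunk size in the example
       is the same assumption as in the text.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3,
         "TiB": 1024 ** 4, "PiB": 1024 ** 5}

# Matches lines like: Data, single: total=75.81GiB, used=64.44GiB
DF_RE = re.compile(r"([\w+]+), (\w+): total=([\d.]+)((?:[KMGTP]i)?B), "
                   r"used=([\d.]+)((?:[KMGTP]i)?B)")


def usage_ratio(df_line: str) -> float:
    """Return used/total for one 'btrfs filesystem df' line."""
    m = DF_RE.search(df_line)
    if m is None:
        raise ValueError(f"unrecognized line: {df_line!r}")
    total = float(m.group(3)) * UNITS[m.group(4)]
    used = float(m.group(5)) * UNITS[m.group(6)]
    return used / total if total else 0.0


ratio = usage_ratio("Data, single: total=75.81GiB, used=64.44GiB")
# Rough amount of data in one chunk, assuming a 1 GiB data chunk.
print(f"{ratio:.0%} used, about {ratio * 1024:.0f} MiB per 1 GiB chunk")
```

       Running the function on the Data line again after each balance run
       shows how the ratio changes as the layout gets more compact.
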
       In the latter case, targeting the partially used chunks will have to
       move less data and thus will be faster. A typical filter command would
       look like:

           # btrfs balance start -dusage=50 /path
           Done, had to relocate 2 out of 97 chunks

           $ btrfs filesystem df /path
           Data, single: total=74.03GiB, used=64.43GiB
           System, RAID1: total=32.00MiB, used=20.00KiB
           Metadata, RAID1: total=15.87GiB, used=8.84GiB
           GlobalReserve, single: total=512.00MiB, used=0.00B

       As you can see, the total space allocated for data decreased by less
       than 2 GiB (from 75.81GiB to 74.03GiB), which is an expected result
       given that only 2 chunks were relocated. Let’s see what happens when we
       raise the usage filter to the estimated average of 85%:

           # btrfs balance start -dusage=85 /path
           Done, had to relocate 13 out of 95 chunks

           $ btrfs filesystem df /path
           Data, single: total=68.03GiB, used=64.43GiB
           System, RAID1: total=32.00MiB, used=20.00KiB
           Metadata, RAID1: total=15.87GiB, used=8.85GiB
           GlobalReserve, single: total=512.00MiB, used=0.00B

       Now the used/total ratio is almost 95%, and the data allocation shrank
       by about 74G - 68G = 6G: the data from the relocated block groups was
       packed into the remaining ones, ie. 6GiB are now free of filesystem
       structures and can be reused for new data or metadata block groups.

       We can do a similar exercise with the metadata block groups, but this
       should not typically be necessary unless the used/total ratio is really
       off. Here the ratio is roughly 55%, but the difference as an absolute
       number is "a few gigabytes", which can be considered normal for a
       workload with frequently updated snapshots or reflinks.

           # btrfs balance start -musage=50 /path
           Done, had to relocate 4 out of 89 chunks

           $ btrfs filesystem df /path
           Data, single: total=68.03GiB, used=64.43GiB
           System, RAID1: total=32.00MiB, used=20.00KiB
           Metadata, RAID1: total=14.87GiB, used=8.85GiB
           GlobalReserve, single: total=512.00MiB, used=0.00B

       This is just a 1 GiB decrease, which possibly means that the metadata
       block groups already have good utilization. Making the metadata layout
       more compact would in turn require updating more metadata structures,
       ie. lots of IO. As running out of metadata space is a more severe
       problem, there is no need to keep the utilization ratio too high. For
       the purpose of this example, let’s see the effects of further
       compaction:

           # btrfs balance start -musage=70 /path
           Done, had to relocate 13 out of 88 chunks

           $ btrfs filesystem df /path
           Data, single: total=68.03GiB, used=64.43GiB
           System, RAID1: total=32.00MiB, used=20.00KiB
           Metadata, RAID1: total=11.97GiB, used=8.83GiB
           GlobalReserve, single: total=512.00MiB, used=0.00B

   GETTING RID OF COMPLETELY UNUSED BLOCK GROUPS
       Normally the balance operation needs work space to temporarily move
       the data before the old block groups get removed. If there is no work
       space, it fails with ENOSPC (no space left on device).

       There is a special case when the block groups are completely unused,
       possibly left after removing lots of files or deleting snapshots.
       Removing empty block groups is automatic since kernel 3.18. The same
       can be achieved manually, with the notable exception that this
       operation does not require any work space. Thus it can be used to
       reclaim unused block groups and make the space available again.

           # btrfs balance start -dusage=0 /path

       This should lead to a decrease of the total values in the btrfs
       filesystem df output.

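       Reclaiming the empty block groups first and only then compacting
       partially used ones, as in the examples above, can be sketched as a
       sequence of commands. The following Python sketch only builds the
       command lines. The helper name and the thresholds 0, 10, 25 and 50 are
       illustrative assumptions, not recommended values; a real script would
       check each exit status and look at btrfs filesystem df between steps.

```python
from typing import List, Sequence


def compaction_steps(path: str, kind: str = "d",
                     thresholds: Sequence[int] = (0, 10, 25, 50)) -> List[List[str]]:
    """Build btrfs balance start commands with increasing usage filters.

    kind is "d" (data) or "m" (metadata). Thresholds are sorted, so usage=0,
    which needs no work space, runs first when it is included.
    """
    if kind not in ("d", "m"):
        raise ValueError("kind must be 'd' or 'm'")
    return [["btrfs", "balance", "start", f"-{kind}usage={t}", path]
            for t in sorted(set(thresholds))]


for cmd in compaction_steps("/path"):
    print(" ".join(cmd))
```

       Small steps keep each balance run short, and the space freed by
       usage=0 may provide the work space needed by the later steps.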

EXIT STATUS

       Unless indicated otherwise below, all btrfs balance subcommands return
       a zero exit status if they succeed, and non-zero in case of failure.

       The pause, cancel, and resume subcommands exit with a status of 2 if
       they fail because a balance operation was not running.

       The status subcommand exits with a status of 0 if a balance operation
       is not running, 1 if the command-line usage is incorrect or a balance
       operation is still running, and 2 on other errors.

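       Scripts can act on these exit codes. The following Python sketch maps
       them to the meanings documented above; the function names are
       hypothetical, and balance_status assumes btrfs-progs is installed and
       the caller has sufficient privileges. Note that status 1 of the status
       subcommand cannot distinguish a running balance from incorrect usage.

```python
import subprocess


def describe_status_exit(code: int) -> str:
    """Meaning of an exit code of 'btrfs balance status'."""
    if code == 0:
        return "not running"
    if code == 1:
        # Shared by "balance still running" and "incorrect usage".
        return "running or usage error"
    if code == 2:
        return "other error"
    return "unexpected exit code"


def describe_control_exit(code: int) -> str:
    """Meaning of an exit code of 'btrfs balance pause|cancel|resume'."""
    if code == 0:
        return "success"
    if code == 2:
        return "failed: no balance running"
    return "failed"


def balance_status(path: str) -> str:
    """Run 'btrfs balance status <path>' and describe its exit code.

    Raises FileNotFoundError if the btrfs binary is not installed.
    """
    result = subprocess.run(["btrfs", "balance", "status", path],
                            capture_output=True, text=True)
    return describe_status_exit(result.returncode)
```

       A monitoring loop could call balance_status() periodically and stop
       once it reports "not running".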

AVAILABILITY

       btrfs is part of btrfs-progs. Please refer to the btrfs wiki
       http://btrfs.wiki.kernel.org for further details.

SEE ALSO

       mkfs.btrfs(8), btrfs-device(8)

Btrfs v5.15.1                     11/22/2021                  BTRFS-BALANCE(8)