btrfs-balance(8)

BTRFS-BALANCE(8)                     BTRFS                    BTRFS-BALANCE(8)

NAME

       btrfs-balance - balance block groups on a btrfs filesystem

SYNOPSIS

       btrfs balance <subcommand> <args>

DESCRIPTION

       The primary purpose of the balance feature is to spread block groups
       across all devices so they match the constraints defined by the
       respective profiles. See mkfs.btrfs(8) section PROFILES for more
       details. The scope of the balancing process can be further tuned by
       filters that select the block groups to process. Balance works only on
       a mounted filesystem. Extent sharing is preserved and reflinks are not
       broken. Files are neither defragmented nor recompressed; file extents
       are preserved, but their physical location on devices will change.

       The balance operation is cancellable by the user. The on-disk state of
       the filesystem is always consistent, so an unexpected interruption
       (e.g. system crash, reboot) does not corrupt the filesystem. The
       progress of the balance operation is temporarily stored as internal
       state and the operation will be resumed upon mount, unless the mount
       option skip_balance is specified.

       WARNING:
          Running balance without filters will take a lot of time, as it
          basically moves data and metadata from the whole filesystem and
          needs to update all block pointers.

       The filters can be used to perform the following actions:

       • convert block group profiles (filter convert)

       • make block group usage more compact (filter usage)

       • perform actions only on a given device (filters devid, drange)

       The filters can be applied to a combination of block group types (data,
       metadata, system). Note that changing only the system type needs the
       force option. Otherwise system gets automatically converted whenever
       the metadata profile is converted.

       When metadata redundancy is reduced (e.g. from RAID1 to single), the
       force option is also required and this is noted in the system log.

       NOTE:
          The balance operation needs enough work space, i.e. space that is
          completely unused in the filesystem; otherwise it may fail with
          ENOSPC. See the section ENOSPC for more details.

COMPATIBILITY

       NOTE:
          The balance subcommand also exists under the btrfs filesystem
          namespace. This still works for backward compatibility but is
          deprecated and should not be used any more.

       NOTE:
          The short syntax btrfs balance <path> still works for backward
          compatibility but is deprecated and should not be used any more.
          Use the btrfs balance start command instead.

PERFORMANCE IMPLICATIONS

       Balancing operations are very IO intensive and can also be quite CPU
       intensive, impacting other ongoing filesystem operations. Typically
       large amounts of data are copied from one location to another, with
       corresponding metadata updates.

       Depending upon the block group layout, it can also be seek heavy.
       Performance on rotational devices is noticeably worse compared to SSDs
       or fast arrays.

SUBCOMMAND

       cancel <path>
              Cancel a running or paused balance. The command blocks and
              waits until the block group currently being processed
              completes.

              Since kernel 5.7 the response time of the cancellation is
              significantly improved; on older kernels it might take a long
              time until the currently processed chunk is completely finished.

       pause <path>
              Pause a running balance operation. This stores the state of the
              balance progress and the filters in use to the filesystem.

       resume <path>
              Resume an interrupted balance. The balance status must be stored
              on the filesystem from a previous run, e.g. after it was paused,
              or after it was forcibly interrupted and the filesystem was
              mounted again with skip_balance.

       start [options] <path>
              Start the balance operation according to the specified filters.
              Without any filters, the data and metadata from the whole
              filesystem are moved. The process runs in the foreground.

              NOTE:
                 The balance command without filters will basically move
                 everything in the filesystem to a new physical location on
                 devices (i.e. it does not affect the logical properties of
                 file extents like offsets within files and extent sharing).
                 The run time is potentially very long, depending on the
                 filesystem size. To prevent starting a full balance by
                 accident, the user is warned and has a few seconds to cancel
                 the operation before it starts. The warning and delay can be
                 skipped with the --full-balance option.

              Please note that the filters must be written together with the
              -d, -m and -s options (with no space in between), because the
              filters are optional: bare -d and -m also work and mean no
              filters.

              NOTE:
                 When the target profile for the conversion filter is raid5 or
                 raid6, there's a safety timeout of 10 seconds to warn users
                 about the status of the feature.

              Options

              -d[<filters>]
                     act on data block groups, see section FILTERS for details
                     about filters

              -m[<filters>]
                     act on metadata chunks, see section FILTERS for details
                     about filters

              -s[<filters>]
                     act on system chunks (requires -f), see section FILTERS
                     for details about filters

              -f     force a reduction of metadata integrity, e.g. when going
                     from raid1 to single, or skip the safety timeout when the
                     target conversion profile is raid5 or raid6

              --background|--bg
                     run the balance operation asynchronously in the
                     background, uses fork(2) to start the process that calls
                     the kernel ioctl

              --enqueue
                     wait if there's another exclusive operation running,
                     otherwise continue

              -v     (deprecated) alias for global -v option

       status [-v] <path>
              Show status of running or paused balance.

              Options

              -v     (deprecated) alias for global -v option

FILTERS

       From kernel 3.3 onwards, BTRFS balance can limit its action to a subset
       of the whole filesystem, and can be used to change the replication
       configuration (e.g. convert data from single to RAID1).

       Balance can be limited to a block group type with the following
       options:

       • -d for data block groups

       • -m for metadata block groups (also implicitly applies to -s)

       • -s for system block groups

       The options take an optional parameter, which means that the parameter
       must start right after the option without a space (this is mandatory
       getopt syntax), like -dusage=10. Options for all block group types can
       be specified in one command.

       A filter has the following structure: filter[=params][,filter=...]

       To combine multiple filters use , (comma) without spaces. Example:
       -dconvert=raid1,soft
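
       The composition rule above (no space after the option, commas between
       filters) can be sketched as a small shell helper. The function name
       balance_filter_arg is hypothetical and not part of btrfs-progs; it only
       builds the argument string and does not run btrfs.

```shell
# Hypothetical helper: join a type option (d, m or s) and a list of filters
# into one getopt-style argument such as -dconvert=raid1,soft.
balance_filter_arg() {
    type_opt=$1
    shift
    arg="-$type_opt"
    sep=""
    for filter in "$@"; do
        # No space after the option letter, commas between filters.
        arg="$arg$sep$filter"
        sep=","
    done
    printf '%s\n' "$arg"
}

# Example: prints -dconvert=raid1,soft
balance_filter_arg d convert=raid1 soft
```

       With no filters given, the helper prints the bare option (e.g. -m),
       which matches the "no filters" meaning described for the start
       subcommand.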

       BTRFS can have different profiles on a single device or the same
       profile on multiple devices.

       The main reason to have different profiles for data and metadata is to
       provide additional protection of the filesystem's metadata when devices
       fail, since a single sector of unrecoverable metadata will break the
       filesystem, while a single sector of lost data can be trivially
       recovered by deleting the broken file.

       Before changing profiles, make sure there is enough unallocated space
       on existing drives to create new metadata block groups (for filesystems
       over 50GiB, this is 1GB * (number_of_devices + 2)).
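
       As a quick check of the rule of thumb above, the following sketch
       evaluates 1GB * (number_of_devices + 2) for a given device count. The
       function name is hypothetical, and the result only applies to
       filesystems over 50GiB, as stated above.

```shell
# Hypothetical helper: unallocated space (in GB) suggested by the rule
# 1GB * (number_of_devices + 2) before changing profiles.
required_unallocated_gb() {
    echo $(( $1 + 2 ))
}

# Example: a two-device filesystem
required_unallocated_gb 2
```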

       Default profiles on BTRFS are:

       • data: single

       • metadata:

         • single devices: dup

         • multiple devices: raid1

       The available filter types are:

   Filter types
       profiles=<profiles>
              Balances only block groups with the given profiles. Parameters
              are a list of profile names separated by | (pipe).

       usage=<percent>, usage=<range>
              Balances only block groups with usage under the given
              percentage. The value of 0 is allowed and will clean up
              completely unused block groups; this should not require any new
              work space to be allocated. You may want to use usage=0 in case
              balance is returning ENOSPC and your filesystem is not too full.

              The argument may be a single value or a range. The single value
              N means at most N percent used, equivalent to ..N range syntax.
              Kernels prior to 4.4 accept only the single value format. The
              minimum range boundary is inclusive, maximum is exclusive.
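
       The equivalence between the single value N and the range ..N can be
       illustrated with a small sketch that normalizes a usage= argument to a
       "min max" pair. The helper name usage_bounds is hypothetical, and
       treating an omitted minimum as 0 and an omitted maximum as 100 is an
       assumption made for illustration, not a statement about how the kernel
       parses the argument.

```shell
# Hypothetical helper: normalize a usage= argument to "min max".
# Assumption: an omitted minimum means 0, an omitted maximum means 100.
usage_bounds() {
    case $1 in
        *..*) min=${1%%..*}; max=${1##*..} ;;   # range syntax min..max
        *)    min=""; max=$1 ;;                 # single value N is ..N
    esac
    echo "${min:-0} ${max:-100}"
}

# Example: the single value and the ..N form give the same bounds
usage_bounds 10
usage_bounds ..10
```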

       devid=<id>
              Balances only block groups which have at least one chunk on the
              given device. To list devices with ids use btrfs filesystem
              show.

       drange=<range>
              Balance only block groups which overlap with the given byte
              range on any device. Use in conjunction with devid to filter on
              a specific device. The parameter is a range specified as
              start..end.

       vrange=<range>
              Balance only block groups which overlap with the given byte
              range in the filesystem's internal virtual address space. This
              is the address space that most reports from btrfs in the kernel
              log use. The parameter is a range specified as start..end.

       convert=<profile>
              Convert each selected block group to the given profile.

              NOTE:
                 Starting with kernel 4.5, the data chunks can be converted
                 to/from the DUP profile on a single device.

              NOTE:
                 Starting with kernel 4.6, all profiles can be converted
                 to/from DUP on multi-device filesystems.

       limit=<number>, limit=<range>
              Process only the given number of chunks, after all filters are
              applied. This can be used to specifically target a chunk in
              connection with other filters (drange, vrange) or simply to
              limit the amount of work done by a single balance run.

              The argument may be a single value or a range. The single value
              N means at most N chunks, equivalent to ..N range syntax.
              Kernels prior to 4.4 accept only the single value format. The
              range minimum and maximum are inclusive.

       stripes=<range>
              Balance only block groups which have the given number of
              stripes. The parameter is a range specified as start..end. Makes
              sense for block group profiles that utilize striping, i.e.
              RAID0/10/5/6. The range minimum and maximum are inclusive.

       soft   Takes no parameters. Only has meaning when converting between
              profiles. When converting from one profile to another with soft
              mode on, chunks that already have the target profile are left
              untouched. This is useful e.g. when half of the filesystem was
              converted earlier but the conversion got cancelled.

              The soft mode switch is (like every other filter) per-type. For
              example, this means that we can convert metadata chunks the
              "hard" way while converting data chunks selectively with the
              soft switch.

       Profile names, used in profiles and convert, are one of:

       • raid0

       • raid1

       • raid1c3

       • raid1c4

       • raid10

       • raid5

       • raid6

       • dup

       • single

       The mixed data/metadata profiles can be converted in the same way, but
       conversion between mixed and non-mixed is not implemented. For the
       constraints of the profiles please refer to mkfs.btrfs(8) section
       PROFILES.

ENOSPC

       The way balance operates, it usually needs to temporarily create a new
       block group and move the old data there, before the old block group can
       be removed. For that it needs work space, otherwise it fails for ENOSPC
       reasons. This is not the same ENOSPC as when the free space is
       exhausted. It refers to space on the level of block groups, which are
       bigger parts of the filesystem that contain many file extents.

       The free work space can be calculated from the output of the btrfs
       filesystem show command:

          Label: 'BTRFS'  uuid: 8a9d72cd-ead3-469d-b371-9c7203276265
                  Total devices 2 FS bytes used 77.03GiB
                  devid    1 size 53.90GiB used 51.90GiB path /dev/sdc2
                  devid    2 size 53.90GiB used 51.90GiB path /dev/sde1

       size - used = free work space

       53.90GiB - 51.90GiB = 2.00GiB
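
       The subtraction above can be done per device with a short awk filter.
       This is a minimal sketch: the function name free_work_space is
       hypothetical, and it assumes that size and used are printed in the
       same unit (GiB, as in the sample output above).

```shell
# Read `btrfs filesystem show` output on stdin and print, for every devid
# line, the device path followed by size - used.
# Assumption: both values use the same unit (GiB in the sample above).
free_work_space() {
    awk '$1 == "devid" { printf "%s %.2fGiB\n", $8, $4 - $6 }'
}

# Example with the sample output shown above
free_work_space <<'EOF'
Label: 'BTRFS'  uuid: 8a9d72cd-ead3-469d-b371-9c7203276265
        Total devices 2 FS bytes used 77.03GiB
        devid    1 size 53.90GiB used 51.90GiB path /dev/sdc2
        devid    2 size 53.90GiB used 51.90GiB path /dev/sde1
EOF
```

       On a live system the same filter could be fed from the command
       directly, e.g. btrfs filesystem show /path | free_work_space, where
       /path is a placeholder.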

       An example of a filter that does not require work space is usage=0.
       This will scan through all unused block groups of a given type and will
       reclaim the space. After that it might be possible to run other
       filters.

       CONVERSIONS ON MULTIPLE DEVICES

       Conversion to profiles based on striping (RAID0, RAID5/6) requires
       work space on each device. An interrupted balance may leave partially
       filled block groups that consume the work space.

EXAMPLES

       A more comprehensive example when going from one to multiple devices,
       and back, can be found in section TYPICAL USECASES of btrfs-device(8).

   MAKING BLOCK GROUP LAYOUT MORE COMPACT
       The layout of block groups is not normally visible; most tools report
       only summarized numbers of free or used space, but there are still some
       hints provided.

       Let's use the following real life example and start with the output:

          $ btrfs filesystem df /path
          Data, single: total=75.81GiB, used=64.44GiB
          System, RAID1: total=32.00MiB, used=20.00KiB
          Metadata, RAID1: total=15.87GiB, used=8.84GiB
          GlobalReserve, single: total=512.00MiB, used=0.00B

       Roughly calculating for data, 75G - 64G = 11G, the used/total ratio is
       about 85%. How can we interpret that:

       • chunks are filled by 85% on average, i.e. the usage filter with
         anything smaller than 85 will likely not affect anything

       • in a more realistic scenario, the space is distributed unevenly; we
         can assume there are completely used chunks and the remaining are
         partially filled
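
       The used/total ratio estimated above can also be computed from the
       btrfs filesystem df output. The following is a sketch only: the
       function name data_usage_percent is hypothetical, and it assumes that
       total and used on the Data line are printed in the same unit.

```shell
# Read `btrfs filesystem df` output on stdin and print the used/total ratio
# of the Data line as a whole percentage.
# Assumption: total and used are printed in the same unit.
data_usage_percent() {
    awk -F'[=,]' '/^[[:space:]]*Data/ { printf "%.0f\n", 100 * $5 / $3 }'
}

# Example with the Data line from the output above
printf 'Data, single: total=75.81GiB, used=64.44GiB\n' | data_usage_percent
```

       The result is an average over all data block groups, so it says
       nothing about how unevenly individual chunks are filled.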

       Compacting the layout could be used on both. In the former case it
       would spread the data of a given chunk to the others and remove it.
       Here we can estimate that roughly 850 MiB of data have to be moved
       (85% of a 1 GiB chunk).

       In the latter case, targeting the partially used chunks will have to
       move less data and thus will be faster. A typical filter command would
       look like:

          # btrfs balance start -dusage=50 /path
          Done, had to relocate 2 out of 97 chunks

          $ btrfs filesystem df /path
          Data, single: total=74.03GiB, used=64.43GiB
          System, RAID1: total=32.00MiB, used=20.00KiB
          Metadata, RAID1: total=15.87GiB, used=8.84GiB
          GlobalReserve, single: total=512.00MiB, used=0.00B

       As you can see, the total allocated for data decreased by less than
       2 GiB (from 75.81GiB to 74.03GiB), which is an expected result. Let's
       see what will happen when we increase the estimated usage filter.

          # btrfs balance start -dusage=85 /path
          Done, had to relocate 13 out of 95 chunks

          $ btrfs filesystem df /path
          Data, single: total=68.03GiB, used=64.43GiB
          System, RAID1: total=32.00MiB, used=20.00KiB
          Metadata, RAID1: total=15.87GiB, used=8.85GiB
          GlobalReserve, single: total=512.00MiB, used=0.00B

       Now the used/total ratio is about 94% and we moved about 74G - 68G = 6G
       of data to the remaining block groups, i.e. the 6GiB are now free of
       filesystem structures, and can be reused for new data or metadata block
       groups.

       We can do a similar exercise with the metadata block groups, but this
       should not typically be necessary, unless the used/total ratio is
       really off. Here the ratio is roughly 50% but the difference as an
       absolute number is "a few gigabytes", which can be considered normal
       for a workload with snapshots or reflinks updated frequently.

          # btrfs balance start -musage=50 /path
          Done, had to relocate 4 out of 89 chunks

          $ btrfs filesystem df /path
          Data, single: total=68.03GiB, used=64.43GiB
          System, RAID1: total=32.00MiB, used=20.00KiB
          Metadata, RAID1: total=14.87GiB, used=8.85GiB
          GlobalReserve, single: total=512.00MiB, used=0.00B

       Just a 1 GiB decrease, which possibly means there are block groups with
       good utilization. Making the metadata layout more compact would in turn
       require updating more metadata structures, i.e. lots of IO. As running
       out of metadata space is a more severe problem, it's not necessary to
       keep the utilization ratio too high. For the purpose of this example,
       let's see the effects of further compaction:

          # btrfs balance start -musage=70 /path
          Done, had to relocate 13 out of 88 chunks

          $ btrfs filesystem df .
          Data, single: total=68.03GiB, used=64.43GiB
          System, RAID1: total=32.00MiB, used=20.00KiB
          Metadata, RAID1: total=11.97GiB, used=8.83GiB
          GlobalReserve, single: total=512.00MiB, used=0.00B

   GETTING RID OF COMPLETELY UNUSED BLOCK GROUPS
       Normally the balance operation needs work space to temporarily move
       the data before the old block groups get removed. If there's no work
       space, it fails with ENOSPC (no space left).

       There's a special case when the block groups are completely unused,
       possibly left after removing lots of files or deleting snapshots.
       Removing empty block groups is automatic since kernel 3.18. The same
       can be achieved manually, with the notable exception that this
       operation does not require work space. Thus it can be used to reclaim
       unused block groups and make their space available.

          # btrfs balance start -dusage=0 /path

       This should lead to a decrease in the total numbers in the btrfs
       filesystem df output.

EXIT STATUS

       Unless indicated otherwise below, all btrfs balance subcommands return
       a zero exit status if they succeed, and non-zero in case of failure.

       The pause, cancel, and resume subcommands exit with a status of 2 if
       they fail because a balance operation was not running.

       The status subcommand exits with a status of 0 if a balance operation
       is not running, 1 if the command-line usage is incorrect or a balance
       operation is still running, and 2 on other errors.
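
       A script that polls the balance state has to treat the status exit
       codes differently from the usual success/failure convention. The
       following sketch maps them to the meanings listed above; the function
       name describe_balance_status is hypothetical.

```shell
# Hypothetical helper: map the exit status of `btrfs balance status`
# to the meanings listed above.
describe_balance_status() {
    case $1 in
        0) echo "no balance running" ;;
        1) echo "balance running or incorrect usage" ;;
        2) echo "other error" ;;
        *) echo "unexpected exit status $1" ;;
    esac
}

# Typical use on a mounted filesystem (the path is a placeholder):
#   btrfs balance status /path; describe_balance_status $?
describe_balance_status 0
```

       Note that exit status 1 is ambiguous between a usage error and a
       running balance, so a script may also want to inspect the command
       output.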

AVAILABILITY

       btrfs is part of btrfs-progs. Please refer to the documentation at
       https://btrfs.readthedocs.io.