BTRFS-QUOTA(8)                        BTRFS                       BTRFS-QUOTA(8)

NAME
   btrfs-quota - control the global quota status of a btrfs filesystem

SYNOPSIS
   btrfs quota <subcommand> <args>

DESCRIPTION
   The commands under btrfs quota are used to affect the global status of
   quotas of a btrfs filesystem. The quota groups (qgroups) are managed by
   the subcommand btrfs-qgroup(8).

   NOTE:
      Qgroups are different from the traditional user quotas and are
      designed to track shared and exclusive data per-subvolume. Please
      refer to the section HIERARCHICAL QUOTA GROUP CONCEPTS for a
      detailed description.

   PERFORMANCE IMPLICATIONS
      When quotas are activated, they affect all extent processing, which
      takes a performance hit. Activation of qgroups is not recommended
      unless the user intends to actually use them.

   STABILITY STATUS
      The qgroup implementation has turned out to be quite difficult as it
      affects the core of the filesystem operation. Qgroup users have hit
      various corner cases over time, such as incorrect accounting or
      system instability. The situation is gradually improving as issues
      are found and fixed.

HIERARCHICAL QUOTA GROUP CONCEPTS
   The concept of quota has a long-standing tradition in the Unix world.
   Ever since computers have allowed multiple users to work simultaneously
   in one filesystem, there has been the need to prevent one user from
   using up the entire space. Every user should get his fair share of the
   available resources.

   In the case of files, the solution is quite straightforward. Each file
   has an owner recorded along with it, and it has a size. Traditional
   quota just restricts the total size of all files that are owned by a
   user. The concept is quite flexible: if a user hits his quota limit,
   the administrator can raise it on the fly.

   On the other hand, the traditional approach offers only a poor solution
   for restricting directories. At installation time, the hard disk can be
   partitioned so that every directory (e.g. /usr, /var, ...) that needs a
   limit gets its own partition. The obvious problem is that those limits
   cannot be changed without a reinstallation. The btrfs subvolume feature
   builds a bridge. Subvolumes correspond in many ways to partitions, as
   every subvolume looks like its own filesystem. With subvolume quota, it
   is now possible to restrict each subvolume like a partition, but keep
   the flexibility of quota. The space for each subvolume can be expanded
   or restricted on the fly.

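   For example, enabling quotas and restricting a subvolume on the fly
   could look like the following sketch (the mount point /mnt, the
   subvolume name and the sizes are illustrative):

      # btrfs quota enable /mnt
      # btrfs subvolume create /mnt/var
      # btrfs qgroup limit 10G /mnt/var     # restrict like a partition
      # btrfs qgroup limit 20G /mnt/var     # ...and expand it later
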
   As subvolumes are the basis for snapshots, interesting questions arise
   as to how to account used space in the presence of snapshots. If you
   have a file shared between a subvolume and a snapshot, to whom should
   the file be accounted? The creator? Both? What if the file gets
   modified in the snapshot, should only these changes be accounted to it?
   But wait, both the snapshot and the subvolume belong to the same user's
   home. I just want to limit the total space used by both! But somebody
   else might not want to charge the snapshots to the users.

   Btrfs subvolume quota solves these problems by introducing groups of
   subvolumes and letting the user put limits on them. It is even possible
   to have groups of groups. In the following, we refer to them as
   qgroups.

   Each qgroup primarily tracks two numbers, the amount of total
   referenced space and the amount of exclusively referenced space.

   referenced
          space is the amount of data that can be reached from any of the
          subvolumes contained in the qgroup, while

   exclusive
          is the amount of data where all references to this data can be
          reached from within this qgroup.

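   Both numbers can be inspected per qgroup with btrfs qgroup show, see
   btrfs-qgroup(8) (the mount point is illustrative):

      # btrfs qgroup show /mnt
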
   Subvolume quota groups
      The basic notion of the Subvolume Quota feature is the quota group,
      short qgroup. Qgroups are notated as level/id, e.g. the qgroup 3/2
      is a qgroup of level 3. For level 0, the leading 0/ can be omitted.
      Qgroups of level 0 get created automatically when a
      subvolume/snapshot gets created. The ID of the qgroup corresponds
      to the ID of the subvolume, so 0/5 is the qgroup for the root
      subvolume. For the btrfs qgroup command, the path to the subvolume
      can also be used instead of 0/ID. For all higher levels, the ID can
      be chosen freely.

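      For example, a higher-level qgroup with a freely chosen ID can be
      created like this (the ID 1/100 and the mount point are
      illustrative):

         # btrfs qgroup create 1/100 /mnt
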
      Each qgroup can contain a set of lower level qgroups, thus creating
      a hierarchy of qgroups. Figure 1 shows an example qgroup tree.

                                        +---+
                                        |2/1|
                                        +---+
                                       /     \
                                 +---+/       \+---+
                                 |1/1|         |1/2|
                                 +---+         +---+
                                /     \       /     \
                          +---+/       \+---+/       \+---+
            qgroups       |0/1|         |0/2|         |0/3|
                          +-+-+         +---+         +---+
                            |          /     \       /     \
                            |         /       \     /       \
                            |        /         \   /         \
            extents         1       2            3             4

                      Figure 1: Sample qgroup hierarchy

      At the bottom, some extents are depicted, showing which qgroups
      reference which extents. It is important to understand the notion
      of referenced vs exclusive. In the example, qgroup 0/2 references
      extents 2 and 3, while 1/2 references extents 2-4 and 2/1
      references all extents.

      On the other hand, extent 1 is exclusive to 0/1 and extent 2 is
      exclusive to 0/2, while extent 3 is neither exclusive to 0/2 nor to
      0/3. But because both references can be reached from 1/2, extent 3
      is exclusive to 1/2. All extents are exclusive to 2/1.

      So exclusive does not mean there is no other way to reach the
      extent, but it does mean that if you delete all subvolumes
      contained in a qgroup, the extent will get deleted.

      The exclusive count of a qgroup thus conveys the useful information
      of how much space will be freed in case all subvolumes of the
      qgroup get deleted.

      All data extents are accounted this way. Metadata that belongs to a
      specific subvolume (i.e. its filesystem tree) is also accounted.
      Checksums and extent allocation information are not accounted.

      In turn, the referenced count of a qgroup can be limited. All
      writes beyond this limit will lead to a 'Quota Exceeded' error.

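      A limit on the referenced count can be set with btrfs qgroup limit,
      or on the exclusive count with its -e option; a sketch with
      illustrative IDs and sizes:

         # btrfs qgroup limit 1G 0/256 /mnt        # limit referenced space
         # btrfs qgroup limit -e 512M 0/256 /mnt   # limit exclusive space
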
   Inheritance
      Things get a bit more complicated when new subvolumes or snapshots
      are created. The case of (empty) subvolumes is still quite easy.
      If a subvolume should be part of a qgroup, it has to be added to
      the qgroup at creation time. To add it at a later time, it would
      be necessary to at least rescan the full subvolume for a proper
      accounting.

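      For example, a subvolume can be put into a qgroup at creation time,
      while adding it later requires a rescan (the IDs and paths are
      illustrative):

         # btrfs subvolume create -i 1/100 /mnt/sv   # assigned at creation
         # btrfs qgroup assign 0/261 1/100 /mnt      # assigned later...
         # btrfs quota rescan -w /mnt                # ...needs a rescan
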
      Creation of a snapshot is the hard case. Obviously, the snapshot
      will reference exactly the same amount of space as its source, and
      both source and destination now have an exclusive count of 0 (the
      filesystem nodesize to be precise, as the roots of the trees are
      not shared). But what about qgroups of higher levels? If the
      qgroup contains both the source and the destination, nothing
      changes. If the qgroup contains only the source, it might lose
      some of its exclusive count.

      But how much? The tempting answer is: subtract all of the source's
      exclusive count from the qgroup. But that is wrong, or at least
      not enough. There could have been an extent that is referenced
      from both the source and another subvolume from that qgroup. This
      extent would have been exclusive to the qgroup, but not to the
      source subvolume. With the creation of the snapshot, the qgroup
      would also lose this extent from its exclusive set.

      So how can this problem be solved? At the instant the snapshot
      gets created, we already have to know the correct exclusive count.
      We need to have a second qgroup that contains the same subvolumes
      as the first qgroup, except the subvolume we want to snapshot. The
      moment we create the snapshot, the exclusive count from the second
      qgroup needs to be copied to the first qgroup, as it represents the
      correct value. The second qgroup is called a tracking qgroup. It
      is only there in case a snapshot is needed.

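      A tracking qgroup is an ordinary qgroup and can be prepared with
      the usual commands, e.g. in this sketch with illustrative IDs,
      where 1/1 contains 0/256, 0/257 and 0/258, and 0/258 is the
      subvolume to be snapshotted:

         # btrfs qgroup create 1/2 /mnt            # the tracking qgroup
         # btrfs qgroup assign 0/256 1/2 /mnt      # all members of 1/1...
         # btrfs qgroup assign 0/257 1/2 /mnt      # ...except 0/258
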
   Use cases
      The following use cases are not meant to be exhaustive. You can
      find your own way to integrate qgroups.

   Single-user machine
      Replacement for partitions. The simplest use case is to use
      qgroups as a simple replacement for partitions. Btrfs takes the
      disk as a whole, and /, /usr, /var, etc. are created as subvolumes.
      As each subvolume gets its own qgroup automatically, they can
      simply be restricted. No hierarchy is needed for that.

      Track usage of snapshots. When a snapshot is taken, a qgroup for
      it will automatically be created with the correct values.
      Referenced will show how much is in it, possibly shared with other
      subvolumes. Exclusive will be the amount of space that gets freed
      when the subvolume is deleted.

   Multi-user machine
      Restricting homes. When you have several users on a machine, with
      home directories probably under /home, you might want to restrict
      /home as a whole, while restricting every user to an individual
      limit as well. This is easily accomplished by creating a qgroup
      for /home, e.g. 1/1, and assigning all user subvolumes to it.
      Restricting this qgroup will limit /home, while every user
      subvolume can get its own (lower) limit, as sketched below.

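      A sketch of this setup, with illustrative qgroup IDs and sizes:

         # btrfs qgroup create 1/1 /home
         # btrfs qgroup assign 0/257 1/1 /home   # repeat per user subvolume
         # btrfs qgroup limit 1T 1/1 /home       # limit /home as a whole
         # btrfs qgroup limit 100G 0/257 /home   # individual (lower) limit
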
      Accounting snapshots to the user. Let's say the user is allowed to
      create snapshots via some mechanism. It would only be fair to
      account space used by the snapshots to the user. This does not
      mean the user doubles his usage as soon as he takes a snapshot.
      Of course, files that are present in his home and the snapshot
      should only be accounted once. This can be accomplished by
      creating a qgroup for each user, say 1/UID. The user home and all
      snapshots are assigned to this qgroup. Limiting it will extend the
      limit to all snapshots, counting files only once. To limit /home
      as a whole, a higher level group, e.g. 2/1, replacing 1/1 from the
      previous example is needed, with all user qgroups assigned to it.

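      Snapshots can be put into the user's qgroup directly at creation
      time, e.g. in this sketch with illustrative IDs and paths:

         # btrfs qgroup create 1/1000 /home
         # btrfs qgroup assign 0/258 1/1000 /home    # the user's home
         # btrfs subvolume snapshot -i 1/1000 /home/user /home/user/snap
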
      Do not account snapshots. On the other hand, when the snapshots
      get created automatically, the user has no chance to control them,
      so the space used by them should not be accounted to him. This is
      already the case when creating snapshots in the example from the
      previous section.

      Snapshots for backup purposes. This scenario is a mixture of the
      previous two. The user can create snapshots, but some snapshots
      for backup purposes are created by the system. The user's
      snapshots should be accounted to the user, not the system. The
      solution is similar to the one from section Accounting snapshots
      to the user, but do not assign system snapshots to the user's
      qgroup.

   Simple quotas (squota)
      As detailed in this document, qgroups can handle many complex
      extent sharing and unsharing scenarios while maintaining an
      accurate count of exclusive and shared usage. However, this
      flexibility comes at a cost: many of the computations are global,
      in the sense that we must count up the number of trees referring
      to an extent after its references change. This can slow down
      transaction commits and lead to unacceptable latencies, especially
      as the number of snapshots scales up.

      To work around this limitation of qgroups, btrfs also supports a
      second set of quota semantics: simple quotas, or squotas. Squotas
      fully share the qgroups API and hierarchical model, but do not
      track shared vs. exclusive usage. Instead, they account all
      extents to the subvolume that first allocated them. With a bit of
      new bookkeeping, this allows all accounting decisions to be local
      to the allocation or freeing operation that deals with the extents
      themselves, and fully avoids the complex and costly back-reference
      resolutions.

      Example

      To illustrate the difference between squotas and qgroups, consider
      the following basic example assuming a nodesize of 16KiB.

      1. create subvolume 256

      2. rack up 1GiB of data and metadata usage in 256

      3. snapshot 256, creating subvolume 257

      4. COW 512MiB of the data and metadata in 257

      5. delete everything in 256

252
253 1. 0/256: 16KiB excl 0 shared
254
255 2. 0/256: 1GiB excl 0 shared
256
257 3. 0/256: 0 excl 1GiB shared; 0/257: 0 excl 1GiB shared
258
259 4. 0/256: 512MiB excl 512MiB shared; 0/257: 512MiB excl 512MiB shared
260
261 5. 0/256: 16KiB excl 0 shared; 0/257: 1GiB excl 0 shared
262
      Whereas under squotas, the accounting would look like:

      1. 0/256: 16KiB excl 16KiB shared

      2. 0/256: 1GiB excl 1GiB shared

      3. 0/256: 1GiB excl 1GiB shared; 0/257: 16KiB excl 16KiB shared

      4. 0/256: 1GiB excl 1GiB shared; 0/257: 512MiB excl 512MiB shared

      5. 0/256: 512MiB excl 512MiB shared; 0/257: 512MiB excl 512MiB
         shared

      Note that since the original snapshotted 512MiB are still
      referenced by 257, they cannot be freed from 256, even after 256
      is emptied or even deleted.

      Summary

      If you want some of the power and flexibility of quotas for
      tracking and limiting subvolume usage, but want to avoid the
      performance penalty of accurately tracking extent ownership life
      cycles, then squotas can be a useful option.

      Furthermore, squotas are targeted at use cases where the original
      extent is immutable, like image snapshotting for container
      startup, which avoids the awkward scenarios where a subvolume is
      empty or deleted but still has significant extents accounted to
      it. However, as long as you are aware of the accounting
      semantics, squotas can handle mutable original extents as well.

SUBCOMMAND
   disable <path>
          Disable subvolume quota support for a filesystem.

   enable [options] <path>
          Enable subvolume quota support for a filesystem. At this point
          two modes of accounting are possible: full means that extent
          ownership by subvolumes will be tracked all the time, while
          simple will account everything to the first owner. See the
          section Simple quotas (squota) for more details.

          Options

          -s|--simple
                 use simple quotas (squotas) instead of full qgroup
                 accounting

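          For example, enabling full qgroup accounting or squotas on a
          filesystem (the mount point is illustrative):

             # btrfs quota enable /mnt
             # btrfs quota enable --simple /mnt
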
   rescan [options] <path>
          Trash all qgroup numbers and scan the metadata again with the
          current config.

          Options

          -s|--status
                 show the status of a running rescan operation

          -w|--wait
                 start a rescan and wait for it to finish (the rescan
                 may already be in progress)

          -W|--wait-norescan
                 wait for a rescan to finish without starting it

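          For example (the mount point is illustrative):

             # btrfs quota rescan -w /mnt    # start a rescan and wait
             # btrfs quota rescan -s /mnt    # check the status
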
EXIT STATUS
   btrfs quota returns a zero exit status if it succeeds. Non-zero is
   returned in case of failure.

AVAILABILITY
   btrfs is part of btrfs-progs. Please refer to the documentation at
   https://btrfs.readthedocs.io.

SEE ALSO
   btrfs-qgroup(8), btrfs-subvolume(8), mkfs.btrfs(8)

6.6.2                            Nov 24, 2023                     BTRFS-QUOTA(8)