gfs2(5)                       File Formats Manual                      gfs2(5)

NAME
       gfs2 - GFS2 reference guide

SYNOPSIS
       Overview of the GFS2 filesystem

DESCRIPTION
       GFS2 is a clustered filesystem, designed for sharing data between
       multiple nodes connected to a common shared storage device. It can
       also be used as a local filesystem on a single node; however, since
       the design is aimed at clusters, this will usually result in lower
       performance than a filesystem designed specifically for single-node
       use.

       GFS2 is a journaling filesystem, and one journal is required for
       each node that will mount the filesystem. The one exception is
       spectator mounts, which are equivalent to mounting a read-only
       block device and, as such, can neither recover a journal nor write
       to the filesystem, so they do not require a journal to be assigned
       to them.

OPTIONS
       lockproto=LockProtoName
              This specifies which inter-node lock protocol is used by the
              GFS2 filesystem for this mount, overriding the default lock
              protocol name stored in the filesystem's on-disk superblock.

              The LockProtoName must be one of the supported locking
              protocols; currently these are lock_nolock and lock_dlm.

              The default lock protocol name is written to disk initially
              when creating the filesystem with mkfs.gfs2(8), using its -p
              option. It can be changed on-disk with the gfs2_tool(8)
              utility's "sb proto" command.

              The lockproto mount option should be used only under special
              circumstances in which you want to temporarily use a
              different lock protocol without changing the on-disk
              default. Using the incorrect lock protocol on a cluster
              filesystem mounted from more than one node will almost
              certainly result in filesystem corruption.
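
              For example, to temporarily mount a filesystem on a single
              node without cluster locking, while no other node has it
              mounted (a sketch; the device and mount point below are
              placeholders):

              mount -t gfs2 -o lockproto=lock_nolock /dev/vg0/lv_gfs2 /mnt/gfs2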

       locktable=LockTableName
              This specifies the identity of the cluster and of the
              filesystem for this mount, overriding the default
              cluster/filesystem identity stored in the filesystem's
              on-disk superblock. The cluster/filesystem name is
              recognized globally throughout the cluster, and establishes
              a unique namespace for the inter-node locking system,
              enabling the mounting of multiple GFS2 filesystems.

              The format of LockTableName is lock-module-specific. For
              lock_dlm, the format is clustername:fsname. For
              lock_nolock, the field is ignored.

              The default cluster/filesystem name is written to disk
              initially when creating the filesystem with mkfs.gfs2(8),
              using its -t option. It can be changed on-disk with the
              gfs2_tool(8) utility's "sb table" command.

              The locktable mount option should be used only under special
              circumstances in which you want to mount the filesystem in a
              different cluster, or mount it as a different filesystem
              name, without changing the on-disk default.
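
              For example, to mount the filesystem under a different
              cluster name without rewriting the superblock (a sketch; the
              cluster name, filesystem name, device and mount point are
              placeholders):

              mount -t gfs2 -o locktable=newcluster:myfs /dev/vg0/lv_gfs2 /mnt/gfs2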

       localflocks
              This flag tells GFS2 that it is running as a local (not
              clustered) filesystem, so it can allow the kernel VFS layer
              to do all flock and fcntl file locking. When running in
              cluster mode, these file locks require inter-node locks and
              the support of GFS2. When running locally, better
              performance is achieved by letting VFS handle the whole job.

              This is turned on automatically by the lock_nolock module.

       errors=[panic|withdraw]
              Setting errors=panic causes GFS2 to oops when encountering
              an error that would otherwise cause the mount to withdraw or
              print an assertion warning. The default setting is
              errors=withdraw. This option should not be used in a
              production system. On kernel versions 2.6.31 and above it
              replaces the earlier debug option.

       acl    Enables POSIX Access Control List (acl(5)) support within
              GFS2.

       spectator
              Mount this filesystem using a special form of read-only
              mount. The mount does not use one of the filesystem's
              journals, and the node is unable to recover journals for
              other nodes.
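
              A spectator mount might look like the following (a sketch;
              the device and mount point are placeholders):

              mount -t gfs2 -o spectator /dev/vg0/lv_gfs2 /mnt/gfs2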

       norecovery
              A synonym for spectator.

       suiddir
              Sets the owner of any newly created file or directory to
              that of the parent directory, if the parent directory has
              the S_ISUID permission bit set. Sets S_ISUID on any new
              directory if its parent directory's S_ISUID is set. Strips
              all execution bits on a new file if the parent directory's
              owner differs from the owner of the process creating the
              file. Set this option only if you know why you are setting
              it.

       quota=[off/account/on]
              Turns quotas on or off for a filesystem. Setting the quotas
              to be in the "account" state causes the per-UID/GID usage
              statistics to be correctly maintained by the filesystem;
              limit and warn values are ignored. The default value is
              "off".
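
              For example, to track usage without enforcing limits (a
              sketch; the device and mount point are placeholders):

              mount -t gfs2 -o quota=account /dev/vg0/lv_gfs2 /mnt/gfs2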

       discard
              Causes GFS2 to generate "discard" I/O requests for blocks
              which have been freed. These can be used by suitable
              hardware to implement thin-provisioning and similar schemes.
              This feature is supported in kernel version 2.6.30 and
              above.
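
              For example, on thinly provisioned storage (a sketch; the
              device below is a placeholder):

              mount -t gfs2 -o discard /dev/mapper/thin-gfs2 /mnt/gfs2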

       barrier
              This option, which defaults to on, causes GFS2 to send I/O
              barriers when flushing the journal. The option is
              automatically turned off if the underlying device does not
              support I/O barriers. We highly recommend the use of I/O
              barriers with GFS2 at all times unless the block device is
              designed so that it cannot lose its write cache content
              (e.g. it is on a UPS, or it doesn't have a write cache).

       commit=secs
              This is similar to the ext3 commit= option in that it sets
              the maximum number of seconds between journal commits if
              there is dirty data in the journal. The default is 60
              seconds. This option is only provided in kernel versions
              2.6.31 and above.

       data=[ordered|writeback]
              When data=ordered is set, the user data modified by a
              transaction is flushed to the disk before the transaction is
              committed to disk. This should prevent the user from seeing
              uninitialized blocks in a file after a crash. The
              data=writeback mode writes the user data to the disk at any
              time after it is dirtied. This doesn't provide the same
              consistency guarantee as ordered mode, but it should be
              slightly faster for some workloads. The default is ordered
              mode.
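
              Several of these options can be combined in a single mount,
              for example (a sketch; the option values, device and mount
              point are illustrative, not recommendations):

              mount -t gfs2 -o noatime,commit=30,data=writeback /dev/vg0/lv_gfs2 /mnt/gfs2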

       meta   This option results in selecting the meta filesystem root
              rather than the normal filesystem root. It is normally only
              used by the GFS2 utility functions. Altering any file on
              the GFS2 meta filesystem may render the filesystem unusable,
              so only experts in the GFS2 on-disk layout should use this
              option.

       quota_quantum=secs
              This sets the number of seconds for which a change in the
              quota information may sit on one node before being written
              to the quota file. This is the preferred way to set this
              parameter. The value is an integer number of seconds
              greater than zero. The default is 60 seconds. Shorter
              settings result in faster updates of the lazy quota
              information and less likelihood of someone exceeding their
              quota. Longer settings make filesystem operations involving
              quotas faster and more efficient.

       statfs_quantum=secs
              Setting statfs_quantum to 0 is the preferred way to set the
              slow version of statfs. The default value is 30 secs, which
              sets the maximum time period before statfs changes will be
              synced to the master statfs file. This can be adjusted to
              allow for faster, less accurate statfs values or slower,
              more accurate values. When set to 0, statfs will always
              report the true values.
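
              For example, to make statfs always report accurate values at
              some cost in performance (a sketch; the device and mount
              point are placeholders):

              mount -t gfs2 -o statfs_quantum=0 /dev/vg0/lv_gfs2 /mnt/gfs2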

       statfs_percent=value
              This setting provides a bound on the maximum percentage
              change in the statfs information on a local basis before it
              is synced back to the master statfs file, even if the time
              period has not expired. If the setting of statfs_quantum is
              0, then this setting is ignored.

       rgrplvb
              This flag tells gfs2 to look for information about a
              resource group's free space and unlinked inodes in its glock
              lock value block. This keeps gfs2 from having to read in
              the resource group data from disk, speeding up allocations
              in some cases. This option was added in the 3.6 Linux
              kernel. Prior to this kernel, no information was saved to
              the resource group lvb. Note: To safely turn on this
              option, all nodes mounting the filesystem must be running at
              least a 3.6 Linux kernel. If any nodes had previously
              mounted the filesystem using older kernels, the filesystem
              must be unmounted on all nodes before it can be mounted with
              this option enabled. This option does not need to be
              enabled on all nodes using a filesystem.

       loccookie
              This flag tells gfs2 to use location-based readdir cookies,
              instead of its usual filename-hash readdir cookies. The
              filename-hash cookies are not guaranteed to be unique, and
              as the number of files in a directory increases, so does the
              likelihood of a collision. NFS requires readdir cookies to
              be unique, which can cause problems with very large
              directories (over 100,000 files). With this flag set, gfs2
              will try to give out location-based cookies. Since the
              cookie is 31 bits, gfs2 will eventually run out of unique
              cookies and will fall back to using hash cookies. The
              maximum number of files that could have unique location
              cookies, assuming perfectly even hashing and names of 8 or
              fewer characters, is 1,073,741,824. An average directory
              should be able to give out well over half a billion
              location-based cookies. This option was added in the 4.5
              Linux kernel. Prior to this kernel, gfs2 did not add
              directory entries in a way that allowed it to use
              location-based readdir cookies. Note: To safely turn on
              this option, all nodes mounting the filesystem must be
              running at least a 4.5 Linux kernel. If this option is only
              enabled on some of the nodes mounting a filesystem, the
              cookies returned by nodes using this option will not be
              valid on nodes that are not using this option, and vice
              versa. Finally, when first enabling this option on a
              filesystem that had been previously mounted without it, you
              must make sure that there are no outstanding cookies being
              cached by other software, such as NFS.

BUGS
       GFS2 doesn't support errors=remount-ro or data=journal. It is not
       possible to switch support for user and group quotas on and off
       independently of each other. Some of the error messages are rather
       cryptic; if you encounter one of these messages, check firstly that
       gfs_controld is running and secondly that you have enough journals
       on the filesystem for the number of nodes in use.

SEE ALSO
       mount(8) for general mount options, chmod(1) and chmod(2) for
       access permission flags, acl(5) for access control lists, lvm(8)
       for volume management, ccs(7) for cluster management, umount(8),
       initrd(4).

       The GFS2 documentation has been split into a number of sections:

       gfs2_edit(8)   A GFS2 debug tool (use with caution)
       fsck.gfs2(8)   The GFS2 file system checker
       gfs2_grow(8)   Growing a GFS2 file system
       gfs2_jadd(8)   Adding a journal to a GFS2 file system
       mkfs.gfs2(8)   Make a GFS2 file system
       gfs2_quota(8)  Manipulate GFS2 disk quotas
       gfs2_tool(8)   Tool to manipulate a GFS2 file system (obsolete)
       tunegfs2(8)    Tool to manipulate GFS2 superblocks

SETUP
       GFS2 clustering is driven by the dlm, which depends on dlm_controld
       to provide clustering from userspace. dlm_controld clustering is
       built on corosync cluster/group membership and messaging.

       Follow these steps to manually configure and run gfs2/dlm/corosync.

       1. create /etc/corosync/corosync.conf and copy to all nodes

       In this sample, replace cluster_name and IP addresses, and add
       nodes as needed. If using only two nodes, uncomment the two_node
       line. See corosync.conf(5) for more information.

       totem {
               version: 2
               secauth: off
               cluster_name: abc
       }

       nodelist {
               node {
                       ring0_addr: 10.10.10.1
                       nodeid: 1
               }
               node {
                       ring0_addr: 10.10.10.2
                       nodeid: 2
               }
               node {
                       ring0_addr: 10.10.10.3
                       nodeid: 3
               }
       }

       quorum {
               provider: corosync_votequorum
               # two_node: 1
       }

       logging {
               to_syslog: yes
       }

       2. start corosync on all nodes

       systemctl start corosync

       Run corosync-quorumtool to verify that all nodes are listed.
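
       Optionally, on a systemd-based distribution (an assumption; adjust
       for the local init system), corosync can also be enabled so that it
       starts at boot:

       systemctl enable corosync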

       3. create /etc/dlm/dlm.conf and copy to all nodes

       * To use no fencing, use this line:

       enable_fencing=0

       * To use no fencing, but exercise fencing functions, use this line:

       fence_all /bin/true

       The "true" binary will be executed for all nodes and will succeed
       (exit 0) immediately.

       * To use manual fencing, use this line:

       fence_all /bin/false

       The "false" binary will be executed for all nodes and will fail
       (exit 1) immediately.

       When a node fails, manually run: dlm_tool fence_ack <nodeid>

       * To use stonith/pacemaker for fencing, use this line:

       fence_all /usr/sbin/dlm_stonith

       The "dlm_stonith" binary will be executed for all nodes. If
       stonith/pacemaker systems are not available, dlm_stonith will fail
       and this config becomes the equivalent of the previous /bin/false
       config.

       * To use an APC power switch, use these lines:

       device apc /usr/sbin/fence_apc ipaddr=1.1.1.1 login=admin password=pw
       connect apc node=1 port=1
       connect apc node=2 port=2
       connect apc node=3 port=3

       Other network switch based agents are configured similarly.

       * To use sanlock/watchdog fencing, use these lines:

       device wd /usr/sbin/fence_sanlock path=/dev/fence/leases
       connect wd node=1 host_id=1
       connect wd node=2 host_id=2
       unfence wd

       See fence_sanlock(8) for more information.

       * For other fencing configurations see the dlm.conf(5) man page.

       4. start dlm_controld on all nodes

       systemctl start dlm

       Run "dlm_tool status" to verify that all nodes are listed.

       5. if using clvm, start clvmd on all nodes

       systemctl start clvmd

       6. make new gfs2 file systems

       mkfs.gfs2 -p lock_dlm -t cluster_name:fs_name -j num /path/to/storage

       The cluster_name must match the name used in step 1 above. The
       fs_name must be a unique name in the cluster. The -j option is the
       number of journals to create; there must be one for each node that
       will mount the filesystem.
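
       As a concrete sketch matching the three-node sample configuration
       in step 1 (the filesystem name and logical volume path below are
       placeholders):

       mkfs.gfs2 -p lock_dlm -t abc:mygfs2 -j 3 /dev/vg0/lv_gfs2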

       7. mount gfs2 file systems

       mount /path/to/storage /mountpoint

       Run "dlm_tool ls" to verify the nodes that have each fs mounted.
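
       If the filesystem should also be listed in /etc/fstab, an entry of
       the following form can be used (a sketch; the device, mount point
       and options are placeholders, and the cluster services above must
       be running before the mount is attempted):

       /dev/vg0/lv_gfs2  /mnt/gfs2  gfs2  noatime  0 0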

       8. shut down

       umount -a -t gfs2
       systemctl stop clvmd
       systemctl stop dlm
       systemctl stop corosync

       More setup information:
       dlm_controld(8),
       dlm_tool(8),
       dlm.conf(5),
       corosync(8),
       corosync.conf(5)


                                                                       gfs2(5)