DRBD.CONF(5)                  Configuration Files                 DRBD.CONF(5)


NAME
drbd.conf - Configuration file for DRBD's devices

DESCRIPTION
The file /etc/drbd.conf is read by drbdadm.
The file format was designed to allow a verbatim copy of the file on
both nodes of the cluster. It is highly recommended to do so in order
to keep your configuration manageable. The file /etc/drbd.conf should
be the same on both nodes of the cluster. Note that changes to
/etc/drbd.conf do not apply immediately.

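The configuration discussed in the following paragraph might look like
this sketch (the IP addresses and the port are illustrative
assumptions, not part of the original example):

    resource r0 {
        protocol C;
        on alice {
            device    /dev/drbd1;
            disk      /dev/sda7;
            address   10.1.1.31:7789;   # illustrative address
            meta-disk internal;
        }
        on bob {
            device    /dev/drbd1;
            disk      /dev/sda7;
            address   10.1.1.32:7789;   # illustrative address
            meta-disk internal;
        }
        syncer {
            rate 10M;   # limit resync to about 10 MByte/second
        }
    }
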
In this example, there is a single DRBD resource (called r0) which uses
protocol C for the connection between its devices. The device on host
alice uses /dev/drbd1 as the device node for its applications, and
/dev/sda7 as the low-level storage for the data. The IP addresses
specify the network interfaces to be used. A resync process that may be
running is limited to about 10 MByte/second of IO bandwidth.

There may be multiple resource sections in a single drbd.conf file. For
more examples, please have a look at the DRBD User's Guide[1].

FILE FORMAT
The file consists of sections and parameters. A section begins with a
keyword, sometimes an additional name, and an opening brace (“{”). A
section ends with a closing brace (“}”). The braces enclose the
parameters.

section [name] { parameter value; [...] }

A parameter starts with the identifier of the parameter followed by
whitespace. Every subsequent character is considered part of the
parameter's value. Boolean parameters are a special case; they consist
only of the identifier. Parameters are terminated by a semicolon (“;”).

Some parameter values have default units which might be overruled by K,
M or G. These units are defined in the usual way (K = 2^10 = 1024, M =
1024 K, G = 1024 M).

Comments may be placed into the configuration file and must begin with
a hash sign (“#”). Subsequent characters are ignored until the end of
the line.

Sections
skip
Comments out chunks of text, even spanning more than one line.
Characters between the keyword skip and the opening brace (“{”) are
ignored. Everything enclosed by the braces is skipped. This comes in
handy if you just want to comment out some 'resource [name] {...}'
section: just precede it with the keyword skip.
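
For instance, a resource could be temporarily disabled like this (the
resource name r9 and its contents are hypothetical; note that this
simple form only works while the skipped section contains no nested
braces):

    skip
    resource r9 {
        protocol C;
        device minor 9;
        meta-disk internal;
    }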

global
Configures some global parameters. Currently only minor-count,
dialog-refresh, disable-ip-verification and usage-count are allowed
here. You may only have one global section, preferably as the first
section.

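A global section using these parameters might look like this sketch
(the values are illustrative):

    global {
        usage-count yes;
        minor-count 64;
    }
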
common
All resources inherit the options set in this section. The common
section may contain a startup, a syncer, a handlers, a net and a
disk section.

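A common section could, for example, set a sync rate that all resources
inherit (the value is illustrative):

    common {
        syncer {
            rate 10M;   # all resources inherit this sync rate
        }
    }
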
resource name
Configures a DRBD resource. Each resource section needs to have two
(or more) on host sections and may have a startup, a syncer, a
handlers, a net and a disk section. Required parameter in this
section: protocol.

on host-name
Carries the necessary configuration parameters for a DRBD device of
the enclosing resource. host-name is mandatory and must match the
Linux host name (uname -n) of one of the nodes. You may list more
than one host name here, in case you want to use the same parameters
on several hosts (usually you would then have to move the IP address
between the hosts as well). You may also list more than two such
sections.

    resource r1 {
        protocol C;
        device minor 1;
        meta-disk internal;

        on alice bob {
            address 10.2.2.100:7801;
            disk /dev/mapper/some-san;
        }
        on charlie {
            address 10.2.2.101:7801;
            disk /dev/mapper/other-san;
        }
        on daisy {
            address 10.2.2.103:7801;
            disk /dev/mapper/other-san-as-seen-from-daisy;
        }
    }

See also the floating section keyword. Required parameters in this
section: device, disk, address, meta-disk, flexible-meta-disk.

stacked-on-top-of resource
For a stacked DRBD setup (3 or 4 nodes), a stacked-on-top-of section
is used instead of an on section. Required parameters in this section:
device and address.
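
A sketch of such a stacked resource (the resource and host names,
devices and addresses are hypothetical):

    resource r0-U {
        protocol A;
        stacked-on-top-of r0 {
            device    /dev/drbd10;
            address   192.168.42.1:7788;
        }
        on backup-node {
            device    /dev/drbd10;
            disk      /dev/sdc1;
            address   192.168.42.2:7788;
            meta-disk internal;
        }
    }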

floating AF addr:port
Carries the necessary configuration parameters for a DRBD device of
the enclosing resource. This section is very similar to the on
section. The difference is that the matching of host sections to
machines is done by IP address instead of node name. Required
parameters in this section: device, disk, meta-disk,
flexible-meta-disk, all of which may be inherited from the resource
section, in which case you may shorten this section down to just the
address identifier.

    resource r2 {
        protocol C;
        device minor 2;
        disk /dev/sda7;
        meta-disk internal;

        # short form, device, disk and meta-disk inherited
        floating 10.1.1.31:7802;

        # longer form, only device inherited
        floating 10.1.1.32:7802 {
            disk /dev/sdb;
            meta-disk /dev/sdc8;
        }
    }

disk
This section is used to fine-tune DRBD's properties with respect to
the low-level storage. Please refer to drbdsetup(8) for a detailed
description of the parameters. Optional parameters: on-io-error,
size, fencing, use-bmbv, no-disk-barrier, no-disk-flushes,
no-disk-drain, no-md-flushes, max-bio-bvecs.

net
This section is used to fine-tune DRBD's network properties. Please
refer to drbdsetup(8) for a detailed description of this section's
parameters. Optional parameters: sndbuf-size, rcvbuf-size, timeout,
connect-int, ping-int, ping-timeout, max-buffers, max-epoch-size,
ko-count, allow-two-primaries, cram-hmac-alg, shared-secret,
after-sb-0pri, after-sb-1pri, after-sb-2pri, data-integrity-alg,
no-tcp-cork.

startup
This section is used to fine-tune DRBD's startup properties. Please
refer to drbdsetup(8) for a detailed description of this section's
parameters. Optional parameters: wfc-timeout, degr-wfc-timeout,
outdated-wfc-timeout, wait-after-sb, stacked-timeouts and
become-primary-on.

syncer
This section is used to fine-tune the synchronization daemon for
the device. Please refer to drbdsetup(8) for a detailed description
of this section's parameters. Optional parameters: rate, after,
al-extents, use-rle, cpu-mask, verify-alg, csums-alg,
delay-probe-volume, delay-probe-interval, throttle-threshold and
hold-off-threshold.

handlers
In this section you can define handlers (executables) that are
started by the DRBD system in response to certain events. Optional
parameters: pri-on-incon-degr, pri-lost-after-sb, pri-lost,
fence-peer (formerly outdate-peer), local-io-error,
initial-split-brain, split-brain, before-resync-target,
after-resync-target.

The interface is done via environment variables:

DRBD_PEER is deprecated.

Please note that not all of these might be set for all handlers,
and that some values might not be usable for a floating definition.

Parameters
minor-count count
may be a number from 1 to 255.

Use minor-count if you want to define massively more resources later
without reloading the DRBD kernel module. By default the module loads
with 11 more resources than are currently in your configuration, but
at least 32.

dialog-refresh time
may be 0 or a positive number.

The user dialog redraws the second counter every time seconds (or
does no redraws if time is 0). The default value is 1.

disable-ip-verification
Use disable-ip-verification if, for some obscure reason, drbdadm
cannot use ip or ifconfig to do a sanity check of the IP address.
You can disable the IP verification with this option.

usage-count val
Please participate in DRBD's online usage counter[2]. The most
convenient way to do so is to set this option to yes. Valid options
are: yes, no and ask.

protocol prot-id
On the TCP/IP link the specified protocol is used. Valid protocol
specifiers are A, B, and C.

Protocol A: write IO is reported as completed when it has reached the
local disk and the local TCP send buffer.

Protocol B: write IO is reported as completed when it has reached the
local disk and the remote buffer cache.

Protocol C: write IO is reported as completed when it has reached
both the local and the remote disk.

device name minor nr
The name of the block device node of the resource being described.
You must use this device with your application (file system) and you
must not use the low-level block device which is specified with the
disk parameter.

One can either omit the name, or the minor keyword and the minor
number. If you omit the name, a default of /dev/drbd<minor> will be
used.

Udev will create additional symlinks in /dev/drbd/by-res and
/dev/drbd/by-disk.

disk name
DRBD uses this block device to actually store and retrieve the data.
Never access such a device while DRBD is running on top of it. This
also holds true for dumpe2fs(8) and similar commands.

address AF addr:port
A resource needs one IP address per device, which is used to wait for
incoming connections from the partner device and to reach the partner
device, respectively. AF must be one of ipv4, ipv6, ssocks or sdp
(for compatibility reasons sci is an alias for ssocks). It may be
omitted for IPv4 addresses. The actual IPv6 address that follows the
ipv6 keyword must be placed inside brackets:
ipv6 [fd01:2345:6789:abcd::1]:7800.

Each DRBD resource needs a TCP port which is used to connect to the
node's partner device. Two different DRBD resources may not use the
same addr:port combination on the same node.

meta-disk internal, flexible-meta-disk internal, meta-disk device
[index], flexible-meta-disk device
Internal means that the last part of the backing device is used to
store the meta-data. You must not use [index] with internal. Note:
Regardless of whether you use the meta-disk or the flexible-meta-disk
keyword, the meta-data will always be of the size needed for the
remaining storage size.

You can use a single block device to store the meta-data of multiple
DRBD devices. E.g. use meta-disk /dev/sde6[0]; and meta-disk
/dev/sde6[1]; for two different resources. In this case the meta-disk
would need to be at least 256 MB in size.

With the flexible-meta-disk keyword you specify a block device as
meta-data storage. You usually use this with LVM, which allows you to
have many variable-sized block devices. The required size of the
meta-disk block device is 36 kB + Backing-Storage-size / 32 k. Round
this number up to the next 4 kB boundary and you have the exact size.
Rule of thumb: 32 kByte per 1 GByte of storage, rounded up to the
next MB.
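
As a worked example of the formula above (the device size is
illustrative): for a 200 GiB backing device,

    200 GiB / 32 k            =  6.25 MiB
    6.25 MiB + 36 kB          ≈  6.29 MiB

which, following the rule of thumb, rounds up to about 7 MB of
flexible meta-data storage.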

on-io-error handler
This handler is invoked if the lower-level device reports IO errors
to the upper layers.

handler may be pass_on, call-local-io-error or detach.

pass_on: Report the IO error to the upper layers. On the Primary,
report it to the mounted file system. On the Secondary, ignore it.

call-local-io-error: Call the handler script local-io-error.

detach: The node drops its low-level device, and continues in
diskless mode.

fencing fencing_policy
By fencing we understand preventive measures to avoid situations
where both nodes are primary and disconnected (AKA split brain).

Valid fencing policies are:

dont-care
This is the default policy. No fencing actions are taken.

resource-only
If a node becomes a disconnected primary, it tries to fence the
peer's disk. This is done by calling the fence-peer handler. The
handler is supposed to reach the other node over alternative
communication paths and call 'drbdadm outdate res' there.

resource-and-stonith
If a node becomes a disconnected primary, it freezes all its IO
operations and calls its fence-peer handler. The fence-peer handler
is supposed to reach the peer over alternative communication paths
and call 'drbdadm outdate res' there. In case it cannot reach the
peer it should stonith the peer. IO is resumed as soon as the
situation is resolved. In case your handler fails, you can resume IO
with the resume-io command.

use-bmbv
In case the backing storage's driver has a merge_bvec_fn() function,
DRBD has to pretend that it can only process IO requests in units not
larger than 4 KiB. (At the time of writing the only known drivers
which have such a function are: md (software raid driver), dm (device
mapper - LVM) and DRBD itself.)

To get the best performance out of DRBD on top of software RAID (or
any other driver with a merge_bvec_fn() function) you might enable
this option, if you know for sure that the merge_bvec_fn() function
will deliver the same results on all nodes of your cluster, i.e. the
physical disks of the software RAID are of exactly the same type. Use
this option only if you know what you are doing.

no-disk-barrier, no-disk-flushes, no-disk-drain
DRBD has four implementations to express write-after-write
dependencies to its backing storage device. DRBD will use the first
method that is supported by the backing storage device and that is
not disabled by the user.

When selecting the method you should not only base your decision on
the measurable performance. In case your backing storage device has a
volatile write cache (plain disks, RAID of plain disks) you should
use one of the first two. In case your backing storage device has a
battery-backed write cache you may go with option 3 or 4. Option 4
will deliver the best performance on such devices.

Unfortunately device mapper (LVM) might not support barriers.

The letter after "wo:" in /proc/drbd indicates which method is
currently in use for a device: b, f, d, n. The implementations are:

barrier
The first requires that the driver of the backing storage device
support barriers (called 'tagged command queuing' in SCSI and
'native command queuing' in SATA speak). The use of this method can
be disabled by the no-disk-barrier option.

flush
The second requires that the backing device support disk flushes
(called 'force unit access' in the drive vendors' speak). The use of
this method can be disabled using the no-disk-flushes option.

drain
The third method is simply to let write requests drain before write
requests of a new reordering domain are issued. This was the only
implementation before 8.0.9. You can disable this method by using
the no-disk-drain option.

none
The fourth method is to not express write-after-write dependencies
to the backing store at all.

no-md-flushes
Disables the use of disk flushes and barrier BIOs when accessing the
meta-data device. See the notes on no-disk-flushes.

max-bio-bvecs
In some special circumstances the device mapper stack manages to
pass BIOs to DRBD that violate the constraints that are set forth by
DRBD's merge_bvec() function and which have more than one bvec. A
known example is: phys-disk -> DRBD -> LVM -> Xen -> misaligned
partition (63) -> DomU FS. Then you might see "bio would need to,
but cannot, be split:" in the Dom0's kernel log.

The best workaround is to properly align the partition within the VM
(e.g. start it at sector 1024). This costs 480 KiB of storage.
Unfortunately the default of most Linux partitioning tools is to
start the first partition at an odd number (63). Therefore most
distributions' install helpers for virtual Linux machines will end
up with misaligned partitions. The second-best workaround is to
limit DRBD's max bvecs per BIO (= max-bio-bvecs) to 1, but that
might cost performance.

The default value of max-bio-bvecs is 0, which means that there is
no user-imposed limitation.

sndbuf-size size
is the size of the TCP socket send buffer. The default value is 0,
i.e. autotune. You can specify smaller or larger values. Larger
values are appropriate for reasonable write throughput with
protocol A over high-latency networks. Values below 32 K do not make
sense. Since 8.0.13 and 8.2.7, respectively, setting the size value
to 0 means that the kernel should autotune this.

rcvbuf-size size
is the size of the TCP socket receive buffer. The default value is
0, i.e. autotune. You can specify smaller or larger values. Usually
this should be left at its default. Setting the size value to 0
means that the kernel should autotune this.

timeout time
If the partner node fails to send an expected response packet within
time tenths of a second, the partner node is considered dead and
therefore the TCP/IP connection is abandoned. This must be lower
than connect-int and ping-int. The default value is 60 (= 6
seconds); the unit is 0.1 seconds.

connect-int time
In case it is not possible to connect to the remote DRBD device
immediately, DRBD keeps on trying to connect. With this option you
can set the time between two retries. The default value is 10
seconds; the unit is 1 second.

ping-int time
If the TCP/IP connection linking a DRBD device pair is idle for more
than time seconds, DRBD will generate a keep-alive packet to check
if its partner is still alive. The default is 10 seconds; the unit
is 1 second.

ping-timeout time
The time the peer has to answer a keep-alive packet. In case the
peer's reply is not received within this time period, it is
considered dead. The default value is 500 ms; the default unit is
tenths of a second.

max-buffers number
Maximum number of requests to be allocated by DRBD. The unit is
PAGE_SIZE, which is 4 KiB on most systems. The minimum is hard-coded
to 32 (= 128 KiB). For high-performance installations it might help
if you increase that number. These buffers are used to hold data
blocks while they are written to disk.

ko-count number
In case the secondary node fails to complete a single write request
for count times the timeout, it is expelled from the cluster (i.e.
the primary node goes into StandAlone mode). The default value is 0,
which disables this feature.

max-epoch-size number
The highest number of data blocks between two write barriers. If you
set this smaller than 10, you might decrease your performance.

allow-two-primaries
With this option set you may assign the primary role to both nodes.
You should only use this option if you use a shared storage file
system on top of DRBD. At the time of writing the only ones are
OCFS2 and GFS. If you use this option with any other file system,
you are going to crash your nodes and corrupt your data!

unplug-watermark number
When the number of pending write requests on the standby (secondary)
node exceeds the unplug-watermark, we trigger the request processing
of our backing storage device. Some storage controllers deliver
better performance with small values, others deliver best
performance when the value is set to the same value as max-buffers.
Minimum 16, default 128, maximum 131072.

cram-hmac-alg
You need to specify the HMAC algorithm to enable peer authentication
at all. You are strongly encouraged to use peer authentication. The
HMAC algorithm will be used for the challenge-response authentication
of the peer. You may specify any digest algorithm that is named in
/proc/crypto.

shared-secret
The shared secret used in peer authentication. May be up to 64
characters. Note that peer authentication is disabled as long as no
cram-hmac-alg (see above) is specified.
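
Taken together, peer authentication could be enabled in the net
section like this sketch (the algorithm choice and the secret are
illustrative):

    net {
        cram-hmac-alg sha1;
        shared-secret "a-long-random-string";
    }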

after-sb-0pri policy
possible policies are:

disconnect
No automatic resynchronization, simply disconnect.

discard-younger-primary
Auto sync from the node that was primary before the split-brain
situation happened.

discard-older-primary
Auto sync from the node that became primary second during the
split-brain situation.

discard-zero-changes
In case one node did not write anything since the split brain became
evident, sync from the node that wrote something to the node that
did not write anything. In case neither wrote anything, this policy
uses a random decision to perform a "resync" of 0 blocks. In case
both have written something, this policy disconnects the nodes.

discard-least-changes
Auto sync from the node that touched more blocks during the split
brain situation.

discard-node-NODENAME
Auto sync to the named node.

after-sb-1pri policy
possible policies are:

disconnect
No automatic resynchronization, simply disconnect.

consensus
Discard the version of the secondary if the outcome of the
after-sb-0pri algorithm would also destroy the current secondary's
data. Otherwise disconnect.

violently-as0p
Always take the decision of the after-sb-0pri algorithm, even if
that causes an erratic change of the primary's view of the data.
This is only useful if you use a one-node FS (i.e. not OCFS2 or GFS)
with the allow-two-primaries flag, AND if you really know what you
are doing. This is DANGEROUS and MAY CRASH YOUR MACHINE if you have
an FS mounted on the primary node.

discard-secondary
Discard the secondary's version.

call-pri-lost-after-sb
Always honor the outcome of the after-sb-0pri algorithm. In case it
decides the current secondary has the right data, it calls the
"pri-lost-after-sb" handler on the current primary.

after-sb-2pri policy
possible policies are:

disconnect
No automatic resynchronization, simply disconnect.

violently-as0p
Always take the decision of the after-sb-0pri algorithm, even if
that causes an erratic change of the primary's view of the data.
This is only useful if you use a one-node FS (i.e. not OCFS2 or GFS)
with the allow-two-primaries flag, AND if you really know what you
are doing. This is DANGEROUS and MAY CRASH YOUR MACHINE if you have
an FS mounted on the primary node.

call-pri-lost-after-sb
Call the "pri-lost-after-sb" helper program on one of the machines.
This program is expected to reboot the machine, i.e. make it
secondary.

always-asbp
Normally the automatic after-split-brain policies are only used if
the current states of the UUIDs do not indicate the presence of a
third node.

With this option you request that the automatic after-split-brain
policies are used as long as the data sets of the nodes are somehow
related. This might cause a full sync, if the UUIDs indicate the
presence of a third node. (Or double faults led to strange UUID
sets.)

rr-conflict policy
This option helps to solve the cases when the outcome of the resync
decision is incompatible with the current role assignment in the
cluster.

disconnect
No automatic resynchronization, simply disconnect.

violently
Sync to the primary node is allowed, violating the assumption that
data on a block device are stable for one of the nodes. Dangerous,
do not use.

call-pri-lost
Call the "pri-lost" helper program on one of the machines. This
program is expected to reboot the machine, i.e. make it secondary.

data-integrity-alg alg
DRBD can ensure the data integrity of the user's data on the network
by comparing hash values. Normally this is ensured by the 16-bit
checksums in the headers of TCP/IP packets.

This option can be set to any of the kernel's data digest
algorithms. In a typical kernel configuration you should have at
least one of md5, sha1, and crc32c available. By default this is not
enabled.

See also the notes on data integrity.

no-tcp-cork
DRBD usually uses the TCP socket option TCP_CORK to hint to the
network stack when it can expect more data, and when it should flush
out what it has in its send queue. It turned out that there is at
least one network stack that performs worse when one uses this
hinting method. Therefore we introduced this option, which disables
the setting and clearing of the TCP_CORK socket option by DRBD.

wfc-timeout time
Wait-for-connection timeout. The init script drbd(8) blocks the boot
process until the DRBD resources are connected. When the cluster
manager starts later, it does not see a resource with an internal
split-brain. In case you want to limit the wait time, do it here.
Default is 0, which means unlimited. The unit is seconds.

degr-wfc-timeout time
Wait-for-connection timeout, if this node was part of a degraded
cluster. In case a degraded cluster (= cluster with only one node
left) is rebooted, this timeout value is used instead of
wfc-timeout, because the peer is less likely to show up in time if
it had been dead before. Value 0 means unlimited.

outdated-wfc-timeout time
Wait-for-connection timeout, if the peer was outdated. In case a
degraded cluster (= cluster with only one node left) with an
outdated peer disk is rebooted, this timeout value is used instead
of wfc-timeout, because the peer is not allowed to become primary in
the meantime. Value 0 means unlimited.

wait-after-sb
By setting this option you can make the init script continue to wait
even if the device pair had a split-brain situation and therefore
refuses to connect.

become-primary-on node-name
Sets on which node the device should be promoted to the primary role
by the init script. The node-name might either be a host name or the
keyword both. When this option is not set, the devices stay in the
secondary role on both nodes. Usually one delegates the role
assignment to a cluster manager (e.g. heartbeat).

stacked-timeouts
Usually wfc-timeout and degr-wfc-timeout are ignored for stacked
devices; instead, twice the amount of connect-int is used for the
connection timeouts. With the stacked-timeouts keyword you disable
this, and force DRBD to mind the wfc-timeout and degr-wfc-timeout
statements. Only do that if the peer of the stacked resource is
usually not available or will usually not become primary. By using
this option incorrectly, you run the risk of causing unexpected
split brain.

rate rate
To ensure a smooth operation of the application on top of DRBD, it
is possible to limit the bandwidth which may be used by background
synchronizations. The default is 250 KB/sec; the default unit is
KB/sec. Optional suffixes K, M, G are allowed.

use-rle
During the resync handshake, the dirty bitmaps of the nodes are
exchanged and merged (using bit-or), so the nodes will have the same
understanding of which blocks are dirty. On large devices, the
fine-grained dirty bitmap can become large as well, and the bitmap
exchange can take quite some time on low-bandwidth links.

Because the bitmap typically contains compact areas where all bits
are unset (clean) or set (dirty), a simple run-length encoding
scheme can considerably reduce the network traffic necessary for the
bitmap exchange.

For backward compatibility reasons, and because on fast links this
possibly does not improve transfer time but consumes CPU cycles,
this defaults to off.

after res-name
By default, resynchronization of all devices would run in parallel.
By defining a sync-after dependency, the resynchronization of this
resource will start only if the resource res-name is already in
connected state (i.e., has finished its resynchronization).
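
For example, a hypothetical resource r2 could be made to wait for
r1's resync like this (fragment; the rest of the resource definition
is omitted):

    resource r2 {
        syncer {
            after r1;   # resync r2 only once r1 is connected again
        }
    }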

al-extents extents
DRBD automatically performs hot area detection. With this parameter
you control how big the hot area (= active set) can get. Each extent
marks 4M of the backing storage (= low-level device). In case a
primary node leaves the cluster unexpectedly, the areas covered by
the active set must be resynced upon rejoining of the failed node.
The data structure is stored in the meta-data area, therefore each
change of the active set is a write operation to the meta-data
device. A higher number of extents gives longer resync times but
fewer updates to the meta-data. The default number of extents is
127. (Minimum: 7, Maximum: 3843)

verify-alg hash-alg
During online verification (as initiated by the verify sub-command),
rather than doing a bit-wise comparison, DRBD applies a hash
function to the contents of every block being verified, and compares
that hash with the peer. This option defines the hash algorithm
being used for that purpose. It can be set to any of the kernel's
data digest algorithms. In a typical kernel configuration you should
have at least one of md5, sha1, and crc32c available. By default
this is not enabled; you must set this option explicitly in order to
be able to use online device verification.

See also the notes on data integrity.

csums-alg hash-alg
A resync process sends all marked data blocks from the source to the
destination node, as long as no csums-alg is given. When one is
specified, the resync process exchanges hash values of all marked
blocks first, and sends only those data blocks that have different
hash values.

This setting is useful for DRBD setups with low-bandwidth links.
During the restart of a crashed primary node, all blocks covered by
the activity log are marked for resync. But a large part of those
will actually still be in sync, therefore using csums-alg will lower
the required bandwidth in exchange for CPU cycles.

delay-probe-volume bytes, delay-probe-interval interval,
throttle-threshold throttle_delay, hold-off-threshold hold_off_delay
During resync, at least every bytes of data and at least every
interval * 100ms, a pair of delay probes gets inserted into DRBD's
packet stream. Those packets are used to measure the delay of
packets on the data socket caused by queuing in various network
components along the path.

If the delay on the data socket becomes greater than throttle_delay,
DRBD will slow down the resync in order to keep the delay small. The
resync speed is slowed down linearly; it reaches 0 at a delay of
hold_off_delay.

cpu-mask cpu-mask
Sets the CPU affinity mask for DRBD's kernel threads of this device.
The default value of cpu-mask is 0, which means that DRBD's kernel
threads should be spread over all CPUs of the machine. This value
must be given in hexadecimal notation. If it is too big it will be
truncated.

pri-on-incon-degr cmd
This handler is called if the node is primary, degraded and if the
local copy of the data is inconsistent.

pri-lost-after-sb cmd
The node is currently primary, but lost the after-split-brain auto
recovery procedure. As a consequence, it should be abandoned.

pri-lost cmd
The node is currently primary, but DRBD's algorithm thinks that it
should become sync target. As a consequence it should give up its
primary role.

fence-peer cmd
This handler is part of the fencing mechanism. It is called in case
the node needs to fence the peer's disk. It should use communication
paths other than DRBD's network link.

local-io-error cmd
DRBD got an IO error from the local IO subsystem.

initial-split-brain cmd
DRBD has connected and detected a split-brain situation. This
handler can alert someone in all cases of split brain, not just
those that go unresolved.

split-brain cmd
DRBD has detected a split-brain situation which remained unresolved.
Manual recovery is necessary. This handler should alert someone on
duty.

before-resync-target cmd
DRBD calls this handler just before a resync begins on the node that
becomes resync target. It might be used to take a snapshot of the
backing block device.

after-resync-target cmd
DRBD calls this handler just after a resync operation finished on
the node whose disk just became consistent after being inconsistent
for the duration of the resync. It might be used to remove a
snapshot of the backing device that was created by the
before-resync-target handler.

Other Keywords
include file-pattern
Include all files matching the wildcard pattern file-pattern. The
include statement is only allowed on the top level, i.e. it is not
allowed inside any section.
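
For instance, resource definitions could be kept in separate files
and pulled in like this (the directory layout is illustrative):

    include "drbd.d/*.res";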

NOTES ON DATA INTEGRITY
There are two independent methods in DRBD to ensure the integrity of
the mirrored data: the online-verify mechanism and the
data-integrity-alg of the network section.

Both mechanisms might deliver false positives if the user of DRBD
modifies the data which gets written to disk while the transfer goes
on. Currently the swap code and ReiserFS are known to do so. In both
cases this is not a problem, because when the initiator of the data
transfer does this, it already knows that the data block will not be
part of an on-disk data structure.

The most recent (2007) example of systematic corruption was an issue
with the TCP offloading engine and the driver of a certain type of
GBit NIC. The actual corruption happened on the DMA transfer from
core memory to the card. Since the TCP checksum gets calculated on
the card, this type of corruption stays undetected as long as you do
not use either the online verify or the data-integrity-alg.

We suggest using the data-integrity-alg only during a pre-production
phase due to its CPU costs. Further, we suggest doing online verify
runs regularly, e.g. once a month during a low-load period.

VERSION
This document was revised for version 8.3.2 of the DRBD distribution.

AUTHOR
Written by Philipp Reisner philipp.reisner@linbit.com and Lars
Ellenberg lars.ellenberg@linbit.com.

REPORTING BUGS
Report bugs to drbd-user@lists.linbit.com.

COPYRIGHT
Copyright 2001-2008 LINBIT Information Technologies, Philipp Reisner,
Lars Ellenberg. This is free software; see the source for copying
conditions. There is NO warranty; not even for MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.

SEE ALSO
drbd(8), drbddisk(8), drbdsetup(8), drbdadm(8), DRBD User's Guide[1],
DRBD web site[3]

NOTES
1. DRBD User's Guide
   http://www.drbd.org/users-guide/

2. DRBD's online usage counter
   http://usage.drbd.org

3. DRBD web site
   http://www.drbd.org/


DRBD 8.3.2                        5 Dec 2008                      DRBD.CONF(5)