1DRBD.CONF(5) Configuration Files DRBD.CONF(5)
2
3
4
5 NAME
6 drbd.conf - DRBD Configuration Files
7
8 INTRODUCTION TO DRBD
9 DRBD implements block devices which replicate their data to all nodes
10 of a cluster. The actual data and associated metadata are usually
11 stored redundantly on "ordinary" block devices on each cluster node.
12
13 Replicated block devices are called /dev/drbdminor by default. They are
14 grouped into resources, with one or more devices per resource.
15 Replication among the devices in a resource takes place in
16 chronological order. With DRBD, we refer to the devices inside a
17 resource as volumes.
18
19 In DRBD 9, a resource can be replicated between two or more cluster
20 nodes. The connections between cluster nodes are point-to-point links,
21 and use TCP or a TCP-like protocol. All nodes must be directly
22 connected.
23
24 DRBD consists of low-level user-space components which interact with
25 the kernel and perform basic operations (drbdsetup, drbdmeta), a
26 high-level user-space component which understands and processes the
27 DRBD configuration and translates it into basic operations of the
28 low-level components (drbdadm), and a kernel component.
29
30 The default DRBD configuration consists of /etc/drbd.conf and of
31 additional files included from there, usually global_common.conf and
32 all *.res files inside /etc/drbd.d/. It has turned out to be useful to
33 define each resource in a separate *.res file.
34
35 The configuration files are designed so that each cluster node can
36 contain an identical copy of the entire cluster configuration. The host
37 name of each node determines which parts of the configuration apply
38 (uname -n). It is highly recommended to keep the cluster configuration
39 on all nodes in sync by manually copying it to all nodes, or by
40 automating the process with csync2 or a similar tool.
41
42 EXAMPLE CONFIGURATION FILE
43 global {
44 usage-count yes;
45 udev-always-use-vnr;
46 }
47 resource r0 {
48 net {
49 cram-hmac-alg sha1;
50 shared-secret "FooFunFactory";
51 }
52 volume 0 {
53 device "/dev/drbd1";
54 disk "/dev/sda7";
55 meta-disk internal;
56 }
57 on "alice" {
58 node-id 0;
59 address 10.1.1.31:7000;
60 }
61 on "bob" {
62 node-id 1;
63 address 10.1.1.32:7000;
64 }
65 connection {
66 host "alice" port 7000;
67 host "bob" port 7000;
68 net {
69 protocol C;
70 }
71 }
72 }
73
74 This example defines a resource r0 which contains a single replicated
75 device with volume number 0. The resource is replicated among hosts
76 alice and bob, which have the IPv4 addresses 10.1.1.31 and 10.1.1.32
77 and the node identifiers 0 and 1, respectively. On both hosts, the
78 replicated device is called /dev/drbd1, and the actual data and
79 metadata are stored on the lower-level device /dev/sda7. The connection
80 between the hosts uses protocol C.
81
82 Enclose strings within double-quotation marks (") to differentiate them
83 from resource keywords. Please refer to the DRBD User's Guide[1] for
84 more examples.
85
86 FILE FORMAT
87 DRBD configuration files consist of sections, which contain other
88 sections and parameters depending on the section types. Each section
89 consists of one or more keywords, sometimes a section name, an opening
90 brace (“{”), the section's contents, and a closing brace (“}”).
91 Parameters inside a section consist of a keyword, followed by one or
92 more keywords or values, and a semicolon (“;”).
93
94 Some parameter values have a default scale which applies when a plain
95 number is specified (for example Kilo, or 1024 times the numeric
96 value). Such default scales can be overridden by using a suffix (for
97 example, M for Mega). The common suffixes K = 2^10 = 1024, M = 1024 K,
98 and G = 1024 M are supported.
99
100 Comments start with a hash sign (“#”) and extend to the end of the
101 line. In addition, any section can be prefixed with the keyword skip,
102 which causes the section and any sub-sections to be ignored.
103
104 Additional files can be included with the include file-pattern
105 statement (see glob(7) for the expressions supported in file-pattern).
106 Include statements are only allowed outside of sections.
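
    For example, a stock /etc/drbd.conf often consists of nothing but
    include statements along these lines (the paths follow the default
    layout described above):

        include "drbd.d/global_common.conf";
        include "drbd.d/*.res";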
107
108 The following sections are defined (indentation indicates in which
109 context):
110
111 common
112    [disk]
113    [handlers]
114    [net]
115    [options]
116    [startup]
117 global
118    [require-drbd-module-version-{eq,ne,gt,ge,lt,le}]
119 resource
120    connection
121       multiple path | 2 host
122       [net]
123       [volume]
124          [peer-device-options]
125       [peer-device-options]
126    connection-mesh
127       [net]
128    [disk]
129    floating
130    handlers
131    [net]
132    on
133       volume
134          disk
135          [disk]
136    options
137    stacked-on-top-of
138    startup
139
140 Sections in brackets affect other parts of the configuration: inside
141 the common section, they apply to all resources. A disk section inside
142 a resource or on section applies to all volumes of that resource, and a
143 net section inside a resource section applies to all connections of
144 that resource. This makes it possible to avoid repeating identical
145 options for each resource, connection, or volume. Options can be
146 overridden in a more specific resource, connection, on, or volume section.
147
148 peer-device-options are resync-rate, c-plan-ahead, c-delay-target,
149 c-fill-target, c-max-rate and c-min-rate. For backward
150 compatibility, they can also be specified in any disk options
151 section. They are inherited into all relevant connections. If they
152 are given at the connection level, they are inherited by all
153 volumes on that connection. A peer-device-options section is
154 started with the disk keyword.
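
    As an illustration, a fixed resync rate for one particular connection
    could be set with a peer-device-options section like the following
    sketch (host names and the rate are placeholders):

        connection {
            host "alice";
            host "bob";
            disk {
                c-plan-ahead 0;
                resync-rate 33M;
            }
        }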
155
156 Sections
157 common
158
159 This section can contain one each of a disk, handlers, net, options,
160 and startup section. All resources inherit the parameters in these
161 sections as their default values.
162
163 connection [name]
164
165 Define a connection between two hosts. This section must contain
166 two host parameters or multiple path sections. The optional name is
167 used to refer to the connection in the system log and in other
168 messages. If no name is specified, the peer's host name is used
169 instead.
170
171 path
172
173 Define a path between two hosts. This section must contain two host
174 parameters.
175
176 connection-mesh
177
178 Define a connection mesh between multiple hosts. This section must
179 contain a hosts parameter, which has the host names as arguments.
180 This section is a shortcut to define many connections which share
181 the same network options.
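
    A sketch of a three-node mesh sharing one net configuration (the host
    names are placeholders) might look like this:

        connection-mesh {
            hosts "alice" "bob" "charlie";
            net {
                protocol C;
            }
        }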
182
183 disk
184
185 Define parameters for a volume. All parameters in this section are
186 optional.
187
188 floating [address-family] addr:port
189
190 Like the on section, except that instead of the host name a network
191 address is used to determine if it matches a floating section.
192
193 The node-id parameter in this section is required. If the address
194 parameter is not provided, no connections to peers will be created
195 by default. The device, disk, and meta-disk parameters must be
196 defined in, or inherited by, this section.
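
    A hedged sketch of a floating section (address, port, and node-id are
    placeholders; device, disk, and meta-disk are assumed to be inherited
    from a resource-level volume section):

        floating 10.1.1.31:7000 {
            node-id 0;
        }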
197
198 global
199
200 Define some global parameters. All parameters in this section are
201 optional. Only one global section is allowed in the configuration.
202
203 require-drbd-module-version-{eq,ne,gt,ge,lt,le}
204
205 This statement contains one of the valid forms and a three-digit
206 version number (e.g., require-drbd-module-version-eq 9.0.16;). If
207 the currently loaded DRBD kernel module does not match the
208 specification, parsing is aborted. The comparison operator names
209 have the same semantics as in test(1).
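
    For example, to abort parsing unless at least a known-good module
    version is loaded (the version number is illustrative):

        global {
            require-drbd-module-version-ge 9.0.16;
        }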
210
211 handlers
212
213 Define handlers to be invoked when certain events occur. The kernel
214 passes the resource name in the first command-line argument and
215 sets the following environment variables depending on the event's
216 context:
217
218 • For events related to a particular device: the device's minor
219 number in DRBD_MINOR, the device's volume number in
220 DRBD_VOLUME.
221
222 • For events related to a particular device on a particular peer:
223 the connection endpoints in DRBD_MY_ADDRESS, DRBD_MY_AF,
224 DRBD_PEER_ADDRESS, and DRBD_PEER_AF; the device's local minor
225 number in DRBD_MINOR, and the device's volume number in
226 DRBD_VOLUME.
227
228 • For events related to a particular connection: the connection
229 endpoints in DRBD_MY_ADDRESS, DRBD_MY_AF, DRBD_PEER_ADDRESS,
230 and DRBD_PEER_AF; and, for each device defined for that
231 connection: the device's minor number in
232 DRBD_MINOR_volume-number.
233
234 • For events that identify a device, if a lower-level device is
235 attached, the lower-level device's device name is passed in
236 DRBD_BACKING_DEV (or DRBD_BACKING_DEV_volume-number).
237
238 All parameters in this section are optional. Only a single handler
239 can be defined for each event; if no handler is defined, nothing
240 will happen.
241
242 net
243
244 Define parameters for a connection. All parameters in this section
245 are optional.
246
247 on host-name [...]
248
249 Define the properties of a resource on a particular host or set of
250 hosts. Specifying more than one host name can make sense in a setup
251 with IP address failover, for example. The host-name argument must
252 match the Linux host name (uname -n).
253
254 Usually contains or inherits at least one volume section. The
255 node-id and address parameters must be defined in this section. The
256 device, disk, and meta-disk parameters must be defined in, or
257 inherited by, this section.
258
259 A normal configuration file contains two or more on sections for
260 each resource. Also see the floating section.
261
262 options
263
264 Define parameters for a resource. All parameters in this section
265 are optional.
266
267 resource name
268
269 Define a resource. Usually contains at least two on sections and at
270 least one connection section.
271
272 stacked-on-top-of resource
273
274 Used instead of an on section for configuring a stacked resource
275 with three to four nodes.
276
277 Starting with DRBD 9, stacking is deprecated. It is advised to use
278 resources which are replicated among more than two nodes instead.
279
280 startup
281
282 The parameters in this section determine the behavior of a resource
283 at startup time.
284
285 volume volume-number
286
287 Define a volume within a resource. The volume numbers in the
288 various volume sections of a resource define which devices on which
289 hosts form a replicated device.
290
291 Section connection Parameters
292 host name [address [address-family] address] [port port-number]
293
294 Defines an endpoint for a connection. Each host statement refers to
295 an on section in a resource. If a port number is defined, this
296 endpoint will use the specified port instead of the port defined in
297 the on section. Each connection section must contain exactly two
298 host parameters. Instead of two host parameters the connection may
299 contain multiple path sections.
300
301 Section path Parameters
302 host name [address [address-family] address] [port port-number]
303
304 Defines an endpoint for a connection. Each host statement refers to
305 an on section in a resource. If a port number is defined, this
306 endpoint will use the specified port instead of the port defined in
307 the on section. Each path section must contain exactly two host
308 parameters.
309
310 Section connection-mesh Parameters
311 hosts name...
312
313 Defines all nodes of a mesh. Each name refers to an on section in a
314 resource. The port that is defined in the on section will be used.
315
316 Section disk Parameters
317 al-extents extents
318
319 DRBD automatically maintains a "hot" or "active" disk area likely
320 to be written to again soon based on the recent write activity. The
321 "active" disk area can be written to immediately, while "inactive"
322 disk areas must be "activated" first, which requires a meta-data
323 write. We also refer to this active disk area as the "activity
324 log".
325
326 The activity log saves meta-data writes, but the whole log must be
327 resynced upon recovery of a failed node. The size of the activity
328 log is a major factor of how long a resync will take and how fast a
329 replicated disk will become consistent after a crash.
330
331 The activity log consists of a number of 4-Megabyte segments; the
332 al-extents parameter determines how many of those segments can be
333 active at the same time. The default value for al-extents is 1237,
334 with a minimum of 7 and a maximum of 65536.
335
336 Note that the effective maximum may be smaller, depending on how
337 you created the device metadata; see also drbdmeta(8). The
338 effective maximum is 919 * (available on-disk activity-log
339 ring-buffer area / 4 kB - 1); the default 32 kB ring buffer yields
340 a maximum of 6433 (covering more than 25 GiB of data). We recommend
341 keeping this well within the amount your backend storage and
342 replication link are able to resync within about 5 minutes.
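
    As an illustration, a larger activity log can be configured in a disk
    section, per resource or in the common section (the value is only an
    example):

        disk {
            al-extents 3389;
        }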
343
344 al-updates {yes | no}
345
346 With this parameter, the activity log can be turned off entirely
347 (see the al-extents parameter). This will speed up writes because
348 fewer meta-data writes will be necessary, but the entire device
349 needs to be resynchronized upon recovery of a failed primary node.
350 The default value for al-updates is yes.
351
352 disk-barrier,
353 disk-flushes,
354 disk-drain
355 DRBD has three methods of handling the ordering of dependent write
356 requests:
357
358 disk-barrier
359 Use disk barriers to make sure that requests are written to
360 disk in the right order. Barriers ensure that all requests
361 submitted before a barrier make it to the disk before any
362 requests submitted after the barrier. This is implemented using
363 'tagged command queuing' on SCSI devices and 'native command
364 queuing' on SATA devices. Only some devices and device stacks
365 support this method. The device mapper (LVM) only supports
366 barriers in some configurations.
367
368 Note that on systems which do not support disk barriers,
369 enabling this option can lead to data loss or corruption. Until
370 DRBD 8.4.1, disk-barrier was turned on if the I/O stack below
371 DRBD did support barriers. Kernels since linux-2.6.36 (or
372 2.6.32 RHEL6) no longer allow DRBD to detect whether barriers
373 are supported. Since drbd-8.4.2, this option is off by default and
374 needs to be enabled explicitly.
375
376 disk-flushes
377 Use disk flushes between dependent write requests, also
378 referred to as 'force unit access' by drive vendors. This
379 forces all data to disk. This option is enabled by default.
380
381 disk-drain
382 Wait for the request queue to "drain" (that is, wait for the
383 requests to finish) before submitting a dependent write
384 request. This method requires that requests are stable on disk
385 when they finish. Before DRBD 8.0.9, this was the only method
386 implemented. This option is enabled by default. Do not disable
387 in production environments.
388
389 Of these three methods, DRBD will use the first that is enabled
390 and supported by the backing storage device. If all three of these
391 options are turned off, DRBD will submit write requests without
392 bothering about dependencies. Depending on the I/O stack, write
393 requests can be reordered, and they can be submitted in a different
394 order on different cluster nodes. This can result in data loss or
395 corruption. Therefore, turning off all three methods of controlling
396 write ordering is strongly discouraged.
397
398 A general guideline for configuring write ordering is to use disk
399 barriers or disk flushes when using ordinary disks (or an ordinary
400 disk array) with a volatile write cache. On storage without cache
401 or with a battery backed write cache, disk draining can be a
402 reasonable choice.
403
404 disk-timeout
405 If the lower-level device on which a DRBD device stores its data
406 does not finish an I/O request within the defined disk-timeout,
407 DRBD treats this as a failure. The lower-level device is detached,
408 and the device's disk state advances to Diskless. If DRBD is
409 connected to one or more peers, the failed request is passed on to
410 one of them.
411
412 This option is dangerous and may lead to kernel panic!
413
414 "Aborting" requests, or force-detaching the disk, is intended for
415 completely blocked/hung local backing devices which no longer
416 complete requests at all, not even with error completions. In this
417 situation, usually a hard-reset and failover is the only way out.
418
419 By "aborting", basically faking a local error-completion, we allow
420 for a more graceful switchover by cleanly migrating services. Still,
421 the affected node has to be rebooted "soon".
422
423 By completing these requests, we allow the upper layers to re-use
424 the associated data pages.
425
426 If later the local backing device "recovers", and now DMAs some
427 data from disk into the original request pages, in the best case it
428 will just put random data into unused pages; but typically it will
429 corrupt meanwhile completely unrelated data, causing all sorts of
430 damage.
431
432 This means that a delayed successful completion, especially of READ
433 requests, is a reason to panic(). We assume that a delayed *error*
434 completion is OK, though we will still complain noisily about it.
435
436 The default value of disk-timeout is 0, which stands for an
437 infinite timeout. Timeouts are specified in units of 0.1 seconds.
438 This option is available since DRBD 8.3.12.
439
440 md-flushes
441 Enable disk flushes and disk barriers on the meta-data device. This
442 option is enabled by default. See the disk-flushes parameter.
443
444 on-io-error handler
445
446 Configure how DRBD reacts to I/O errors on a lower-level device.
447 The following policies are defined:
448
449 pass_on
450 Change the disk status to Inconsistent, mark the failed block
451 as inconsistent in the bitmap, and retry the I/O operation on a
452 remote cluster node.
453
454 call-local-io-error
455 Call the local-io-error handler (see the handlers section).
456
457 detach
458 Detach the lower-level device and continue in diskless mode.
459
460
461 read-balancing policy
462 Distribute read requests among cluster nodes as defined by policy.
463 The supported policies are prefer-local (the default),
464 prefer-remote, round-robin, least-pending, when-congested-remote,
465 32K-striping, 64K-striping, 128K-striping, 256K-striping,
466 512K-striping and 1M-striping.
467
468 This option is available since DRBD 8.4.1.
469
470 resync-after res-name/volume
471
472 Define that a device should only resynchronize after the specified
473 other device. By default, no order between devices is defined, and
474 all devices will resynchronize in parallel. Depending on the
475 configuration of the lower-level devices, and the available network
476 and disk bandwidth, this can slow down the overall resync process.
477 This option can be used to form a chain or tree of dependencies
478 among devices.
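
    A sketch (resource names and the volume number are placeholders) that
    lets r1 resynchronize only after volume 0 of r0 has finished (other
    volume and resource parameters omitted):

        resource r1 {
            volume 0 {
                disk {
                    resync-after r0/0;
                }
            }
        }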
479
480 rs-discard-granularity byte
481 When rs-discard-granularity is set to a non-zero, positive value,
482 DRBD tries to do resync operations in requests of this size.
483 In case such a block contains only zero bytes on the sync source
484 node, the sync target node will issue a discard/trim/unmap command
485 for the area.
486
487 The value is constrained by the discard granularity of the backing
488 block device. If rs-discard-granularity is not a multiple of
489 the discard granularity of the backing block device, DRBD rounds it
490 up. The feature only becomes active if the backing block device reads
491 back zeroes after a discard command.
492
493 The usage of rs-discard-granularity may cause c-max-rate to be
494 exceeded. In particular, the resync rate may reach 10x the value of
495 rs-discard-granularity per second.
496
497 The default value of rs-discard-granularity is 0. This option is
498 available since 8.4.7.
499
500 discard-zeroes-if-aligned {yes | no}
501
502 There are several aspects to discard/trim/unmap support on linux
503 block devices. Even if discard is supported in general, it may fail
504 silently, or may partially ignore discard requests. Devices also
505 announce whether reading from unmapped blocks returns defined data
506 (usually zeroes), or undefined data (possibly old data, possibly
507 garbage).
508
509 If on different nodes, DRBD is backed by devices with differing
510 discard characteristics, discards may lead to data divergence (old
511 data or garbage left over on one backend, zeroes due to unmapped
512 areas on the other backend). Online verify would now potentially
513 report tons of spurious differences. While probably harmless for
514 most use cases (fstrim on a file system), DRBD cannot have that.
515
516 To play safe, we have to disable discard support, if our local
517 backend (on a Primary) does not support "discard_zeroes_data=true".
518 We also have to translate discards to explicit zero-out on the
519 receiving side, unless the receiving side (Secondary) supports
520 "discard_zeroes_data=true", thereby allocating areas that were
521 supposed to be unmapped.
522
523 There are some devices (notably the LVM/DM thin provisioning) that
524 are capable of discard, but announce discard_zeroes_data=false. In
525 the case of DM-thin, discards aligned to the chunk size will be
526 unmapped, and reading from unmapped sectors will return zeroes.
527 However, unaligned partial head or tail areas of discard requests
528 will be silently ignored.
529
530 If we now add a helper to explicitly zero-out these unaligned
531 partial areas, while passing on the discard of the aligned full
532 chunks, we effectively achieve discard_zeroes_data=true on such
533 devices.
534
535 Setting discard-zeroes-if-aligned to yes will allow DRBD to use
536 discards, and to announce discard_zeroes_data=true, even on
537 backends that announce discard_zeroes_data=false.
538
539 Setting discard-zeroes-if-aligned to no will cause DRBD to always
540 fall-back to zero-out on the receiving side, and to not even
541 announce discard capabilities on the Primary, if the respective
542 backend announces discard_zeroes_data=false.
543
544 We used to ignore the discard_zeroes_data setting completely. To
545 not break established and expected behaviour, and suddenly cause
546 fstrim on thin-provisioned LVs to run out-of-space instead of
547 freeing up space, the default value is yes.
548
549 This option is available since 8.4.7.
550
551 disable-write-same {yes | no}
552
553 Some disks announce WRITE_SAME support to the kernel but fail with
554 an I/O error upon actually receiving such a request. This mostly
555 happens when using virtualized disks -- notably, this behavior has
556 been observed with VMware's virtual disks.
557
558 When disable-write-same is set to yes, WRITE_SAME detection is
559 manually overridden and support is disabled.
560
561 The default value of disable-write-same is no. This option is
562 available since 8.4.7.
563
564 Section peer-device-options Parameters
565 Please note that you open the section with the disk keyword.
566
567 c-delay-target delay_target,
568 c-fill-target fill_target,
569 c-max-rate max_rate,
570 c-plan-ahead plan_time
571 Dynamically control the resync speed. The following modes are
572 available:
573
574 • Dynamic control with fill target (default). Enabled when
575 c-plan-ahead is non-zero and c-fill-target is non-zero. The
576 goal is to fill the buffers along the data path with a defined
577 amount of data. This mode is recommended when DRBD-proxy is
578 used. Configured with c-plan-ahead, c-fill-target and
579 c-max-rate.
580
581 • Dynamic control with delay target. Enabled when c-plan-ahead is
582 non-zero (default) and c-fill-target is zero. The goal is to
583 have a defined delay along the path. Configured with
584 c-plan-ahead, c-delay-target and c-max-rate.
585
586 • Fixed resync rate. Enabled when c-plan-ahead is zero. DRBD will
587 try to perform resync I/O at a fixed rate. Configured with
588 resync-rate.
589
590 The c-plan-ahead parameter defines how fast DRBD adapts to changes
591 in the resync speed. It should be set to five times the network
592 round-trip time or more. The default value of c-plan-ahead is 20,
593 in units of 0.1 seconds.
594
595 The c-fill-target parameter defines how much resync data DRBD
596 should aim to have in-flight at all times. Common values for
597 "normal" data paths range from 4K to 100K. The default value of
598 c-fill-target is 100, in units of sectors.
599
600 The c-delay-target parameter defines the delay in the resync path
601 that DRBD should aim for. This should be set to five times the
602 network round-trip time or more. The default value of
603 c-delay-target is 10, in units of 0.1 seconds.
604
605 The c-max-rate parameter limits the maximum bandwidth used by
606 dynamically controlled resyncs. Setting this to zero removes the
607 limitation (since DRBD 9.0.28). It should be set to either the
608 bandwidth available between the DRBD hosts and the machines hosting
609 DRBD-proxy, or to the available disk bandwidth. The default value
610 of c-max-rate is 102400, in units of KiB/s.
611
612 Dynamic resync speed control is available since DRBD 8.3.9.
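
    A sketch of fill-target based control, for instance for a DRBD-proxy
    setup; all values are placeholders that need tuning to the actual
    environment:

        disk {
            c-plan-ahead 20;
            c-fill-target 100K;
            c-max-rate 100M;
        }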
613
614 c-min-rate min_rate
615 A node which is primary and sync-source has to schedule application
616 I/O requests and resync I/O requests. The c-min-rate parameter
617 limits how much bandwidth is available for resync I/O; the
618 remaining bandwidth is used for application I/O.
619
620 A c-min-rate value of 0 means that there is no limit on the resync
621 I/O bandwidth. This can slow down application I/O significantly.
622 Use a value of 1 (1 KiB/s) for the lowest possible resync rate.
623
624 The default value of c-min-rate is 250, in units of KiB/s.
625
626 resync-rate rate
627
628 Define how much bandwidth DRBD may use for resynchronizing. DRBD
629 allows "normal" application I/O even during a resync. If the resync
630 takes up too much bandwidth, application I/O can become very slow.
631 This parameter allows that to be avoided. Please note that this
632 option only works when the dynamic resync controller is disabled.
633
634 Section global Parameters
635 dialog-refresh time
636
637 The DRBD init script can be used to configure and start DRBD
638 devices, which can involve waiting for other cluster nodes. While
639 waiting, the init script shows the remaining waiting time. The
640 dialog-refresh defines the number of seconds between updates of
641 that countdown. The default value is 1; a value of 0 turns off the
642 countdown.
643
644 disable-ip-verification
645 Normally, DRBD verifies that the IP addresses in the configuration
646 match the host names. Use the disable-ip-verification parameter to
647 disable these checks.
648
649 usage-count {yes | no | ask}
650 As explained on DRBD's Online Usage Counter[2] web page, DRBD
651 includes a mechanism for anonymously counting how many
652 installations are using which versions of DRBD. The results are
653 available on the web page for anyone to see.
654
655 This parameter defines if a cluster node participates in the usage
656 counter; the supported values are yes, no, and ask (ask the user,
657 the default).
658
659 We would like to ask users to participate in the online usage
660 counter as this provides us valuable feedback for steering the
661 development of DRBD.
662
663 udev-always-use-vnr
664 When udev asks drbdadm for a list of device related symlinks,
665 drbdadm would suggest symlinks with differing naming conventions,
666 depending on whether the resource has explicit volume VNR { }
667 definitions, or only one single volume with the implicit volume
668 number 0:
669
670 # implicit single volume without "volume 0 {}" block
671 DEVICE=drbd<minor>
672 SYMLINK_BY_RES=drbd/by-res/<resource-name>
673 SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>
674
675 # explicit volume definition: volume VNR { }
676 DEVICE=drbd<minor>
677 SYMLINK_BY_RES=drbd/by-res/<resource-name>/VNR
678 SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>
679
680 If you define this parameter in the global section, drbdadm will
681 always add the .../VNR part, regardless of whether the volume
682 definition was implicit or explicit.
683
684 For legacy backward compatibility, this is off by default, but we
685 recommend enabling it.
686
687 Section handlers Parameters
688 after-resync-target cmd
689
690 Called on a resync target when a node's disk state changes from
691 Inconsistent to Consistent after a resync finishes. This handler can
692 be used for removing the snapshot created in the
693 before-resync-target handler.
694
695 before-resync-target cmd
696
697 Called on a resync target before a resync begins. This handler can
698 be used for creating a snapshot of the lower-level device for the
699 duration of the resync: if the resync source becomes unavailable
700 during a resync, reverting to the snapshot can restore a consistent
701 state.
702
703 before-resync-source cmd
704
705 Called on a resync source before a resync begins.
706
707 out-of-sync cmd
708
709 Called on all nodes after a verify finishes and out-of-sync blocks
710 were found. This handler is mainly used for monitoring purposes. An
711 example would be to call a script that sends an alert SMS.
712
713 quorum-lost cmd
714
715 Called on a Primary that lost quorum. This handler is usually used
716 to reboot the node if it is not possible to restart the application
717 that uses the storage on top of DRBD.
718
719 fence-peer cmd
720
721 Called when a node should fence a resource on a particular peer.
722 The handler should not use the same communication path that DRBD
723 uses for talking to the peer.
724
725 unfence-peer cmd
726
727 Called when a node should remove fencing constraints from other
728 nodes.
729
730 initial-split-brain cmd
731
732 Called when DRBD connects to a peer and detects that the peer is in
733 a split-brain state with the local node. This handler is also
734 called for split-brain scenarios which will be resolved
735 automatically.
736
737 local-io-error cmd
738
739 Called when an I/O error occurs on a lower-level device.
740
741 pri-lost cmd
742
743 The local node is currently primary, but DRBD believes that it
744 should become a sync target. The node should give up its primary
745 role.
746
747 pri-lost-after-sb cmd
748
749 The local node is currently primary, but it has lost the
750 after-split-brain auto recovery procedure. The node should be
751 abandoned.
752
753 pri-on-incon-degr cmd
754
755 The local node is primary, and neither the local lower-level device
756 nor a lower-level device on a peer is up to date. (The primary has
757 no device to read from or to write to.)
758
759 split-brain cmd
760
761 DRBD has detected a split-brain situation which could not be
762 resolved automatically. Manual recovery is necessary. This handler
763 can be used to call for administrator attention.
764
765 disconnected cmd
766
767 A connection to a peer went down. The handler can learn about the
768 reason for the disconnect from the DRBD_CSTATE environment
769 variable.
770
771 Section net Parameters
772 after-sb-0pri policy
773 Define how to react if a split-brain scenario is detected and none
774 of the two nodes is in primary role. (We detect split-brain
775 scenarios when two nodes connect; split-brain decisions are always
776 between two nodes.) The defined policies are:
777
778 disconnect
779 No automatic resynchronization; simply disconnect.
780
781 discard-younger-primary,
782 discard-older-primary
783 Resynchronize from the node which became primary first
784 (discard-younger-primary) or last (discard-older-primary). If
785 both nodes became primary independently, the
786 discard-least-changes policy is used.
787
788 discard-zero-changes
789 If only one of the nodes wrote data since the split brain
790 situation was detected, resynchronize from this node to the
791 other. If both nodes wrote data, disconnect.
792
793 discard-least-changes
794 Resynchronize from the node with more modified blocks.
795
796 discard-node-nodename
797 Always resynchronize to the named node.
798
799 after-sb-1pri policy
800 Define how to react if a split-brain scenario is detected, with one
801 node in primary role and one node in secondary role. (We detect
802 split-brain scenarios when two nodes connect, so split-brain
803 decisions are always between two nodes.) The defined policies are:
804
805 disconnect
806 No automatic resynchronization, simply disconnect.
807
808 consensus
809 Discard the data on the secondary node if the after-sb-0pri
810 algorithm would also discard the data on the secondary node.
811 Otherwise, disconnect.
812
813 violently-as0p
814 Always take the decision of the after-sb-0pri algorithm, even
815 if it causes an erratic change of the primary's view of the
816 data. This is only useful if a single-node file system (i.e.,
817 not OCFS2 or GFS) with the allow-two-primaries flag is used.
818 This option can cause the primary node to crash, and should not
819 be used.
820
821 discard-secondary
822 Discard the data on the secondary node.
823
824 call-pri-lost-after-sb
825 Always take the decision of the after-sb-0pri algorithm. If the
826 decision is to discard the data on the primary node, call the
827 pri-lost-after-sb handler on the primary node.
828
829 after-sb-2pri policy
830 Define how to react if a split-brain scenario is detected and both
831 nodes are in primary role. (We detect split-brain scenarios when
832 two nodes connect, so split-brain decisions are always between two
833 nodes.) The defined policies are:
834
835 disconnect
836 No automatic resynchronization, simply disconnect.
837
838 violently-as0p
839 See the violently-as0p policy for after-sb-1pri.
840
841 call-pri-lost-after-sb
842 Call the pri-lost-after-sb helper program on one of the
843 machines unless that machine can demote to secondary. The
844 helper program is expected to reboot the machine, which brings
845 the node into a secondary role. Which machine runs the helper
846 program is determined by the after-sb-0pri strategy.
847
848 allow-two-primaries
849
850 The most common way to configure DRBD devices is to allow only one
851 node to be primary (and thus writable) at a time.
852
853 In some scenarios it is preferable to allow two nodes to be primary
854 at once; a mechanism outside of DRBD then must make sure that
855 writes to the shared, replicated device happen in a coordinated
856 way. This can be done with a shared-storage cluster file system
857 like OCFS2 and GFS, or with virtual machine images and a virtual
858 machine manager that can migrate virtual machines between physical
859 machines.
860
861 The allow-two-primaries parameter tells DRBD to allow two nodes to
862 be primary at the same time. Never enable this option when using a
863 non-distributed file system; otherwise, data corruption and node
864 crashes will result!
865
866 always-asbp
867 Normally the automatic after-split-brain policies are only used if
868 current states of the UUIDs do not indicate the presence of a third
869 node.
870
871 With this option you request that the automatic after-split-brain
872 policies are used as long as the data sets of the nodes are somehow
873 related. This might cause a full sync, if the UUIDs indicate the
874 presence of a third node. (Or double faults led to strange UUID
875 sets.)
876
877 connect-int time
878
879 As soon as a connection between two nodes is configured with
880 drbdsetup connect, DRBD immediately tries to establish the
881 connection. If this fails, DRBD waits for connect-int seconds and
882 then repeats. The default value of connect-int is 10 seconds.
883
884 cram-hmac-alg hash-algorithm
885
886 Configure the hash-based message authentication code (HMAC) or
887 secure hash algorithm to use for peer authentication. The kernel
888 supports a number of different algorithms, some of which may be
889 loadable as kernel modules. See the shash algorithms listed in
890 /proc/crypto. By default, cram-hmac-alg is unset. Peer
891 authentication also requires a shared-secret to be configured.
892
893 csums-alg hash-algorithm
894
895 Normally, when two nodes resynchronize, the sync target requests a
896 piece of out-of-sync data from the sync source, and the sync source
897 sends the data. With many usage patterns, a significant number of
898 those blocks will actually be identical.
899
900 When a csums-alg algorithm is specified, when requesting a piece of
901 out-of-sync data, the sync target also sends along a hash of the
902 data it currently has. The sync source compares this hash with its
903 own version of the data. It sends the sync target the new data if
904 the hashes differ, and tells it that the data are the same
905 otherwise. This reduces the network bandwidth required, at the cost
906 of higher cpu utilization and possibly increased I/O on the sync
907 target.
908
909 The csums-alg can be set to one of the secure hash algorithms
910 supported by the kernel; see the shash algorithms listed in
911 /proc/crypto. By default, csums-alg is unset.
912
913 csums-after-crash-only
914
915 Enabling this option (and csums-alg, above) makes it possible to
916 use the checksum based resync only for the first resync after
917 a primary crash, but not for later "network hiccups".
918
919 In most cases, blocks that are marked as need-to-be-resynced are in
920 fact changed, so calculating checksums, and both reading and
921 writing the blocks on the resync target, is effectively all overhead.
922
923 The advantage of checksum based resync is mostly after primary
924 crash recovery, where the recovery marked larger areas (those
925 covered by the activity log) as need-to-be-resynced, just in case.
926 Introduced in 8.4.5.
927
928 data-integrity-alg alg
929 DRBD normally relies on the data integrity checks built into the
930 TCP/IP protocol, but if a data integrity algorithm is configured,
931 it will additionally use this algorithm to make sure that the data
932 received over the network match what the sender has sent. If a data
933 integrity error is detected, DRBD will close the network connection
934 and reconnect, which will trigger a resync.
935
936 The data-integrity-alg can be set to one of the secure hash
937 algorithms supported by the kernel; see the shash algorithms listed
938 in /proc/crypto. By default, this mechanism is turned off.
939
940 Because of the CPU overhead involved, we recommend not to use this
941 option in production environments. Also see the notes on data
942 integrity below.
943
944 fencing fencing_policy
945
946 Fencing is a preventive measure to avoid situations where both
947 nodes are primary and disconnected. This is also known as a
948 split-brain situation. DRBD supports the following fencing
949 policies:
950
951 dont-care
952 No fencing actions are taken. This is the default policy.
953
954 resource-only
955 If a node becomes a disconnected primary, it tries to fence the
956 peer. This is done by calling the fence-peer handler. The
957 handler is supposed to reach the peer over an alternative
958 communication path and call 'drbdadm outdate minor' there.
959
960 resource-and-stonith
961 If a node becomes a disconnected primary, it freezes all its IO
962 operations and calls its fence-peer handler. The fence-peer
963 handler is supposed to reach the peer over an alternative
964 communication path and call 'drbdadm outdate minor' there. In
965 case it cannot do that, it should stonith the peer. IO is
966 resumed as soon as the situation is resolved. In case the
967 fence-peer handler fails, I/O can be resumed manually with
968 'drbdadm resume-io'.
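
    Fencing is typically combined with a fence-peer handler; a sketch for
    a Pacemaker cluster might look like this (the handler paths are
    illustrative and depend on the installed drbd-utils):

        resource r0 {
            net {
                fencing resource-and-stonith;
            }
            handlers {
                fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
                unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
            }
        }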
969
970 ko-count number
971
972 If a secondary node fails to complete a write request in ko-count
973 times the timeout parameter, it is excluded from the cluster. The
974 primary node then sets the connection to this secondary node to
975 Standalone. To disable this feature, you should explicitly set it
976 to 0; defaults may change between versions.
977
978 max-buffers number
979
980 Limits the memory usage per DRBD minor device on the receiving
981 side, or for internal buffers during resync or online-verify. Unit
982 is PAGE_SIZE, which is 4 KiB on most systems. The minimum possible
983 setting is hard coded to 32 (=128 KiB). These buffers are used to
984 hold data blocks while they are written to/read from disk. To avoid
985 possible distributed deadlocks on congestion, this setting is used
986 as a throttle threshold rather than a hard limit. Once more than
987 max-buffers pages are in use, further allocation from this pool is
988 throttled. You want to increase max-buffers if you cannot saturate
989 the IO backend on the receiving side.
990
991 max-epoch-size number
992
993 Define the maximum number of write requests DRBD may issue before
994 issuing a write barrier. The default value is 2048, with a minimum
995 of 1 and a maximum of 20000. Setting this parameter to a value
996 below 10 is likely to decrease performance.
997
998 on-congestion policy,
999 congestion-fill threshold,
1000 congestion-extents threshold
1001 By default, DRBD blocks when the TCP send queue is full. This
1002 prevents applications from generating further write requests until
1003 more buffer space becomes available again.
1004
1005 When DRBD is used together with DRBD-proxy, it can be better to use
1006 the pull-ahead on-congestion policy, which can switch DRBD into
1007 ahead/behind mode before the send queue is full. DRBD then records
1008 the differences between itself and the peer in its bitmap, but it
1009 no longer replicates them to the peer. When enough buffer space
1010 becomes available again, the node resynchronizes with the peer and
1011 switches back to normal replication.
1012
1013 This has the advantage of not blocking application I/O even when
1014 the queues fill up, and the disadvantage that peer nodes can fall
1015 behind much further. Also, while resynchronizing, peer nodes will
1016 become inconsistent.
1017
1018 The available congestion policies are block (the default) and
1019 pull-ahead. The congestion-fill parameter defines how much data is
1020 allowed to be "in flight" in this connection. The default value is
1021 0, which disables this mechanism of congestion control, with a
1022 maximum of 10 GiBytes. The congestion-extents parameter defines how
1023 many bitmap extents may be active before switching into
1024 ahead/behind mode, with the same default and limits as the
1025 al-extents parameter. The congestion-extents parameter is effective
1026 only when set to a value smaller than al-extents.
1027
1028 Ahead/behind mode is available since DRBD 8.3.10.
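
    A sketch for a long-distance link through DRBD-proxy (the thresholds
    are placeholders and need tuning to the actual buffer sizes):

        net {
            on-congestion pull-ahead;
            congestion-fill 1G;
            congestion-extents 500;
        }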
1029
1030 ping-int interval
1031
1032 When the TCP/IP connection to a peer is idle for more than ping-int
1033 seconds, DRBD will send a keep-alive packet to make sure that a
1034 failed peer or network connection is detected reasonably soon. The
1035 default value is 10 seconds, with a minimum of 1 and a maximum of
1036 120 seconds. The unit is seconds.
1037
1038 ping-timeout timeout
1039
1040 Define the timeout for replies to keep-alive packets. If the peer
1041 does not reply within ping-timeout, DRBD will close and try to
1042 reestablish the connection. The default value is 0.5 seconds, with
1043 a minimum of 0.1 seconds and a maximum of 30 seconds. The unit is
1044 tenths of a second.
1045
1046 socket-check-timeout timeout
1047 In setups involving a DRBD-proxy and connections that experience a
1048 lot of buffer-bloat, it might be necessary to set ping-timeout to
1049 an unusually high value. By default, DRBD uses the same value to
1050 wait until a newly established TCP connection is considered stable.
1051 Since the DRBD-proxy is usually located in the same data center,
1052 such a long wait time may hinder DRBD's connect process.
1053
1054 In such setups, socket-check-timeout should be set to at least
1055 the round-trip time between DRBD and DRBD-proxy; in most cases
1056 that is 1.
1057
1058 The default unit is tenths of a second, the default value is 0
1059 (which causes DRBD to use the value of ping-timeout instead).
1060 Introduced in 8.4.5.
1061
1062 protocol name
1063 Use the specified protocol on this connection. The supported
1064 protocols are:
1065
1066 A
1067 Writes to the DRBD device complete as soon as they have reached
1068 the local disk and the TCP/IP send buffer.
1069
1070 B
1071 Writes to the DRBD device complete as soon as they have reached
1072 the local disk, and all peers have acknowledged the receipt of
1073 the write requests.
1074
1075 C
1076 Writes to the DRBD device complete as soon as they have reached
1077 the local and all remote disks.
1078
1079
1080 rcvbuf-size size
1081
1082 Configure the size of the TCP/IP receive buffer. A value of 0 (the
1083 default) causes the buffer size to adjust dynamically. This
1084 parameter usually does not need to be set, but it can be set to a
1085 value up to 10 MiB. The default unit is bytes.
1086
1087 rr-conflict policy
1088 This option helps to solve the cases when the outcome of the resync
1089 decision is incompatible with the current role assignment in the
1090 cluster. The defined policies are:
1091
1092 disconnect
1093 No automatic resynchronization, simply disconnect.
1094
1095 retry-connect
1096 Disconnect now, and retry to connect immediately afterwards.
1097
1098 violently
1099 Resync to the primary node is allowed, violating the assumption
1100 that data on a block device are stable for one of the nodes.
1101 Do not use this option; it is dangerous.
1102
1103 call-pri-lost
1104 Call the pri-lost handler on one of the machines. The handler
1105 is expected to reboot the machine, which puts it into secondary
1106 role.
1107
1108 auto-discard
1109 Auto-discard reverses the resync direction, so that DRBD
1110 resyncs the current primary to the current secondary.
1111 Auto-discard only applies when protocol A is in use and the
1112 resync decision is based on the principle that a crashed
1113 primary should be the source of a resync. When a primary node
1114 crashes, it might have written some last updates to its disk,
1115 which were not received by a protocol A secondary. By promoting
1116 the secondary in the meantime, the user accepted that those last
1117 updates have been lost. By using auto-discard, you consent that
1118 the last updates (from before the crash of the primary) should be
1119 rolled back automatically.
1120
1121 shared-secret secret
1122
1123 Configure the shared secret used for peer authentication. The
1124 secret is a string of up to 64 characters. Peer authentication also
1125 requires the cram-hmac-alg parameter to be set.
1126
1127 sndbuf-size size
1128
1129 Configure the size of the TCP/IP send buffer. Since DRBD 8.0.13 /
1130 8.2.7, a value of 0 (the default) causes the buffer size to adjust
1131 dynamically. Values below 32 KiB are harmful to the throughput on
1132 this connection. Large buffer sizes can be useful especially when
1133 protocol A is used over high-latency networks; the maximum value
1134 supported is 10 MiB.
1135
1136 tcp-cork
1137 By default, DRBD uses the TCP_CORK socket option to prevent the
1138 kernel from sending partial messages; this results in fewer and
1139 bigger packets on the network. Some network stacks can perform
1140 worse with this optimization. On these, the tcp-cork parameter can
1141 be used to turn this optimization off.
1142
1143 timeout time
1144
1145 Define the timeout for replies over the network: if a peer node
1146 does not send an expected reply within the specified timeout, it is
1147 considered dead and the TCP/IP connection is closed. The timeout
1148 value must be lower than connect-int and lower than ping-int. The
1149 default is 6 seconds; the value is specified in tenths of a second.
1150
1151 transport type
1152
1153 With DRBD 9, the network transport used by DRBD is loaded as a
1154 separate module. With this option you can specify which transport
1155 and module to load. At present only two options exist, tcp and
1156 rdma. Please note that currently the RDMA transport module is only
1157 available with a license purchased from LINBIT. The default is tcp.
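
    For example, to request the RDMA transport for one connection
    (assuming the corresponding transport module is installed):

        connection {
            host "alice";
            host "bob";
            net {
                transport rdma;
            }
        }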
1158
1159 use-rle
1160
1161 Each replicated device on a cluster node has a separate bitmap for
1162 each of its peer devices. The bitmaps are used for tracking the
1163 differences between the local and peer device: depending on the
1164 cluster state, a disk range can be marked as different from the
1165 peer in the device's bitmap, in the peer device's bitmap, or in
1166 both bitmaps. When two cluster nodes connect, they exchange each
1167 other's bitmaps, and they each compute the union of the local and
1168 peer bitmap to determine the overall differences.
1169
1170 Bitmaps of very large devices are also relatively large, but they
1171 usually compress very well using run-length encoding. This can save
1172 time and bandwidth for the bitmap transfers.
1173
1174 The use-rle parameter determines if run-length encoding should be
1175 used. It is on by default since DRBD 8.4.0.
1176
1177 verify-alg hash-algorithm
1178 Online verification (drbdadm verify) computes and compares
1179 checksums of disk blocks (i.e., hash values) in order to detect if
1180 they differ. The verify-alg parameter determines which algorithm to
1181 use for these checksums. It must be set to one of the secure hash
1182 algorithms supported by the kernel before online verify can be
1183 used; see the shash algorithms listed in /proc/crypto.
1184
1185 We recommend scheduling online verifications regularly during
1186 low-load periods, for example once a month. Also see the notes on
1187 data integrity below.
1188
1189 allow-remote-read bool-value
1190 Allows or disallows DRBD to read from a peer node.
1191
1192 When the disk of a primary node is detached, DRBD will try to
1193 continue reading and writing from another node in the cluster. For
1194 this purpose, it searches for nodes with up-to-date data, and uses
1195 any found node to resume operations. In some cases it may not be
1196 desirable to read back data from a peer node, because the node
1197 should only be used as a replication target. In this case, the
1198 allow-remote-read parameter can be set to no, which would prohibit
1199 this node from reading data from the peer node.
1200
1201 The allow-remote-read parameter is available since DRBD 9.0.19, and
1202 defaults to yes.
1203
1204 Section on Parameters
1205 address [address-family] address:port
1206
1207 Defines the address family, address, and port of a connection
1208 endpoint.
1209
1210 The address families ipv4, ipv6, ssocks (Dolphin Interconnect
1211 Solutions' "super sockets"), sdp (Infiniband Sockets Direct
1212 Protocol), and sci are supported (sci is an alias for ssocks). If
1213 no address family is specified, ipv4 is assumed. For all address
1214 families except ipv6, the address is specified in IPv4 address
1215 notation (for example, 1.2.3.4). For ipv6, the address is enclosed
1216 in brackets and uses IPv6 address notation (for example,
1217 [fd01:2345:6789:abcd::1]). The port is always specified as a
1218 decimal number from 1 to 65535.
1219
1220 On each host, the port numbers must be unique for each address;
1221 ports cannot be shared.
1222
1223 node-id value
1224
1225 Defines the unique node identifier for a node in the cluster. Node
1226 identifiers are used to identify individual nodes in the network
1227 protocol, and to assign bitmap slots to nodes in the metadata.
1228
1229 Node identifiers can only be reassigned in a cluster when the
1230 cluster is down. It is essential that the node identifiers in the
1231 configuration and in the device metadata are changed consistently
1232 on all hosts. To change the metadata, dump the current state with
1233 drbdmeta dump-md, adjust the bitmap slot assignment, and update the
1234 metadata with drbdmeta restore-md.
1235
1236 The node-id parameter exists since DRBD 9. Its value ranges from 0
1237 to 16; there is no default.
1238
1239 Section options Parameters (Resource Options)
1240 auto-promote bool-value
1241 A resource must be promoted to primary role before any of its
1242 devices can be mounted or opened for writing.
1243
1244 Before DRBD 9, this could only be done explicitly ("drbdadm
1245 primary"). Since DRBD 9, the auto-promote parameter allows a resource
1246 to be automatically promoted to primary role when one of its
1247 devices is mounted or opened for writing. As soon as all devices
1248 are unmounted or closed with no more remaining users, the role of
1249 the resource changes back to secondary.
1250
1251 Automatic promotion only succeeds if the cluster state allows it
1252 (that is, if an explicit drbdadm primary command would succeed).
1253 Otherwise, mounting or opening the device fails as it already did
1254 before DRBD 9: the mount(2) system call fails with errno set to
1255 EROFS (Read-only file system); the open(2) system call fails with
1256 errno set to EMEDIUMTYPE (wrong medium type).
1257
1258 Irrespective of the auto-promote parameter, if a device is promoted
1259 explicitly (drbdadm primary), it also needs to be demoted
1260 explicitly (drbdadm secondary).
1261
1262 The auto-promote parameter is available since DRBD 9.0.0, and
1263 defaults to yes.
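
    If the automatic behavior is not wanted, it can be disabled per
    resource or in the common section, for example:

        options {
            auto-promote no;
        }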
1264
1265 cpu-mask cpu-mask
1266
1267 Set the cpu affinity mask for DRBD kernel threads. The cpu mask is
1268 specified as a hexadecimal number. The default value is 0, which
1269 lets the scheduler decide which kernel threads run on which CPUs.
1270 CPU numbers in cpu-mask which do not exist in the system are
1271 ignored.
1272
1273 on-no-data-accessible policy
1274 Determine how to deal with I/O requests when the requested data is
1275 not available locally or remotely (for example, when all disks have
1276 failed). When quorum is enabled, on-no-data-accessible should be
1277 set to the same value as on-no-quorum. The defined policies are:
1278
1279 io-error
1280 System calls fail with errno set to EIO.
1281
1282 suspend-io
1283 The resource suspends I/O. I/O can be resumed by (re)attaching
1284 the lower-level device, by connecting to a peer which has
1285 access to the data, or by forcing DRBD to resume I/O with
1286 drbdadm resume-io res. When no data is available, forcing I/O
1287 to resume will result in the same behavior as the io-error
1288 policy.
1289
1290 This setting is available since DRBD 8.3.9; the default policy is
1291 io-error.
1292
1293 peer-ack-window value
1294
1295 On each node and for each device, DRBD maintains a bitmap of the
1296 differences between the local and remote data for each peer device.
1297 For example, in a three-node setup (nodes A, B, C) each with a
1298 single device, every node maintains one bitmap for each of its
1299 peers.
1300
1301 When nodes receive write requests, they know how to update the
1302 bitmaps for the writing node, but not how to update the bitmaps
1303 between themselves. In this example, when a write request
1304 propagates from node A to B and C, nodes B and C know that they
1305 have the same data as node A, but not whether or not they both have
1306 the same data.
1307
1308 As a remedy, the writing node occasionally sends peer-ack packets
1309 to its peers which tell them which state they are in relative to
1310 each other.
1311
1312 The peer-ack-window parameter specifies how much data a primary
1313 node may send before sending a peer-ack packet. A low value causes
1314 increased network traffic; a high value causes less network traffic
1315 but higher memory consumption on secondary nodes and higher resync
1316 times between the secondary nodes after primary node failures.
1317 (Note: peer-ack packets may be sent due to other reasons as well,
1318 e.g. membership changes or expiry of the peer-ack-delay timer.)
1319
1320 The default value for peer-ack-window is 2 MiB, the default unit is
1321 sectors. This option is available since 9.0.0.
1322
1323 peer-ack-delay expiry-time
1324
1325 If after the last finished write request no new write request gets
1326 issued for expiry-time, then a peer-ack packet is sent. If a new
1327 write request is issued before the timer expires, the timer gets
1328 reset to expiry-time. (Note: peer-ack packets may be sent due to
1329 other reasons as well, e.g. membership changes or the
1330 peer-ack-window option.)
1331
1332 This parameter may influence resync behavior on remote nodes. Peer
1333 nodes need to wait until they receive a peer-ack before releasing a
1334 lock on an AL-extent. Resync operations between peers may need to
1335 wait for these locks.
1336
1337 The default value for peer-ack-delay is 100 milliseconds, the
1338 default unit is milliseconds. This option is available since 9.0.0.
1339
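As an illustration, the following fragment raises the window and
shortens the delay, trading slightly higher memory use on the
secondaries for fewer peer-ack packets; the numbers are examples only:

options {
    peer-ack-window 4M;     # send a peer-ack after at most about 4 MiB of writes
    peer-ack-delay  50;     # ... or after 50 milliseconds of write inactivity
}
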
1340 quorum value
1341
1342 When activated, a cluster partition requires quorum in order to
1343 modify the replicated data set. That means a node in the cluster
1344 partition can only be promoted to primary if the cluster partition
1345 has quorum. Every node with a disk directly connected to the node
that should be promoted counts. If a primary node needs to execute a
write request but its cluster partition has lost quorum, it will
freeze IO or reject the write request with an error (depending on
the on-no-quorum setting). Upon losing quorum, a primary always
invokes the quorum-lost handler. The handler is intended for
notification purposes; its return code is ignored.
1352
The option's value can be set to off, majority, all or a numeric
value. If you set it to a numeric value, make sure that the value
is greater than half of your number of nodes. Quorum is a mechanism
to avoid data divergence; it can be used instead of fencing when
there are more than two replicas. It defaults to off.
1358
1359 If all missing nodes are marked as outdated, a partition always has
quorum, no matter how small it is. That is, if you gracefully
disconnect all secondary nodes, a single primary continues to
operate. The moment a single secondary is lost, it has to be assumed
that it forms a partition with all the missing outdated nodes. If
the primary's own partition might then be smaller than the other
one, quorum is lost at that moment.
1366
If you want to allow permanently diskless nodes to gain quorum, it
is recommended not to use majority or all. It is recommended to
specify an absolute number instead, since DRBD's heuristic to
determine the complete number of diskful nodes in the cluster is
unreliable.
1371
1372 The quorum implementation is available starting with the DRBD
1373 kernel driver version 9.0.7.
1374
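A three-node resource that relies on quorum instead of fencing might
use a fragment like the following sketch (the on-no-quorum value shown
is one possible choice, not the default):

options {
    quorum majority;          # a partition needs 2 of 3 nodes to modify data
    on-no-quorum io-error;    # reject writes instead of freezing them
}
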
1375 quorum-minimum-redundancy value
1376
This option sets the minimum number of nodes with an UpToDate disk
that is required to allow the partition to gain quorum. This is a
different requirement than the one the plain quorum option expresses.
1380
The option's value can be set to off, majority, all or a numeric
1382 value. If you set it to a numeric value, make sure that the value
1383 is greater than half of your number of nodes.
1384
If you want to allow permanently diskless nodes to gain quorum, it
is recommended not to use majority or all. It is recommended to
specify an absolute number instead, since DRBD's heuristic to
determine the complete number of diskful nodes in the cluster is
unreliable.
1389
1390 This option is available starting with the DRBD kernel driver
1391 version 9.0.10.
1392
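As a sketch, a cluster with three diskful nodes and one permanently
diskless node could use an absolute quorum value together with a
redundancy requirement; the numbers below are illustrative and must be
adapted to the actual node count:

options {
    quorum 3;                        # absolute value, more robust with diskless nodes
    quorum-minimum-redundancy 2;     # require at least two UpToDate disks
}
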
1393 on-no-quorum {io-error | suspend-io}
1394
By default, DRBD freezes IO on a device that has lost quorum. By
setting on-no-quorum to io-error, all IO operations are completed
with an error if quorum is lost.
1398
Usually, the on-no-data-accessible parameter should be set to the
same value as on-no-quorum, as it has precedence.
1401
The on-no-quorum option is available starting with the DRBD kernel
1403 driver version 9.0.8.
1404
1405 on-suspended-primary-outdated {disconnect | force-secondary}
1406
1407 This setting is only relevant when on-no-quorum is set to
suspend-io. It applies in the following scenario: a primary
node loses quorum and hence has all IO requests frozen. This primary
1410 node then connects to another, quorate partition. It detects that a
1411 node in this quorate partition was promoted to primary, and started
1412 a newer data-generation there. As a result, the first primary
1413 learns that it has to consider itself outdated.
1414
When this option is set to force-secondary, the node demotes itself
to secondary immediately and fails all pending (and new) IO requests
with IO errors. It refuses to allow any process to open the DRBD
device until all current openers have closed it. This state is
visible in status and events2 under the name force-io-failures.
1420
1421 The disconnect setting simply causes that node to reject connect
1422 attempts and stay isolated.
1423
1424 The on-suspended-primary-outdated option is available starting with
1425 the DRBD kernel driver version 9.1.7. It has a default value of
1426 disconnect.
1427
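A resource that freezes I/O on quorum loss but gives up its primary
role as soon as it learns that it is outdated could combine the two
options as in this sketch:

options {
    on-no-quorum suspend-io;
    on-suspended-primary-outdated force-secondary;
}
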
1428 Section startup Parameters
1429 The parameters in this section define the behavior of DRBD at system
1430 startup time, in the DRBD init script. They have no effect once the
1431 system is up and running.
1432
1433 degr-wfc-timeout timeout
1434
1435 Define how long to wait until all peers are connected in case the
1436 cluster consisted of a single node only when the system went down.
1437 This parameter is usually set to a value smaller than wfc-timeout.
1438 The assumption here is that peers which were unreachable before a
1439 reboot are less likely to be reachable after the reboot, so waiting
1440 is less likely to help.
1441
1442 The timeout is specified in seconds. The default value is 0, which
1443 stands for an infinite timeout. Also see the wfc-timeout parameter.
1444
1445 outdated-wfc-timeout timeout
1446
1447 Define how long to wait until all peers are connected if all peers
1448 were outdated when the system went down. This parameter is usually
1449 set to a value smaller than wfc-timeout. The assumption here is
1450 that an outdated peer cannot have become primary in the meantime,
1451 so we don't need to wait for it as long as for a node which was
1452 alive before.
1453
1454 The timeout is specified in seconds. The default value is 0, which
1455 stands for an infinite timeout. Also see the wfc-timeout parameter.
1456
1457 stacked-timeouts
1458 On stacked devices, the wfc-timeout and degr-wfc-timeout parameters
1459 in the configuration are usually ignored, and both timeouts are set
1460 to twice the connect-int timeout. The stacked-timeouts parameter
1461 tells DRBD to use the wfc-timeout and degr-wfc-timeout parameters
1462 as defined in the configuration, even on stacked devices. Only use
1463 this parameter if the peer of the stacked resource is usually not
1464 available, or will not become primary. Incorrect use of this
1465 parameter can lead to unexpected split-brain scenarios.
1466
1467 wait-after-sb
1468 This parameter causes DRBD to continue waiting in the init script
1469 even when a split-brain situation has been detected, and the nodes
1470 therefore refuse to connect to each other.
1471
1472 wfc-timeout timeout
1473
1474 Define how long the init script waits until all peers are
1475 connected. This can be useful in combination with a cluster manager
1476 which cannot manage DRBD resources: when the cluster manager
1477 starts, the DRBD resources will already be up and running. With a
1478 more capable cluster manager such as Pacemaker, it makes more sense
1479 to let the cluster manager control DRBD resources. The timeout is
1480 specified in seconds. The default value is 0, which stands for an
1481 infinite timeout. Also see the degr-wfc-timeout parameter.
1482
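For a setup where the DRBD init script should wait a bounded time for
the peers at boot, a startup section might look like the following
sketch; the timeout values are examples only:

startup {
    wfc-timeout          120;   # wait up to two minutes for all peers
    degr-wfc-timeout      60;   # shorter wait if the cluster was degraded before
    outdated-wfc-timeout  30;   # even shorter if all peers were outdated
}
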
1483 Section volume Parameters
1484 device /dev/drbdminor-number
1485
1486 Define the device name and minor number of a replicated block
1487 device. This is the device that applications are supposed to
1488 access; in most cases, the device is not used directly, but as a
1489 file system. This parameter is required and the standard device
1490 naming convention is assumed.
1491
1492 In addition to this device, udev will create
1493 /dev/drbd/by-res/resource/volume and
1494 /dev/drbd/by-disk/lower-level-device symlinks to the device.
1495
1496 disk {[disk] | none}
1497
1498 Define the lower-level block device that DRBD will use for storing
1499 the actual data. While the replicated drbd device is configured,
1500 the lower-level device must not be used directly. Even read-only
1501 access with tools like dumpe2fs(8) and similar is not allowed. The
1502 keyword none specifies that no lower-level block device is
1503 configured; this also overrides inheritance of the lower-level
1504 device.
1505
1506 meta-disk internal,
1507 meta-disk device,
1508 meta-disk device [index]
1509
1510 Define where the metadata of a replicated block device resides: it
1511 can be internal, meaning that the lower-level device contains both
1512 the data and the metadata, or on a separate device.
1513
1514 When the index form of this parameter is used, multiple replicated
1515 devices can share the same metadata device, each using a separate
1516 index. Each index occupies 128 MiB of data, which corresponds to a
1517 replicated device size of at most 4 TiB with two cluster nodes. We
recommend not sharing metadata devices anymore, and instead using
the LVM volume manager to create metadata devices as needed.
1520
1521 When the index form of this parameter is not used, the size of the
1522 lower-level device determines the size of the metadata. The size
1523 needed is 36 KiB + (size of lower-level device) / 32K * (number of
1524 nodes - 1). If the metadata device is bigger than that, the extra
1525 space is not used.
1526
1527 This parameter is required if a disk other than none is specified,
1528 and ignored if disk is set to none. A meta-disk parameter without a
1529 disk parameter is not allowed.
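
As a rough worked example of the sizing formula above: with two
nodes, a 1 TiB lower-level device needs about 36 KiB + 1 TiB / 32K * 1,
i.e. roughly 32 MiB of metadata. A volume that keeps its metadata on a
dedicated logical volume could then be declared as in the following
sketch (all device names are illustrative):

volume 0 {
    device    "/dev/drbd2";
    disk      "/dev/vg0/data";        # 1 TiB lower-level device
    meta-disk "/dev/vg0/data-md";     # dedicated metadata device, at least ~33 MiB
}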

NOTES ON DATA INTEGRITY
1532 DRBD supports two different mechanisms for data integrity checking:
first, the data-integrity-alg network parameter allows adding a
checksum to the data sent over the network. Second, the online
verification mechanism (drbdadm verify and the verify-alg parameter)
allows checking for differences in the on-disk data.
1537
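The two mechanisms are configured in the net section of a resource; the
following fragment is a sketch only, and the choice of sha1 is an
example rather than a recommendation:

net {
    data-integrity-alg sha1;   # checksum every data packet sent over the network
    verify-alg         sha1;   # digest used by online verification (drbdadm verify)
}

An online verification run is then started manually or from cron with
drbdadm verify resource.
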
1538 Both mechanisms can produce false positives if the data is modified
1539 during I/O (i.e., while it is being sent over the network or written to
1540 disk). This does not always indicate a problem: for example, some file
1541 systems and applications do modify data under I/O for certain
1542 operations. Swap space can also undergo changes while under I/O.
1543
1544 Network data integrity checking tries to identify data modification
1545 during I/O by verifying the checksums on the sender side after sending
1546 the data. If it detects a mismatch, it logs an error. The receiver also
1547 logs an error when it detects a mismatch. Thus, an error logged only on
1548 the receiver side indicates an error on the network, and an error
1549 logged on both sides indicates data modification under I/O.
1550
1551 The most recent example of systematic data corruption was identified as
1552 a bug in the TCP offloading engine and driver of a certain type of GBit
1553 NIC in 2007: the data corruption happened on the DMA transfer from core
memory to the card. Because the TCP checksums were calculated on the
1555 card, the TCP/IP protocol checksums did not reveal this problem.

VERSION
1558 This document was revised for version 9.0.0 of the DRBD distribution.

AUTHOR
1561 Written by Philipp Reisner <philipp.reisner@linbit.com> and Lars
1562 Ellenberg <lars.ellenberg@linbit.com>.

REPORTING BUGS
1565 Report bugs to <drbd-user@lists.linbit.com>.

COPYRIGHT
1568 Copyright 2001-2018 LINBIT Information Technologies, Philipp Reisner,
1569 Lars Ellenberg. This is free software; see the source for copying
1570 conditions. There is NO warranty; not even for MERCHANTABILITY or
1571 FITNESS FOR A PARTICULAR PURPOSE.

SEE ALSO
1574 drbd(8), drbdsetup(8), drbdadm(8), DRBD User's Guide[1], DRBD Web
1575 Site[3]

NOTES
1578 1. DRBD User's Guide
1579 http://www.drbd.org/users-guide/
1580
1581 2.
1582
1583 Online Usage Counter
1584 http://usage.drbd.org
1585
1586 3. DRBD Web Site
1587 http://www.drbd.org/
1588
1589
1590
DRBD 9.0.x                     17 January 2018                    DRBD.CONF(5)