1 DRBD.CONF(5) Configuration Files DRBD.CONF(5)
2
3
4
6 drbd.conf - DRBD Configuration Files
7
9 DRBD implements block devices which replicate their data to all nodes
10 of a cluster. The actual data and associated metadata are usually
11 stored redundantly on "ordinary" block devices on each cluster node.
12
13 Replicated block devices are called /dev/drbdminor by default. They are
14 grouped into resources, with one or more devices per resource.
15 Replication among the devices in a resource takes place in
16 chronological order. With DRBD, we refer to the devices inside a
17 resource as volumes.
18
19 In DRBD 9, a resource can be replicated between two or more cluster
20 nodes. The connections between cluster nodes are point-to-point links,
21 and use TCP or a TCP-like protocol. All nodes must be directly
22 connected.
23
24 DRBD consists of low-level user-space components which interact with
25 the kernel and perform basic operations (drbdsetup, drbdmeta), a
26 high-level user-space component which understands and processes the
27 DRBD configuration and translates it into basic operations of the
28 low-level components (drbdadm), and a kernel component.
29
30 The default DRBD configuration consists of /etc/drbd.conf and of
31 additional files included from there, usually global_common.conf and
32 all *.res files inside /etc/drbd.d/. It has turned out to be useful to
33 define each resource in a separate *.res file.
34
35 The configuration files are designed so that each cluster node can
36 contain an identical copy of the entire cluster configuration. The host
37 name of each node determines which parts of the configuration apply
38 (uname -n). It is highly recommended to keep the cluster configuration
39 on all nodes in sync by manually copying it to all nodes, or by
40 automating the process with csync2 or a similar tool.
41
43 global {
44 usage-count yes;
45 udev-always-use-vnr;
46 }
47 resource r0 {
48 net {
49 cram-hmac-alg sha1;
50 shared-secret "FooFunFactory";
51 }
52 volume 0 {
53 device "/dev/drbd1";
54 disk "/dev/sda7";
55 meta-disk internal;
56 }
57 on "alice" {
58 node-id 0;
59 address 10.1.1.31:7000;
60 }
61 on "bob" {
62 node-id 1;
63 address 10.1.1.32:7000;
64 }
65 connection {
66 host "alice" port 7000;
67 host "bob" port 7000;
68 net {
69 protocol C;
70 }
71 }
72 }
73
74 This example defines a resource r0 which contains a single replicated
75 device with volume number 0. The resource is replicated among hosts
76 alice and bob, which have the IPv4 addresses 10.1.1.31 and 10.1.1.32
77 and the node identifiers 0 and 1, respectively. On both hosts, the
78 replicated device is called /dev/drbd1, and the actual data and
79 metadata are stored on the lower-level device /dev/sda7. The connection
80 between the hosts uses protocol C.
81
82 Enclose strings within double-quotation marks (") to differentiate them
83 from resource keywords. Please refer to the DRBD User's Guide[1] for
84 more examples.
85
87 DRBD configuration files consist of sections, which contain other
88 sections and parameters depending on the section types. Each section
89 consists of one or more keywords, sometimes a section name, an opening
90 brace (“{”), the section's contents, and a closing brace (“}”).
91 Parameters inside a section consist of a keyword, followed by one or
92 more keywords or values, and a semicolon (“;”).
93
94 Some parameter values have a default scale which applies when a plain
95 number is specified (for example Kilo, or 1024 times the numeric
96 value). Such default scales can be overridden by using a suffix (for
97 example, M for Mega). The common suffixes K = 2^10 = 1024, M = 1024 K,
98 and G = 1024 M are supported.
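
For illustration only, and assuming the KiB/s default scale of the
resync-rate parameter described later in this page, the following two
hypothetical settings are equivalent:

           disk {
              resync-rate 10240;    # plain number, default scale (KiB/s)
              # resync-rate 10M;    # same rate (10 MiB/s), written with the M suffix
           }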
99
100 Comments start with a hash sign (“#”) and extend to the end of the
101 line. In addition, any section can be prefixed with the keyword skip,
102 which causes the section and any sub-sections to be ignored.
103
104 Additional files can be included with the include file-pattern
105 statement (see glob(7) for the expressions supported in file-pattern).
106 Include statements are only allowed outside of sections.
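
As a hedged sketch (the resource name and its contents are placeholders),
a minimal /etc/drbd.conf often contains nothing but include statements,
and a whole section can be disabled by prefixing it with skip:

           # /etc/drbd.conf -- example layout
           include "drbd.d/global_common.conf";
           include "drbd.d/*.res";

           skip resource r-disabled {    # this section and its sub-sections are ignored
              volume 0 {
                 device    "/dev/drbd9";
                 disk      "/dev/sdz1";
                 meta-disk internal;
              }
           }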
107
108 The following sections are defined (indentation indicates in which
109 context):
110
111 common
112 [disk]
113 [handlers]
114 [net]
115 [options]
116 [startup]
117 global
118 [require-drbd-module-version-{eq,ne,gt,ge,lt,le}]
119 resource
120 connection
121 multiple path | 2 host
122 [net]
123 [volume]
124 [peer-device-options]
125 [peer-device-options]
126 connection-mesh
127 [net]
128 [disk]
129 floating
130 handlers
131 [net]
132 on
133 volume
134 disk
135 [disk]
136 options
137 stacked-on-top-of
138 startup
139
140 Sections in brackets affect other parts of the configuration: inside
141 the common section, they apply to all resources. A disk section inside
142 a resource or on section applies to all volumes of that resource, and a
143 net section inside a resource section applies to all connections of
144 that resource. This makes it possible to avoid repeating identical
145 options for each resource, connection, or volume. Options can be
146 overridden in a more specific resource, connection, on, or volume section.
147
148 The peer-device-options are resync-rate, c-plan-ahead, c-delay-target,
149 c-fill-target, c-max-rate and c-min-rate. For backward compatibility,
150 they can also be specified in any disk options section. They are
151 inherited into all relevant connections. If they are given at the
152 connection level, they are inherited by all volumes on that
153 connection. A peer-device-options section is started with the disk
154 keyword.
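
As a sketch of this inheritance (the values are arbitrary examples),
peer-device-options can be given for a whole connection by opening a
disk section inside it:

           connection {
              host "alice" port 7000;
              host "bob"   port 7000;
              disk {                 # peer-device-options section, opened with "disk"
                 resync-rate 10M;    # inherited by all volumes replicated over this connection
                 c-max-rate  100M;
              }
           }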
155
156 Sections
157 common
158
159 This section can contain one each of the disk, handlers, net, options,
160 and startup sections. All resources inherit the parameters in these
161 sections as their default values.
162
163 connection
164
165 Define a connection between two hosts. This section must contain
166 two host parameters or multiple path sections.
167
168 path
169
170 Define a path between two hosts. This section must contain two host
171 parameters.
172
173 connection-mesh
174
175 Define a connection mesh between multiple hosts. This section must
176 contain a hosts parameter, which has the host names as arguments.
177 This section is a shortcut to define many connections which share
178 the same network options.
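
A hedged example of a connection-mesh inside a resource, with the host
names as placeholders; it is equivalent to defining the three pairwise
connections with the same net options:

           resource r0 {
              connection-mesh {
                 hosts "alice" "bob" "charlie";
                 net {
                    protocol C;
                 }
              }
              # on sections, volumes, etc. omitted
           }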
179
180 disk
181
182 Define parameters for a volume. All parameters in this section are
183 optional.
184
185 floating [address-family] addr:port
186
187 Like the on section, except that instead of the host name a network
188 address is used to determine if it matches a floating section.
189
190 The node-id parameter in this section is required. If the address
191 parameter is not provided, no connections to peers will be created
192 by default. The device, disk, and meta-disk parameters must be
193 defined in, or inherited by, this section.
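
For illustration, a floating section that takes the place of an on
section; the address and node-id are examples, and the section matches
whichever host currently holds that address:

           floating 10.1.1.31:7000 {
              node-id   0;
              device    "/dev/drbd1";
              disk      "/dev/sda7";
              meta-disk internal;
           }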
194
195 global
196
197 Define some global parameters. All parameters in this section are
198 optional. Only one global section is allowed in the configuration.
199
200 require-drbd-module-version-{eq,ne,gt,ge,lt,le}
201
202 This statement contains one of the valid forms and a three-digit
203 version number (e.g., require-drbd-module-version-eq 9.0.16;). If
204 the currently loaded DRBD kernel module does not match the
205 specification, parsing is aborted. The comparison operator names have
206 the same semantics as in test(1).
207
208 handlers
209
210 Define handlers to be invoked when certain events occur. The kernel
211 passes the resource name in the first command-line argument and
212 sets the following environment variables depending on the event's
213 context:
214
215 • For events related to a particular device: the device's minor
216 number in DRBD_MINOR, the device's volume number in
217 DRBD_VOLUME.
218
219 • For events related to a particular device on a particular peer:
220 the connection endpoints in DRBD_MY_ADDRESS, DRBD_MY_AF,
221 DRBD_PEER_ADDRESS, and DRBD_PEER_AF; the device's local minor
222 number in DRBD_MINOR, and the device's volume number in
223 DRBD_VOLUME.
224
225 • For events related to a particular connection: the connection
226 endpoints in DRBD_MY_ADDRESS, DRBD_MY_AF, DRBD_PEER_ADDRESS,
227 and DRBD_PEER_AF; and, for each device defined for that
228 connection: the device's minor number in
229 DRBD_MINOR_volume-number.
230
231 • For events that identify a device, if a lower-level device is
232 attached, the lower-level device's device name is passed in
233 DRBD_BACKING_DEV (or DRBD_BACKING_DEV_volume-number).
234
235 All parameters in this section are optional. Only a single handler
236 can be defined for each event; if no handler is defined, nothing
237 will happen.
238
239 net
240
241 Define parameters for a connection. All parameters in this section
242 are optional.
243
244 on host-name [...]
245
246 Define the properties of a resource on a particular host or set of
247 hosts. Specifying more than one host name can make sense in a setup
248 with IP address failover, for example. The host-name argument must
249 match the Linux host name (uname -n).
250
251 Usually contains or inherits at least one volume section. The
252 node-id and address parameters must be defined in this section. The
253 device, disk, and meta-disk parameters must be defined in, or
254 inherited by, this section.
255
256 A normal configuration file contains two or more on sections for
257 each resource. Also see the floating section.
258
259 options
260
261 Define parameters for a resource. All parameters in this section
262 are optional.
263
264 resource name
265
266 Define a resource. Usually contains at least two on sections and at
267 least one connection section.
268
269 stacked-on-top-of resource
270
271 Used instead of an on section for configuring a stacked resource
272 with three to four nodes.
273
274 Starting with DRBD 9, stacking is deprecated. It is advised to use
275 resources which are replicated among more than two nodes instead.
276
277 startup
278
279 The parameters in this section determine the behavior of a resource
280 at startup time.
281
282 volume volume-number
283
284 Define a volume within a resource. The volume numbers in the
285 various volume sections of a resource define which devices on which
286 hosts form a replicated device.
287
288 Section connection Parameters
289 host name [address [address-family] address] [port port-number]
290
291 Defines an endpoint for a connection. Each host statement refers to
292 an on section in a resource. If a port number is defined, this
293 endpoint will use the specified port instead of the port defined in
294 the on section. Each connection section must contain exactly two
295 host parameters. Instead of two host parameters the connection may
296 contain multiple path sections.
297
298 Section path Parameters
299 host name [address [address-family] address] [port port-number]
300
301 Defines an endpoint for a connection. Each host statement refers to
302 an on section in a resource. If a port number is defined, this
303 endpoint will use the specified port instead of the port defined in
304 the on section. Each path section must contain exactly two host
305 parameters.
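
A hedged sketch of a connection using two redundant paths between the
same pair of hosts; the addresses and ports are examples only:

           connection {
              path {
                 host "alice" address 10.1.1.31:7010;
                 host "bob"   address 10.1.1.32:7010;
              }
              path {
                 host "alice" address 192.168.2.31:7010;
                 host "bob"   address 192.168.2.32:7010;
              }
           }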
306
307 Section connection-mesh Parameters
308 hosts name...
309
310 Defines all nodes of a mesh. Each name refers to an on section in a
311 resource. The port that is defined in the on section will be used.
312
313 Section disk Parameters
314 al-extents extents
315
316 DRBD automatically maintains a "hot" or "active" disk area likely
317 to be written to again soon based on the recent write activity. The
318 "active" disk area can be written to immediately, while "inactive"
319 disk areas must be "activated" first, which requires a meta-data
320 write. We also refer to this active disk area as the "activity
321 log".
322
323 The activity log saves meta-data writes, but the whole log must be
324 resynced upon recovery of a failed node. The size of the activity
325 log is a major factor of how long a resync will take and how fast a
326 replicated disk will become consistent after a crash.
327
328 The activity log consists of a number of 4-Megabyte segments; the
329 al-extents parameter determines how many of those segments can be
330 active at the same time. The default value for al-extents is 1237,
331 with a minimum of 7 and a maximum of 65536.
332
333 Note that the effective maximum may be smaller, depending on how
334 you created the device metadata; see also drbdmeta(8). The
335 effective maximum is 919 * (available on-disk activity-log
336 ring-buffer area / 4 kB - 1); the default 32 kB ring buffer yields a
337 maximum of 6433 (covering more than 25 GiB of data). We recommend
338 keeping this well within the amount your backend storage and
339 replication link are able to resync within about 5 minutes.
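
As a sketch (the value is only an example), the activity log size is
set in a disk section; here roughly 23 GiB of "hot" area, which stays
below the 6433 limit mentioned above:

           resource r0 {
              disk {
                 al-extents 6007;   # example: 6007 * 4 MiB of active area
              }
           }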
340
341 al-updates {yes | no}
342
343 With this parameter, the activity log can be turned off entirely
344 (see the al-extents parameter). This will speed up writes because
345 fewer meta-data writes will be necessary, but the entire device
346 needs to be resynchronized upon recovery of a failed primary node.
347 The default value for al-updates is yes.
348
349 disk-barrier,
350 disk-flushes,
351 disk-drain
352 DRBD has three methods of handling the ordering of dependent write
353 requests:
354
355 disk-barrier
356 Use disk barriers to make sure that requests are written to
357 disk in the right order. Barriers ensure that all requests
358 submitted before a barrier make it to the disk before any
359 requests submitted after the barrier. This is implemented using
360 'tagged command queuing' on SCSI devices and 'native command
361 queuing' on SATA devices. Only some devices and device stacks
362 support this method. The device mapper (LVM) only supports
363 barriers in some configurations.
364
365 Note that on systems which do not support disk barriers,
366 enabling this option can lead to data loss or corruption. Until
367 DRBD 8.4.1, disk-barrier was turned on if the I/O stack below
368 DRBD supported barriers. Kernels since linux-2.6.36 (or
369 2.6.32 RHEL6) no longer make it possible to detect whether
370 barriers are supported. Since drbd-8.4.2, this option is off by
371 default and needs to be enabled explicitly.
372
373 disk-flushes
374 Use disk flushes between dependent write requests, also
375 referred to as 'force unit access' by drive vendors. This
376 forces all data to disk. This option is enabled by default.
377
378 disk-drain
379 Wait for the request queue to "drain" (that is, wait for the
380 requests to finish) before submitting a dependent write
381 request. This method requires that requests are stable on disk
382 when they finish. Before DRBD 8.0.9, this was the only method
383 implemented. This option is enabled by default. Do not disable
384 in production environments.
385
386 Of these three methods, DRBD will use the first that is enabled
387 and supported by the backing storage device. If all three of these
388 options are turned off, DRBD will submit write requests without
389 bothering about dependencies. Depending on the I/O stack, write
390 requests can be reordered, and they can be submitted in a different
391 order on different cluster nodes. This can result in data loss or
392 corruption. Therefore, turning off all three methods of controlling
393 write ordering is strongly discouraged.
394
395 A general guideline for configuring write ordering is to use disk
396 barriers or disk flushes when using ordinary disks (or an ordinary
397 disk array) with a volatile write cache. On storage without cache
398 or with a battery backed write cache, disk draining can be a
399 reasonable choice.
400
401 disk-timeout
402 If the lower-level device on which a DRBD device stores its data
403 does not finish an I/O request within the defined disk-timeout,
404 DRBD treats this as a failure. The lower-level device is detached,
405 and the device's disk state advances to Diskless. If DRBD is
406 connected to one or more peers, the failed request is passed on to
407 one of them.
408
409 This option is dangerous and may lead to kernel panic!
410
411 "Aborting" requests, or force-detaching the disk, is intended for
412 completely blocked/hung local backing devices which do no longer
413 complete requests at all, not even do error completions. In this
414 situation, usually a hard-reset and failover is the only way out.
415
416 By "aborting", basically faking a local error-completion, we allow
417 for a more graceful swichover by cleanly migrating services. Still
418 the affected node has to be rebooted "soon".
419
420 By completing these requests, we allow the upper layers to re-use
421 the associated data pages.
422
423 If the local backing device later "recovers" and then DMAs some
424 data from disk into the original request pages, in the best case it
425 will just put random data into unused pages; but typically it will
426 corrupt data that is by then completely unrelated, causing all sorts
427 of damage.
428
429 This means that a delayed successful completion, especially of READ
430 requests, is a reason to panic(). We assume that a delayed *error*
431 completion is OK, though we will still complain noisily about it.
432
433 The default value of disk-timeout is 0, which stands for an
434 infinite timeout. Timeouts are specified in units of 0.1 seconds.
435 This option is available since DRBD 8.3.12.
436
437 md-flushes
438 Enable disk flushes and disk barriers on the meta-data device. This
439 option is enabled by default. See the disk-flushes parameter.
440
441 on-io-error handler
442
443 Configure how DRBD reacts to I/O errors on a lower-level device.
444 The following policies are defined:
445
446 pass_on
447 Change the disk status to Inconsistent, mark the failed block
448 as inconsistent in the bitmap, and retry the I/O operation on a
449 remote cluster node.
450
451 call-local-io-error
452 Call the local-io-error handler (see the handlers section).
453
454 detach
455 Detach the lower-level device and continue in diskless mode.
456
457
458 read-balancing policy
459 Distribute read requests among cluster nodes as defined by policy.
460 The supported policies are prefer-local (the default),
461 prefer-remote, round-robin, least-pending, when-congested-remote,
462 32K-striping, 64K-striping, 128K-striping, 256K-striping,
463 512K-striping and 1M-striping.
464
465 This option is available since DRBD 8.4.1.
466
467 Note: the when-congested-remote option has no effect on Linux
468 kernel 5.18 or above. It is deprecated starting from DRBD 9.1.12.
469
470 resync-after res-name/volume
471
472 Define that a device should only resynchronize after the specified
473 other device. By default, no order between devices is defined, and
474 all devices will resynchronize in parallel. Depending on the
475 configuration of the lower-level devices, and the available network
476 and disk bandwidth, this can slow down the overall resync process.
477 This option can be used to form a chain or tree of dependencies
478 among devices.
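
A minimal sketch, assuming a second resource named r1 that should wait
for volume 0 of r0 before resynchronizing:

           resource r1 {
              disk {
                 resync-after r0/0;   # resync r1 only after r0, volume 0, has finished
              }
           }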
479
480 rs-discard-granularity byte
481 When rs-discard-granularity is set to a non-zero, positive value,
482 DRBD tries to perform resync operations in requests of this size.
483 If such a block contains only zero bytes on the sync source
484 node, the sync target node will issue a discard/trim/unmap command
485 for the area.
486
487 The value is constrained by the discard granularity of the backing
488 block device. If rs-discard-granularity is not a multiple of
489 the discard granularity of the backing block device, DRBD rounds it
490 up. The feature only becomes active if the backing block device reads
491 back zeroes after a discard command.
492
493 The usage of rs-discard-granularity may cause c-max-rate to be
494 exceeded. In particular, the resync rate may reach 10x the value of
495 rs-discard-granularity per second.
496
497 The default value of rs-discard-granularity is 0. This option is
498 available since 8.4.7.
499
500 discard-zeroes-if-aligned {yes | no}
501
502 There are several aspects to discard/trim/unmap support on linux
503 block devices. Even if discard is supported in general, it may fail
504 silently, or may partially ignore discard requests. Devices also
505 announce whether reading from unmapped blocks returns defined data
506 (usually zeroes), or undefined data (possibly old data, possibly
507 garbage).
508
509 If on different nodes, DRBD is backed by devices with differing
510 discard characteristics, discards may lead to data divergence (old
511 data or garbage left over on one backend, zeroes due to unmapped
512 areas on the other backend). Online verify would now potentially
513 report tons of spurious differences. While probably harmless for
514 most use cases (fstrim on a file system), DRBD cannot have that.
515
516 To play safe, we have to disable discard support, if our local
517 backend (on a Primary) does not support "discard_zeroes_data=true".
518 We also have to translate discards to explicit zero-out on the
519 receiving side, unless the receiving side (Secondary) supports
520 "discard_zeroes_data=true", thereby allocating areas what were
521 supposed to be unmapped.
522
523 There are some devices (notably the LVM/DM thin provisioning) that
524 are capable of discard, but announce discard_zeroes_data=false. In
525 the case of DM-thin, discards aligned to the chunk size will be
526 unmapped, and reading from unmapped sectors will return zeroes.
527 However, unaligned partial head or tail areas of discard requests
528 will be silently ignored.
529
530 If we now add a helper to explicitly zero-out these unaligned
531 partial areas, while passing on the discard of the aligned full
532 chunks, we effectively achieve discard_zeroes_data=true on such
533 devices.
534
535 Setting discard-zeroes-if-aligned to yes will allow DRBD to use
536 discards, and to announce discard_zeroes_data=true, even on
537 backends that announce discard_zeroes_data=false.
538
539 Setting discard-zeroes-if-aligned to no will cause DRBD to always
540 fall-back to zero-out on the receiving side, and to not even
541 announce discard capabilities on the Primary, if the respective
542 backend announces discard_zeroes_data=false.
543
544 We used to ignore the discard_zeroes_data setting completely. To
545 not break established and expected behaviour, and suddenly cause
546 fstrim on thin-provisioned LVs to run out-of-space instead of
547 freeing up space, the default value is yes.
548
549 This option is available since 8.4.7.
550
551 disable-write-same {yes | no}
552
553 Some disks announce WRITE_SAME support to the kernel but fail with
554 an I/O error upon actually receiving such a request. This mostly
555 happens when using virtualized disks -- notably, this behavior has
556 been observed with VMware's virtual disks.
557
558 When disable-write-same is set to yes, WRITE_SAME detection is
559 manually overridden and support is disabled.
560
561 The default value of disable-write-same is no. This option is
562 available since 8.4.7.
563
564 block-size size
565
566 Block storage devices have a particular sector size or block size.
567 This block size has many different names. Examples are
568 'hw_sector_size', 'PHY-SEC', 'physical block (sector) size', and
569 'logical block (sector) size'.
570
571 DRBD needs to combine these block sizes of the backing disks. In
572 clusters with storage devices with different block sizes, it is
573 necessary to configure the maximal block sizes on the DRBD level.
574 Here is an example highlighting the need.
575
576 Let's say node A is diskless. It connects to node B, which has a
577 physical block size of 512 bytes. Then the user mounts the
578 filesystem on node A; the filesystem recognizes that it can do I/O
579 in units of 512 bytes. Later, node C joins the cluster with a
580 physical block size of 4096 bytes. Now, suddenly DRBD starts to
581 deliver I/O errors to the filesystem if it chooses to do I/O on,
582 e.g., 512 or 1024 bytes.
583
584 The default value of block-size is 512 bytes. This option is available
585 since drbd-utils 9.24 and the drbd kernel driver 9.1.14 and 9.2.3.
586
587 Section peer-device-options Parameters
588 Please note that you open the section with the disk keyword.
589
590 c-delay-target delay_target,
591 c-fill-target fill_target,
592 c-max-rate max_rate,
593 c-plan-ahead plan_time
594 Dynamically control the resync speed. The following modes are
595 available:
596
597 • Dynamic control with fill target (default). Enabled when
598 c-plan-ahead is non-zero and c-fill-target is non-zero. The
599 goal is to fill the buffers along the data path with a defined
600 amount of data. This mode is recommended when DRBD-proxy is
601 used. Configured with c-plan-ahead, c-fill-target and
602 c-max-rate.
603
604 • Dynamic control with delay target. Enabled when c-plan-ahead is
605 non-zero (default) and c-fill-target is zero. The goal is to
606 have a defined delay along the path. Configured with
607 c-plan-ahead, c-delay-target and c-max-rate.
608
609 • Fixed resync rate. Enabled when c-plan-ahead is zero. DRBD will
610 try to perform resync I/O at a fixed rate. Configured with
611 resync-rate.
612
613 The c-plan-ahead parameter defines how fast DRBD adapts to changes
614 in the resync speed. It should be set to five times the network
615 round-trip time or more. The default value of c-plan-ahead is 20,
616 in units of 0.1 seconds.
617
618 The c-fill-target parameter defines how much resync data DRBD
619 should aim to have in flight at all times. Common values for
620 "normal" data paths range from 4K to 100K. The default value of
621 c-fill-target is 100, in units of sectors.
622
623 The c-delay-target parameter defines the delay in the resync path
624 that DRBD should aim for. This should be set to five times the
625 network round-trip time or more. The default value of
626 c-delay-target is 10, in units of 0.1 seconds.
627
628 The c-max-rate parameter limits the maximum bandwidth used by
629 dynamically controlled resyncs. Setting this to zero removes the
630 limitation (since DRBD 9.0.28). It should be set to either the
631 bandwidth available between the DRBD hosts and the machines hosting
632 DRBD-proxy, or to the available disk bandwidth. The default value
633 of c-max-rate is 102400, in units of KiB/s.
634
635 Dynamic resync speed control is available since DRBD 8.3.9.
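
For illustration, a disk (peer-device-options) section combining the
controller parameters described above; all values are examples, not
recommendations:

           disk {
              c-plan-ahead   20;     # 2 seconds; setting 0 switches to a fixed resync-rate
              c-fill-target  100;    # sectors to keep in flight (fill-target mode)
              c-delay-target 10;     # 1 second (only used when c-fill-target is 0)
              c-max-rate     100M;   # cap dynamically controlled resync at 100 MiB/s
           }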
636
637 c-min-rate min_rate
638 A node which is primary and sync-source has to schedule application
639 I/O requests and resync I/O requests. The c-min-rate parameter
640 limits how much bandwidth is available for resync I/O; the
641 remaining bandwidth is used for application I/O.
642
643 A c-min-rate value of 0 means that there is no limit on the resync
644 I/O bandwidth. This can slow down application I/O significantly.
645 Use a value of 1 (1 KiB/s) for the lowest possible resync rate.
646
647 The default value of c-min-rate is 250, in units of KiB/s.
648
649 resync-rate rate
650
651 Define how much bandwidth DRBD may use for resynchronizing. DRBD
652 allows "normal" application I/O even during a resync. If the resync
653 takes up too much bandwidth, application I/O can become very slow.
654 This parameter allows you to avoid that. Please note that this option
655 only works when the dynamic resync controller is disabled.
656
657 Section global Parameters
658 dialog-refresh time
659
660 The DRBD init script can be used to configure and start DRBD
661 devices, which can involve waiting for other cluster nodes. While
662 waiting, the init script shows the remaining waiting time. The
663 dialog-refresh defines the number of seconds between updates of
664 that countdown. The default value is 1; a value of 0 turns off the
665 countdown.
666
667 disable-ip-verification
668 Normally, DRBD verifies that the IP addresses in the configuration
669 match the host names. Use the disable-ip-verification parameter to
670 disable these checks.
671
672 usage-count {yes | no | ask}
673 As explained on DRBD's Online Usage Counter[2] web page, DRBD
674 includes a mechanism for anonymously counting how many
675 installations are using which versions of DRBD. The results are
676 available on the web page for anyone to see.
677
678 This parameter defines if a cluster node participates in the usage
679 counter; the supported values are yes, no, and ask (ask the user,
680 the default).
681
682 We would like to ask users to participate in the online usage
683 counter as this provides us valuable feedback for steering the
684 development of DRBD.
685
686 udev-always-use-vnr
687 When udev asks drbdadm for a list of device related symlinks,
688 drbdadm would suggest symlinks with differing naming conventions,
689 depending on whether the resource has explicit volume VNR { }
690 definitions, or only one single volume with the implicit volume
691 number 0:
692
693 # implicit single volume without "volume 0 {}" block
694 DEVICE=drbd<minor>
695 SYMLINK_BY_RES=drbd/by-res/<resource-name>
696 SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>
697
698 # explicit volume definition: volume VNR { }
699 DEVICE=drbd<minor>
700 SYMLINK_BY_RES=drbd/by-res/<resource-name>/VNR
701 SYMLINK_BY_DISK=drbd/by-disk/<backing-disk-name>
702
703 If you define this parameter in the global section, drbdadm will
704 always add the .../VNR part, regardless of whether the
705 volume definition was implicit or explicit.
706
707 For legacy backward compatibility, this is off by default, but we
708 recommend enabling it.
709
710 Section handlers Parameters
711 after-resync-target cmd
712
713 Called on a resync target when a node state changes from
714 Inconsistent to Consistent when a resync finishes. This handler can
715 be used for removing the snapshot created in the
716 before-resync-target handler.
717
718 before-resync-target cmd
719
720 Called on a resync target before a resync begins. This handler can
721 be used for creating a snapshot of the lower-level device for the
722 duration of the resync: if the resync source becomes unavailable
723 during a resync, reverting to the snapshot can restore a consistent
724 state.
725
726 before-resync-source cmd
727
728 Called on a resync source before a resync begins.
729
730 out-of-sync cmd
731
732 Called on all nodes after a verify finishes and out-of-sync blocks
733 were found. This handler is mainly used for monitoring purposes. An
734 example would be to call a script that sends an alert SMS.
735
736 quorum-lost cmd
737
738 Called on a Primary that lost quorum. This handler is usually used
739 to reboot the node if it is not possible to restart the application
740 that uses the storage on top of DRBD.
741
742 fence-peer cmd
743
744 Called when a node should fence a resource on a particular peer.
745 The handler should not use the same communication path that DRBD
746 uses for talking to the peer.
747
748 unfence-peer cmd
749
750 Called when a node should remove fencing constraints from other
751 nodes.
752
753 initial-split-brain cmd
754
755 Called when DRBD connects to a peer and detects that the peer is in
756 a split-brain state with the local node. This handler is also
757 called for split-brain scenarios which will be resolved
758 automatically.
759
760 local-io-error cmd
761
762 Called when an I/O error occurs on a lower-level device.
763
764 pri-lost cmd
765
766 The local node is currently primary, but DRBD believes that it
767 should become a sync target. The node should give up its primary
768 role.
769
770 pri-lost-after-sb cmd
771
772 The local node is currently primary, but it has lost the
773 after-split-brain auto recovery procedure. The node should be
774 abandoned.
775
776 pri-on-incon-degr cmd
777
778 The local node is primary, and neither the local lower-level device
779 nor a lower-level device on a peer is up to date. (The primary has
780 no device to read from or to write to.)
781
782 split-brain cmd
783
784 DRBD has detected a split-brain situation which could not be
785 resolved automatically. Manual recovery is necessary. This handler
786 can be used to call for administrator attention.
787
788 disconnected cmd
789
790 A connection to a peer went down. The handler can learn about the
791 reason for the disconnect from the DRBD_CSTATE environment
792 variable.
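
A hedged sketch of a handlers section; the script paths are placeholders
for site-specific scripts, which receive the resource name as their first
argument plus the environment variables described earlier for the
handlers section:

           resource r0 {
              handlers {
                 split-brain    "/usr/local/sbin/notify-admin.sh split-brain";
                 out-of-sync    "/usr/local/sbin/notify-admin.sh out-of-sync";
                 local-io-error "/usr/local/sbin/notify-admin.sh local-io-error";
              }
              # other sections omitted
           }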
793
794 Section net Parameters
795 after-sb-0pri policy
796 Define how to react if a split-brain scenario is detected and neither
797 of the two nodes is in the primary role. (We detect split-brain
798 scenarios when two nodes connect; split-brain decisions are always
799 between two nodes.) The defined policies are:
800
801 disconnect
802 No automatic resynchronization; simply disconnect.
803
804 discard-younger-primary,
805 discard-older-primary
806 Resynchronize from the node which became primary first
807 (discard-younger-primary) or last (discard-older-primary). If
808 both nodes became primary independently, the
809 discard-least-changes policy is used.
810
811 discard-zero-changes
812 If only one of the nodes wrote data since the split brain
813 situation was detected, resynchronize from this node to the
814 other. If both nodes wrote data, disconnect.
815
816 discard-least-changes
817 Resynchronize from the node with more modified blocks.
818
819 discard-node-nodename
820 Always resynchronize to the named node.
821
822 after-sb-1pri policy
823 Define how to react if a split-brain scenario is detected, with one
824 node in the primary role and one node in the secondary role. (We detect
825 split-brain scenarios when two nodes connect, so split-brain
826 decisions are always between two nodes.) The defined policies are:
827
828 disconnect
829 No automatic resynchronization, simply disconnect.
830
831 consensus
832 Discard the data on the secondary node if the after-sb-0pri
833 algorithm would also discard the data on the secondary node.
834 Otherwise, disconnect.
835
836 violently-as0p
837 Always take the decision of the after-sb-0pri algorithm, even
838 if it causes an erratic change of the primary's view of the
839 data. This is only useful if a single-node file system (i.e.,
840 not OCFS2 or GFS) with the allow-two-primaries flag is used.
841 This option can cause the primary node to crash, and should not
842 be used.
843
844 discard-secondary
845 Discard the data on the secondary node.
846
847 call-pri-lost-after-sb
848 Always take the decision of the after-sb-0pri algorithm. If the
849 decision is to discard the data on the primary node, call the
850 pri-lost-after-sb handler on the primary node.
851
852 after-sb-2pri policy
853 Define how to react if a split-brain scenario is detected and both
854 nodes are in the primary role. (We detect split-brain scenarios when
855 two nodes connect, so split-brain decisions are always between two
856 nodes.) The defined policies are:
857
858 disconnect
859 No automatic resynchronization, simply disconnect.
860
861 violently-as0p
862 See the violently-as0p policy for after-sb-1pri.
863
864 call-pri-lost-after-sb
865 Call the pri-lost-after-sb helper program on one of the
866 machines unless that machine can demote to secondary. The
867 helper program is expected to reboot the machine, which brings
868 the node into a secondary role. Which machine runs the helper
869 program is determined by the after-sb-0pri strategy.
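
For illustration, a net section combining the three after-sb policies;
this is a commonly seen conservative combination, shown only as an
example:

           net {
              after-sb-0pri discard-zero-changes;
              after-sb-1pri consensus;
              after-sb-2pri disconnect;
           }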
870
871 allow-remote-read bool-value
872 Allows or disallows DRBD to read from a peer node.
873
874 When the disk of a primary node is detached, DRBD will try to
875 continue reading and writing from another node in the cluster. For
876 this purpose, it searches for nodes with up-to-date data, and uses
877 any found node to resume operations. In some cases it may not be
878 desirable to read back data from a peer node, because the node
879 should only be used as a replication target. In this case, the
880 allow-remote-read parameter can be set to no, which would prohibit
881 this node from reading data from the peer node.
882
883 The allow-remote-read parameter is available since DRBD 9.0.19, and
884 defaults to yes.
885
886 allow-two-primaries
887
888 The most common way to configure DRBD devices is to allow only one
889 node to be primary (and thus writable) at a time.
890
891 In some scenarios it is preferable to allow two nodes to be primary
892 at once; a mechanism outside of DRBD then must make sure that
893 writes to the shared, replicated device happen in a coordinated
894 way. This can be done with a shared-storage cluster file system
895 like OCFS2 and GFS, or with virtual machine images and a virtual
896 machine manager that can migrate virtual machines between physical
897 machines.
898
899 The allow-two-primaries parameter tells DRBD to allow two nodes to
900 be primary at the same time. Never enable this option when using a
901 non-distributed file system; otherwise, data corruption and node
902 crashes will result!
903
904 always-asbp
905 Normally the automatic after-split-brain policies are only used if
906 current states of the UUIDs do not indicate the presence of a third
907 node.
908
909 With this option you request that the automatic after-split-brain
910 policies are used as long as the data sets of the nodes are somehow
911 related. This might cause a full sync, if the UUIDs indicate the
912 presence of a third node. (Or double faults led to strange UUID
913 sets.)
914
915 connect-int time
916
917 As soon as a connection between two nodes is configured with
918 drbdsetup connect, DRBD immediately tries to establish the
919 connection. If this fails, DRBD waits for connect-int seconds and
920 then repeats. The default value of connect-int is 10 seconds.
921
922 cram-hmac-alg hash-algorithm
923
924 Configure the hash-based message authentication code (HMAC) or
925 secure hash algorithm to use for peer authentication. The kernel
926 supports a number of different algorithms, some of which may be
927 loadable as kernel modules. See the shash algorithms listed in
928 /proc/crypto. By default, cram-hmac-alg is unset. Peer
929 authentication also requires a shared-secret to be configured.
930
931 csums-alg hash-algorithm
932
933 Normally, when two nodes resynchronize, the sync target requests a
934 piece of out-of-sync data from the sync source, and the sync source
935 sends the data. With many usage patterns, a significant number of
936 those blocks will actually be identical.
937
938 When a csums-alg algorithm is specified, when requesting a piece of
939 out-of-sync data, the sync target also sends along a hash of the
940 data it currently has. The sync source compares this hash with its
941 own version of the data. It sends the sync target the new data if
942 the hashes differ, and tells it that the data are the same
943 otherwise. This reduces the network bandwidth required, at the cost
944 of higher cpu utilization and possibly increased I/O on the sync
945 target.
946
947 The csums-alg can be set to one of the secure hash algorithms
948 supported by the kernel; see the shash algorithms listed in
949 /proc/crypto. By default, csums-alg is unset.
950
951 csums-after-crash-only
952
953 Enabling this option (and csums-alg, above) makes it possible to
954 use the checksum-based resync only for the first resync after a
955 primary crash, but not for later "network hiccups".
956
957 In most cases, blocks that are marked as need-to-be-resynced have in
958 fact changed, so calculating checksums, and both reading and
959 writing the blocks on the resync target, is effectively all overhead.
960
961 The advantage of checksum-based resync shows mostly after primary
962 crash recovery, where the recovery marked larger areas (those
963 covered by the activity log) as need-to-be-resynced, just in case.
964 Introduced in 8.4.5.
965
966 data-integrity-alg alg
967 DRBD normally relies on the data integrity checks built into the
968 TCP/IP protocol, but if a data integrity algorithm is configured,
969 it will additionally use this algorithm to make sure that the data
970 received over the network match what the sender has sent. If a data
971 integrity error is detected, DRBD will close the network connection
972 and reconnect, which will trigger a resync.
973
974 The data-integrity-alg can be set to one of the secure hash
975 algorithms supported by the kernel; see the shash algorithms listed
976 in /proc/crypto. By default, this mechanism is turned off.
977
978 Because of the CPU overhead involved, we recommend not to use this
979 option in production environments. Also see the notes on data
980 integrity below.
981
982 fencing fencing_policy
983
984 Fencing is a preventive measure to avoid situations where both
985 nodes are primary and disconnected. This is also known as a
986 split-brain situation. DRBD supports the following fencing
987 policies:
988
989 dont-care
990 No fencing actions are taken. This is the default policy.
991
992 resource-only
993 If a node becomes a disconnected primary, it tries to fence the
994 peer. This is done by calling the fence-peer handler. The
995 handler is supposed to reach the peer over an alternative
996 communication path and call 'drbdadm outdate minor' there.
997
998 resource-and-stonith
999 If a node becomes a disconnected primary, it freezes all its IO
1000 operations and calls its fence-peer handler. The fence-peer
1001 handler is supposed to reach the peer over an alternative
1002 communication path and call 'drbdadm outdate minor' there. In
1003 case it cannot do that, it should stonith the peer. IO is
1004 resumed as soon as the situation is resolved. In case the
1005 fence-peer handler fails, I/O can be resumed manually with
1006 'drbdadm resume-io'.
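
A sketch of a fencing configuration; the handler paths are placeholders
for whatever scripts integrate with your cluster manager over an
alternative communication path:

           resource r0 {
              net {
                 fencing resource-and-stonith;
              }
              handlers {
                 fence-peer   "/usr/local/sbin/my-fence-peer.sh";     # placeholder script
                 unfence-peer "/usr/local/sbin/my-unfence-peer.sh";   # placeholder script
              }
           }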
1007
1008 ko-count number
1009
1010 If a secondary node fails to complete a write request in ko-count
1011 times the timeout parameter, it is excluded from the cluster. The
1012 primary node then sets the connection to this secondary node to
1013 Standalone. To disable this feature, you should explicitly set it
1014 to 0; defaults may change between versions.
1015
1016 load-balance-paths {yes | no}
1017 By default, the TCP transport establishes only one configured path
1018 at a time. It switches to another path only in case the established
1019 one fails. When you set load-balance-paths to yes the TCP transport
1020 establishes all paths in parallel. It will transmit data packets
1021 over the paths with the least data in its socket send queue.
1022
1023 Please note that enabling load balancing introduces additional chunking
1024 headers into the network protocol; consequently, you must enable
1025 it on both sides of a connection.
1026
1027 As of drbd-9.2.6 the RDMA transport does not obey this setting. It
1028 always uses all paths in parallel. This option became available
1029 with drbd-9.2.6.
1030
1031 max-buffers number
1032
1033 Limits the memory usage per DRBD minor device on the receiving
1034 side, or for internal buffers during resync or online-verify. Unit
1035 is PAGE_SIZE, which is 4 KiB on most systems. The minimum possible
1036 setting is hard coded to 32 (=128 KiB). These buffers are used to
1037 hold data blocks while they are written to/read from disk. To avoid
1038 possible distributed deadlocks on congestion, this setting is used
1039 as a throttle threshold rather than a hard limit. Once more than
1040 max-buffers pages are in use, further allocation from this pool is
1041 throttled. You want to increase max-buffers if you cannot saturate
1042 the IO backend on the receiving side.
1043
1044 max-epoch-size number
1045
1046 Define the maximum number of write requests DRBD may issue before
1047 issuing a write barrier. The default value is 2048, with a minimum
1048 of 1 and a maximum of 20000. Setting this parameter to a value
1049 below 10 is likely to decrease performance.
1050
1051 on-congestion policy,
1052 congestion-fill threshold,
1053 congestion-extents threshold
1054 By default, DRBD blocks when the TCP send queue is full. This
1055 prevents applications from generating further write requests until
1056 more buffer space becomes available again.
1057
1058 When DRBD is used together with DRBD-proxy, it can be better to use
1059 the pull-ahead on-congestion policy, which can switch DRBD into
1060 ahead/behind mode before the send queue is full. DRBD then records
1061 the differences between itself and the peer in its bitmap, but it
1062 no longer replicates them to the peer. When enough buffer space
1063 becomes available again, the node resynchronizes with the peer and
1064 switches back to normal replication.
1065
1066 This has the advantage of not blocking application I/O even when
1067 the queues fill up, and the disadvantage that peer nodes can fall
1068 behind much further. Also, while resynchronizing, peer nodes will
1069 become inconsistent.
1070
1071 The available congestion policies are block (the default) and
1072 pull-ahead. The congestion-fill parameter defines how much data is
1073 allowed to be "in flight" in this connection. The default value is
1074 0, which disables this mechanism of congestion control, with a
1075 maximum of 10 GiBytes. The congestion-extents parameter defines how
1076 many bitmap extents may be active before switching into
1077 ahead/behind mode, with the same default and limits as the
1078 al-extents parameter. The congestion-extents parameter is effective
1079 only when set to a value smaller than al-extents.
1080
1081 Ahead/behind mode is available since DRBD 8.3.10.
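
A hedged sketch for a long-distance link (typically with DRBD-proxy)
using the pull-ahead policy; the thresholds are examples and depend on
the available buffer space:

           net {
              protocol           A;
              on-congestion      pull-ahead;
              congestion-fill    400M;    # example: switch to ahead/behind at ~400 MiB in flight
              congestion-extents 1000;    # example: must be smaller than al-extents to take effect
           }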
1082
1083 ping-int interval
1084
1085 When the TCP/IP connection to a peer is idle for more than ping-int
1086 seconds, DRBD will send a keep-alive packet to make sure that a
1087 failed peer or network connection is detected reasonably soon. The
1088 default value is 10 seconds, with a minimum of 1 and a maximum of
1089 120 seconds. The unit is seconds.
1090
1091 ping-timeout timeout
1092
1093 Define the timeout for replies to keep-alive packets. If the peer
1094 does not reply within ping-timeout, DRBD will close and try to
1095 reestablish the connection. The default value is 0.5 seconds, with
1096 a minimum of 0.1 seconds and a maximum of 30 seconds. The unit is
1097 tenths of a second.
1098
1099 socket-check-timeout timeout
1100 In setups involving a DRBD-proxy and connections that experience a
1101 lot of buffer bloat, it might be necessary to set ping-timeout to an
1102 unusually high value. By default, DRBD uses the same value to wait
1103 for a newly established TCP connection to become stable. Since the
1104 DRBD-proxy is usually located in the same data center, such a long
1105 wait time may hinder DRBD's connect process.
1106
1107 In such setups, socket-check-timeout should be set to at least
1108 the round-trip time between DRBD and the DRBD-proxy; in most cases
1109 that is 1.
1110
1111 The default unit is tenths of a second, the default value is 0
1112 (which causes DRBD to use the value of ping-timeout instead).
1113 Introduced in 8.4.5.
1114
1115 protocol name
1116 Use the specified protocol on this connection. The supported
1117 protocols are:
1118
1119 A
1120 Writes to the DRBD device complete as soon as they have reached
1121 the local disk and the TCP/IP send buffer.
1122
1123 B
1124 Writes to the DRBD device complete as soon as they have reached
1125 the local disk, and all peers have acknowledged the receipt of
1126 the write requests.
1127
1128 C
1129 Writes to the DRBD device complete as soon as they have reached
1130 the local and all remote disks.
1131
1132
1133 rcvbuf-size size
1134
1135 Configure the size of the TCP/IP receive buffer. A value of 0 (the
1136 default) causes the buffer size to adjust dynamically. This
1137 parameter usually does not need to be set, but it can be set to a
1138 value up to 10 MiB. The default unit is bytes.
1139
1140 rdma-ctrl-rcvbuf-size value
1141
1142 By default, the RDMA transport divides the rcvbuf-size by 64 and
1143 uses the result for the number of buffers on the control stream.
1144 This result might be too low depending on the timing
1145 characteristics of the backing storage devices and the network
1146 link.
1147
1148 The option rdma-ctrl-rcvbuf-size allows you to explicitly set the
1149 number of buffers for the control stream, overriding the divide-by-64
1150 heuristic. The default unit of this setting is bytes.
1151
1152 rdma-ctrl-sndbuf-size value
1153
1154 By default, the RDMA transport divides the sndbuf-size by 64 and
1155 uses the result for the number of buffers on the control stream.
1156 This result might be too low depending on the timing
1157 characteristics of the backing storage devices and the network
1158 link.
1159
1160 The option rdma-ctrl-sndbuf-size allows you to explicitly set the
1161 number of buffers for the control stream, overriding the divide-by-64
1162 heuristic. The default unit of this setting is bytes.
1163
1164 rr-conflict policy
1165 This option helps to solve the cases when the outcome of the resync
1166 decision is incompatible with the current role assignment in the
1167 cluster. The defined policies are:
1168
1169 disconnect
1170 No automatic resynchronization, simply disconnect.
1171
1172 retry-connect
1173 Disconnect now, and retry the connection immediately afterwards.
1174
1175 violently
1176 Resync to the primary node is allowed, violating the assumption
1177 that data on a block device are stable for one of the nodes.
1178 Do not use this option, it is dangerous.
1179
1180 call-pri-lost
1181 Call the pri-lost handler on one of the machines. The handler
1182 is expected to reboot the machine, which puts it into secondary
1183 role.
1184
1185 auto-discard
1186 Auto-discard reverses the resync direction, so that DRBD
1187 resyncs the current primary to the current secondary.
1188 Auto-discard only applies when protocol A is in use and the
1189 resync decision is based on the principle that a crashed
1190 primary should be the source of a resync. When a primary node
1191 crashes, it might have written some last updates to its disk,
1192 which were not received by a protocol A secondary. By promoting
1193 the secondary in the meantime the user accepted that those last
1194 updates have been lost. By using auto-discard you consent that
1195 the last updates (before the crash of the primary) should be
1196 rolled back automatically.
1197
1198 shared-secret secret
1199
1200 Configure the shared secret used for peer authentication. The
1201 secret is a string of up to 64 characters. Peer authentication also
1202 requires the cram-hmac-alg parameter to be set.
1203
1204 sndbuf-size size
1205
1206 Configure the size of the TCP/IP send buffer. Since DRBD 8.0.13 /
1207 8.2.7, a value of 0 (the default) causes the buffer size to adjust
1208 dynamically. Values below 32 KiB are harmful to the throughput on
1209 this connection. Large buffer sizes can be useful especially when
1210 protocol A is used over high-latency networks; the maximum value
1211 supported is 10 MiB.
1212
1213 tcp-cork
1214 By default, DRBD uses the TCP_CORK socket option to prevent the
1215 kernel from sending partial messages; this results in fewer and
1216 bigger packets on the network. Some network stacks can perform
1217 worse with this optimization. On these, the tcp-cork parameter can
1218 be used to turn this optimization off.
1219
1220 timeout time
1221
1222 Define the timeout for replies over the network: if a peer node
1223 does not send an expected reply within the specified timeout, it is
1224 considered dead and the TCP/IP connection is closed. The timeout
1225 value must be lower than connect-int and lower than ping-int. The
1226 default is 6 seconds; the value is specified in tenths of a second.
1227
1228 tls bool-value
1229 Enable TLS.
1230
1231 tls-keyring key-description
1232 Key description (name) of the keyring where the TLS key material is
1233 stored. The keyring will be shared with the handshake daemon.
1234
1235 tls-privkey key-description
1236 Key description (name) of the DER encoded private key for TLS
1237 encryption.
1238
1239 tls-certificate key-description
1240 Key description (name) of the DER encoded certificate for TLS
1241 encryption.
1242
1243 transport type
1244
1245 With DRBD 9, the network transport used by DRBD is loaded as a
1246 separate module. With this option you can specify which transport
1247 and module to load. At present only two options exist, tcp and
1248 rdma. The default is tcp.
1249
1250 use-rle
1251
1252 Each replicated device on a cluster node has a separate bitmap for
1253 each of its peer devices. The bitmaps are used for tracking the
1254 differences between the local and peer device: depending on the
1255 cluster state, a disk range can be marked as different from the
1256 peer in the device's bitmap, in the peer device's bitmap, or in
1257 both bitmaps. When two cluster nodes connect, they exchange each
1258 other's bitmaps, and they each compute the union of the local and
1259 peer bitmap to determine the overall differences.
1260
1261 Bitmaps of very large devices are also relatively large, but they
1262 usually compress very well using run-length encoding. This can save
1263 time and bandwidth for the bitmap transfers.
1264
1265 The use-rle parameter determines if run-length encoding should be
1266 used. It is on by default since DRBD 8.4.0.
1267
1268 verify-alg hash-algorithm
1269 Online verification (drbdadm verify) computes and compares
1270 checksums of disk blocks (i.e., hash values) in order to detect if
1271 they differ. The verify-alg parameter determines which algorithm to
1272 use for these checksums. It must be set to one of the secure hash
1273 algorithms supported by the kernel before online verify can be
1274 used; see the shash algorithms listed in /proc/crypto.
1275
1276 We recommend scheduling online verifications regularly during
1277 low-load periods, for example once a month. Also see the notes on
1278 data integrity below.
1279
1280 Section on Parameters
1281 address [address-family] address:port
1282
1283 Defines the address family, address, and port of a connection
1284 endpoint.
1285
1286 The address families ipv4, ipv6, ssocks (Dolphin Interconnect
1287 Solutions' "super sockets"), sdp (Infiniband Sockets Direct
1288 Protocol), and sci are supported (sci is an alias for ssocks). If
1289 no address family is specified, ipv4 is assumed. For all address
1290 families except ipv6, the address is specified in IPv4 address
1291 notation (for example, 1.2.3.4). For ipv6, the address is enclosed
1292 in brackets and uses IPv6 address notation (for example,
1293 [fd01:2345:6789:abcd::1]). The port is always specified as a
1294 decimal number from 1 to 65535.
1295
1296 On each host, the port numbers must be unique for each address;
1297 ports cannot be shared.
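
For illustration, the same kind of endpoint written for IPv6, reusing
the example address given above:

           on "alice" {
              node-id 0;
              address ipv6 [fd01:2345:6789:abcd::1]:7000;
           }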
1298
1299 node-id value
1300
1301 Defines the unique node identifier for a node in the cluster. Node
1302 identifiers are used to identify individual nodes in the network
1303 protocol, and to assign bitmap slots to nodes in the metadata.
1304
1305 Node identifiers can only be reassigned in a cluster when the
1306 cluster is down. It is essential that the node identifiers in the
1307 configuration and in the device metadata are changed consistently
1308 on all hosts. To change the metadata, dump the current state with
1309 drbdmeta dump-md, adjust the bitmap slot assignment, and update the
1310 metadata with drbdmeta restore-md.
1311
1312 The node-id parameter exists since DRBD 9. Its value ranges from 0
1313 to 16; there is no default.
1314
1315 Section options Parameters (Resource Options)
1316 auto-promote bool-value
1317 A resource must be promoted to primary role before any of its
1318 devices can be mounted or opened for writing.
1319
1320 Before DRBD 9, this could only be done explicitly ("drbdadm
1321 primary"). Since DRBD 9, the auto-promote parameter allows a
1322 resource to be automatically promoted to the primary role when one of
1323 its devices is mounted or opened for writing. As soon as all devices
1324 are unmounted or closed with no more remaining users, the role of
1325 the resource changes back to secondary.
1326
1327 Automatic promotion only succeeds if the cluster state allows it
1328 (that is, if an explicit drbdadm primary command would succeed).
1329 Otherwise, mounting or opening the device fails as it already did
1330 before DRBD 9: the mount(2) system call fails with errno set to
1331 EROFS (Read-only file system); the open(2) system call fails with
1332 errno set to EMEDIUMTYPE (wrong medium type).
1333
1334 Irrespective of the auto-promote parameter, if a device is promoted
1335 explicitly (drbdadm primary), it also needs to be demoted
1336 explicitly (drbdadm secondary).
1337
1338 The auto-promote parameter is available since DRBD 9.0.0, and
1339 defaults to yes.
1340
1341 auto-promote-timeout 1/10-of-seconds
1342
1343 When a user process promotes a drbd resource by opening one of its
1344 devices, DRBD waits up to auto-promote-timeout for the device to
1345 become promotable if it is not promotable right away.
1346
1347 auto-promote-timeout is specified in units of 0.1 seconds. Its
1348 default value is 20 (2 seconds), its minimum value is 0, and its
1349 maximum value is 600 (one minute).
1350
1351 cpu-mask cpu-mask
1352
1353 Set the cpu affinity mask for DRBD kernel threads. The cpu mask is
1354 specified as a hexadecimal number. The default value is 0, which
1355 lets the scheduler decide which kernel threads run on which CPUs.
1356 CPU numbers in cpu-mask which do not exist in the system are
1357 ignored.
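
       A hedged example (the mask value is illustrative): the hexadecimal
       mask 3 restricts the resource's DRBD kernel threads to CPUs 0
       and 1:

           options {
               cpu-mask 3;
           }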
1358
1359 max-io-depth value
1360
1361 This limits the number of outstanding requests on a DRBD device.
1362 Any process that tries to issue more I/O requests will sleep in "D
1363 state" (uninterruptible by signals) until some previously issued
1364 requests finish.
1365
1366 max-io-depth has a default value of 8000, its minimum value is 4,
1367 and its maximum value is 2^32.
1368
1369 on-no-data-accessible policy
1370 Determine how to deal with I/O requests when the requested data is
1371 not available locally or remotely (for example, when all disks have
1372 failed). When quorum is enabled, on-no-data-accessible should be
1373 set to the same value as on-no-quorum. The defined policies are:
1374
1375 io-error
1376 System calls fail with errno set to EIO.
1377
1378 suspend-io
1379 The resource suspends I/O. I/O can be resumed by (re)attaching
1380 the lower-level device, by connecting to a peer which has
1381 access to the data, or by forcing DRBD to resume I/O with
1382 drbdadm resume-io res. When no data is available, forcing I/O
1383 to resume will result in the same behavior as the io-error
1384 policy.
1385
1386 This setting is available since DRBD 8.3.9; the default policy is
1387 io-error.
1388
1389 on-no-quorum {io-error | suspend-io}
1390
1391 By default, DRBD freezes I/O on a device that has lost quorum.
1392 Setting on-no-quorum to io-error causes all I/O operations to be
1393 completed with an error while quorum is lost.
1394
1395 Usually, on-no-data-accessible should be set to the same value as
1396 on-no-quorum, because on-no-quorum takes precedence.
1397
1398 The on-no-quorum option is available starting with the DRBD kernel
1399 driver version 9.0.8.
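
       A hedged sketch of the recommendation above, with both policies
       set consistently in the resource's options section (quorum itself
       is described below):

           options {
               quorum                 majority;
               on-no-quorum           io-error;
               on-no-data-accessible  io-error;
           }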
1400
1401 on-suspended-primary-outdated {disconnect | force-secondary}
1402
1403 This setting is only relevant when on-no-quorum is set to
1404 suspend-io. It is relevant in the following scenario: a primary
1405 node loses quorum and therefore has all of its I/O requests frozen.
1406 This primary node then connects to another, quorate partition. It
1407 detects that a node in that quorate partition was promoted to
1408 primary and started a newer data generation there. As a result,
1409 the first primary learns that it has to consider itself outdated.
1410
1411 When set to force-secondary, the node demotes itself to secondary
1412 immediately and fails all pending (and new) I/O requests with I/O
1413 errors. It refuses to let any process open the DRBD devices until
1414 all openers have closed the device. This state is visible in
1415 status and events2 under the name force-io-failures.
1416
1417 The disconnect setting simply causes the node to reject connection
1418 attempts and stay isolated.
1419
1420 The on-suspended-primary-outdated option is available starting with
1421 the DRBD kernel driver version 9.1.7. It has a default value of
1422 disconnect.
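
       A hedged sketch combining this option with the required
       on-no-quorum setting:

           options {
               on-no-quorum                   suspend-io;
               on-suspended-primary-outdated  force-secondary;
           }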
1423
1424 peer-ack-delay expiry-time
1425
1426 If no new write request is issued for expiry-time after the last
1427 finished write request, a peer-ack packet is sent. If a new
1428 write request is issued before the timer expires, the timer gets
1429 reset to expiry-time. (Note: peer-ack packets may be sent due to
1430 other reasons as well, e.g. membership changes or the
1431 peer-ack-window option.)
1432
1433 This parameter may influence resync behavior on remote nodes. Peer
1434 nodes need to wait until they receive a peer-ack before releasing a
1435 lock on an AL-extent. Resync operations between peers may need to
1436 wait for these locks.
1437
1438 The default value for peer-ack-delay is 100 milliseconds, the
1439 default unit is milliseconds. This option is available since 9.0.0.
1440
1441 peer-ack-window value
1442
1443 On each node and for each device, DRBD maintains a bitmap of the
1444 differences between the local and remote data for each peer device.
1445 For example, in a three-node setup (nodes A, B, C) each with a
1446 single device, every node maintains one bitmap for each of its
1447 peers.
1448
1449 When nodes receive write requests, they know how to update the
1450 bitmaps for the writing node, but not how to update the bitmaps
1451 between themselves. In this example, when a write request
1452 propagates from node A to B and C, nodes B and C know that they
1453 have the same data as node A, but not whether or not they both have
1454 the same data.
1455
1456 As a remedy, the writing node occasionally sends peer-ack packets
1457 to its peers which tell them which state they are in relative to
1458 each other.
1459
1460 The peer-ack-window parameter specifies how much data a primary
1461 node may send before sending a peer-ack packet. A low value causes
1462 increased network traffic; a high value causes less network traffic
1463 but higher memory consumption on secondary nodes and higher resync
1464 times between the secondary nodes after primary node failures.
1465 (Note: peer-ack packets may be sent due to other reasons as well,
1466 e.g. membership changes or expiry of the peer-ack-delay timer.)
1467
1468 The default value for peer-ack-window is 2 MiB, the default unit is
1469 sectors. This option is available since 9.0.0.
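
       A hedged sketch that simply spells out the defaults in their
       default units (peer-ack-delay in milliseconds; peer-ack-window in
       sectors, 4096 sectors being 2 MiB):

           options {
               peer-ack-delay   100;
               peer-ack-window  4096;
           }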
1470
1471 quorum value
1472
1473 When activated, a cluster partition requires quorum in order to
1474 modify the replicated data set. That means a node in the cluster
1475 partition can only be promoted to primary if the cluster partition
1476 has quorum. Every node with a disk that is directly connected to
1477 the node to be promoted counts towards quorum. If a primary node
1478 needs to execute a write request, but the cluster partition has
1479 lost quorum, it will freeze I/O or reject the write request with an
1480 error (depending on the on-no-quorum setting). Upon losing quorum,
1481 a primary always invokes the quorum-lost handler. The handler is
1482 intended for notification purposes; its return code is ignored.
1483
1484 The option's value might be set to off, majority, all or a numeric
1485 value. If you set it to a numeric value, make sure that the value
1486 is greater than half of your number of nodes. Quorum is a
1487 mechanism to avoid data divergence; it can be used instead of
1488 fencing when there are more than two replicas. It defaults to off.
1489
1490 If all missing nodes are marked as outdated, a partition always
1491 has quorum, no matter how small it is. That is, if you disconnect
1492 all secondary nodes gracefully, a single primary continues to
1493 operate. The moment a single secondary is lost, however, it has to
1494 be assumed that it forms a partition with all the missing outdated
1495 nodes. If the local partition might then be smaller than the
1496 other, quorum is lost at that moment.
1497
1498 If you want to allow permanently diskless nodes to gain quorum, it
1499 is recommended not to use majority or all but to specify an
1500 absolute number, since DRBD's heuristic for determining the total
1501 number of diskful nodes in the cluster is unreliable.
1502
1503 The quorum implementation is available starting with the DRBD
1504 kernel driver version 9.0.7.
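
       A hedged example: in a cluster with three diskful nodes plus
       permanently diskless clients, an absolute value avoids relying on
       DRBD's node-counting heuristic:

           options {
               quorum 2;
           }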
1505
1506 quorum-minimum-redundancy value
1507
1508 This option sets the minimum required number of nodes with an
1509 UpToDate disk to allow the partition to gain quorum. This is a
1510 different requirement than the plain quorum option expresses.
1511
1512 The option's value might be set to off, majority, all or a numeric
1513 value. If you set it to a numeric value, make sure that the value
1514 is greater than half of your number of nodes.
1515
1516 If you want to allow permanently diskless nodes to gain quorum, it
1517 is recommended not to use majority or all but to specify an
1518 absolute number, since DRBD's heuristic for determining the total
1519 number of diskful nodes in the cluster is unreliable.
1520
1521 This option is available starting with the DRBD kernel driver
1522 version 9.0.10.
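
       A hedged sketch requiring at least two UpToDate replicas before a
       partition may gain quorum (the values are illustrative):

           options {
               quorum                     majority;
               quorum-minimum-redundancy  2;
           }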
1523
1524 twopc-retry-timeout 1/10-of-seconds
1525
1526 Due to conflicting two-phase commits, DRBD sometimes needs to
1527 retry them. But if two nodes retried their intended two-phase
1528 commits after the same delay, they would end up in an endless
1529 retry loop. To avoid that, DRBD selects a random wait time within
1530 an upper bound that grows exponentially with the retry number. The
1531 twopc-retry-timeout is the base multiplier for that backoff.
1532
1533 twopc-retry-timeout has a default value of 1 (0.1 seconds), its
1534 minimum value is 1 (0.1 seconds), and its maximum value is 50 (5
1535 seconds).
1536
1537 twopc-timeout 1/10-of-seconds
1538
1539 In some situations, a DRBD cluster requires a cluster-wide
1540 coordinated state transition. A perfect example of this is the
1541 'promote-to-primary' action. Even if two nodes that are not
1542 directly connected try this action concurrently, it may only
1543 succeed for one of the two.
1544
1545 For these cluster-wide coordinated state transitions, DRBD
1546 implements a two-phase commit protocol. If a connection breaks in
1547 phase one (prepare packet sent), the coordinator of the two-phase
1548 commit might never get the expected reply packet.
1549
1550 A cluster in this state cannot start any new cluster-wide
1551 coordinated state transition, as the already prepared one blocks
1552 all such attempts. After twopc-timeout all nodes abort the prepared
1553 transaction and unlock the cluster again.
1554
1555 twopc-timeout has a default value of 300 (30 seconds), its minimum
1556 value is 50 (5 seconds), and its maximum value is 600 (one minute).
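
       A hedged sketch that simply spells out the documented defaults
       (both values are in units of 0.1 seconds):

           options {
               twopc-timeout        300;
               twopc-retry-timeout  1;
           }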
1557
1558 Section startup Parameters
1559 The parameters in this section define the behavior of DRBD at system
1560 startup time, in the DRBD init script. They have no effect once the
1561 system is up and running.
1562
1563 degr-wfc-timeout timeout
1564
1565 Define how long to wait until all peers are connected in case the
1566 cluster consisted of a single node only when the system went down.
1567 This parameter is usually set to a value smaller than wfc-timeout.
1568 The assumption here is that peers which were unreachable before a
1569 reboot are less likely to be reachable after the reboot, so waiting
1570 is less likely to help.
1571
1572 The timeout is specified in seconds. The default value is 0, which
1573 stands for an infinite timeout. Also see the wfc-timeout parameter.
1574
1575 outdated-wfc-timeout timeout
1576
1577 Define how long to wait until all peers are connected if all peers
1578 were outdated when the system went down. This parameter is usually
1579 set to a value smaller than wfc-timeout. The assumption here is
1580 that an outdated peer cannot have become primary in the meantime,
1581 so we don't need to wait for it as long as for a node which was
1582 alive before.
1583
1584 The timeout is specified in seconds. The default value is 0, which
1585 stands for an infinite timeout. Also see the wfc-timeout parameter.
1586
1587 stacked-timeouts
1588 On stacked devices, the wfc-timeout and degr-wfc-timeout parameters
1589 in the configuration are usually ignored, and both timeouts are set
1590 to twice the connect-int timeout. The stacked-timeouts parameter
1591 tells DRBD to use the wfc-timeout and degr-wfc-timeout parameters
1592 as defined in the configuration, even on stacked devices. Only use
1593 this parameter if the peer of the stacked resource is usually not
1594 available, or will not become primary. Incorrect use of this
1595 parameter can lead to unexpected split-brain scenarios.
1596
1597 wait-after-sb
1598 This parameter causes DRBD to continue waiting in the init script
1599 even when a split-brain situation has been detected, and the nodes
1600 therefore refuse to connect to each other.
1601
1602 wfc-timeout timeout
1603
1604 Define how long the init script waits until all peers are
1605 connected. This can be useful in combination with a cluster manager
1606 which cannot manage DRBD resources: when the cluster manager
1607 starts, the DRBD resources will already be up and running. With a
1608 more capable cluster manager such as Pacemaker, it makes more sense
1609 to let the cluster manager control DRBD resources. The timeout is
1610 specified in seconds. The default value is 0, which stands for an
1611 infinite timeout. Also see the degr-wfc-timeout parameter.
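
       A hedged example with illustrative timeout values (both are given
       in seconds):

           resource r0 {
               startup {
                   wfc-timeout      120;
                   degr-wfc-timeout  60;
               }
           }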
1612
1613 Section volume Parameters
1614 device /dev/drbdminor-number
1615
1616 Define the device name and minor number of a replicated block
1617 device. This is the device that applications are supposed to
1618 access; in most cases, the device is not used directly, but as a
1619 file system. This parameter is required and the standard device
1620 naming convention is assumed.
1621
1622 In addition to this device, udev will create
1623 /dev/drbd/by-res/resource/volume and
1624 /dev/drbd/by-disk/lower-level-device symlinks to the device.
1625
1626 disk {[disk] | none}
1627
1628 Define the lower-level block device that DRBD will use for storing
1629 the actual data. While the replicated drbd device is configured,
1630 the lower-level device must not be used directly. Even read-only
1631 access with tools like dumpe2fs(8) and similar is not allowed. The
1632 keyword none specifies that no lower-level block device is
1633 configured; this also overrides inheritance of the lower-level
1634 device.
1635
1636 meta-disk internal,
1637 meta-disk device,
1638 meta-disk device [index]
1639
1640 Define where the metadata of a replicated block device resides: it
1641 can be internal, meaning that the lower-level device contains both
1642 the data and the metadata, or on a separate device.
1643
1644 When the index form of this parameter is used, multiple replicated
1645 devices can share the same metadata device, each using a separate
1646 index. Each index occupies 128 MiB of data, which corresponds to a
1647 replicated device size of at most 4 TiB with two cluster nodes. We
1648 recommend not sharing metadata devices anymore, and instead using
1649 the LVM volume manager to create metadata devices as needed.
1650
1651 When the index form of this parameter is not used, the size of the
1652 lower-level device determines the size of the metadata. The size
1653 needed is 36 KiB + (size of lower-level device) / 32K * (number of
1654 nodes - 1). If the metadata device is bigger than that, the extra
1655 space is not used.
1656
1657 This parameter is required if a disk other than none is specified,
1658 and ignored if disk is set to none. A meta-disk parameter without a
1659 disk parameter is not allowed.
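
       A hedged example of a volume with a dedicated external metadata
       device (the device names are hypothetical; with a 1 TiB
       lower-level device and three nodes, the formula above yields about
       36 KiB + 2 x 32 MiB, i.e. roughly 64 MiB of required metadata
       space):

           volume 1 {
               device    "/dev/drbd2";
               disk      "/dev/sdb1";
               meta-disk "/dev/sdc1";
           }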
1660
1661 NOTES ON DATA INTEGRITY
1662 DRBD supports two different mechanisms for data integrity checking:
1663 first, the data-integrity-alg network parameter allows a checksum
1664 to be added to the data sent over the network. Second, the online
1665 verification mechanism (drbdadm verify and the verify-alg
1666 parameter) allows checking for differences in the on-disk data.
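
       A hedged sketch enabling both mechanisms in a resource's net
       section (crc32c and sha256 are assumptions; use any suitable
       algorithm listed in /proc/crypto):

           net {
               data-integrity-alg crc32c;
               verify-alg         sha256;
           }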
1667
1668 Both mechanisms can produce false positives if the data is modified
1669 during I/O (i.e., while it is being sent over the network or written to
1670 disk). This does not always indicate a problem: for example, some file
1671 systems and applications do modify data under I/O for certain
1672 operations. Swap space can also undergo changes while under I/O.
1673
1674 Network data integrity checking tries to identify data modification
1675 during I/O by verifying the checksums on the sender side after sending
1676 the data. If it detects a mismatch, it logs an error. The receiver also
1677 logs an error when it detects a mismatch. Thus, an error logged only on
1678 the receiver side indicates an error on the network, and an error
1679 logged on both sides indicates data modification under I/O.
1680
1681 The most recent example of systematic data corruption was identified as
1682 a bug in the TCP offloading engine and driver of a certain type of GBit
1683 NIC in 2007: the data corruption happened on the DMA transfer from core
1684 memory to the card. Because the TCP checksums were calculated on the
1685 card, the TCP/IP protocol checksums did not reveal this problem.
1686
1688 This document was revised for version 9.0.0 of the DRBD distribution.
1689
1691 Written by Philipp Reisner <philipp.reisner@linbit.com> and Lars
1692 Ellenberg <lars.ellenberg@linbit.com>.
1693
1695 Report bugs to <drbd-user@lists.linbit.com>.
1696
1698 Copyright 2001-2018 LINBIT Information Technologies, Philipp Reisner,
1699 Lars Ellenberg. This is free software; see the source for copying
1700 conditions. There is NO warranty; not even for MERCHANTABILITY or
1701 FITNESS FOR A PARTICULAR PURPOSE.
1702
1704 drbd(8), drbdsetup(8), drbdadm(8), DRBD User's Guide[1], DRBD Web
1705 Site[3]
1706
1708 1. DRBD User's Guide
1709 http://www.drbd.org/users-guide/
1710
1711 2. Online Usage Counter
1712 http://usage.drbd.org
1715
1716 3. DRBD Web Site
1717 http://www.drbd.org/
1718
1719
1720
1721DRBD 9.0.x 17 January 2018 DRBD.CONF(5)