mdmon(8) - f14

MDMON(8)                    System Manager's Manual                   MDMON(8)

NAME

       mdmon - monitor MD external metadata arrays

SYNOPSIS

       mdmon [--all] [--takeover] CONTAINER

OVERVIEW

       The 2.6.27 kernel adds support for external metadata arrays. External
       metadata implies that user space handles all updates to the metadata.
       The kernel's responsibility is to notify user space when a "metadata
       event" occurs, such as a disk failure or a clean-to-dirty transition.
       In important cases the kernel waits for user space to act on these
       notifications.

DESCRIPTION

   Metadata updates:
       To service metadata update requests, a daemon, mdmon, is introduced.
       Mdmon polls the sysfs namespace for changes in the array_state,
       sync_action, and per-disk state attributes. When a change is detected,
       it calls a per-metadata-type handler to modify the metadata. The
       following actions are taken:

              array_state - inactive
                     Clear the dirty bit for the volume and let the array be
                     stopped.

              array_state - write pending
                     Set the dirty bit for the array and then set array_state
                     to active. Writes are blocked until user space writes
                     active.

              array_state - active-idle
                     The safe mode timer has expired, so set array_state to
                     clean to block writes to the array.

              array_state - clean
                     Clear the dirty bit for the volume.

              array_state - read-only
                     This is the initial state of every array. mdmon takes
                     one of three actions:

                     1/     Transition the array to read-auto, keeping the
                            dirty bit clear, if the metadata handler
                            determines that the array does not need resyncing
                            or other modification.

                     2/     Transition the array to active if the metadata
                            handler determines that a resync or some other
                            manipulation is necessary.

                     3/     Leave the array read-only if the volume is marked
                            to not be monitored; for example, the metadata
                            version has been set to "external:-dev/md127"
                            instead of "external:/dev/md127".

              sync_action - resync-to-idle
                     Notify the metadata handler that a resync may have
                     completed. If a resync process is idled before it
                     completes, this event allows the metadata handler to
                     checkpoint the resync.

              sync_action - recover-to-idle
                     A spare may have completed rebuilding, so tell the
                     metadata handler about the state of each disk. This is
                     the metadata handler's opportunity to clear any
                     "out-of-sync" bits and clear the volume's degraded
                     status. If a recovery process is idled before it
                     completes, this event allows the metadata handler to
                     checkpoint the recovery.

              <disk>/state - faulty
                     A disk failure kicks off a series of events. First,
                     notify the metadata handler that a disk has failed, and
                     then notify the kernel that it can unblock writes that
                     were dependent on this disk. After unblocking the
                     kernel, this disk is set to be removed+ from the member
                     array. Finally, the disk is marked failed in all other
                     member arrays in the container.

                     + Note: This behavior differs slightly from native MD
                     arrays, where removal is reserved for an mdadm --remove
                     event. In the external metadata case the container holds
                     the final reference on a block device, and an mdadm
                     --remove <container> <victim> call is still required.

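
       The array_state reactions listed above can be summarised as a lookup.
       The following is only an illustrative sketch, not how mdmon itself is
       implemented. The helper name describe_array_state is hypothetical, and
       the hyphenated spellings (write-pending, readonly) are assumed to be
       the values the kernel reports in sysfs.

```shell
#!/bin/sh
# Illustrative sketch: map an array_state value to the reaction described
# in this manual. describe_array_state is a hypothetical helper name.
describe_array_state() {
    case "$1" in
        inactive)
            echo "clear dirty bit; array may be stopped" ;;
        write-pending|"write pending")
            echo "set dirty bit, then write active to array_state" ;;
        active-idle)
            echo "safe mode timer expired; write clean to array_state" ;;
        clean)
            echo "clear dirty bit" ;;
        readonly|read-only)
            echo "initial state; go read-auto, go active, or stay read-only" ;;
        *)
            echo "no action listed for this state" ;;
    esac
}

# Example use (md126 is a placeholder device name; the sysfs path is an
# assumption about the usual md layout):
#   describe_array_state "$(cat /sys/block/md126/md/array_state)"
```

       The function only prints a description; it never writes to sysfs, so
       it is safe to run on a live system.
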
   Containers:
       External metadata formats, like DDF, differ from the native MD metadata
       formats in that they define a set of disks and a series of sub-arrays
       within those disks. MD metadata, in comparison, defines a 1:1
       relationship between a set of block devices and a raid array. For
       example, to create 2 arrays at different raid levels on a single set of
       disks, MD metadata requires the disks to be partitioned, and then each
       array can be created with a subset of those partitions. The supported
       external formats perform this disk carving internally.

       Container devices simply hold references to all member disks and allow
       tools like mdmon to determine which active arrays belong to which
       container. Some array management commands, like disk removal and disk
       add, are now only valid at the container level. Attempts to perform
       these actions on member arrays are blocked with error messages like:

              "mdadm: Cannot remove disks from a 'member' array, perform this
              operation on the parent container"

       Containers are identified in /proc/mdstat with a metadata version
       string "external:<metadata name>". Member devices are identified by
       "external:/<container device>/<member index>", or "external:-<container
       device>/<member index>" if the array is to remain read-only.
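
       The three forms of the metadata version string can be told apart by
       the character that follows "external:". A minimal sketch, assuming the
       string has already been extracted from /proc/mdstat; the function name
       and the labels it prints are hypothetical:

```shell
#!/bin/sh
# Illustrative sketch: classify a metadata version string.
# classify_metadata_version and its output labels are hypothetical.
classify_metadata_version() {
    case "$1" in
        external:/*) echo "member" ;;            # external:/<container>/<index>
        external:-*) echo "member-readonly" ;;   # external:-<container>/<index>
        external:*)  echo "container" ;;         # external:<metadata name>
        *)           echo "not-external" ;;
    esac
}
```

       The order of the patterns matters: the two member forms must be
       tested before the general "external:" prefix, which would otherwise
       match them as well.
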

OPTIONS

       CONTAINER
              The container device to monitor. It can be a full path like
              /dev/md/container, or a simple md device name like md127.

       --takeover
              This instructs mdmon to replace any active mdmon which is
              currently monitoring the array. This is primarily used late in
              the boot process to replace any mdmon which was started from an
              initramfs before the root filesystem was mounted. This avoids
              holding a reference on that initramfs indefinitely and ensures
              that the pid and sock files used to communicate with mdmon are
              in a standard place.

       --all  This tells mdmon to find any active containers and start
              monitoring each of them if appropriate. This is normally used
              with --takeover late in the boot sequence. A separate mdmon
              process is started for each container, as the --all argument is
              overwritten with the name of the container. To allow for
              containers with names longer than 5 characters, this argument
              can be arbitrarily extended, e.g. to --all-active-arrays.

       Note that mdmon is automatically started by mdadm when needed and so
       does not need to be considered when working with RAID arrays. The only
       time it is run other than by mdadm is when the boot scripts need to
       restart it after mounting the new root filesystem.

START UP AND SHUTDOWN

       As mdmon needs to be running whenever any filesystem on the monitored
       device is mounted, there are special considerations when the root
       filesystem is mounted from an mdmon-monitored device. Note that in
       general mdmon is needed even if the filesystem is mounted read-only, as
       some filesystems can still write to the device in those circumstances,
       for example to replay a journal after an unclean shutdown.

       When the array is assembled by the initramfs code, mdadm will
       automatically start mdmon as required. This means that mdmon must be
       installed on the initramfs and there must be a writable filesystem
       (typically tmpfs) in which mdmon can create a .pid and .sock file. The
       particular filesystem to use is given to mdmon at compile time and
       defaults to /dev/.mdadm.

       This filesystem must persist through to shutdown time.

       After the final root filesystem has been instantiated (usually with
       pivot_root), mdmon should be run with --all --takeover so that the
       mdmon running from the initramfs can be replaced with one running in
       the main root, and so the memory used by the initramfs can be released.

       At shutdown time, mdmon should not be killed along with other
       processes. Also, because it holds a file (actually a socket) open in
       /dev (by default), it will not be possible to unmount /dev if it is a
       separate filesystem.
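
       One way a shutdown script might avoid killing mdmon is to collect the
       process IDs recorded in its .pid files and skip them. The sketch below
       is an assumption-laden illustration: the helper names are
       hypothetical, and it assumes each .pid file in the directory holds a
       single process ID on a line of its own.

```shell
#!/bin/sh
# Illustrative sketch: find the PIDs that a shutdown script should spare.
# mdmon_pids and should_spare are hypothetical helper names.

# mdmon_pids [DIR]: print the contents of every *.pid file in DIR.
# DIR defaults to /dev/.mdadm, the default location named in this manual.
mdmon_pids() {
    dir=${1:-/dev/.mdadm}
    for f in "$dir"/*.pid; do
        [ -r "$f" ] || continue   # glob matched nothing, or file unreadable
        cat "$f"
    done
}

# should_spare PID [DIR]: succeed if PID is listed in one of the pid files.
should_spare() {
    mdmon_pids "$2" | grep -qxF -- "$1"
}
```

       A shutdown script could call should_spare for each candidate process
       and send signals only to those for which it fails.
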

EXAMPLES

         mdmon --all-active-arrays --takeover
       Any mdmon which is currently running is killed and a new instance is
       started. This should be run during the boot sequence if an initramfs
       was used, so that any mdmon running from the initramfs will not hold
       the initramfs active.
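
       A boot script might guard the takeover so that it runs only when an
       external metadata array is present. This is a sketch under that
       assumption, not a required procedure; the function names are
       hypothetical, and the check relies only on the "external:" metadata
       version string that /proc/mdstat reports.

```shell
#!/bin/sh
# Illustrative sketch: run the takeover only if external metadata is in use.
# has_external_arrays and takeover_mdmon are hypothetical helper names.

# has_external_arrays [FILE]: succeed if FILE (default /proc/mdstat)
# mentions an external metadata version string.
has_external_arrays() {
    grep -q 'external:' "${1:-/proc/mdstat}" 2>/dev/null
}

# takeover_mdmon [FILE]: intended to be called after the switch to the
# final root filesystem.
takeover_mdmon() {
    if has_external_arrays "$1"; then
        mdmon --all-active-arrays --takeover
    fi
}
```

       Nothing is executed when the file is sourced; a boot script would
       call takeover_mdmon explicitly at the appropriate point.
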