cciss_vol_status(8)

CCISS_VOL_STATUS(8)                                        CCISS_VOL_STATUS(8)

NAME

       cciss_vol_status - show status of logical drives attached to HP Smart
       Array controllers

SYNOPSIS

       cciss_vol_status [OPTION] [DEVICE]...

DESCRIPTION

       Shows the status of logical drives configured on HP Smart Array
       controllers.

OPTIONS

       -p, --persnickety
              Without this option, device nodes which can't be opened, or
              which are not found to be of the correct device type, are
              silently ignored.  This lets you use wildcards, e.g.:
              cciss_vol_status /dev/sg* /dev/cciss/c*d0, and the program will
              not complain as long as all devices which are found to be of the
              correct type are found to be ok.  However, you may wish to
              explicitly list the devices you expect to be there, and be
              notified if they are not there (e.g. perhaps a PCI slot has
              died, and the system has rebooted, so that what was once
              /dev/cciss/c1d0 is no longer there at all).  This option will
              cause the program to complain about any device node listed which
              does not appear to be the right device type, or is not openable.

       -C, --copyright
              If stderr is a terminal, print a copyright message and exit.

       -q, --quiet
              This option doesn't do anything.  Previously, without this
              option and if stderr was a terminal, a copyright message
              preceded the normal program output.  Now, the copyright message
              is only printed via the -C option.

       -s     Query each physical drive for S.M.A.R.T. data and report any
              drives in "predictive failure" state.

       -u, --try-unknown-devices
              If a device has an unrecognized board ID, normally the program
              will not attempt to communicate with it.  In case you have some
              Smart Array controller which is newer than this program, the
              program may not recognize it.  This option permits the program
              to attempt to interrogate the board even if it is unrecognized,
              on the assumption that it is in fact a Smart Array of some kind.

       -v, --version
              Print the version number and exit.

       -x, --exhaustive
              Deprecated.  Previously, it "exhaustively" searched for logical
              drives, as under some circumstances some logical drives might
              otherwise be missed.  This option no longer does anything, as
              the algorithm for finding logical drives was changed to obviate
              the need for it.

DEVICE

       The DEVICE argument indicates which RAID controller is to be queried.
       Note that it indicates which RAID controller, not which logical drive.

       For the cciss driver, the "d0" nodes matching "/dev/cciss/c*d0" are the
       nodes which correspond to the RAID controllers.  (See note 1, below.)
       It is not necessary to invoke cciss_vol_status on each logical drive
       individually, though if you do this, each time it will report the
       status of ALL logical drives on the controller.

       For the hpsa driver, or for fibre attached MSA1000 family devices, or
       for the hpahcisr software RAID driver which emulates Smart Arrays, the
       RAID controller is accessed via the scsi generic driver, and the device
       nodes will match "/dev/sg*".  Some variants of the "lsscsi" tool will
       easily identify which device node corresponds to the RAID controller.
       Some variants may only report the SCSI nexus (controller/bus/target/lun
       tuple).  Some distros may not have the lsscsi tool.

       Executing the following query to the /sys filesystem and correlating
       this with the contents of /proc/scsi/scsi or output of lsscsi can help
       in finding the right /dev/sg node to use with cciss_vol_status:

       wumpus:/home/scameron # ls -l /sys/class/scsi_generic/*
       lrwxrwxrwx 1 root root 0 2009-11-18 12:31 /sys/class/scsi_generic/sg0 -> ../../devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:03.0/host0/target0:0:0/0:0:0:0/scsi_generic/sg0
       lrwxrwxrwx 1 root root 0 2009-11-18 12:31 /sys/class/scsi_generic/sg1 -> ../../devices/pci0000:00/0000:00:1f.1/host2/target2:0:0/2:0:0:0/scsi_generic/sg1
       lrwxrwxrwx 1 root root 0 2009-11-19 07:47 /sys/class/scsi_generic/sg2 -> ../../devices/pci0000:00/0000:00:05.0/0000:0e:00.0/host4/target4:3:0/4:3:0:0/scsi_generic/sg2
       wumpus:/home/scameron # cat /proc/scsi/scsi
       Attached devices:
       Host: scsi0 Channel: 00 Id: 00 Lun: 00
         Vendor: COMPAQ   Model: BD03685A24       Rev: HPB6
         Type:   Direct-Access                    ANSI  SCSI revision: 03
       Host: scsi2 Channel: 00 Id: 00 Lun: 00
         Vendor: SAMSUNG  Model: CD-ROM SC-148A   Rev: B408
         Type:   CD-ROM                           ANSI  SCSI revision: 05
       Host: scsi4 Channel: 03 Id: 00 Lun: 00
         Vendor: HP       Model: P800             Rev: 6.82
         Type:   RAID                             ANSI  SCSI revision: 00
       wumpus:/home/scameron # lsscsi
       [0:0:0:0]    disk    COMPAQ   BD03685A24       HPB6  /dev/sda
       [2:0:0:0]    cd/dvd  SAMSUNG  CD-ROM SC-148A   B408  /dev/sr0
       [4:3:0:0]    storage HP       P800             6.82  -

       From the above you can see that /dev/sg2 corresponds to SCSI nexus
       4:3:0:0, which corresponds to the HP P800 RAID controller listed in
       /proc/scsi/scsi.
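
       The manual correlation described above can be sketched in Python.
       This is a minimal illustration, not part of cciss_vol_status: the
       function names are hypothetical, and it assumes the sysfs link targets
       and the /proc/scsi/scsi layout look like the listings shown above
       (both may differ on other kernels).

```python
import os
import re

def sg_nexus_map(symlink_targets):
    """Map sg node name -> (host, channel, id, lun) parsed from sysfs link targets."""
    nexus = {}
    for name, target in symlink_targets.items():
        # The nexus is the path component just before "scsi_generic/".
        m = re.search(r"/(\d+):(\d+):(\d+):(\d+)/scsi_generic/", target)
        if m:
            nexus[name] = tuple(int(x) for x in m.groups())
    return nexus

def raid_nexuses(proc_scsi_text):
    """Return the set of (host, channel, id, lun) entries whose Type is RAID."""
    found = set()
    current = None
    for line in proc_scsi_text.splitlines():
        m = re.match(r"\s*Host: scsi(\d+) Channel: (\d+) Id: (\d+) Lun: (\d+)", line)
        if m:
            current = tuple(int(x) for x in m.groups())
            continue
        m = re.match(r"\s*Type:\s+(\S+)", line)
        if m and current is not None and m.group(1) == "RAID":
            found.add(current)
    return found

def raid_sg_nodes(symlink_targets, proc_scsi_text):
    """Return the /dev/sg* nodes whose nexus matches a RAID-type device."""
    raid = raid_nexuses(proc_scsi_text)
    nexus = sg_nexus_map(symlink_targets)
    return sorted("/dev/" + name for name, nx in nexus.items() if nx in raid)

def read_live_system(sysfs_dir="/sys/class/scsi_generic",
                     proc_file="/proc/scsi/scsi"):
    """Collect the two inputs from a running system (paths as shown above)."""
    links = {n: os.readlink(os.path.join(sysfs_dir, n))
             for n in os.listdir(sysfs_dir)}
    with open(proc_file) as f:
        return links, f.read()
```

       With the listings above as input, raid_sg_nodes() selects /dev/sg2.
       On a live system, raid_sg_nodes(*read_live_system()) would perform the
       same lookup, provided both paths exist.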

EXAMPLE

            [root@somehost]# cciss_vol_status -q /dev/cciss/c*d0
            /dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 1 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 1 Volume 2 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 5 Volume 4 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 5 Volume 5 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) Enclosure MSA60 (S/N: USP6340B3F) on Bus 2, Physical Port 1E status: Power Supply Unit failed
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 0 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 1 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 2 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 3 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 4 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 5 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 6 status: OK.
            /dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 7 status: OK.

            [root@someotherhost]# cciss_vol_status -q /dev/sg0 /dev/cciss/c*d0
            /dev/sg0: (MSA1000) RAID 1 Volume 0 status: OK.   At least one spare drive.
            /dev/sg0: (MSA1000) RAID 5 Volume 1 status: OK.
            /dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.

            [root@localhost]# ./cciss_vol_status -s /dev/sg1
            /dev/sda: (Smart Array P410i) RAID 0 Volume 0 status: OK.
                  connector 1I box 1 bay 1                 HP      DG072A9BB7                               B365P6803PCP0633     HPD0 S.M.A.R.T. predictive failure.
            [root@localhost]# echo $?
            1

            [root@localhost]# ./cciss_vol_status -s /dev/cciss/c0d0
            /dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.
                  connector 2E box 1 bay 8                 HP      DF300BB6C3                           3LM08AP700009713RXUT     HPD3 S.M.A.R.T. predictive failure.
            /dev/cciss/c0d0: (Smart Array P800) Enclosure MSA60 (S/N: USP6340B3F) on Bus 2, Physical Port 2E status: OK.
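
       For scripted monitoring, the logical drive lines in the sample output
       above could be parsed roughly as follows.  This is a sketch that
       assumes the line layout seen in these examples; the VolumeStatus type
       and function names are hypothetical, and other program versions may
       format lines differently.

```python
import re
from typing import NamedTuple

class VolumeStatus(NamedTuple):
    device: str       # device node the line was reported for
    controller: str   # controller name shown in parentheses
    raid_level: str   # kept as text, since levels need not be numeric
    volume: int       # logical drive number
    status: str       # status text, including any spare drive message

# Assumed layout: "<device>: (<controller>) RAID <level> Volume <n> status: <text>"
_VOLUME_RE = re.compile(
    r"^\s*(?P<device>/dev/\S+): \((?P<controller>[^)]+)\) "
    r"RAID (?P<raid>\S+) Volume (?P<volume>\d+) status: (?P<status>.*\S)\s*$"
)

def parse_volume_line(line):
    """Return a VolumeStatus for a logical drive line, or None for other lines."""
    m = _VOLUME_RE.match(line)
    if m is None:
        return None
    return VolumeStatus(m.group("device"), m.group("controller"),
                        m.group("raid"), int(m.group("volume")),
                        m.group("status"))

def problem_volumes(output):
    """Return the logical drives in the output whose status does not start with OK."""
    parsed = (parse_volume_line(line) for line in output.splitlines())
    return [v for v in parsed if v is not None and not v.status.startswith("OK.")]
```

       Enclosure lines and physical drive detail lines do not match the
       pattern and are skipped, so they would need separate handling.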

DIAGNOSTICS

       Normally, a logical drive in good working order should report a status
       of "OK."  Possible status values are:

       "OK." (0) - The logical drive is in good working order.

       "FAILED." (1) - The logical drive has failed, and no I/O to it is
              possible.  Additionally, failed drives will be identified by
              connector, box and bay, as well as vendor, model, serial number,
              and firmware revision.

       "Using interim recovery mode." (3) - One or more drives have failed,
              but not so many that the logical drive can no longer operate.
              The failed drives should be replaced as soon as possible.

       "Ready for recovery operation." (4) - Failed drive(s) have been
              replaced, and the controller is about to begin rebuilding
              redundant parity data.

       "Currently recovering." (5) - Failed drive(s) have been replaced,
              and the controller is currently rebuilding redundant parity
              information.

       "Wrong physical drive was replaced." (6) - A drive has failed, and
              another (working) drive was replaced.

       "A physical drive is not properly connected." (7) - There is some
              cabling or backplane problem in the drive enclosure.

       (From fwspecwww.doc, see cpqarray project on sourceforge.net):
              Note: If the unit_status value is 6 (Wrong physical drive was
              replaced) or 7 (A physical drive is not properly connected), the
              unit_status of all other configured logical drives will be
              marked as 1 (Logical drive failed).  This is to force the user to
              correct the problem and to ensure that once the problem is
              corrected, the data will not have been corrupted by any user
              action.

       "Hardware is overheating." (8) - Hardware is too hot.

       "Hardware was overheated." (9) - At some point in the past,
              the hardware got too hot.

       "Currently expanding." (10) - The controller is currently in the
              process of expanding a logical drive.

       "Not yet available." (11) - The logical drive is not yet finished
              being configured.

       "Queued for expansion." (12) - The logical drive will be expanded
              when the controller is able to begin working on it.

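
       The numeric status values listed above can be captured in a small
       lookup type for use in monitoring scripts.  The sketch below is
       illustrative only: the member names are invented here, and only the
       numeric values come from the list above (no value 2 is listed, so none
       is defined).

```python
from enum import IntEnum

class UnitStatus(IntEnum):
    """Logical drive status values as listed in DIAGNOSTICS (names are hypothetical)."""
    OK = 0
    FAILED = 1
    INTERIM_RECOVERY = 3
    READY_FOR_RECOVERY = 4
    RECOVERING = 5
    WRONG_DRIVE_REPLACED = 6
    DRIVE_NOT_PROPERLY_CONNECTED = 7
    OVERHEATING = 8
    OVERHEATED = 9
    EXPANDING = 10
    NOT_YET_AVAILABLE = 11
    QUEUED_FOR_EXPANSION = 12

# Per the firmware note quoted above, a drive in status 6 or 7 causes all
# other configured logical drives to be marked as 1 (FAILED).
MASKING_STATUSES = frozenset({
    UnitStatus.WRONG_DRIVE_REPLACED,
    UnitStatus.DRIVE_NOT_PROPERLY_CONNECTED,
})

def likely_root_causes(statuses):
    """Given {volume number: UnitStatus}, return the volumes in status 6 or 7.

    When this list is non-empty, FAILED statuses on the other volumes may
    simply be a consequence of the condition on the returned volumes.
    """
    return sorted(vol for vol, st in statuses.items() if st in MASKING_STATUSES)
```

       For example, if volume 1 reports status 6 while volumes 0 and 2 report
       status 1, likely_root_causes() returns only volume 1, which is the one
       to investigate first.
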
       Additionally, the following messages may appear regarding spare drive
       status:

            "At least one spare drive designated"
            "At least one spare drive activated and currently rebuilding"
            "At least one activated on-line spare drive is completely rebuilt on this logical drive"
            "At least one spare drive has failed"
            "At least one spare drive activated"
            "At least one spare drive remains available"

       Active spares will be identified by connector, box and bay, as well
       as by vendor, model, serial number, and firmware revision.

       For each logical drive, the total number of failed physical drives, if
       more than zero, will be reported as:

            "Total of n failed physical drives detected on this logical drive."

       with "n" replaced by the actual number.

       "Replacement" drives -- newly inserted drives that replace a previously
       failed drive but are not yet finished rebuilding -- are also identified
       by connector, box and bay, as well as by vendor, model, serial number,
       and firmware revision.

       If the -s option is specified, each physical drive will be queried for
       S.M.A.R.T. data, and any drives in predictive failure state will be
       reported, identified by connector, box and bay, as well as vendor,
       model, serial number, and firmware revision.

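
       As an illustration of the identification fields described above, the
       predictive failure lines shown in the EXAMPLE section could be split
       into their parts as follows.  This sketch assumes the field order seen
       there (connector, box, bay, vendor, model, serial number, firmware
       revision) and that no field contains embedded spaces; the function
       name is hypothetical.

```python
import re

# Assumed layout, based on the -s examples in this page:
# "connector <c> box <n> bay <n> <vendor> <model> <serial> <firmware> S.M.A.R.T. predictive failure."
_SMART_RE = re.compile(
    r"^\s*connector (?P<connector>\S+) box (?P<box>\d+) bay (?P<bay>\d+)\s+"
    r"(?P<vendor>\S+)\s+(?P<model>\S+)\s+(?P<serial>\S+)\s+(?P<firmware>\S+)\s+"
    r"S\.M\.A\.R\.T\. predictive failure\.\s*$"
)

def parse_predictive_failure(line):
    """Return a dict describing the drive, or None if the line does not match."""
    m = _SMART_RE.match(line)
    if m is None:
        return None
    drive = m.groupdict()
    drive["box"] = int(drive["box"])
    drive["bay"] = int(drive["bay"])
    return drive
```

       A drive whose vendor or model string contains spaces would not match
       this pattern, so treat a None result as "unrecognized" rather than
       "healthy".
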
       Additionally, failure conditions of disk enclosure fans, power
       supplies, and temperature are reported as follows:

            "Fan failed"
            "Temperature problem"
            "Door alert"
            "Power Supply Unit failed"

FILES

       /dev/cciss/c*d0 (Smart Array PCI controllers using the cciss driver)
       /dev/sg* (Fibre attached MSA1000 controllers and Smart Array
       controllers using the hpsa driver or hpahcisr software RAID driver)

EXIT CODES

       0 - All configured logical drives queried have status of "OK."

       1 - One or more configured logical drives queried have status other
       than "OK."
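
       A monitoring script might act on the exit status along these lines.
       This is a sketch under assumptions: the helper names are hypothetical,
       and the program is assumed to be on the PATH as cciss_vol_status.  Note
       that the EXAMPLE section shows an exit status of 1 for a -s run that
       reported a predictive failure while the volume status was OK, so a
       nonzero status may not always mean a logical drive itself is not OK.

```python
import glob
import subprocess

def expand_devices(patterns):
    """Expand wildcards such as /dev/cciss/c*d0; no shell is involved here."""
    devices = []
    for pattern in patterns:
        devices.extend(sorted(glob.glob(pattern)))
    return devices

def check_volumes(devices, run=subprocess.run, binary="cciss_vol_status"):
    """Run the program on the given device nodes.

    Returns (all_ok, output).  all_ok is True only for exit status 0; any
    other status is treated as needing attention.  The `run` parameter
    exists so the call can be replaced in tests.
    """
    result = run([binary, *devices], capture_output=True, text=True)
    return result.returncode == 0, result.stdout
```

       Typical use would be check_volumes(expand_devices(["/dev/sg*",
       "/dev/cciss/c*d0"])), printing the returned output when all_ok is
       False.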

AUTHOR

       Written by Stephen M. Cameron

REPORTING BUGS

       MSA500 G1 logical drive numbers may not be reported correctly.

       I've seen enclosure serial numbers contain garbage.

       Report bugs to <steve.cameron@hp.com>

COPYRIGHT

       Copyright © 2007 Hewlett-Packard Development Company, L.P.
       This is free software; see the source for copying conditions.  There is
       NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
       PURPOSE.

note 1

       The /dev/cciss/c*d0 device nodes of the cciss driver do double duty.
       They serve as an access point to both the RAID controllers and the
       first logical drive of each RAID controller.  Notice that a
       /dev/cciss/c*d0 node will be present for each controller even if no
       logical drives are configured on that controller.  It might be cleaner
       if the driver had a special device node just for the controller,
       instead of making these device nodes do double duty.  It has been like
       that since the 2.2 Linux kernel timeframe.  At that time, device major
       and minor nodes were statically allocated at compile time, and were in
       short supply.  Changing this behavior at this point would break lots of
       userland programs.

cciss_vol_status (ccissutils)      Nov 2009                CCISS_VOL_STATUS(8)