
Version 24 (modified by Adam Cécile, 11 years ago)


HP/Compaq SmartArray series


1. Vendor information

These HP/Compaq cards are usually well supported on Linux.
ProLiant servers usually (always?) ship with this kind of controller.
HP supports Linux and provides an open-source kernel driver that has been part of the Linux kernel for ages.


2. Linux kernel drivers

A single driver handles all cards:

Driver  Supported cards
cciss   All SmartArray

You should not expect any problems with this driver, which is known to be mature and stable.
We don't know of any current Linux distribution that lacks this driver, so no additional step should be required to get it working.

Some lspci -nn output examples:

  • 02:04.0 RAID bus controller [0104]: Compaq Computer Corporation Smart Array 64xx [0e11:0046] (rev 01)
  • 0f:00.0 RAID bus controller [0104]: Hewlett-Packard Company Smart Array Controller [103c:3230] (rev 03)
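To pick such controllers out of a longer listing, a small filter can be enough. This is only a sketch: the function name smartarray_ids is made up for this page, and it assumes the description contains the words "Smart Array" as in the two examples above.

```shell
# Hypothetical helper: read `lspci -nn` output on stdin and print the PCI
# address and the [vendor:device] ID of every line mentioning "Smart Array".
smartarray_ids() {
    sed -n 's/^\([0-9a-f:.]*\) .*Smart Array.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)].*$/\1 \2/p'
}

# Demo on the two lines quoted above; on a real server use:
#   lspci -nn | smartarray_ids
smartarray_ids <<'EOF'
02:04.0 RAID bus controller [0104]: Compaq Computer Corporation Smart Array 64xx [0e11:0046] (rev 01)
0f:00.0 RAID bus controller [0104]: Hewlett-Packard Company Smart Array Controller [103c:3230] (rev 03)
EOF
```

A controller whose description is worded differently would be missed, so treat an empty result as "not detected by this filter" rather than "no controller".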

2.1. Device nodes note

This driver doesn't use the regular SCSI stack, so don't expect to find your logical drives as /dev/sdX.
Here is an example:

server:~# ls -lah /dev/cciss/
total 0
drwxr-xr-x  2 root root     160 2008-08-28 14:36 .
drwxr-xr-x 14 root root    3,1K 2008-08-28 14:36 ..
brw-rw----  1 root disk 104,  0 2008-08-28 14:36 c0d0
brw-rw----  1 root disk 104,  1 2008-08-28 14:36 c0d0p1
brw-rw----  1 root disk 104,  2 2008-08-28 14:36 c0d0p2
brw-rw----  1 root disk 104,  3 2008-08-28 14:36 c0d0p3
brw-rw----  1 root disk 104, 16 2008-08-28 14:36 c0d1
brw-rw----  1 root disk 104, 17 2008-08-28 14:36 c0d1p1

/dev/cciss/c0d0 means logical drive 0 on controller 0. /dev/cciss/c0d0p1 means the first partition of logical drive 0 on controller 0.
For example, what would be /dev/sda with a regular SCSI driver is /dev/cciss/c0d0 here, and /dev/sdb1 is /dev/cciss/c0d1p1.
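The naming scheme can be decoded mechanically. Here is a minimal sketch, assuming names always look like cXdY or cXdYpZ. The function name cciss_describe is made up, and the minor number formula (16 x drive + partition) is only read off the listing above (c0d1 is minor 16, c0d1p1 is minor 17), so take it as an assumption.

```shell
# Hypothetical helper: decode a cciss device node name such as c0d1p1.
cciss_describe() {
    name=${1##*/}                 # drop any directory part, e.g. /dev/cciss/
    case $name in
        c[0-9]*d[0-9]*) ;;
        *) echo "not a cciss device name: $1" >&2; return 1 ;;
    esac
    rest=${name#c}
    ctrl=${rest%%d*}              # digits between "c" and "d"
    rest=${rest#*d}
    drive=${rest%%p*}             # digits after "d", up to an optional "p"
    case $rest in
        *p*) part=${rest#*p}
             # Assumption read off the listing: minor = 16 * drive + partition
             echo "controller $ctrl, logical drive $drive, partition $part, minor $((drive * 16 + part))" ;;
        *)   echo "controller $ctrl, logical drive $drive, whole drive, minor $((drive * 16))" ;;
    esac
}

cciss_describe /dev/cciss/c0d0      # controller 0, logical drive 0, whole drive, minor 0
cciss_describe /dev/cciss/c0d1p1    # controller 0, logical drive 1, partition 1, minor 17
```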


3. Management and reporting tools

Both open-source and proprietary tools currently exist.
Both are published by HP/Compaq, but the open-source one (cciss-vol-status) only covers monitoring.
The proprietary one (hpacucli) adds management features.

3.1. cciss-vol-status

3.1.1. Quickstart guide

There aren't many options for cciss_vol_status, so let's have a look at an example first:

vmware:~# cciss_vol_status /dev/cciss/c*d0
/dev/cciss/c0d0: (Smart Array 6i) RAID 1 Volume 0 status: OK.
/dev/cciss/c0d0: (Smart Array 6i) RAID 1 Volume 1 status: OK.

This command prints the status of all arrays. You need to query every controller (if you have more than one), but querying only the first logical drive of each one is enough (otherwise you will get duplicated lines).
That's why you should really use /dev/cciss/c*d0.

Please have a look at the manpage to read more about the possible statuses.
For example:

"OK." (0) - The logical drive is in good working order.

"FAILED." (1) - The logical drive has failed, and no i/o to it is possible.

"Using interim recovery mode." (3) - One or more drives has failed,
       but not so many that the logical drive can no longer operate.  The failed drives should be replaced as soon as possible.

"Ready for recovery operation." (4) -  Failed drive(s) have been
       replaced, and the controller is about to begin rebuilding redundant parity data.

"Currently recovering." (5) - Failed drive(s) have been replaced,
       and the controller is currently rebuilding redundant parity information.
[...]
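If you want to check the output from your own script, a filter on the status text may be enough. This is a rough sketch, not part of cciss_vol_status: the function name cciss_all_ok is made up, and it assumes every volume line contains "status:" and that a healthy one ends in "status: OK." as in the example above.

```shell
# Hypothetical helper: read cciss_vol_status output on stdin, print every
# volume line whose status is not "OK.", and fail if there is at least one.
cciss_all_ok() {
    awk '/status:/ && !/status: OK\.$/ { print; bad = 1 }
         END { exit bad + 0 }'
}

# Demo: the second line is made up from a status string in the manpage
# excerpt above; on a real server pipe `cciss_vol_status /dev/cciss/c*d0` in.
if cciss_all_ok <<'EOF'
/dev/cciss/c0d0: (Smart Array 6i) RAID 1 Volume 0 status: OK.
/dev/cciss/c0d0: (Smart Array 6i) RAID 1 Volume 1 status: Using interim recovery mode.
EOF
then
    echo "all volumes OK"
else
    echo "at least one volume needs attention"
fi
```

The demo prints the degraded volume line followed by "at least one volume needs attention".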

You may have to add /dev/sgX as a command parameter to query an external Fibre Channel disk bay.
However, I don't have such hardware to try it on, so you should refer to the manpage if you need this feature. Please open a ticket and add some example output if you have such hardware, so I'll be able to update this page (thanks ;))

3.1.2. My opinion about cciss-vol-status

Although this tool doesn't provide any high-end features and can't show anything about physical disks, it's nice for two reasons:

  • According to the manpage, it can report many kinds of issues and is much more powerful than a tool that just says good or bad
  • It looks safe to rely on, as it's published by HP

3.1.3. Periodic checks

The package from our repository comes with integration that periodically runs the script through an initscript.
It reports failures by mail and syslog. It also handles unexpected output changes and sends reminders until the status is fine again. You don't have to do anything to get it working; just install the cciss-vol-status package.

If you have more than one controller, you will have to create /etc/default/cciss-vol-statusd and fill it with:

ID=/dev/cciss/c*d0

3.2. hpacucli

This tool is a proprietary one created by HP. It can do both reporting and management.

3.2.1. Quickstart guide

List all controllers:

server:~# hpacucli controller all show

Smart Array 6i in Slot 0      ()

List arrays on controller in slot 0:

server:~# hpacucli ctrl slot=0 logicaldrive all show status

logicaldrive 1 (33.9 GB, RAID RAID 1+0):  OK
logicaldrive 2 (136.7 GB, RAID RAID 1+0):  OK

I don't know why it reports RAID 1+0. These are regular RAID 1 arrays.

List physical drives on controller in slot 0:

server:~# hpacucli ctrl slot=0 pd all show status

physicaldrive 1:0 (port 1:id 0, 36.4 GB): OK
physicaldrive 1:1 (port 1:id 1, 36.4 GB): OK
physicaldrive 1:2 (port 1:id 2, 146.8 GB): OK
physicaldrive 1:3 (port 1:id 3, 146.8 GB): OK

Summarized status:

server:~# hpacucli ctrl slot=0 show config

Smart Array 6i in Slot 0      ()

   array A (Parallel SCSI, Unused Space: 0 MB)

      logicaldrive 1 (33.9 GB, RAID 1+0, OK)

      physicaldrive 2:0   (port 2:id 0 , Parallel SCSI, 36.4 GB, OK)
      physicaldrive 2:1   (port 2:id 1 , Parallel SCSI, 36.4 GB, OK)

   array B (Parallel SCSI, Unused Space: 0 MB)

      logicaldrive 2 (136.7 GB, RAID 1+0, OK)

      physicaldrive 2:2   (port 2:id 2 , Parallel SCSI, 146.8 GB, OK)
      physicaldrive 2:3   (port 2:id 3 , Parallel SCSI, 146.8 GB, OK)
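The indented layout is easy to read but awkward to grep. A possible way to flatten it is sketched below. The function name hp_config_flat is made up, and the parsing assumes the status is always the last comma-separated item before the closing parenthesis, as in the output above.

```shell
# Hypothetical helper: flatten `hpacucli ctrl slot=N show config` output
# (read on stdin) into "ARRAY TYPE ID STATUS" lines.
hp_config_flat() {
    awk '
        $1 == "array" { array = $2; next }
        $1 == "logicaldrive" || $1 == "physicaldrive" {
            status = $0
            sub(/^.*, /, "", status)      # keep the text after the last comma
            sub(/\)[ \t]*$/, "", status)  # drop the closing parenthesis
            print array, $1, $2, status
        }
    '
}

# Demo on array A from the output above.
hp_config_flat <<'EOF'
   array A (Parallel SCSI, Unused Space: 0 MB)

      logicaldrive 1 (33.9 GB, RAID 1+0, OK)

      physicaldrive 2:0   (port 2:id 0 , Parallel SCSI, 36.4 GB, OK)
      physicaldrive 2:1   (port 2:id 1 , Parallel SCSI, 36.4 GB, OK)
EOF
```

The demo prints "A logicaldrive 1 OK", "A physicaldrive 2:0 OK" and "A physicaldrive 2:1 OK". On a real server, something like `hpacucli ctrl slot=0 show config | hp_config_flat | grep -v ' OK$'` should then list only the drives that are not OK.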



Controller policies (write cache, disk cache, battery), only interesting lines kept:

root@server:~# hpacucli ctrl slot=0 show 

Smart Array P420i in Slot 0 (Embedded)
   Serial Number: *SERIAL*
   Controller Status: OK
   Firmware Version: 3.54
   Cache Board Present: True
   Cache Status: OK
   Cache Ratio: 25% Read / 75% Write
   Drive Write Cache: Disabled
   Total Cache Size: 512 MB
   No-Battery Write Cache: Disabled
   Cache Backup Power Source: Capacitors
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK

The cache is OK, and so is the battery. The write cache is disabled when no battery backup is available (No-Battery Write Cache: Disabled), which is fine.
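The same reading can be automated. This is a sketch only: hp_field and hp_ctrl_health are made-up names, and the three labels checked are simply the ones visible in the output above, so a controller that prints different labels (or has no cache board) will show up as "not reported".

```shell
# Hypothetical helpers for `hpacucli ctrl slot=N show` output (read on stdin).

# Print the value of the first "Label: value" line whose label equals $1.
hp_field() {
    awk -F': *' -v key="$1" '
        { label = $1; sub(/^[ \t]+/, "", label) }
        label == key && !seen { print $2; seen = 1 }
    '
}

# Print every health field that is not "OK"; fail if there is at least one.
hp_ctrl_health() {
    report=$(cat)
    rc=0
    for label in "Controller Status" "Cache Status" "Battery/Capacitor Status"; do
        value=$(printf '%s\n' "$report" | hp_field "$label")
        if [ "$value" != "OK" ]; then
            echo "$label: ${value:-not reported}"
            rc=1
        fi
    done
    return $rc
}

# Demo on a few lines from the output above.
hp_ctrl_health <<'EOF' && echo "controller, cache and battery are OK"
Smart Array P420i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK
   Cache Ratio: 25% Read / 75% Write
   Battery/Capacitor Status: OK
EOF
```

hp_field can also be used on its own, for instance `hpacucli ctrl slot=0 show | hp_field "Cache Ratio"`.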

Check and enable cache for all arrays:

Check current state:

root@server:~# hpacucli ctrl slot=0 ld all show detail

Smart Array P420i in Slot 0 (Embedded)

   array A

      Logical Drive: 1
         Size: 136.7 GB
         Fault Tolerance: RAID 1
         Status: OK
         Caching:  Disabled
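With several logical drives the detail output gets long, so here is a small sketch that keeps only the caching state of each drive. The function name hp_ld_caching is made up, and it assumes the "Logical Drive:" and "Caching:" labels appear exactly as in the output above.

```shell
# Hypothetical helper: read `hpacucli ctrl slot=N ld all show detail` output
# on stdin and print one "logicaldrive N: <caching state>" line per drive.
hp_ld_caching() {
    awk -F': *' '
        { label = $1; sub(/^[ \t]+/, "", label) }
        label == "Logical Drive" { ld = $2 }
        label == "Caching"       { print "logicaldrive " ld ": " $2 }
    '
}

# Demo on the output above.
hp_ld_caching <<'EOF'
   array A

      Logical Drive: 1
         Size: 136.7 GB
         Fault Tolerance: RAID 1
         Status: OK
         Caching:  Disabled
EOF
```

The demo prints "logicaldrive 1: Disabled".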

Enable caching:

root@server:~# hpacucli ctrl slot=0 ld all modify arrayaccelerator=enable

Enable disks' write cache:

root@server:~# hpacucli ctrl slot=0 modify dwc=enable

Warning: Without the proper safety precautions, use of write cache on physical 
         drives could cause data loss in the event of power failure.  To ensure
         data is properly protected, use redundant power supplies and
         Uninterruptible Power Supplies. Also, if you have multiple storage
         enclosures, all data should be mirrored across them. Use of this
         feature is not recommended unless these precautions are followed.
         Continue? (y/n) y

The warning is self-explanatory, I guess. The disks' caches aren't protected by the controller's battery. It's up to you, but I wouldn't enable such a feature if your power supply isn't protected.

Modify cache ratio between read and write:

root@server:~# hpacucli ctrl slot=0 modify cacheratio=50/50
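If you script this, it may be worth checking the two numbers first. The sketch below only prints the command instead of running it. The function name cacheratio_cmd is made up, and two things are assumptions on my side: that the first number is the read share (as in "25% Read / 75% Write" above) and that both shares must add up to 100.

```shell
# Hypothetical helper: validate a read/write cache split and print (not run)
# the matching hpacucli command. Usage: cacheratio_cmd SLOT READ WRITE
cacheratio_cmd() {
    slot=$1
    rd=$2
    wr=$3
    for n in "$rd" "$wr"; do
        case $n in
            ''|*[!0-9]*) echo "read and write must be whole numbers" >&2; return 1 ;;
        esac
    done
    # Assumption: the two percentages have to add up to 100.
    if [ $((rd + wr)) -ne 100 ]; then
        echo "read + write must equal 100 (got $((rd + wr)))" >&2
        return 1
    fi
    echo "hpacucli ctrl slot=$slot modify cacheratio=$rd/$wr"
}

cacheratio_cmd 0 25 75    # hpacucli ctrl slot=0 modify cacheratio=25/75
```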
