Intel Software RAID Driver (iswraid)
====================================
Overview
The Intel Software RAID driver works in conjunction with the Intel RAID Option
ROM, which is distributed with most (but not all) ICH5R/ICH6R/ICH7R chipsets.
It understands the Intel RAID metadata and allows booting from RAID volumes,
regardless of their RAID level. It is useful when the same RAID volumes must
also remain usable by other operating systems.
License, Copyright, Authors
Copyright (C) 2003,2004,2005 Intel Corporation. All rights reserved.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any later
version.
You should have received a copy of the GNU General Public License (for
example /usr/src/linux/COPYING); if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Authors:
Boji Tony Kannanthanam < boji dot t dot kannanthanam at intel dot com >,
Martins Krikis < martins dot krikis at intel dot com >.
Features
This driver is an ataraid subdriver, although it uses only a very minimal set
of the facilities that ataraid provides. Several features currently
distinguish iswraid from other ataraid subdrivers:
* it scans the disks of the Linux SCSI subsystem, rather than IDE disks, for
Intel RAID metadata;
* it notices and reports I/O errors for its RAID volumes;
* it updates the Intel RAID metadata when necessary upon errors, thus
causing volumes to become degraded or even failed, and this status
is persistent across reboots and operating system changes;
* it provides a user interface via the /proc filesystem that allows the
inspection of the status of its RAID arrays, disks and volumes;
* it has several module load-time parameters that influence its behavior
and configurable compile-time defaults for these parameters;
* when necessary to split an I/O request, it does so on natural strip
boundaries;
* it uses slab caches for efficiency;
* it cleans up its own entries in devfs and /proc/partitions on exit;
* it "claims" the disks with Intel RAID metadata for itself, by making
them less convenient to use directly;
* it generally does a lot of things its own way, thus avoiding any existing
problems specific to ataraid subdrivers (and possibly introducing its own).
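Splitting an I/O request on natural strip boundaries, as mentioned in the list above, can be illustrated with a small user-space C sketch. This is not the driver's code. The function name split_on_strips() is hypothetical, and the sketch assumes a strip is measured in 512-byte sectors (the "blocksperstrip" value reported in /proc/iswraid/volumes).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split the request [lba, lba + count) into pieces
 * that never cross a strip boundary. strip_sectors is the strip size in
 * sectors. Up to max_pieces piece lengths are stored in lens[]; the return
 * value is the total number of pieces the request needs. */
size_t split_on_strips(uint64_t lba, uint64_t count, uint32_t strip_sectors,
                       uint64_t *lens, size_t max_pieces)
{
    size_t n = 0;

    while (count > 0) {
        /* Sectors remaining before the next strip boundary. */
        uint64_t room = strip_sectors - (lba % strip_sectors);
        uint64_t len = count < room ? count : room;

        if (n < max_pieces)
            lens[n] = len;
        n++;
        lba += len;
        count -= len;
    }
    return n;
}
```

For example, a 12-sector request starting at sector 6 with 8-sector strips becomes three pieces of 2, 8 and 2 sectors. Each piece then falls entirely within one strip.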
While they may or may not be distinguishing features, iswraid also:
* supports RAID0 (striping) over n-disk volumes;
* supports RAID1E (mirroring with striping) over n-disk volumes---this is
equivalent to RAID1 for 2-disk volumes and to RAID10 for 4-disk volumes;
* supports multiple volumes per array ("Matrix RAID");
* deals with missing disks in a reasonable manner;
* can operate with volumes in degraded mode (unless instructed not to);
* implements disk error thresholds;
* tries to satisfy failed RAID1E reads using each failed disk's mirror.
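For RAID0, any request that lies within a single strip maps to exactly one disk. The sketch below assumes the conventional round-robin striping layout, in which strip k of the volume lives on disk k mod n, and does not consult the Intel metadata format. The names raid0_map() and struct raid0_loc are hypothetical.

```c
#include <stdint.h>

/* Location of one volume sector on a RAID0 member disk. */
struct raid0_loc {
    uint32_t disk;  /* position in the volume's disk list */
    uint64_t pba;   /* sector number on that disk */
};

/* Map a volume LBA to (disk, physical sector), assuming strips are dealt
 * round-robin across num_disks disks and the volume starts at sector
 * pba_of_lba0 on every member disk. */
struct raid0_loc raid0_map(uint64_t lba, uint32_t strip_sectors,
                           uint32_t num_disks, uint64_t pba_of_lba0)
{
    uint64_t strip = lba / strip_sectors;   /* strip number within the volume */
    uint64_t offset = lba % strip_sectors;  /* sector offset inside the strip */
    struct raid0_loc loc;

    loc.disk = (uint32_t)(strip % num_disks);
    loc.pba = pba_of_lba0 + (strip / num_disks) * strip_sectors + offset;
    return loc;
}
```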
Requirements and Installation
Intel RAID metadata is generally created using the Intel RAID OROM. Most
mainboards based on Intel chipsets with ICH5R/ICH6R/ICH7R southbridges
have this OROM. "RAID" mode needs to be selected in the BIOS configuration
to enable the RAID OROM. The ICH5R/ICH6R/ICH7R are Serial ATA controllers
and iswraid depends on the ata_piix or ahci driver, either of which can
present the SATA disks as SCSI devices to the Linux kernel. Thus, the basic
requirements for using this driver are:
* Intel RAID OROM (or Intel RAID metadata already created on disks);
* ata_piix or ahci driver (or any other driver that can present disks
with Intel RAID metadata as SCSI devices);
* ataraid (comes standard with 2.4 kernels).
For older 2.4 series kernels whose source did not come with libata, please
install libata before installing iswraid.
The iswraid driver should compile cleanly for all 2.4 series kernels, but it
has been tested more extensively with 2.4.22 and later kernels, which also
provide the BH_Sync buffer_head flag that this driver prefers to use.
In your kernel configuration file you should have "Support for IDE RAID
controllers" (CONFIG_BLK_DEV_ATARAID) and "Support for Intel software RAID"
(CONFIG_BLK_DEV_ATARAID_ISW) enabled (as modules or statically linked,
it does not matter). You should also enable the driver that will present
the disks with Intel RAID metadata as SCSI disks. Normally this means
enabling "Serial ATA (SATA) support" (CONFIG_SCSI_SATA) and either "Intel
PIIX/ICH SATA support" (CONFIG_SCSI_ATA_PIIX) or "AHCI SATA support"
(CONFIG_SCSI_SATA_AHCI). Obviously, SCSI support and SCSI disk
support are also necessary.
Note that the iswraid driver is built as part of the Linux SCSI subsystem
rather than with the IDE modules because, when statically linked, it needs
to be initialized after the SCSI subsystem. When loading it as a module,
load the SCSI low-level driver (typically ata_piix or ahci) first.
Please pay special attention to whether all the necessary disks are
visible to the lower-level driver. Loading iswraid when not all disks are
available to it can have unwanted consequences. See the description of the
iswraid_halt_degraded module parameter below for how to use it as an
additional safety measure in this situation.
If all the SCSI drivers are built as modules and the module dependencies are
current (run "depmod -a"), the low-level driver can be loaded on demand when
iswraid is loaded. To enable this, add a line like
    alias scsi_hostadapter ata_piix
to your /etc/modules.conf file, or to any of the files used to generate it
(such as /etc/modutils/* or /etc/modprobe.d/*, depending on your
distribution and how recent its modutils package is). Only do this once you
have made sure that the lower-level driver can access all the necessary
devices.
When the iswraid driver runs, it scans the Linux SCSI subsystem and makes
the Intel RAID volumes available as ataraid devices. Their device nodes are
typically called /dev/ataraid/d0, /dev/ataraid/d1, etc. The individual
partitions on volume dX (where X is 0, 1, ...) are typically named
/dev/ataraid/dXpY (where Y is 1, 2, ...). These details may be
distribution-specific, and the nodes can be created by hand if necessary:
ataraid's major number is 114, and minor numbers 16 * X through 16 * X + 15
(where X = 0, 1, ...) belong to the same volume. Minor number 16 * X is for
the whole of volume X, and minor number 16 * X + Y (where Y > 0) is for
partition Y of volume X. For example:
    mkdir /dev/ataraid
    mknod /dev/ataraid/d2 b 114 32
    mknod /dev/ataraid/d2p8 b 114 40
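The minor-number rule above is simple arithmetic. This C sketch uses hypothetical helper names to encode and decode ataraid minor numbers, and it reproduces the values used in the mknod example.

```c
/* ataraid's block major number and the 16 minors reserved per volume,
 * as stated in the text above. */
#define ATARAID_MAJOR 114
#define ATARAID_MINORS_PER_VOLUME 16

/* Minor number for partition `partition` of volume `volume`; partition 0
 * means the whole volume. Returns -1 if the partition does not fit into
 * the volume's block of 16 minors. */
int ataraid_minor(int volume, int partition)
{
    if (volume < 0 || partition < 0 ||
        partition >= ATARAID_MINORS_PER_VOLUME)
        return -1;
    return ATARAID_MINORS_PER_VOLUME * volume + partition;
}

/* Inverse mapping: which volume and partition a minor number refers to. */
int ataraid_volume_of(int minor)
{
    return minor / ATARAID_MINORS_PER_VOLUME;
}

int ataraid_partition_of(int minor)
{
    return minor % ATARAID_MINORS_PER_VOLUME;
}
```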
When modifying the LILO configuration file to boot from RAID volumes, it
may help to use lines like:
    disk=/dev/sda
        inaccessible
to tell the map installer not to access the disks directly. It may also
sometimes be necessary to specify how the BIOS will see the disks, e.g.:
    disk=/dev/ataraid/d0
        bios=0x80
    disk=/dev/hda
        bios=0x81
Module Parameters
Iswraid recognizes a few module load-time parameters, explained below.
* iswraid_dont_claim:
Normally set to CONFIG_ISWRAID_DONT_CLAIM which is 0 unless defined
otherwise, i.e., normally not enabled. Iswraid claims for RAID all
the disks containing valid Intel RAID metadata. "Claiming" in this
case means invalidating the existing buffers for these disks, deleting
any mention of these disks from devfs and deleting disk partition entries
from /proc/partitions. These operations are not undone on iswraid unload.
The entries for disks themselves remain and the disks are still usable via
static device nodes such as /dev/sda. (Making them truly unusable interferes
with iswraid's own operation.) When this option is enabled, iswraid does not
claim any disks for RAID.
* iswraid_halt_degraded:
Normally set to CONFIG_ISWRAID_HALT_DEGRADED, which is 0 unless defined
otherwise, i.e., normally not enabled. When enabled, this option causes
iswraid to stop using degraded RAID1E volumes (including plain RAID1 and
4-disk RAID10 volumes). Instead, it fails all I/O requests for such
volumes. This parameter also has a useful side effect on the RAID metadata
updates done at startup, which is described in detail later in this
document.
* iswraid_resist_failing:
Normally set to CONFIG_ISWRAID_RESIST_FAILING, which is 0 unless defined
otherwise, i.e., normally not enabled. When a RAID1E volume (including plain
RAID1 and 4-disk RAID10) is already degraded, a failed write or exceeding
the disk error threshold can cause it to become failed. This is the default
and generally expected behavior (except for some lucky many-disk RAID1E
cases where several disks can fail safely without losing the ability to
restore data). When this parameter is set, however, iswraid tries not to
mark the disk, or the RAID1E volumes containing it, as failed. Instead, it
merely fails the I/O that exposed the disk problem. Some people may prefer
this behavior because it always makes clear which disk (or set of disks)
has the more up-to-date data and thus should be used to recover the failed
disk(s). Note, however, that the state of other volumes containing the
failing disk may dictate that the disk really must be marked as failed, and
that the states of all volumes containing it be adjusted accordingly. This
can overrule the intent of this option, so RAID1E volumes can still become
failed despite this option being enabled.
* iswraid_error_threshold:
Set to CONFIG_ISWRAID_ERROR_THRESHOLD, which is 10 by default. Iswraid
counts the errors on each disk and if they exceed this threshold, it marks
the disk as failed. This could cause the volumes containing the disk to
become degraded or failed (depending on RAID levels and other module load
parameters). Setting this value to 0 disables checking the error counts on
disks. The error counts are not persistent.
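The following is a user-space sketch of how the compile-time default and the error threshold described above might fit together. It is an illustration only. struct disk_errs and note_disk_error() are hypothetical, and "exceed" is read literally, so the disk fails on error number threshold + 1. A real 2.4 module would also register the variable as a load-time parameter (for example with the MODULE_PARM() macro). That step is omitted here so the code compiles outside the kernel.

```c
#include <stdbool.h>

/* Compile-time default (10), overridable at build time with
 * -DCONFIG_ISWRAID_ERROR_THRESHOLD=N, as the text above describes. */
#ifndef CONFIG_ISWRAID_ERROR_THRESHOLD
#define CONFIG_ISWRAID_ERROR_THRESHOLD 10
#endif

static int iswraid_error_threshold = CONFIG_ISWRAID_ERROR_THRESHOLD;

/* Hypothetical per-disk bookkeeping. The counts are not persistent, so
 * they start at zero each time the driver is loaded. */
struct disk_errs {
    unsigned int errorcount;
    bool failed;
};

/* Record one I/O error on a disk. Returns true only when this error
 * pushes a not-yet-failed disk over the threshold, meaning the disk
 * should now be marked failed. A threshold of 0 disables the check. */
bool note_disk_error(struct disk_errs *d)
{
    d->errorcount++;
    if (iswraid_error_threshold == 0 || d->failed)
        return false;
    if (d->errorcount > (unsigned int)iswraid_error_threshold) {
        d->failed = true;
        return true;
    }
    return false;
}
```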
Proc Filesystem
The iswraid driver can output information about the state of Intel RAID
arrays, disks and volumes through the /proc filesystem. Each /proc file
generated by iswraid has a header line starting with '#' and containing
space-separated field names. The following lines each correspond to
one object (array, disk or volume) being listed and their fields are
tab-separated. Each data line is also associated with an implicit index
(starting at 0), and the objects cross-reference each other using these
indices.
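Because the indices are implicit, a program reading these files has to count the data lines itself. Here is a minimal C sketch of that step. It assumes the whole file has already been read into a NUL-terminated string, and the names for_each_record() and record_cb are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Receives a record's implicit index and its line; the line is not
 * NUL-terminated, so use len. */
typedef void (*record_cb)(int index, const char *line, size_t len, void *ctx);

/* Walk the text of one /proc/iswraid file, skipping the '#' header line
 * and any blank lines. Each data line is passed to cb (if non-NULL) with
 * its implicit index, counting from 0. Returns the number of records. */
int for_each_record(const char *text, record_cb cb, void *ctx)
{
    int index = 0;

    while (*text != '\0') {
        const char *nl = strchr(text, '\n');
        size_t len = nl ? (size_t)(nl - text) : strlen(text);

        if (len > 0 && text[0] != '#') {
            if (cb)
                cb(index, text, len, ctx);
            index++;
        }
        if (!nl)
            break;
        text = nl + 1;
    }
    return index;
}
```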
To query the iswraid arrays, run "cat /proc/iswraid/arrays". Here is some
sample output:
    # family generation numdisks numvolumes disks volumes
    3e37c9ab 78 2 2 0,2 0,1
    3a57e490 74 2 2 1,3 2,3
The first field is the "array family number", which basically distinguishes
each array from any other. The second field is the "array generation number"
that shows how many times this array's metadata have been written out to its
disks. The next fields give the number of disks and volumes in the array,
respectively. The final two fields give comma-separated listings of
disks and volumes that this array contains. The disks and volumes
are given by their implicit indices in the disk and volume listings.
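A user-space tool might parse one data line of this file as shown below. This is a sketch, not a tool shipped with iswraid. The struct layout, the MAX_IDX cap and the function names are assumptions, and the family number is read as hexadecimal because that is how it appears in the sample.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_IDX 32  /* arbitrary cap on list length for this sketch */

/* One parsed line of /proc/iswraid/arrays. */
struct isw_array {
    unsigned int family;       /* "array family number" (hex) */
    unsigned int generation;   /* times the metadata was written out */
    unsigned int numdisks, numvolumes;
    int disks[MAX_IDX];        /* implicit indices into the disk listing */
    int ndisk_idx;
    int volumes[MAX_IDX];      /* implicit indices into the volume listing */
    int nvol_idx;
};

/* Parse a comma-separated index list such as "0,2".
 * Returns the number of indices, or -1 if the list is malformed. */
static int parse_index_list(const char *s, int *out, int max)
{
    int n = 0;

    while (*s != '\0') {
        char *end;
        long v = strtol(s, &end, 10);

        if (end == s || n == max)
            return -1;
        out[n++] = (int)v;
        if (*end == ',')
            end++;
        else if (*end != '\0')
            return -1;
        s = end;
    }
    return n;
}

/* Parse one tab-separated data line; returns 0 on success, -1 otherwise. */
int parse_array_line(const char *line, struct isw_array *a)
{
    char disks[128], vols[128];

    if (sscanf(line, "%x %u %u %u %127s %127s", &a->family, &a->generation,
               &a->numdisks, &a->numvolumes, disks, vols) != 6)
        return -1;
    a->ndisk_idx = parse_index_list(disks, a->disks, MAX_IDX);
    a->nvol_idx = parse_index_list(vols, a->volumes, MAX_IDX);
    return (a->ndisk_idx < 0 || a->nvol_idx < 0) ? -1 : 0;
}
```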
To query the disks, run "cat /proc/iswraid/disks". Here is some sample
output:
    # major minor status errorcount array serial
    8 0 0x13a 0 0 3JT3L0J2
    8 16 0x13a 0 1 3JT3LCX6
    8 32 0x13a 0 0 3JT3KXRX
    8 48 0x13a 0 1 3JT3FX3X
The first two fields are the major and minor numbers of the block devices
corresponding to the disks. The status field is next (the status field
has many bits, not all of which are actually used by iswraid). Each
disk's error count follows. The next field shows which array the disk
belongs to, using the implicit array indices. The last field gives each
disk's serial number (possibly altered by iswraid to strip spaces and
non-printable characters).
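Parsing a line of the disk listing works the same way. In this hypothetical sketch, sscanf()'s %x conversion accepts the 0x prefix shown in the status column. The sketch also relies on iswraid having already stripped spaces from the serial number.

```c
#include <stdio.h>

/* One parsed line of /proc/iswraid/disks. */
struct isw_disk {
    unsigned int major, minor;  /* block device numbers of the disk */
    unsigned int status;        /* bitfield; not all bits are used */
    unsigned int errorcount;    /* non-persistent error count */
    int array;                  /* implicit index into the array listing */
    char serial[64];            /* serial number, spaces already stripped */
};

/* Parse one tab-separated data line; returns 0 on success, -1 otherwise. */
int parse_disk_line(const char *line, struct isw_disk *d)
{
    if (sscanf(line, "%u %u %x %u %d %63s", &d->major, &d->minor,
               &d->status, &d->errorcount, &d->array, d->serial) != 6)
        return -1;
    return 0;
}
```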
The most useful information is likely to come from the volume listing, which
can be obtained by running "cat /proc/iswraid/volumes". Sample output looks
like this:
    # node state degradedbits refcnt raidlevel sectors blocksperstrip pbaoflba0 numdisks array disks serial
    d0 0x0 0x0 0 0 104026112 8 0 2 0 0,2 RAID_Volume0
    -- 0x1 0x0 0 1 104287744 256 52013056 2 0 0,2 RAID_Volume1
    -- 0x1 0x0 0 1 104026112 256 0 2 1 1,3 RAID_Volume2
    d1 0x0 0x0 0 0 104549888 8 104026112 2 1 1,3 RAID_Volume3
The first field gives the ataraid device name that the volume corresponds
to. (Actually, the driver does not know the name, but if ataraid device
nodes are created in the usual manner described above, the dX should be
accurate.) If the volume is in use, it will have an ataraid device
corresponding to it, and this field will show dX (where X is 0, 1, ...).
If the volume is disabled (this only happens if it is "a hopeless volume"
on iswraid startup), then it will not have a corresponding ataraid device
and this field will be "--". When a volume gets disabled, iswraid prints
the reason for this action, so you can check the kernel log.
The second field gives the volume state, which is a bitfield; ideally no
bits should be set. The third field, degradedbits, is a bitfield identifying
any disks that are degraded (and thus not in use by RAID1E volumes). The
next field, refcnt, gives the number of references to this volume (how many
times its block device has been opened). The RAID level is next, 0 or 1
(RAID10 and multi-disk RAID1E volumes are also listed as RAID level 1). The
total sector count and blocks per strip follow. The "physical block address"
of the volume's "logical block address 0" (pbaoflba0) tells where, on each
of its constituent disks, the volume begins. Next come the number of disks
the volume contains (which in theory could be less than the number of disks
in the array) and the implicit array index. The next-to-last field is a
comma-separated list of the disks that the volume contains, using the
indices that are implicit in the disk listing. Note that this order may
differ from the order in which the volume's array lists the disks. Finally,
the last field holds the "serial number" (symbolic name) of the volume.
The array, disk and volume indices are intentionally omitted from the
output to save space. Any user-space tool processing these /proc files can
easily generate the missing indices and thus cross-reference the data from
all three files.
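The volume fields can be put to use with the sketch below. It parses a volume line, recognizes disabled volumes, and estimates how many sectors the volume occupies on each member disk. The names are hypothetical, and the sketch assumes the volume serial contains no whitespace. The estimate assumes an even split: RAID0 stores each sector once across all member disks, while RAID1E stores every sector twice.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of /proc/iswraid/volumes. */
struct isw_volume {
    char node[16];                  /* "dX", or "--" if disabled */
    unsigned int state, degradedbits, refcnt, raidlevel;
    unsigned long long sectors;     /* total volume size in sectors */
    unsigned int blocksperstrip;
    unsigned long long pbaoflba0;   /* per-disk start of the volume */
    unsigned int numdisks;
    int array;                      /* implicit array index */
    char disks[64];                 /* e.g. "0,2": implicit disk indices */
    char serial[64];                /* symbolic volume name */
};

/* Parse one tab-separated data line; returns 0 on success, -1 otherwise. */
int parse_volume_line(const char *line, struct isw_volume *v)
{
    int n = sscanf(line, "%15s %x %x %u %u %llu %u %llu %u %d %63s %63s",
                   v->node, &v->state, &v->degradedbits, &v->refcnt,
                   &v->raidlevel, &v->sectors, &v->blocksperstrip,
                   &v->pbaoflba0, &v->numdisks, &v->array,
                   v->disks, v->serial);
    return n == 12 ? 0 : -1;
}

/* A disabled volume has no ataraid device, shown as "--". */
int volume_disabled(const struct isw_volume *v)
{
    return strcmp(v->node, "--") == 0;
}

/* Sectors used on each member disk, assuming an even split. */
unsigned long long per_disk_sectors(const struct isw_volume *v)
{
    if (v->numdisks == 0)
        return 0;
    if (v->raidlevel == 0)
        return v->sectors / v->numdisks;      /* RAID0: one copy */
    return 2 * v->sectors / v->numdisks;      /* RAID1E: two copies */
}
```

Applied to the sample output, Volume0 (RAID0, 104026112 sectors over 2 disks) occupies 52013056 sectors per disk starting at 0. That is exactly the pbaoflba0 of Volume1 on the same disks. Volume2 and Volume3 fit together the same way on disks 1 and 3.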
Intel RAID Metadata Updates
The iswraid driver is relatively reluctant to update the Intel RAID
metadata. The situations in which it considers updating the metadata are
explained below.
It normally does update the metadata in error cases, to mark the disks
that have failed and volumes that have changed their state. Sometimes
this can be suppressed, however, by the use of the iswraid_resist_failing
parameter and some luck. If there are no volumes that need to change their
state, the RAID metadata will be unchanged despite I/O errors.
It will also update the metadata when a formerly missing disk is found.
Unless the Intel RAID Option ROM is misbehaving, however, this should
be hard to observe. This update can only be done on module startup.
Finally, iswraid may update the RAID metadata if a disk needed by some
RAID volumes is missing. RAID0 volumes will simply be disabled in this
case (without marking them failed in the RAID metadata), but RAID1E volumes
would become degraded or failed. This update, too, can only happen during
module startup, not during its operation. Furthermore, in the typical
case of loading iswraid after OROM has updated the metadata, the disk
should already be marked as missing, so iswraid will not have to do it.
The last update scenario _could_ unfortunately come up when it really
should not: it could be caused by the lower-level driver (e.g., ata_piix)
not seeing all the disks that it should be seeing. For example, suppose 4
disks are plugged into an ICH6R-based mainboard and the OROM sees them all,
but the lower-level driver gives iswraid only 2 of them to work with. Many
volumes could then be missing disks and would require RAID metadata updates.
Performing such updates would not be helpful overall, because they would
later require lengthy array rebuild operations (to be done with the help of
the OROM and other operating systems, or by using user-space utilities such
as dd and your favorite hex editor).
In this situation, the iswraid_halt_degraded parameter described above can
serve as insurance against needless metadata updates, as follows.
If iswraid_halt_degraded is set, iswraid will realize that it cannot use
the volumes requiring the missing disks, because they are either disabled
RAID0 volumes or degraded-or-failed (and therefore unusable) RAID1E
volumes. Because it has no usable volumes to work with anyway, it will skip
updating the RAID metadata. Therefore, for safety, it is recommended to load
iswraid with iswraid_halt_degraded set to 1 the first time. This way, even
if only some disks are found, the RAID metadata on the disks will be left
unaltered.
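The missing-disk decision described above can be summarized as a simplified model. This is not the driver's code. struct vol_model and would_update_metadata() are hypothetical, and only the startup scenario in which disks are missing is modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical view of one volume at iswraid startup. */
struct vol_model {
    int raidlevel;       /* 0 = RAID0, 1 = RAID1E (incl. RAID1/RAID10) */
    bool missing_disks;  /* some member disk was not found */
};

/* Following the text: RAID0 volumes with missing disks are simply
 * disabled, without a metadata change. RAID1E volumes with missing disks
 * would be marked degraded or failed, unless iswraid_halt_degraded makes
 * every such volume unusable anyway. In that case the update is skipped. */
bool would_update_metadata(const struct vol_model *v, size_t n,
                           bool halt_degraded)
{
    if (halt_degraded)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (v[i].missing_disks && v[i].raidlevel != 0)
            return true;
    }
    return false;
}
```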