RAID Documentation

From Research
Jump to navigation Jump to search

Raid Usage

In the Early Days <c> we used Arena EX3 external SCSI-attached RAID arrays. These external units featured dual-redundant power-supplies, and held six IDE drives. Later, we found we could use SATA drives, with a low-profile SATA-to-PATA (IDE) interface, and a little bit of connector trimming. This boosted the capacity for these external SCSI devices, but only to a point - the limitations of the internal 32-bit controller seemed to cap the capacity at 2TB.

So, we phased out these Arena EX3 RAID arrays, in favour of big Chenbro rack-mounted chassis, each capable of holding 16 drives. A 3Ware controller with 16 ports addressed these drives, which are hot-pluggable in the Chenbro chassis + drive-bay-backplane.

Our Chenbro-chassis in-house-built RAID arrays are found on Spitfire and Hurricane. Here is how we use them now:

Spitfire

Controller /c1 contains two RAID arrays:

  • /u0 for /home/users
    • /c1/u0 shows up as a single partition /dev/sda1, mounted at /home/users
    • formed from 14x 500GB drives, occupying physical slots p2-p15. RAID-5 capacity is n-1, for a nominal 6.5TB (6TiB - the reported capacity)
    • uses XFS filesystem


  • /u1 for the Gentoo GNU/Linux operating system
    • /c1/u1 shows up as three partitions, following our classical layout:
      • /dev/sdb1 is mountable at /boot
      • /dev/sdb2 is swap
      • /dev/sdb3 is /, type ext3
    • formed from 2x 150GB drives, occupying physical slots p0-p1. RAID-1 (mirror) capacity is nominal 150GB (140GiB reported)
spitfire ~ # tw_cli /c1 show 

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    OK             -       -       64K     6053.47   ON     OFF    
u1    RAID-1    OK             -       -       -       139.688   ON     OFF    

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u1   139.73 GB SATA  0   -            WDC WD1500ADFD-00NL
p1    OK             u1   139.73 GB SATA  1   -            WDC WD1500ADFD-00NL
p2    OK             u0   465.76 GB SATA  2   -            WDC WD5000ABYS-01TN
p3    OK             u0   465.76 GB SATA  3   -            ST3500320NS
p4    OK             u0   465.76 GB SATA  4   -            ST3500320NS
p5    OK             u0   465.76 GB SATA  5   -            ST3500320NS
p6    OK             u0   465.76 GB SATA  6   -            ST500NM0011
p7    OK             u0   465.76 GB SATA  7   -            ST3500320NS         
p8    OK             u0   465.76 GB SATA  8   -            ST500NM0011
p9    OK             u0   465.76 GB SATA  9   -            ST3500320NS
p10   OK             u0   465.76 GB SATA  10  -            ST3500320NS
p11   OK             u0   465.76 GB SATA  11  -            ST3500320NS
p12   OK             u0   465.76 GB SATA  12  -            ST3500320NS
p13   OK             u0   465.76 GB SATA  13  -            ST500NM0011
p14   OK             u0   465.76 GB SATA  14  -            ST3500320NS
p15   OK             u0   465.76 GB SATA  15  -            ST3500320NS

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       229    06-Nov-2011


Hurricane

Controller /c1 contains two RAID arrays:

  • /u0 for projects, which includes SVN, CVS, software-deployments, Amanda-holding-disk, projects/infrastructure (containing eBooks, docs, web_content, scripts and some backups
    • /c1/u0 shows up as a single partition /dev/sda1, mounted at /mnt/raid
    • formed from 14x 500GB drives, occupying physical slots p2-p15. RAID-5 capacity is n-1, for a nominal 6.5TB (6TiB - the reported capacity)
    • uses XFS filesystem
    • Usage, and breakdown as of June 2012:
hurricane raid # cd /mnt/raid/ ; du -h --max-depth=1
15G	./svn
4.0K	./holding
3.5G	./cvs
214G	./projects
763G	./software
995G	.


  • /u1 for the Gentoo GNU/Linux operating system
    • /c1/u1 shows up as three partitions, following our classical layout:
      • /dev/sdb1 is mountable at /boot
      • /dev/sdb2 is swap
      • /dev/sdb3 is /, type ext3
    • formed from 2x 150GB drives, occupying physical slots p0-p1. RAID-1 (mirror) capacity is nominal 150GB (140GiB reported)


Musashi

/mnt/raid4
  • Gentoo GNU/Linux Mirror

RAID Maintenance

We have drive-failures periodically - reported both through Nagios and via daily logwatch-emails. These must be replaced promptly, to avoid data-loss! The Chenbro chassis supports drive-hot-swap, and it used to be that the 3Ware controller would then automatically commence a RAID-rebuild. Lately, however, the replaced drive is showing up as a new Unit, u?. Not helpful :-( but here's what to do:

spitfire ~ # tw_cli
//spitfire> /c1 rescan     will find the replaced drive, assign it to a new Unit u2
//spitfire> /c1/u2 del     do not use remove; del will keep the drive, but un-assign it
//spitfire> maint rebuild c1 u0 p15     example, to add replaced drive p15 into Unit u0 on Controller c1

Redistributing users across the Raids

To move a user from one raid to another:

  1. Make sure the user is not logged in anywhere
  2. Move the files from one raid to another in the appropriate directory for example
    • mv /mnt/raid0/home/m/mdeepwel /mnt/raid1/home/m
  3. Update LDAP with the new root location
    • In ou=AutoFS,ou=home.users,cn=username,automountInformation, and update to new location. (using phpldapadmin is fine)
  4. Restart autofs on any computer or service the user could be using, otherwise they will be able to login, but won't have a home directory.
    • If they use the cluster, you must restart autofs on each node, or restart the whole cluster.
    • Restart teleport, so they can use ssh/ftp to get their files from off site.
  5. Update amanda to make sure the new user directory is backed up.