Sunday, November 28, 2010

To RAID or not to RAID... what was the question again?

Hi everyone,
In my basement I've got a nice virtualized setup. A single VMware ESXi 4 server boots from a 4GB pen drive and then connects to its storage via iSCSI.
The connections go through an SMC gigabit switch.
The data lives on 2 QNAP NAS units: a 6-drive 639 and a 4-drive 459 Pro.
Both are fully populated with Samsung 2TB and Western Digital hard drives.
The 639 is configured with a RAID5 of 4x 2TB 7200rpm Samsung drives (32MB cache) plus 2 separate 1TB drives.
The 459 is configured with a RAID5 of 4x 2TB 7200rpm Samsung drives (32MB cache).
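As a rough check on what those arrays give you: RAID5 spends one drive's worth of space on parity, so usable capacity is (drives - 1) x drive size. The small Python sketch below (the function name is my own, purely for illustration) shows that 4x 2TB comes to about 6TB, enough for the 4.7TB of data mentioned further down. Drive makers count decimal terabytes, so the NAS will report somewhat less (6TB is roughly 5.46TiB), before any file-system overhead.

```python
def raid5_usable_tb(drive_count: int, drive_size_tb: float) -> float:
    """Usable RAID5 capacity: one drive's worth of space is consumed by parity."""
    if drive_count < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (drive_count - 1) * drive_size_tb


if __name__ == "__main__":
    # Both NAS units run a RAID5 of 4x 2TB drives.
    usable = raid5_usable_tb(4, 2.0)
    raw = 4 * 2.0
    print(f"raw {raw} TB, usable {usable} TB, parity {raw - usable} TB")
    # Decimal TB (drive labels) vs binary TiB (what many tools report).
    print(f"usable in TiB: {usable * 10**12 / 2**40:.2f}")
```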
On top of this, the 639 has:
 - 1x 320GB USB
 - 3x 1.5TB USB
 - 1x 750GB USB
 - 1x 1.5TB eSATA
 - 1x 2TB eSATA
The 459 has:
 - 4x 1TB USB
 - 1x 1.5TB eSATA
 - 1x 2TB eSATA

The event:
A power outage lasted longer than my UPS's 10-minute threshold, and then another one hit while the NAS was booting up again. The Portuguese power company is probably the worst in existence. I pay 200€/month for my house's power supply (yes, 200€/month) and I still get frequent power outages, and every time I file a complaint... it's someone else's fault but theirs.

The consequences:
The 459 held out fine: it consumes less power, so it was never shut down between outages.
The 639 came back up after 2 abrupt shutdowns with 1 failed disk. The hard drive itself was fine, but the array just didn't rebuild. I then replaced the hard drive, and the NAS declared the new one a spare, switched the array to degraded mode and made everything read-only.
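For context on why the rebuild should have been routine: RAID5 stores, for each stripe, the XOR of the data blocks as parity, so any single missing block is simply the XOR of everything that survives. The Python sketch below is a deliberate simplification (real arrays rotate parity across the drives and work in much larger chunks, and the block contents here are made up), but it is the operation the NAS has to repeat across the whole disk when a replacement drive is added. It also shows why a rebuild needs every surviving block to be readable: if a second drive drops out mid-rebuild, there is nothing left to XOR against.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks column by column (RAID5's parity operation)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(survivors: list[bytes]) -> bytes:
    """Recover the one lost block of a stripe from the surviving data + parity."""
    return xor_blocks(survivors)


if __name__ == "__main__":
    # One stripe on a 4-drive RAID5: three data blocks and one parity block.
    data = [b"VMFS", b"ISCS", b"DATA"]
    parity = xor_blocks(data)

    # Simulate losing the drive that held data[1].
    survivors = [data[0], data[2], parity]
    print(rebuild_missing(survivors))  # b'ISCS'
```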

The solution:
With the NAS stuck in degraded read-only mode, all I could do was back up my data and rebuild the RAID from scratch.

The problem:
1 - 4.7TB of data is not an easy backup.
2 - Most of it was in an iSCSI virtual volume.
3 - The data inside the virtual volume is formatted as a VMFS partition.

So how do you solve this:
1 - Buy 2x 2TB USB hard drives.
2 - Back up everything to the existing USB hard drives.
3 - Clear the RAID, test the drives independently, and replace a drive only if it actually has problems (some QNAP reports aren't fully accurate about drive problems).
4 - Re-create the RAID.
5 - Move the contents of the iSCSI VMFS drive to the now clean RAID.
6 - The most important part: set up full, constant replicas between the two devices.
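Since step 6 is the one that actually protects you, it's worth checking now and then that a replica really matches its source. The Python sketch below is not how QNAP's replication works internally; it is a hypothetical, independent check that walks a source share and its replica (say, both mounted on a workstation) and reports files that are missing or whose SHA-256 differs. The share names are borrowed from my setup, and the temporary folders in `demo()` stand in for real mount points.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large VM disks don't fill RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source: Path, replica: Path) -> dict[str, list[str]]:
    """List files missing from, or different in, the replica (relative paths)."""
    report: dict[str, list[str]] = {"missing": [], "different": []}
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = replica / rel
        if not dst.is_file():
            report["missing"].append(rel.as_posix())
        elif (src.stat().st_size != dst.stat().st_size
              or file_digest(src) != file_digest(dst)):
            report["different"].append(rel.as_posix())
    return report


def demo() -> dict[str, list[str]]:
    """Build a tiny source/replica pair in a temp dir and compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "FamilyMedia")
        dst = Path(tmp, "ReplicaOfFamilyMedia")
        src.mkdir()
        dst.mkdir()
        (src / "a.txt").write_text("same")
        (dst / "a.txt").write_text("same")
        (src / "b.txt").write_text("only on the source")
        (src / "c.txt").write_text("version 2")
        (dst / "c.txt").write_text("version 1")
        return compare_trees(src, dst)


if __name__ == "__main__":
    print(demo())  # {'missing': ['b.txt'], 'different': ['c.txt']}
```

Hashing every file is slow on terabytes of VM images, so in practice you'd run a check like this occasionally rather than after every sync.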

Not quite. The first email from QNAP support was the obvious one: back up your data and do it all over again. After a sharp reply from my side ("Is this what you think of how a RAID works? A drive fails, the new one doesn't rebuild, and it's back-up-and-do-it-all-over-again time? Should I start looking for an alternative solution and return my QNAPs?"), their reply was different. They asked for a link so they could remotely access and analyse the problem. This, however, happened 1 week after the event... and that's anything but good support. The conclusion was that I had to back up the NAS and redo everything... jeez, thanks QNAP, that's really pro!

My Setup Today:
NAS01                                             NAS02
  RAID5 (SATA2,3,4,5)                               RAID5 (SATA1,2,3,4)
      VMachinesDepot1             <------------>        ReplicaOfVmachinesDepot1
      ReplicaOfVmachinesDepot2    <------------>        VMachinesDepot2
      LocalWorkShare              <------------>        ReplicaOfLocalWorkShare
      ReplicaOfVPNWork            <------------>        VPNWork
      FamilyMedia                 <------------>        ReplicaOfFamilyMedia
  eSATA1                                            eSATA1
      TVSeries                                          Movies
  USB1                                              USB1
      BackupsFromClients                                BackupsFromServers
  USB2                                              USB2
      KidsMovies                                        MusicAndVideos
  eSATA2                                            eSATA2
      Downloads                                         Temp
  USB3                                              USB3
      Software                                          Library
  SATADisk1                                         USB4
      CompanyWork                 <------------>        ReplicaCompanyWork

So in conclusion:
 Use RAID5 for safety... but don't trust it: replicate to the other RAID.
 So in short... today's "hardware RAID" is actually software RAID running on a specific hardware appliance, with controlled software packaging and configuration. QNAP (and similar players) can sell systems that look professional, but in truth they are no different from a standard PC with a SATA card running Linux and mdadm, with a simple interface and a fancy hot-plug drawer system.
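If the appliance really is Linux md underneath, as argued above, then array health shows up the standard Linux way, in `/proc/mdstat`, where a status such as `[4/3] [UUU_]` means 4 configured members with only 3 active. The Python sketch below parses that format to flag degraded arrays. The sample text is illustrative (device names and block counts are made up, not captured from my NAS), and reading the real file assumes you have shell access to the box.

```python
import re

# Illustrative /proc/mdstat contents (made-up devices and sizes):
# md0 is a 4-member RAID5 with one member missing, md1 a healthy RAID1.
MDSTAT_SAMPLE = """\
Personalities : [raid1] [raid5]
md0 : active raid5 sdc3[2] sdb3[1] sda3[0]
      5855836800 blocks super 1.0 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

md1 : active raid1 sdf1[1] sde1[0]
      1953513424 blocks super 1.0 [2/2] [UU]

unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\d+) : (\S+) (raid\d+)", re.MULTILINE)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def degraded_arrays(mdstat: str) -> list[str]:
    """Return md arrays whose active member count is below the configured count."""
    headers = list(ARRAY_RE.finditer(mdstat))
    degraded = []
    for i, header in enumerate(headers):
        # Only look at this array's own lines, up to the next array header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(mdstat)
        status = STATUS_RE.search(mdstat, header.end(), end)
        if status and int(status.group(2)) < int(status.group(1)):
            degraded.append(header.group(1))
    return degraded


if __name__ == "__main__":
    # On a real Linux md system: Path("/proc/mdstat").read_text()
    print(degraded_arrays(MDSTAT_SAMPLE))  # ['md0']
```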
 Support is only as "pro" as the standard product is.
Not a bad product, especially price-wise, and that's the angle you should think about. It's a great solution, not because it costs 1/4 or less, but because that 1/4 price tag lets you buy twice as much as you intended and have full redundancy. So in truth it's NOT as cheap as it claims to be, but with some simple precautions it can cost 1/2 of what a pro system would.
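Spelled out, the cost argument goes like this, taking the rough 1/4 figure at face value (C_pro is just a placeholder for the price of a comparable professional system):

```latex
C_{\text{NAS}} \approx \tfrac{1}{4}\,C_{\text{pro}}
\quad\Longrightarrow\quad
C_{\text{NAS, redundant}} = 2\,C_{\text{NAS}} \approx \tfrac{1}{2}\,C_{\text{pro}}
```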