Tuesday, June 30, 2009

SVM Mirror

An SVM Mirror is good for a couple of reasons ...

1. Faster reads (Data gets pulled off either drive to fulfill a READ request)
2. Redundancy (If one drive fails, you don't lose data; make sure you replace it quickly though, right?)


There are CONS too ...

1. Slower WRITE transactions
2. Two drives yields one drive of capacity


... but like the old saying goes, "Fast, reliable, inexpensive. Pick any two!"

When setting up SVM you need to do a number of things:

1. Set up the partitions on each drive (with the "# format" command)
2. Set up the State Database Replicas (SDR)
3. Create the Mirrors and Submirrors
4. Link them
5. Sync them
6. Test


Our example here will assume this ...

- DRIVE 0 = c1d0 = 2 SDR's
- DRIVE 1 = c2d0 = 2 SDR's


Now there are two issues with this SDR setup ...

CASE 1. During operation, what happens when a drive "fails"?
CASE 2. During a reboot, what happens with "one" failed drive?


For CASE 1 ...

During operation, if either drive fails, the system will auto fail-over to the operating drive and continue normal operation. A message will likely be posted in syslog, and "SysAdmin intervention" will be required to fix the problem. Fixing can happen at your leisure, but during this time there is "NO REDUNDANCY". Search this document for "SysAdmin intervention" and you can find out what you need to do.

For CASE 2 ...

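During a reboot, with either drive in a "Failed" status, only 2 of the 4 SDRs are available. That is exactly half, not a majority, so the system will not boot into multi-user mode and "SysAdmin intervention" is required to fix things.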

NOTE: SysAdmin intervention required means to delete references to SDR's on the bad disk to make the system bootable.

(This is the best we can do with a 2 drive setup. To understand why, read "Understanding the Majority Consensus Algorithm" and "Administering State Database Replicas" in the SVM manual. The best setup is an "ODD drive SVM array". Example: 3 drives, 3 SDRs with one per drive, or 6 SDRs with 2 per disk for further redundancy.)
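To make the two cases concrete, here is a rough sketch of the quorum rules as I understand them from the SVM manual: more than half of the replicas lets the system boot normally, exactly half keeps a running system up but blocks a multi-user boot, and fewer than half causes a panic. The function name replica_state and the labels it prints are my own inventions for illustration, not SVM output.

```shell
#!/bin/sh
# Sketch of the SVM replica quorum rules (majority consensus).
# replica_state TOTAL AVAILABLE -> prints boots, runs-only, or panics.
# The function name and labels are hypothetical, for illustration only.
replica_state() {
    total=$1
    avail=$2
    if [ $((avail * 2)) -gt "$total" ]; then
        echo "boots"        # majority available: normal multi-user boot
    elif [ $((avail * 2)) -eq "$total" ]; then
        echo "runs-only"    # exactly half: keeps running, will not boot multi-user
    else
        echo "panics"       # fewer than half: system panics
    fi
}

# 2 drives x 2 replicas, one drive lost: 2 of 4 replicas remain
replica_state 4 2
# 3 drives x 1 replica, one drive lost: 2 of 3 replicas remain
replica_state 3 2
```

The first call is our 2 drive setup (CASE 1 works, CASE 2 needs you). The second call shows why an odd number of drives is nicer: losing one drive still leaves a majority.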

OVERVIEW

STEP #1 -> Repartition Drives to accommodate State Database Replica partitions

For a 320GB drive each cylinder is about 8MB. I chose 2 cylinders for each SDR. (Slices 5 and 6; size = 2 cyl = 16MB each; UFS requires minimum 10MB per partition)
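If you want to check the cylinder math on your own drive, here is a small sketch. It takes the 16065 blocks per cylinder from the slice 8 row of the partition table in this step (1 cylinder = 16065 blocks) and assumes 512-byte blocks, which is my assumption and not something the format output states. The function name cyl_to_mb is made up for this example.

```shell
#!/bin/sh
# Convert a cylinder count to MB.
# 16065 blocks per cylinder comes from this post's partition table;
# 512 bytes per block is an assumption.
BLOCKS_PER_CYL=16065
BYTES_PER_BLOCK=512

# cyl_to_mb CYLINDERS -> size in MB, two decimals (hypothetical helper)
cyl_to_mb() {
    awk -v c="$1" -v b="$BLOCKS_PER_CYL" -v s="$BYTES_PER_BLOCK" \
        'BEGIN { printf "%.2f\n", c * b * s / 1048576 }'
}

cyl_to_mb 1   # one cylinder
cyl_to_mb 2   # one SDR slice (2 cylinders)
```

Swap in the blocks-per-cylinder figure from your own format output if your geometry differs.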

Here is the partition table of DRIVE 0 (320GB) ...

Volume: DRIVE0
Current partition table (original):
Total disk cylinders available: 38910 + 2 (reserved cylinders)

Partition  Tag         Flag  Cylinders      Size      Blocks
0          root        wm    526 - 5750     40.03GB   (5225/0/0)   83939625   -> / (ROOT)
1          swap        wu    3 - 525        4.01GB    (523/0/0)    8401995    -> /swap
2          backup      wm    0 - 38909      298.07GB  (38910/0/0)  625089150  -> ENTIRE DRIVE (Leave this alone)
3          unassigned  wm    0              0         (0/0/0)      0
4          unassigned  wm    0              0         (0/0/0)      0
5          unassigned  wm    38906 - 38907  15.69MB   (2/0/0)      32130      -> SDR
6          unassigned  wm    38908 - 38909  15.69MB   (2/0/0)      32130      -> SDR
7          home        wm    5751 - 38905   253.98GB  (33155/0/0)  532635075  -> /export/home
8          boot        wu    0 - 0          7.84MB    (1/0/0)      16065      -> GRUB Stage 1?
9          alternates  wu    1 - 2          15.69MB   (2/0/0)      32130      -> GRUB Stage 2?

Here is the partition table of DRIVE 1 (320GB) ...

Volume: DRIVE1
Current partition table (original):
Total disk cylinders available: 38910 + 2 (reserved cylinders)

Partition  Tag         Flag  Cylinders      Size      Blocks
0          root        wm    526 - 5750     40.03GB   (5225/0/0)   83939625   -> / (ROOT)
1          swap        wu    3 - 525        4.01GB    (523/0/0)    8401995    -> /swap
2          backup      wm    0 - 38909      298.07GB  (38910/0/0)  625089150  -> ENTIRE DRIVE (Leave this alone)
3          unassigned  wm    0              0         (0/0/0)      0
4          unassigned  wm    0              0         (0/0/0)      0
5          unassigned  wm    38906 - 38907  15.69MB   (2/0/0)      32130      -> SDR
6          unassigned  wm    38908 - 38909  15.69MB   (2/0/0)      32130      -> SDR
7          home        wm    5751 - 38905   253.98GB  (33155/0/0)  532635075  -> /export/home
8          boot        wu    0 - 0          7.84MB    (1/0/0)      16065      -> GRUB Stage 1?
9          alternates  wu    1 - 2          15.69MB   (2/0/0)      32130      -> GRUB Stage 2?

NOTE: Both drives are set up identically
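Since both drives need the same layout, one common shortcut (not what I did above, I used format on each drive) is to copy the slice table from DRIVE 0 to DRIVE 1 with prtvtoc and fmthard. The sketch below only PRINTS the command so you can look it over first. It assumes DRIVE 1 already has a Solaris fdisk partition at least as big as DRIVE 0's, and the helper name copy_vtoc_cmd is hypothetical.

```shell
#!/bin/sh
# Print (do not run) the command that copies DRIVE 0's slice layout to DRIVE 1.
# Slice 2 is the whole-disk "backup" slice on both drives.
SRC=c1d0   # DRIVE 0 in this post
DST=c2d0   # DRIVE 1 in this post

# copy_vtoc_cmd SOURCE_DISK TARGET_DISK (hypothetical helper)
copy_vtoc_cmd() {
    echo "prtvtoc /dev/rdsk/${1}s2 | fmthard -s - /dev/rdsk/${2}s2"
}

copy_vtoc_cmd "$SRC" "$DST"
```

Double check source and target before you paste the printed line into a root shell. Getting them backwards overwrites the good drive's slice table.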

STEP #2 -> Confirm Boot Order in BIOS = Boot from Disk 0 (On fail, Boot from Disk 1)

(NOTE: Don't worry if your BIOS doesn't support skipping a FAILED drive and auto-booting the next drive in ORDER. Solaris may do this automatically for you. It does for me. Remove one of the drives and boot to test it out on your system.)

STEP #3 -> Specify the master boot program for DRIVE 1

# fdisk -b /usr/lib/fs/ufs/mboot /dev/rdsk/c2d0p0 -> Uses mboot as the master boot program; in the fdisk menu, make sure the Solaris partition is "Active"

STEP #4 -> Make the Secondary Disk Bootable!

# /sbin/installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2d0s0

STEP #5 -> Create the SDR's

# metadb -a -f c1d0s5 -> Create State Database Replica (slice 5 must be unmounted File system)
# metadb -a -f c1d0s6 -> Create State Database Replica (slice 6 must be unmounted File system)
# metadb -a -f c2d0s5 -> Create State Database Replica (slice 5 must be unmounted File system)
# metadb -a -f c2d0s6 -> Create State Database Replica (slice 6 must be unmounted File system)
# metadb -i -> Check your handiwork

NOTE: Yes, I know, you can also put SDRs on a slice that will later become part of a volume, because SVM reserves the first portion of the slice for them, as long as you create the SDRs BEFORE you build the volume and put data on it. If you know how to do this, go for it. In my opinion separate partitions have their own benefits and I personally recommend them.

STEP #6 -> MIRROR ROOT (/) Partition

At this point we need to understand the nomenclature we are about to use.

d10 = this is the name we are arbitrarily assigning to DRIVE 0 / slice 0 (the root slice or partition)
d20 = this is the name we are arbitrarily assigning to DRIVE 1 / slice 0 (the root slice or partition)

NOTE: You can use your own numbering scheme. I set up one that makes sense to me, and allows me to track "what's what"!

NOTE: Nevada supports "friendly names for metadevices". (Ex. Instead of "d0" you can use "Drive-1" or "whatever".)

d0 = this represents the NEW virtual drive we are creating that represents the SVM array for the "ROOT" partition.

Graphically this is what we are creating ...

     d0
    /  \
  d10  d20

Before SVM, you would access partitions on the d10 (or d20) drive directly like this ... /dev/dsk/c1d0s0
After SVM, you will have a 'virtual' drive in place that you will access instead. (Meaning, you won't access the drives directly anymore. Get it?)

NOTE: All references in (/etc/vfstab) will be updated to point to this new drive (d0). When SVM is active, we don't want to communicate with "/dev/dsk/c1d0s0" anymore. We want to communicate with the new VIRTUAL drive "/dev/md/dsk/d0". The metaroot command updates (/etc/vfstab) for us automatically for the "ROOT" partition. For the other partitions we need to edit (/etc/vfstab) manually.
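The numbering scheme is easy to lose track of once you have three mirrors, so here is a tiny sketch that spells it out: the mirror is "d" plus the slice number, and each submirror is "d" plus (drive number + 1) plus the slice number. The same pattern is used for the swap and /export/home slices later in this post. The function names are mine and purely illustrative; SVM itself doesn't care what you call things.

```shell
#!/bin/sh
# Naming scheme used in this post (helper names are hypothetical).

# submirror_name DRIVE SLICE -> d<drive+1><slice>, e.g. drive 0, slice 0 -> d10
submirror_name() {
    echo "d$(( $1 + 1 ))$2"
}

# mirror_name SLICE -> d<slice>, e.g. slice 0 -> d0
mirror_name() {
    echo "d$1"
}

# Slices mirrored in this post: 0 (root), 1 (swap), 7 (/export/home)
for slice in 0 1 7; do
    echo "$(mirror_name "$slice"): $(submirror_name 0 "$slice") + $(submirror_name 1 "$slice")"
done
```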

Let's get started ...

# metainit -f d10 1 1 c1d0s0 -> d10: Concat/Stripe is setup (Note: -f= force, 1 = one stripe, 1 = one slice)
# metainit d20 1 1 c2d0s0 -> d20: Concat/Stripe is setup
# metainit d0 -m d10 -> d0: Mirror is setup
# metaroot d0 -> DO THIS ONLY for "root" partition
# metastat d0 -> View current status (View your handiwork!)
# reboot -> Need to reboot to effect changes
# metattach d0 d20 -> d0: submirror d20 is attached (and Sync'ing begins magically!)
# metastat d0 -> Check the Sync'ing (See?)

NOTE: Wait for Sync'ing to finish before rebooting, otherwise I think it restarts. You can test it and tell me!
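If you'd rather not keep re-running metastat by hand while you wait, a small filter like this can pull the progress number out for you. This is a sketch that assumes metastat reports an active resync with a line of the form "Resync in progress: N % done"; check the output on your own system before relying on it. The helper name resync_pct is made up.

```shell
#!/bin/sh
# resync_pct: read `metastat` output on stdin and print the resync percentage,
# or "done" if no "Resync in progress" line is present.
# Assumes a line shaped like "Resync in progress: 15 % done" (hypothetical helper).
resync_pct() {
    awk '/Resync in progress/ { print $4; found = 1; exit }
         END { if (!found) print "done" }'
}

# Demo with canned text instead of a live metastat call
printf 'd0: Mirror\n    Resync in progress: 15 %% done\n' | resync_pct

# On a real system, something like this would wait for d0 to finish:
#   while [ "$(metastat d0 | resync_pct)" != "done" ]; do sleep 60; done
```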

STEP #7 -> MIRROR SWAP Partition

# metainit -f d11 1 1 c1d0s1 -> d11: Concat/Stripe is setup
# metainit d21 1 1 c2d0s1 -> d21: Concat/Stripe is setup
# metainit d1 -m d11 -> d1: Mirror is setup
# vi /etc/vfstab -> (Edit the /etc/vfstab file so that the swap entry references the mirror)
"/dev/md/dsk/d1 - - swap - no -" -> Add this line to /etc/vfstab and comment out the old line. Remember, no quotes, right?

# reboot
# metattach d1 d21 -> d1: submirror d21 is attached (and Sync'ing begins magically!)

STEP #8 -> MIRROR (/export/home) partition

# umount /dev/dsk/c1d0s7 -> First umount the partition you want to mirror (-f to force)
# metainit d17 1 1 c1d0s7 -> d17: Concat/Stripe is setup
# metainit d27 1 1 c2d0s7 -> d27: Concat/Stripe is setup
# metainit d7 -m d17 -> d7: Mirror is setup
# vi /etc/vfstab -> (Edit the /etc/vfstab file so that /export/home references the mirror)
"/dev/md/dsk/d7 /dev/md/rdsk/d7 /export/home ufs 2 yes -" -> Add this line to /etc/vfstab and comment out the old line. Again, no quotes.

# mount /dev/md/dsk/d7 /export/home -> Remount this partition
# metattach d7 d27 -> d7: submirror d27 is attached (and Sync'ing begins magically!)

STEP #9 -> TIPS

# metastat d0 -> Check Status of "d0" Mirror
# metadb -d -f c1d0s6 -> If there is trouble, you can delete an SDR

EXAMPLE: Failed DRIVE 1 and "Sysadmin intervention" required ...

To Fix the problem temporarily ...

1. Power down
2. Remove Bad Drive 1
3. Boot into single user mode
4. Remove the "bad" SDRs on the "Failed drive", Drive 1
5. Reboot (And the System should run fine, a little slow)
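For step 4, the SDRs to remove are the ones on DRIVE 1, which is c2d0 in this setup (slices 5 and 6). Here is a sketch that builds the metadb command and only PRINTS it, so nothing gets deleted until you paste it yourself. The helper name drop_replicas_cmd is hypothetical. The tip above adds -f; I've left it off here on the assumption that the two good replicas on DRIVE 0 remain, so add it if metadb complains.

```shell
#!/bin/sh
# Print (do not run) the metadb command that drops the replicas on a failed drive.
FAILED=c2d0          # DRIVE 1 in this post
SDR_SLICES="5 6"     # slices holding the replicas

# drop_replicas_cmd DISK SLICE... (hypothetical helper)
drop_replicas_cmd() {
    drive=$1
    shift
    cmd="metadb -d"
    for s in "$@"; do
        cmd="$cmd ${drive}s${s}"
    done
    echo "$cmd"
}

# SDR_SLICES is left unquoted on purpose so it splits into two arguments
drop_replicas_cmd "$FAILED" $SDR_SLICES
```

Run "metadb -i" afterwards to confirm only the replicas on the good drive are left.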


When you get a replacement drive ...

1. Power down
2. Insert the replacement drive (Same size, or bigger, right?)
3. Boot into multi-user mode
4. Repartition "NEW DRIVE 1" as per specs above
5. Make sure you create the SDR's as well
6. Build and link Mirrors together as per docs above
7. Resync drives as per these 3 commands ...


# metareplace -e d0 c2d0s0 -> d0: device c2d0s0 is enabled (SYNC ONE AT A TIME!)
# metareplace -e d1 c2d0s1 -> d1: device c2d0s1 is enabled (SYNC ONE AT A TIME!)
# metareplace -e d7 c2d0s7 -> d7: device c2d0s7 is enabled (SYNC ONE AT A TIME!)

NOTE: Additional commands that are handy!

# metadetach mirror submirror -> Detach a Submirror from a Mirror
# metaoffline mirror submirror -> Puts Submirror "OFFLINE"
# metaonline mirror submirror -> Puts Submirror "ONLINE"; Resync'ing begins immediately
# newfs /dev/rdsk/c1d0s1 -> newfs a Filesystem

NOTE SPECIAL FILES:

# pico /etc/lvm/mddb.cf -> (DO NOT EDIT) records the locations of state database replicas
# pico /etc/lvm/md.cf -> (DO NOT EDIT) contains auto generated config info for the default (unspecified or local) disk set
# /kernel/drv/md.conf -> (DO NOT EDIT) contains the state database replica config info and is read by SVM at startup
# /etc/lvm/md.tab -> contains SVM config info that can be used to reconstruct your SVM config (Manually)
# metastat -p > /etc/lvm/md.tab -> This file is created manually (just a dump to view info; save it!)
# metainit -> This command can use the md.tab file as input to do its thing!! Like, RECOVER DATA!
# metadb -> This command can use the md.tab file as input to do its thing!! Like, RECOVER DATA!
# metahs -> This command can use the md.tab file as input to do its thing!! Like, RECOVER DATA!
