19 May 2011

Replacing a Failed SVM Disk

At some point, everyone deals with a disk failure, as I had to
recently.  In this case, it was a root disk.  Thankfully, it was
mirrored with SVM (Solaris Volume Manager).  Unfortunately, since disk
failures shouldn't happen very often, it can be easy to overlook steps
in the recovery process.  The following details both my oversight and
the recovery of the failed root disk, mirrored with SVM.  Our host
details are:
        HOST:           helios
        PROMPT:         helios [0]
        SYSTEM:         Sun Fire V490   
        OS:             Solaris 9
After being alerted to an issue with one of my root disks on helios,
I logged into the host to see the following:
        helios [0] /usr/sbin/metastat
        d11: Mirror
            Submirror 0: d9
              State: Needs maintenance
            Submirror 1: d10
              State: Okay
            Pass: 1
            Read option: roundrobin (default)
            Write option: parallel (default)
            Size: 246686592 blocks (117 GB)

        d9: Submirror of d11
            State: Unavailable
            Size: 246686592 blocks (117 GB)
            Stripe 0:
                Device     Start Block  Dbase        State Reloc Hot Spare
                c1t0d0s6          0     No               -   Yes

        d10: Submirror of d11
            State: Okay
            Size: 246686592 blocks (117 GB)
            Stripe 0:
                Device     Start Block  Dbase        State Reloc Hot Spare
                c1t1d0s6          0     No            Okay   Yes

        <snip...>
        <truncated, mirrors d8 and d5 were in the same state as d11>
        <snip...>

        d2: Mirror
            Submirror 0: d0
              State: Needs maintenance
            Submirror 1: d1
              State: Okay
            Pass: 1
            Read option: roundrobin (default)
            Write option: parallel (default)
            Size: 10501632 blocks (5.0 GB)

        d0: Submirror of d2
            State: Needs maintenance
            Invoke: metareplace d2 c1t0d0s0 <new device>
            Size: 10501632 blocks (5.0 GB)
            Stripe 0:
                Device     Start Block  Dbase        State Reloc Hot Spare
                c1t0d0s0          0     No     Maintenance   Yes

        d1: Submirror of d2
            State: Okay
            Size: 10501632 blocks (5.0 GB)
            Stripe 0:
                Device     Start Block  Dbase        State Reloc Hot Spare
                c1t1d0s0          0     No            Okay   Yes

        Device Relocation Information:
        Device   Reloc  Device ID
        c1t1d0   Yes    id1,ssd@w500000e011c5c0c0
        c1t0d0   Yes    id1,ssd@w500000e011c61500
The output of 'metastat' reports a state of "Needs maintenance" for d5,
d8, and d11, with no state information for the failed side of each
mirror (the c1t0d0s6 component of d9, for example, shows only "-").
Mirror d2 is also set to "Needs maintenance", but its components list
states of "Maintenance" and "Okay", respectively.  Having seen a
drop-off in I/O activity to c1t0d0 in Cacti, and having verified the
situation with the system logs and 'iostat -En', I knew I had a failed
disk.  (Sorry, I didn't retain that output for inclusion here.)  A
further check with 'metadb -i' also confirms this, showing write errors
(W) on the state database replicas on c1t0d0:
        helios [0] /usr/sbin/metadb -i
                flags           first blk       block count
              Wm  p  l          16              8192            /dev/dsk/c1t0d0s3
              W   p  l          8208            8192            /dev/dsk/c1t0d0s3
              W   p  l          16400           8192            /dev/dsk/c1t0d0s3
              W   p  l          16              8192            /dev/dsk/c1t0d0s4
              W   p  l          8208            8192            /dev/dsk/c1t0d0s4
              W   p  l          16400           8192            /dev/dsk/c1t0d0s4
             a    p  luo        16              8192            /dev/dsk/c1t1d0s3
             a    p  luo        8208            8192            /dev/dsk/c1t1d0s3
             a    p  luo        16400           8192            /dev/dsk/c1t1d0s3
             a    p  luo        16              8192            /dev/dsk/c1t1d0s4
             a    p  luo        8208            8192            /dev/dsk/c1t1d0s4
             a    p  luo        16400           8192            /dev/dsk/c1t1d0s4
         r - replica does not have device relocation information
         o - replica active prior to last mddb configuration change
         u - replica is up to date
         l - locator for this replica was read successfully
         c - replica's location was in /etc/lvm/mddb.cf
         p - replica's location was patched in kernel
         m - replica is master, this is replica selected as input
         W - replica has device write errors
         a - replica is active, commits are occurring to this replica
         M - replica had problem with master blocks
         D - replica had problem with data blocks
         F - replica had format problems
         S - replica is too small to hold current data base
         R - replica had device read errors
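If you'd rather not eyeball the flag columns, the check can be scripted.
Below is a minimal sketch; erred_replica_slices is a hypothetical name,
not part of SVM.  It reads 'metadb -i' output on stdin and prints each
slice holding a replica with one of the upper-case problem flags from
the legend above (W, M, D, F, S, R).  It assumes the layout shown here:
flags first, then the first block, block count, and device path.

```shell
# erred_replica_slices (hypothetical helper): read 'metadb -i' output on
# stdin and print every slice that holds at least one replica flagged
# with a problem (W, M, D, F, S or R).  Each slice is printed once.
erred_replica_slices() {
    awk '
        # Replica lines end in a /dev/... path; header and legend lines do not.
        $NF ~ /^\/dev\// && NF >= 4 {
            flags = ""
            # Everything before "first blk", "block count" and the path is flags.
            for (i = 1; i <= NF - 3; i++) flags = flags $i
            if (flags ~ /[WMDFSR]/ && !seen[$NF]++) print $NF
        }
    '
}
```

On helios, '/usr/sbin/metadb -i | erred_replica_slices' should print
/dev/dsk/c1t0d0s3 and /dev/dsk/c1t0d0s4, the two slices handed to
'metadb -d' below.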
Note that 'metadb -i' prints the flag definitions below the replica
list.  At this point, I retrieved a spare disk to swap into the
machine.  On to recovery: first, delete the state database replicas on
the failed disk with 'metadb -d /dev/dsk/cWtXdYsZ':
        helios [0] /usr/sbin/metadb -d /dev/dsk/c1t0d0s3
        helios [0] /usr/sbin/metadb -d /dev/dsk/c1t0d0s4
        helios [0] /usr/sbin/metadb
                flags           first blk       block count
             a    p  luo        16              8192            /dev/dsk/c1t1d0s3
             a    p  luo        8208            8192            /dev/dsk/c1t1d0s3
             a    p  luo        16400           8192            /dev/dsk/c1t1d0s3
             a    p  luo        16              8192            /dev/dsk/c1t1d0s4
             a    p  luo        8208            8192            /dev/dsk/c1t1d0s4
             a    p  luo        16400           8192            /dev/dsk/c1t1d0s4
The subsequent 'metadb' no longer shows the replicas on the failed disk.
Had I thought this through a little further, I could have saved myself
some trouble later.  By this, I mean that I should also have removed the
failed (c1t0d0) submirrors from the mirrors before continuing.  Instead,
I skipped this step and went straight to hot-swapping in the new disk,
which just means that I had to remove those devices later on.  With the
new disk in place:
        helios [0] /usr/sbin/devfsadm -C
        helios [0] echo | /usr/sbin/format
        Searching for disks...done

        c1t0d0: configured with capacity of 136.71GB

        AVAILABLE DISK SELECTIONS:
               0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
                  /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w2100000c505ec040,0
               1. c1t1d0 <FUJITSU-MAW3147FCSUN146G-1203 cyl 14087 alt 2 hd 24 sec 848>
                  /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e011c5c0c1,0
               2. c4t0d0 <STK-FLEXLINE380-0615 cyl 65533 alt 2 hd 64 sec 169>
                  /pci@8,600000/fibre-channel@2/sd@0,0
               3. c4t0d1 <STK-FLEXLINE380-0615 cyl 65533 alt 2 hd 64 sec 169>
                  /pci@8,600000/fibre-channel@2/sd@0,1
               4. c4t0d2 <STK-FLEXLINE380-0615 cyl 65533 alt 2 hd 64 sec 169>
                  /pci@8,600000/fibre-channel@2/sd@0,2
               5. c4t0d3 <STK-FLEXLINE380-0615 cyl 40958 alt 2 hd 64 sec 64>
                  /pci@8,600000/fibre-channel@2/sd@0,3
               6. c5t0d0 <STK-FLEXLINE380-0615 cyl 65533 alt 2 hd 64 sec 169>
                  /pci@8,600000/fibre-channel@2,1/sd@0,0
               7. c5t0d1 <STK-FLEXLINE380-0615 cyl 65533 alt 2 hd 64 sec 169>
                  /pci@8,600000/fibre-channel@2,1/sd@0,1
               8. c5t0d2 <STK-FLEXLINE380-0615 cyl 65533 alt 2 hd 64 sec 169>
                  /pci@8,600000/fibre-channel@2,1/sd@0,2
               9. c5t0d3 <STK-FLEXLINE380-0615 cyl 40958 alt 2 hd 64 sec 64>
                  /pci@8,600000/fibre-channel@2,1/sd@0,3
        Specify disk (enter its number): Specify disk (enter its number):
        helios [0] /usr/sbin/prtvtoc -h /dev/rdsk/c1t0d0s2
        prtvtoc: /dev/rdsk/c1t0d0s2: Unable to read Disk geometry errno = 0x5
        helios [1]
After the failed disk was swapped out for one of the same size,
'devfsadm -C' was run to clean up dead device links and create new
ones, followed by 'format' to verify that the new disk was seen.  While
it was picked up (disk 0 above), 'prtvtoc' failed with errno 0x5 (EIO),
suggesting a problem, so back to 'format'.  The output below shows a
simple label issue, so we check the drive type and set it to "Auto
configure" (this shouldn't have been necessary).  After this, the disk
is labeled with a standard SMI label and checked with 'verify', which
shows the new partition table:
        helios [1] /usr/sbin/format c1t0d0

        c1t0d0: configured with capacity of 136.71GB 
        selecting c1t0d0 
        [disk formatted]

        FORMAT MENU:
                disk       - select a disk
                type       - select (define) a disk type
                partition  - select (define) a partition table
                current    - describe the current disk
                format     - format and analyze the disk
                repair     - repair a defective sector
                label      - write label to the disk
                analyze    - surface analysis
                defect     - defect list management
                backup     - search for backup labels
                verify     - read and display labels
                save       - save new disk/partition definitions
                inquiry    - show vendor, product and revision
                volname    - set 8-character volume name
                !<cmd>     - execute <cmd>, then return
                quit
        format> verify
        Warning: Could not read primary label.
        Warning: Could not read backup labels.

        Warning: Check the current partitioning and 'label' the disk or use the
                 'backup' command.
        format> type

        AVAILABLE DRIVE TYPES:
                0. Auto configure
                1. Quantum ProDrive 80S
                2. Quantum ProDrive 105S
                3. CDC Wren IV 94171-344
                4. SUN0104
                5. SUN0207
                6. SUN0327
                7. SUN0340
                8. SUN0424
                9. SUN0535
                10. SUN0669
                11. SUN1.0G
                12. SUN1.05
                13. SUN1.3G
                14. SUN2.1G
                15. SUN2.9G
                16. Zip 100
                17. Zip 250
                18. SUN146G
                19. other
        Specify disk type (enter its number)[18]: 0
        c1t0d0: configured with capacity of 136.71GB
        <SUN146G cyl 14087 alt 2 hd 24 sec 848>
        selecting c1t0d0
        [disk formatted]
        Disk not labeled.  Label it now? yes
        format> verify

        Primary label contents:

        Volume name = <        >
        ascii name  = <SUN146G cyl 14087 alt 2 hd 24 sec 848>
        pcyl        = 14089
        ncyl        = 14087
        acyl        =    2
        nhead       =   24
        nsect       =  848
        Part      Tag    Flag     Cylinders         Size            Blocks
          0       root    wm       0 -    12      129.19MB    (13/0/0)       264576
          1       swap    wu      13 -    25      129.19MB    (13/0/0)       264576
          2     backup    wu       0 - 14086      136.71GB    (14087/0/0) 286698624
          3 unassigned    wm       0                0         (0/0/0)             0
          4 unassigned    wm       0                0         (0/0/0)             0
          5 unassigned    wm       0                0         (0/0/0)             0
          6        usr    wm      26 - 14086      136.46GB    (14061/0/0) 286169472
          7 unassigned    wm       0                0         (0/0/0)             0

        format> quit
With the disk labeled, we return to 'prtvtoc' to list the current
slices on c1t1d0 and to verify that the slice 2 sizes match between
c1t1d0 and c1t0d0, the replaced disk.  Since they match (286698624
sectors each), c1t1d0's VTOC is copied over to c1t0d0 by piping
'prtvtoc' into 'fmthard', and the result is verified with 'prtvtoc':
        helios [0] /usr/sbin/prtvtoc -h /dev/rdsk/c1t1d0s2
               0      2    00          0  10501632  10501631
               1      3    01   10583040   8323968  18907007
               2      5    00          0 286698624 286698623
               3      0    00   10501632     40704  10542335
               4      0    00   10542336     40704  10583039
               5      7    00   19029120  20982912  40012031
               6      4    00   40012032 246686592 286698623
        helios [0] /usr/sbin/prtvtoc -h /dev/rdsk/c1t0d0s2
               0      2    00          0    264576    264575
               1      3    01     264576    264576    529151
               2      5    01          0 286698624 286698623
               6      4    00     529152 286169472 286698623
        helios [0] /usr/sbin/prtvtoc -h /dev/rdsk/c1t1d0s2 |
        > /usr/sbin/fmthard -s - /dev/rdsk/c1t0d0s2
        fmthard:  New volume table of contents now in place.
        helios [0] /usr/sbin/prtvtoc -h /dev/rdsk/c1t0d0s2
               0      2    00          0  10501632  10501631
               1      3    01   10583040   8323968  18907007
               2      5    00          0 286698624 286698623
               3      0    00   10501632     40704  10542335
               4      0    00   10542336     40704  10583039
               5      7    00   19029120  20982912  40012031
               6      4    00   40012032 246686592 286698623
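The slice 2 size comparison is the step that keeps 'fmthard' from
writing a VTOC that doesn't fit the target disk, so it is worth
guarding in a script.  A minimal sketch, assuming the 'prtvtoc -h'
column layout shown above (slice, tag, flag, first sector, sector
count, last sector); both function names are hypothetical:

```shell
# slice_sectors SLICE (hypothetical helper): read 'prtvtoc -h' output on
# stdin and print the sector count (5th column) of the given slice.
slice_sectors() {
    awk -v s="$1" '$1 == s { print $5 }'
}

# copy_vtoc_if_same_size SRC DST (hypothetical helper): copy the VTOC from
# SRC to DST (raw slice 2 paths, e.g. /dev/rdsk/c1t1d0s2) only when both
# disks report the same, non-empty slice 2 sector count.
copy_vtoc_if_same_size() {
    src_n=$(/usr/sbin/prtvtoc -h "$1" | slice_sectors 2)
    dst_n=$(/usr/sbin/prtvtoc -h "$2" | slice_sectors 2)
    if [ -z "$src_n" ] || [ "$src_n" != "$dst_n" ]; then
        echo "slice 2 size mismatch: '$src_n' vs '$dst_n'" >&2
        return 1
    fi
    /usr/sbin/prtvtoc -h "$1" | /usr/sbin/fmthard -s - "$2"
}
```

Usage would be 'copy_vtoc_if_same_size /dev/rdsk/c1t1d0s2
/dev/rdsk/c1t0d0s2'.  Because an unlabeled disk makes 'prtvtoc' fail
(as it did above), the guard also refuses to copy until the new disk
has been labeled.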
With c1t0d0 prepared, we add the state database replicas back with
'metadb -a -c N', where N is the number of replicas per slice (3 here,
matching c1t1d0), and verify with 'metadb', which shows them as active
(a) and up to date (u):
        helios [0] /usr/sbin/metadb -a -c 3 /dev/dsk/c1t0d0s3
        helios [0] /usr/sbin/metadb -a -c 3 /dev/dsk/c1t0d0s4
        helios [0] /usr/sbin/metadb
                flags           first blk       block count
             a        u         16              8192            /dev/dsk/c1t0d0s3
             a        u         8208            8192            /dev/dsk/c1t0d0s3
             a        u         16400           8192            /dev/dsk/c1t0d0s3
             a        u         16              8192            /dev/dsk/c1t0d0s4
             a        u         8208            8192            /dev/dsk/c1t0d0s4
             a        u         16400           8192            /dev/dsk/c1t0d0s4
             a    p  luo        16              8192            /dev/dsk/c1t1d0s3
             a    p  luo        8208            8192            /dev/dsk/c1t1d0s3
             a    p  luo        16400           8192            /dev/dsk/c1t1d0s3
             a    p  luo        16              8192            /dev/dsk/c1t1d0s4
             a    p  luo        8208            8192            /dev/dsk/c1t1d0s4
             a    p  luo        16400           8192            /dev/dsk/c1t1d0s4
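To confirm that each slice ended up with the intended number of
replicas and that all of them are active, the 'metadb' listing can be
tallied per slice.  A minimal sketch; replica_counts is a hypothetical
name, and it assumes the same column layout as the listings above:

```shell
# replica_counts (hypothetical helper): read 'metadb' output on stdin and
# print "slice total active" for each slice holding replicas, in the
# order first seen; "active" counts replicas whose flags include 'a'.
replica_counts() {
    awk '
        $NF ~ /^\/dev\// && NF >= 4 {
            flags = ""
            # Flag columns are everything before first blk, count and path.
            for (i = 1; i <= NF - 3; i++) flags = flags $i
            if (!($NF in total)) order[++k] = $NF
            total[$NF]++
            if (flags ~ /a/) active[$NF]++
        }
        END {
            for (i = 1; i <= k; i++)
                print order[i], total[order[i]], active[order[i]] + 0
        }
    '
}
```

After the two 'metadb -a -c 3' commands, every slice should report
"3 3" here.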
Now to recover the mirrors.  Starting with mirror d2, whose component
c1t0d0s0 is in the "Maintenance" state, we re-enable the component in
place with 'metareplace -e', which automatically begins resyncing the
mirror:
        helios [0] /usr/sbin/metastat | /bin/grep 'metareplace'
            Invoke: metareplace d2 c1t0d0s0 <new device>
        helios [0] /usr/sbin/metareplace -e d2 c1t0d0s0
        d2: device c1t0d0s0 is replaced with c1t0d0s0
        helios [0] /usr/sbin/metastat d2
        d2: Mirror
            Submirror 0: d0
              State: Resyncing
            Submirror 1: d1
              State: Okay 
            Resync in progress: 15 % done
            Pass: 1
            Read option: roundrobin (default)
            Write option: parallel (default)
            Size: 10501632 blocks (5.0 GB)

        d0: Submirror of d2
            State: Resyncing
            Size: 10501632 blocks (5.0 GB)
            Stripe 0:
                Device     Start Block  Dbase        State Reloc Hot Spare
                c1t0d0s0          0     No       Resyncing   Yes

        d1: Submirror of d2
            State: Okay
            Size: 10501632 blocks (5.0 GB)
            Stripe 0:
                Device     Start Block  Dbase        State Reloc Hot Spare
                c1t1d0s0          0     No            Okay   Yes

        Device Relocation Information:
        Device   Reloc  Device ID
        c1t0d0   Yes    id1,ssd@w2000000c505ec040
        c1t1d0   Yes    id1,ssd@w500000e011c5c0c0
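The 'grep' above relies on the "Invoke:" hint that metastat prints for
components in the "Maintenance" state.  A minimal sketch that rewrites
those hints into in-place 'metareplace -e' commands, printing rather
than running them; the function name is hypothetical, and it assumes
the replacement disk sits at the same cXtYdZ as the failed one, as it
does here:

```shell
# inplace_replace_cmds (hypothetical helper): read 'metastat' output on
# stdin and turn each "Invoke: metareplace MIRROR COMPONENT <new device>"
# hint into "metareplace -e MIRROR COMPONENT".  Commands are only printed.
inplace_replace_cmds() {
    awk '$1 == "Invoke:" && $2 == "metareplace" {
        print "/usr/sbin/metareplace -e", $3, $4
    }'
}
```

Against the first metastat output in this post, it prints
'/usr/sbin/metareplace -e d2 c1t0d0s0', the command used above.  Review
the list before feeding it to a shell.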
Earlier I stated that I could have saved myself some trouble by
removing the broken side of the mirrors.  Since I didn't, I get to fix
it here.  The output from 'metastat' still shows plenty of "Needs
maintenance" and "Unavailable" states, which 'metareplace' won't work
on:
        helios [0] /usr/sbin/metastat | /usr/bin/egrep '^d|State:'
        d11: Mirror
              State: Needs maintenance
              State: Okay
        d9: Submirror of d11
            State: Unavailable
        d10: Submirror of d11
            State: Okay
        d8: Mirror
              State: Needs maintenance
              State: Okay
        d6: Submirror of d8
            State: Unavailable
        d7: Submirror of d8
            State: Okay
        d5: Mirror
              State: Needs maintenance
              State: Okay
        d3: Submirror of d5
            State: Unavailable
        d4: Submirror of d5
            State: Okay
        d2: Mirror
              State: Okay
              State: Okay
        d0: Submirror of d2
            State: Okay
        d1: Submirror of d2
            State: Okay
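The 'egrep' above gives a quick overview but leaves it to you to match
each state with its metadevice.  A minimal sketch of a more targeted
summary that prints "mirror submirror state" for every submirror not in
the "Okay" state; the function name is hypothetical, and it assumes the
"dN: Submirror of dM" section headers shown in this post:

```shell
# unhealthy_submirrors (hypothetical helper): read 'metastat' output on
# stdin and print "mirror submirror state" for each submirror whose own
# State: line is not "Okay".
unhealthy_submirrors() {
    awk '
        # Submirror header, e.g. "d9: Submirror of d11": remember the names.
        $1 ~ /^d[0-9]+:$/ && $2 == "Submirror" && $3 == "of" {
            smd = $1; sub(/:$/, "", smd); mirror = $4; want = 1; next
        }
        # Any other metadevice header ("d11: Mirror", "d9: Concat/Stripe").
        $1 ~ /^d[0-9]+:$/ { want = 0; next }
        # The first State: line in a submirror section is its own state.
        want && $1 == "State:" {
            state = $0
            sub(/^[ \t]*State:[ \t]*/, "", state)
            sub(/[ \t]+$/, "", state)
            if (state != "Okay") print mirror, smd, state
            want = 0
        }
    '
}
```

Against the first metastat output in this post, it would print
"d11 d9 Unavailable" and "d2 d0 Needs maintenance", which is the split
between the submirrors that needed detaching and the one that
'metareplace -e' could handle.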
Since the submirrors (plexes) are in an erred state, we have to force
the detach of d9 from d11, d6 from d8, and d3 from d5 with
'metadetach -f' (a plain 'metadetach' is refused, as the first attempt
shows), and then clear them from SVM with 'metaclear'.  These detaches
and clears are what I should have done earlier:
        helios [0] /usr/sbin/metadetach d11 d9
        metadetach: helios: d11: attempt an operation on a submirror that has erred components

        helios [1] /usr/sbin/metadetach -f d11 d9
        d11: submirror d9 is detached
        helios [0] /usr/sbin/metadetach -f d8 d6
        d8: submirror d6 is detached
        helios [0] /usr/sbin/metadetach -f d5 d3
        d5: submirror d3 is detached
        helios [0] /usr/sbin/metaclear d3
        d3: Concat/Stripe is cleared
        helios [0] /usr/sbin/metaclear d6
        d6: Concat/Stripe is cleared
        helios [0] /usr/sbin/metaclear d9
        d9: Concat/Stripe is cleared
With the stale "erred" submirrors removed, we can recreate them on the
new disk's slices with 'metainit' and attach them to their respective
mirrors with 'metattach':
        helios [0] /usr/sbin/metainit d3 1 1 c1t0d0s1
        d3: Concat/Stripe is setup
        helios [0] /usr/sbin/metainit d6 1 1 c1t0d0s5
        d6: Concat/Stripe is setup
        helios [0] /usr/sbin/metainit d9 1 1 c1t0d0s6
        d9: Concat/Stripe is setup 
        helios [0] /usr/sbin/metattach d11 d9
        d11: submirror d9 is attached
        helios [0] /usr/sbin/metattach d8 d6
        d8: submirror d6 is attached
        helios [0] /usr/sbin/metattach d5 d3
        d5: submirror d3 is attached
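Since I had to work out the detach, clear, init, and attach commands by
hand, here is a minimal sketch that derives them from 'metastat'
output.  It only prints the commands.  The function name is
hypothetical, and it assumes, as on helios, that each submirror is a
single-slice stripe (hence 'metainit dN 1 1 slice') and that the
replacement disk reuses the failed disk's device name:

```shell
# rebuild_plan (hypothetical helper): read 'metastat' output on stdin and,
# for every submirror whose state is "Unavailable", print the
# detach / clear / init / attach sequence for its first slice.
rebuild_plan() {
    awk '
        function emit_plan() {
            if (smd != "" && state == "Unavailable" && slice != "") {
                print "/usr/sbin/metadetach -f", mirror, smd
                print "/usr/sbin/metaclear", smd
                print "/usr/sbin/metainit", smd, "1 1", slice
                print "/usr/sbin/metattach", mirror, smd
            }
            smd = ""; state = ""; slice = ""
        }
        # A new metadevice header closes the previous section.
        $1 ~ /^d[0-9]+:$/ {
            emit_plan()
            if ($2 == "Submirror" && $3 == "of") {
                smd = $1; sub(/:$/, "", smd); mirror = $4
            }
            next
        }
        # The first State: line of a submirror section is its own state.
        smd != "" && state == "" && $1 == "State:" {
            state = $0
            sub(/^[ \t]*State:[ \t]*/, "", state)
            sub(/[ \t]+$/, "", state)
        }
        # The first cXtYdZsN device line is the slice backing the stripe.
        smd != "" && slice == "" && $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ {
            slice = $1
        }
        END { emit_plan() }
    '
}
```

Against the d9 section of the first metastat output in this post, it
yields the same four d9/d11 commands used above.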
Running 'metastat' on d9 shows that its component is now "Okay" rather
than unavailable.  On d11, we see that SVM is resyncing the data from
d10 to d9:
        helios [0] /usr/sbin/metastat d9
        d9: Concat/Stripe
            Size: 246686592 blocks (117 GB)
            Stripe 0:
                Device     Start Block  Dbase        State Reloc Hot Spare
                c1t0d0s6          0     No            Okay   Yes

        Device Relocation Information:
        Device   Reloc  Device ID
        c1t0d0   Yes    id1,ssd@w2000000c505ec040
        helios [0] /usr/sbin/metastat d11
        d11: Mirror
            Submirror 0: d9
              State: Resyncing
            Submirror 1: d10
              State: Okay
            Resync in progress: 1 % done
            Pass: 1
            Read option: roundrobin (default)
            Write option: parallel (default)
            Size: 246686592 blocks (117 GB)

        d9: Submirror of d11
            State: Resyncing
            Size: 246686592 blocks (117 GB) 
            Stripe 0: 
                Device     Start Block  Dbase        State Reloc Hot Spare
                c1t0d0s6          0     No            Okay   Yes

        d10: Submirror of d11
            State: Okay
            Size: 246686592 blocks (117 GB)
            Stripe 0:
                Device     Start Block  Dbase        State Reloc Hot Spare
                c1t1d0s6          0     No            Okay   Yes

        Device Relocation Information:
        Device   Reloc  Device ID
        c1t0d0   Yes    id1,ssd@w2000000c505ec040
        c1t1d0   Yes    id1,ssd@w500000e011c5c0c0
Running 'metastat' on each of the other mirrors shows similar output
for them.  Without arguments, 'metastat' shows all mirrors and
submirrors, allowing us to keep track of the resync operations until
they complete.
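Rather than re-reading the full 'metastat' output, the resync progress
can be pulled out per mirror.  A minimal sketch; both function names
and the five-minute polling interval are my own assumptions, and it
relies on the "dN: Mirror" headers and "Resync in progress: N % done"
lines shown above:

```shell
# resync_status (hypothetical helper): read 'metastat' output on stdin and
# print "mirror percent" for every mirror reporting a resync in progress.
resync_status() {
    awk '
        $1 ~ /^d[0-9]+:$/ && $2 == "Mirror" { m = $1; sub(/:$/, "", m) }
        /Resync in progress:/ {
            pct = "?"
            for (i = 1; i < NF; i++) if ($i == "progress:") pct = $(i + 1)
            print m, pct "%"
        }
    '
}

# wait_for_resync (hypothetical helper): poll every 300 seconds (an
# arbitrary choice) until metastat no longer reports any resync.
wait_for_resync() {
    while /usr/sbin/metastat | grep 'Resync in progress' > /dev/null; do
        /usr/sbin/metastat | resync_status
        sleep 300
    done
}
```

Fed the d2 output from the 'metareplace -e' step, resync_status would
print "d2 15%".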
As a final step, since the failed disk was a root disk, we still need
to be able to boot from the replacement.  Here we turn to 'installboot'
to install the boot block:
        helios [0] /usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0
        helios [0]
The end result of all this is that the box is back to a healthy state
with all mirrors functional.  Had I not overlooked a step, the above
could have been a little shorter.  Still, it shows that the situation
is recoverable either way.