19 October 2010

Breaking and Syncing a Hardware Root Mirror (Solaris)

INTRO

The information detailed below is meant to assist with proactive planning
for recovering a host back to the state it was in before a change went
bad.  Circumstances where this will likely be of benefit include the
patching of a host, application upgrades by a vendor, trying new and
untested configurations, etc.  The value is that should something go
catastrophically wrong after the "updates," the issue can be resolved
immediately by rebooting to an "untainted" boot device, wherein the
system appears as though no changes were ever made.  Though written
specifically for root disks mirrored with hardware RAID, the procedure
below could also be tweaked and used with non-root disks.  Caveat: one
should be familiar with Solaris, RAID 1, the Solaris boot process, and
OBP before attempting the steps detailed below.  Since the following
works with the root disk, most of the actual work is performed within
OpenBoot, as there is no way to safely do this from within Solaris while
the root disk is in use.  Before proceeding, the following points are of
interest or consideration:

        - Before using the Hardware RAID features of the Sun Fire T2000
          server, ensure that the following patches have been applied
          (a quick check is sketched just after this list):

            * 119850-13 mpt and /usr/sbin/raidctl patch
            * 122165-01 LSI1064 PCI-X FCode 1.00.39

        - the sample platform detailed is a Sun Fire T2000, though the
          steps involved should be the same or similar for any SPARC-based
          host with a hardware RAID controller

        - OS:                   Solaris 10
        - Kernel Revision:      Generic_141414-01
        - Shell Prompt:         prefect [0]
        - OBP Prompt:           {0} ok
        - Solaris ID'd Disks:   c0t0d0
                                c0t1d0
                * refers to both logical volumes and physical disks
        - HW RAID Ctlr Disks:   0.0.0
                                0.1.0
        - Initial RAID 1 Vol:   c0t0d0
        - Final RAID 1 Vol:     c0t1d0
        - SCSI Ctlr 0 Path:     /pci@780/pci@0/pci@9/scsi@0
        - Root FS:              /       (s0)
        - Commands Used:
                -> Solaris:
                        +  /usr/sbin/raidctl
                        +  /usr/bin/df
                        +  /usr/sbin/init
                        +  /usr/sbin/format
                        +  /usr/sbin/mount
                        +  /usr/sbin/fsck
                        +  /usr/bin/vi
                        +  /usr/bin/cat
                        +  /usr/sbin/umount
                        +  /usr/bin/touch
                        +  /usr/sbin/devfsadm
                        +  /usr/bin/cp
                        +  /usr/bin/sed
                        +  /usr/sbin/eeprom
                        +  /usr/sbin/dumpadm
                        +  /usr/sbin/reboot
                        +  echo
                -> OBP:
                        +  setenv
                        +  reset-all
                        +  probe-scsi-all
                        +  show-disks
                        +  select
                        +  show-volumes
                        +  delete-volume
                        +  unselect-dev
                        +  devalias
                        +  printenv
                        +  boot
                        +  create-im-volume
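
To confirm the two patches noted above (or later revisions) are already on
the host, a check along the following lines should suffice.  This is only
a sketch, not output captured from the example host; the pattern is simply
the two patch IDs:

        # list applied patches, looking for the two noted above
        /usr/bin/showrev -p | /usr/bin/egrep '119850|122165'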

DETAILS

Before breaking our mirror, we need to determine the current setup of
our root device:

        prefect [0] /usr/bin/df -h /
        Filesystem             size   used  avail capacity  Mounted on
        /dev/dsk/c0t0d0s0       11G   4.1G   6.7G    38%    /
        prefect [0] /usr/sbin/raidctl -l
        Controller: 0
                Volume:c0t0d0
                Disk: 0.0.0
                Disk: 0.1.0
        prefect [0] /usr/sbin/raidctl -l c0t0d0
        Volume                  Size    Stripe  Status   Cache  RAID
                Sub                     Size                    Level
                        Disk
        ----------------------------------------------------------------
        c0t0d0                  68.3G   N/A     OPTIMAL  OFF    RAID1
                        0.0.0   68.3G           GOOD
                        0.1.0   68.3G           GOOD

        prefect [0] /usr/sbin/format
        Searching for disks...done


        AVAILABLE DISK SELECTIONS:
               0. c0t0d0 <LSILOGIC-LogicalVolume-3000 cyl 65533 alt 2 hd 16 sec 136>
                  /pci@780/pci@0/pci@9/scsi@0/sd@0,0
        Specify disk (enter its number): ^D

In the above, we've identified / as residing on c0t0d0s0, slice 0 of
c0t0d0, which is currently a RAID 1 volume made up of physical disks
0.0.0 and 0.1.0.  As verified by 'format', the only device Solaris sees
is the logical RAID 1 volume presented by the RAID controller.  At this
point, we need to bring down the host to the OBP prompt:

        prefect [1] /usr/sbin/init 0
        {0} ok setenv fcode-debug? true
        fcode-debug? =          true
        {0} ok setenv auto-boot? false
        auto-boot? =            false
        {0} ok reset-all

With the box down, we enable 'fcode-debug?' so we can muck with the
mirror from the OBP.  Disabling 'auto-boot?' prevents the box from
attempting an OS bootup before we are ready.  The 'reset-all' ensures
the new settings take effect as we cycle back through POST.  Once back
at the OBP prompt, we validate the disks available with 'probe-scsi-all'
and select the base device, '/pci@780/pci@0/pci@9/scsi@0' (previously
seen in the output of 'format'):

        {0} ok probe-scsi-all
        /pci@780/pci@0/pci@9/scsi@0

        MPT Version 1.05, Firmware Version 1.09.00.00

        Target 0 Volume 0 
        Unit 0   Disk     LSILOGICLogical Volume  3000    143243264 Blocks, 73 GB

        {0} ok show-disks
        a) /pci@7c0/pci@0/pci@1/pci@0/ide@8/cdrom
        b) /pci@7c0/pci@0/pci@1/pci@0/ide@8/disk
        c) /pci@780/pci@0/pci@9/scsi@0/disk
        q) NO SELECTION
        Enter Selection, q to quit: c
        /pci@780/pci@0/pci@9/scsi@0/disk has been selected.
        Type ^Y ( Control-Y ) to insert it in the command line.
        e.g. ok nvalias mydev ^Y
                 for creating devalias mydev for /pci@780/pci@0/pci@9/scsi@0/disk
        {0} ok select /pci@780/pci@0/pci@9/scsi@0

With our volume's base SCSI device selected, 'show-volumes' will display
our current volume, so that it can be deleted:

        {0} ok show-volumes
        Volume 0 Target 0  Type IM (Integrated Mirroring)
          Optimal  Enabled
          2 Members                                         143243264 Blocks, 73 GB
          Disk 1
            Primary  Online
            Target 4        FUJITSU MAY2073RCSUN72G 0501
          Disk 0
            Secondary  Online
            Target 1        SEAGATE ST973401LSUN72G 0556
        {0} ok 0 delete-volume
        The volume and its data will be deleted
        Are you sure (yes/no)?  [no] yes
        Volume 0 has been deleted

In the command above, '0 delete-volume', the 0 refers specifically to
'Volume 0'.  You must answer yes to the prompt to continue.

    * NOTE, only volumes set up as RAID 1 can be handled in this manner
      via the HW RAID controller, as it simply splits the mirrors from
      the logical volume, leaving the data in place.  Performing a
      'delete-volume' with other RAID levels will destroy the volume
      and any contained data.

Verify the volume was removed and reset the system so the two original
physical devices are now visible:

        {0} ok show-volumes
        No volumes to show
        {0} ok unselect-dev
        {0} ok reset-all
        [snip...]
        {0} ok probe-scsi-all
        /pci@780/pci@0/pci@9/scsi@0

        MPT Version 1.05, Firmware Version 1.09.00.00

        Target 0
        Unit 0   Disk     FUJITSU MAY2073RCSUN72G 0501    143374738 Blocks, 73 GB
          SASAddress 500000e01361c882  PhyNum 0
        Target 1
        Unit 0   Disk     SEAGATE ST973401LSUN72G 0556    143374738 Blocks, 73 GB
          SASAddress 5000c500021551cd  PhyNum 1

Verify that aliases are set up for our devices, wherein physical disk
(PhyNum) 0 is 'disk0' and physical disk (PhyNum) 1 is 'disk1'.  Perform a
reconfiguration boot of the system to 'single user' on disk0:

        {0} ok devalias
        ttya                     /pci@7c0/pci@0/pci@1/pci@0/isa@2/serial@0,3f8
        nvram                    /virtual-devices/nvram@3
        net3                     /pci@7c0/pci@0/pci@2/network@0,1
        net2                     /pci@7c0/pci@0/pci@2/network@0
        net1                     /pci@780/pci@0/pci@1/network@0,1
        net0                     /pci@780/pci@0/pci@1/network@0
        net                      /pci@780/pci@0/pci@1/network@0
        ide                      /pci@7c0/pci@0/pci@1/pci@0/ide@8
        cdrom                    /pci@7c0/pci@0/pci@1/pci@0/ide@8/cdrom@0,0:f
        disk3                    /pci@780/pci@0/pci@9/scsi@0/disk@3
        disk2                    /pci@780/pci@0/pci@9/scsi@0/disk@2
        disk1                    /pci@780/pci@0/pci@9/scsi@0/disk@1
        disk0                    /pci@780/pci@0/pci@9/scsi@0/disk@0
        disk                     /pci@780/pci@0/pci@9/scsi@0/disk@0
        scsi                     /pci@780/pci@0/pci@9/scsi@0
        virtual-console          /virtual-devices/console@1
        name                     aliases
        {0} ok printenv boot-device
        boot-device =           disk net
        {0} ok boot disk -rsmverbose
        Boot device: /pci@780/pci@0/pci@9/scsi@0/disk@0  File and args: -rsmverbose
        ufs-file-system
        Loading: /platform/SUNW,Sun-Fire-T200/boot_archive
        [snip...]
        [ milestone/single-user:default starting (single-user milestone) ]
        Requesting System Maintenance Mode
        SINGLE USER MODE

        Root password for system maintenance (control-d to bypass):
        single-user privilege assigned to /dev/console.
        Entering System Maintenance Mode

        Oct 15 12:16:21 su: 'su root' succeeded for root on /dev/console
        Sun Microsystems Inc.   SunOS 5.10      Generic January 2005

Ensure that we can now see both disks from within Solaris and fsck the
filesystems on disk1 (the mirror that we are not booted from):

        prefect [0] /usr/sbin/mount -a
        mount: /tmp is already mounted or swap is busy
        prefect [0] /usr/bin/df -h
        Filesystem             size   used  avail capacity  Mounted on
        /dev/dsk/c0t0d0s0       11G   4.1G   6.7G    38%    /
        /devices                 0K     0K     0K     0%    /devices
        ctfs                     0K     0K     0K     0%    /system/contract
        proc                     0K     0K     0K     0%    /proc
        mnttab                   0K     0K     0K     0%    /etc/mnttab
        swap                    14G   1.4M    14G     1%    /etc/svc/volatile
        objfs                    0K     0K     0K     0%    /system/object
        sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
        /platform/SUNW,Sun-Fire-T200/lib/libc_psr/libc_psr_hwcap1.so.1
                                11G   4.1G   6.7G    38%    /platform/sun4v/lib/libc_psr.so.1
        /platform/SUNW,Sun-Fire-T200/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1
                                11G   4.1G   6.7G    38%    /platform/sun4v/lib/sparcv9/libc_psr.so.1
        fd                       0K     0K     0K     0%    /dev/fd
        /dev/dsk/c0t0d0s3      5.9G   954M   4.9G    16%    /var
        swap                    14G     0K    14G     0%    /tmp
        swap                    14G     0K    14G     0%    /var/run
        /dev/dsk/c0t0d0s4       42G    43M    42G     1%    /space
        prefect [0] /usr/sbin/format
        Searching for disks...done


        AVAILABLE DISK SELECTIONS:
               0. c0t0d0 <LSILOGIC-LogicalVolume-3000 cyl 65533 alt 2 hd 16 sec 136>
                  /pci@780/pci@0/pci@9/scsi@0/sd@0,0
               1. c0t1d0 <LSILOGIC-LogicalVolume-3000 cyl 65533 alt 2 hd 16 sec 136>
                  /pci@780/pci@0/pci@9/scsi@0/sd@1,0
        Specify disk (enter its number): ^D
        prefect [1] for i in 0 3 4 ; do /usr/sbin/fsck -y /dev/rdsk/c0t1d0s${i}; done
        ** /dev/rdsk/c0t1d0s0
        ** Last Mounted on /
        ** Phase 1 - Check Blocks and Sizes
        ** Phase 2 - Check Pathnames
        ** Phase 3a - Check Connectivity
        ** Phase 3b - Verify Shadows/ACLs
        ** Phase 4 - Check Reference Counts
        ** Phase 5 - Check Cylinder Groups
        151110 files, 4236458 used, 7112386 free (6194 frags, 888274 blocks, 0.1% fragmentation)
        ** /dev/rdsk/c0t1d0s3
        ** Last Mounted on /var
        ** Phase 1 - Check Blocks and Sizes
        ** Phase 2 - Check Pathnames
        ** Phase 3a - Check Connectivity
        ** Phase 3b - Verify Shadows/ACLs
        ** Phase 4 - Check Reference Counts
        ** Phase 5 - Check Cylinder Groups
        21170 files, 970681 used, 5219373 free (1213 frags, 652270 blocks, 0.0% fragmentation)
        ** /dev/rdsk/c0t1d0s4
        ** Last Mounted on /space
        ** Phase 1 - Check Blocks and Sizes
        ** Phase 2 - Check Pathnames
        ** Phase 3a - Check Connectivity
        ** Phase 3b - Verify Shadows/ACLs
        ** Phase 4 - Check Reference Counts
        ** Phase 5 - Check Cylinder Groups
        2 files, 9 used, 44356384 free (8 frags, 5544547 blocks, 0.0% fragmentation)

Assuming that the slices come back clean, which they should, we need to
mount disk1's / and /var, set up a reconfiguration boot, and clean up
the device tree.  The 'devfsadm' command is pointed (via '-r') at /mnt,
where disk1's / is mounted.  The '-Cv' parameters tell devfsadm to clean
up stale device entries, add any newly found, and be verbose about what
it is doing:

        prefect [0] /usr/sbin/mount /dev/dsk/c0t1d0s0 /mnt
        prefect [0] /usr/sbin/mount /dev/dsk/c0t1d0s3 /mnt/var
        prefect [0] /usr/bin/touch /mnt/reconfigure
        prefect [0] /usr/sbin/devfsadm -r /mnt -Cv
        devfsadm[181]: verbose: no devfs node or mismatched dev_t for /mnt/devices/scsi_vhci:devctl
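
If any other slice on disk1 needs to be touched before the test boot (for
example /space on s4 on this host), it can be mounted under /mnt in the
same way, and then unmounted again, along with /mnt/var, before the
reboot.  A sketch only, using this host's slicing:

        # optional: mount disk1's /space under the temporary root
        /usr/sbin/mount /dev/dsk/c0t1d0s4 /mnt/space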

Since on our next reboot we will be booting off of disk1, disk1's vfstab
needs to be updated from the original copy.  The original copy mounts
filesystems from the old logical volume, c0t0d0; the entries need to be
rewritten to reference disk1's slices, thus c0t1d0:

        prefect [0] /usr/bin/cp /mnt/etc/vfstab /mnt/etc/vfstab.orig
        prefect [0] /usr/bin/sed -e 's;c0t0d0s;c0t1d0s;g' /mnt/etc/vfstab.orig > /mnt/etc/vfstab
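
A quick sanity check of the rewritten vfstab doesn't hurt; the diff below
is only a sketch and should show nothing but the c0t0d0 to c0t1d0 device
changes:

        # compare the original and rewritten copies
        /usr/bin/diff /mnt/etc/vfstab.orig /mnt/etc/vfstab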

Under the premise of performing updates, be it patching, creating new
files or configs, etc., the file /mnt/willitstay is created here, though
any of the mentioned actions could be performed instead.  For our
purposes, willitstay is purely illustrative: it does not exist on disk0,
though it will soon exist on disk1:

        prefect [0] echo "wonder if this will stay" >> /mnt/willitstay
        prefect [0] /usr/bin/cat /mnt/willitstay
        wonder if this will stay

Unmount disk1 and fsck the boot slice (s0).  Once done, do a
reconfiguration boot to 'single user' using disk1.

    * NOTE, as long as a reconfiguration boot using disk1 is performed,
      the host could otherwise be booted to 'multi-user' and brought
      up normally to allow the changes made to disk1 to be tested,
      used, etc.  For illustration purposes, the following details a
      boot using disk1 to 'single user':

        prefect [0] /usr/sbin/umount /mnt/var
        prefect [0] /usr/sbin/umount /mnt
        prefect [0] /usr/sbin/fsck -y /dev/rdsk/c0t1d0s0
        ** /dev/rdsk/c0t1d0s0
        ** Last Mounted on /mnt
        ** Phase 1 - Check Blocks and Sizes
        ** Phase 2 - Check Pathnames
        ** Phase 3a - Check Connectivity
        ** Phase 3b - Verify Shadows/ACLs
        ** Phase 4 - Check Reference Counts
        ** Phase 5 - Check Cylinder Groups
        151126 files, 4236474 used, 7112370 free (6178 frags, 888274 blocks, 0.1% fragmentation)
        prefect [0] reboot -- 'disk1 -rsmverbose'
        syncing file systems... done
        rebooting...
        [snip...]
        Boot device: /pci@780/pci@0/pci@9/scsi@0/disk@1  File and args: -rsmverbose
        ufs-file-system
        Loading: /platform/SUNW,Sun-Fire-T200/boot_archive
        [snip...]
        [ milestone/single-user:default starting (single-user milestone) ]
        Requesting System Maintenance Mode
        SINGLE USER MODE

        Root password for system maintenance (control-d to bypass):
        single-user privilege assigned to /dev/console.
        Entering System Maintenance Mode

        Oct 15 14:16:44 su: 'su root' succeeded for root on /dev/console
        Sun Microsystems Inc. SunOS 5.10      Generic January 2005

The following is simply validation of the changes that were made to
disk1 prior to booting off of it:

        prefect [0] /usr/bin/cat /willitstay
        wonder if this will stay
        prefect [0] /usr/sbin/mount -a
        mount: /tmp is already mounted or swap is busy
        prefect [1] /usr/bin/df -h
        Filesystem             size   used  avail capacity  Mounted on
        /dev/dsk/c0t1d0s0       11G   4.1G   6.7G    38%    /
        /devices                 0K     0K     0K     0%    /devices
        ctfs                     0K     0K     0K     0%    /system/contract
        proc                     0K     0K     0K     0%    /proc
        mnttab                   0K     0K     0K     0%    /etc/mnttab
        swap                    14G   1.4M    14G     1%    /etc/svc/volatile
        objfs                    0K     0K     0K     0%    /system/object
        sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
        /platform/SUNW,Sun-Fire-T200/lib/libc_psr/libc_psr_hwcap1.so.1
                                11G   4.1G   6.7G    38%    /platform/sun4v/lib/libc_psr.so.1
        /platform/SUNW,Sun-Fire-T200/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1
                                11G   4.1G   6.7G    38%    /platform/sun4v/lib/sparcv9/libc_psr.so.1
        fd                       0K     0K     0K     0%    /dev/fd
        /dev/dsk/c0t1d0s3      5.9G   954M   4.9G    16%    /var
        swap                    14G     0K    14G     0%    /tmp
        swap                    14G     0K    14G     0%    /var/run
        /dev/dsk/c0t1d0s4       42G    43M    42G     1%    /space
        prefect [0] /usr/sbin/raidctl -l
        Controller: 0
                Disk: 0.0.0
                Disk: 0.1.0
        prefect [0] /usr/sbin/raidctl -l -g 0.0.0 0
        Disk    Vendor  Product         Firmware        Capacity        Status  HSP
        ----------------------------------------------------------------------------
        0.0.0   FUJITSU MAY2073RCSUN72G 0501            68.3G           GOOD    N/A
        GUID:500000e01361c880
        prefect [0] /usr/sbin/raidctl -l -g 0.1.0 0
        Disk    Vendor  Product         Firmware        Capacity        Status  HSP
        ----------------------------------------------------------------------------
        0.1.0   SEAGATE ST973401LSUN72G 0556            68.3G           GOOD    N/A
        prefect [0] /usr/sbin/raidctl -l 0
        Controller      Type            Version
        ----------------------------------------------------------------
        c0              LSI_1064EE      1.09.00.00

As the changes to disk1 have been tested and validated, the system needs
to be set up to perform a reconfiguration boot at the next bootup.  Once
the host is down and the system has been reset, the new logical volume,
c0t1d0, will be created, using disk1 as the primary mirror and syncing
disk0 from disk1:

        prefect [0] /usr/bin/touch /reconfigure
        prefect [0] /usr/sbin/init 0
        prefect [0] svc.startd: The system is coming down.  Please wait.
        svc.startd: 55 system services are now being stopped.
        Oct 15 14:33:42 prefect syslogd: going down on signal 15
        svc.startd: The system is down.
        syncing file systems... done
        Program terminated
        {0} ok reset-all
        [snip...]
        Sun Fire T200, No Keyboard
        Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
        OpenBoot 4.30.3, 8064 MB memory available, Serial #75372526.
        Ethernet address 0:14:4f:7e:17:ee, Host ID: 847e17ee.

Below, the SCSI device is identified and selected, and we verify that
there are no existing volumes:

        {0} ok probe-scsi-all
        /pci@780/pci@0/pci@9/scsi@0

        MPT Version 1.05, Firmware Version 1.09.00.00

        Target 0
        Unit 0   Disk     FUJITSU MAY2073RCSUN72G 0501    143374738 Blocks, 73 GB
          SASAddress 500000e01361c882  PhyNum 0
        Target 1
        Unit 0   Disk     SEAGATE ST973401LSUN72G 0556    143374738 Blocks, 73 GB
          SASAddress 5000c500021551cd  PhyNum 1

        {0} ok show-disks
        a) /pci@7c0/pci@0/pci@1/pci@0/ide@8/cdrom
        b) /pci@7c0/pci@0/pci@1/pci@0/ide@8/disk
        c) /pci@780/pci@0/pci@9/scsi@0/disk
        q) NO SELECTION
        Enter Selection, q to quit: c
        /pci@780/pci@0/pci@9/scsi@0/disk has been selected.
        Type ^Y ( Control-Y ) to insert it in the command line.
        e.g. ok nvalias mydev ^Y
                 for creating devalias mydev for /pci@780/pci@0/pci@9/scsi@0/disk
        {0} ok select /pci@780/pci@0/pci@9/scsi@0
        {0} ok show-volumes
        No volumes to show

To set up our mirror, the disks need to be specified in the order of
primary then secondary.  If the order is reversed, the disk we intend to
keep becomes the secondary and is overwritten with the other disk's data.
As we've already modified and intend to use the data on disk1, our
primary disk is disk1.  The parameters to create-im-volume, the mirrored
volume creation command, are '1 0', thus disk1 followed by disk0.

    * NOTE, the cXtYdZ notation of the resulting logical volume is based
      upon the values of the primary physical disk.  As seen above,
      the probe-scsi-all reveals that controller 0, target 1, unit 0, is
      disk1.  (This could be further verified with a review of 'devalias'
      output.) Given the above, the new logical volume will be c0t1d0:

        {0} ok 1 0 create-im-volume
        Target 1 size is 143243264 Blocks, 73 GB
        Target 0 size is 143243264 Blocks, 73 GB
        The volume can be any size from 1 MB to 69943 MB

    * NOTE, when prompted for size, it seems that accepting the default
      does not work; the value must instead be typed in (even if it is
      the same value):

        What size do you want?  [69943] 69943
        Volume size will be 143243264 Blocks, 73 GB
        PhysDisk 0 has been created for target 1
        PhysDisk 1 has been created for target 0
        Volume has been created
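
As an aside, for a volume whose disks are not currently in use (not the
case for the root disk here), the same sort of IM volume can be created
from within Solaris with 'raidctl'; the disk listed first becomes the
primary whose data is kept.  The line below is a sketch only, using this
host's ctds names (see the comment thread below for caveats about disks
that have never been under HW RAID control):

        # from Solaris, for unused (non-root) disks only; the first disk
        # listed becomes the primary
        /usr/sbin/raidctl -c c0t1d0 c0t0d0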

A quick check of our new mirrored volume shows that it is still syncing.
(For this particular box, from the time the volume was created until it
reached an 'OPTIMAL' state, in multi-user mode, the elapsed time was
about 30 minutes.)  At this point, unselect the current device, and
reboot the host to disk1 with a reconfiguration boot:

        {0} ok show-volumes
        Volume 0 Target 1  Type IM (Integrated Mirroring)
          Degraded  Enabled  Resync In Progress
          2 Members                                         143243264 Blocks, 73 GB
          Disk 0
            Primary  Online
            Target 4        SEAGATE ST973401LSUN72G 0556 
          Disk 1
            Secondary  Online  Out Of Sync
            Target 0        FUJITSU MAY2073RCSUN72G 0501 
        {0} ok unselect-dev
        {0} ok reset-all
        [snip...]
        {0} ok boot disk1 -rmverbose
        [snip...]
        Boot device: /pci@780/pci@0/pci@9/scsi@0/disk@1  File and args: -rmverbose
        ufs-file-system
        Loading: /platform/SUNW,Sun-Fire-T200/boot_archive
        Loading: /platform/sun4v/boot_archive
        ramdisk-root hsfs-file-system
        Loading: /platform/SUNW,Sun-Fire-T200/kernel/sparcv9/unix
        [snip...]
        [ network/ssh:default starting (SSH server) ]
        [ application/management/sma:default starting (net-snmp SNMP daemon) ]

        prefect console login: [ milestone/multi-user:default starting (multi-user milestone) ]
        prefect console login: root
        Password:
        Oct 15 15:22:00 prefect login: ROOT LOGIN /dev/console
        Last login: Thu Oct 15 14:24:22 on console
        Sun Microsystems Inc.   SunOS 5.10      Generic January 2005

Once the box is back up to multi-user, a quick check shows that we are
booted off of disk1 and that our updates to the system still hold
(/willitstay).  A further look shows that we are actually booted off of
the logical volume c0t1d0 and that the volume is still syncing (as
stated earlier, it stays in a SYNC state for about 30 minutes before
going OPTIMAL; the first disk in the 'raidctl' volume output is the
primary):

        prefect [0] /usr/bin/df -h
        Filesystem             size   used  avail capacity  Mounted on
        /dev/dsk/c0t1d0s0       11G   4.1G   6.7G    38%    /
        /devices                 0K     0K     0K     0%    /devices
        ctfs                     0K     0K     0K     0%    /system/contract
        proc                     0K     0K     0K     0%    /proc
        mnttab                   0K     0K     0K     0%    /etc/mnttab
        swap                    14G   1.5M    14G     1%    /etc/svc/volatile
        objfs                    0K     0K     0K     0%    /system/object
        sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
        /platform/SUNW,Sun-Fire-T200/lib/libc_psr/libc_psr_hwcap1.so.1
                                11G   4.1G   6.7G    38%    /platform/sun4v/lib/libc_psr.so.1
        /platform/SUNW,Sun-Fire-T200/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1
                                11G   4.1G   6.7G    38%    /platform/sun4v/lib/sparcv9/libc_psr.so.1
        fd                       0K     0K     0K     0%    /dev/fd
        /dev/dsk/c0t1d0s3      5.9G   954M   4.9G    16%    /var
        swap                    14G     0K    14G     0%    /tmp
        swap                    14G    16K    14G     1%    /var/run
        /dev/dsk/c0t1d0s4       42G    43M    42G     1%    /space
        prefect [0] /usr/bin/cat /willitstay
        wonder if this will stay
        prefect [0] /usr/sbin/raidctl -l
        Controller: 0
                Volume:c0t1d0
                Disk: 0.0.0
                Disk: 0.1.0
        prefect [0] /usr/sbin/raidctl -l c0t1d0
        Volume                  Size    Stripe  Status   Cache  RAID
                Sub                     Size                    Level
                        Disk
        ----------------------------------------------------------------
        c0t1d0                  68.3G   N/A     SYNC     OFF    RAID1
                        0.1.0   68.3G           GOOD
                        0.0.0   68.3G           GOOD

At this point, the system has been remirrored after the updates were
tested, and brought back online.  As a final thought, a few other
settings still need to be updated to make this a hassle-free solution:
'fcode-debug?' and 'auto-boot?' are returned to their normal values,
'boot-device' is pointed at disk1, and the dump device is moved to the
new volume:

        prefect [0] /usr/sbin/eeprom fcode-debug?=false
        prefect [0] /usr/sbin/eeprom auto-boot?=true
        prefect [0] /usr/sbin/eeprom boot-device="disk1 net"
        prefect [0] /usr/sbin/dumpadm -d /dev/dsk/c0t1d0s1
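
If desired, the new settings can be double-checked before walking away;
the following is only a sketch, and the exact output will vary:

        # confirm the OBP boot device, the dump device, and the resync state
        /usr/sbin/eeprom boot-device
        /usr/sbin/dumpadm
        /usr/sbin/raidctl -l c0t1d0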


see also:
    Breaking and Syncing an SVM Root Mirror

24 comments:

Anonymous said...

Thank you very much for your detailed explanation.

Just one thing, if I have it raid 1 like you and I just want to break the mirror and later to use SVM to mirror between the first and second disk. do I have to do all these steps?

because I don't have any test env. and I have to do directly in the production env. and I'm afraid to lose any data.

troy said...

Anonymous,

You are very welcome. I'm glad the post is useful. Before answering your question, I would highly recommend having a backup of your data before proceeding. While I've used these same steps numerous times before, problems can still occur and you are mucking with your root disk(s). Remember, your data is still your risk.

Now to answer your question, if you only want to break the mirror and stop using the hardware RAID, then you would essentially only need to do all of the steps up to the first reboot to single user. Once you have run '0 delete-volume' and 'reset-all', your hardware mirror no longer exists and you are free to use either "disk 0" or "disk 1" for your root disk and boot to multiuser. (Don't forget to do a reconfiguration boot. Also, you would still need to update the ctds values accordingly in your vfstab on the secondary disk if you plan to boot to it.) I think I would still boot to single user though, if only to 'fsck' both disks' slices. After that, if you want to set up SVM to handle the mirroring, you would simply need to pick up at "re-initialize SVM control" in the post on "Breaking and Syncing an SVM Root Mirror".

Let me know if you have any additional questions.

--troy

Michele Vecchiato said...

Troy, thanks for posting this "precious" item! It's just what I wanted! I also tried the My Oracle Support (MOS) portal, but I had not found anything that was right for me, until I found your post! Thank you!

Bye
Michele

troy said...

Michele,

I'm glad the post is useful. I remember looking for this same info on (then) sunsolve and finding bits and pieces but nothing completely detailed. Interestingly, I know of a few Sun / Oracle FEs that have referred to this post as well. Anyway, thanks for the comment and I'm happy the post was useful.

--troy

Ramdev said...

Hi Troy, Excellent work. Thanks for sharing this to everyone.

troy said...

Ramdev,

You're welcome. I'm happy it was useful.

--troy

Al said...

Very useful article, thanks very much.

I need to carry out this exact procedure on a server, but it's not SPARC, it's a Sun Blade X6220.

Have you any experience of doing such a thing ?

Thanks
Al

troy said...

Al,

I personally do not have any experience doing this on an X6220 blade; however, after some quick research, I may have a solution for you. I'm assuming you have an LSI controller, based on various docs related to the X6220. By way of a chain of pdfs starting with the X6220 maint. guide, I came to the "Sun LSI 106x RAID User's Guide" (http://docs.oracle.com/cd/E19591-01/820-4933/820-4933.pdf). Specifically, on pages 19 and 20, it details the setup of a RAID 1 config from the BIOS, noting that for the first disk selected you can optionally choose not to overwrite its data. While the doc doesn't detail array deletion from the BIOS config, provided the array in question is a simple RAID 1 array, your data shouldn't be otherwise munged or lost. At that point, it would be your choice of how to further proceed, with "disk 1" or "disk 2". Though I haven't done this for an X6220 blade, I have done similar on other x86 hardware and haven't yet seen an issue. With that in mind, if available, I would still try it out on a test blade if you can.

Hope that helps, and honestly, I wouldn't mind knowing how this works out for you.

--troy

Tan said...

Hi Troy,

Really Great post. Thanks.

I have the exact same setup and I need to do a major upgrade in this production environment and am quite concerned about it. Also, I cannot afford to go thru the above procedure as it is a very critical environment and do not want to break the existing setup.

Fortunately, I managed to get a similar 73G disk and am planning to insert it in the server, then use SVM to mirror the current volume, then unmirror it, boot from either volume, and then proceed with the upgrade. Do you know if I can use SVM to mirror a volume under raidctl?

Would appreciate your feedback.

troy said...

Tan,

I don't see why not; I don't know of anything that would prohibit you from creating an SVM mirror using a LUN under hardware RAID control. At the same time, if the disks under hardware RAID control are simply mirrored, I've never personally seen an instance where breaking a mirror has caused loss of data; you just end up with two copies, one for each side of the mirror. The only point of note in that situation would be that the vfstab and boot devices would need to be updated to account for the possible change in CTDs, but you'll need to do that anyway if you layer SVM over the top of the hardware volume. Hope that helps. Let me know how things turn out.

--troy

Tan said...

Hi Troy,

Thanks for the clarification. Since I have never previously "played" with raidctl, I am planning to have the additional disk set up on my Dev server, create a 3-way mirror of the boot disk with SVM (currently already mirrored), then break all 3 mirrors, boot from disk #3, create a logical volume using disks 1 & 2, boot from it, and then try your above-mentioned procedure just to feel comfortable with it before doing the same on the Prod server.

BTW, can you tell me what the '-rsmverbose' argument is used for? I have not been able to find any documentation about it.

Thanks again.

troy said...

Tan,

Not a problem at all. Good idea with testing in dev first, whenever I'm unfamiliar with a procedure, even if I have no reason to doubt it, I try to test it first. Hope all goes well for you.

Regarding '-rsmverbose', prior to Solaris 10, the option would have been '-rsv'. When Solaris 10 came into being, Sun added the SMF framework, so '-v' was replaced with '-m verbose' for verbose boot time output of Solaris. The '-r' tells Solaris to perform a reconfiguration boot to identify the "new" disks / devices and populate them into the /dev tree. Option '-s' tells Solaris to boot into single user mode. I simply ran all three together to become '-rsmverbose', which Solaris will recognize appropriately. Let me know if I can be of any further assistance and how your upgrade goes.

--troy

Tan said...

Hi Troy,

I am ready to set up mirroring at the hardware level with raidctl. I have managed to boot both from my 3rd disk (disk2) as well as from the SVM mirrored volume (disks 0 & 1).

My setup now is as follows:
CMD >df -h /
Filesystem size used avail capacity Mounted on
/dev/dsk/c1t2d0s0 59G 32G 27G 54% /

CMD >format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0

1. c1t1d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@1,0

2. c1t2d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@2,0

So to set up the hw raid, WHILE PRESERVING THE DATA ON DISK0, should I just issue the following command from the OK prompt:
{0} ok 0 1 create-im-volume

Or should I boot from disk2 and issue the following command:

raidctl -c c1t0d0 c1t1d0.

Please note that my intent is to still have the OS image present on disk0 and disk1 available on the new logical volume that will be created by raidctl.

Thanks

troy said...

Tan,

A couple of things. If you are booted off of disk2 (c1t2) and you are looking to hardware mirror disk0 and disk1 together, then you will not want disk0 and disk1 mirrored together via SVM. The hardware mirror, assuming disk0 is the disk of data to be mirrored, will then present c1t0d0 to be used by Solaris. This means SVM would lose a disk in its current configuration. If my understanding of your existing configuration is accurate, break your SVM mirror first before hardware mirroring disk0 and disk1.

As for your question, you are currently booted from a root mirror via disk2. From here, yes, you could use:

raidctl -c c1t0d0 c1t1d0

to create your hardware mirror. The alternative is to reboot the host and work from OBP as you've shown:

{0} ok 0 1 create-im-volume

In either case, the data on disk0 is preserved and replicated / mirrored to disk1. In both cases, the resulting disk should present as c1t0d0. Hope that helps.

--troy

Tan said...

Hi,

Had all 3 disks out of SVM and managed to boot from each one of them in turn. I then proceeded to create the hw raid after booting from disk2. Here are my outputs

CMD >raidctl -c c1t0d0 c1t1d0
Creating RAID volume will destroy all data on spare space of member disks, proceed (yes/no)? yes
Volume c1t0d0 is created successfully!

CMD >format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0
1. c1t1d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@1,0
2. c1t2d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@2,0
Specify disk (enter its number):

CMD >raidctl -l
Controller: 1
Volume:c1t0d0
Disk: 0.0.0
Disk: 0.1.0
Disk: 0.2.0

CMD >raidctl -l -g 0.0.0 0
Controller device can not be found.

CMD >raidctl -l -g 0.0.0 1
Disk Vendor Product Firmware Capacity Status HSP
-----------------------------------
0.0.0 FUJITSU MAY2073RCSUN72G 0401 68.3G GOOD N/A

CMD >raidctl -l -g 0.1.0 1
Disk Vendor Product Firmware Capacity Status HSP
-----------------------------------
0.1.0 FUJITSU MAY2073RCSUN72G 0701 68.3G GOOD N/A

CMD >raidctl -l -g 0.2.0 1
Disk Vendor Product Firmware Capacity Status HSP
-----------------------------------
0.2.0 FUJITSU MAY2073RCSUN72G 0501 68.3G GOOD N/A
GUID:500000e018f50290

When I tried to boot from disk0, the following messages are displayed:

os-io WARNING: /pci@7c0/pci@0/pc
SC Alert: Host System has Reset
i@1/pci@0,2/LSILogic,sas@2/sd@0,0 (sd1):
Corrupt label - bad geometry

Label says 143359488 blocks; Drive says 143243264 blocks
WARNING: /pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0 (sd1):
Corrupt label - bad geometry

Label says 143359488 blocks; Drive says 143243264 blocks
WARNING: /pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0 (sd1):
Corrupt label - bad geometry

Label says 143359488 blocks; Drive says 143243264 blocks
Cannot mount root on /pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/disk@0,0:a fstype ufs

panic[cpu0]/thread=180e000: vfs_mountroot: cannot mount root

After rebooting server from disk2:

CMD >format
Searching for disks...done

c1t0d0: configured with capacity of 68.00GB

AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0
1. c1t2d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@2,0
Specify disk (enter its number): 0
selecting c1t0d0
[disk formatted]
Disk not labeled. Label it now?


Here I did label the disk.

I think that when raidctl is used for the first time, it wipes out all the data from the target disk to create the volume. Can you confirm please?

Also, I wanted to use the newly added logical volume and mirror it with disk2 using SVM. To do so, I tried to format the disk c1t0d0 so as to have the same layout as c1t2d0.

CMD >prtvtoc /dev/rdsk/c1t0d0s2 > vtoc.c1t0d0s2

CMD >fmthard -s vtoc.c1t2d0s2 /dev/rdsk/c1t0d0s2
Partition 0 not aligned on cylinder boundary: " 0 2 00 0 126569088 126569087 /"

I am getting the above message. Any idea please?

Thanks.

Tan said...

Hi Troy,

I found the following on Pg 4 of "Sun Fire™ T2000 Server Disk
Volume Management Guide"

Caution – Creating RAID volumes using the on-board disk controller destroys all data on the member disks. The disk controller’s volume initialization procedure
reserves a portion of each physical disk for metadata and other internal information
used by the controller. Once the volume initialization is complete, you can configure
the volume and label it using format(1M). You can then use the volume in the
Solaris operating system.

So, I understand that: (i) when raidctl is used for the 1st time, it wipes out all data on the member disks.
(ii) it reserves part of the disk in the newly created volume for metadata. Thus we cannot have it with the same layout as desired with another disk. Given this restriction, I doubt that we can mirror a volume under raidctl with another disk using SVM.

I would appreciate your views/comments please.

Thanks.

troy said...

Tan,

Sorry for not having the time to get back to you sooner. In your last comment, you've included a bit about destruction of data; this point is correct. If the disks in question have not been configured for HW raid, then any data on them will be at least corrupted if not destroyed. If, however, you have had both disks under HW raid control and have subsequently sliced up the raid presented mirror for filesystems, etc. (which I believe was your system state when you started this), then you shouldn't have any data loss or destruction since the space reserved by the raid controller would still be "reserved" even if it isn't being used. Thus when you subsequently try to mirror those disks, the controller won't be overwriting any of your data, only the previous raid reservation on disk.

Regarding your current situation, you have the following:

- c1t2d0 (stand alone root "mirror")
- c1t0d0 (HW raid mirror of "c1t0d0" and "c1t1d0")

Keep your existing setup as is, don't change it. Using 'format' on c1t0d0, create the same filesystems as you have on c1t2d0. (If all disk space is used, at least one slice on the raid presented disk will be slightly smaller due to the raid reservation on c1t0d0 and c1t1d0. I'll normally short "swap" if it is available as one of the slices.) Once you have recreated your filesystems with 'format' on c1t0d0, label c1t0d0, quit format, and use 'newfs' to set up UFS filesystems on each of the new slices on c1t0d0. Now you'll need to re-mirror your root data from c1t2d0 to c1t0d0. I'll personally tend to use ufsdump and ufsrestore (for reference, see "step 3" here). Once you have "re-mirrored" your root data from c1t2d0 to the HW raid presented c1t0d0, you should be back to where you were before you started. At that point, the steps within this post / article will be relevant again to what you're trying to accomplish (breaking your HW mirror to patch while still having the availability to back out). In fact, once you are at this point, c1t2d0 is completely unnecessary (but I can understand keeping it in case things similarly go awry again).

Hopefully this helps to resolve your situation. My apologies if I wasn't clear before, I was assuming when you asked about 'raidctl -c c1t0d0 c1t1d0' you had simply broken the original mirror leaving the raid reservation space on the disks still intact.

Let me know if I can be of any further assistance and how things work out for you.

--troy

Tan said...

Hi Troy,

I managed to run the process and it worked very smoothly end-to-end, including the cloning of the root disk, which was very helpful.

Thanks Buddy, Great job.

Bye

troy said...

Tan,

Excellent news. I'm glad I could help and to hear that it all worked out for you.

--troy

Suman Kumar Mandal said...

Hi Troy,

Thanks for such a wonderful document.

In your example you have used the following disk

0. c0t0d0 [LSILOGIC-LogicalVolume-3000 cyl 65533 alt 2 hd 16 sec 136]
/pci@780/pci@0/pci@9/scsi@0/sd@0,0
1. c0t1d0 [LSILOGIC-LogicalVolume-3000 cyl 65533 alt 2 hd 16 sec 136]
/pci@780/pci@0/pci@9/scsi@0/sd@1,0

c0t1d0 has the latest data and the system is booted from this disk. To create the HW RAID mirror, is it necessary to go to the OK prompt and create the mirror, or can it be done at the OS prompt itself?

troy said...

Suman,

This write up assumes that the disks in question contain data and are currently, or were just recently, under HW raid control, as the RAID controller stores some configuration data on the disks. If the disks were not previously under HW raid control, that disk "reservation" doesn't exist, and thus you risk data loss in performing only these steps.

With the disclaimer out of the way, because you are dealing with the root disk, yes, you will need to perform the steps from either the OK prompt or else booted from a recovery / install [CD|DVD]ROM (or an alternate root volume not on c0t0d0 or c0t1d0). If memory serves, 'raidctl' will fail upon determining the disk is in use and notify you of such. If you had the availability to remove the disk in question from current usage then yes, you could do this from the OS prompt, but since you are booted from this disk the only way to remove it from usage is to boot to an alternate root medium or perform the tasks from the OK prompt.

Hope that helps.

--troy

Suman Kumar Mandal said...

Hi Troy,

Thanks a lot for the reply and for the clear explanation.

I need to do this to patch a critical server and roll back to the old state if something goes wrong. Your steps are very clear. I am thinking of another way, please correct me if I am wrong.

Lets take your example

prefect [0] /usr/sbin/raidctl -l c0t0d0
Volume Size Stripe Status Cache RAID
Sub Size Level
Disk
----------------------------------------------------------------
c0t0d0 68.3G N/A OPTIMAL OFF RAID1
0.0.0 68.3G GOOD
0.1.0 68.3G GOOD

What I am thinking: just pull out the second disk from the server, keep it safe, and continue with the patching, so only the first disk will have the latest patches. In case something goes wrong, remove the first disk, plug in the second disk, and boot the system, which will have the old patch levels. Then plug the first disk back in to sync the data from the second disk to the first disk. Please let me know if it is possible.

troy said...

Suman,

Technically speaking, I believe the way you are thinking of going should work. You're effectively forcing a "faulted disk" scenario; I know I've done similar in the past. It's been quite a while, however, since I've gone that route so I'm not sure what, if any, pitfalls you may encounter with that approach. It should just be a matter of accurately maintaining which disk is which and syncing the correct one depending on the outcome (but again, it's been a while).

The above write up keeps a sane and stable environment throughout; what you are speaking of, while feasible, diminishes that (not to say that it won't work). Out of curiosity, why would you consider simply pulling the disk rather than maintaining full control of the process?

Anyhow, whichever method you choose, I hope all goes well for you in your patching. I wouldn't mind knowing which method you decide to go with, how it turns out, and if you run into any unexpected handling.

--troy

Anonymous said...

God bless you, this procedure has saved a massive amount of grief!!

I can't thank you enough.