19 October 2010

Creating a Mirrored Volume with Geom (FreeBSD)

INTRO

While it's not uncommon to find a Solaris host with some sort of RAID
configuration in place, in some environments RAID configurations on
FreeBSD seem to be regarded with trepidation.  The information
detailed below aims to overcome this hesitancy by walking through the
creation of a simple mirrored (RAID1) volume using FreeBSD's current
standard, GEOM.  Though the following details GEOM on FreeBSD 8.1, the
GEOM mirror class (gmirror) first appeared in FreeBSD 5.3.  The following
should be usable on any recent FreeBSD release, though perhaps with
minor changes.

HOST INFO

        Host:                   beastie
        Shell Prompt:           beastie [0]
        OS:                     FreeBSD 8.1
        Volume Disks:           da1
                                da2
        Slice / Partitions:     s1a (12 MB (both disks))
                                s1e (500 MB (both disks))
        Disk Size:              512 MB
        Geom Mirror Vol:        mirror0
        Geom pseudo devpath:    /dev/mirror/mirror0
        Mount Point:            /mnt/mirror0

COMMANDS (quick)

The following lists just the commands, without output, for creating,
stopping, and destroying a mirrored (RAID1) volume with geom.  For
additional details, including output and status checks, see 'COMMANDS
(detailed)' in the section following this one.

Create a mirrored volume:

        beastie [0] /bin/dd if=/dev/zero of=/dev/da1 bs=512 count=32
        beastie [0] /sbin/fdisk -BI /dev/da1
        beastie [0] /bin/dd if=/dev/zero of=/dev/da1s1 bs=512 count=32
        beastie [0] /sbin/bsdlabel -w da1s1
        beastie [0] /sbin/bsdlabel -e da1s1
        beastie [0] /sbin/bsdlabel -A /dev/da1s1 > /tmp/da1s1.out
        beastie [0] /bin/dd if=/dev/zero of=/dev/da2 bs=512 count=32
        beastie [0] /sbin/fdisk -BI /dev/da2
        beastie [0] /bin/dd if=/dev/zero of=/dev/da2s1 bs=512 count=32
        beastie [0] /sbin/bsdlabel -w da2s1
        beastie [0] /sbin/bsdlabel -R da2s1 /tmp/da1s1.out
        beastie [0] /sbin/gmirror load
        beastie [0] /sbin/gmirror label -hvb round-robin mirror0 /dev/da1s1e
        beastie [0] /sbin/gmirror insert -h mirror0 da2s1e
        beastie [0] /sbin/newfs /dev/mirror/mirror0
        beastie [0] /bin/mkdir /mnt/mirror0
        beastie [0] /sbin/mount /dev/mirror/mirror0 /mnt/mirror0
        beastie [0] echo "`/sbin/mount -p | /usr/bin/grep mirror0`" >> /etc/fstab
        beastie [0] echo 'geom_mirror_load="YES"' >> /boot/loader.conf

Stop and start a mirrored volume:

        beastie [0] /sbin/umount /mnt/mirror0
        beastie [0] /sbin/gmirror stop mirror0
        beastie [0] /sbin/gmirror activate -v mirror0 da1s1e
        beastie [0] /sbin/gmirror activate -v mirror0 da2s1e
        beastie [0] /sbin/mount /dev/mirror/mirror0 /mnt/mirror0

Destroy a mirrored volume, free its disks:

        beastie [0] /sbin/umount /mnt/mirror0
        beastie [0] /sbin/gmirror stop mirror0
        beastie [0] /sbin/gmirror clear da1s1e
        beastie [0] /sbin/gmirror clear da2s1e
        beastie [0] /sbin/gmirror dump da2s1e

COMMANDS (detailed)

This section details 'Setting Up the Disks'.  For the purpose of
discussion, it is assumed that both da1 and da2 are freshly installed
disks and thus need to be configured first.  To start, zero out any
existing partition (slice) table and use 'fdisk' to create a single
FreeBSD slice encompassing the entire disk ('-I' initializes the slice
table with one slice covering the whole disk, '-B' reinitializes the
boot code in sector 0):

        beastie [0] /bin/dd if=/dev/zero of=/dev/da1 bs=512 count=32
        32+0 records in
        32+0 records out
        16384 bytes transferred in 0.017386 secs (942370 bytes/sec)
        beastie [0] /sbin/fdisk -BI /dev/da1
        ******* Working on device /dev/da1 *******
        fdisk: invalid fdisk partition table found
        fdisk: Class not found
        beastie [0]

The errors above from 'fdisk' can be safely ignored.  For the curious,
the first, 'invalid fdisk ...', merely states that 'fdisk' doesn't see
a valid partition table on the disk, which is expected since we just
zeroed it.  If one were to rerun the 'fdisk' command, this error would
not appear.  The second, 'Class not found', is thanks to GEOM and the
obsoleting / removal of some of its modules, while the underlying fdisk
code hasn't been updated with knowledge of such.
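
Before pointing 'dd' at a real disk, the zeroing step used above can be
rehearsed against a scratch file.  The following is only a sketch; the
scratch file name is hypothetical and no disk device is touched:

```shell
#!/bin/sh
# Rehearse the zeroing step on a scratch file rather than a real disk.
# 'scratch' is a hypothetical temporary file, not a device from this article.
scratch=$(mktemp /tmp/ddtest.XXXXXX) || exit 1

# Same block size and count as used against /dev/da1 above.
dd if=/dev/zero of="$scratch" bs=512 count=32 2>/dev/null

# 32 blocks of 512 bytes should come to 16384 bytes.
zeroed_bytes=$(wc -c < "$scratch" | tr -d ' ')
echo "zeroed $zeroed_bytes bytes"

rm -f "$scratch"
```

The byte count matches the '16384 bytes transferred' line in the 'dd'
output shown earlier.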

Next, zero out the first blocks of the newly created slice and partition
the slice with 'bsdlabel':

        beastie [0] /bin/dd if=/dev/zero of=/dev/da1s1 bs=512 count=32
        32+0 records in
        32+0 records out
        16384 bytes transferred in 0.018533 secs (884045 bytes/sec)
        beastie [0] /sbin/bsdlabel -w da1s1
        beastie [0] /sbin/bsdlabel -e da1s1

The first 'bsdlabel' creates a generic label for da1s1, which we further
edit with the second 'bsdlabel' command.  The 'edit' version effectively
opens a 'vi' session (or whatever the EDITOR environment variable is set
to) so that you can directly edit the partition table.  The following is
an example of this edit session:

        # /dev/da1s1:
        8 partitions:
        #        size   offset    fstype   [fsize bsize bps/cpg]
          a:  12m           16    unused        0     0
          c:  1048544        0    unused        0     0         # "raw" part, don't edit
          e:    *            *    unused        0     0

Of note, for partition 'e', the first * tells bsdlabel to use the
remaining unallocated portion of the slice for this partition.  The
second * tells bsdlabel to automatically determine the appropriate offset
based on the preceding configured partitions.

Save and quit the editor when done, then verify the new partition table with:

        beastie [0] /sbin/bsdlabel -A /dev/da1s1
        # /dev/da1s1:
        type: unknown
        disk: amnesiac
        label:
        flags:
        bytes/sector: 512
        sectors/track: 32
        tracks/cylinder: 64
        sectors/cylinder: 2048
        cylinders: 511
        sectors/unit: 1048544
        rpm: 3600
        interleave: 1
        trackskew: 0
        cylinderskew: 0
        headswitch: 0           # milliseconds
        track-to-track seek: 0  # milliseconds
        drivedata: 0

        8 partitions:
        #        size   offset    fstype   [fsize bsize bps/cpg]
          a:    24576       16    unused        0     0      
          c:  1048544        0    unused        0     0         # "raw" part, don't edit
          e:  1023952    24592    unused        0     0       
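
The sizes and offsets reported above follow from plain sector arithmetic.
The sketch below recomputes them from the values entered in the edit
session (the variable names are illustrative only): '12m' becomes a
sector count, and the two '*' fields for partition 'e' resolve to the
first free sector and whatever remains of the slice.

```shell
#!/bin/sh
# Values taken from the label shown above; variable names are illustrative.
sector_bytes=512
raw_sectors=1048544      # size of the 'c' (raw) partition, in sectors
a_offset=16              # offset of partition 'a', in sectors
a_mb=12                  # requested size of 'a' ("12m")

# "12m" expressed in 512-byte sectors.
a_sectors=$(( a_mb * 1024 * 1024 / sector_bytes ))

# 'e' starts where 'a' ends and takes the rest of the slice.
e_offset=$(( a_offset + a_sectors ))
e_sectors=$(( raw_sectors - e_offset ))

echo "a: size=$a_sectors offset=$a_offset"
echo "e: size=$e_sectors offset=$e_offset"
```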

Since both da1 and da2 have the same geometry, and our intent is to
configure them identically, we save a copy of the newly defined partition
table to be used as input to da2.  The following fully configures da2
based on da1:

        beastie [0] /sbin/bsdlabel -A /dev/da1s1 > /tmp/da1s1.out
        beastie [0] /bin/dd if=/dev/zero of=/dev/da2 bs=512 count=32
        32+0 records in
        32+0 records out
        16384 bytes transferred in 0.017199 secs (952611 bytes/sec)
        beastie [0] /sbin/fdisk -BI /dev/da2
        ******* Working on device /dev/da2 *******
        fdisk: invalid fdisk partition table found
        fdisk: Class not found
        beastie [0] /bin/dd if=/dev/zero of=/dev/da2s1 bs=512 count=32
        32+0 records in
        32+0 records out
        16384 bytes transferred in 0.017200 secs (952559 bytes/sec)
        beastie [0] /sbin/bsdlabel -w da2s1
        beastie [0] /sbin/bsdlabel -R da2s1 /tmp/da1s1.out
        beastie [0] /sbin/bsdlabel -A da2s1
        # /dev/da2s1:
        type: unknown  
        disk: amnesiac
        label:
        flags: 
        bytes/sector: 512
        sectors/track: 32
        tracks/cylinder: 64
        sectors/cylinder: 2048
        cylinders: 511
        sectors/unit: 1048544
        rpm: 3600
        interleave: 1
        trackskew: 0
        cylinderskew: 0
        headswitch: 0           # milliseconds
        track-to-track seek: 0  # milliseconds
        drivedata: 0

        8 partitions:
        #        size   offset    fstype   [fsize bsize bps/cpg]
          a:    24576       16    unused        0     0
          c:  1048544        0    unused        0     0         # "raw" part, don't edit
          e:  1023952    24592    unused        0     0
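
Rather than comparing the two labels by eye, saved copies can be compared
once comments and trailing whitespace are stripped.  This is a minimal
sketch, not part of the original procedure; the helper names and scratch
files are hypothetical, and the demonstration uses abbreviated labels:

```shell
#!/bin/sh
# Strip comments, trailing whitespace, and blank lines from a saved label
# so that only the settings and partition entries remain.
normalize_label() {
    sed -e 's/[[:space:]]*#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Succeeds (exit 0) when two saved labels describe the same layout.
same_label() {
    [ "$(normalize_label "$1")" = "$(normalize_label "$2")" ]
}

# Demonstration on two scratch files holding abbreviated labels.
label1=$(mktemp /tmp/label1.XXXXXX) || exit 1
label2=$(mktemp /tmp/label2.XXXXXX) || exit 1
printf '%s\n' '# /dev/da1s1:' 'type: unknown' \
    '  a:    24576       16    unused        0     0      ' \
    '  e:  1023952    24592    unused        0     0       ' > "$label1"
printf '%s\n' '# /dev/da2s1:' 'type: unknown  ' \
    '  a:    24576       16    unused        0     0' \
    '  e:  1023952    24592    unused        0     0' > "$label2"

if same_label "$label1" "$label2"; then
    label_check="identical"
else
    label_check="different"
fi
echo "labels are $label_check"
rm -f "$label1" "$label2"
```

On a real host the two inputs would be the saved output of
'bsdlabel -A' for each slice, for example /tmp/da1s1.out and a similarly
saved copy for da2s1.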

This section of commands covers 'Geom Configuration'.  With our disks
configured, the mirror is actually simple to create:

        beastie [0] /sbin/gmirror load
        beastie [0] /sbin/gmirror load
        gmirror: Command 'load' not available.
        beastie [1] /sbin/kldstat | /usr/bin/grep mirror
        2    1 0xc4ba3000 15000    geom_mirror.ko
        beastie [0] /sbin/gmirror label -hvb round-robin mirror0 /dev/da1s1e
        Metadata value stored on da1s1e.
        Done.
        beastie [0] /sbin/gmirror insert -h mirror0 da2s1e

In the above, the first two commands, both 'gmirror load', simply load
the geom mirroring kernel module.  (This should be unnecessary on recent
versions of FreeBSD, as geom will automatically load it for you, but it
is included for older hosts.  Alternatively, '/sbin/kldload geom_mirror'
could have been used in place of 'gmirror load'.)  The second load fails
because the module is already loaded; it was included just to illustrate
what to expect in that case.  The 'gmirror label' command breaks down
as follows:

        label           create mirror
        -h              hardcode the component's provider name (da1s1e)
                        in the metadata stored on that component; the
                        component is then only accepted into the mirror
                        when it appears under that same name (see
                        'hcprovider' in the 'gmirror dump' output below)
        -v              verbose output
        -b round-robin  balance method for read I/O
        mirror0         name of the new provider
        /dev/da1s1e     first device in the mirror

The 'gmirror insert' command simply adds da2s1e to the existing provider,
mirror0, again hardcoding the provider name (da2s1e) in that component's
metadata.  (Of note, executing 'gmirror [cmd args]' is the same as
executing 'geom mirror [cmd args]'.)  Output to the system console
suggests all things went well:

        GEOM_MIRROR: Device mirror/mirror0 launched (1/1).
        GEOM_MIRROR: Device mirror0: rebuilding provider da2s1e.
        GEOM_MIRROR: Device mirror0: rebuilding provider da2s1e finished.

We now have a fully functional geom mirror that can be seen in the dev tree:

        beastie [0] /bin/ls -ld /dev /dev/mirror /dev/mirror/mirror0
        dr-xr-xr-x  8 root  wheel          512 Sep 30 07:18 /dev/
        dr-xr-xr-x  2 root  wheel          512 Sep 30 11:44 /dev/mirror/
        crw-r-----  1 root  operator    0,  89 Sep 30 11:44 /dev/mirror/mirror0

To get current status of the geom mirror:

        beastie [0] /sbin/gmirror list mirror0
        Geom name: mirror0
        State: COMPLETE
        Components: 2
        Balance: round-robin
        Slice: 4096
        Flags: NONE
        GenID: 0
        SyncID: 1
        ID: 1723101773
        Providers:
        1. Name: mirror/mirror0
           Mediasize: 524262912 (500M)
           Sectorsize: 512
           Mode: r0w0e0
        Consumers:
        1. Name: da1s1e
           Mediasize: 524263424 (500M)
           Sectorsize: 512
           Mode: r1w1e1
           State: ACTIVE
           Priority: 0
           Flags: HARDCODED
           GenID: 0
           SyncID: 1
           ID: 3286280006
        2. Name: da2s1e
           Mediasize: 524263424 (500M)
           Sectorsize: 512
           Mode: r1w1e1
           State: ACTIVE
           Priority: 0
           Flags: HARDCODED
           GenID: 0
           SyncID: 1
           ID: 3273941562
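
In the listing above, the provider (mirror/mirror0) is 512 bytes smaller
than each consumer.  That difference is one sector: gmirror keeps its
metadata in the last sector of each component, so that sector is not
exposed through the mirror.  The sketch below simply redoes the
arithmetic with the numbers from the listing (variable names are
illustrative):

```shell
#!/bin/sh
# Numbers copied from the 'gmirror list' output above.
consumer_bytes=524263424   # Mediasize of da1s1e / da2s1e
sector_bytes=512           # Sectorsize

# Size of each component in sectors.
consumer_sectors=$(( consumer_bytes / sector_bytes ))

# One sector per component is held back for gmirror metadata.
provider_bytes=$(( consumer_bytes - sector_bytes ))

echo "component: $consumer_sectors sectors"
echo "mirror:    $provider_bytes bytes"
```

The component sector count equals the size of partition 'e' in the
bsdlabel output, and the mirror size equals the provider's Mediasize.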

Brief status output can be obtained via:

        beastie [0] /sbin/gmirror status mirror0
                  Name    Status  Components
        mirror/mirror0  COMPLETE  da1s1e
                                  da2s1e
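
The brief status format lends itself to a simple scripted health check.
The following is a sketch only: 'mirror_state' is a hypothetical helper,
and the sample text is copied from the output above so the example runs
without a mirror present.

```shell
#!/bin/sh
# Read 'gmirror status' output on stdin and print the Status column
# for the mirror named in $1.
mirror_state() {
    awk -v m="mirror/$1" '$1 == m { print $2 }'
}

# Sample input copied from the status output shown above.
sample_status='          Name    Status  Components
mirror/mirror0  COMPLETE  da1s1e
                          da2s1e'

state=$(printf '%s\n' "$sample_status" | mirror_state mirror0)

if [ "$state" = "COMPLETE" ]; then
    echo "mirror0 is healthy"
else
    echo "mirror0 needs attention (state: ${state:-unknown})"
fi
```

On a live host the sample would be replaced with
'/sbin/gmirror status mirror0 | mirror_state mirror0'.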

You can also get the metadata information from both of the mirror
components (da1s1e, da2s1e):

        beastie [0] /sbin/gmirror dump da1s1e
        Metadata on da1s1e:
             magic: GEOM::MIRROR
           version: 4
              name: mirror0
               mid: 1723101773
               did: 3286280006
               all: 2
             genid: 0
            syncid: 1
          priority: 0
             slice: 4096
           balance: round-robin
         mediasize: 524262912
        sectorsize: 512
        syncoffset: 0
            mflags: NONE
            dflags: NONE
        hcprovider: da1s1e
          provsize: 524263424
          MD5 hash: b4f64055926d840b1048eba31646118a

        beastie [0] /sbin/gmirror dump da2s1e
        Metadata on da2s1e:
        <snip...>
               did: 3273941562
        <snip...>
        hcprovider: da2s1e
        <snip...>
          MD5 hash: 1d254a5324ba9276807376d16780bc77

In the above, I have truncated the output of the dump of da2s1e simply
to illustrate the differences in metadata between it and da1s1e regarding
volume mirror0.  At this point, a filesystem can be laid on top of the
mirror or it can be further partitioned with 'bsdlabel'.  For simplicity,
we'll simply layer on an FS:

        beastie [0] /sbin/newfs /dev/mirror/mirror0
        /dev/mirror/mirror0: 500.0MB (1023948 sectors) block size 16384, fragment size 2048
                using 4 cylinder groups of 125.00MB, 8000 blks, 16000 inodes.
        super-block backups (for fsck -b #) at:
         160, 256160, 512160, 768160
        beastie [0] /bin/mkdir /mnt/mirror0
        beastie [0] /sbin/mount /dev/mirror/mirror0 /mnt/mirror0
        beastie [0] /bin/df -h /mnt/mirror0
        Filesystem             Size    Used   Avail Capacity  Mounted on
        /dev/mirror/mirror0    484M    4.0K    445M     0%    /mnt/mirror0
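
The sector count that 'newfs' reports (1023948) is slightly lower than
the mirror's size in sectors.  The arithmetic below is a sketch of one
reading that fits the numbers shown: the mirror's 524262912 bytes come to
1023951 sectors, and rounding down to whole 2048-byte fragments gives
1023948.  The rounding rule is an assumption inferred from the output
above, not taken from the newfs documentation.

```shell
#!/bin/sh
# Numbers copied from the gmirror and newfs output in this article.
mirror_bytes=524262912     # Mediasize of mirror/mirror0
sector_bytes=512
fragment_bytes=2048        # "fragment size 2048" from newfs

mirror_sectors=$(( mirror_bytes / sector_bytes ))
sectors_per_fragment=$(( fragment_bytes / sector_bytes ))

# ASSUMPTION: the filesystem uses only whole fragments.
fs_sectors=$(( mirror_sectors - mirror_sectors % sectors_per_fragment ))

echo "mirror: $mirror_sectors sectors, filesystem: $fs_sectors sectors"
```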

After 'newfs', /mnt/mirror0 was created for a new mount point and the
mirror mounted.  To add this to /etc/fstab, run:

        beastie [0] echo "`/sbin/mount -p | /usr/bin/grep mirror0`" >> /etc/fstab
        beastie [0] /usr/bin/grep mirror0 /etc/fstab
        /dev/mirror/mirror0     /mnt/mirror0        ufs     rw      2 2
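
For reference, the six whitespace-separated fields of that fstab line
are device, mount point, filesystem type, mount options, dump frequency,
and fsck pass number.  The sketch below just splits the line shown above
into named variables (the variable names are illustrative):

```shell
#!/bin/sh
# The fstab entry produced above.
fstab_line='/dev/mirror/mirror0     /mnt/mirror0        ufs     rw      2 2'

# Split the entry into its six fields.
read -r device mountpoint fstype options dump_freq pass_no <<EOF
$fstab_line
EOF

echo "device:      $device"
echo "mount point: $mountpoint"
echo "type:        $fstype"
echo "options:     $options"
echo "dump freq:   $dump_freq"
echo "fsck pass:   $pass_no"
```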

To ensure that the mirror is brought up on system reboot, update
/boot/loader.conf:

        beastie [0] echo 'geom_mirror_load="YES"' >> /boot/loader.conf
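
Because both /etc/fstab and /boot/loader.conf are updated above with
'>>', rerunning the commands would add duplicate lines.  A minimal sketch
of an append that checks first follows; 'append_once' is a hypothetical
helper, and the demonstration writes to a scratch file rather than the
real /boot/loader.conf.

```shell
#!/bin/sh
# Append the exact line $2 to file $1 unless it is already present.
append_once() {
    if [ -f "$1" ] && grep -qxF -- "$2" "$1"; then
        return 0
    fi
    printf '%s\n' "$2" >> "$1"
}

# Demonstration against a scratch file.
conf=$(mktemp /tmp/loaderconf.XXXXXX) || exit 1
append_once "$conf" 'geom_mirror_load="YES"'
append_once "$conf" 'geom_mirror_load="YES"'   # second call is a no-op

line_count=$(grep -c 'geom_mirror_load' "$conf")
echo "geom_mirror_load appears $line_count time(s)"
rm -f "$conf"
```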

At this point, the new geom mirror is fully set up and usable for storage.
For maintenance purposes, you can stop the mirror, work on the individual
disks, and restart it.  The following illustrates this:

        beastie [0] /sbin/umount /mnt/mirror0
        beastie [0] /sbin/gmirror stop mirror0
        beastie [0] /sbin/gmirror status mirror0
        gmirror: No such geom: mirror0.
        beastie [1] /sbin/gmirror dump da2s1e
        Metadata on da2s1e:
             magic: GEOM::MIRROR
           version: 4
              name: mirror0
               mid: 1723101773
               did: 3273941562
               all: 2
             genid: 0
            syncid: 1
          priority: 0
             slice: 4096
           balance: round-robin
         mediasize: 524262912
        sectorsize: 512
        syncoffset: 0
            mflags: NONE
            dflags: NONE
        hcprovider: da2s1e
          provsize: 524263424
          MD5 hash: 3408a5884b0103e11ee8de31670eb445

        beastie [0] /sbin/gmirror activate -v mirror0 da1s1e
        Provider da1s1e activated.
        Done.
        beastie [0] /sbin/gmirror activate -v mirror0 da2s1e
        Provider da2s1e activated.
        Done.
        beastie [0] /sbin/gmirror status mirror0
                  Name    Status  Components
        mirror/mirror0  COMPLETE  da1s1e
                                  da2s1e

To completely destroy the volume and free up the underlying disks:

        beastie [0] /sbin/umount /mnt/mirror0
        beastie [0] /sbin/gmirror stop mirror0
        beastie [0] /sbin/gmirror clear da1s1e
        beastie [0] /sbin/gmirror clear da2s1e
        beastie [0] /sbin/gmirror dump da2s1e
        Can't read metadata from da2s1e: Invalid argument.
        gmirror: Not fully done.
        beastie [1]

Don't forget to remove the entry for mirror0 from /etc/fstab and,
optionally, to remove 'geom_mirror_load' from /boot/loader.conf (assuming
no other geom mirrors are configured).  Also, while the above details
working with a mirrored (RAID1) volume, similar geom commands (with minor
revision) can be used for building RAID 0, 3, 01, and 10 volumes.
See geom(8) for further details.  Enjoy!
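
As a sketch of the fstab cleanup mentioned above, the following removes
every entry whose first field is a given device.  'remove_device_entry'
is a hypothetical helper, the /dev/da0s1a line is made-up sample data,
and the demonstration edits a scratch file rather than the real
/etc/fstab:

```shell
#!/bin/sh
# Remove every line of file $1 whose first field equals device $2.
# The file is rewritten in place so its ownership and mode are kept.
remove_device_entry() {
    tmp=$(mktemp /tmp/fstab-edit.XXXXXX) || return 1
    awk -v d="$2" '$1 != d' "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Demonstration against a scratch copy with two sample entries.
scratch=$(mktemp /tmp/fstab.XXXXXX) || exit 1
printf '%s\n' \
    '/dev/da0s1a             /                   ufs     rw      1 1' \
    '/dev/mirror/mirror0     /mnt/mirror0        ufs     rw      2 2' \
    > "$scratch"

remove_device_entry "$scratch" /dev/mirror/mirror0

remaining=$(wc -l < "$scratch" | tr -d ' ')
first_device=$(awk 'NR == 1 { print $1 }' "$scratch")
echo "$remaining entry left, for $first_device"
rm -f "$scratch"
```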

2 comments:

123 said...

What about gjournal? When I run "gjournal label ..." it works fine, but I don't get ".journal" devices in the /dev folder. What am I doing wrong?

troy said...

123,

Sorry for the delay in response, I've had quite a bit going on lately. As for the journal devices under /dev, I don't recall seeing that situation. Do either the log files or dmesg show any errors or issues that might explain the lack of /dev entries?

-troy