Recently we had a situation in which Veritas Volume Manager (VxVM)
marked a disk as 'failed' while also presenting the same disk as
available for use, though unconfigured. The odd part was that VxVM
continued to allow use of the volume until the volume was stopped.
Checks of the disk group (dg) would show 'NODEVICE' for the disk in
question, and a listing of the disk would show no Veritas header
information. Basically, it appeared that the VxVM private region
was completely corrupted, removed, or otherwise forgotten (by VxVM).
The following details the setup and recovery of this situation.
HOST INFO
Host: snorkle
Shell Prompt: snorkle [0]
OS: Solaris 10
VxVM Version: 5.0 MP3
Failed Disk: c3t20040E187BA326E1d12s2
Disk Media (dm) Name: disk6
Vx Disk Format: cdsdisk
DG Name: mydg
Volume (vol): myvol
Volume Layout: concat
Failover Host: indiana
DETAILS
After volume 'myvol' had been stopped, further attempts to otherwise muck
with the dg or vol would fail. As an example, a deport and import of
'mydg' kicked back the following error:
snorkle [0] /usr/sbin/vxdg deport mydg
snorkle [0] /usr/sbin/vxdg import mydg
VxVM vxdg WARNING V-5-1-560 Disk disk6: Not found, last known location: c3t20040E187BA326E1d12s2
That's an issue. A check of 'vxprint' shows 'NODEVICE', while 'vxdisk'
shows the disk as both failed and available to be initialized:
snorkle [0] /usr/sbin/vxprint -sd -g mydg
TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0
dm disk0 c3t202401B45E32210Ed0s2 auto 16112 1048526864 -
dm disk1 c3t202401B45E32210Ed1s2 auto 16112 1048526864 -
dm disk2 c3t202401B45E32210Ed2s2 auto 16112 1048526864 -
dm disk3 c3t202401B45E32210Ed3s2 auto 16112 947852944 -
dm disk4 c3t202401B45E32210Ed5s2 auto 16112 1023361040 -
dm disk5 c3t202401B45E32210Ed4s2 auto 16112 1754578960 -
dm disk6 - - - - NODEVICE
sd disk0-01 myvol-01 ENABLED 1048510464 0 - - -
sd disk0-02 myvol-01 ENABLED 16400 8827801024 - - -
sd disk1-01 myvol-01 ENABLED 1048510464 1048510464 - - -
sd disk2-01 myvol-01 ENABLED 1048510464 2097020928 - - -
sd disk3-01 myvol-01 ENABLED 947840384 3145531392 - - -
sd disk3-02 myvol-01 ENABLED 12192 8827817424 - - -
sd disk4-01 myvol-01 ENABLED 1023344640 4093371776 - - -
sd disk5-01 myvol-01 ENABLED 1754529792 5116716416 - - -
sd disk5-02 myvol-01 ENABLED 49168 8827829616 - - -
sd disk6-01 myvol-01 DISABLED 1956554816 6871246208 NODEVICE - -
snorkle [0] /usr/sbin/vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
c2t4d0s2 auto:none - - online invalid
c3t20040E187BA326E1d12s2 auto:none - - online invalid
c3t202401B45E32210Ed0s2 auto:cdsdisk disk0 mydg online
c3t202401B45E32210Ed1s2 auto:cdsdisk disk1 mydg online
c3t202401B45E32210Ed2s2 auto:cdsdisk disk2 mydg online
c3t202401B45E32210Ed3s2 auto:cdsdisk disk3 mydg online
c3t202401B45E32210Ed4s2 auto:cdsdisk disk5 mydg online
c3t202401B45E32210Ed5s2 auto:cdsdisk disk4 mydg online
- - disk6 mydg failed was:c3t20040E187BA326E1d12s2
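Spotting this state can be scripted. The following is only a sketch and was not part of the original recovery. It assumes the five-column 'vxdisk -o alldgs list' layout shown above, where a failed disk media record has '-' in the DEVICE and TYPE columns and 'failed' in STATUS. The function name 'find_failed_dm' and the 'AWK' variable are made up for illustration. On Solaris 10, point AWK at nawk or /usr/xpg4/bin/awk, since the legacy /usr/bin/awk lacks sub().

```shell
# Hypothetical helper: report disk media records that VxVM lists as failed.
# Input : `vxdisk -o alldgs list` output on stdin.
# Output: one line per failed record: "<dg> <dm> <last known device>".
AWK=${AWK:-awk}

find_failed_dm() {
    "$AWK" '
        # Failed records carry no device or type, only DISK, GROUP and STATUS.
        $1 == "-" && $2 == "-" && $5 == "failed" {
            dev = $6
            sub(/^was:/, "", dev)   # strip the "was:" prefix
            print $4, $3, dev
        }'
}
```

Usage would be along the lines of: /usr/sbin/vxdisk -o alldgs list | find_failed_dm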
A subsequent 'vxdisk list' of c3t20040E187BA326E1d12s2 resulted in output like
that of an uninitialized disk (though vxdmp did identify both I/O paths):
snorkle [0] /usr/sbin/vxdisk list c3t20040E187BA326E1d12s2
Device: c3t20040E187BA326E1d12s2
devicetag: c3t20040E187BA326E1d12
type: auto
info: format=none
flags: online ready private autoconfig invalid
pubpaths: block=/dev/vx/dmp/c3t20040E187BA326E1d12s2 char=/dev/vx/rdmp/c3t20040E187BA326E1d12s2
guid: -
udid: STK%5FBladeCtlr%20B210%5F60E187B000A326E1E00000000242E1022%5F60E187B000A326E1E0000001E0DB562A1
site: -
Multipathing information:
numpaths: 2
c3t20040E187BA326E1d12s2 state=enabled type=primary
c4t20050E187BA326E1d12s2 state=disabled type=secondary
A check of the filesystem prior to volume stop had shown it to be at about
85% capacity of 4.1 TB. Given the sizes of the individual disks used to
create the vol, data had definitely been stored on c3t20040E187BA326E1d12.
Since 'myvol' was a concat volume with no redundancy, the options were
either to recover the disk to a usable state or to restore from backups.
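The claim that data had to be on the failed disk can be checked with a bit of arithmetic on the 'vxprint' figures. This is a rough sketch under stated assumptions: lengths are in 512-byte sectors, the file system spans the whole volume, and the 85% figure is approximate. The idea is a simple lower bound: even if every free sector of the volume sat on 'disk6-01', whatever is left of that subdisk must hold allocated space. The function name 'min_on_subdisk' is hypothetical, and the shell needs 64-bit '$(( ))' arithmetic (ksh93 or bash, not the Solaris 10 /bin/sh).

```shell
# min_on_subdisk VOL_LEN PCT_USED SD_LEN
# Lower bound (in sectors) on the allocated space that must sit on a subdisk
# of SD_LEN sectors when PCT_USED percent of a VOL_LEN-sector volume is used.
min_on_subdisk() {
    used=$(( $1 * $2 / 100 ))   # sectors in use across the volume
    free=$(( $1 - used ))       # sectors free across the volume
    min=$(( $3 - free ))        # subdisk sectors that cannot all be free
    if [ "$min" -lt 0 ]; then
        min=0
    fi
    echo "$min"
}

# Figures taken from the vxprint output (sectors):
#   volume end      = 8827829616 + 49168 = 8827878784 (end of disk5-02)
#   disk6-01 LENGTH = 1956554816
min_on_subdisk 8827878784 85 1956554816
```

A non-zero result means the subdisk on 'disk6' cannot have been empty, regardless of how the file system spread its allocations.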
Since version 4.0, VxVM has included 'vxconfigbackupd', which periodically
runs to back up the DG config information, including details on the
individual disks comprising the DG. The resulting backup is stored
in /etc/vx/cbr/bk. Unfortunately, 'snorkle' didn't have a usable copy
retaining the config information for DM 'disk6'. The failover host
(indiana), however, did have a good copy. The 'diskinfo' contents
included:
indiana [0] /usr/bin/less /etc/vx/cbr/bk/mydg.1239683178.42.indiana/1239683178.42.indiana.diskinfo
DISK_ATTRIBUTE
Number_of_disk_backup= 7
UUID=STK%5FBladeCtlr%20B210%5F60E187B000A326E1E00000000242E1022%5F60E187B000A326E1
E0000001E0DB562A1
Device: c2t20040E187BA326E1d12s2
devicetag: c2t20040E187BA326E1d12
type: auto
hostid: indiana
disk: name=disk6 id=1242838852.87.snorkle
group: name=mydg id=1239683178.42.indiana
info: format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags: online ready private autoconfig noautoimport imported
pubpaths: block=/dev/vx/dmp/c2t20040E187BA326E1d12s2 char=/dev/vx/rdmp/c2t20040
0A0B818481Fd12s2
guid: {c64dfb98-1dd1-11b2-bf6f-00144fe6e452}
udid: STK%5FBladeCtlr%20B210%5F60E187B000A326E1E00000000242E1022%5F600A0B800
018481E0000001E0DB562A1
site: -
version: 3.1
iosize: min=512 (bytes) max=2048 (blocks)
public: slice=2 offset=65792 len=1956554816 disk_offset=0
private: slice=2 offset=256 len=65536 disk_offset=0
update: time=1243490460 seqno=0.12
ssb: actual_seqno=0.0
headers: 0 240
configs: count=1 len=48144
logs: count=1 len=7296
Defined regions:
config priv 000048-000239[000192]: copy=01 offset=000000 enabled
config priv 000256-048207[047952]: copy=01 offset=000192 enabled
log priv 048208-055503[007296]: copy=01 offset=000000 enabled
lockrgn priv 055504-055647[000144]: part=00 offset=000000
Multipathing information:
numpaths: 2
c2t20040E187BA326E1d12s2 state=enabled type=primary
c3t20050E187BA326E1d12s2 state=enabled type=secondary
UUID=STK%5FFLEXLINE%20380%5F600A0B80001135520000000042C3A513%5F600A0B80001133A40
000FDCC4367B10B
Device: c2t202401B45E32210Ed0s2
devicetag: c2t202401B45E32210Ed0
type: auto
hostid: indiana
<snip...>
(Yes, 'indiana' saw 'disk6' on different paths than 'snorkle'.) The astute
engineer might expect the next step to be 'vxconfigrestore', simply
re-layering the config back onto the disk. We did try that, but to no
avail. Instead, we had to take the public offset (65792) from the
retained diskinfo on 'indiana' and specify it as an option to
'vxdisksetup':
snorkle [0] /usr/lib/vxvm/bin/vxdisksetup -i c3t20040E187BA326E1d12 puboffset=65792
The above may seem questionable, but the only thing performed here was
the re-addition of the private region to the disk while explicitly
telling VxVM where the public region begins. This is to ensure that
'vxdisksetup' understands the layout constraints relative to the stored
data. The following re-associates the "newly initialized" disk with
'mydg' as 'disk6' (the '-k' keeps the existing disk media record),
force-starts the volume, and verifies that the file system is clean:
snorkle [0] /usr/sbin/vxdg -g mydg -k adddisk disk6=c3t20040E187BA326E1d12s2
snorkle [0] /usr/sbin/vxvol -g mydg -f start myvol
snorkle [0] /usr/sbin/fsck -F vxfs /dev/vx/rdsk/mydg/myvol
file system is clean - log replay is not required
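For reference, the steps from the backed-up diskinfo through to the fsck can be collected into one place. The sketch below makes several assumptions: a diskinfo file in the format shown earlier, a POSIX shell such as ksh or bash, and an awk that accepts '-v' (nawk or /usr/xpg4/bin/awk on Solaris 10). The names 'get_puboffset', 'recovery_plan', and 'AWK' are hypothetical. It only prints the commands used in this write-up and runs nothing, since 'vxdisksetup -i' against the wrong device or with the wrong offset is destructive.

```shell
AWK=${AWK:-awk}

# get_puboffset DM_NAME DISKINFO_FILE
# Print the public-region offset recorded for disk media DM_NAME.
get_puboffset() {
    "$AWK" -v dm="$1" '
        # Each disk stanza has a "Device:" line; forget the previous name.
        $1 == "Device:" { cur = "" }
        # "disk: name=disk6 id=..." names the stanza.
        $1 == "disk:" {
            split($2, kv, "=")
            if (kv[1] == "name") cur = kv[2]
        }
        # "public: slice=2 offset=65792 len=..." carries the offset.
        $1 == "public:" && cur == dm {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "offset") { print kv[2]; exit }
            }
        }' "$2"
}

# recovery_plan ACCESS_NAME DM_NAME DG VOLUME PUBOFFSET
# Print (do not run) the command sequence. ACCESS_NAME carries the s2 slice
# suffix as in the vxdisk listing; vxdisksetup was given the name without it.
recovery_plan() {
    dev=${1%s2}
    echo "/usr/lib/vxvm/bin/vxdisksetup -i $dev puboffset=$5"
    echo "/usr/sbin/vxdg -g $3 -k adddisk $2=$1"
    echo "/usr/sbin/vxvol -g $3 -f start $4"
    echo "/usr/sbin/fsck -F vxfs /dev/vx/rdsk/$3/$4"
}
```

With the diskinfo copied over from 'indiana', something like: recovery_plan c3t20040E187BA326E1d12s2 disk6 mydg myvol "$(get_puboffset disk6 /path/to/diskinfo)" prints the four commands for review before anything is typed by hand.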
After mounting the volume and checking via 'vxdisk', all looked to be
well, and a subsequent check of the data stored on the volume confirmed this:
snorkle [0] /usr/sbin/mount -F vxfs /dev/vx/dsk/mydg/myvol /mydg
snorkle [0] /usr/sbin/vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
c2t4d0s2 auto:none - - online invalid
c3t20040E187BA326E1d12s2 auto:cdsdisk disk6 mydg online
c3t202401B45E32210Ed0s2 auto:cdsdisk disk0 mydg online
c3t202401B45E32210Ed1s2 auto:cdsdisk disk1 mydg online
c3t202401B45E32210Ed2s2 auto:cdsdisk disk2 mydg online
c3t202401B45E32210Ed3s2 auto:cdsdisk disk3 mydg online
c3t202401B45E32210Ed4s2 auto:cdsdisk disk5 mydg online
c3t202401B45E32210Ed5s2 auto:cdsdisk disk4 mydg online
A final review of 'vxdisk list' shows some differences, such as the 'guid'
and 'disk id', but otherwise the same data as the backup in the 'diskinfo'
from earlier:
snorkle [0] /usr/sbin/vxdisk list c3t20040E187BA326E1d12s2
Device: c3t20040E187BA326E1d12s2
devicetag: c3t20040E187BA326E1d12
type: auto
hostid: snorkle
disk: name=disk6 id=1288050593.56.snorkle
group: name=mydg id=1239683178.42.indiana
info: format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags: online ready private autoconfig noautoimport
pubpaths: block=/dev/vx/dmp/c3t20040E187BA326E1d12s2 char=/dev/vx/rdmp/c3t20040E187BA326E1d12s2
guid: {8f47c23a-1dd2-11b2-bf6f-00144fe6e452}
udid: STK%5FBladeCtlr%20B210%5F60E187B000A326E1E00000000242E1022%5F60E187B000A326E1E0000001E0DB562A1
site: -
version: 3.1
iosize: min=512 (bytes) max=2048 (blocks)
public: slice=2 offset=65792 len=1956554816 disk_offset=0
private: slice=2 offset=256 len=65536 disk_offset=0
update: time=1288052080 seqno=0.12
ssb: actual_seqno=0.0
headers: 0 240
configs: count=1 len=48144
logs: count=1 len=7296
Defined regions:
config priv 000048-000239[000192]: copy=01 offset=000000 enabled
config priv 000256-048207[047952]: copy=01 offset=000192 enabled
log priv 048208-055503[007296]: copy=01 offset=000000 enabled
lockrgn priv 055504-055647[000144]: part=00 offset=000000
Multipathing information:
numpaths: 2
c3t20040E187BA326E1d12s2 state=enabled type=primary
c4t20050E187BA326E1d12s2 state=disabled type=secondary
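The comparison against the backup can also be done mechanically. As a minimal sketch, assuming each input file holds the record of a single disk (for example, the 'disk6' stanza cut from the diskinfo backup and the saved 'vxdisk list' output), the helpers below compare only the 'info', 'public', and 'private' lines and ignore fields expected to change, such as 'guid' and the disk id. The names 'layout_fields' and 'same_layout' are hypothetical.

```shell
AWK=${AWK:-awk}

# layout_fields: from one disk record on stdin, keep only the info/public/
# private lines, with whitespace normalised to single spaces.
layout_fields() {
    "$AWK" '$1 == "info:" || $1 == "public:" || $1 == "private:" { $1 = $1; print }'
}

# same_layout FILE_A FILE_B
# Succeed (exit 0) when both records describe the same on-disk layout.
same_layout() {
    [ "$(layout_fields < "$1")" = "$(layout_fields < "$2")" ]
}
```

For example: same_layout disk6.backup disk6.now && echo "layout matches backup"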
It should be noted that we took this route as a last resort before
restoring from backup. We had already attempted other alternatives, such
as 'vxconfigrestore' and 'vxreattach', but to no avail. Should you have
any questions, comments, suggestions, etc., feel free to let me know.