03 December 2011

Fixing a Broken initrd in Linux

I recently had a situation where a host's disk controller failed.
The box would get partway through the BIOS POST before throwing errors
about the failed controller, and then fail to boot.  To resolve the
situation, we replaced the controller.  This, however, led to a new
problem: the initial RAM disk (initrd) didn't load the appropriate
driver / kernel module for the new controller.  The following details
how the situation was rectified.  Our host details are:
        HOST:           humboldt
        PROMPT:         [boot: |sh-3.2# ]
        OS:             CentOS 5.6 Linux with a stock kernel
        MEDIA:          disk 1 of Linux install CDs / DVD
After replacing the failed disk controller, the box was powered on and
booted up.  The default (and only) boot entry was selected and the host
began to boot:
         GNU GRUB  version 0.97  (639K lower / 1047488K upper memory)

        CentOS 5.6 (2.6.18-238.el5)               <==================


           Use the <up> and <down> keys to select which entry is highlighted.
           Press enter to boot the selected OS, 'e' to edit the
           commands before booting, 'a' to modify the kernel arguments
           before booting, or 'c' for a command-line.

        The highlighted entry will be booted automatically in 5 seconds.
    # screen clears and Linux begins to boot:
        root (hd0,0)
         Filesystem type is ext2fs, partition type 0x83
        kernel /boot/vmlinuz-2.6.18-238.el5 ro root=LABEL=/
           [Linux-bzImage, setup=0x1e00, size=0x1fd63c]
        initrd /boot/initrd-2.6.18-238.el5.img
           [Linux-initrd @ 0x37d60000, 0x28f833 bytes]


        Kernel alive
        kernel direct mapping tables up to 100000000 @ 10000-15000
        <snip...>
        Waiting for driver initialization.
        Scanning and configuring dmraid supported devices
        Trying to resume from LABEL=SWAP-sda2
        Unable to access resume device (LABEL=SWAP-sda2)
        Creating root device.
        Mounting root filesystem.
        mount: could not find filesystem '/dev/root'
        Setting up other filesystems.
        Setting up new root fs
        setuproot: moving /dev failed: No such file or directory
        no fstab.sys, mounting internal defaults
        setuproot: error mounting /proc: No such file or directory
        setuproot: error mounting /sys: No such file or directory
        Switching to new root and running init.
        unmounting old /dev
        unmounting old /proc
        unmounting old /sys
        switchroot: mount failed: No such file or directory
        Kernel panic - not syncing: Attempted to kill init!
Unfortunately, the host panicked, so we're off to retrieve "disk 1" of the
Linux install media.  After resetting the host and booting from CDROM,
we come to the normal install screen where we enter 'linux rescue'
at the prompt:
         -  To install or upgrade in graphical mode, press the <ENTER> key.

         -  To install or upgrade in text mode, type: linux text <ENTER>.

         -  Use the function keys listed below for more information.

        [F1-Main] [F2-Options] [F3-General] [F4-Kernel] [F5-Rescue]
        boot: linux rescue
In the following screens, select the language, the keyboard type, whether
to start networking [Yes | No] (select No), and whether to mount the root
filesystem read-write or read-only [Continue | Read-Only | Skip] (select
Continue).  Once complete, we're presented with a shell, with our
filesystems mounted under "/mnt/sysimage":
        Your system is mounted under the /mnt/sysimage directory.
        When finished please exit from the shell and your system will reboot.

        sh-3.2#
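From this shell, it can be useful to note which driver the rescue kernel
has bound to the new controller, since that is the module the rebuilt
initrd will need.  The following is only a sketch and was not part of the
original procedure: the function name 'list_block_drivers' is hypothetical,
and it assumes sysfs is mounted at "/sys" and that 'readlink -f' is
available in the rescue shell.  It walks up the sysfs path of a disk and
prints every driver it finds along the way; typically the first is the
generic disk driver and a later one belongs to the controller:

```shell
#!/bin/sh
# Hypothetical helper: print each kernel driver bound along the sysfs
# device path of a block device, nearest the disk first.
# $1 = block device name (e.g. sda), $2 = sysfs root (defaults to /sys).
list_block_drivers() {
    dev=$1
    sysroot=${2:-/sys}
    found=1

    # Resolve the "device" symlink and the sysfs root to real paths so the
    # walk below has a well-defined stopping point.
    path=$(readlink -f "$sysroot/block/$dev/device") || return 1
    realroot=$(readlink -f "$sysroot") || return 1

    # Walk towards the sysfs root, reporting every "driver" symlink seen.
    while [ -n "$path" ] && [ "$path" != "/" ] && [ "$path" != "$realroot" ]; do
        if [ -L "$path/driver" ]; then
            basename "$(readlink "$path/driver")"
            found=0
        fi
        path=$(dirname "$path")
    done

    # Exit status 0 only if at least one driver was printed.
    return $found
}
```

Calling 'list_block_drivers sda' against the disk named in the boot
messages should list the candidate modules; which of them is the
controller driver still has to be judged by eye.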
With our filesystems mounted, we need to 'chroot' to "/mnt/sysimage", back
up the existing "initrd" file, and finally create a new one with 'mkinitrd':
        sh-3.2# /usr/sbin/chroot /mnt/sysimage
        sh-3.2# /bin/ls /boot/initrd*
        /boot/initrd-2.6.18-238.el5.img
        sh-3.2# /bin/mv /boot/initrd-2.6.18-238.el5.img /boot/old-initrd-2.6.18-238.el5.img-old
        sh-3.2# /sbin/mkinitrd /boot/initrd-2.6.18-238.el5.img 2.6.18-238.el5
The parameters to 'mkinitrd' are the file to create and the kernel version
to load modules for.  (The version should match a module directory
under "/lib/modules" within the chroot'd environment.) A quick check
shows we now have two "initrd" files, the old, broken one and the newly
created file.  After our file check, we exit the "chroot" environment,
and then exit the "rescue" environment which initiates a system reboot.
While the system is resetting, remember to remove the CDROM and to boot
the host from disk:
        sh-3.2# /bin/ls /boot/*initrd*
        /boot/initrd-2.6.18-238.el5.img  /boot/old-initrd-2.6.18-238.el5.img-old
        sh-3.2# exit
        exit
        sh-3.2# exit
        exit
        sending termination signals...done
        disabling swap...
                /dev/sda2
        unmounting filesystems...
                  /mnt/runtime done
                  disabling /dev/loop0
                  /proc/bus/usb done
                  /proc done
                  /dev/pts done
                  /sys done
                  /tmp/ramfs done
                  /selinux done
                  /mnt/sysimage/var done
                  /mnt/sysimage/sys done
                  /mnt/sysimage/proc done
                  /mnt/sysimage/dev/pts done
                  /mnt/sysimage/dev done
                  /mnt/sysimage/selinux done
                  /mnt/sysimage done
        rebooting system
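The two checks mentioned above (that the kernel version matches a directory
under "/lib/modules", and that the new image is usable) can be taken a step
further before leaving the chroot.  What follows is a sketch under stated
assumptions, not what was run on humboldt: the function names are
hypothetical, the image is assumed to be a gzip-compressed cpio archive (as
'mkinitrd' produces by default on this release), and 'ahci' in the usage
note is only a placeholder module name, since the actual driver isn't
recorded here:

```shell
#!/bin/sh
# Succeed if the module directory for kernel version $1 exists.
# $2 = root of the installed system (defaults to /, i.e. inside the chroot).
kver_has_modules() {
    kver=$1
    root=${2:-/}
    [ -d "$root/lib/modules/$kver" ]
}

# Read an archive listing (one path per line) on stdin and succeed if it
# names <module>.ko.  Kept as a plain text filter so it can be exercised
# without a real image.
listing_has_module() {
    grep -Eq "(^|/)$1\.ko\$"
}

# List the contents of a gzip-compressed cpio initrd and look for a module.
# $1 = image path, $2 = module name (placeholder example: ahci).
initrd_has_module() {
    zcat "$1" | cpio -it 2>/dev/null | listing_has_module "$2"
}
```

For example, 'kver_has_modules 2.6.18-238.el5' guards the 'mkinitrd' call,
and 'initrd_has_module /boot/initrd-2.6.18-238.el5.img ahci' checks the
result.  If the module were missing, the 'mkinitrd' shipped with this
release should accept '--with=<module>' to force it in and '-f' to
overwrite an existing image; confirm with 'mkinitrd --help' on the host
before relying on either.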
After the system resets, the default GRUB entry is booted (as before),
and this time Linux gets through a full boot:
        <snip...>
        Waiting for driver initialization.
        Scanning and configuring dmraid supported devices
        Trying to resume from LABEL=SWAP-sda2
        No suspend signature on swap, not resuming.
        Creating root device.
        Mounting root filesystem.
        kjournald starting.  Commit interval 5 seconds
        EXT3-fs: mounted filesystem with ordered data mode.
        Setting up other filesystems.
        Setting up new root fs
        no fstab.sys, mounting internal defaults
        Switching to new root and running init.
        unmounting old /dev
        unmounting old /proc
        unmounting old /sys
        SELinux:  Disabled at runtime.
        type=1404 audit(1322935605.226:2): selinux=0 auid=4294967295 ses=4294967295
        INIT: version 2.86 booting
                        Welcome to  CentOS release 5.6 (Final)
                        Press 'I' to enter interactive startup.
        <snip...>
    # screen clears to console login:
        CentOS release 5.6 (Final)
        Kernel 2.6.18-238.el5 on an x86_64

        humboldt login: _
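Once logged in, one way to confirm that the controller's driver really is
loaded is to look for it in "/proc/modules".  This is a small sketch, not
something from the original session: 'module_loaded' is a hypothetical
helper name, and the module to look for depends on the replacement
controller:

```shell
#!/bin/sh
# Succeed if module $1 appears in a /proc/modules-style listing.
# $2 = file to read (defaults to /proc/modules).
# The kernel lists module names with underscores, so dashes in the
# requested name are normalized first.
module_loaded() {
    name=$(printf '%s' "$1" | sed 's/-/_/g')
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}
```

A driver compiled directly into the kernel would not show up in this
listing, so a negative result is a hint rather than proof.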
Our work is complete and our host is functional once again.