17 January 2011

Missing GRUB Config in Linux

Having recently written up how to restore GRUB to a missing or corrupted
master boot record (MBR), it seemed appropriate to follow up with
resolving a missing GRUB config file.  Our host details for this
situation are:
        HOST:           tux
        PROMPTS:        [grub> |tux [0] ]
        OS:             CentOS 5.4 Linux
        DISKS:          [sda (hd0|disk1)|sdb (hd1|disk2)]
        ROOT PARTITION: sdb1
While the host has two viable disks, we'll only focus on disk 2.  Also,
the following covers a Red Hat variant of Linux, though the steps may
also work with other distros or other UNIX variants that use GRUB.

After powering on the host, we tell the BIOS to boot disk 2.  Once through
POST, the BIOS runs the boot code from the MBR of disk 2, only to display
the following:
            GNU GRUB  version 0.97  (639K lower / 1047488K upper memory)

         [ Minimal BASH-like line editing is supported.  For the first word, TAB
           lists possible command completions.  Anywhere else TAB lists the possible
           completions of a device/filename.]

        grub> _
GRUB is clearly installed on the MBR, otherwise we wouldn't have seen
the above.  That we land at a bare 'grub>' prompt instead of a boot menu
is most likely due to a missing GRUB configuration file (grub.conf or
menu.lst).  Proceeding from that assumption, we use 'find' to locate the
"stage1" boot file on the available disks, then set the root device via
'root':
        grub> find /boot/grub/stage1
         (hd0,0)
         (hd1,0)

        grub> root (hd1,0)
         Filesystem type is ext2fs, partition type 0x83
The 'find' returns two partitions, one on each disk.  As previously
stated, we want the one corresponding to our root partition, "sdb1", so
we set 'root' to "(hd1,0)".  (Strictly, we are setting the device that
contains our /boot directory.)  GRUB legacy counts disks and partitions
from zero, so hd1 is disk 2 and (hd1,0) is the first partition on disk 2.
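
The sdb1 to (hd1,0) translation can be sketched as a small shell function.  The name `linux_to_grub` is hypothetical (it is not a GRUB or CentOS tool), and it assumes the BIOS orders disks the same way the kernel names them (sda, sdb, ...).  That happens to hold on this host, but BIOS ordering can differ, so treat /boot/grub/device.map or GRUB's own 'find' as the authority.

```shell
# Hypothetical helper, not part of GRUB or CentOS: map a Linux partition
# name such as "sdb1" to GRUB legacy notation such as "(hd1,0)".
# ASSUMPTION: the BIOS presents disks in the same order as sda, sdb, ...;
# check /boot/grub/device.map or GRUB's 'find' before trusting the result.
linux_to_grub() {
    dev=${1#/dev/}                         # accept "sdb1" or "/dev/sdb1"
    case $dev in
        sd[a-z][1-9]|sd[a-z][1-9][0-9]) ;; # sdX plus partition 1..99
        *) echo "unsupported device name: $1" >&2; return 1 ;;
    esac
    letter=${dev#sd}                       # "b1"
    letter=${letter%"${letter#?}"}         # first character only: "b"
    part=${dev#sd?}                        # "1"
    letters=abcdefghijklmnopqrstuvwxyz
    before=${letters%%"$letter"*}          # letters preceding ours: "a"
    # GRUB legacy numbers both disks and partitions from 0.
    echo "(hd${#before},$((part - 1)))"
}
```

For example, `linux_to_grub /dev/sdb1` prints `(hd1,0)`, the value given to 'root' above.
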
Now we need to tell GRUB where our kernel is.  In the first 'kernel'
example, I've only typed out the beginning of the kernel file name and
hit [TAB] for autocompletion, which fills in the name used in the second
'kernel' command.  The second command adds the option "ro" to tell the
kernel to initially mount the root FS read-only, while "root=/dev/sdb1"
identifies our root partition.  After 'kernel', we specify an initial
ramdisk to use via 'initrd' ([TAB] autocompletion can be used here
as well):
        grub> kernel /boot/vml[TAB]

        grub> kernel /boot/vmlinuz-2.6.18-164.el5 ro root=/dev/sdb1
           [Linux-bzImage, setup=0x1e00, size=0x1c31b4]

        grub> initrd /boot/initrd-2.6.18-164.el5.img
           [Linux-initrd @ 0x37d73000, 0x27c402 bytes]

        grub>
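
On CentOS 5 the initrd file carries the same version string as the kernel it pairs with, as the two names above show.  A minimal sketch, assuming that vmlinuz-VERSION / initrd-VERSION.img convention holds, derives one name from the other; `initrd_for` is a hypothetical helper, and it does not check that the file exists.

```shell
# Hypothetical helper: print the initrd path that pairs with a kernel path.
# ASSUMPTION: CentOS 5 naming, vmlinuz-VERSION -> initrd-VERSION.img,
# both in the same directory.  Existence of the file is NOT checked.
initrd_for() {
    base=${1##*/}                  # file name without leading directories
    case $1 in
        */*) dir=${1%/*}/ ;;       # keep the directory part, e.g. "/boot/"
        *)   dir= ;;
    esac
    case $base in
        vmlinuz-?*) echo "${dir}initrd-${base#vmlinuz-}.img" ;;
        *) echo "not a vmlinuz-VERSION file: $1" >&2; return 1 ;;
    esac
}
```

For example, `initrd_for /boot/vmlinuz-2.6.18-164.el5` prints `/boot/initrd-2.6.18-164.el5.img`.
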
As an aside, if either the kernel or ramdisk cannot be found, grub will
return the following:
        grub> kernel /boot/trash[TAB]
        Error 15: File not found

        grub> kernel /boot/trashfile

        Error 15: File not found
In this case, if the missing file is your kernel and you have typed the
correct path, you will need to either re-install your kernel or recover it
from backups.  The same applies to the ramdisk, though you may also be
able to rebuild it with 'mkinitrd' after booting from an alternate boot disk.
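
When deciding between re-installing the kernel and rebuilding the ramdisk, it helps to know which files are actually present.  The sketch below, a hypothetical `list_boot_pairs` helper assuming the same CentOS 5 naming, reports each kernel in a boot directory and whether its matching initrd exists.  It would be run from an alternate boot disk against the mounted file system, for example `list_boot_pairs /mnt/sysimage/boot` if a rescue environment mounted the system there (the mount point is an assumption; adjust it to wherever yours is mounted).

```shell
# Hypothetical helper: for each vmlinuz-* in a boot directory, report
# whether the matching initrd-*.img is present.
# ASSUMPTION: CentOS 5 naming (vmlinuz-VERSION / initrd-VERSION.img).
list_boot_pairs() {
    bootdir=${1:-/boot}
    found=0
    for kernel in "$bootdir"/vmlinuz-*; do
        [ -e "$kernel" ] || continue       # the glob matched nothing
        found=1
        base=${kernel##*/}
        version=${base#vmlinuz-}
        if [ -e "$bootdir/initrd-$version.img" ]; then
            echo "$version ok"
        else
            echo "$version missing-initrd" # candidate for mkinitrd
        fi
    done
    # Non-zero status when no kernel image was found at all.
    [ "$found" -eq 1 ]
}
```

A non-zero status with no output points toward re-installing the kernel or restoring it from backups; a "missing-initrd" line points toward 'mkinitrd'.
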
After setting the root device, the kernel file, and the ramdisk, we can
tell GRUB to boot the system, which will bring us to our login prompt:
        grub> boot

        # system boots up

        CentOS release 5.4 (Final)
        Kernel 2.6.18-164.el5 on an i686

        tux login: _
With the system up, we verify that neither grub.conf nor menu.lst exists,
check whether there is a volume label on our root partition, and verify
the file names of our ramdisk and kernel:
        tux [0] /bin/ls -l /boot/grub/grub.conf /boot/grub/menu.lst
        /bin/ls: /boot/grub/grub.conf: No such file or directory
        /bin/ls: /boot/grub/menu.lst: No such file or directory
        tux [2] /sbin/tune2fs -l /dev/sdb1 | /bin/grep name:
        Filesystem volume name:   /2
        tux [0] /bin/ls /boot | /bin/egrep 'vmlinuz|initrd'
        initrd-2.6.18-164.el5.img
        vmlinuz-2.6.18-164.el5
        tux [0]
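
The two checks above can be scripted.  In this sketch, the hypothetical `volume_label` reads `tune2fs -l` output on stdin and prints the label, treating the `<none>` that tune2fs shows for an unlabeled file system as no label; the hypothetical `grub_conf_missing` succeeds only when neither config file is present in the given directory (default /boot/grub).

```shell
# Hypothetical helper: print the volume label from 'tune2fs -l' output
# read on stdin; fail if there is no label.
volume_label() {
    label=$(sed -n 's/^Filesystem volume name:[[:space:]]*//p')
    # tune2fs prints "<none>" when the file system has no label.
    if [ -z "$label" ] || [ "$label" = "<none>" ]; then
        return 1
    fi
    printf '%s\n' "$label"
}

# Hypothetical helper: succeed (status 0) only when neither grub.conf
# nor menu.lst exists in the GRUB directory.
grub_conf_missing() {
    gdir=${1:-/boot/grub}
    [ ! -e "$gdir/grub.conf" ] && [ ! -e "$gdir/menu.lst" ]
}
```

On this host, `/sbin/tune2fs -l /dev/sdb1 | volume_label` would print `/2`.
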
At this point, we need to create a new configuration file so the host
boots without further intervention.  Using your favorite text editor
(vi?), create "grub.conf" with the following, or similar, tailoring it
to your needs:
        tux [0] /bin/cat /boot/grub/grub.conf
        default=0
        timeout=5
        title CentOS
                root (hd1,0)
                kernel /boot/vmlinuz-2.6.18-164.el5 ro root=LABEL=/2
                initrd /boot/initrd-2.6.18-164.el5.img
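
If you would rather not hand-edit the file, a single-entry grub.conf like the one above can be generated from three values.  `gen_grub_conf` and its argument order are hypothetical; the sketch assumes, as on this host, that /boot is a directory on the root partition, so the kernel and initrd paths start with /boot.  With a separate /boot partition those paths would drop the /boot prefix.

```shell
# Hypothetical helper: print a one-entry GRUB legacy config.
#   $1  GRUB root device, e.g. "(hd1,0)"
#   $2  kernel version,   e.g. "2.6.18-164.el5"
#   $3  root fs label,    e.g. "/2" (from 'tune2fs -l')
# ASSUMPTION: /boot is a directory on the partition named in $1.
gen_grub_conf() {
    grub_root=$1
    kver=$2
    label=$3
    cat <<EOF
default=0
timeout=5
title CentOS
        root $grub_root
        kernel /boot/vmlinuz-$kver ro root=LABEL=$label
        initrd /boot/initrd-$kver.img
EOF
}
```

For example, `gen_grub_conf '(hd1,0)' 2.6.18-164.el5 /2 > /boot/grub/grub.conf` would write an equivalent file; check the output on screen first rather than redirecting over an existing config.
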
Of note, rather than passing the kernel the exact root device as we did
from the GRUB prompt (root=/dev/sdb1), we've configured the kernel line in
"grub.conf" with root=LABEL=/2, using the volume label returned from
'tune2fs -l'.  Since we are using CentOS (a Red Hat clone), "menu.lst" is
normally a symlink to "grub.conf".  Use 'ln -s' to recreate the symlink
and verify it:
        tux [0] /bin/ln -s /boot/grub/grub.conf /boot/grub/menu.lst
        tux [0] /bin/ls -l /boot/grub/grub.conf /boot/grub/menu.lst
        -rw-r--r-- 1 root root 141 Jan 17 17:58 /boot/grub/grub.conf
        lrwxrwxrwx 1 root root  20 Jan 17 17:59 /boot/grub/menu.lst -> /boot/grub/grub.conf
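
Recreating the symlink can be made safe to repeat.  A minimal sketch, with a hypothetical `link_menu_lst` helper: it leaves an existing symlink alone, refuses to overwrite a regular menu.lst, and otherwise creates the same absolute link used above.

```shell
# Hypothetical helper: (re)create menu.lst as a symlink to grub.conf.
# The GRUB directory defaults to /boot/grub.
link_menu_lst() {
    gdir=${1:-/boot/grub}
    if [ -L "$gdir/menu.lst" ]; then
        # Already a symlink: report where it points and do nothing.
        echo "menu.lst is already a symlink -> $(readlink "$gdir/menu.lst")"
        return 0
    fi
    if [ -e "$gdir/menu.lst" ]; then
        # A regular file here may hold a config someone still wants.
        echo "refusing to replace regular file $gdir/menu.lst" >&2
        return 1
    fi
    ln -s "$gdir/grub.conf" "$gdir/menu.lst"
}
```
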
With our work complete, the only thing left to do is reboot and verify
our work.  Aside from telling the BIOS to boot disk 2, no further
interaction occurs:
        tux [0] /sbin/reboot

    Output after telling the BIOS to use disk 2:

         GNU GRUB  version 0.97  (639K lower / 1047488K upper memory)

        CentOS


           Use the <up> and <down> keys to select which entry is highlighted.
           Press enter to boot the selected OS, 'e' to edit the
           commands before booting, 'a' to modify the kernel arguments
           before booting, or 'c' for a command-line.

        The highlighted entry will be booted automatically in 5 seconds.

    After timeout, the "CentOS" option is automatically booted:

          Booting 'CentOS'

        root (hd1,0)
         Filesystem type is ext2fs, partition type 0x83
        kernel /boot/vmlinuz-2.6.18-164.el5 ro root=LABEL=/2
           [Linux-bzImage, setup=0x1e00, size=0x1c31b4]
        initrd /boot/initrd-2.6.18-164.el5.img
           [Linux-initrd @ 0x37d73000, 0x27c402 bytes]
        <snip...> 

        CentOS release 5.4 (Final)
        Kernel 2.6.18-164.el5 on an i686

        tux login: _
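
Before trusting an unattended reboot like the one above, it can be worth confirming that every kernel and initrd path in grub.conf exists, since a typo there produces the same 'Error 15: File not found' seen at the GRUB prompt earlier.  The sketch assumes each path is relative to the partition GRUB's 'root' points at, and that this partition is mounted at PREFIX (an empty prefix for / on this host).  `check_grub_conf` is a hypothetical helper; it ignores kernel options and does not handle paths carrying their own GRUB device prefix.

```shell
# Hypothetical helper: verify that kernel/initrd files named in a GRUB
# legacy config exist.
#   $1  config file, e.g. /boot/grub/grub.conf
#   $2  PREFIX where GRUB's 'root' partition is mounted ("" for /)
# ASSUMPTION: paths are plain, e.g. /boot/vmlinuz-..., with no device prefix.
check_grub_conf() {
    conf=$1
    prefix=$2
    status=0
    # Field 1 is the keyword, field 2 the file; kernel options are ignored.
    while read -r keyword path rest; do
        case $keyword in
            kernel|initrd) ;;
            *) continue ;;
        esac
        if [ -e "$prefix$path" ]; then
            echo "ok: $path"
        else
            echo "MISSING: $path"
            status=1
        fi
    done < "$conf"
    return "$status"
}
```

On this host, `check_grub_conf /boot/grub/grub.conf ''` would be expected to report both files as ok.
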

see also:
    GRUB, a Corrupted MBR, and Linux

4 comments:

Anonymous said...

Hello Troy.
Your site is helpful.
I do need more help. I performed what you mentioned. The last command I executed was:
grub>boot
it seemed to boot but in the end
it said something like
...panic...

Anyway, I then tried your "GRUB, a Corrupted MBR, and Linux" page.
I am able to get to a black and white menu that allows me to choose which OS I want to load (centos 2.6.18-238.19.1.el5,
centos 2.6.18-164.el5, other)

However, when I choose any OS it asks me to press any key to continue...
If I press a key it brings me back to the previous menu. I am going in circles.


When I do the "find /boot/grub/stage1" I get (hd0,2).
However, when I now get to the dual boot menu (which is black and white; it used to have a blue background) it shows "root (hd0,1)
Filesystem type unknown, partition type 0x7
Kernel /boot/vmlinuz....
Error 17: Cannot mount selected partition
Press any key to continue"

Any suggestions?

Joseph

troy said...

Joseph,

So it looks like you had one problem (the panic during boot up) and inadvertently created a second one in following the "corrupted MBR" post. Since you were originally getting a panic after the boot process started, the MBR likely wasn't your issue. Additionally, depending on why the panic was occurring, it may still happen once your host is again bootable.

Setting the panic aside for now, you also followed the "corrupted MBR" post, which is where your current problem lies. Based on the information you've already given, it looks like you have two CentOS kernels and maybe a bootable Windows partition. Forgetting the "other" menu option (Windows?) for the moment, when you performed "find /boot/grub/stage1", which returned "hd0,2", that is your Linux boot partition (either / or /boot). Did you follow through the rest of the post (relevant to your host's configuration), "grub> root (hd0,2)" and "setup (hd0)"? Did 'grub' complain at all?

You mentioned that you were able to get back to a usable grub boot menu but it shows "root (hd0,1)" instead of "root (hd0,2)", so I assume you went to the edit options of the entry. Rather than doing that, from the boot menu, you may try simply opting for 'c' which should take you to a grub command line. From there, you would follow along this post (missing grub config), using "root (hd0,2)", etc., which should then boot into linux. Given that the boot menu you are seeing appears to be wrong, after boot up, you would then need to fix /boot/grub/grub.conf, updating the entries as appropriate, with centos as "root (hd0,2)" and windows(?) as "rootnoverify (hd0,1)".

Assuming the above works out, if you do see a panic after the kernel has loaded and linux starts to boot, you will need to troubleshoot the panic. Unfortunately, without additional details regarding your situation, I'm afraid I don't have anything else for you. If you don't mind, I would be curious to know how things turn out for you or if you resolved the situation in another manner.

--troy

Anonymous said...

Hello troy,
Thanks for the response. Would you mind emailing me your response? Or is there a feature on your blog that notifies me so that I know when you have responded?
My email is j_ 4 j u n k @ y a hoo
Please remove spaces.

To add more detail to my original post,
at first I had the following:
GNU GRUB version 0.97 (639K lower / 1047488K upper memory)

[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]

grub> _

So I followed your "Missing GRUB Config in Linux" page. I followed your instructions up to the line "grub> boot".
After it booted, that is when I received the "Kernel panic - not syncing: Attempted to kill init" message.
Since that didn't work, I followed your instructions on the "GRUB, a Corrupted MBR, and Linux" page.
I got up to the "grub> quit" command, which gave me
Error 27: Unrecognized command
So I skipped that "quit" command and executed the "grub> reboot" command. This got me to the circular menu/error.
"Press any key to enter the menu

Booting CentOS Mirror (2.6.18-238.19.1.el5) in 3 seconds...
Booting 'CentOS Mirror (2.6.18-238.19.1.el5)'

root (hd0,1)
Filesystem type unknown, partition type 0x7
Kernel /boot/vmlinuz-2.6.238.19.1.el5
Error 17: Cannot mount selected partition
Press any key to continue ..."

Well, if I press any key it brings me to the black and white menu. If I choose CentOS version 2.6.18-238.19.1.el5, CentOS version 2.6.18-164.el5, or Other (Windows), it brings me back to the "Booting CentOS ... Error 17: Cannot ... Press any key to continue."
Hopefully that was detailed enough.

So now to answer your message.
You asked: Did you follow through the rest of the post (relevant to your host's configuration), "grub> root (hd0,2)" and "setup (hd0)"? Did 'grub' complain at all?
I followed through with hd0,2. grub didn't complain. BUT when executing 'setup (hd0,2)',
'Running "embed ...(hd0,2)"...failed (this is not fatal)' appeared instead of what is on your page, "Running "embed /boot/grub/e2fs_stage1_5 (hd1)"... 15 sectors are embedded.
succeeded"

You suggested: so I assume you went to the edit options of the entry. Rather than doing that, from the boot menu, you may try simply opting for 'c' which should take you to a grub command line. From there, you would follow along this post (missing grub config), using "root (hd0,2)", etc., which should then boot into linux.

Well this is what I did originally and it didn't work. This is where I get the panic message.

thanks in advance
Joseph

troy said...

Joseph,

I've also emailed this to you at the address you provided.

As for the boot menu, it looks like the boot menu is wrong (given the partition type of 0x7). To resolve that so you can at least boot, I would probably do one of two things. First, you could navigate to the intended boot option and press 'e' instead of 'enter'. This will give you the option of modifying the boot attributes, including root device, kernel, and initrd file. Since it is trying to use hd0,1 as the root device, you could then change it to hd0,2. (Since changes made here or in the alternate approach only affect this one boot, you would still need to fix the grub menu (menu.lst or grub.conf) after a successful boot up. This will likely be after the panic is resolved.) The alternative would be to choose no menu option and press 'c' instead, which would bring you to a grub command line. From there, you would simply follow along with the "Missing GRUB Config in Linux" post, using 'root (hd0,2)', 'kernel /boot/vmlinuz-2.6.18-238.19.1.el5 ro root=ROOTDEV', 'initrd /boot/initrd-2.6.18-238.19.1.el5.img'. (ROOTDEV is the device and partition that '/' exists on, so /dev/sda1 for example. Aside from 'root (hd0,2)', I'm making a guess as to the actual file names on your system based on the kernel version you've given. You'll need to adjust the filenames to match those on your system.) Either of these should prevent you from attempting to boot using 'root (hd0,1)', since we know that doesn't work.

The next part that concerns me, though it may not be an issue, is your response to my "follow through the rest of the post" question. I included 'root (hd0,2)' and 'setup (hd0)', since they are relevant to your setup. In your response, you include 'setup (hd0,2)', which fails. The issue is that 'setup (hd0)' is the part that actually installs the necessary boot blocks to the MBR, which is relative to the disk, not the partition. Running 'setup (hd0,2)' instead attempted to install the boot blocks to the first sectors of the third partition of the disk (GRUB counts partitions from zero). If this failed and did nothing, there shouldn't be much of an issue. If it failed but still mucked up those sectors, your file system on that partition could be somewhat corrupt. If it is corrupt, you may be able to fix it with a simple 'fsck', though you will possibly need to boot from a CD / DVD install disk to do this. Additionally, since grub appears to be installed to the MBR at this point, you probably don't need to go back through installing it again as detailed in the "corrupt MBR" post. You do need to verify that your file system on hd0,2 is sane, though.
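
The difference between 'setup (hd0)' and 'setup (hd0,2)' comes down to whether the target names a whole disk or a partition.  A small sketch, using a hypothetical `grub_target_kind` helper, classifies a GRUB legacy device string so a disk target (written to the MBR) can be told apart from a partition target (written to that partition's boot sector) before running 'setup'.

```shell
# Hypothetical helper: classify a GRUB legacy device string.
#   "(hd0)"   -> disk       ('setup' writes the MBR)
#   "(hd0,2)" -> partition  ('setup' writes that partition's boot sector)
# Anything else (including floppies) is reported as unknown.
grub_target_kind() {
    if printf '%s\n' "$1" | grep -Eqx '\(hd[0-9]+\)'; then
        echo disk
    elif printf '%s\n' "$1" | grep -Eqx '\(hd[0-9]+,[0-9]+\)'; then
        echo partition
    else
        echo unknown
        return 1
    fi
}
```
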

Now, for the last step, I've seen similar kernel panics (kernel panic - not syncing), though normally when something is wrong with the initrd (Fixing a Broken initrd in Linux). Resolving that usually means booting from an install CD / DVD and rebuilding the initrd file. If you are having to rebuild the initrd file, then either a hardware or kernel change likely occurred on the system. With that thought, did you make any changes to the system just prior to the problems starting? Unfortunately, several things can cause the panic message you are seeing, so this might not be your issue. If you can provide more of the lines preceding the panic, that would help figure out what is causing it.

--troy