09 November 2010

Removing / Recovering an Open File (Linux)

Following up on "Removing / Recovering an Open File (Solaris)," the aim
here is to do the same on Linux.  The setup is the same: a 'tail' runs
in another terminal, holding open a file descriptor to a file, when
that file is accidentally removed.  Though the following details
CentOS 5.4, the same steps should apply to current and previous
versions of similar distros.

INFO

        host:           tux
        OS:             Linux (CentOS 5.4)
        prompt:         tux [0]
        device:         /dev/sdc1
        mount point:    /mnt/logfiles
        file:           /mnt/logfiles/messages

SETUP

In a separate terminal, we have a 'tail' running against
'/mnt/logfiles/messages'.  As mentioned above, this holds open a file
descriptor to the file, much as a daemon such as 'syslog' would.  Though
'syslog' would hold it open for writing rather than reading, the
recovery steps are the same.

        tux [0] /usr/bin/tty
        /dev/pts/1
        tux [0] /usr/bin/tail -f /mnt/logfiles/messages
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
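
The same setup can be tried without a second terminal.  What follows is
a minimal sketch, assuming a Linux '/proc' and a scratch directory from
'mktemp'; the path, the 'sleep' stand-in for 'tail', and the variable
names are illustrative and not part of the original session.

```shell
# Scratch stand-in for /mnt/logfiles/messages (hypothetical path).
scratch=$(mktemp -d)
printf 'line one\nline two\n' > "$scratch/messages"

# Background process holding descriptor 3 open read-only on the file,
# standing in for the 'tail -f' used in the walkthrough.
sleep 60 3< "$scratch/messages" &
holder=$!

# Give the child a moment to open the descriptor (up to about 5 seconds).
tries=0
until [ -L "/proc/$holder/fd/3" ] || [ "$tries" -ge 50 ]; do
    sleep 0.1
    tries=$((tries + 1))
done

# The /proc entry is a symlink naming the file behind the descriptor.
target=$(readlink "/proc/$holder/fd/3")
echo "pid $holder, fd 3 -> $target"

# Clean up the holder and the scratch directory.
kill "$holder"
wait "$holder" 2>/dev/null || true
rm -rf "$scratch"
```

Any long-running process with the file open would serve equally well as
the holder; 'sleep' is used here only because it needs no input.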

DETAILS

Similar to the Solaris setup, we have sysadmin Winston logged into 'tux'
(tty: /dev/pts/0) when he notices the following:

        tux [0] /bin/df -h /mnt/logfiles
        Filesystem            Size  Used Avail Use% Mounted on
        /dev/sdc1             496M  265M  207M  57% /mnt/logfiles

Not much of an issue but he decides to have a look at the files stored
there to see if there's anything he can clean up:

        tux [0] cd /mnt/logfiles
        tux [0] /bin/ls -li
        total 268107
        13 -rw-r--r-- 1 root root    118157 Nov  6 21:19 ftplog
        15 -rw-r--r-- 1 root root  22351863 Nov  6 21:28 httpd
        11 drwx------ 2 root root     12288 Nov  6 21:02 lost+found
        16 -rw-r--r-- 1 root root    667767 Nov  6 21:29 maillog
        12 -rw-r--r-- 1 root root 249583452 Nov  6 21:26 messages
        14 -rw-r--r-- 1 root root    726031 Nov  6 21:27 messages.0.gz

Seeing 'messages' at about 238 MB in size, Winston decides to rotate
the file out and compress it for retention but, in haste, removes it
instead.

        tux [0] /bin/rm messages
        tux [0] /bin/ls -li
        total 23415
        13 -rw-r--r-- 1 root root   118157 Nov  6 21:19 ftplog
        15 -rw-r--r-- 1 root root 22351863 Nov  6 21:28 httpd
        11 drwx------ 2 root root    12288 Nov  6 21:02 lost+found
        16 -rw-r--r-- 1 root root   667767 Nov  6 21:29 maillog
        14 -rw-r--r-- 1 root root   726031 Nov  6 21:27 messages.0.gz
        tux [0] /bin/df -h /mnt/logfiles
        Filesystem            Size  Used Avail Use% Mounted on
        /dev/sdc1             496M  265M  207M  57% /mnt/logfiles

So Winston accidentally removed the 'messages' file, as seen above.
Additionally, no space was freed up on the filesystem, which is odd
since he just removed a file consuming almost half of its capacity.
Time to see if there are any processes using '/mnt/logfiles':

        tux [0] /sbin/fuser -cu /mnt/logfiles
        /mnt/logfiles:        2119c(root)  2464(root)
        tux [0] /bin/ps -fp `/sbin/fuser -c /mnt/logfiles 2>/dev/null`
        UID        PID  PPID  C STIME TTY          TIME CMD
        root      2119  2117  0 22:23 pts/0    00:00:00 /bin/ksh
        root      2464  2209  0 22:09 pts/1    00:00:00 /usr/bin/tail -f /mnt/logfiles/messages

(Passing 'fuser' STDERR output to '/dev/null' leaves only the returned
process IDs.) It would appear that there is currently a process (2464)
holding open '/mnt/logfiles/messages'.  Off to /proc to see if we can
find out anything else:

        tux [0] cd /proc/2464/fd
        tux [0] /bin/ls -l
        total 0
        lrwx------ 1 root root 64 Nov  6 22:09 0 -> /dev/pts/1
        lrwx------ 1 root root 64 Nov  6 22:09 1 -> /dev/pts/1
        lrwx------ 1 root root 64 Nov  6 22:09 2 -> /dev/pts/1
        lr-x------ 1 root root 64 Nov  6 22:09 3 -> /mnt/logfiles/messages (deleted)
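
The '(deleted)' marker in a listing like the one above can also be
picked out with a short loop.  This is a hedged sketch: the function
name 'deleted_fds' is made up for illustration, and it relies only on
the kernel appending ' (deleted)' to the link target of an unlinked
file.  A file whose real name happens to end in ' (deleted)' would be
reported as well.

```shell
# deleted_fds PID: print "FD TARGET" for every descriptor of PID whose
# /proc link target ends in " (deleted)".  Illustrative helper only.
deleted_fds() {
    for link in "/proc/$1/fd"/*; do
        # Skip entries that vanish or cannot be read.
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            *" (deleted)") echo "${link##*/} $target" ;;
        esac
    done
}
```

Run against the process from the walkthrough, 'deleted_fds 2464' would
be expected to report descriptor 3.  Inspecting another user's process
generally requires root.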

A listing of the file descriptors for process 2464 shows descriptor
3 linking back to our messages file and noting that it has been
deleted.  We can also see that the descriptor entries are owned by
root:root, the user the holding process runs as.  Since the file
descriptor is still open, the kernel has not released the file's inode
or data blocks: only the directory entry is gone, and the contents
remain on disk, accessible through /proc.  Time for Winston to recover
the file:

        tux [0] /bin/cp 3 /tmp/messages.orig
        tux [0] /bin/ls -l /tmp/messages.orig
        -rw-r--r-- 1 root root 249583452 Nov  6 22:20 /tmp/messages.orig
        tux [0] tail -10 /tmp/messages.orig
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.
        WAR IS PEACE, FREEDOM IS SLAVERY, and IGNORANCE IS STRENGTH.

The file has been recovered to '/tmp/messages.orig', showing the same
size as before it was removed, and the contents look right for this
file.  (Of note, if the holding process were still writing to the file,
the size may have increased since.)  Nothing left but recreating the
'messages' file under '/mnt/logfiles', killing the 'tail' process, and
compressing our recovered 'messages' file to be moved to '/mnt/logfiles':

        tux [0] /bin/touch /mnt/logfiles/messages
        tux [0] /bin/ls -li /mnt/logfiles/messages
        17 -rw-r--r-- 1 root root 0 Nov  6 22:26 /mnt/logfiles/messages
        tux [0] /bin/kill 2464
        tux [0] /bin/ps -ef | grep 2464
        tux [1] /bin/ls /proc/2464
        /bin/ls: /proc/2464: No such file or directory
        tux [2] /bin/gzip /tmp/messages.orig
        tux [0] /bin/mv /tmp/messages.orig.gz /mnt/logfiles/.
        tux [0] /bin/ls -li /mnt/logfiles
        total 24129
        13 -rw-r--r-- 1 root root   118157 Nov  6 21:19 ftplog
        15 -rw-r--r-- 1 root root 22351863 Nov  6 21:28 httpd
        11 drwx------ 2 root root    12288 Nov  6 21:02 lost+found
        16 -rw-r--r-- 1 root root   667767 Nov  6 21:29 maillog
        17 -rw-r--r-- 1 root root        0 Nov  6 22:26 messages
        14 -rw-r--r-- 1 root root   726031 Nov  6 21:27 messages.0.gz
        12 -rw-r--r-- 1 root root   726034 Nov  6 22:20 messages.orig.gz
        tux [0] /bin/df -h /mnt/logfiles
        Filesystem            Size  Used Avail Use% Mounted on
        /dev/sdc1             496M   26M  445M   6% /mnt/logfiles

At this point, the contents of 'messages' have been restored as
'messages.orig.gz', a new 'messages' file has been created, and the
holding process has been killed.  Killing 'tail' closed the last open
descriptor on the removed file, which is what finally released its
blocks and reclaimed almost half of the filesystem capacity.  As
detailed above, the procedure for Linux is almost the same as it is
under Solaris.
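
For practice without risking real logs, the remove-and-recover cycle
can be replayed in a scratch directory.  The sketch below assumes a
Linux '/proc'; the paths, the 'sleep' holder, and the variable names
are hypothetical stand-ins for the files and 'tail' process above.

```shell
# Scratch copy of the scenario (hypothetical paths).
scratch=$(mktemp -d)
printf 'line one\nline two\n' > "$scratch/messages"
expected=$(cksum < "$scratch/messages")

# Holder process keeping descriptor 3 open on the file.
sleep 60 3< "$scratch/messages" &
holder=$!
tries=0
until [ -L "/proc/$holder/fd/3" ] || [ "$tries" -ge 50 ]; do
    sleep 0.1
    tries=$((tries + 1))
done

# "Accidentally" remove the file: the name goes away, but the holder
# still references the inode, so the data is not released.
rm "$scratch/messages"

# Copy the data back out through the holder's descriptor.
cp "/proc/$holder/fd/3" "$scratch/messages.orig"
recovered=$(cksum < "$scratch/messages.orig")

if [ "$recovered" = "$expected" ]; then
    echo "recovered copy matches the original"
else
    echo "recovered copy differs from the original" >&2
fi

# Ending the holder closes the last descriptor and frees the blocks.
kill "$holder"
wait "$holder" 2>/dev/null || true
rm -rf "$scratch"
```

Comparing checksums before and after is a stricter check than the size
comparison used above, though it only works when a checksum was taken
before the removal.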

see also:
    Removing / Recovering an Open File (Solaris)
    Finding Open Files in Linux