21 November 2010

Overlaid Mount Problem

In UNIX environments, it's not uncommon to see various file systems (FS)
mounted, be it local storage, fiber attached from a SAN, network based
like NFS, etc.  These mounts may be exist for any number of reasons,
such as application data, user home directories, additional log space,
and so on.  On occasion, such mounts can obfuscate an otherwise easily
identifiable problem.  So here's our host details:
        HOSTS:          tux, beastie
        PROMPT:         host [0]
        OSes:           CentOS 5.4, FreeBSD 8.1 (respectively)
Of note, for point of discussion Linux (tux) is used in the examples.
The steps for FreeBSD (beastie) follow at the bottom, though without
the main discussion.  Additionally, the setup detailed below should be
usable on previous versions of both Linux and FreeBSD, and possibly
other OSes.  (Notably, the following will not work on Solaris, though
it was attempted.)

Our situation is that we've been paged on an FS (/mnt) near capacity on
host 'tux'.  Logging in, we check things out (step 1):
        tux [0] /bin/df -h
        Filesystem            Size  Used Avail Use% Mounted on
        /dev/sda1             7.1G  1.6G  5.2G  24% /
        /dev/sda2             2.0G   71M  1.8G   4% /var
        tmpfs                 506M     0  506M   0% /dev/shm
        /dev/sdc1             496M  410M   61M  88% /mnt
        /dev/sdd1             496M   32M  440M   7% /mnt/logs
        tux [0] cd /mnt
        tux [0] /bin/ls -aF
        ./  ../  bin/  data/  lib/  logs/  lost+found/
        tux [0] /usr/bin/du -sh ./*
        782K    ./bin
        61M     ./data
        2.1M    ./lib
        29M     ./logs
        12K     ./lost+found
In the 'df' output, we can see that '/mnt' is at 88% capacity, though the
subsequent 'du' shows nothing even close to consuming that much space.
Perhaps a file has been removed that's still held open (step 2):
        tux [0] /sbin/fuser -cu /mnt 
        /mnt:                 2094c(root)  2484e(root)
        tux [0] /bin/ps -fp `/sbin/fuser -c /mnt 2>/dev/null`
        UID        PID  PPID  C STIME TTY      STAT   TIME CMD
        root      2094  2092  0 22:26 pts/0    Ss+    0:00 /bin/ksh
        root      2484  2226 37 23:05 pts/1    S+     1:46 myapp -f /mnt/logs/app.log
        tux [0] /usr/bin/tty
        /dev/pts/0
Given the above, we see that PID 2484 is currently running from '/mnt'.
Having a look below at /proc, it appears that PID 2484 has an open file
descriptor, FD 3.  We get the inode number of the file '/mnt/logs/app.log'
that FD 3 is linked against in the subsequent 'ls' (step 3):
        tux [0] /bin/ls -l /proc/2484/fd
        total 0
        lrwx------ 1 root root 64 Nov 21 23:08 0 -> /dev/pts/1
        lrwx------ 1 root root 64 Nov 21 23:08 1 -> /dev/pts/1
        lrwx------ 1 root root 64 Nov 21 23:08 2 -> /dev/pts/1
        lr-x------ 1 root root 64 Nov 21 23:08 3 -> /mnt/logs/app.log
        tux [0] /bin/ls -liL /proc/2484/fd/3
        83971 -rw-r--r-- 1 root root 354418688 Nov 21 23:02 /proc/2484/fd/3
        tux [0] /usr/bin/find /mnt -inum 83971 -exec /bin/ls -li {} \;
        /usr/bin/find: /a/lost+found: Permission denied
        tux [1]
The file size referenced by FD 3 is about 338 MB (354418688 bytes),
which on this FS is quite a bit.  Alright, smoking gun, the file no
longer exists though there is an open FD to it.  We get approval to kill
'myapp' so that the FD is released, thus reclaiming our space (step 4):
        tux [1] /bin/kill 2484
        tux [0] /bin/ps -fp 2484
        UID        PID  PPID  C STIME TTY          TIME CMD
        tux [1] /bin/df -h /mnt
        Filesystem            Size  Used Avail Use% Mounted on
        /dev/sdc1             496M  410M   61M  88% /mnt
Now we definitely have an issue.  We killed the offending process
and verified it is dead, yet we haven't freed up any space.  It seems
we may have overlooked something, or maybe have a corrupt or really
fragmented FS.  We do have a couple of clues, however.  In our first
'df' output, we saw (step 5):
        <snip...>
        /dev/sdc1             496M  410M   61M  88% /mnt
        /dev/sdd1             496M   32M  440M   7% /mnt/logs
From the 'df', we have 'sdc1' attached at '/mnt'; we also have 'sdd1'
subsequently attached at '/mnt/logs'.  It is possible that our issue lies
under the mount of '/mnt/logs'.  Our second clue, at least in Linux,
come from our 'find' of the inode.  Our 'find' can't seem to locate
the inode on '/mnt' though there isn't any message in our 'ls' under
'/proc' indicating the file was previously deleted like we'd expect.
(If the file linked to by an FD has been deleted, Linux tends to tag
"(deleted)" at the end of the line in an 'ls' long listing as was seen
in the writeup for "Removing / Recovering an Open File (Linux)").

Fortunately, we can have a look at any possible file(s) that '/mnt/logs'
may be mounted over and hiding by remounting '/mnt' via NFS.  Assuming NFS
isn't running, we need to set it up with an entry in '/etc/exports' and
starting the necessary services.  After which, we can mount '/mnt'
via NFS elsewhere, such as under '/a' (step 6):
        tux [0] /bin/cat /etc/exports
        /mnt localhost(ro) 127.0.0.1(ro)
        tux [0] /etc/init.d/portmap start
        Starting portmap:                                          [  OK  ]
        tux [0] /etc/init.d/nfs start
        Starting NFS services:                                     [  OK  ]
        Starting NFS quotas:                                       [  OK  ]
        Starting NFS daemon:                                       [  OK  ]
        Starting NFS mountd:                                       [  OK  ]
        Starting RPC idmapd:                                       [  OK  ]
        tux [0] [ ! -d /a ] && /bin/mkdir /a
        tux [0] /bin/mount -t nfs -o ro,nolock localhost:/mnt /a
        tux [0] /bin/df -h /a
        Filesystem            Size  Used Avail Use% Mounted on
        localhost:/mnt        496M  410M   61M  88% /a
After mounting '/mnt' to '/a' we can now have a look at anything that
may reside beneath the '/mnt/logs' mount point that 'sdd1' is mounted
to (step 7):
        tux [0] /bin/ls -aF /a/logs
        ./  ../  app.log  app.log.0.gz  app.log.1.gz  app.log.2.gz  app.log.3.gz
        tux [0] /usr/bin/du -sh /a/logs/*
        340M    /a/logs/app.log
        1.1M    /a/logs/app.log.0.gz
        1.2M    /a/logs/app.log.1.gz
        1.3M    /a/logs/app.log.2.gz
        1.4M    /a/logs/app.log.3.gz
        tux [0] /usr/bin/find /a -inum 83971 -exec /bin/ls -li {} \;
        /usr/bin/find: /a/lost+found: Permission denied 
        83971 -rw-r--r-- 1 root root 354418688 Nov 21 23:02 /a/logs/app.log
Well, there's your problem.  It appears that with 'sdd1' mounted
to '/mnt/logs', files were hidden below it including one at 340 MB.
It is possible that at some point the FS on 'sdd1' wasn't available
and 'myapp' instead was writing files to the directory structure on
'sdc1'.  In addition, from the above we've also verified that 'app.log'
has the same inode number as we saw earlier from the details in '/proc'
regarding FD 3 of PID 2484.  A quick review of the contents on 'sdd1' at
'/mnt/logs' for comparison further helps to verify our current analysis
(step 8):
        tux [1] /bin/ls -aF /mnt/logs
        ./   app.log       app.log.1.gz  app.log.3.gz
        ../  app.log.0.gz  app.log.2.gz  lost+found/
        tux [0] /usr/bin/du -sh /mnt/logs/*
        25M     /mnt/logs/app.log
        1.1M    /mnt/logs/app.log.0.gz
        1.2M    /mnt/logs/app.log.1.gz 
        1.3M    /mnt/logs/app.log.2.gz 
        1.4M    /mnt/logs/app.log.3.gz
        12K     /mnt/logs/lost+found
Since we've identified our issue, it's time for resolution.  We need
to unmount '/a', kill our NFS services, unmount '/mnt/logs', and remove
the problem files (step 9):
        tux [0] /bin/umount /a
        tux [0] /etc/init.d/nfs stop
        Shutting down NFS mountd:                                  [  OK  ]
        Shutting down NFS daemon:                                  [  OK  ]
        Shutting down NFS quotas:                                  [  OK  ]
        Shutting down NFS services:                                [  OK  ]
        tux [0] /etc/init.d/portmap stop
        Stopping portmap:                                          [  OK  ]
        tux [0] /bin/umount /mnt/logs   
        tux [0] pwd
        /mnt
        tux [0] /bin/ls -li logs/app.log
        83971 -rw-r--r-- 1 root root 354418688 Nov 21 23:02 logs/app.log  
        tux [0] /bin/rm logs/app.log*
        tux [0] /bin/ls -aF logs
        ./  ../
In the above, just prior to file removal, we've confirmed one last
time that app.log is indeed our problem file.  Since there should be no
files under the mount point, we've removed all of them.  Nothing left
but to remount 'sdd1' to '/mnt/logs', verify our FS are online and that
we freed up some space, and restarting our app (step 10):
        tux [0] /bin/mount /dev/sdd1 /mnt/logs
        tux [0] /bin/df -h /mnt /mnt/logs
        Filesystem            Size  Used Avail Use% Mounted on
        /dev/sdc1             496M   66M  405M  14% /mnt
        /dev/sdd1             496M   32M  440M   7% /mnt/logs
        tux [0] myapp -f /mnt/logs/app.log
        tux [0] /bin/ps -ef | /bin/grep my[a]pp
        root     11942  2226 18 23:47 pts/1    00:00:07 myapp -f /mnt/logs/app.log
FreeBSD Steps:
Step 1:

        beastie [0] /bin/df -h
        Filesystem     Size    Used   Avail Capacity  Mounted on
        /dev/da0s1a    7.0G    1.2G    5.2G    19%    /
        devfs          1.0K    1.0K      0B   100%    /dev
        /dev/da0s1d    1.9G    2.5M    1.8G     0%    /var
        procfs         4.0K    4.0K      0B   100%    /proc
        /dev/da2s1a    496M    408M     48M    89%    /mnt
        /dev/da3s1a    496M     29M    427M     6%    /mnt/logs
        beastie [0] cd /mnt
        beastie [0] /bin/ls -aF
        ./      ../     .snap/  bin/    data/   lib/    logs/
        beastie [0] /usr/bin/du -sh ./*
        778K    ./bin
         60M    ./data
        2.0M    ./lib
         29M    ./logs

Step 2:

        beastie [0] /usr/bin/fstat -fn /mnt
        USER     CMD          PID   FD  DEV    INUM       MODE SZ|DV R/W
        root     myapp       2677 text  0,91  32769     100555  17720  r
        root     myapp       2677    3  0,91  16394     100600 356515840  r
        root     ksh         1879   wd  0,91      2      40755    512  r
        beastie [0] /bin/ps ajwwxp 2677 1879
        USER   PID  PPID  PGID   SID JOBC STAT  TT       TIME COMMAND
        root  1879  1876  1879  1879    0 Is+    1    0:01.04 /bin/ksh
        root  2677  2668  2677  2668    1 I+     2    1:14.85 myapp -f /mnt/logs/app.log
        beastie [0] tty
        /dev/pts/1

Step 3:
   Of note, FreeBSD doesn't give access to the open FDs via '/proc'
   as Linux does.  'fstat', however, does allow us to output the device
   numbers of the disk that a file resides on as with it's size.  The 'ls'
   output below is to verify the disk in question:
        beastie [0] /bin/ls -ld /dev/da2s1a
        crw-r-----  1 root  operator    0,  91 Nov 21 20:21 /dev/da2s1a
        beastie [0] /usr/bin/find /mnt -inum 16394 -exec /bin/ls -li {} \;
        beastie [0]

Step 4:

        beastie [0] /bin/kill 2677
        beastie [0] /bin/ps ajwwxp 2677
        USER   PID  PPID  PGID   SID JOBC STAT  TT       TIME COMMAND
        beastie [1] /bin/df -h /mnt
        Filesystem     Size    Used   Avail Capacity  Mounted on
        /dev/da2s1a    496M    408M     48M    89%    /mnt

Step 5:

        <snip...>
        /dev/da2s1a    496M    408M     48M    89%    /mnt
        /dev/da3s1a    496M     29M    427M     6%    /mnt/logs

Step 6:

        beastie [0] cat /etc/exports
        /mnt -ro localhost 127.0.0.1
        beastie [0] /usr/sbin/rpcbind -L
        beastie [0] /usr/sbin/nfsd -u -t -n 4
        beastie [0] /usr/sbin/mountd -r
        beastie [0] [ ! -d /a ] && /bin/mkdir /a
        beastie [0] /sbin/mount_nfs -o ro,udp,nolockd localhost:/mnt /a
        beastie [0] /bin/df -h /a
        Filesystem        Size    Used   Avail Capacity  Mounted on
        localhost:/mnt    496M    408M     48M    89%    /a

Step 7:

        beastie [0] /bin/ls -aF /a/logs
        ./              app.log         app.log.1.gz    app.log.3.gz
        ../             app.log.0.gz    app.log.2.gz
        beastie [0] /usr/bin/du -sh /a/logs/*
        340M    /a/logs/app.log
        1.0M    /a/logs/app.log.0.gz
        1.1M    /a/logs/app.log.1.gz
        1.2M    /a/logs/app.log.2.gz
        1.3M    /a/logs/app.log.3.gz
        beastie [0] /usr/bin/find /a \( -inum 16394 -or -inum 32769 \) -exec /bin/ls -li {} \;
        32769 -r-xr-xr-x  1 root  wheel  17720 Nov 21 20:36 /a/bin/myapp
        16394 -rw-------  1 root  wheel  356515840 Nov 21 20:45 /a/logs/app.log

Step 8:

        beastie [0] /bin/ls -aF /mnt/logs
        ./              app.log         app.log.1.gz    app.log.3.gz
        ../             app.log.0.gz    app.log.2.gz  
        beastie [0] /usr/bin/du -sh /mnt/logs/*  
         24M    /mnt/logs/app.log
        1.1M    /mnt/logs/app.log.0.gz
        1.2M    /mnt/logs/app.log.1.gz
        1.3M    /mnt/logs/app.log.2.gz
        1.4M    /mnt/logs/app.log.3.gz

Step 9:
   
        beastie [0] /sbin/umount /a
        beastie [0] /bin/pkill -9 mountd nfsd rpcbind
        beastie [0] /sbin/umount /mnt/logs
        beastie [0] pwd
        /mnt
        beastie [0] /bin/ls -li logs/app.log
        16394 -rw-------  1 root  wheel  356515840 Nov 21 20:45 logs/app.log
        beastie [0] /bin/rm logs/app.log*
        beastie [0] /bin/ls -aF logs
        ./      ../ 

Step 10:

        beastie [0] /sbin/mount /dev/da3s1a /mnt/logs 
        beastie [0] /bin/df -h /mnt /mnt/logs  
        Filesystem     Size    Used   Avail Capacity  Mounted on
        /dev/da2s1a    496M     63M    393M    14%    /mnt
        /dev/da3s1a    496M     29M    427M     6%    /mnt/logs
        beastie [0] myapp -f /mnt/logs/app.log
        beastie [0] /bin/ps awx | /usr/bin/grep my[a]pp 
         2810   2  I+     0:06.19 myapp -f /mnt/logs/app.log