While working with filesystems, one will inevitably have to deal with a
file that is quickly growing and becoming unmanageable. Unfortunately,
this situation normally becomes evident because the space of a volume
has been nearly exhausted. If retention is necessary, you could set up
the file for log rotation and compression. If the data isn't crucial,
you could simply zero out the file or remove it. The details below
concern a file removed while still open by a process, and that still
needs to be retained. An easy example would be /var/adm/messages held
open by syslogd. While the following uses Solaris 10, the same steps
should be applicable on Solaris 8 and 9, as well.
INFO
host: snorkle
OS: Solaris 10
prompt: snorkle [0]
device: c1t2d0s0
file: /mnt/logfiles/messages
DETAILS
To set up our test case above, I've simply run a tail in another terminal
against /mnt/logfiles/messages. The purpose is to hold open a file
descriptor to the file, similar to how syslogd would (though syslogd
would hold it open for writes):
snorkle [0] /usr/bin/tty
/dev/pts/3
snorkle [0] pwd
/
snorkle [0] /usr/bin/tail -f /mnt/logfiles/messages
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
In the details that follow, our determined admin (Wendy) wouldn't
initially have knowledge of the above. For reference, the following is
relevant to our admin who has just logged in:
snorkle [0] /usr/bin/tty
/dev/pts/1
snorkle [0] pwd
/
In checking out the filesystem, Wendy notes that the FS on c1t2d0s0 is
at 47%, not necessarily a problem, but she checks it out anyway:
snorkle [0] /usr/sbin/df -h /mnt/logfiles
Filesystem size used avail capacity Mounted on
/dev/dsk/c1t2d0s0 477M 198M 231M 47% /mnt/logfiles
snorkle [0] cd /mnt/logfiles
snorkle [0] /usr/bin/ls -li
total 404092
8 -rw-r--r-- 1 root root 71680 Oct 21 16:16 ftplog
4 -rw-r--r-- 1 root root 307200 Oct 21 16:15 httpd
3 drwx------ 2 root root 8192 Oct 21 16:11 lost+found/
6 -rw-r--r-- 1 root root 507904 Oct 21 16:16 maillog
5 -rw-r--r-- 1 root root 205541050 Oct 21 18:42 messages
10 -rw-r--r-- 1 root root 23985 Oct 21 16:23 messages.0.gz
7 -rw-r--r-- 1 root root 282624 Oct 21 16:17 snmpd.log
Being the good admin that she is, upon seeing messages at 196 MB
(205541050 bytes), Wendy decides to rotate out the file. Instead of
moving the file so it can be compressed, in her haste Wendy instead
removes the file:
snorkle [0] /usr/bin/rm messages
snorkle [0] /usr/bin/ls -li
total 2412
8 -rw-r--r-- 1 root root 71680 Oct 21 16:16 ftplog
4 -rw-r--r-- 1 root root 307200 Oct 21 16:15 httpd
3 drwx------ 2 root root 8192 Oct 21 16:11 lost+found/
6 -rw-r--r-- 1 root root 507904 Oct 21 16:16 maillog
10 -rw-r--r-- 1 root root 23985 Oct 21 16:23 messages.0.gz
7 -rw-r--r-- 1 root root 282624 Oct 21 16:17 snmpd.log
Well, that's a predicament. A momentary lapse in thought before typing
and we see above that 'messages' has indeed been removed; so much for
rotating it out. Well, at least it should have reduced the currently
used space on the filesystem:
snorkle [0] /usr/sbin/df -h /mnt/logfiles
Filesystem size used avail capacity Mounted on
/dev/dsk/c1t2d0s0 477M 198M 231M 47% /mnt/logfiles
Huh, no good. The used space hasn't diminished at all. The 'fuser'
command will come in handy here (see note 1):
snorkle [0] /usr/sbin/fuser -cu /mnt/logfiles
/mnt/logfiles: 1073c(root) 1062o(root) 499c(root)
Our 'fuser' output shows three active processes under /mnt/logfiles, one
of them may be our problem by holding open 'messages'. Also, we see that
the three processes are euid root in the parens '()' following the PIDs.
Of note, the options passed to 'fuser' and the characters after the PIDs:
options:
-c reports on a mount point and the files contained within
-u reports the user ID in parens following the PIDs
output chars:
c the process is using the file as its current directory
o the process is holding open a file
Supporting the output from 'fuser', Wendy runs 'ps' to determine any
further information about the processes:
snorkle [0] /usr/bin/ps -f -p "$(/usr/sbin/fuser -c /mnt/logfiles 2>/dev/null)"
UID PID PPID C STIME TTY TIME CMD
root 499 493 0 15:13:00 pts/1 2:33 /bin/ksh
root 1062 704 0 18:43:52 pts/3 0:00 /usr/bin/tail -f /mnt/logfiles/messages
The 'ps' above requests a full listing (-f) for the selected PIDs (-p),
using 'fuser' output for the PID selection. (Redirecting STDERR from
'fuser' results in only the PIDs being displayed, thus being usable for
input to 'ps'.) Since we saw earlier that Wendy is logged into tty1,
we can disregard PID 499. Next, it seems that a process run by 'root'
with a PID of 1062 is running a 'tail' against 'messages'. To quote
Adam Savage, "well that's your problem." For our purposes, we'll
see why this isn't actually a problem, but instead saves the day.
In UNIX, if a file is still held open when it is removed, the filename
and inode are cleared. Until the process holding open said file closes
the file's file descriptor, the space consumed by the file still remains.
It is also potentially possible to recover that file.
Using 'pfiles', Wendy checks out any opened files by process 1062:
snorkle [0] /usr/bin/pfiles /proc/1062
1062: /usr/bin/tail -f /mnt/logfiles/messages
Current rlimit: 256 file descriptors
0: S_IFREG mode:0644 dev:32,128 ino:5 uid:0 gid:0 size:205541050
O_RDONLY|O_LARGEFILE
1: S_IFCHR mode:0620 dev:287,0 ino:12582922 uid:0 gid:7 rdev:24,3
O_RDWR|O_NOCTTY|O_LARGEFILE
/devices/pseudo/pts@0:3
2: S_IFCHR mode:0620 dev:287,0 ino:12582922 uid:0 gid:7 rdev:24,3
O_RDWR|O_NOCTTY|O_LARGEFILE
/devices/pseudo/pts@0:3
Alright, based on the output, entry '0' is our file. It has the same
inode number (ino:5) and file size (size:205541050) that we saw in our
earlier directory listing prior to the file's removal. (If the file
descriptor is still being written to, the file size will have increased
since the previous 'ls' output.) Since the file descriptor is still
open on what used to be 'messages', we can still access the file contents
in memory thanks to the /proc pseudo filesystem:
snorkle [0] /usr/bin/ls -l /proc/1062/fd
total 401680
-r--r--r-- 0 root root 205541050 Oct 21 18:42 0
c--------- 1 root tty 24, 3 Oct 21 18:43 1
c--------- 1 root tty 24, 3 Oct 21 18:43 2
Above, Wendy runs 'ls' on the file descriptors directory for process
1062. Lo and behold, our file is still there only as '0', which we
identified with 'pfiles'. (The other two files are character devices
relevant to the tty associated with the 'tail' process.) Time to recover
it before we kill that 'tail' process:
snorkle [0] /usr/bin/cp /proc/1062/fd/0 /tmp/messages.recov
snorkle [0] /usr/bin/ls -l /tmp/messages.recov
-r--r--r-- 1 root root 205541050 Oct 21 18:51 /tmp/messages.recov
snorkle [0] /usr/bin/tail -10 /tmp/messages.recov
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boy
all work and no play makes jack a dull boysnorkle [0]
It looks like our file is good. The output of messages.recov corresponds
to what we expect for this file as well as the original output we saw
from the initial setup. Cool, now that we recovered 'messages', it's time
for cleanup. Assuming the process that was holding open this file was a
daemon, the file 'messages' may or may not be recreated during a restart
or a HUP of the process. For the sake of tidiness, recreate the file.
(Of note, you may need to reset the appropriate permissions on the file,
which we already gathered from our 'pfiles' command):
snorkle [0] /usr/bin/touch messages
snorkle [0] /usr/bin/ls -l messages
-rw-r--r-- 1 root root 0 Oct 21 18:55 messages
Since it's no longer needed and is tailing a "non-existent" file anyway,
we can 'kill' the 'tail' process from earlier:
snorkle [0] /usr/bin/kill 1062
snorkle [0] /usr/bin/ls /proc/1062
/proc/1062: No such file or directory
snorkle [2] /usr/bin/df -h /mnt/logfiles
Filesystem size used avail capacity Mounted on
/dev/dsk/c1t2d0s0 477M 2.2M 428M 1% /mnt/logfiles
The 'tail' process is now dead, the /proc directory for the process is
gone, and our 'df' of the filesystem on c1t2d0s0 shows the reclaimed
space. Below, Wendy compresses the recovered messages file in /tmp
and moves it to its rightful place under /mnt/logfiles:
snorkle [0] /usr/bin/gzip /tmp/messages.recov
snorkle [0] /usr/bin/ls -l /tmp/messages.recov.gz
-r--r--r-- 1 root root 598060 Oct 21 18:51 /tmp/messages.recov.gz
snorkle [0] /usr/bin/mv /tmp/messages.recov.gz /mnt/logfiles/.
snorkle [0] /usr/bin/ls -li
total 3612
8 -rw-r--r-- 1 root root 71680 Oct 21 16:16 ftplog
4 -rw-r--r-- 1 root root 307200 Oct 21 16:15 httpd
3 drwx------ 2 root root 8192 Oct 21 16:11 lost+found/
6 -rw-r--r-- 1 root root 507904 Oct 21 16:16 maillog
5 -rw-r--r-- 1 root root 0 Oct 21 19:00 messages
10 -rw-r--r-- 1 root root 23985 Oct 21 16:23 messages.0.gz
9 -r--r--r-- 1 root root 598060 Oct 21 18:51 messages.recov.gz
7 -rw-r--r-- 1 root root 282624 Oct 21 16:17 snmpd.log
snorkle [0] /usr/bin/df -h /mnt/logfiles
Filesystem size used avail capacity Mounted on
/dev/dsk/c1t2d0s0 477M 2.8M 427M 1% /mnt/logfiles
A few closing notes. The above recovery was possible only because another
process held open a file descriptor to the file removed. Had that not
been the case, file recovery would have been much more difficult, if
not impossible. Additionally, the setup shows one reason why removing
a file may not initially reclaim filesystem space.
Note 1: Thanks to Blog O' Matty for noting the 'fuser trick' picked up
from Pete Shanahan.
Note 2: Mike Tremell's 'file recovery by inode number' somewhat inspired
this particular post. Have a look.
see also:
Find Open Files in Solaris
Removing / Recovering an Open File (Linux)