30 January 2012

Extracting Individual Files from an Archive

Too often, I only need a single file or a couple of files from an archive.
One answer to this is to just extract the entire archive so that I can
get to those particular files.  The more precise answer is to simply
extract only those files that I need and place them where I want them.
The following illustrates a few examples of how to do this with various
archive utilities.  Our details for this are:
        HOSTS:          FreeBSD, Linux, Solaris
        PROMPT:         HOST [0]
        OSes:           FreeBSD 8.2, CentOS 6.2 / Red Hat EL 6.2, Solaris 11
        ARCHIVE TYPES:  tar, cpio, pax, zip
        NOTES:          The details that follow should likely be relevant
                        on previous OS versions and possibly other OSes
Before getting to details, the section for each archive type starts
with the type in uppercase, thus the section for 'tar' is 'TAR'.  Also,
the following archival commands are part of the listed packages for the
respective OS:
        tar:
                Solaris:        pkg:/system/core-os@0.5.11-0.175.0.0.0.2.1
                Solaris (gnu):  pkg:/archiver/gnu-tar@1.26-0.175.0.0.0.2.537
                Linux:          tar-1.23-3.el6.x86_64
                FreeBSD:        base install
        cpio:
                Solaris:        pkg:/system/core-os@0.5.11-0.175.0.0.0.2.1
                Linux:          cpio-2.10-9.el6.x86_64
                FreeBSD:        base install
        pax:
                Solaris:        pkg:/system/core-os@0.5.11-0.175.0.0.0.2.1
                Linux:          pax-3.4-10.1.el6.x86_64
                FreeBSD:        base install
        unzip:
                Solaris:        pkg:/compress/unzip@6.0-0.175.0.0.0.2.537
                Linux:          unzip-6.0-1.el6.x86_64
                FreeBSD:        unzip-6.0
TAR

As seen below, I have moved the tarball "app1.0.tar" to "/tmp/apps"
and listed its contents:
        FreeBSD [0] pwd
        /tmp/apps
        FreeBSD [0] /bin/ls
        app1.0.tar
        FreeBSD [0] /usr/bin/tar tvf app1.0.tar
        drwxr-xr-x  0 root   wheel       0 Jan 28 19:16 app1.0/
        drwxr-xr-x  0 root   wheel       0 Jan 28 19:17 app1.0/bin/
        drwxr-xr-x  0 root   wheel       0 Jan 28 19:17 app1.0/etc/
        -rw-r--r--  0 root   wheel    1839 Jan 16 18:01 app1.0/etc/soc.conf
        -rw-r--r--  0 root   wheel    1699 Jan 16 18:01 app1.0/etc/README
        -r-xr--r--  0 root   wheel   77151 Jan 16 18:01 app1.0/bin/audit.soc
        -rwxr-xr-x  0 root   wheel    8484 Jan 16 18:01 app1.0/bin/flud
        -rwxr-xr-x  0 root   wheel   19507 Jan 16 18:01 app1.0/bin/getldp.pl
        -r-xr-xr-x  0 root   wheel   40729 Jan 16 18:01 app1.0/bin/ioDev
        -rwxr-xr-x  0 root   wheel    2399 Jan 16 18:01 app1.0/bin/ipfwstate
        -rwxr-xr-x  0 root   wheel    4054 Jan 16 18:01 app1.0/bin/locate_path.ksh
        -r-xr-xr-x  0 root   wheel   13433 Jan 16 18:01 app1.0/bin/lprtdiag.pl
        -r-xr-xr-x  0 root   wheel    4663 Jan 16 18:01 app1.0/bin/op82dec.pl
        -rwxr-xr-x  0 root   wheel     634 Jan 16 18:01 app1.0/bin/randpass
        -r-xr-xr-x  0 root   wheel   25902 Jan 16 18:01 app1.0/bin/showdisk.pl
        -rwxr-xr-x  0 root   wheel    4427 Jan 16 18:01 app1.0/bin/what_ports
        <snip...>
Given the above, I only want to extract 'getldp.pl' and 'lprtdiag.pl'.
The following commands will extract these two files into the directory
structures under which they are found in the tarball:
        FreeBSD [0] /usr/bin/tar xf app1.0.tar app1.0/bin/getldp.pl *lprtdiag.pl
        Linux [0] /bin/tar xf app1.0.tar --wildcards app1.0/bin/getldp.pl *lprtdiag.pl
        Solaris [0] /usr/bin/tar xf app1.0.tar `/usr/bin/tar tf app1.0.tar |
        > /usr/bin/grep 'lprtdiag.pl'` app1.0/bin/getldp.pl
The flags to 'tar' are "x" (extract) and "f" (read from the file
"app1.0.tar").  Unfortunately, the default 'tar' command in Solaris
doesn't handle wildcards well, so in place of a file pattern we've
listed the tarball and used 'grep' to pick out 'lprtdiag.pl'.  If we
use the GNU version of 'tar' instead, we can use wildcards as we did
with FreeBSD and Linux ("--wildcards" enables pattern matching of
member names in GNU 'tar'):
        Solaris [0] /usr/gnu/bin/tar xf app1.0.tar --wildcards app1.0/bin/getldp.pl *lprtdiag.pl
In each of the above commands, we've provided the full path for
'getldp.pl' and simply wildcarded 'lprtdiag.pl'.  The end result is that
only these files are extracted from the tarball and placed into their
respective paths, which 'tar' was kind enough to create for us:
        FreeBSD [0] /usr/bin/tar xf app1.0.tar app1.0/bin/getldp.pl *lprtdiag.pl
        FreeBSD [0] /usr/bin/find . -print
        .
        ./app1.0.tar
        ./app1.0
        ./app1.0/bin
        ./app1.0/bin/getldp.pl
        ./app1.0/bin/lprtdiag.pl
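Since wildcard support varies between 'tar' implementations, the
listing-plus-'grep' approach used for Solaris above can serve as a
fallback anywhere.  The following is only a minimal sketch: it builds a
small stand-in tarball (the file contents and the scratch directory from
'mktemp' are my own placeholders, not the real app1.0 files) and assumes
that member names contain no whitespace:

```shell
#!/bin/sh
set -e

# Build a small stand-in archive in a scratch directory (placeholder data).
work=$(mktemp -d)
cd "$work"
mkdir -p app1.0/bin app1.0/etc
echo 'ldp'  > app1.0/bin/getldp.pl
echo 'diag' > app1.0/bin/lprtdiag.pl
echo 'conf' > app1.0/etc/soc.conf
tar cf app1.0.tar app1.0
rm -rf app1.0

# Resolve the name against the archive listing rather than relying on
# tar's own wildcard handling, which differs between implementations.
members=$(tar tf app1.0.tar | grep '/lprtdiag\.pl$')

# $members is deliberately unquoted so each matching path becomes its own
# argument (this is where the "no whitespace in names" assumption matters).
tar xf app1.0.tar app1.0/bin/getldp.pl $members

find . -type f | sort
```

When a wildcard is handed to 'tar' directly, quoting it (for example
'*lprtdiag.pl') keeps the shell from expanding it first should a
matching file already exist in the current directory.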
This is fine if we don't mind our files being extracted to their
contained paths, but now I simply want the files dropped into the
present directory.  To do this, we can add "--strip-components N" to the
command, which removes the first N leading path components, in this case
2 of them ("app1.0/" and "bin/").  (This will not work with the default
Solaris 'tar', though it will work with the GNU version.):
        FreeBSD [0] /usr/bin/tar xf app1.0.tar --strip-components 2 *ioDev app1.0/bin/showdisk.pl
        Linux [0] /bin/tar xf app1.0.tar --wildcards --strip-components 2 *ioDev app1.0/bin/showdisk.pl
        Solaris [0] /usr/gnu/bin/tar xf app1.0.tar --wildcards --strip-components 2 \
        > *ioDev app1.0/bin/showdisk.pl
        FreeBSD [0] /usr/bin/find . -print
        .
        ./app1.0.tar
        ./app1.0
        ./app1.0/bin
        ./app1.0/bin/getldp.pl
        ./app1.0/bin/lprtdiag.pl
        ./ioDev
        ./showdisk.pl
        FreeBSD [0]
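To drop the stripped files somewhere other than the present directory,
"--strip-components" can be paired with "-C", which both GNU 'tar' and
the FreeBSD 'tar' accept.  A rough sketch follows; the "dest" directory
and the stand-in file contents are hypothetical, and the default Solaris
'tar' is not covered here:

```shell
#!/bin/sh
set -e

# Build a small stand-in archive in a scratch directory (placeholder data).
work=$(mktemp -d)
cd "$work"
mkdir -p app1.0/bin dest
echo 'io'   > app1.0/bin/ioDev
echo 'disk' > app1.0/bin/showdisk.pl
tar cf app1.0.tar app1.0
rm -rf app1.0

# Drop the two leading components ("app1.0/" and "bin/") and unpack the
# named members into dest/ instead of the current directory.
tar xf app1.0.tar -C dest --strip-components=2 \
    app1.0/bin/ioDev app1.0/bin/showdisk.pl

ls dest
```

Full member paths are given here so the same command line behaves
identically whether or not wildcard matching is enabled.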

CPIO

As in our 'tar' example, I have moved the cpio archive "app1.0.cpio"
to "/tmp/apps" and listed its contents:
        Linux [0] pwd
        /tmp/apps
        Linux [0] /bin/ls
        app1.0.cpio
        Linux [0] /bin/cpio -itvF app1.0.cpio
        FreeBSD [0] /usr/bin/cpio -itvF app1.0.cpio
        Solaris [0] /usr/bin/cpio -itvI app1.0.cpio
        drwxr-xr-x   4 root     root            0 Jan 28 19:16 app1.0
        drwxr-xr-x   2 root     root            0 Jan 28 19:17 app1.0/bin
        -rwxr-xr-x   1 root     root        19507 Jan 16 16:32 app1.0/bin/getldp.pl
        -rwxr-xr-x   1 root     root          634 Jan 16 16:32 app1.0/bin/randpass
        -r-xr-xr-x   1 root     root        40729 Jan 16 16:32 app1.0/bin/ioDev
        -rwxr-xr-x   1 root     root         2399 Jan 16 16:32 app1.0/bin/ipfwstate
        -r-xr-xr-x   1 root     root        25902 Jan 16 16:32 app1.0/bin/showdisk.pl
        -rwxr-xr-x   1 root     root         4427 Jan 16 16:32 app1.0/bin/what_ports
        -rwxr-xr-x   1 root     root         8484 Jan 16 16:32 app1.0/bin/flud
        -r-xr-xr-x   1 root     root         4663 Jan 16 16:32 app1.0/bin/op82dec.pl
        -rwxr-xr-x   1 root     root         4054 Jan 16 16:32 app1.0/bin/locate_path.ksh
        -r-xr--r--   1 root     root        77151 Jan 16 16:32 app1.0/bin/audit.soc
        -rwxr-xr-x   1 root     root          488 Jan 16 16:32 app1.0/bin/transdate.perl
        -r-xr-xr-x   1 root     root        13433 Jan 16 16:32 app1.0/bin/lprtdiag.pl
        drwxr-xr-x   2 root     root            0 Jan 28 19:17 app1.0/etc
        -rw-r--r--   1 root     root         1839 Jan 16 16:32 app1.0/etc/soc.conf
        -rw-r--r--   1 root     root         1699 Jan 16 16:32 app1.0/etc/README
        <snip...>
        466 blocks
Of note, the traditional handling of cpio archives is to use redirects,
as in:
        Linux [0] /bin/cpio -itv < app1.0.cpio
The "-I" flag in Solaris and the "-F" flag in Linux and FreeBSD allow
us to name the cpio archive directly rather than using redirects.  Either
way is acceptable; it's your choice.  Moving along, to extract
'op82dec.pl' and 'getldp.pl' into their archive contained directory
structures:
        Linux [0] /bin/cpio -F app1.0.cpio -idv *op82dec.pl app1.0/bin/getldp.pl
        FreeBSD [0] /usr/bin/cpio -F app1.0.cpio -idv *op82dec.pl app1.0/bin/getldp.pl
        Solaris [0] /usr/bin/cpio -I app1.0.cpio -idv *op82dec.pl app1.0/bin/getldp.pl
        app1.0/bin/getldp.pl
        app1.0/bin/op82dec.pl
        466 blocks
        Linux [0] /bin/find . -print
        .
        ./app1.0.cpio
        ./app1.0
        ./app1.0/bin
        ./app1.0/bin/getldp.pl
        ./app1.0/bin/op82dec.pl
The flags to 'cpio' are "-i" (extract), "-d" (to create leading
directories as needed), and "-v" (list files as they are processed).
Assuming that I want to extract a few files into my current directory
(or anywhere else for that matter), "-r" will allow me to interactively
rename the files to be extracted, including directory paths:
        Linux [0] /bin/cpio -F app1.0.cpio -ivr *ioDev app1.0/bin/showdisk.pl
        FreeBSD [0] /usr/bin/cpio -F app1.0.cpio -ivr *ioDev *app1.0/bin/showdisk.pl
        Solaris [0] /usr/bin/cpio -I app1.0.cpio -ivr *ioDev app1.0/bin/showdisk.pl
        Rename "app1.0/bin/ioDev"? ioDev
        ioDev
        Rename "app1.0/bin/showdisk.pl"? showdisk.pl
        showdisk.pl
        408 blocks
        Linux [0] /bin/find . -print
        .
        ./app1.0.cpio
        ./app1.0
        ./app1.0/bin
        ./app1.0/bin/getldp.pl
        ./app1.0/bin/op82dec.pl
        ./ioDev
        ./showdisk.pl
        Linux [0]
Notably, for some reason under FreeBSD, when using "-r" I needed to
precede the path for 'showdisk.pl' with a wildcard (*).
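Because "-r" prompts for every file, it is awkward to script.  One
possible non-interactive alternative, sketched below, is to extract with
"-d" into a scratch directory and then move the files up.  This is my
own workaround rather than a 'cpio' feature; the "scratch" directory and
the stand-in file contents are hypothetical, and it assumes the wanted
files have distinct base names:

```shell
#!/bin/sh
set -e

# cpio is not installed everywhere; skip quietly if it is missing.
command -v cpio >/dev/null 2>&1 || { echo 'cpio not installed; skipping'; exit 0; }

# Build a small stand-in archive in a scratch directory (placeholder data).
work=$(mktemp -d)
cd "$work"
mkdir -p app1.0/bin scratch
echo 'io'   > app1.0/bin/ioDev
echo 'disk' > app1.0/bin/showdisk.pl
echo 'ldp'  > app1.0/bin/getldp.pl
find app1.0 -print | cpio -o > app1.0.cpio
rm -rf app1.0

# Extract only the wanted members, paths included, beneath scratch/.
# The patterns are quoted so that cpio, not the shell, interprets them.
(cd scratch && cpio -id '*ioDev' 'app1.0/bin/showdisk.pl' < ../app1.0.cpio)

# Flatten: move the extracted files up, then discard the scratch tree.
find scratch -type f -exec mv {} . \;
rm -rf scratch

ls
```

The redirect form is used instead of "-F" or "-I" so that the same
lines should apply regardless of which flag a given 'cpio' expects.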

PAX

To start, pax doesn't really have its own archive format; instead, it
is a utility capable of handling several archive formats.  The default
format is "ustar".  With that, we have a "pax" archive that I've placed
under "/tmp/apps" and listed its contents:
        Linux [0] pwd
        /tmp/apps
        Linux [0] /bin/ls
        app1.0.pax
        Linux [0] /usr/bin/pax -vf app1.0.pax
        FreeBSD [0] /bin/pax -vf app1.0.pax
        Solaris [0] /usr/bin/pax -vf app1.0.pax
        drwxr-xr-x  2 root     root             0 Jan 28 19:16 app1.0
        drwxr-xr-x  2 root     root             0 Jan 28 19:17 app1.0/bin
        -rwxr-xr-x  1 root     root         19507 Jan 16 16:32 app1.0/bin/getldp.pl
        -rwxr-xr-x  1 root     root           634 Jan 16 16:32 app1.0/bin/randpass
        -r-xr-xr-x  1 root     root         40729 Jan 16 16:32 app1.0/bin/ioDev
        -rwxr-xr-x  1 root     root          2399 Jan 16 16:32 app1.0/bin/ipfwstate
        -r-xr-xr-x  1 root     root         25902 Jan 16 16:32 app1.0/bin/showdisk.pl
        -rwxr-xr-x  1 root     root          4427 Jan 16 16:32 app1.0/bin/what_ports
        -rwxr-xr-x  1 root     root          8484 Jan 16 16:32 app1.0/bin/flud
        -r-xr-xr-x  1 root     root          4663 Jan 16 16:32 app1.0/bin/op82dec.pl
        -rwxr-xr-x  1 root     root          4054 Jan 16 16:32 app1.0/bin/locate_path.ksh
        -r-xr--r--  1 root     root         77151 Jan 16 16:32 app1.0/bin/audit.soc
        -rwxr-xr-x  1 root     root           488 Jan 16 16:32 app1.0/bin/transdate.perl
        -r-xr-xr-x  1 root     root         13433 Jan 16 16:32 app1.0/bin/lprtdiag.pl
        drwxr-xr-x  2 root     root             0 Jan 28 19:17 app1.0/etc
        -rw-r--r--  1 root     root          1839 Jan 16 16:32 app1.0/etc/soc.conf
        -rw-r--r--  1 root     root          1699 Jan 16 16:32 app1.0/etc/README
        <snip...>
        pax: ustar vol 1, 22 files, 256000 bytes read, 0 bytes written.
To simply extract 'getldp.pl' and 'lprtdiag.pl' into their archive
contained directory structures, we can use the following:
        Linux [0] /usr/bin/pax -rvf app1.0.pax app1.0/bin/getldp.pl *lprtdiag.pl
        FreeBSD [0] /bin/pax -rvf app1.0.pax app1.0/bin/getldp.pl *lprtdiag.pl
        Solaris [0] /usr/bin/pax -rvf app1.0.pax app1.0/bin/getldp.pl \
        > `/usr/bin/pax -f app1.0.pax | /usr/bin/grep 'lprtdiag.pl'`
        app1.0/bin/getldp.pl
        app1.0/bin/lprtdiag.pl
        pax: ustar vol 1, 22 files, 256000 bytes read, 0 bytes written.
        Linux [0] /bin/find . -print
        .
        ./app1.0.pax
        ./app1.0
        ./app1.0/bin
        ./app1.0/bin/getldp.pl
        ./app1.0/bin/lprtdiag.pl
(Of note, 'pax' doesn't handle wildcards very well under Solaris.
As we did earlier with 'tar', in place of a file pattern we've listed
the archive and used 'grep' to pick out 'lprtdiag.pl'.)  The options to
'pax' are "-r" (read/extract), "-v" (list the files being processed), and
"-f" (from file "app1.0.pax").  By adding the "-s" flag, as seen below,
we can use a basic regex substitution to rename the files or paths as
they are extracted.  This allows us to extract 'ioDev' and 'showdisk.pl'
to our current directory:
        Linux [0] /usr/bin/pax -rs :app1.0/bin/::p -vf app1.0.pax \
        > app1.0/bin/showdisk.pl *ioDev
        FreeBSD [0] /bin/pax -rs :app1.0/bin/::p -vf app1.0.pax \
        > app1.0/bin/showdisk.pl *ioDev
        Solaris [0] /usr/bin/pax -rs :app1.0/bin/::p -vf app1.0.pax \
        > app1.0/bin/showdisk.pl `/usr/bin/pax -f app1.0.pax | /usr/bin/grep 'ioDev'`
        app1.0/bin/ioDev >> ioDev
        ioDev
        app1.0/bin/showdisk.pl >> showdisk.pl
        showdisk.pl
        pax: ustar vol 1, 22 files, 256000 bytes read, 0 bytes written.
        Linux [0] /bin/find . -print
        .
        ./app1.0.pax
        ./app1.0
        ./app1.0/bin
        ./app1.0/bin/getldp.pl
        ./app1.0/bin/lprtdiag.pl
        ./ioDev
        ./showdisk.pl
        Linux [0]
In the regex pattern above, any non-null character can be a delimiter;
I've used ":".  The optional "p" at the end of the pattern prints the
result of each successful substitution to STDERR, thus:
        app1.0/bin/ioDev >> ioDev
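The "-s" expression follows the 'ed' substitution syntax, so for simple
cases like this one it can be previewed with 'sed' before touching the
archive.  The sketch below runs the same expression (minus the "p" flag)
over a few member names copied from the listing above; the variable
names are only illustrative:

```shell
#!/bin/sh
# Member names as they appear in the archive listing above.
names='app1.0/bin/ioDev
app1.0/bin/showdisk.pl
app1.0/etc/README'

# Same expression as given to "pax -s", with ":" as the delimiter.
preview=$(printf '%s\n' "$names" | sed 's:app1.0/bin/::')

printf '%s\n' "$preview"
# prints:
#   ioDev
#   showdisk.pl
#   app1.0/etc/README
```

Names that don't match the pattern, such as 'app1.0/etc/README', pass
through unchanged.  Also, the "." in "app1.0" is a regex metacharacter
that matches any single character; writing "app1\.0" would be stricter.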

ZIP

As seen below, I've placed the zip file "app1.0.zip" under "/tmp/apps"
and listed its contents:
        Solaris [0] pwd
        /tmp/apps
        Solaris [0] /usr/bin/ls
        app1.0.zip
        Solaris [0] /usr/bin/unzip -l app1.0.zip
        Linux [0] /usr/bin/unzip -l app1.0.zip
        FreeBSD [0] /usr/local/bin/unzip -l app1.0.zip
        Archive:  app1.0.zip
          Length      Date    Time    Name
        ---------  ---------- -----   ----
                0  01-29-2012 10:33   app1.0/
                0  01-29-2012 10:34   app1.0/etc/
             1699  01-16-2012 17:09   app1.0/etc/README
             1839  01-16-2012 17:09   app1.0/etc/soc.conf
                0  01-29-2012 11:40   app1.0/bin/
             2399  01-16-2012 17:09   app1.0/bin/ipfwstate
            19507  01-16-2012 17:09   app1.0/bin/getldp.pl
            77151  01-16-2012 17:09   app1.0/bin/audit.soc
            40729  01-16-2012 17:09   app1.0/bin/ioDev
            13433  01-16-2012 17:09   app1.0/bin/lprtdiag.pl
              634  01-16-2012 17:09   app1.0/bin/randpass
             4427  01-16-2012 17:09   app1.0/bin/what_ports
             4054  01-16-2012 17:09   app1.0/bin/locate_path.ksh
             4663  01-16-2012 17:09   app1.0/bin/op82dec.pl
            25902  01-16-2012 17:09   app1.0/bin/showdisk.pl
             8484  01-16-2012 17:09   app1.0/bin/flud
              488  01-16-2012 17:09   app1.0/bin/transdate.perl
        ---------                     -------
           205409                     17 files
The following extracts 'getldp.pl' and 'lprtdiag.pl' to their archive
contained directory structures:
        Solaris [0] /usr/bin/unzip app1.0.zip app1.0/bin/getldp.pl *lprtdiag.pl
        Linux [0] /usr/bin/unzip app1.0.zip app1.0/bin/getldp.pl *lprtdiag.pl
        FreeBSD [0] /usr/local/bin/unzip app1.0.zip app1.0/bin/getldp.pl *lprtdiag.pl
        Archive:  app1.0.zip
          inflating: app1.0/bin/getldp.pl
          inflating: app1.0/bin/lprtdiag.pl
        Solaris [0] /usr/bin/find . -print
        .
        ./app1.0.zip
        ./app1.0
        ./app1.0/bin
        ./app1.0/bin/getldp.pl
        ./app1.0/bin/lprtdiag.pl
If, instead, I want to extract 'showdisk.pl' and 'ioDev' directly into
the current directory, I can use "-j", which junks the stored paths and
so suppresses directory creation:
        Solaris [0] /usr/bin/unzip -j app1.0.zip *ioDev app1.0/bin/showdisk.pl
        Linux [0] /usr/bin/unzip -j app1.0.zip *ioDev app1.0/bin/showdisk.pl
        FreeBSD [0] /usr/local/bin/unzip -j app1.0.zip *ioDev app1.0/bin/showdisk.pl
        Archive:  app1.0.zip
          inflating: ioDev
          inflating: showdisk.pl
        Solaris [0] /usr/bin/find . -print
        .
        ./app1.0.zip
        ./app1.0
        ./app1.0/bin
        ./app1.0/bin/getldp.pl
        ./app1.0/bin/lprtdiag.pl
        ./ioDev
        ./showdisk.pl
        Solaris [0]
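To place the files somewhere other than the current directory, "-j" can
be combined with the "-d" option of 'unzip', which names an extraction
directory.  The following is only a sketch under a few assumptions: both
'zip' and 'unzip' are installed, and the "dest" directory and stand-in
file contents are hypothetical:

```shell
#!/bin/sh
set -e

# zip/unzip are not installed everywhere; skip quietly if either is missing.
for tool in zip unzip; do
    command -v "$tool" >/dev/null 2>&1 || { echo "$tool not installed; skipping"; exit 0; }
done

# Build a small stand-in archive in a scratch directory (placeholder data).
work=$(mktemp -d)
cd "$work"
mkdir -p app1.0/bin dest
echo 'io'   > app1.0/bin/ioDev
echo 'disk' > app1.0/bin/showdisk.pl
echo 'ldp'  > app1.0/bin/getldp.pl
zip -qr app1.0.zip app1.0
rm -rf app1.0

# -j drops the stored paths; -d names the directory to extract into.
# The wildcard is quoted so that unzip, not the shell, expands it.
unzip -q -j app1.0.zip '*ioDev' 'app1.0/bin/showdisk.pl' -d dest

ls dest
```

As with the other tools, "-j" flattens everything into one directory, so
two members sharing a base name would collide.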