One Friday afternoon: Tar options script

I’ve been using tar for more years than I care to remember, and most of the time I get by with just a handful of options. And if I can’t remember, well I can google it as easily as anyone else. The XKCD comic ‘tar’ reassures me that I’m not alone.

Looking around I find files with names ending in “.gz”, “.tgz”, “.taz”, “.Z”, “.taZ”, “.bz2”, “.tz2”, “.tbz2”, “.tbz”, “.lz”, “.lzma”, “.tlz”, “.lzo” and “.xz”. I suspect most of them are compressed tar files of one sort or another, but I couldn’t really tell you much more than that.

I’ve recently been writing some bash shell scripts, and thought it would be easy enough to write something which would extract files from a tar archive. I could use the suffix of the file name to select the necessary tar option, and I wouldn’t have to resort to google when confronted with a strange file type.
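
For the record, a minimal sketch of the sort of script I had in mind might look like this. The function name tar_option_for is hypothetical, and the mapping from suffix to option is my own reading of GNU tar's compression options, so treat it as an assumption rather than a definitive table.

```shell
# Hypothetical helper: print the GNU tar compression option that matches
# the suffix of the file name in $1. Prints nothing for an unknown suffix
# (for example a plain .tar file).
tar_option_for() {
    case "$1" in
        *.gz|*.tgz|*.taz)          printf '%s\n' "-z" ;;      # gzip
        *.Z|*.taZ)                 printf '%s\n' "-Z" ;;      # compress
        *.bz2|*.tz2|*.tbz2|*.tbz)  printf '%s\n' "-j" ;;      # bzip2
        *.lz)                      printf '%s\n' "--lzip" ;;  # lzip
        *.lzma|*.tlz)              printf '%s\n' "--lzma" ;;  # lzma
        *.lzo)                     printf '%s\n' "--lzop" ;;  # lzop
        *.xz)                      printf '%s\n' "-J" ;;      # xz
        *)                         : ;;                       # no option
    esac
}
```

It could then be used as tar -x $(tar_option_for "$archive") -f "$archive", with the command substitution deliberately left unquoted so that an empty result adds no argument at all.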

As usual, Wikipedia has an informative article, from which I learn that the version of GNU tar on my system is out of date. I had v1.15, and it seems that lots of compression and decompression options were added between v1.20 and v1.23. So I update to the latest version (v1.26) and start to investigate.

I’m old enough that I learnt commands of the form “tar xf file.tar”, where the options don’t even begin with a “-”. I’m happy to report that tar still understands that style, but can also take “tar -x -f file.tar” or even “tar --extract --file file.tar”.

More to type, but perhaps easier to remember. (But my fingers seem to type “tar xf” automatically.)

But then I start to read the manual and discover that:
“Reading a compressed archive is even simpler: you don’t need to specify any additional options as GNU tar recognizes its format automatically. The format recognition algorithm is based on signatures, special byte sequences in the beginning of file, that are specific for certain compression formats. If this approach fails, tar falls back to using archive name suffix to determine its format for a list of recognized suffixes.”
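
To get a feel for what a signature is, here is a rough sketch that looks at the first few bytes of a file and guesses the compression format. This is not how tar itself is implemented. The function name sniff_compression is hypothetical, and it only knows the handful of signatures I am sure of (gzip, compress, bzip2, xz and lzip).

```shell
# Hypothetical helper: guess the compression format of the file in $1
# from its leading bytes. Only a few well-known signatures are covered.
sniff_compression() {
    # First six bytes as one lowercase hex string, e.g. "1f8b08000000".
    sig=$(od -An -tx1 -N6 "$1" | tr -d '[:space:]')
    case "$sig" in
        1f8b*)         echo gzip ;;
        1f9d*)         echo compress ;;
        425a68*)       echo bzip2 ;;    # ASCII "BZh"
        fd377a585a00)  echo xz ;;       # 0xFD "7zXZ" 0x00
        4c5a4950*)     echo lzip ;;     # ASCII "LZIP"
        *)             echo unknown ;;
    esac
}
```

Anything it does not recognise is reported as unknown, which is roughly the point at which the manual says tar falls back to looking at the file name suffix.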

Hardly able to believe that it could be that simple, I find the first compressed tar file that I can, and type

tar xf OpenCV-2.4.4a.tar.bz2

and am amazed to see the files extracted. No need for me to write a script. I just need to leave out any options that might, or might not, be correct, and let tar work it out for itself.
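
If you want to convince yourself without hunting for a compressed archive, a small round trip along these lines should do it. It is only a sketch: the function name roundtrip_demo and the file names are made up, and it assumes a reasonably recent GNU tar with gzip installed.

```shell
# Hypothetical demo: build a small gzip-compressed archive, then extract
# it again without giving tar any compression option.
roundtrip_demo() {
    work=$(mktemp -d) || return 1
    mkdir "$work/src" "$work/out"
    echo "hello" > "$work/src/greeting.txt"
    # Create: the compression option (z) is given explicitly here.
    tar czf "$work/demo.tar.gz" -C "$work/src" greeting.txt
    # Extract: plain "xf", leaving tar to work out the format itself.
    tar xf "$work/demo.tar.gz" -C "$work/out"
    cat "$work/out/greeting.txt"
    rm -rf "$work"
}
```

Calling roundtrip_demo should print the contents of the extracted file.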

Less is more.


Richard Jordan is a developer at Ellexus.

Ellexus are the developers of Breeze, a Linux dependency tracing tool that shows you what your programs are doing as they run. You can quickly search trace data to troubleshoot a problem build or installation.