Nice Things About Perl – Searching Gzip Files

Recently, I needed to search for a string across log files that were compressed with gzip. The gzipped log files were spread across different directories and nested subdirectories. Uncompressing the files and then running grep was not an appealing option, so the first thing that came to mind was zgrep.

Zgrep works like grep but with the ability to search compressed files. I thought this was pretty neat, until I needed to perform a more complex search using regular expressions.

For some reason, I couldn’t get zgrep to handle regex constructs like \s+ or $ (the end-of-line anchor). Rather than spend more time researching and experimenting with possible tweaks, I decided to just use Perl and get it over with.

Below is my crude implementation of zgrep with full Perl regex capability.

for i in $(find /var/log -name "*.gz"); do export file="$i"; gzip -dc "$i" | perl -ne 'print "$ENV{file}: $_" if /write\s+failure/'; done

It’s still a one-liner, although I agree it looks ugly. Once you figure out how it works, though, you’ll realize it’s really simple.

The find command lists all the gzip files under the directory you want to search. If you don’t want to search in subdirectories, you can supply the -maxdepth option. The for loop iterates over each gzip file. Each file is then decompressed by gzip, but to stdout rather than to a file (that is what the -dc options do). The output is piped to the Perl script. The string between the forward slashes is the regex pattern you want to search for. $ENV{file} is how the Perl script reads the exported shell variable that holds the name of the gzip file currently being processed, so that it can print the name alongside each matching line. Without this, you wouldn’t know which gzip file a match came from.

Given access to Perl’s regex, you now have a powerful search tool.

Nice things about Perl

There are many nice things about Perl. But what I like best are one-liner scripts. I think nothing beats Perl when it comes to this feature.

1. A quick way of counting lines of code in your project:

$ find myproject -type f -name "*.java" | xargs perl -ne 'print if !/^\s*$/' | wc -l

You’ll notice that this also counts comment lines. I want them counted, because comments are work done too.

2. Some time ago, a friend asked me how to display the date two days earlier. She needed this in a shell script, so the output had to go to STDOUT and the code had to be invocable like a shell command.

$ date
Thu May 8 13:20:04 PDT 2008

$ TWO_DAYS_AGO=`perl -e 'print scalar localtime(time() - 86400 * 2);'`

$ echo $TWO_DAYS_AGO
Tue May 6 13:20:10 2008

One can definitely write this in other languages, but in many of them it would take more than one line of code. It would also require creating a separate script file, which means one more file to manage and maintain. This way, the code can be embedded directly in her shell script.

The key to these one-liner scripts is the pair of -n and -e options in Perl. The -e option lets you execute Perl code given inline on the command line. The -n option makes Perl loop over each line of input, taken from STDIN or from the files named on the command line. This is similar to AWK and SED but with the power of a full-blown programming language.