Nice Things About Perl – Searching Gzip Files

Recently, I needed to search for a string across log files that had been compressed with gzip. The gzipped log files are spread across different directories and nested subdirectories. Uncompressing the files and then running grep was not an appealing option, so the first thing that came to mind was zgrep.

Zgrep works like grep but can also search compressed files. I thought this was pretty neat, until I needed to perform more complex searches using regular expressions.

For some reason, I couldn't get zgrep to handle regex constructs like \s+ or $ (the end-of-line anchor). Rather than spend more time researching and experimenting with possible tweaks, I decided to just use Perl and get it over with.

Below is my crude implementation of zgrep with full Perl regex capability.

for i in $(find /var/log -name "*.gz"); do export file="$i"; gzip -dc "$i" | perl -ne 'print "$ENV{file}: $_" if /write\s+failure/'; done

It's still a one-liner, but I agree it looks ugly. Once you figure out how it works, though, you'll realize it's really simple.

The find command lists all the gzip files under the directory you want to search. If you don't want to search subdirectories, add the -maxdepth 1 option. The for loop iterates over each gzip file, and gzip -dc decompresses each one to stdout instead of to a file. That output is piped into the Perl script. The string between the forward slashes is the regex pattern you are searching for. $ENV{file} reads the exported file variable, i.e. the name of the gzip file currently being processed, so that it can be printed alongside each matching line. Without it, you wouldn't know in which gzip file a match was found.

Given access to Perl’s regex, you now have a powerful search tool.


Websphere’s implementation of Jython in wsadmin broke sys.argv

In CPython and JPython, sys.argv[0] holds the name of the Python script, while the remaining elements hold its arguments. However, in Websphere's Jython, sys.argv[0] is already the first argument of the script. This means that none of the elements of sys.argv contains the name of the script being run.

I think this is a serious drawback. Some of my functionality needs the name of the script. For example, I need to know the directory where the script is located. That lets me locate other files relative to the script, such as additional library paths (PYTHONPATH) or configuration files, and access them at runtime.

I have yet to find a simple way to get the script name at runtime in wsadmin. One workaround I am planning is to write a wrapper script that grabs the script name and passes it to the wsadmin Jython script as the first parameter, so that it becomes sys.argv[0]. Alternatively, it could be passed as an option whose value identifies the script name.
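A minimal sketch of that wrapper idea, written as a shell function. It assumes wsadmin.sh is on the PATH; the WSADMIN environment variable override and the function name run_jython are hypothetical, added only so the wrapper can point at a specific profile. It passes the script's absolute path as the first argument, so inside the Jython script sys.argv[0] would hold the script name and os.path.dirname(sys.argv[0]) its directory.

```shell
# run_jython SCRIPT [ARGS...]  (hypothetical wrapper name)
# Runs a wsadmin Jython script, passing the script's own absolute path
# as its first argument so that sys.argv[0] holds the script name.
run_jython() {
    if [ $# -lt 1 ]; then
        echo "usage: run_jython script.py [args...]" >&2
        return 2
    fi
    script=$1
    shift
    # Resolve to an absolute path so the Jython side can find sibling
    # files no matter which directory wsadmin was started from.
    script_dir=$(cd "$(dirname "$script")" && pwd) || return 1
    script_path=$script_dir/$(basename "$script")
    # -lang and -f are wsadmin options; arguments after the script file
    # are handed to the script in sys.argv.
    "${WSADMIN:-wsadmin.sh}" -lang jython -f "$script_path" "$script_path" "$@"
}
```

This keeps the Jython script itself unchanged apart from treating sys.argv[0] as its own path, but it does not solve the portability concern: the script still relies on being launched through the wrapper.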

However, the problem with a wrapper script is that your wsadmin Jython scripts won't be portable, since they would depend on the wrapper to run.

If anyone out there has ideas other than a wrapper script, I'd love to hear them.

Western Digital 1.5 TB My Book World Edition II Hard Drive

I recently purchased a personal network-attached storage (NAS) device: a 1.5TB external hard drive from Western Digital called the Western Digital 1.5 TB My Book World Edition II. [1] Thanks to my friend Don for telling me about it.

There are many good things about this product. Unfortunately, there are some bad things too. Let’s start with the good things.

The good things:

1. Cheap!

I purchased this from Amazon for US$319. [2] At the time, a standalone 500GB hard drive cost around $99, so I thought this was impressive. It was the cheapest Linux-based NAS I could find.

2. Easy to set up.

It uses a web interface for setup and administration. Setting it up was as easy as setting up a wireless router. By default, it provides Windows-style file sharing (CIFS/SMB). It also has a proprietary type of file access called MioNet (WDAnywhere), but I didn't bother with it since I was only interested in Samba and NFS. I actually disabled it, hoping that would help speed up the machine.

3. It can do NFS.

If you're going to use this with Unix, you will definitely want NFS access to the device. However, NFS is no longer available by default in the new firmware, 2.0.15+. [3] I had to install some NFS files manually (exportfs, rpc.*) to get it running. See my other post for details on how to set this up.

4. It’s a Linux machine.

It's a small computer in itself, and one of the compelling reasons I bought it was that it runs Linux. So in addition to disk storage, you can use it as a Linux machine with all the goodies like ssh, lighttpd, Perl, etc.

5. Quiet and cool.

Some reviews out there claim that this device is noisy. With two 750GB disks and a Linux machine inside, I found it quieter than running your own Samba/NFS server on a desktop PC. It uses a small, low-power CPU (ARM926EJ). [6]

6. Plenty of Linux support.

You won't be alone hacking this device; plenty of people are already hacking on this NAS. [4,5]

Now the bad things:

1. It’s slow!

I don't understand what went wrong here, because this NAS has Gigabit Ethernet and, according to the specs, the drives spin at 7200 RPM. However, I can only get a maximum write speed of about 7MB/sec. Note that this is through local access on the machine itself, not even across the network. I used the Unix command dd to measure write speed:

dd if=/dev/zero of=/tmp/x bs=1024 count=10240

10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 1.6501 seconds, 6.4 MB/s

I suspect this is due to the CPU or I/O of the machine. So even with Gigabit Ethernet and 7200 RPM drives, I doubt you can get much faster than 7MB/s if the bottleneck is the CPU or I/O.

I tested NFS write speed on a 100Mb/sec network and could only get a maximum of 5 MB/sec, well under the roughly 12.5 MB/sec (100 Mb/sec divided by 8 bits per byte) such a link can carry.
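dd reports "MB" as 10^6 bytes, so the 6.4 MB/s from the dd run above is about 6.1 MiB/s. A small sketch (the helper name dd_rate is made up) that recomputes the rate from dd's byte count and elapsed time, using awk for the floating-point division. For a steadier figure, a larger count helps, and GNU dd's conv=fsync option makes dd flush the data to disk before reporting, assuming the dd on the NAS supports it.

```shell
# dd_rate BYTES SECONDS  (hypothetical helper name)
# Recomputes a dd throughput figure in both decimal and binary units.
dd_rate() {
    awk -v b="$1" -v s="$2" 'BEGIN {
        printf "%.1f MB/s (10^6 bytes)\n", b / s / 1000000
        printf "%.1f MiB/s (2^20 bytes)\n", b / s / 1048576
    }'
}

# The figures from the dd run above.
# prints: 6.4 MB/s (10^6 bytes)
#         6.1 MiB/s (2^20 bytes)
dd_rate 10485760 1.6501
```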

2. NFS is disabled.

NFS used to be installed by default; all you needed to do was enable it. However, I learned that it was removed in versions v2.00.15 and later. [3] I had to install the NFS files manually to get it up and running. I wonder why WD removed it. It appears that they don't want users to use this NAS with NFS: in addition to not supporting Linux access, WD also makes sure NFS isn't there in case you want to use it.

3. Certain types of files cannot be shared

The WDAnywhere software restricts certain types of files from being shared. According to the WD website, "Due to unverifiable media license authentication, the following file types cannot be shared…" [7] What this means is that you cannot share media files like mp3 or mpeg files through WDAnywhere, even if they are your own personal audio or movie files.

I don't have this problem with Samba and NFS. This has nothing to do with technology, so I won't comment on it, but I do find this feature amusing.

Some notes:

It's probably a good idea to block this device's access to your network's Internet gateway. If you only want to access it on your intranet (LAN), there is no need for it to reach your gateway and the Internet beyond. This makes it safer in case you accidentally open up your router to machines on the local network. It also prevents viruses or trojans running on the device from contacting someone out on the Internet, although that is highly unlikely since it runs Linux.

References:

[1] http://www.wdc.com/en/products/Products.asp?DriveID=318

[2] http://www.amazon.com/gp/offer-listing/B000RZ68IG/ref=dp_olp_2

[3] http://mybookworld.wikidot.com/nfs-server

[4] http://mybookworld.wikidot.com/

[5] http://martin.hinner.info/mybook/

[6] http://en.wikipedia.org/wiki/Western_Digital_My_Book#Internals

[7] WD FAQ – What files cannot be shared by WD Anywhere Access?

Embedded chat-room window

I added a feature to Yaploud that lets you embed a chat-room window in your web page. Instead of opening a popup window, visitors to your site can view and send chat messages directly on your page.

To do this, generate a small piece of HTML code and insert it into your web page. The code is a simple iframe tag. For details, see

http://www.yaploud.com/chat/embed_code.php

Note: This only works if you can embed an iframe element in your web page. For example, it would be nice to have on your blog, if the blog allows it. Unfortunately, most blogs, WordPress among them, don't allow this, so it won't be possible there.

Nice things about Perl

There are many nice things about Perl, but what I like best are one-liner scripts. I think nothing beats Perl when it comes to this.

1. A quick way of counting lines of code in your project:

$ find myproject -type f -name "*.java" -print0 | xargs -0 perl -ne 'print if !/^\s*$/' | wc -l

You'll notice that this skips only blank lines, so comment lines are counted too. I want them counted, because comments are also work done.

2. Some time ago, a friend asked me how to display the date from two days earlier. She needed it in a shell script, so the output had to go to STDOUT and the command had to be invocable like any other shell command.

$ date
Thu May 8 13:20:04 PDT 2008

$ TWO_DAYS_AGO=$(perl -e 'print scalar localtime(time() - 86400 * 2);')

$ echo $TWO_DAYS_AGO
Tue May 6 13:20:10 2008

One can certainly write this in other languages, but I'm sure it would take more than one line of code. It would also require a separate script file, which is one more file to manage and maintain. This way, it can be embedded directly in her shell script.

The key to these one-liners is Perl's -n and -e options. The -e option lets you execute Perl code given inline on the command line. The -n option wraps that code in a loop over each input line, read from the files named on the command line or, if there are none, from STDIN. This is similar to AWK and sed, but with the power of a full-blown programming language.

YapLoud

YapLoud.com will launch tonight at midnight.

http://www.yaploud.com/

I will enable index.php to make it live at exactly midnight PDT. I hope I don't doze off.

YapLoud is a web-based chat application where people can chat about any web page. It's like IRC (Internet Relay Chat) with multiple chat rooms, but each chat room is associated with a specific web page on the Internet.

It is written in PHP with MySQL as the database. I'm using the YUI toolkit for AJAX functionality.

Chaos on the server side or the client side of web development: which do you prefer?

On the server side of web development, there are dozens of programming languages to choose from, and for each language there are dozens of frameworks. This has created language and framework wars. It's a war zone out there.

On the client side, you might say things are less stressful because there is only one programming language: JavaScript! Thank God people agreed on using just one language.

Oh wait, there are different JavaScript frameworks and toolkits, so client-side developers now have toolkit wars. But that is not the biggest problem. Having only one programming language doesn't mean it works the same in every browser. Developers battle browser incompatibilities in their JavaScript code every day. The browser wars are very much alive!

OS2008 for Nokia N800

I recently upgraded my Nokia N800 to OS2008. There are a lot of improvements in the new OS. The things I like best are:

1. No need for red pill

You no longer need to enable a hidden hack to add new Application Catalogs. Previously, adding a new Application Catalog was not possible by default; you had to perform a secret sequence of operations to activate this feature. It was called the red pill because part of the activation process asks the user whether to take the blue pill or the red pill (from the movie The Matrix).

Not only is this now enabled by default, the maemo repository catalog is also added. The only thing left to do is enable it so that you can start adding applications.

2. Xterm is installed by default

It is a great relief not to have to install xterm manually and go through the trouble of finding which catalog repository it is kept in. It is already there and ready to use, so you can run "apt-get install" for your favorite applications right away.

There is one minor quirk, though. Previously, Xterm was under Extras. Now it's under Utilities, and I found it much harder to find since you have to scroll down the list. Believe it or not, I didn't know the list was scrollable. I had to Google where to find Xterm.

3. Openssh

Installing the Openssh server automatically sets up the root password for you. Previously, you had to install becomeroot or sudogain-root to get root access so that you could change the root password.

4. Browser

The new browser is much better and more stable.