UPDATE: HOWTO: Make Your Own Tiny URL Alias Service (TinyURL, Bit.ly, Twurl.cc, etc.) With Apache RewriteMap (Bad URL Escaping, using prg:)

Yesterday I posted a HOWTO that explained how to use Apache's mod_rewrite RewriteMap to create your own tiny URL alias redirect service (like bit.ly, tinyurl.com, etc.). A few hours after posting it, I realized that using mod_rewrite's RewriteMap with the "txt:" map-type doesn't always work the way you want it to. It turns out that when you use RewriteMap with a "txt:" map file, Apache applies its own internal URL escaping mechanism (Adrian Sutton), which alters the URL you're trying to redirect to. For example, a URL that contains ...

"© Mark Kolich" becomes "%C2%A9%20Mark%20Kolich"
or
"^DJI" becomes "%5eDJI"
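
To make the escaping concrete, here's a minimal Perl sketch of percent-encoding and its reverse. It is not Apache's actual algorithm (the set of characters Apache chooses to escape is its own); the sub names escape_url_part and unescape_url_part are made up for illustration. Note that the hex digits in an escape are case-insensitive, so "%5e" and "%5E" mean the same thing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # this source file contains a literal ©
use Encode qw(encode decode);

## Percent-encode every byte outside a conservative "safe" set.
## (Assumption: Apache's own safe set differs; this is illustrative.)
sub escape_url_part {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

## What a well-behaved web app does before using the value:
## turn each %XX back into a byte, then decode the bytes as UTF-8.
sub unescape_url_part {
    my ($escaped) = @_;
    (my $bytes = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return decode('UTF-8', $bytes);
}

binmode(STDOUT, ':encoding(UTF-8)');
print escape_url_part('© Mark Kolich'), "\n";   # %C2%A9%20Mark%20Kolich
print unescape_url_part('%5eDJI'), "\n";        # ^DJI
```

An app that skips the second step sees the literal text "%5eDJI" instead of "^DJI", which is exactly the failure described next.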


This is OK, except that a lot of the web apps I'm trying to redirect to don't properly un-escape the URL before processing it (like Yahoo's interactive charts). I did a little more research and concluded that the only solid alternative was to write a Perl script that maps a key to a URL using mod_rewrite's "prg:" map-type, which avoids Apache's own internal URL escaping mechanism.
To get this to work, I had to tweak my httpd.conf file a little bit.  My VirtualHost for kolich.cc (my own URL alias redirect service) is now:

<VirtualHost *:80>

    DocumentRoot /my/server/root/kolich.cc/
    ServerName www.kolich.cc
    ServerAlias kolich.cc
    ServerAdmin support@kolich.com

    RewriteEngine On
    RewriteMap mapper prg:/my/server/root/kolich.cc/mapper.pl
    RewriteRule ^/(.*)$ ${mapper:$1|http://mark.koli.ch} [R=301,L]

    ErrorLog logs/kolich.cc-error_log
    CustomLog logs/kolich.cc-access_log combined

</VirtualHost>

Note that the RewriteMap directive now references /my/server/root/kolich.cc/mapper.pl using the "prg:" map-type. Using this mechanism, Apache won't apply its own URL escaping scheme to the URLs I'm redirecting the user to. Instead, it redirects directly to the URL I want, based on the output of my mapper.pl Perl script (which can escape URLs itself if I want it to).
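
The "prg:" contract itself is simple: Apache starts the program once, writes one lookup key per line to the program's STDIN, and reads one line back from its STDOUT, where the literal string NULL means "not found". Here's a rough sketch that plays Apache's role against a tiny stand-in mapper so you can watch the exchange. The stand-in mapper, the alias "abc", the example.com URL, and the ask_mapper sub are all made up for illustration; none of this is mod_rewrite's real code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open2;

## Hypothetical stand-in for mapper.pl: it knows one alias, "abc".
my $mapper_code = q{
    $| = 1;   # unbuffered STDOUT, or the parent waits forever
    while (my $key = <STDIN>) {
        chomp $key;
        print $key eq 'abc' ? "http://example.com/\n" : "NULL\n";
    }
};

## Start the mapper once and keep it running, the way a prg: map
## program stays running for the life of the server.
my $pid = open2(my $from_mapper, my $to_mapper, $^X, '-e', $mapper_code);

## One key per line in, one answer per line out.
sub ask_mapper {
    my ($key) = @_;
    print {$to_mapper} "$key\n";
    my $answer = <$from_mapper>;
    chomp $answer;
    return $answer;
}

my $hit  = ask_mapper('abc');
my $miss = ask_mapper('nope');
print "abc  -> $hit\n";    # abc  -> http://example.com/
print "nope -> $miss\n";   # nope -> NULL

close $to_mapper;          # EOF on STDIN ends the mapper's loop
waitpid($pid, 0);
```

If the stand-in drops its $| = 1 line, ask_mapper will most likely block on the read, because the answer sits in the child's output buffer. That's the same hang the $|=1 comment in mapper.pl warns about.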

Here's my mapper.pl.  Note that the code is well documented so you should be able to follow along.  This Perl script is called by mod_rewrite to determine which URL the user should be redirected to based on the alias (the key).  If you don't understand all of this, you might want to read my previous post to see what I'm doing.

#!/usr/bin/perl

use strict;
use warnings;

use constant MAPFILE =>
    "/my/server/root/kolich.cc/map.txt";

## The URL to redirect to if there was
## an error reading/opening the MAPFILE
use constant DEFAULT_URL =>
    "http://mark.koli.ch";

## A common mistake is to use buffered I/O on stdout.
## Avoid this, as it will cause a deadloop!
## $|=1 is used to prevent this.
$|=1;

## Apache sends one key per line on STDIN and expects
## exactly one line back on STDOUT for each key.
while(<STDIN>){
    print lookup($_);
}

sub lookup {

    ## BTW, returning "NULL" to mod_rewrite means
    ## that the lookup/key wasn't found, so
    ## mod_rewrite will fall back to the default
    ## value given after the | in the RewriteRule.
    my $key = shift;
    my $url = "NULL\n";
    chomp $key;

    ## Since I'm taking the key and sticking it
    ## right into a regular expression, there's a
    ## chance that someone would do something nasty.
    ## So I need to strip out any non-word characters
    ## from the user input. This avoids any of those
    ## annoying regular expression injection problems.
    $key =~ s/\W//sg;

    ## If the key is empty, bail
    return $url if $key =~ /^$/;

    ## You DON'T want to open() then ||die if there was any
    ## problem reading the MAPFILE. In this case, if the
    ## mapper couldn't open the map, then just return our
    ## DEFAULT_URL
    open( my $handle, "<", MAPFILE ) or return DEFAULT_URL."\n";
    while(my $line = <$handle>){

        ## Skip empty lines, and lines with comments
        ## in the mapfile. Comment lines start with a #
        next if $line =~ /^$/ || $line =~ /^#/;

        if($line =~ m/^$key\s+/i){
            ## The key and the URL must be separated by two
            ## or more whitespace characters, so a URL can
            ## still contain single spaces.
            my @fields = split(/\s{2,}/, $line);

            ## Skip a malformed line that has no URL field
            next unless defined $fields[1];

            chomp $fields[1];
            $url = $fields[1]."\n";
            last;
        }

    }
    close( $handle );

    ## You need a \n newline at the end of the string
    ## you're going to send back to Apache with the URL
    ## to redirect to. I'm not sure if you need a
    ## \r too, but \n alone seems to work fine in
    ## my case.
    return $url;

}
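
If you want to poke at the matching rules without touching Apache, here's a small sketch that runs the same kind of lookup against a throwaway map file. It assumes the map format implied by the split(/\s{2,}/) above (alias, then two or more whitespace characters, then the URL). The lookup_in sub, the aliases, and the example.com URLs are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

## Same matching rules as lookup() in mapper.pl, but the map path is
## a parameter and the result has no trailing newline, so it's easy
## to test. (lookup_in is a hypothetical name.)
sub lookup_in {
    my ($map_path, $key) = @_;
    $key =~ s/\W//g;                     # strip non-word characters
    return 'NULL' if $key eq '';
    open(my $map, '<', $map_path) or return 'NULL';
    while (my $line = <$map>) {
        next if $line =~ /^$/ || $line =~ /^#/;
        next unless $line =~ /^$key\s+/i;
        my (undef, $url) = split /\s{2,}/, $line;
        next unless defined $url;        # malformed line, keep looking
        chomp $url;
        close $map;
        return $url;
    }
    close $map;
    return 'NULL';
}

## Build a throwaway map file with a comment and two aliases.
my ($fh, $map_path) = tempfile(UNLINK => 1);
print {$fh} "# alias   destination\n";
print {$fh} "dji       http://example.com/chart?s=^DJI\n";
print {$fh} "me        http://example.com/?q=Mark Kolich\n";
close $fh;

print lookup_in($map_path, 'dji'), "\n";     # http://example.com/chart?s=^DJI
print lookup_in($map_path, 'ME'), "\n";      # http://example.com/?q=Mark Kolich
print lookup_in($map_path, '../etc'), "\n";  # NULL
```

The last call shows the sanitizing step at work: "../etc" is reduced to "etc" before it ever reaches the regular expression, and since no alias matches, the result is NULL.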


You can download my mapper.pl here. This code works much better than my previous solution. I can now redirect users to more complex URLs without having to worry about Apache altering them before they're sent to the browser.

This page contains a single entry by Mark Kolich published on March 6, 2009 10:15 AM.