Here's the situation: I'm often on a network that does not allow outbound traffic on port 22, meaning I cannot directly "SSH out" from that network to my Linux box at home.  Fair enough.  However, this network does allow outbound traffic on ports 80, 443, and 8443 via a web proxy.  So if I want to "SSH out" from this network to my Linux box at home, I can do so with a little tweaking of my remote Apache server and my local SSH client.

Here's how ...
Spring 3 is great at automatically resolving standard arguments into a controller request method.  For example, a primitive Spring controller might look like this ...

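import java.security.Principal;
import javax.servlet.http.HttpServletRequest;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.servlet.ModelAndView;

@Controller
@RequestMapping(value="/somepath")
public class MyController {

  @RequestMapping(method={RequestMethod.GET, RequestMethod.HEAD})
  public ModelAndView someMethod(final HttpServletRequest request,
    final Principal principal) {
    // Extract some special object needed to process the request from
    // the session -- this object is bound to the session elsewhere on
    // a successful authentication.
    final MyObject obj = (MyObject)request.getSession().getAttribute("myobjkey");
    // Do actual work.
    /* ... */
    return new ModelAndView("someview");
  }

}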

In this case, Spring knows the HttpServletRequest argument represents the incoming Servlet request, and the Principal argument is the object representing the authenticated user (in the event that you're using Spring Security to manage authentication in your web-application).  On method invocation, Spring automatically resolves these arguments for you.  Neat!

However, the repetition quickly becomes obvious: in every controller, you need to fetch the same MyObject from the session, over and over again.  Instead of repeating that line of code in every method of every controller that needs access to MyObject, what if you could tell Spring how to resolve MyObject automatically on invocation?
In July of '09, when I first learned of the "data" URL scheme, I was pumped.  With a little work, my web-applications could use the "data" URL scheme to embed actual base-64 encoded binary image data directly inside of my HTML and CSS.  In that same post, I commented on why this scheme can be incredibly useful, especially for mobile web-applications or APIs that service mobile apps.  Even with significant advances in wireless networks over the past several years, traditional HTTP continues to lag (for the most part) over poor 3G and 4G networks.  For this reason, the "data" URL scheme can be a lifesaver -- you can embed binary image data directly inside of your HTML and CSS, freeing the device from initiating wasteful HTTP transactions to load these images later.

Today marked yet another personal milestone for my usage of the "data" URL scheme.  Building an API that services a mobile app for the HP/Palm webOS platform, I quickly rediscovered the importance of this scheme.  It turns out I can embed base-64 encoded binary image data in a JSON response payload that is sent directly to a wireless webOS device!  This means I can build my API resource to send everything the app requested, including any additional external resources like images, in a single HTTP response!
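
To make the idea concrete, here's a minimal sketch of building a "data" URL from raw bytes and dropping it into a JSON payload.  It assumes Java 8 or later for java.util.Base64; the class name, the JSON field names, and the stand-in image bytes are all hypothetical, and a real API would build its JSON with a proper JSON library rather than string concatenation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUrlExample {

    // Builds a "data" URL from a MIME type and raw bytes:
    // data:<mime type>;base64,<base-64 encoded bytes>
    public static String toDataUrl(final String mimeType, final byte[] bytes) {
        return "data:" + mimeType + ";base64,"
            + Base64.getEncoder().encodeToString(bytes);
    }

    // Hypothetical payload shape. The name is not JSON-escaped here, so
    // this sketch only works for names without quotes or backslashes.
    public static String toJson(final String name, final String dataUrl) {
        return "{\"name\":\"" + name + "\",\"image\":\"" + dataUrl + "\"}";
    }

    public static void main(final String[] args) {
        // Stand-in bytes; a real app would read them from an image file.
        final byte[] fakeImage =
            "not really a PNG".getBytes(StandardCharsets.UTF_8);
        System.out.println(toJson("logo", toDataUrl("image/png", fakeImage)));
    }
}
```

The standard base-64 alphabet never produces a double quote or a backslash, so the encoded data can sit inside a JSON string without extra escaping.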
For more than a year, I got away with forgetting to close my standard I/O streams when spawning a process in Java with Runtime.getRuntime().exec().  On Linux, I was using exec() to spawn the df command to check my file system disk space usage.  Standard out from df was piped into the parent (Java) where I parsed the output to see if any partitions were getting full.  Simple enough, right?

In December 2010, I began experimenting with Java's next generation garbage collection engine, aptly named G1 (a.k.a., Garbage First).  Assuming you have Java 6 Update 14 or later you can enable the next-generation G1 garbage collector (still experimental as of Jan 2011) using the following JVM options:

-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC

This post isn't about G1, so I'm not going to dig into the nitty-gritty on garbage collection.  However, I discovered that G1 isn't as aggressive as the current Java garbage collector with regard to cleaning up streams.  Bug in G1?  Maybe, maybe not.  Regardless, my app ran for a week or two with G1 enabled, and then I started to see all sorts of silly java.net.SocketException errors claiming I had "Too many open files".  Using the trusty lsof command, I saw that my Java process had left open a ton of stranded pipes.  Definitely an indication of a leak somewhere ...

#/> lsof -p 23064 | grep pipe
...
java 23064 mark 996w FIFO 0,7 152581 pipe
java 23064 mark 997r FIFO 0,7 152309 pipe
java 23064 mark 998r FIFO 0,7 152448 pipe
java 23064 mark 999w FIFO 0,7 152720 pipe
java 23064 mark 1000w FIFO 0,7 152859 pipe
java 23064 mark 1001r FIFO 0,7 152583 pipe
java 23064 mark 1002w FIFO 0,7 153134 pipe
java 23064 mark 1003r FIFO 0,7 152722 pipe
java 23064 mark 1004w FIFO 0,7 154801 pipe
java 23064 mark 1005r FIFO 0,7 152861 pipe
java 23064 mark 1006w FIFO 0,7 152997 pipe
java 23064 mark 1007w FIFO 0,7 153564 pipe
java 23064 mark 1008r FIFO 0,7 152999 pipe
java 23064 mark 1009r FIFO 0,7 153136 pipe
java 23064 mark 1010r FIFO 0,7 153278 pipe
java 23064 mark 1011w FIFO 0,7 153406 pipe
java 23064 mark 1012w FIFO 0,7 153713 pipe
...

With a little persistence, I crawled through my code looking for any obvious problem spots -- places where I forgot to close a stream -- and discovered that my calls to exec() were problematic.  Calling exec() returns a Process object for the child, where all standard I/O is redirected to the parent through three streams: STDOUT, STDIN, and STDERR.  It turns out you have to explicitly close these streams when you're done with the child; otherwise they are left open!  And, as you can see in the lsof output above, I was not closing these streams, causing a nasty leak which eventually brought down my application.

Going back to differences in the garbage collectors, it seems that the current default garbage collector cleaned up after my mess (closed the streams for me), but G1 did not.  That's why I never saw the "Too many open files" exception until I enabled G1.

That said, the undocumented proper way of handling a Process object and its corresponding I/O streams is to wrap the exec() call in a try-finally block, closing the STDOUT, STDIN, and STDERR streams when you're done with the Process object.  The abstract class java.lang.Process exposes these three streams to you via getInputStream() (the child's STDOUT), getOutputStream() (the child's STDIN), and getErrorStream() (the child's STDERR), all of which you must explicitly close.

Here's the pseudo code:

import static org.apache.commons.io.IOUtils.closeQuietly;

Process p = null;
try {
  p = Runtime.getRuntime().exec(...);
  // Do something with p.
} finally {
  if(p != null) {
    // Close all three streams, even the ones you never used.
    closeQuietly(p.getOutputStream()); // child's STDIN
    closeQuietly(p.getInputStream());  // child's STDOUT
    closeQuietly(p.getErrorStream());  // child's STDERR
  }
}

Note that closeQuietly() is part of the IOUtils class in the Apache Commons IO library -- it's a helper method to close a stream ignoring nulls and exceptions.  With this change in place, I redeployed my app and sure enough the problem was resolved.
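
If you'd rather not pull in Commons IO, the same pattern can be sketched with nothing but the JDK.  This is only an illustration under a few assumptions: the class and method names are hypothetical, the local closeQuietly() is a stand-in for the Commons IO helper, and the child (df here) is assumed to write very little to STDERR, since this sketch never drains that stream.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class ExecExample {

    // Closes a stream, ignoring nulls and exceptions. A stand-in for the
    // Commons IO helper of the same name.
    public static void closeQuietly(final Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException e) {
            // Intentionally ignored.
        }
    }

    // Spawns a command, collects its standard output line by line, and
    // always closes all three of the child's streams.
    public static List<String> run(final String... command)
        throws IOException, InterruptedException {
        Process p = null;
        try {
            p = Runtime.getRuntime().exec(command);
            final List<String> lines = new ArrayList<String>();
            final BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            p.waitFor();
            return lines;
        } finally {
            if (p != null) {
                closeQuietly(p.getOutputStream()); // child's STDIN
                closeQuietly(p.getInputStream());  // child's STDOUT
                closeQuietly(p.getErrorStream());  // child's STDERR
            }
        }
    }

    public static void main(final String[] args) throws Exception {
        // Assumes a df binary on the PATH, as on most Linux systems.
        for (final String line : run("df", "-k")) {
            System.out.println(line);
        }
    }
}
```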

Lesson learned: regardless of what garbage collector you're using, it's always a good idea to explicitly close the STDOUT, STDIN, and STDERR streams associated with a Process object when you are done with it.

Enjoy.
Let me start by saying that most cloud storage solutions are relatively cheap to begin with, so compressing entities or streams stored in a database table or an elastic cloud store may not save you all that much.  Purely in terms of cloud storage costs, you might save pennies or at most dollars if you compress optimally.  Here, "optimally" means you know your payloads will GZIP compress nicely and it makes sense to do so.  In fact, if you forcefully compress entities unnecessarily you may actually increase their size!

Let's say you have a "bucket" in which you plan to store hundreds or even thousands of cached HTML documents.  Common sense might tell you that HTML compresses well.  Your tiger-like Computer Science instincts are right: HTML, generally speaking, does compress well and is a great candidate for compression.  Obviously, fewer bytes stored in the cloud usually means slightly reduced storage costs.

In Java, most applications represent HTML as a String, which is really just a sequence of characters.  Or to look at it another way, HTML can be thought of as an array of bytes with a known character encoding.  Thinking in terms of bytes, it's quite easy to GZIP compress and uncompress byte[] arrays on the fly in Java.  Meet GZIPInputStream and GZIPOutputStream.

GZIP compress an InputStream and return the result as a new byte[] array:

public static final byte[] compress(final InputStream is)
  throws IOException {
  GZIPOutputStream gzos = null;
  try {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    gzos = new GZIPOutputStream(baos);
    copy(is, gzos);
    // Important! Writes any remaining compressed data and the GZIP
    // trailer before the bytes are read back out of baos.
    gzos.finish();
    return baos.toByteArray();
  } finally {
    closeQuietly(gzos);
  }
}

GZIP uncompress an InputStream and return the result as a new byte[] array:

public static final byte[] uncompress(final InputStream is)
  throws IOException {
  GZIPInputStream gzis = null;
  try {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    gzis = new GZIPInputStream(is);
    copy(gzis, baos);
    return baos.toByteArray();
  } finally {
    closeQuietly(gzis);
  }
}

First, note that I'm using a ByteArrayOutputStream to store the resulting compressed or uncompressed byte[] array in memory.  Naturally, this means that this solution may not be ideal for you depending on your application.  If you're planning to compress gigs of data in memory, that could be a bad idea unless you really know what you're doing.  Proper usage of this really depends on your application and your intentions.

Second, the copy() and closeQuietly() pseudo methods above are implemented for you here in my GzipCompressor utility class.
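
For reference, here's one way those two helpers could look, bundled with the compress and uncompress methods so the whole thing runs on its own.  This is a sketch written for this post, not a copy of the GzipCompressor class: the GzipSketch class name, the 4096-byte buffer, and the exact helper signatures are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSketch {

    // Copies every byte from in to out through a fixed-size buffer and
    // returns the number of bytes copied.
    public static long copy(final InputStream in, final OutputStream out)
        throws IOException {
        final byte[] buffer = new byte[4096];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Closes a stream, ignoring nulls and exceptions.
    public static void closeQuietly(final Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException e) {
            // Intentionally ignored.
        }
    }

    public static byte[] compress(final InputStream is) throws IOException {
        GZIPOutputStream gzos = null;
        try {
            final ByteArrayOutputStream baos = new ByteArrayOutputStream();
            gzos = new GZIPOutputStream(baos);
            copy(is, gzos);
            gzos.finish(); // Write the GZIP trailer before reading baos.
            return baos.toByteArray();
        } finally {
            closeQuietly(gzos);
        }
    }

    public static byte[] uncompress(final InputStream is) throws IOException {
        GZIPInputStream gzis = null;
        try {
            final ByteArrayOutputStream baos = new ByteArrayOutputStream();
            gzis = new GZIPInputStream(is);
            copy(gzis, baos);
            return baos.toByteArray();
        } finally {
            closeQuietly(gzis);
        }
    }

    public static void main(final String[] args) throws IOException {
        final byte[] original = "<html><body><h1>Horrible HTML</h1></body></html>"
            .getBytes(StandardCharsets.UTF_8);
        final byte[] compressed = compress(new ByteArrayInputStream(original));
        final byte[] restored = uncompress(new ByteArrayInputStream(compressed));
        System.out.println("Original: " + original.length + "-bytes.");
        System.out.println("Compressed: " + compressed.length + "-bytes.");
        System.out.println("Restored: " + new String(restored, StandardCharsets.UTF_8));
    }
}
```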

Looping back to our cached HTML example, let's compress a tiny HTML document represented here as a String:

// Note that this HTML is tiny, and probably won't compress well at all.
// In fact, the "compressed" result may actually be larger in size than
// the uncompressed original String. This is just an example however,
// to show you how to compress a String literal.
final String html = "<html><body><h1>Horrible HTML</h1></body></html>";

// Get the UTF-8 encoded bytes from the input String. I'm assuming
// that my HTML document is UTF-8 encoded.
final byte[] uncompressed = html.getBytes(UTF_8);
System.out.println("Uncompressed: " + uncompressed.length + "-bytes.");

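// Compress and report the result. The compress() method shown above
// takes an InputStream, so wrap the byte[] array first.
final byte[] compressed = compress(new ByteArrayInputStream(uncompressed));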
System.out.println("Compressed: " + compressed.length + "-bytes.");

Now that you have a compressed byte[] array, it should be trivial to store it in the cloud using your favorite cloud storage engine or database.  Ideally, you'll want to tweak your entities so that PUTs and GETs automatically compress and uncompress these entities on the fly for you.
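
As a rough illustration of that last idea, here's a sketch of a store that compresses on the way in and uncompresses on the way out.  Everything in it is hypothetical: the CompressingStore class, its method names, and the in-memory Map standing in for a real cloud bucket or database table.  It also assumes Java 8 or later (try-with-resources and UncheckedIOException), unlike the older-style code above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressingStore {

    // Hypothetical stand-in for a cloud bucket or database table.
    private final Map<String, byte[]> bucket = new HashMap<String, byte[]>();

    // GZIP compresses the UTF-8 bytes of the document, then stores them.
    public void put(final String key, final String html) {
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(baos)) {
            gzos.write(html.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the GZIPOutputStream above wrote the GZIP trailer.
        bucket.put(key, baos.toByteArray());
    }

    // Fetches and uncompresses a document, or returns null if absent.
    public String get(final String key) {
        final byte[] stored = bucket.get(key);
        if (stored == null) {
            return null;
        }
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPInputStream gzis =
                 new GZIPInputStream(new ByteArrayInputStream(stored))) {
            final byte[] buffer = new byte[4096];
            int read;
            while ((read = gzis.read(buffer)) != -1) {
                baos.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    // Number of bytes actually held for a key, or -1 if absent.
    public int storedSize(final String key) {
        final byte[] stored = bucket.get(key);
        return (stored == null) ? -1 : stored.length;
    }
}
```

Callers only ever see Strings going in and coming out; the compression stays an implementation detail of the store.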

My full GzipCompressor utility class can be found here.

Enjoy.
In many web-service infrastructures, it's often desirable to disable the caching of redirects.  Specifically, you might want to set the Expires or Cache-Control headers so that your 301 or 302 redirects from Apache's mod_rewrite are never cached upstream.  Off the top of my head, I can think of a number of reasons why you might want to prevent the caching of a redirect:

  • Your redirect may change from one request to the next.  Disable caching so the client (the browser) isn't redirected to the same destination every time.

  • Your web-application is behind a reverse caching proxy, and you don't want the caching proxy to cache the redirect.

  • In development, you're sitting behind a corporate web-proxy that is notorious for caching content when it really shouldn't.  Disable caching on the redirects so you can verify that your web-application is working as expected during testing (assuming the web-proxy obeys your Cache-Control and Expires headers).

  • Your web-application counts how many times someone is redirected.  Disable caching so your click-through statistics are a bit more accurate.

Surprisingly, this seemingly common need isn't well documented in the official Apache docs.  So, here's how to do it.

In this example, I'm redirecting based on the Host.  If the incoming request does not match the Host I require, mod_rewrite triggers a 301 redirect to the correct Host.  Of course, your RewriteCond directives might be different.

RewriteCond %{HTTP_HOST} !^mark\.koli\.ch [NC]
RewriteRule ^/(.*)$ http://mark.koli.ch/$1 [R=301,L,E=nocache:1]

## Set the response header if the "nocache" environment variable is set
## in the RewriteRule above.
Header always set Cache-Control "no-store, no-cache, must-revalidate" env=nocache

## Set Expires too ...
Header always set Expires "Thu, 01 Jan 1970 00:00:00 GMT" env=nocache

In this example, when the RewriteRule is fired, the "nocache" environment variable is set.  Note the E=nocache:1 rewrite flag in the RewriteRule.  Subsequently, mod_headers will set the Cache-Control and Expires headers only if this "nocache" environment variable is set.  In other words, "nocache" is only set on a 301 redirect from the RewriteRule.
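
If you want to confirm the headers from the client side, one option is a tiny Java check that requests a URL without following the redirect and prints what came back.  This is just a sketch: the RedirectCacheCheck class and its forbidsCaching() helper are hypothetical names, and the URL to test is whatever you pass on the command line.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

public class RedirectCacheCheck {

    // True if a Cache-Control header value contains the no-store directive.
    public static boolean forbidsCaching(final String cacheControl) {
        if (cacheControl == null) {
            return false;
        }
        for (final String directive : cacheControl.split(",")) {
            if (directive.trim().toLowerCase(Locale.ROOT).equals("no-store")) {
                return true;
            }
        }
        return false;
    }

    public static void main(final String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("Usage: java RedirectCacheCheck <url>");
            return;
        }
        final HttpURLConnection conn =
            (HttpURLConnection) new URL(args[0]).openConnection();
        // Inspect the redirect response itself instead of its destination.
        conn.setInstanceFollowRedirects(false);
        conn.setRequestMethod("HEAD");
        try {
            final String cacheControl = conn.getHeaderField("Cache-Control");
            System.out.println("Status: " + conn.getResponseCode());
            System.out.println("Location: " + conn.getHeaderField("Location"));
            System.out.println("Cache-Control: " + cacheControl);
            System.out.println("Expires: " + conn.getHeaderField("Expires"));
            System.out.println("Forbids caching: " + forbidsCaching(cacheControl));
        } finally {
            conn.disconnect();
        }
    }
}
```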

This works nicely.

GET /wombat HTTP/1.1
Host: koli.ch

HTTP/1.1 301 Moved Permanently
Date: Sat, 11 Dec 2010 19:36:09 GMT
Location: http://mark.koli.ch/wombat
Server: Apache
Cache-Control: no-store, no-cache, must-revalidate
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Content-Length: 230
Content-Type: text/html; charset=iso-8859-1
Connection: close

Yay for HTTP.
For months, I've had a spare 20" HP LCD-2065 display sitting under my desk at the office.  With a few extra cycles on my hands, I decided to take half a day and set up a truly bad ass developer's workstation: three 20-inch monitors, Xinerama'ed to produce a single 4800x1200 pixel desktop (each display driving 1600x1200 @ 60 Hz).  And, best of all, the HP Z600 Workstation powering this monster is running 64-bit Ubuntu Linux 10.04.

[Photo: ubuntu-hp-z600-nvidia-fx-1800.jpg]

Not bad, eh?

Here's how I did it ...
Amazon's SQS (Simple Queue Service) uses XML formatted payloads to push and pop messages to and from an SQS queue.  In other words, the underlying request and response bodies are XML payloads containing, among other things, a message ID and a message receipt handle.  This means that the message body itself (the thing you're pushing onto the queue) ultimately has to be properly XML escaped to work right.  Well, it turns out Amazon's own AWS library is very flaky here, and does not do the right thing when unmarshalling an XML SQS response back into an Object.  Essentially, it gets confused when it encounters an XML escaped ampersand in the message body.  For example, this popped message (in pseudo XML) fails miserably to unmarshal:

<sqs><message>{"music":"rock &amp; roll"}</message></sqs>

Note that the ampersand in the message body is properly XML escaped.  However, Amazon's own AWS library returns this as the message body once unmarshalled:

{"music":"rock &

In other words, the message body ends abruptly at the ampersand, and needless to say, any reasonable JSON library will fail to parse this malformed block of nonsense.

Solution: if you are going to use SQS, and there's a possibility the messages you're going to push onto a queue will have issues with XML escaping, it appears you should always base-64 encode the message body, then base-64 decode it when popping.  Fortunately, the Apache Commons Codec library has a wonderful Base64 class to handle this encoding and decoding mess for you.
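
Here's a minimal sketch of that encode-before-push, decode-after-pop step.  Note that it swaps in java.util.Base64 from the JDK (Java 8 or later) in place of the Commons Codec Base64 class mentioned above, purely so the example has no external dependencies; the SqsBodyCodec class and its method names are hypothetical, and the actual SQS send and receive calls are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SqsBodyCodec {

    // Call on the message body before pushing it onto the queue.
    public static String encode(final String body) {
        return Base64.getEncoder()
            .encodeToString(body.getBytes(StandardCharsets.UTF_8));
    }

    // Call on the message body after popping it off the queue.
    public static String decode(final String encoded) {
        return new String(Base64.getDecoder().decode(encoded),
            StandardCharsets.UTF_8);
    }

    public static void main(final String[] args) {
        final String body = "{\"music\":\"rock & roll\"}";
        final String encoded = encode(body);
        System.out.println("On the wire: " + encoded);
        System.out.println("Decoded: " + decode(encoded));
    }
}
```

The standard base-64 alphabet contains no ampersands or angle brackets, so the encoded body has nothing left for XML escaping to trip over.  The trade-off is size: base-64 encoding grows the payload by roughly a third.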

Bottom line, if you're encountering encoding issues with Amazon's SQS, better check if your messages are able to make it through the XML marshalling and unmarshalling process.  If not, you'll need to base-64 encode and decode your messages to dance around bugs in Amazon's AWS library.

Good luck, and hope this helps.