Let me start by saying that most cloud storage solutions are relatively cheap to begin with, so compressing entities or streams stored in a database table or an elastic cloud store may not save you all that much. Purely in terms of cloud storage costs, you might save pennies, or at most dollars, even if you compress optimally. Here, "optimally" means you know your payloads will GZIP compress nicely and that it makes sense to do so. In fact, if you compress entities indiscriminately, you may actually increase their size!
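To make the "may actually increase their size" point concrete, here is a minimal sketch that GZIPs a tiny payload and a large, repetitive one and prints both sizes. The class name GzipSizeCheck and the helpers gzip() and repeat() are made up for this illustration and are not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: compares payload sizes before and after GZIP.
final class GzipSizeCheck {

    // Returns the GZIP-compressed form of the given bytes.
    static byte[] gzip(final byte[] input) throws IOException {
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the trailer into baos.
        try (GZIPOutputStream gzos = new GZIPOutputStream(baos)) {
            gzos.write(input);
        }
        return baos.toByteArray();
    }

    // Builds a String made of `count` copies of `piece`.
    static String repeat(final String piece, final int count) {
        final StringBuilder sb = new StringBuilder(piece.length() * count);
        for (int i = 0; i < count; i++) {
            sb.append(piece);
        }
        return sb.toString();
    }

    public static void main(final String[] args) throws IOException {
        final byte[] tiny = "<p>hi</p>".getBytes(StandardCharsets.UTF_8);
        final byte[] repetitive = repeat("<li class=\"item\">row</li>\n", 1000)
            .getBytes(StandardCharsets.UTF_8);

        System.out.println("tiny:       " + tiny.length
            + " -> " + gzip(tiny).length + " bytes");
        System.out.println("repetitive: " + repetitive.length
            + " -> " + gzip(repetitive).length + " bytes");
    }
}
```

The tiny payload comes out larger than it went in because the GZIP format adds a fixed header and trailer (at least 18 bytes) on top of the compressed data, while the repetitive payload shrinks dramatically. Measuring a sample of your real payloads like this is a cheap way to decide whether compression is worth it.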
Let's say you have a "bucket" in which you plan to store hundreds or even thousands of cached HTML documents. Common sense might tell you that HTML compresses well. Your tiger-like Computer Science instincts are right: HTML, generally speaking, does compress well and is a great candidate for compression. Fewer bytes stored in the cloud usually means slightly reduced storage costs.
In Java, most applications represent HTML as a String, which is really just a sequence of characters. Or, to look at it another way, HTML can be thought of as an array of bytes with a known character encoding. Thinking in terms of bytes, it's quite easy to GZIP compress and uncompress byte[] arrays on the fly in Java. Meet GZIPInputStream and GZIPOutputStream.
GZIP compress an InputStream and return the result as a new byte[] array:
public static final byte[] compress(final InputStream is)
    throws IOException {
    GZIPOutputStream gzos = null;
    try {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        gzos = new GZIPOutputStream(baos);
        copy(is, gzos);
        gzos.finish(); // Important!
        return baos.toByteArray();
    } finally {
        closeQuietly(gzos);
    }
}
GZIP uncompress an InputStream and return the result as a new byte[] array:
public static final byte[] uncompress(final InputStream is)
    throws IOException {
    GZIPInputStream gzis = null;
    try {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        gzis = new GZIPInputStream(is);
        copy(gzis, baos);
        return baos.toByteArray();
    } finally {
        closeQuietly(gzis);
    }
}
First, note that I'm using a ByteArrayOutputStream to hold the resulting compressed or uncompressed byte[] array in memory. That means both the input and the output must fit on the heap, so this solution may not be ideal for your application. If you're planning to compress gigabytes of data in memory, that's probably a bad idea unless you really know what you're doing. Proper usage depends on your application and your intentions.
Second, the copy() and closeQuietly() helper methods used above are implemented in my GzipCompressor utility class.
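For readers who want something to experiment with, helpers of this kind might look roughly like the sketch below. This is an assumption about their shape, not the actual GzipCompressor source: the class name StreamHelpers, the 8 KB buffer size, and the stream-to-stream compress() overload are all invented here. The overload also shows one way around the in-memory caveat above, since it writes compressed bytes straight to an OutputStream instead of collecting them in a byte[] array.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; names and buffer size are assumptions.
final class StreamHelpers {

    // Copies all bytes from in to out through a fixed-size buffer
    // and returns the number of bytes copied. Closes neither stream.
    static long copy(final InputStream in, final OutputStream out)
        throws IOException {
        final byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Closes the resource if it is non-null, ignoring any IOException.
    static void closeQuietly(final Closeable closeable) {
        if (closeable == null) {
            return;
        }
        try {
            closeable.close();
        } catch (IOException ignored) {
            // Deliberately ignored: this is best-effort cleanup.
        }
    }

    // Streaming variant: compresses in to out without buffering the
    // whole payload in memory. Closes out (but not in) when finished.
    static void compress(final InputStream in, final OutputStream out)
        throws IOException {
        try (GZIPOutputStream gzos = new GZIPOutputStream(out)) {
            copy(in, gzos);
        }
    }

    public static void main(final String[] args) throws IOException {
        final Path source = Files.createTempFile("page", ".html");
        final Path target = Files.createTempFile("page", ".html.gz");
        final StringBuilder html = new StringBuilder("<html><body>");
        for (int i = 0; i < 5000; i++) {
            html.append("<p>cached paragraph</p>");
        }
        html.append("</body></html>");
        Files.write(source, html.toString().getBytes(StandardCharsets.UTF_8));

        // File-to-file compression; only one buffer's worth is in memory.
        try (InputStream in = Files.newInputStream(source)) {
            compress(in, Files.newOutputStream(target));
        }
        System.out.println(Files.size(source) + " bytes -> "
            + Files.size(target) + " bytes");

        Files.delete(source);
        Files.delete(target);
    }
}
```

The same pattern works with any OutputStream, for example one that uploads to a storage service, as long as you are happy for compress() to close it when done.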
Looping back to our cached HTML example, let's compress a tiny HTML document represented here as a String:
// Note that this HTML is tiny and probably won't compress well at all.
// In fact, the "compressed" result may actually be larger than the
// uncompressed original. This is just an example, however, to show
// you how to compress a String.
final String html = "<html><body><h1>Horrible HTML</h1></body></html>";
// Get the UTF-8 encoded bytes from the input String. I'm assuming
// that my HTML document is UTF-8 encoded.
final byte[] uncompressed = html.getBytes(StandardCharsets.UTF_8);
System.out.println("Uncompressed: " + uncompressed.length + " bytes.");
// Compress and report the result. compress() takes an InputStream,
// so wrap the byte[] array in a ByteArrayInputStream first.
final byte[] compressed = compress(new ByteArrayInputStream(uncompressed));
System.out.println("Compressed: " + compressed.length + " bytes.");
Now that you have a compressed byte[] array, it should be trivial to store it in the cloud using your favorite cloud storage engine or database. Ideally, you'll want to tweak your entities so that PUTs and GETs automatically compress and uncompress them on the fly for you.
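As a rough illustration of that idea, the sketch below wraps a storage backend so that callers only ever see Strings, while the stored bytes are always GZIP compressed. Everything here is hypothetical: the class name CompressingStore, its put(), get(), and storedSize() methods, and the HashMap standing in for a real cloud bucket or database table. It also assumes the HTML is UTF-8 encoded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical wrapper: compresses on put, uncompresses on get.
final class CompressingStore {

    // Stand-in for a real cloud bucket or database table.
    private final Map<String, byte[]> backend = new HashMap<>();

    // Encodes the HTML as UTF-8, GZIPs it, and stores the result.
    void put(final String key, final String html) throws IOException {
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(baos)) {
            gzos.write(html.getBytes(StandardCharsets.UTF_8));
        }
        backend.put(key, baos.toByteArray());
    }

    // Loads the stored bytes, un-GZIPs them, and decodes them as UTF-8.
    // Returns null if the key is unknown.
    String get(final String key) throws IOException {
        final byte[] stored = backend.get(key);
        if (stored == null) {
            return null;
        }
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPInputStream gzis =
                 new GZIPInputStream(new ByteArrayInputStream(stored))) {
            final byte[] buffer = new byte[8192];
            int read;
            while ((read = gzis.read(buffer)) != -1) {
                baos.write(buffer, 0, read);
            }
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    // Number of bytes actually held by the backend, or -1 if absent.
    int storedSize(final String key) {
        final byte[] stored = backend.get(key);
        return stored == null ? -1 : stored.length;
    }

    public static void main(final String[] args) throws IOException {
        final StringBuilder html = new StringBuilder("<html><body>");
        for (int i = 0; i < 500; i++) {
            html.append("<p>cached paragraph</p>");
        }
        html.append("</body></html>");

        final CompressingStore store = new CompressingStore();
        store.put("index.html", html.toString());

        System.out.println("Original characters: " + html.length());
        System.out.println("Stored bytes: " + store.storedSize("index.html"));
        System.out.println("Round trip intact: "
            + html.toString().equals(store.get("index.html")));
    }
}
```

In a real application the HashMap would be replaced by calls to your storage client, but the shape stays the same: compression lives in one place, and the rest of the code never has to think about it.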
My full GzipCompressor utility class can be found here.
Enjoy.

