Content tagged: java

Introducing Boildown

654f24ba6f56141fbb36d3291aa5af29d6c42098

Fri Mar 25 12:31:08 2016 -0700

I’ve been poking at a fun side project lately, exploring how to compress/uncompress arbitrary streams flowing between two sockets. I ended up with something that’s a little hacky, but surprisingly works quite well.

Introducing Boildown.

From a remote location (usually from work or on the road), I SSH home quite regularly and port-forward to several services behind NAT on my home network: SSH, remote desktop, web-cams, etc. I was curious to see if I could write something general that compresses traffic flowing between two sockets in an attempt to improve the overall “remote experience”. That is, compress the bidirectional traffic flowing over-the-wire to see if I could make things “faster”.

Orthogonally, I kinda wanted an excuse to play with LZF and Snappy.

How it works

Boildown listens on a local port, compresses (or decompresses) incoming traffic, and forwards the result to its destination. It’s like SSH port-forwarding, but the bidirectional network traffic flowing through Boildown is automatically compressed or decompressed, depending on how it’s configured. In essence, Boildown provides a compressed “pipe” connecting two nodes on a network.
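
To make that concrete, here's a minimal sketch of one direction of such a pipe — not Boildown's actual implementation, just the core idea using java.util.zip's DeflaterOutputStream. The class name, buffer size, and error handling here are illustrative.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.util.zip.DeflaterOutputStream;

public final class CompressingPump implements Runnable {

  private final Socket from_;
  private final Socket to_;

  public CompressingPump(final Socket from, final Socket to) {
    from_ = from;
    to_ = to;
  }

  @Override
  public void run() {
    // Read plain bytes from one socket, write zlib-compressed bytes to the other.
    // A real relay runs one such pump per direction, per connection.
    try (final InputStream in = from_.getInputStream();
         final OutputStream out = new DeflaterOutputStream(to_.getOutputStream(), true)) {
      final byte[] buffer = new byte[8192];
      int read;
      while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
        out.flush(); // Sync-flush each block so interactive traffic isn't held back.
      }
    } catch (final Exception e) {
      // On any I/O error, let this pump die; the caller is responsible for
      // closing both sockets, which tears down the other direction too.
    }
  }

}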

Boildown is entirely protocol agnostic — it knows nothing about the protocol of the data flowing through it, and works transparently with any protocol that can be expressed over TCP/IP. The most common being HTTP (port 80), HTTPS (port 443), and SSH (port 22). This was key for me, because I wanted to build something general — a tool that isn’t protocol or application specific, and an app I could just stick between a sender and a receiver on a network and (ideally) see some sort of performance benefit with compression.

And so, Boildown v1 supports the following framed or “block” codecs:

  • ZLIB — enabled with --zlib
  • Snappy — enabled with --snappy
  • LZF — enabled with --lzf

Usage

There are two sides (or “modes”) to Boildown:

  • Compressor — listens on a local port, compresses outgoing traffic, and forwards the compressed data to another host.
  • Decompressor — listens on a local port, decompresses incoming traffic, and forwards the original (uncompressed) result to its destination.

Assuming you’d want to SSH to remote:22, here’s how you’d create a compressed pipe using Boildown for an SSH session between localhost:10022 and remote:22:

+--------- [localhost] ---------+                               +----------- [remote] ------------+
| --compress 10022:remote:10022 | <---- (compressed pipe) ----> | --decompress 10022:localhost:22 |
+-------------------------------+                               +---------------------------------+

A Boildown compressor listens at localhost:10022 and forwards compressed traffic to the decompressor listening at remote:10022. Any bytes received by the decompressor at remote:10022 are decompressed and forwarded to the SSH server daemon listening locally on localhost:22. Of course, traffic flowing the other way, remote:22 back to localhost:10022, is compressed and decompressed in the same way.

Hence, a bidirectional, compressed network pipe.

On localhost

Start a compressor on localhost:10022, forwarding compressed traffic to remote:10022:

java -jar boildown-0.1-SNAPSHOT-runnable.jar --compress 10022:remote:10022 --zlib

On remote

Start a decompressor on remote:10022, forwarding decompressed traffic to localhost:22:

java -jar boildown-0.1-SNAPSHOT-runnable.jar --decompress 10022:localhost:22 --zlib

Connect-the-dots

On localhost, start a new SSH session, funneling traffic through the Boildown managed compressed pipe:

ssh -p 10022 localhost

Compression codecs

Specify --zlib, --snappy, or --lzf on the command line to use any of the 3 supported compression codecs.

Note, both sides of the pipe need to be using the same codec (obviously).
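
For a rough idea of what the codec flags map to, here's an illustrative sketch — not Boildown's actual wiring — that wraps a raw output stream in the selected codec, assuming the snappy-java and compress-lzf libraries for the two non-JDK codecs:

import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
// Assumed third-party codec libraries: snappy-java and compress-lzf.
import org.xerial.snappy.SnappyOutputStream;
import com.ning.compress.lzf.LZFOutputStream;

public final class Codecs {

  // Wrap a raw socket output stream in the compression codec selected on the
  // command line. The decompressing side would use the matching input streams.
  public static OutputStream wrap(final String codec, final OutputStream out) throws Exception {
    switch (codec) {
      case "zlib":   return new DeflaterOutputStream(out, true);
      case "snappy": return new SnappyOutputStream(out);
      case "lzf":    return new LZFOutputStream(out);
      default:       throw new IllegalArgumentException("Unknown codec: " + codec);
    }
  }

}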

Thread pool

The compressor and decompressor implementations run within threads. The size of the internal thread pool used by Boildown can be controlled with the --poolSize N argument, where N is the maximum number of desired threads in the pool.

By default, if --poolSize is omitted, the internal thread pool is sized to match the number of available cores.
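
A minimal sketch of that default sizing logic — illustrative only; poolSizeArg stands in for whatever --poolSize parsed to and is not Boildown's real variable name:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class PoolSizing {

  // poolSizeArg <= 0 means --poolSize was omitted on the command line.
  public static ExecutorService newPool(final int poolSizeArg) {
    final int poolSize = (poolSizeArg > 0)
        ? poolSizeArg
        : Runtime.getRuntime().availableProcessors(); // Default: one thread per core.
    return Executors.newFixedThreadPool(poolSize);
  }

}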

On-the-wire

Seeing what’s happening on-the-wire, over the Boildown compressed pipe, is quite easy with nc (netcat), telnet and tcpdump.

Spin up a compressor listening at localhost:20000 that forwards compressed traffic to localhost:30000:

java -jar boildown-0.1-SNAPSHOT-runnable.jar --compress 20000:localhost:30000 --zlib &

Spin up a decompressor listening at localhost:30000 that forwards uncompressed traffic back to localhost:30001:

java -jar boildown-0.1-SNAPSHOT-runnable.jar --decompress 30000:localhost:30001 --zlib &

In a separate terminal, spin up an instance of tcpdump that dumps traffic on port 30000. On Mac OS X:

sudo /usr/sbin/tcpdump -i lo0 -nnvvXXSs 1514 port 30000

In another terminal, launch nc to open up a socket and listen on port 30001 (where the decompressed/original bytes will be forwarded to):

nc -l 30001

And finally, in yet another terminal window, launch telnet and connect to localhost:20000:

telnet localhost 20000

Magic

(Screenshot: telnet, nc, and tcpdump running side-by-side in three terminal panels.)

In the left panel, we’re using telnet to connect to the Boildown compressor listening at localhost:20000. Anything typed into this telnet session is routed through Boildown, compressed, and forwarded to localhost:30000.

In the middle panel, we’re running nc, which is listening at localhost:30001. This is the decompressed side. Anything from the telnet session at localhost:20000 is seen here, and consequently, anything we type into this session is forwarded (and compressed) back to localhost:20000.

In the right panel, notice the bidirectional compressed traffic captured by tcpdump flowing over localhost:30000. The astute reader will notice the Z? header in the tcpdump output given we’re running Boildown with --zlib.

Next steps

  • Java NIO — eventually I want to explore how to use Java’s non-blocking I/O paradigm in lieu of threads to manage data flowing over-the-wire, similar to Jetty’s NIO org.eclipse.jetty.server.ServerConnector.
  • Specify Multiple Compressors/Decompressors — as of now you can only specify a single --compress or --decompress route on the command line, but I’d eventually like to rework the app to support an arbitrary number of routes similar to SSH’s -L.

Open Source

Boildown is free on GitHub and licensed under the popular MIT License.

Issues and pull requests welcome.

Maven: Add Local JAR Dependency to Classpath

ed8f6b7f55088c791fa46a57461eb84fafc6faa5

Sun Sep 27 16:53:05 2015 -0700

I’ve been getting back into Maven lately, converting the build system behind several of my personal projects on GitHub into something a little more sane and well-travelled. For reasons yet-to-be formally discussed, I’ve embarked on a mass migration away from SBT — albeit, I still have a number of published projects backed by SBT.

SBT Rant

Although I’m still using it sparingly, SBT has left a bitter taste in my mouth. The long-and-short of it is that I’m tired of everything in SBT being a complicated puzzle backed by poor documentation — I just want to get stuff done. I wish I had the countless hours of my life back that I spent figuring out how to accomplish very specific (yet seemingly common) tasks with SBT. Build definitions do not need to be written in Turing-complete languages, and in my humble opinion, SBT is a perfect example of what not to do.

</rant>

Maven

I was refactoring a personal project to use Maven the other day, and stumbled across a need to “add a local JAR to my classpath”. That is, I have a .jar file on disk from many moons ago that is not in any public Maven repository, yet I need to add it to the compile scope of my project.

Bad: The system Scope

A quick search of the Interwebz clearly calls out a worst practice: using the Maven system scope.

The system scope was designed to deal with “system” related files — files sitting in some fixed location, like Java’s core rt.jar. To discourage bad behavior, the Maven contributors intentionally refused to make pathname expansion work correctly in the context of the <systemPath> tag in the system scope. In other words, ${basedir}/lib/foo.jar below will not resolve:

<dependencies>
    <!-- WRONG: DON'T DO THIS -->
    <dependency>
        <groupId>com.foo</groupId>
        <artifactId>bar</artifactId>
        <version>1.0</version>
        <scope>system</scope>
        <systemPath>${basedir}/lib/bar-1.0.jar</systemPath>
    </dependency>
</dependencies>

Don’t do this.

Good: Use a Local Repository

The best practice is to “publish” the .jar file to a local Maven repository nested within the project. Yes, you read that correctly, publish the .jar to a ~/.m2 like repo within your project that is checked into SCM!

Here’s how…

On disk, your project probably looks something like this:

project/
  src/main/java
  src/main/resources
  src/test/java
  pom.xml

1) Create a lib directory in your project root — this lib directory will act as a local Maven repository within the project.

cd project
mkdir lib

2) Download the .jar file to disk, and use mvn to publish the .jar to the lib directory.

In this example, I’m publishing the Gagawa library I wrote and open-sourced many years ago.

mvn org.apache.maven.plugins:maven-install-plugin:2.5.2:install-file \
  -Dfile=~/Desktop/gagawa-1.0.1.jar \
  -DgroupId=com.hp \
  -DartifactId=gagawa \
  -Dversion=1.0.1 \
  -Dpackaging=jar \
  -DlocalRepositoryPath=lib

If all went well, you can find your artifact published inside of lib.

project$ find lib
lib
lib/com
lib/com/hp
lib/com/hp/gagawa
lib/com/hp/gagawa/1.0.1
lib/com/hp/gagawa/1.0.1/gagawa-1.0.1.jar
lib/com/hp/gagawa/1.0.1/gagawa-1.0.1.pom
lib/com/hp/gagawa/maven-metadata-local.xml

Note the structure here mimics what you’d find in ~/.m2.

3) Now, in your pom.xml, declare the lib directory in your project a Maven repository.

<repositories>
    <repository>
        <id>my-local-repo</id>
        <url>file://${basedir}/lib</url>
    </repository>
</repositories>

4) And lastly, in your pom.xml declare a dependency on the local .jar like you would for any other classpath dependency.

<dependencies>
    <dependency>
        <groupId>com.hp.gagawa</groupId>
        <artifactId>gagawa</artifactId>
        <version>1.0.1</version>
    </dependency>
</dependencies>

At build time, Maven will consult the local repo at ${basedir}/lib in addition to ~/.m2 and any other remote repositories you have defined.

Ship it!

Fail your Build on Java Compiler Warnings

2c8127e956e463d7dd061b870cc9fce3d256f974

Sat Sep 12 16:37:41 2015 -0700

I hate seeing compiler warnings in code, and anyone who argues that ignoring them is a fine software engineering policy, should be swiftly relieved of their position. Warnings call out mistakes, and occasional blatant stupidity that should not be ignored — heck just Google “pay attention to compiler warnings” for some fun anecdotes. Folks in software are generally helpful, and compiler writers don’t inject annoying warnings because they are mean-spirited. Instead, people want to help, and consequently, compiler warnings are there to help. I’ve personally worked on several stacks with literally thousands of compiler warnings peppered throughout the code — it’s miraculous that some of those applications worked at all.

To combat warning hell, I’ve made it a personal best practice to do two things:

  1. In mature code bases, never introduce more warnings and never ignore existing warnings I happen to stumble across. Ever. If I see a warning in an area of code that I’m working on, I’ll clean it up — no excuses.
  2. In new projects, starting from scratch, set my build tool to immediately fail the build on any warning. In other words, treat warnings as compilation errors!

The latter is surprisingly easy and very effective at forcibly setting a high bar before entropy can gain a foothold.

Here’s how, with a few popular build tools:

Ant

If you’re still using Ant, set a series of <compilerarg> tags in your <javac> tasks. Of course, this goes in your build.xml:

<javac srcdir="${src.dir}" destdir="${classes.dir}" classpathref="libraries">
  <compilerarg value="-Xlint:all"/>
  <compilerarg value="-Xlint:-processing"/>
  <compilerarg value="-Xlint:-serial"/>
  <compilerarg value="-Werror"/>
</javac> 

Maven

Using the maven-compiler-plugin add a few <compilerArgs> to your configuration within your pom.xml:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <version>3.3</version>
  <configuration>
    <source>1.8</source>
    <target>1.8</target>
    <compilerArgs>
      <arg>-Xlint:all</arg>
      <arg>-Xlint:-processing</arg>
      <arg>-Xlint:-serial</arg>
      <arg>-Werror</arg>
    </compilerArgs>
  </configuration>
</plugin>

Gradle

To fail the build on any compiler warning, in main source and in test source, set this in your build.gradle:

tasks.withType(JavaCompile) {
  options.compilerArgs << "-Xlint:all" << "-Xlint:-processing" << "-Xlint:-serial" << "-Werror"
}

SBT

In the unlikely event that you’re building a pure Java project or Java source with SBT, set this in your project/Build.scala:

lazy val projectSettings = Defaults.coreDefaultSettings ++ Seq(
  scalacOptions ++= Seq(
    "-deprecation", "-unchecked", "-feature", "-Xlint", "-Xfatal-warnings", "-encoding", "utf8"
  ),
  javacOptions ++= Seq(
    "-Xlint:all,-processing,-serial", "-Werror", "-encoding", "utf8", "-g"
  )
)

The Scala compiler scalac equivalent to -Werror is -Xfatal-warnings (apparently).

A few notes

The magic is in -Werror which is documented here. When set, -Werror terminates compilation when warnings are found.

I’m also passing -Xlint:-processing which disables any annotation processor warnings from JARs on the compile classpath. And lastly, -Xlint:-serial disables any warnings complaining of Serializable classes that do not have an explicit serialVersionUID field. Ok, yes, certainly one could argue that ignoring complaints about a missing serialVersionUID field is dangerous, but I’ll let you be the judge.
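
To see what this buys you, here's a trivial, made-up class that compiles with only a warning by default, but fails the build outright once -Xlint:all and -Werror are in play:

import java.util.ArrayList;
import java.util.List;

public final class WarningExample {

  public static void main(final String[] args) {
    // [unchecked] warning: raw ArrayList assigned to List<String>.
    // With -Xlint:all this is reported as a warning; add -Werror and the
    // compiler refuses to finish, failing the build.
    final List<String> names = new ArrayList();
    names.add("foo");
    System.out.println(names);
  }

}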

Cheers!

Writing Versioned Service APIs With Curacao: Part 2

cabd8b34841bc0a3b34fe11f3d707aba1d67ef68

Sun Oct 26 12:31:30 2014 -0700

In Part 1 of this series, I covered how to use Curacao to handle versioned resource requests. That is, “how do clients specify the version of the resource they’re asking for” given a number of implementation possibilities. In this Part 2, let’s talk routing and a few respective implementation strategies using Curacao.

Routing

How does the API route version specific requests?

Path

Without question, the most common mechanism used to “route” requests to the right controller is through the URI path.

By default, Curacao accomplishes this using regular expressions in conjunction with Java’s named capture groups (in Java 7+) to pull values out of the path as needed. For instance, consider the following RESTful-like API requests that manipulate users in a data store.

GET:/users.json?lastName=jones
GET:/user/76234849.json
POST:/user
DELETE:/user/76219057

With a relatively simple set of regular expressions, we can write a controller that supports each of these requests.

import static com.kolich.curacao.annotations.methods.RequestMapping.RequestMethod.*;

@Controller
public final class SampleController {

  private final DataSource ds_;

  @Injectable
  public SampleController(final DataSource ds) {
    ds_ = ds;
  }

  // GET:/users.json?lastName=jones
  // Note the @Query annotation on the 'lastName' argument
  @RequestMapping(value="^\\/users\\.json$", methods={GET})
  public final List<User> getUsers(@Query("lastName") final String lastName) {
    return ds_.getUsersWithLastName(lastName);
  }
  
  // GET:/user/76234849.json
  // Note the @Path annotation on the 'userId' argument
  @RequestMapping(value="^\\/user\\/(?<userId>\\d+)\\.json$", methods={GET})
  public final User getUser(@Path("userId") final String userId) {
    return ds_.getUserById(userId);
  }
  
  // POST:/user
  @RequestMapping(value="^\\/user$", methods={POST})
  public final User createUser(@RequestBody final String createUserRequest) {
    return ds_.createUser(createUserRequest);
  }

  // DELETE:/user/76219057
  @RequestMapping(value="^\\/user\\/(?<userId>\\d+)$", methods={DELETE})
  public final User deleteUser(@Path("userId") final String userId) {
    return ds_.getAndDeleteUser(userId);
  }

}

Nothing too surprising here, but let’s walk through it anyway.

This @Controller declares a dependency on DataSource through its constructor — the immutable singleton DataSource will be injected automatically into the constructor when Curacao instantiates an instance of this controller on application startup.

Subsequent methods like getUsers and getUser are only invoked on incoming GET requests whose path matches the regular expression provided in the value attribute of the @RequestMapping annotation.

The @Query controller argument annotation is used to extract the values of query parameters, if any. If @Query references a query parameter that is not present on the request, the argument value will be null. Likewise, the @Path controller argument annotation is used to extract values from the path, if any. If @Path references a named capture group that is not present in the path, or the provided regular expression was unable to extract a value for the given capture group, the argument value will be null.

Beautifully simple — no awful DSLs to learn, and completely interoperable with other languages for the JVM like Scala and Clojure.
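
Underneath, this is just Java 7's named capture groups at work. Here's a standalone illustration (outside of Curacao) of how a userId is pulled from a path with the same style of pattern:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class NamedGroupDemo {

  public static void main(final String[] args) {
    // The same flavor of pattern used in the @RequestMapping examples above.
    final Pattern p = Pattern.compile("^\\/user\\/(?<userId>\\d+)\\.json$");
    final Matcher m = p.matcher("/user/76234849.json");
    if (m.matches()) {
      // Prints "76234849" -- the value Curacao would hand to @Path("userId").
      System.out.println(m.group("userId"));
    }
  }

}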

Custom Header

In the odd event that you’d like to route requests based on something other than the path, Curacao supports the implementation of a custom CuracaoPathMatcher to be used within your @RequestMapping annotations. For instance, consider a controller that routes requests based on a custom value within an HTTP request header — it’s easy to implement a custom CuracaoPathMatcher to achieve this behavior.

import com.google.common.collect.ImmutableMap;

public final class MyCustomHeaderMatcher implements CuracaoPathMatcher {

  private static final String MY_CUSTOM_HEADER = "X-Custom-Header";

  @Override @Nullable
  public Map<String,String> match(final HttpServletRequest request,
                                  final String value, // From your @RequestMapping
                                  final String path) throws Exception {
    final String header = request.getHeader(MY_CUSTOM_HEADER);
    if (header != null && header.contains(value)) {
      // If the custom header contains the provided value from the annotation,
      // then we have a match!  Note the value argument here is the
      // "value" from the controller method @RequestMapping annotation.
      // For example:
      // @RequestMapping("foo") the value is "foo"
      // @RequestMapping(value="bar", methods=POST) the value is "bar"
      return ImmutableMap.of(MY_CUSTOM_HEADER, value);
    } else {
      return null; // No match!
    }
  }

}

Then, simply reference your custom CuracaoPathMatcher in your controllers using the matcher attribute on Curacao’s @RequestMapping annotation.

@Controller
public final class SampleController {

  @RequestMapping(value="foo", matcher=MyCustomHeaderMatcher.class)
  public final String foo() {
    // Will only be invoked when an 'X-Custom-Header' request header is
    // present that contains "foo".
    return "foo";
  }
  
  @RequestMapping(value="bar", matcher=MyCustomHeaderMatcher.class)
  public final String bar() {
    // Will only be invoked when an 'X-Custom-Header' request header is
    // present that contains "bar".
    return "bar";
  }

}

In the example above, the foo method will only be invoked when the X-Custom-Header HTTP request header contains the string “foo”. Likewise, the bar method will only be invoked when the X-Custom-Header contains the string “bar”.

You can, of course, implement your own logic to “pull apart” a custom header value and route requests as desired using any custom CuracaoPathMatcher implementation. But, always remember that the first “matcher” to return a non-null map indicating a match, wins. In other words, if you have two custom CuracaoPathMatcher implementations that could potentially match the same “value”, the first matcher that matches will win — the ordering in which matchers are interrogated to find a controller method to invoke is nondeterministic. This is by design.

Part 3

In the upcoming Part 3 of this series, I’ll cover the creation and serving of versioned response objects using Curacao.

Stay thirsty, my friends.

Writing Versioned Service APIs With Curacao: Part 1

89c0761ee3faa681cbb81eb9038160e1c8f6c5c0

Wed Oct 22 19:11:19 2014 -0700

Curacao is a beautifully simple toolkit for the JVM that lets you write highly concurrent services on the common and trusted J2EE stack. While you can use it to build a full web-application, Curacao is fundamentally designed to support highly asynchronous REST/HTTP-based integration layers on top of asynchronous Servlets that are easy to understand, maintain, and debug. At its core, Curacao completely avoids the mental overhead of passing messages between actors or within awful event loops — yet, given its simplicity, performs very well in even the most demanding applications.

Quite often, one of the most difficult problems to solve when designing an API is resource versioning.

As I see it, there are several aspects to the API versioning problem:

  1. how do clients specify the version of the resource they’re asking for?
  2. how does the API route version specific requests?
  3. how does the API manage and respond with versioned responses?
  4. how does the API gracefully sunset deprecated resource versions?

Solutions to these questions have been debated ad nauseam, and everyone seems to have a conflicting opinion.

Opinions aside, Part 1 of this series highlights a few examples that illustrate how you might implement solutions to these problems with Curacao.

Part 1: Versioned Requests

How do clients specify the version of the resource they’re asking for?

Query Parameter

One approach is specifying the desired version of a resource using an optional query parameter. For example, consider the following requests:

GET:/user/89171245.json?version=1
GET:/user/89171245.json?version=2
GET:/user/89171245.json

While technically asking for the same resource /user/89171245.json, the client is using the version query parameter to specify the version of the API it intends to use. The server side can interpret the value of the version query parameter, and respond with an entirely unique response object depending on the requested version. In this case, version=1 may result in an entirely different response JSON object compared to that of version=2. In the event that the version query parameter is omitted, the API will default to the most recent version.

Implementing this versioning mechanism with Curacao is trivial.

The trick is to implement a custom ControllerMethodArgumentMapper that looks for the version query parameter, sanitizes it, and passes the requested version to your controller methods as a typed argument.

First, let’s define an enumeration that cleanly represents all supported API versions.

public enum MyApiVersion {

  /* API version 1 */
  VERSION_1("1"),
  
  /* API version 2 */
  VERSION_2("2");
  
  private String version_;
  private MyApiVersion(final String version) {
    version_ = version;
  }
  
  /**
   * Given a string, from a query parameter, convert it into one of the
   * supported API versions.  If the param is null, or doesn't match any
   * known version, this method returns the latest version.
   */
  public static final MyApiVersion versionFromParam(final String param) {
    MyApiVersion result = MyApiVersion.VERSION_2; // Default
    if (param != null) {
      // Iterate over each possible version in the enumeration,
      // looking for a match.
      for (final MyApiVersion version : MyApiVersion.values()) {
        if (version.version_.equals(param)) {
          result = version;
          break;
        }
      }
    }
    return result;
  }
  
}

Now, let’s implement a custom ControllerMethodArgumentMapper that converts the version query parameter on the request, if any, into a MyApiVersion.

import com.kolich.curacao.handlers.requests.mappers.ControllerMethodArgumentMapper;

@ControllerArgumentTypeMapper(MyApiVersion.class)
public final class MyApiVersionArgumentMapper extends ControllerMethodArgumentMapper<MyApiVersion> {

  private static final String VERSION_QUERY_PARAM = "version";

  @Nullable @Override
  public final MyApiVersion resolve(@Nullable final Annotation annotation,
                                    final CuracaoRequestContext context) throws Exception {
    final HttpServletRequest request = context.request_;
    final String versionParam = request.getParameter(VERSION_QUERY_PARAM);
    return MyApiVersion.versionFromParam(versionParam);
  }

}

And finally, we can now write our controller methods to take an argument of type MyApiVersion. At runtime, Curacao will see this MyApiVersion argument on your controller methods, and invoke our custom MyApiVersionArgumentMapper to extract the desired version from the request.

import static com.kolich.curacao.annotations.methods.RequestMapping.RequestMethod.*;

@Controller
public final class VersionedController {

  private final DataSource ds_;

  @Injectable
  public VersionedController(final DataSource ds) {
    ds_ = ds;
  }

  @RequestMapping(value="^\\/user\\/(?<userId>\\d+)\\.json$", methods={GET})
  public final String getUser(@Path("userId") final String userId,
                              final MyApiVersion version) {
    final User user = ds_.getUserById(userId);
    if (MyApiVersion.VERSION_1.equals(version)) {
      // Construct and return a "version 1" User object.
    } else {
      // Construct and return a "version 2" User object.
    }
  }
  
}

Note that the MyApiVersion argument in the controller method above is automatically discovered and injected when invoked by Curacao.

Path

A more common approach to request versioning is through the usage of a version identifier in the path itself. For example, consider the following requests:

GET:/v1/user/98143016.json
GET:/v2/user/98143016.json

Note the v1 and v2 version identifier in the path.

Again, unsurprisingly, implementing this versioning mechanism with Curacao is trivial. Current best practices dictate the usage of multiple controllers — one that handles v1 requests and another that handles v2.

And so, one controller for v1:

package com.foo.api.controllers.v1;

@Controller
public final class ControllerV1 {

  @RequestMapping(value="^\\/v1\\/user\\/(?<userId>\\d+)\\.json$", methods={GET})
  public final String fooV1(@Path("userId") final String userId) {
    return "v1: " + userId;
  }

}

And another for v2:

package com.foo.api.controllers.v2;

@Controller
public final class ControllerV2 {

  @RequestMapping(value="^\\/v2\\/user\\/(?<userId>\\d+)\\.json$", methods={GET})
  public final String fooV2(@Path("userId") final String userId) {
    return "v2: " + userId;
  }

}

Note the clean separation achieved with unique package declarations.

Accept Header

Another, slightly more RESTful approach, is using the Accept HTTP request header to identify the desired version of a resource. This is somewhat analogous to client/server “content negotiation”.

In the interest of brevity, I won’t write a complete implementation here. However, a key takeaway is that you can use Curacao’s @Header annotation to extract the value of any request header. From there, your business logic in the controller can examine the header value to make a decision about what API version is invoked.

@Controller
public final class HeaderController {

  @RequestMapping(value="^\\/foo", methods={GET})
  public final String headerDemo(@Header("Accept") final String accept) {
    if (accept.contains("v2")) {
      // V2
    } else {
      // V1
    }
  }  

}

In addition to @Header, there are a number of “convenience” request header annotations you can use to decorate your controller method arguments:

  • @Accept — convenience for the Accept request header
  • @UserAgent — convenience for the User-Agent request header
  • @ContentType — convenience for the Content-Type request header
  • @Authorization — convenience for the Authorization request header
  • … and of course, many more in the com.kolich.curacao.annotations.parameters.convenience package.
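
For instance, here's a quick, purely illustrative controller using one of the convenience annotations listed above:

@Controller
public final class ConvenienceAnnotationDemoController {

  // @UserAgent hands us the value of the User-Agent request header directly,
  // with no need to reference the header by name.
  @RequestMapping("^\\/agent$")
  public final String agent(@UserAgent final String userAgent) {
    return "Your browser is: " + userAgent;
  }

}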

Part 2

Next in this series, Writing Versioned Service APIs With Curacao: Part 2 discusses routing strategies with Curacao.

Enjoy!

Introducing Curacao

83c31646fbfc9e59d10c1af7235dcbe043808239

Sun Aug 31 15:57:31 2014 -0700

Tired of Spring, Jersey, raw Servlets, and other REST toolkits, I cautiously approached the thought of building my own JVM web-layer from scratch. In retrospect, I probably didn’t need to spend time on yet another toolkit to help shield engineers from the boilerplate and complexity of web-applications on the JVM. However, I found most existing libraries (and frameworks) to be overly bloated, complex, and just generally awful.

I wanted something “better” — of course, better purely by my own personal definition.

I’ve written enough Java and Scala to recognize what’s most relevant when choosing a highly asynchronous and flexible web-layer upon which to build a scalable web-service or application. With a foot in multiple camps, and having previously used most widely available frameworks and tools, I’d like to think I have a unique perspective on this problem. From what I can tell, there’s generally two sides:

  1. The asynchronous overkill approach — Akka, Spray and Play
  2. The boil-the-ocean, thread-based approach — Spring and Apache Struts

Each of these tools has its own merits, but I conjecture that a large majority of the time, they are either misused or chosen for the wrong reasons. Quite often, especially in software engineering, developers get lost in a haze of early over-optimization or analysis paralysis. I wish I had a dollar for every time I heard a Product or Engineering Manager say something like, “We need to support 1,000,000 concurrent users! Web-scale!” Hold on — let’s make a pragmatic upfront technical decision, and build a beautiful product first. If and when the opportunity to go “web-scale” presents itself, we can address those tough scalability questions later.

However, in the meantime, there must exist some web-layer that:

  • doesn’t attempt to boil-the-ocean
  • can be used with a Java or Scala stack
  • is easy to understand and debug
  • avoids confusing and generally awful DSLs
  • is fully asynchronous
  • doesn’t require tens-of-megabytes of dependency hell
  • doesn’t “baby” engineers with fancy shells and command line tools
  • has a reasonable set of complete documentation and examples
  • is “fast”

And so, I sat down one evening many moons ago, and began to write my own web-layer from scratch — one that attempts to address many, if not all, of the shortcomings I perceive in existing toolkits.

I named the project Curacao, because I like fancy blue drinks with tiny umbrellas.

10,000 foot view

At a high level, here are some things you should know about Curacao:

  • it’s written in Java 7, but plays nicely with Scala
  • it’s thread based, built on top of asynchronous Servlets as part of the J2EE Servlet 3.0 spec
  • takes a “return or throw anything, from anywhere” approach to response handling
  • implements a clean, and very fast, dependency injection model
  • controllers, components, and routes are defined using simple annotations
  • BYO (bring-your-own) ORM library
  • no XML, anywhere — is configurable using HOCON and the Typesafe Config configuration library
  • for JSON, supports GSON and Jackson out-of-the-box
  • compiled, Curacao ships in a single JAR that’s only 150KB in size
  • deployable with any Servlet 3.0 compatible web-application
  • it’s free and open source on GitHub

Bootstrap

Still here?

Let’s bootstrap a Curacao application in 3-steps.

  1. First, configure your project to pull in the necessary dependencies. As of this writing, the latest stable version is 2.6.3, however you should check the Curacao Releases page for the latest version.

    If using Maven:

    <repository>
      <id>Kolichrepo</id>
      <name>Kolich repo</name>
      <url>http://markkolich.github.io/repo/</url>
      <layout>default</layout>
    </repository>
    
    <dependency>
      <groupId>com.kolich.curacao</groupId>
      <artifactId>curacao</artifactId>
      <version>2.6.3</version>
      <scope>compile</scope>
    </dependency>
    

    If using SBT:

    resolvers += "Kolich repo" at "http://markkolich.github.io/repo"
    val curacao = "com.kolich.curacao" % "curacao" % "2.6.3" % "compile"
    

  2. Second, inject the required listener and dispatcher into your application's web.xml.

    <web-app>
                     
      <listener>
        <listener-class>com.kolich.curacao.CuracaoContextListener</listener-class>
      </listener>
    
      <servlet>
        <servlet-name>CuracaoDispatcherServlet</servlet-name>
        <servlet-class>com.kolich.curacao.CuracaoDispatcherServlet</servlet-class>
        <load-on-startup>1</load-on-startup>
        <async-supported>true</async-supported>
      </servlet>
      <servlet-mapping>
        <servlet-name>CuracaoDispatcherServlet</servlet-name>
        <url-pattern>/*</url-pattern>
      </servlet-mapping>
    
    </web-app>
    

    The CuracaoContextListener listens for ServletContext lifecycle events, and initializes and destroys application components accordingly. And, like you might expect, the CuracaoDispatcherServlet is responsible for receiving and dispatching incoming requests from the Servlet container.

  3. Lastly, create a HOCON configuration file named application.conf and put it in a place that's accessible on your classpath — typically somewhere like src/main/resources. This file defines your Curacao application configuration, and is loaded from the classpath at runtime.

    curacao {
                        
      ## Your boot package is the package in which all of your components and
      ## controllers reside.  At boot time, Curacao uses reflection and scans
      ## this package, and all of its children, looking for annotated classes
      ## to dynamically instantiate.
      boot-package = "com.foobar"
      
      ## The asynchronous timeout for any response.  If your application fails to
      ## respond to any request within this timeout, Curacao will kick in and
      ## throw an exception, which allows you to abort+handle the response
      ## gracefully.  Set to 0 (zero) for an infinite timeout.
      async-context-timeout = 30s
      
      ## The maximum number of threads that will be used to handle incoming
      ## requests.  The number of concurrent request worker threads will never
      ## exceed this size.  Set to 0 (zero) for an unbounded thread pool.
      pools.request {
        size = 4
      }
      
      ## The maximum number of threads that will be used to process outgoing
      ## responses.  The number of concurrent response worker threads will never
      ## exceed this size.  Set to 0 (zero) for an unbounded thread pool. 
      pools.response {
        size = 4
      }
      
    }
    

    Take a look at Curacao's global reference.conf for the complete list of application configuration options. This reference.conf file defines the Curacao default set of configuration options, which are completely overridable in your own application.conf.

That’s it! You’ve bootstrapped your first Curacao enabled application.

Controllers

At their core, Curacao controllers are immutable singletons that are automatically instantiated at application startup — they’re classes that contain methods which Curacao will invoke via reflection when dispatching a request. On launch, Curacao recursively scans your defined boot-package looking for any classes annotated with the @Controller annotation. As requests are received and dispatched from the Servlet container, Curacao very efficiently interrogates each known controller instance looking for a method worthy of handling the request.

For maximum efficiency at runtime, regular expressions and request routing tables are compiled and cached once at startup.

Here’s a sample controller implementation that demonstrates several key features:

@Controller
public final class UserController {

  @RequestMapping("^\\/users\\/(?<userId>[a-zA-Z_0-9\-]+)$")
  public String getUser(@Path("userId") final String userId) {
    return "Load user: " + userId;
  }
  
  @RequestMapping(value="^\\/users$", methods=POST)
  public String createUser(@RequestBody final String body) {
    // Lazily convert 'body' to a user object
    // Insert user into data store
    return "Successfully created user.";
  }
  
  @RequestMapping(value="^\\/users\\/(?<userId>[a-zA-Z_0-9\-]+)$", methods=PUT)
  public void updateUser(@Path("userId") final String userId,
                         final HttpServletResponse response,
                         final AsyncContext context) {
    try {
      // Do work, update user with id 'userId'
      response.setStatus(201); // 201 Created
    } finally {
      // Complete context manually due to 'void' return type
      context.complete();
    }    
  }
  
  @RequestMapping(value="^\\/users\\/(?<userId>[a-zA-Z_0-9\-]+)$", methods=DELETE)
  public void deleteUser(@Path("userId") final String userId,
                         final HttpServletResponse response,
                         final AsyncContext context) {
    try {
      // Delete user specified by 'userId'
      response.setStatus(204); // 204 No Content
    } finally {
      // Complete context manually due to 'void' return type
      context.complete();
    }    
  }
  
  @RequestMapping("^\\/users$")
  public Future<List<String>> queryUsers(@Query("name") final String name) {
    // Query data store for a list of users matching the provided 'name'.
    // Return a Future<List<String>> which represents an async operation that
    // fetches a list of user ID's.
    return someFuture;
  }

}

Like other popular toolkits, request routing is handled using a familiar @RequestMapping method annotation. The @RequestMapping annotation allows you to specify, among other things, the HTTP request method and URI path to match. The default behavior of @RequestMapping uses Java regular expressions and Java 7’s named capture groups to extract path components from the incoming request URI.

When you need the entire request body as a UTF-8 encoded String, simply add a String method argument and annotate it with the @RequestBody annotation. Further, query parameters can be easily extracted using the @Query method argument annotation. For more complex scenarios, when you need direct access to the underlying HttpServletResponse or Servlet 3.0 AsyncContext object, just add them as arguments and Curacao will pass them to your method when invoked. Last but not least, your controller methods may return a Future<?> anytime you need to render the result of an asynchronous operation that may or may not complete successfully at some point in the future.

In the unlikely event you need to route requests by something other than the URI/path, you can implement your own CuracaoPathMatcher and pass it to Curacao using the matcher attribute of the @RequestMapping annotation.

Components

Like Curacao controllers, components are immutable singletons instantiated at application startup — they’re classes that represent pieces of shared logic or configuration, much like Java “beans”. Unsurprisingly, component classes are annotated with the @Component annotation.

Component singletons can be passed to other components, controllers, request filters, request mappers, and response handlers — we’ll cover the latter three later in this post. In the spirit of immutability, Curacao components can only be passed to other Curacao instantiated classes via their constructors — there are no “getters” and “setters”. Current best practices dictate the usage of final instance variables in your Curacao instantiated classes, ensuring immutability.

Consider the two components below, Foo and Bar — based on their constructor declarations, Bar depends on Foo. In other words, Bar cannot be instantiated unless it is passed an instance of Foo via its constructor.

@Component
public final class Foo {

  public Foo() {
    // Stuph.
  }

}
@Component
public final class Bar {

  private final Foo foo_;

  @Injectable
  public Bar(@Nonnull final Foo foo) {
    foo_ = foo;
  }

}

You may have noticed the @Injectable constructor annotation. The @Injectable annotation is used to declare dependencies on other components. In the example above, because class Bar has an @Injectable annotated constructor with an argument of type Foo, Curacao interprets this relationship as “class Bar depends on Foo”. Therefore, Foo will be instantiated first, and then passed to Bar’s constructor.

Curacao automagically identifies such dependencies, and instantiates component singletons in dependency order. Like other dependency-injection (DI) models, Curacao scans your declared boot-package and intelligently builds an object graph by analyzing dependencies derived from your implementation. However, note there are no “component factories” in Curacao.

Your object graphs can be as simple, or as complex as you’d like.

Injecting components into your controllers is easy too. In your controller, simply add an @Injectable annotated constructor. Component singletons, once instantiated, will be passed to your controller as constructor arguments.

@Controller
public final class SampleController {

  private final Bar bar_;

  @Injectable
  public SampleController(final Bar bar) {
    bar_ = bar;
  }
  
  @RequestMapping("^\\/bar$")
  public String bar() {
    return bar_.toString();
  }

}

Lastly, components that need to be aware of application container lifecycle events such as startup and shutdown, can implement the ComponentInitializable and/or ComponentDestroyable interfaces.

@Component
public final class WebServiceClient implements ComponentDestroyable {

  private final AsyncHttpClient httpClient_;

  public WebServiceClient() {
    httpClient_ = new AsyncHttpClient();
  }
  
  /**
   * Called once during application shutdown to stop this component.
   * Is useful to cleanup or close open sockets and other resources.
   */
  @Override
  public void destroy() throws Exception {
    httpClient_.close();
  }

}
@Component
public final class DataStore implements ComponentInitializable {

  private final MongoDbClient mongo_;

  public DataStore() {
    mongo_ = new MongoDbClient();
  }
  
  /**
   * Called once during application startup to initialize this component.
   * Is useful to further initialize a component beyond its constructor.
   */
  @Override
  public void initialize() throws Exception {
    mongo_.setCredentials("foo", "bar");
    mongo_.setMaxConnections(100);
  }

}

Filters

Request filters are singletons that implement the CuracaoRequestFilter interface and are invoked as “pre-processing” tasks before an underlying controller method is invoked. Filters can accept the request, and attach context attributes for consumption by a controller. Or, they can reject the request by throwing an Exception.

This makes request filters a suitable place for handling request authentication or authorization.

Unlike vanilla Servlet filters, Curacao request filters are handled asynchronously outside of a blocking Servlet container thread. In other words, Curacao calls request.startAsync() on the incoming ServletRequest before it invokes your request filter. This means that Curacao request filters are asynchronously handled in the context of a normal request.

Just like other components or controllers, filters are injectable — decorate a filter’s constructor with @Injectable to inject component singletons into the filter.

public final class SessionFilter implements CuracaoRequestFilter {

  private final DataStore ds_;

  @Injectable
  public SessionFilter(final DataStore ds) {
    ds_ = ds;
  }
  
  @Override
  public void filter(final CuracaoRequestContext context) throws Exception {
    final HttpServletRequest request = context.request_;
    final String auth = request.getHeader("Authentication");
    // Authenticate the request against the data store, throw Exception if needed 
    final String userId = ds_.authorizeUser(auth);
    // If we got here, we must have successfully authenticated the user.
    // Attach the user's ID to the context to be picked up by a controller later.
    context.setProperty("user-id", userId);
  }

}

The CuracaoRequestContext is an object that represents a mutable “request context” which spans across the life of the request. A filter can use the internal mutable property map in this class to pass data objects from itself to another filter, controller, or argument mapper (covered later).

Attach one or more filters to your controller methods using the filters attribute of the @RequestMapping annotation.

@Controller
public final class SecureController {

  @RequestMapping(value="^\\/secure$", filters={SessionFilter.class})
  public String secureArea() {
    // Secure.
  }

}

Request Mappers

Request mappers are immutable singletons that translate the request body, or some other piece of the request, into something directly usable by a controller method. For example, reading and translating an incoming form POST body into a Multimap<String,String>. Or, reading and translating an incoming PUT request body into a custom object — e.g., unmarshalling a JSON string into an application entity.

For convenience, Curacao ships with several default request mappers. For instance, in your controller, if you’d like to convert the incoming request body to a Multimap<String,String>, simply add the right argument and annotate it with the @RequestBody annotation. Curacao uses Google’s Guava Multimap implementation exclusively.

@Controller
public final class RequestBodyDemoController {

  /**
   * Buffer the request body, and decode the URL encoded key-value parameters
   * therein into a Multimap<String,String>.
   */
  @RequestMapping(value="^\\/post", methods=POST)
  public String post(@RequestBody final Multimap<String,String> body) {
    // Assume POST body was 'foo=bar&dog=cat', body.get("foo") returns ["bar"]
    List<String> foo = body.get("foo");
    return foo.toString();
  }
  
  /**
   * Get a single parameter from the POST body, 'foo'.
   */
  @RequestMapping(value="^\\/post\\/foo", methods=POST)
  public String postFoo(@RequestBody("foo") final String foo) {
    return foo;
  }
  
  /**
   * Buffer the entire request body into an NIO ByteBuffer.
   */
  @RequestMapping(value="^\\/put\\/buffer", methods=PUT)
  public String postBuffer(@RequestBody final ByteBuffer body) {
    return "Byte buffer capacity: " + body.capacity();
  }
  
}

Implementing your own request mapper is easy too. For instance, if you need to unmarshall a JSON POST body into an object, simply write a class to extend InputStreamReaderRequestMapper<T> and annotate it with the @ControllerArgumentTypeMapper annotation.

@ControllerArgumentTypeMapper(MyObject.class)
public final class MyObjectMapper extends InputStreamReaderRequestMapper<MyObject> {

  private final DataStore ds_;

  /**
   * Yes, argument mappers are component injectable too!
   */
  @Injectable
  public MyObjectMapper(final DataStore ds) {
    ds_ = ds;
  }

  @Override
  public MyObject resolveWithReader(final InputStreamReader reader) throws Exception {
    // Use provided 'InputStreamReader' and unmarshall string to a MyObject instance
    return myObject;
  }

}

Now that you’ve registered a request mapper for type MyObject, you can simply add a MyObject argument to any controller method. Curacao will automagically invoke your request mapper to convert the body to a MyObject, before calling your controller method.

@Controller
public final class MyObjectController {

  @RequestMapping(value="^\\/myobject", methods=POST)
  public String myObject(final MyObject mine) {
    // Do something with MyObject
    return "Worked!";
  }

}

You can find the default set of Curacao request mappers here.

Response Handlers

Curacao takes a “return or throw anything, from anywhere” approach to response handling.

Like you might expect, response handlers are designed to convert controller returned objects into a response, or convert thrown exceptions into a response. Fortunately, Curacao handles AsyncContext completion for you, so in most cases there’s no need to write verbose code that forcibly calls context.complete() in your controllers.

For convenience, Curacao ships with several default response handlers. For instance, when your controller method returns a String, Curacao automatically interprets this return type as a text/plain; charset=UTF-8 encoded response body and sets the right response headers accordingly. Similarly, if your controller method returns a java.io.File object, Curacao interprets this as a static resource response — images, CSS, JavaScript, etc. As such, Curacao will set the right Content-Type response header based on the file’s extension, and will automatically stream the File contents back to the client.
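
A tiny illustration of those defaults — the paths and file location here are made up:

@Controller
public final class DefaultsDemoController {

  // Returning a String: rendered as text/plain; charset=UTF-8.
  @RequestMapping("^\\/hello$")
  public String hello() {
    return "Hello, world!";
  }

  // Returning a java.io.File: streamed back as a static resource, with the
  // Content-Type response header derived from the file's extension.
  @RequestMapping("^\\/logo\\.png$")
  public File logo() {
    return new File("static/logo.png");
  }

}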

Thrown exceptions are handled in the same way. For example, Curacao’s default response handling behavior for any thrown java.lang.Exception is to return a vanilla 500 Internal Server Error with an empty response body.

These default behaviors make writing controllers surprisingly pleasant and simple. However, you can of course, override any of these default behaviors by implementing your own RenderingResponseTypeMapper.

@ControllerReturnTypeMapper(MyObject.class)
public final class MyObjectResponseHandler extends RenderingResponseTypeMapper<MyObject> {

  private final DataStore ds_;
  
  /**
   * Yes, response handlers are component injectable too!
   */
  @Injectable
  public MyObjectResponseHandler(final DataStore ds) {
    ds_ = ds;
  }
		
  @Override
  public void render(final AsyncContext context,
                     final HttpServletResponse response,
                     @Nonnull final MyObject obj) throws Exception {
    response.setStatus(200);
    response.setContentType("application/json; charset=UTF-8");
    try(final Writer w = response.getWriter()) {
      // Convert 'MyObject' to JSON using the library of your choice.
      w.write(obj.toJson());
    }
  }
	
}

Now that a response handler has been defined for type MyObject, anytime a controller method returns an object of type MyObject, the MyObjectResponseHandler above will be called by Curacao to convert it into JSON automatically.

Thrown exceptions are handled in the same way.

@ControllerReturnTypeMapper(AuthenticationException.class)
public final class AuthenticationExceptionResponseHandler
  extends RenderingResponseTypeMapper<AuthenticationException> {
		
  @Override
  public void render(final AsyncContext context,
                     final HttpServletResponse response,
                     @Nonnull final AuthenticationException ex) throws Exception {
    // Redirect the user to the login page.
    response.sendRedirect("/login");
  }
	
}

Here’s an example controller that makes use of these response handlers.

@Controller
public final class ResponseHandlerDemoController {

  /**
   * This method returns a 'MyObject' instance, which will trigger Curacao
   * to invoke the MyObjectResponseHandler above to render it as JSON.
   */
  @RequestMapping("^\\/myobject")
  public MyObject getMyObject() {
    return new MyObject();
  }

  /**
   * When a controller throws an 'AuthenticationException', Curacao catches this
   * and invokes the 'AuthenticationExceptionResponseHandler' which redirects
   * the user to the login page.
   */
  @RequestMapping("^\\/home")
  public String home() {
    boolean isLoggedIn = false;
    // Validate that user is authenticated and request contains a valid session.
    if (!isLoggedIn) {
      throw new AuthenticationException();
    }
    return "Hello, world!";
  }

}

You can find the default set of Curacao response handlers here.

Performance

Curacao has been proudly submitted to TechEmpower’s Framework Benchmark test suite.

I’m anxiously waiting on results from Round 10 of their tests, which should include Curacao. When the test results are available, I intend to publish them here.

Further Examples

In the spirit of “eating my own dog food”, this very blog is built on Curacao and is fully open source on GitHub. If you’re looking for more complex component definitions, and realistic request mapping and response handling examples, the application source of this blog will be a great start.

Additionally, further examples that demonstrate the flexibility of Curacao can be found in the curacao-examples project on GitHub.

Open Source

Curacao is free on GitHub and licensed under the popular MIT License.

Issues and pull requests welcome.

Cheers!

Bolt: A Wrapper around Java's ReentrantReadWriteLock

e75d7312d274dc9af9a037f78de3ca0dea35f9d3

Thu Feb 27 20:49:40 2014 -0800

Concurrency is difficult, and generally tough to get right. Fortunately, there are tools that can somewhat ease this pain. For instance, take Java’s ReentrantReadWriteLock — a useful and foundational class that helps any highly concurrent Java application manage a set of readers and writers that need to access a critical block of code. When using a ReentrantReadWriteLock you can have any number of simultaneous readers, but the write lock is exclusive. In other words:

  • If any thread holds the write lock, all readers are forced to wait (or fail hard) until the thread that holds the write lock releases the lock.
  • If the write lock is not held, any number of readers are allowed to access the protected critical block concurrently — and any incoming writers are forced to wait (or fail hard) until all readers are done.

In short, this is the classic ReadWriteLock paradigm.
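
For reference, here's what that vanilla pattern looks like when using ReentrantReadWriteLock directly (a minimal, illustrative example):

import java.util.concurrent.locks.ReentrantReadWriteLock;

public final class Counter {

  private final ReentrantReadWriteLock lock_ = new ReentrantReadWriteLock();
  private long value_ = 0L;

  public long get() {
    lock_.readLock().lock(); // Shared: any number of readers at once.
    try {
      return value_;
    } finally {
      lock_.readLock().unlock();
    }
  }

  public void increment() {
    lock_.writeLock().lock(); // Exclusive: waits for all readers and writers.
    try {
      value_ += 1;
    } finally {
      lock_.writeLock().unlock();
    }
  }

}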

This is great, except that a vanilla ReentrantReadWriteLock is missing a few key features:

  1. Conditionally wait, or fail immediately, if the desired lock is not available. In other words, let me define upfront what I want to do if the lock I want to “grab” is not available — fail now, or wait indefinitely?
  2. And, execute a callback function only upon successful execution of a transaction. Here, we define a transaction to mean successfully acquiring the lock, doing work (without failure), and releasing the lock.

I wanted these features, so I implemented Bolt — a very tiny wrapper around Java’s ReentrantReadWriteLock with better wait, cleaner fail, and transactional callback support.

LockableEntity

Using Bolt, any entity or object you want to protect should implement the LockableEntity interface.

import com.kolich.bolt.LockableEntity;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public final class Foobar implements LockableEntity {

  private final ReadWriteLock lock_;

  public Foobar() {
    lock_ = new ReentrantReadWriteLock();
  }

  @Override
  public ReadWriteLock getLock() {
    return lock_;
  }

}

Now, let’s create an instance of this example entity which we will use to protect a critical section of code within a transaction.

public static final Foobar foo = new Foobar();

This instance, foo, is used below throughout my examples.

Read Lock, Fail Immediately

First, let’s grab a shared read lock on foo, but fail immediately with a LockConflictException if the write lock is already acquired by another thread.

new ReentrantReadWriteEntityLock<T>(foo) {
  @Override
  public T transaction() throws Exception {
    // ... do read only work.
    return baz;
  }
}.read(false); // Fail immediately if read lock is not available

Note that read asks for a shared reader lock — the lock will be granted if and only if there are no threads holding a write lock on foo. There very well may be other reader threads.

Read Lock, Block/Wait Forever

Next, let’s grab a shared read lock on foo, but block/wait forever for the read lock to become available. Execute the success callback if and only if the transaction method finished cleanly without exception.

Note the implementation of the success method is completely optional.

new ReentrantReadWriteEntityLock<T>(foo) {
  @Override
  public T transaction() throws Exception {
    // ... do read only work.
    return baz;
  }
  @Override
  public T success(final T t) throws Exception {
    // Only called if transaction() finished cleanly without exception
    return t;
  }
}.read(); // Wait forever

It is very important to note that the underlying lock is held while the success method is called. That is, the acquired lock isn’t released until both the transaction and success methods have finished.
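
For intuition, here is a minimal sketch (not the actual Bolt source; the class and exception names below are made up) of how a read transaction with an optional success callback can be layered on top of a plain ReentrantReadWriteLock:

import java.util.concurrent.locks.ReentrantReadWriteLock;

public abstract class ReadTransactionSketch<T> {

  private final ReentrantReadWriteLock lock_ = new ReentrantReadWriteLock();

  public abstract T transaction() throws Exception;

  // Optional callback; the default is a no-op pass-through.
  public T success(final T t) throws Exception {
    return t;
  }

  public final T read(final boolean waitForever) throws Exception {
    // Either block until the shared read lock is available, or try
    // once and fail fast if another thread holds the write lock.
    if(waitForever) {
      lock_.readLock().lock();
    } else if(!lock_.readLock().tryLock()) {
      throw new IllegalStateException("Read lock not available.");
    }
    try {
      // The lock is still held while success() runs; it is released
      // only after both methods have finished.
      return success(transaction());
    } finally {
      lock_.readLock().unlock();
    }
  }

}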

Write Lock, Fail Immediately

Grab an exclusive write lock on foo, or fail immediately with a LockConflictException if a write or read lock is already acquired by another thread. Further, execute the success callback method if and only if the transaction method finished cleanly without exception.

new ReentrantReadWriteEntityLock<T>(foo) {
  @Override
  public T transaction() throws Exception {
    // ... do read or write work, safely.
    return baz;
  }
  @Override
  public T success(final T t) throws Exception {
    // Only called if transaction() finished cleanly without exception
    return t;
  }
}.write(); // Fail immediately if write lock not available

Write Lock, Block/Wait Forever

Grab an exclusive write lock on foo, or block/wait forever for all readers to finish.

new ReentrantReadWriteEntityLock<T>(foo) {
  @Override
  public T transaction() throws Exception {
    // ... do read or write work, safely.
    return baz;
  }
}.write(true); // Wait forever

An Example

The Havalo-KVS project makes extensive real-world use of this locking mechanism, as a way to manage shared entities that may be concurrently accessed by any number of threads. Havalo-KVS is a lightweight key-value store written in Java. Internally, it maintains a collection of repositories and objects, and uses Bolt to conditionally gate access to these objects in local memory.

GitHub

Bolt is free, and open source, on GitHub:

https://github.com/markkolich/kolich-bolt

Pull requests welcome.

A New Blogging Platform: GitHub, Twitter Bootstrap, Curacao

665f0b8f6cc90d879fd7710dee66e31d3512b895

Sun Jan 26 20:22:28 2014 -0800

I finally dumped Movable Type and decided to invest a bit of engineering time into a new blogging platform for myself.

I started blogging in 2008, and at that time, the self-hosted blogging platform options were somewhat slim — choose Movable Type (built on Perl and CGI) or choose Wordpress (built on PHP, riddled with security issues). Given its poor track record, I didn’t trust Wordpress, so the choice was obvious: Movable Type. However, as the web progressed, I quickly found myself using an outdated blogging tool entirely due to my own laziness; I started with Movable Type version 4.21 and never bothered to upgrade it over the course of six years. Frankly, each time I mustered up the courage to upgrade my Movable Type install, I just gave up and poured myself a drink — it just wasn’t something I wanted to tackle, full of gotchas and landmines. I had better things to do.

The straw-that-broke-the-camel’s-back came last August in the form of this post, Introducing Havalo, A Non-Distributed NoSQL Key-Value Store for your Servlet Container. I recall using Movable Type’s buggy editor to hammer out that blog post, and kept asking myself, “Why am I doing this? Can’t I just write this blog post using Markdown? And why the hell am I still using Perl/CGI?” Basically, 2008 called, and wanted its blogging platform back. And so, the crusade for something better began.

Most folks would have immediately jumped to Blogger, or [insert name of popular hosted blogging service here]. Apparently I like reinventing wheels, and this was an opportunity to really dig into some technologies I’ve been interested in for a while but didn’t have the need to formally explore. It was finally time to build my own open source blogging platform, from scratch, using the technologies I love.

GitHub Integration & JGit

I love Git, and GitHub.

So, it felt completely natural to use GitHub as the underlying datastore for all of my blog content, not just the source code behind it. Here’s the content creation workflow I desired:

  • Create new Markdown file in GitHub hosted repository: touch new-entry-with-some-name.md
  • Write blog entry using Markdown in any editor I choose: vi new-entry-with-some-name.md
  • Commit new blog entry to repository: git commit -a -m "Title of new blog post"
  • Push file to the remote, to publish new entry: git push origin master

Then, once pushed, the JVM based web-app serving my blog to the world shall git pull and automagically update itself to show the new entry on the web. In other words, I can just write, commit, push, and grab a Snickers. Done.

Well, this is exactly what I built.

On the server side, the JVM-based web-app behind my blog uses JGit to initially clone on startup, and then subsequently pull on a fixed interval to keep itself up-to-date — a local clone of my blog repository is kept and managed on disk by the web-app. Blog entries are sorted and stored in-memory based on their commit date, so the newest entry is always first.
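
A minimal JGit sketch of that clone-then-pull workflow might look like the following (the local directory is a placeholder, and this isn’t the blog’s actual code):

import java.io.File;

import org.eclipse.jgit.api.Git;

public final class BlogRepoSync {

  public static void main(String[] args) throws Exception {
    final File local = new File("/tmp/blog-clone"); // Placeholder path.
    final Git git;
    if(!local.exists()) {
      // Initial clone on startup.
      git = Git.cloneRepository()
        .setURI("https://github.com/markkolich/blog.git")
        .setDirectory(local)
        .call();
    } else {
      git = Git.open(local);
    }
    // Subsequent pulls, run on a fixed interval, keep the local
    // clone up-to-date with the remote.
    git.pull().call();
    git.close();
  }

}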

Markdown support with Pegdown

I love Markdown.

Please, no more crappy “rich HTML editors”, okay? I just want to be able to create content using the powerful Markdown syntax I’m familiar with:

### Some Heading
A paragraph.
1. A numbered
2. list
3. with some **bold** text.
<img src="an-image.png"/>
[A link to somewhere awesome](http://awesome.example.com)

I pulled in Pegdown, a pure-Java Markdown processor based on a parboiled PEG parser. It integrated with my app beautifully — all content, pages and entries, could now be written in Markdown and I wasn’t tied to some awful web-interface with a buggy editor.
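
For illustration, converting a chunk of Markdown with Pegdown is roughly a one-liner; a minimal sketch:

import org.pegdown.PegDownProcessor;

public final class MarkdownToHtml {

  public static void main(String[] args) {
    // Pegdown consumes Markdown and spits out HTML; the result is
    // then handed off to a FreeMarker template for final rendering.
    final PegDownProcessor pegdown = new PegDownProcessor();
    final String html = pegdown.markdownToHtml("### Some Heading\nA paragraph with **bold** text.");
    System.out.println(html);
  }

}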

Page templating was still important, so I integrated FreeMarker into the mix for common page components. Pegdown consumes Markdown and spits out HTML, which is then piped into a FreeMarker template to produce near final HTML:

<#include "common/header.ftl">

  <h2 class="title">${title}</h2>
  <p class="hash"><a href="https://github.com/markkolich/blog/commit/${commit}">${commit}</a></p>
  <p class="date">${date}</p>

  <!-- Markdown generated HTML content -->
  <article>${content}</article>

<#include "common/footer.ftl">

Lastly, the HTML Compressor was brought in to “compress” the FreeMarker generated HTML, removing wasted bytes like line breaks and other spacing characters between HTML tags. The final HTML of each page is quite minimal and compresses nicely over a GZIP’ed transport stream, like GZIP-compressed HTTP responses.
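
A minimal sketch of that last step, assuming the htmlcompressor library (the class and setter names below are from that library as I recall them, so double check against its docs):

import com.googlecode.htmlcompressor.compressor.HtmlCompressor;

public final class CompressHtml {

  public static void main(String[] args) {
    final HtmlCompressor compressor = new HtmlCompressor();
    // Strip the whitespace between tags from the FreeMarker
    // rendered HTML before it goes over the wire.
    compressor.setRemoveIntertagSpaces(true);
    final String html = "<html>\n  <body>\n    <p>Hello.</p>\n  </body>\n</html>";
    System.out.println(compressor.compress(html));
  }

}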

Responsive UI from Twitter Bootstrap

I love Twitter Bootstrap.

One wheel I didn’t reinvent was the “responsive” UI layer for my new blogging platform. I wanted something simple, beautiful, and off-the-shelf. I snagged the latest version of Bootstrap, and chose its Spacelab theme. Given the flexible and extensible nature of Bootstrap, I can, in theory, replace this theme with any other for a completely different look without any code rewrites.

Let’s not forget Bootstrap’s responsive design — the act of gracefully degrading or seamlessly transitioning to a view suitable for any device using CSS3 media queries. For years I’ve been maintaining a “mobile” version of my blog at http://mobi.koli.ch. Given Bootstrap’s built-in responsiveness, I was finally able to shut down this dedicated mobile portal for good. When you view my new blog on a mobile device, like an iPhone or an iPad, you’ll notice the view gracefully hides the rightmost column and displays just the page content. When on a larger device, like a notebook, the right column is visible with more screen real estate.

In short, the mobile version of my blog is now “built in, for free” — whatever device you happen to be on, you’ll see the most appropriate view that’s optimized for your device. And, more importantly, I no longer have to run and maintain a separate web-app just for mobile devices.

Highly Optimized JavaScript with jQuery

I love jQuery.

Bootstrap’s core is written around jQuery, so it was a natural fit.

All JavaScript was a simple extension of the closure pattern, which made it easy for me to write robust and modular code. At build time, my JavaScript is packed, highly optimized, and minified using Google’s Closure Compiler. Similarly, all CSS is packed and minified using the YUI Compressor. Closure Compiler and YUI Compressor support is integrated directly into my SBT Build.scala file, such that all JavaScript and CSS is packaged anytime the compile task is launched.

Async Servlet Web-Layer with Curacao

I love the JVM, and JVM based web-applications.

When choosing a server side web-layer for this project, I mulled over several options:

  • Spring Framework — Too “enterprisey”, too bloated, thread based, meh.
  • Play Framework — Akka Actor based, asynchronous overkill, too “off-the-shelf”, meh.
  • Spray — Scala only, Akka Actor based, poorly documented, too “academic”, meh.
  • Raw Servlet 3.0 — Thread based, asynchronous, too “raw”, meh.
  • Write my own?

I didn’t need Akka Actors for this project; that’s complete overkill — threads will do just fine. On the other hand, I like Spring (for the most part) but I really wanted to avoid all of the awful XML configuration and silly annotations. In the end, I wrote my own asynchronous Servlet web-layer.

Meet Curacao, an open source toolkit for building REST/HTTP-based integration layers on top of asynchronous Servlets:

https://github.com/markkolich/curacao

Curacao borrows concepts from Play, Spray, and Spring — I “merged” what I felt was the best of these worlds into a single toolkit. Note I say “toolkit” and not framework, because Curacao is not built to be an end-all, be-all framework. I intentionally avoided things like ORM, JDBC, AOP, etc.

With Curacao I can write powerful and completely asynchronous thread based web-services on top of any Servlet 3.0 compatible container. Here’s a sample @Controller implementation in Curacao:

@Controller
public final class Blog {

  private final EntryCache cache_;

  @Injectable
  public Blog(final EntryCache cache) {
    cache_ = cache;
  }

  @GET("/")
  public final Entry index() {
    return cache_.get("index");
  }

  @GET("/{name}/**")
  public final Entry entry(@Path("name") final String name) {
    return cache_.get(name);
  }

}

This blog is deployed on Tomcat, but Curacao works with Jetty, Resin, and Undertow too.

Caching with Apache’s mod_cache

No web-service would be complete without some reasonable level of caching. For that, I turned to Apache’s mod_cache. I’m already running Tomcat behind Apache’s mod_proxy_ajp so configuring mod_cache in conjunction with mod_expires was trivial.

First, mod_expires is in charge of setting an appropriate Expires HTTP response header on each response. This is important for mod_cache given it keys on the Expires header to identify what resources it can/should cache, and for how long. Here’s my mod_expires configuration within Apache:

ExpiresActive On

## Default expiry is 5-minutes.
ExpiresDefault "access plus 5 minutes"

## Static content, images, fonts, etc. cache for 1-hour.
ExpiresByType image/jpeg "access plus 1 hour"
ExpiresByType image/png "access plus 1 hour"
ExpiresByType image/x-icon "access plus 1 hour"
ExpiresByType application/font-woff "access plus 1 hour"
ExpiresByType application/x-font-ttf "access plus 1 hour"

## For Atom/RSS feed, and XML sitemap.
ExpiresByType text/xml "access plus 1 hour"

## For robots.txt caching.
ExpiresByType text/plain "access plus 1 hour"

The caching of images, fonts, and dynamic resources like atom.xml and robots.txt is slightly more aggressive given they rarely change and are requested more often by crawlers and bots.

To hold cached content, I configured a large RAM disk mounted at /mem for mod_cache:

CacheRoot /mem
CacheEnable disk /

## The default duration to cache a document when no expiry date is specified.
CacheDefaultExpire 300
## The maximum time in seconds to cache a document.
CacheMaxExpire 300

CacheIgnoreNoLastMod On
CacheIgnoreCacheControl On

## The maximum size (in bytes) of a document to be placed in the cache.
CacheMaxFileSize 104857600

## NOTE: CacheDirLevels * CacheDirLength must not be > 20
CacheDirLevels 10
CacheDirLength 2

A RAM disk is a filesystem mounted in volatile memory — access to files and resources cached in a RAM disk is almost always significantly faster than hitting a spinning disk platter. The goal here is that common requests for static resources, and other content that rarely changes, will usually not hit application code; instead, they’re served fresh from the cache.

This blog, and all its content, is open source

Last but not least, all code and content (everything you’re reading here) is open source on GitHub at:

https://github.com/markkolich/blog

Of course, you can clone your own local copy of my blog and all of its content anytime you wish:

git clone https://github.com/markkolich/blog.git

Pull requests welcome!

Enjoy.

Introducing Havalo, A Non-Distributed NoSQL Key-Value Store for your Servlet Container

22d055f77c84531882218072cbd886b8480677c3

Sun Aug 18 18:50:00 2013 -0700

Someone recently asked me, “why spend time building your own key-value store when other trusted solutions like Redis, Mongo, and CouchDB are available off-the-shelf?”

Because I can!

Some History

It all started last year when I began building my wedding web-site. My wife and I tied the knot in September 2012, and I was chartered with building a web-site for the big day. Naturally, being a tech savvy couple, we opted to allow our guests to RSVP online on our wedding web-site. Steak, or fish?

So, the search for a data store began.

I had enough of JDBC, and the monstrosities that came with it – that is, Hibernate and iBATIS. And, I had no intention of firing up a traditional database to store something relatively trivial like a set of RSVP’s to a wedding. Further, given that my wedding web-site was built on Spring 3.2 and ran in a traditional Servlet container, I yearned for a solution that wouldn’t require any additional services or deployment steps. I wanted a data store that would fire up alongside my existing web-applications whenever I started my Servlet container, and just work.

Meet Havalo

Oh, hey!

Written in Java 7, Havalo is a zero configuration, non-distributed NoSQL key-value store that runs in any Servlet 3.0 compatible container.

Sometimes you just need fast NoSQL storage, but don’t need full redundancy and scalability (that’s right, localhost will do just fine). With Havalo, simply drop a WAR into your favorite Servlet 3.0 compatible container and with almost no configuration you’ll have access to a fast and lightweight K,V store backed by any local mount point for persistent storage. And, Havalo has a pleasantly simple RESTful API for your added enjoyment.

Havalo is built around raw asynchronous Servlets and runs inside of any Servlet 3.0 container. Further, Havalo does not use any bloated frameworks or toolkits; it is as minimal and as lightweight as I could build it.

Oh, and Havalo is open source, licensed for free to the world under the popular MIT License.

Technical Challenges

There were several technical challenges I overcame while I built Havalo:

  • Resource Locking - When you “PUT” (upload) an object into Havalo, the object is eventually saved to a disk platter as a vanilla file. Attempting to retrieve this file later in a multi-threaded application led to strange behavior depending on which file system and operating system you were using. For example, on Windows (NTFS), if one thread in the JVM has an open FileInputStream to a file on disk but another thread in the same JVM comes along and tries to delete that file, the delete will immediately fail. However, on Linux (ext3), the delete will succeed but the file will not be physically removed from the platters until all open file handles pointing to the file are closed. To work around this inconsistent behavior, Havalo does not rely on the file system to manage resource locking. Instead, Havalo uses kolich-bolt internally to manage a set of ReentrantReadWriteLock’s in memory, which manages reader/writer access to the raw object files on disk.

  • In Memory Indexing - I spent quite a while trying to identify the right data structures to use internally that would allow fast object insertion and very fast searching of existing object keys. For example, say a user inserted three objects into their bucket, with keys “foo”, “foobar”, and “dog”. I needed a data structure that would let me quickly find objects that started with a given prefix. In this case, if a user asked for all objects that started with “fo”, naturally Havalo would return “foo”, and “foobar”. In the end, I used a Trie data structure, which worked beautifully. In fact, Havalo uses the patricia-trie implementation exclusively.

  • Filename and filename length - If you’ve ever dealt with an application that saves files to disk, you may be familiar with the restrictions file systems place on the length of your filenames. That is, the length of the path to a file on disk. Most modern file systems, including ext3 and NTFS, strictly enforce a maximum filename length of 255-bytes. So, if Havalo was to accept objects identified by keys longer than 255-bytes, a workaround was required. Fortunately, this was easily solved by base-32 encoding a hash of the given key to produce a normalized filename. Yes, that’s not a typo, base-32. This has an added benefit, in that base-32 only uses the digits 0-9 and uppercase letters A-Z. As a result, this worked around buggy file systems that ignore filename case (e.g., on NTFS a file named “xyz” is the same as “XyZ”). So, using filenames that only contain the digits 0-9 and uppercase letters A-Z completely dances around this problem. A rough sketch of this normalization appears just after this list.

  • Java Client Bindings - The Havalo API is a RESTful service which deals exclusively in JSON. To facilitate adoption of the API, literally to make it easier to consume the service, I wrote a robust set of Java bindings (a client) for the Havalo API. You’ll find the havalo-kvs-client project is an easy off-the-shelf solution which can be integrated into your own application.
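
As a rough illustration of the filename normalization described above (this is not Havalo’s actual implementation; the sketch assumes Apache Commons Codec and uses SHA-256 as the hash):

import java.nio.charset.Charset;

import org.apache.commons.codec.binary.Base32;
import org.apache.commons.codec.digest.DigestUtils;

public final class KeyToFilename {

  public static void main(String[] args) {
    final String key = "some/object key that could be much longer than 255 bytes";
    // Hash the key first so the resulting filename has a fixed,
    // predictable length, then base-32 encode the hash. Only
    // uppercase letters and digits survive, which side-steps
    // case-insensitive file systems like NTFS.
    final byte[] hash = DigestUtils.sha256(key.getBytes(Charset.forName("UTF-8")));
    final String filename = new Base32().encodeAsString(hash).replace("=", "");
    System.out.println(filename);
  }

}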

Download It

You can find instructions and additional technical details on how to download and configure Havalo for your own environment on my Havalo GitHub page at https://github.com/markkolich/havalo-kvs.

Havalo is open source, so you are always welcome to browse the source and submit a pull request if you’d like to contribute.

Enjoy!

Thoughts on Using Raw java.util.UUID's in Your Web-Application or Web-Service: Check your UUID Length's Too

47dc940a4a615e48319720c7ee351344ba788395

Mon May 13 19:15:00 2013 -0700

You’re probably familiar with UUID’s — those ubiquitous universally unique identifiers used in just about every modern web-application or web-service. And, if you’re a developer living on the JVM, you’re probably close friends with java.util.UUID whether you like it or not.

Generally speaking, UUID’s are a convenient way to represent some unique object or entity inside of an application. After all, they’re supposed to be “universally unique” and random enough such that an application can, in theory, generate “random” UUID’s forever without any collisions. In other words, UUID’s are represented by a 128-bit number under-the-hood, so the total number of possible UUID’s is immense — 340,282,366,920,938,463,463,374,607,431,768,211,456 unique UUID’s to be exact.

No application today could possibly have more than 340,282,366,920,938,463,463,374,607,431,768,211,456 users, or need to store more than 340,282,366,920,938,463,463,374,607,431,768,211,456 objects, right?

So what’s the big deal?

In their canonical form, UUID’s are represented by 32 hexadecimal digits, displayed in five groups separated by hyphens, in the form 8-4-4-4-12 for a total of 36 characters. For example:

scala> import java.util.UUID

scala> UUID.randomUUID
res0: java.util.UUID = 09bf989f-5b24-47bc-871e-1e824d4f4c60

Again, note that UUID’s are typically represented by 32-hexadecimal digits, with a canonical form string length of 36 (including the hyphens).

scala> UUID.randomUUID.toString.length
res1: Int = 36

And therein lies the rub.

Given that UUID’s are represented by a series of hexadecimal digits, it occurs to me that appending a long string of leading zeros, or even omitting a leading zero (if present), still results in a valid UUID. For example, 0x0000000A is equivalent to 0x0A, or even 0xA.

That said, these UUID’s are logically identical:

9bf989f-5b24-47bc-871e-1e824d4f4c60
09bf989f-5b24-47bc-871e-1e824d4f4c60
00000000000000000000000000000009bf989f-5b24-47bc-871e-1e824d4f4c60

At least, according to java.util.UUID they are!

OK, so, what’s the problem?

Well, consider this: if you use UUID’s in the paths to resources in your web-service or web-application, you need to make sure your application (or the framework you’re using) does the right thing with egregiously long, or slightly short, UUID’s represented in a URI as a String.

For example, take this request:

GET:/api/object/{uuid}

Within the business logic, many web-applications (and frameworks) do something like the following:

val id = UUID.fromString({uuid})

In theory, this can lead to a number of wonderful exploits, including buffer overflow attacks and other awesome denial of service breakdowns in your web-service or web-application.

What’s a developer to do?

Well, the long and short of it is that at the end of the day, you also have to check the length of incoming UUID’s that you plan to “do something useful” with in your application. If the incoming UUID is longer than 36-characters or shorter than 36-characters, you’re wasting your time.

So, here’s a quick regular expression that does the right thing as far as “syntactically” correct UUID’s go:

^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$

And now, we can use the Scala interactive interpreter to verify our new regular expression:

scala> val r =
     | "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$".r

scala> r.findFirstIn("000000000000000009bf989f-5b24-47bc-871e-1e824d4f4c60")
res0: Option[String] = None

scala> r.findFirstIn("09bf989f-5b24-47bc-871e-1e824d4f4c60")
res1: Option[String] = Some(09bf989f-5b24-47bc-871e-1e824d4f4c60)

Yay! Note that on the first call to findFirstIn, a None (no match) was returned. On the second invocation with a UUID of the correct length, Some(uuid) was returned given the input String was syntactically correct and of a perfect length.

So, in the end, not a huge deal, but it’s good to keep in mind that when dealing with UUID’s you cannot rely on java.util.UUID alone to parse and verify an incoming identifier; you’ve got to use your own UUID verification regular expression. Or, better yet, use the verification mechanisms provided by your web-service or web-application framework (if one exists) to verify the length of incoming UUID’s.
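
In Java, a small helper that combines the length check, the regular expression, and java.util.UUID parsing might look like this (a sketch; the class and method names are mine):

import java.util.UUID;
import java.util.regex.Pattern;

public final class Uuids {

  private static final Pattern UUID_PATTERN = Pattern.compile(
    "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$");

  // Returns a parsed UUID, or null if the incoming String is not
  // exactly 36 characters of syntactically valid UUID.
  public static final UUID parseOrNull(final String incoming) {
    if(incoming == null || incoming.length() != 36) {
      return null;
    }
    if(!UUID_PATTERN.matcher(incoming).matches()) {
      return null;
    }
    return UUID.fromString(incoming);
  }

}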

Enjoy.

Resolve Custom Object Arguments in a Spring 3 Controller @RequestMapping Annotated Method

4330b54cda01f01539659f6955d74cf561c0f61a

Sat Jul 30 18:35:00 2011 -0700

Spring 3 is great at automatically resolving standard arguments into a controller request method.

For example, a primitive Spring controller might look like this …

@Controller
@RequestMapping(value="/somepath")
public final class MyController {

  @RequestMapping(method={RequestMethod.GET, RequestMethod.HEAD})
  public ModelAndView someMethod(final HttpServletRequest request,
    final Principal principal) {
    // Extract some special object needed to process the request from
    // the session -- this object is bound to the session elsewhere on
    // a successful authentication.
    final MyObject obj = (MyObject)request.getSession().getAttribute("myobjkey");
    // Do actual work.
    /* ... */
    return new ModelAndView("someview");
  }

}

In this case, Spring knows the HttpServletRequest argument represents the incoming Servlet request, and the Principal argument is the object representing the authenticated user (in the event that you’re using Spring Security to manage authentication in your web-application). On method invocation, Spring automatically resolves these arguments for you. Neat!

However, the repetition becomes obvious when, in every controller, you need to fetch the same MyObject from the session over and over again. Instead of repeating that line of code in every method of every controller that needs access to MyObject, what if you could tell Spring how to resolve MyObject automatically on invocation?

Let’s say you’ve defined a custom object and bound it to the session on a successful authentication (either on your own or via Spring Security) …

import java.util.UUID;

public final class MyObject {

  // A unique and static identifier.
  private final UUID id_;

  public MyObject() {
    id_ = UUID.randomUUID();
  }

  public UUID getId() {
    return id_;
  }

}

Good news! Spring can automatically resolve an argument of type MyObject if used as an argument into a controller request method.

Meet WebArgumentResolver

The WebArgumentResolver interface lets you define a bean that tells Spring where to find a custom argument of any type when used in a controller request method. Here’s an example that tells Spring where to find an argument of type MyObject bound to the session …

import static javax.servlet.jsp.PageContext.SESSION_SCOPE;

import org.springframework.core.MethodParameter;
import org.springframework.web.bind.support.WebArgumentResolver;
import org.springframework.web.context.request.NativeWebRequest;

public final class SessionExtractingWebArgumentResolver implements WebArgumentResolver {

  @Override
  public Object resolveArgument(final MethodParameter mp,
    final NativeWebRequest nwr) throws Exception {
    Object argument = UNRESOLVED;
    if(mp.getParameterType().equals(MyObject.class)) {
      // Assumes that a MyObject is bound to the session elsewhere using
      // attribute key "myobjkey" on a successful authentication.
      if((argument = nwr.getAttribute("myobjkey", SESSION_SCOPE)) == null) {
        throw new Exception("Fail, no MyObject bound to session!");
      }
    }
    return argument;
  }

}

Now that we’ve defined our WebArgumentResolver bean, we can wire it into in our Spring MVC configuration.

Wire it Into Spring MVC

Wire your custom WebArgumentResolver into Spring’s AnnotationMethodHandlerAdapter using a quick declaration in your Spring MVC XML configuration. Here’s an example …

<bean class="org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter">
  <property name="customArgumentResolver">
    <bean class="your.package.SessionExtractingWebArgumentResolver" />
  </property>
</bean>

Yay! Now, if Spring encounters a MyObject argument in a controller request method, it will look it up using our custom WebArgumentResolver and extract it from the session accordingly.

An Improved Controller

Now that Spring can resolve MyObject automatically, we can use it as an argument into any controller request method …

@RequestMapping(method={RequestMethod.GET, RequestMethod.HEAD})
public ModelAndView better(final MyObject o) {
  // ...
}

No more ugly repetition, and less code. Enjoy.

More Fun with RFC 2397: the "data" URL scheme and Mobile Networks

52bf8ba90ef0840da7603d930628a8649b8759dd

Tue Jan 11 14:20:00 2011 -0800

In July of ’09, when I first learned of the “data” URL scheme, I was pumped. With a little work, my web-applications could use the “data” URL scheme to embed actual base-64 encoded binary image data directly inside of my HTML and CSS. In the same post, I subsequently commented on why this scheme can be incredibly useful, especially for mobile web-applications or API’s that service mobile apps. Even with significant advances in wireless networks over the past several years, traditional HTTP continues to lag (for the most part) over poor 3G and 4G networks. For this reason, the “data” URL scheme can be a life saver — you can embed binary image data directly inside of your HTML and CSS, freeing the device from initiating wasteful HTTP transactions to load these images later.

Today marked yet another personal milestone for my usage of the “data” URL scheme. Building an API that services a mobile app for the HP/Palm webOS platform, I quickly rediscovered the importance of this scheme. It turns out I can embed base-64 encoded binary image data in a JSON response payload that is sent directly to a wireless webOS device! What this means, is that I can build my API resource to send everything the app requested, including any additional external resources like images, in a single HTTP response!

An Example

Imagine the app is fetching details about a user from my API. The HTTP request leaving the app might look something like this:

GET /user/markkolich.json HTTP/1.1
Host: api.example.com
User-Agent: webOS

And, a normal HTTP response might look something like this:

HTTP/1.0 200 OK
Date: Fri, 21 Jan 2011 23:09:32 GMT
Content-Type: application/json;charset=UTF-8
Content-Length: 78
Vary: Accept-Encoding,User-Agent
Connection: close

{
 "user":"markkolich",
 "name":"Mark Kolich",
 "avatar":"http://api.example.com/images/markkolich.png"
}

Nothing special. But note that the API response, a JSON object, contains a URL to an avatar image which in all likelihood the app will need to fetch and display later. The problem here is that, now that the app has the response in its hands, it has to turn around and kick off yet another HTTP transaction to load the image from the provided URL. And that additional fetch could be very painful over a slow wireless network with quite a bit of latency.

The “data” URL Scheme to the Rescue

Instead of the API sending just a URL that points to an image, it could also include the actual image data itself, a base-64 encoded “data” URL in the JSON response. For example:

{
 "user":"markkolich",
 "name":"Mark Kolich",
 "avatar":{
  "url":"http://api.example.com/images/markkolich.png",
  "data_uri":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAA
    ABCAAAAAA6fptVAAAAAnRSTlMA/1uRIrUAAAACYktHRAD+8Ij8KQAAAAlwSFlz
    AAAASAAAAEgARslrPgAAAAl2cEFnAAAAAQAAAAEAx5Vf7QAAAApJREFUCNdj+A
    8AAQEBABu27lYAAAAASUVORK5CYII="
 }
}

I know that this isn’t technically valid JSON because I’m line wrapping in the middle of the base-64 encoded image payload; I’m wrapping so that I can fit the entire JSON block into a single column on my blog for display purposes. For the record, your data payload shouldn’t have any line breaks in it.

Looking at the new response, this is far superior to sending just a URL that points to an image. Yes, we’re sending a little more data, but now the app does not have to initiate another HTTP transaction to load the avatar. Due to the usually poor latency of wireless networks (EDGE, 3G, 4G, etc.), and the natural overhead of HTTP, it’s far better to send more data at once in a single transaction than over multiple smaller transactions.

Finally, in JavaScript, sourcing the encoded image data into an actual Image object is trivial:

// Assume userObj is the JSON object returned
// in my example response above.
var userObj = { /*...*/ };

// Straight up.
(new Image).src = userObj.avatar.data_uri;

// Maybe you prefer jQuery?
$("<img>").attr("src", userObj.avatar.data_uri);

Assuming your app platform supports the “data” URL scheme, this strategy wins every time, hands down.

The “data” URL Format

Putting it all together, RFC 2397 says the accepted syntax/format of data URIs is as follows:

data:[<mediatype>][;base64],<data>

So you’ll need to define a media type (the Content-Type), declare that the data is base-64 encoded and provide an encoded payload.

Determining the Content-Type

If you don’t already know the Content-Type of the image you plan to base-64 encode, then you’ll have to discover it. This isn’t too hard, and involves writing a bit of code that checks the header of your image to determine its type. If you examine the specs of each image format you plan to support, you’ll probably find that:

a. Every JPEG-image starts with a quick 2-byte “Start of Image” (SOI) marker:

0xFF D8

b. Every PNG-image starts with a fixed 8-byte signature:

0x89 50 4E 47 0D 0A 1A 0A

c. Every GIF-image starts with a fixed 3-byte signature:

0x47 49 46

I’m not going to post any sample solutions here, because iterating over byte[] arrays and comparing values to determine an image format is trivial. However, in Java, it may help you to think of each image format as a value in an enumeration:

import java.util.Arrays;

public enum ImageContentType {
  
  PNG("image/png", new byte[]{
    (byte)0x89, (byte)0x50, (byte)0x4E, (byte)0x47, 
    (byte)0x0D, (byte)0x0A, (byte)0x1A, (byte)0x0A
    }),

  JPEG("image/jpeg", new byte[]{
    (byte)0xFF, (byte)0xD8
    }),

  GIF("image/gif", new byte[]{
    (byte)0x47, (byte)0x49, (byte)0x46
    });
  
  private String contentType_;
  private byte[] header_;
  
  private ImageContentType(String contentType, byte[] header) {
    contentType_ = contentType;
    header_ = header;
  }
  
  public byte[] getHeader() {
    return header_;
  }
  
  public String getContentType() {
    return contentType_;
  }
  
  @Override
  public String toString() {
    return getContentType();
  }
  
  public static final ImageContentType getContentType(final byte[] image) {
    ImageContentType ict = null;
    for(final ImageContentType type : ImageContentType.values()) {
      final byte[] header = type.getHeader();
      // Compare the leading bytes of the image against the known
      // signature (magic bytes) for this format.
      if(image != null && image.length >= header.length &&
        Arrays.equals(Arrays.copyOfRange(image, 0, header.length), header)) {
        ict = type;
        break;
      }
    }
    return ict;
  }
  
}

Of course, you could always just check the file extension of the image you plan to encode, and determine a Content-Type based on that. In other words, if the resource ends in .jpg then you could assume, with reasonable certainty, that the Content-Type is image/jpeg. However, only relying on the file extension to tell you the format can be a dangerous strategy if you’re not careful.

Encode and Assemble

Now that you know the Content-Type, you can base-64 encode the image payload and assemble a valid “data” URI string. Don’t bother writing a base-64 encoder from scratch, since there are many wonderful open-source implementations available for free. The most popular seems to be in the Apache Commons Codec library. Specifically, take a peek at org.apache.commons.codec.binary.Base64.

So, here’s some pseudo code illustrating how to put it all together:

import org.apache.commons.codec.binary.Base64;

private static final String DATA_URI_SCHEME = "data:%s;base64,%s";

/* ... */

// Create a new Base64 object, setting the line length to zero
// so that the output is not chunked (e.g., no line breaks).
final Base64 b64 = new Base64(0);

// Some image data, that you've fetched elsewhere.
final byte[] image = ...;

// Discover the image format.
final ImageContentType ict = ImageContentType.getContentType(image);

// Encode the image.
byte[] encoded = b64.encode(image);

// Convert the encoded byte array into its String representation.
// Base-64 is just ASCII, so this is totally fine.
String imageEncoded = new String(encoded, "UTF-8");

/* ... */

// Build the data URI String for inclusion in our API response.
final String dataUri = String.format(DATA_URI_SCHEME,
  ict.toString(), imageEncoded);

Now that you’ve assembled a valid data URI String, it’s simple to attach it to a JSON object and return it to the caller.

Remember to Close Your Streams When Using Java's Runtime.getRuntime().exec()

b19ed7abf496c9fca1b0f3b1429d59ff0693939b

Wed Jan 05 14:25:00 2011 -0800

For more than a year, I got away with forgetting to close my standard I/O streams when spawning a process in Java with Runtime.getRuntime().exec(). On Linux, I was using exec() to spawn the df command to check my file system disk space usage. Standard out from df was piped into the parent (Java) where I parsed the output to see if any partitions were getting full. Simple enough, right?

In December 2010, I began experimenting with Java’s next generation garbage collection engine, aptly named G1 (a.k.a., Garbage First). Assuming you have Java 6 Update 14 or later you can enable the next-generation G1 garbage collector (still experimental as of Jan 2011) using the following JVM options:

-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC

This post isn’t about G1, so I’m not going to dig into the nitty-gritty on garbage collection. However, I discovered that G1 isn’t as aggressive as the current Java garbage collector with regards to cleaning up streams. Bug in G1? Maybe, maybe not. Regardless, my app ran for a week or two with G1 enabled then I started to see all sorts of silly java.net.SocketException’s claiming I had “Too many open files”. Using the trusty lsof command, I saw that my Java process had left open a ton of stranded pipes. Definitely an indication of a leak somewhere …

#/> lsof -p 23064 | grep pipe
...
java    23064 mark  996w  FIFO    0,7    152581 pipe
java    23064 mark  997r  FIFO    0,7    152309 pipe
java    23064 mark  998r  FIFO    0,7    152448 pipe
java    23064 mark  999w  FIFO    0,7    152720 pipe
java    23064 mark 1000w  FIFO    0,7    152859 pipe
java    23064 mark 1001r  FIFO    0,7    152583 pipe
java    23064 mark 1002w  FIFO    0,7    153134 pipe
java    23064 mark 1003r  FIFO    0,7    152722 pipe
java    23064 mark 1004w  FIFO    0,7    154801 pipe
java    23064 mark 1005r  FIFO    0,7    152861 pipe
java    23064 mark 1006w  FIFO    0,7    152997 pipe
java    23064 mark 1007w  FIFO    0,7    153564 pipe
java    23064 mark 1008r  FIFO    0,7    152999 pipe
java    23064 mark 1009r  FIFO    0,7    153136 pipe
java    23064 mark 1010r  FIFO    0,7    153278 pipe
java    23064 mark 1011w  FIFO    0,7    153406 pipe
java    23064 mark 1012w  FIFO    0,7    153713 pipe
...

With a little persistence, I crawled through my code looking for any obvious problem spots — places where I forgot to close a stream — and discovered that my calls to exec() were problematic. Calling exec() returns a Process object for the child where all standard I/O ops are redirected to the parent through three streams: STDOUT, STDIN, STDERR. It turns out you have to explicitly close these streams when you’re done with the child, otherwise they are left open! And, as you can see in the lsof output above, I was not closing these streams, causing a nasty leak that eventually brought down my application.

Going back to differences in the garbage collectors, it seems that the current default garbage collector cleaned up after my mess (closed the streams for me), but G1 did not. Hence why I never saw the “Too many open files” exception until I enabled G1.

That said, the undocumented proper way of handling a Process object and its corresponding I/O streams is to wrap the exec() call in a try-finally block, closing the STDOUT, STDIN, and STDERR streams when you’re done with the Process object. The abstract class java.lang.Process exposes these three streams to you via getOutputStream(), getInputStream() and getErrorStream() which you must explicitly close.

Here’s the pseudo code:

import static org.apache.commons.io.IOUtils.closeQuietly;

Process p = null;
try {
  p = Runtime.getRuntime().exec(...);
  // Do something with p.
} finally {
  if(p != null) {
    closeQuietly(p.getOutputStream());
    closeQuietly(p.getInputStream());
    closeQuietly(p.getErrorStream());
  }
}

Note that closeQuietly() is part of IOUtils in the Apache Commons IO library — it’s a helper method to close a stream ignoring nulls and exceptions. With this change in place, I redeployed my app and sure enough the problem was resolved.
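
Putting that fix together with the df example from earlier in this post, a rough sketch (the command and the parsing here are placeholders) might look like:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import static org.apache.commons.io.IOUtils.closeQuietly;

public final class DiskSpaceCheck {

  public static void main(String[] args) throws Exception {
    Process p = null;
    BufferedReader reader = null;
    try {
      p = Runtime.getRuntime().exec("df -k");
      reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
      String line;
      while((line = reader.readLine()) != null) {
        // Parse each line of df output here, looking for
        // partitions that are getting full.
        System.out.println(line);
      }
      p.waitFor();
    } finally {
      closeQuietly(reader);
      if(p != null) {
        // Explicitly close STDOUT, STDIN and STDERR of the child.
        closeQuietly(p.getOutputStream());
        closeQuietly(p.getInputStream());
        closeQuietly(p.getErrorStream());
      }
    }
  }

}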

Lesson learned: regardless of what garbage collector you’re using, it’s always a good idea to explicitly close the STDOUT, STDIN, and STDERR streams associated with a Process object when you are done with it.

Enjoy.

Better Mobile Device Detection with a Spring 3 Interceptor

7811084b8cd81753dc3d0050fd17413375d42de0

Mon Oct 18 22:05:00 2010 -0700

The other day on some Java/JSP tutorial web-site I saw the worst example ever for detecting and properly rendering a mobile capable version of a web-site. Yes, I’m pointing at you Roseindia. In every JSP of this example, their mobile User-Agent detection involved one big if/else block:

<%

String userAgent = request.getHeader("User-Agent");
if(userAgent.contains("iPhone")) {
  %> Mobile site! <%
} else {
  %> Regular site! <%
}

%>

This wins the worst coding example of the year award. Here’s why this is terrible and you should never use this example:

  1. Not all requests contain a User-Agent header. In fact, the User-Agent header is purely optional. In the code above, if the request does not contain a User-Agent you’ll see a nice NullPointerException thrown at userAgent.contains() given that the userAgent is null.
  2. Not every mobile device is an iPhone. What about Blackberry, Android or Palm clients? Blindly assuming that every mobile user is on an iPhone, or other similar device, is horrendously ignorant.
  3. Many great frameworks exist, like Spring 3 MVC, that allow you to separate your web-application business and display logic. With this in mind, combining both into a single JSP is a bad idea for a number of reasons. In an ideal world, your mobile device detection would occur in an interceptor that triggers your MVC framework to render one view for mobile devices, and another view for all others.

This is a fairly common requirement: users visiting your site in a standard web-browser see one view, and users on a mobile device (like a Palm) see a “mobile version” of the same view. So, here’s a way to implement better mobile device detection in your web-application using a Spring 3 HandlerInterceptorAdapter.

Note that I assume you are familiar with Spring 3 MVC, and have a working Spring 3 application already up and running.

Configure Spring

First, you’ll need to make the necessary adjustments to your Spring MVC configuration which usually involves tweaking your mvc.xml configuration file. Regardless of where your MVC XML configuration is, you’ll be defining an MVC interceptor for your application like so:

<mvc:interceptors>

  <mvc:interceptor>
    <mvc:mapping path="/somepath**" />
    <mvc:mapping path="/anotherpath**" />
    <bean class="com.kolich.spring.interceptors.MobileInterceptor"
        init-method="init">
        <property name="mobileUserAgents">
          <list value-type="java.lang.String">
            <value>.*(webos|palm|treo).*</value>
            <value>.*(android).*</value>
            <value>.*(kindle|pocket|o2|vodaphone|wap|midp|psp).*</value>
            <value>.*(iphone|ipod).*</value>
            <value>.*(blackberry|opera mini).*</value>
          </list>
        </property>
    </bean>
  </mvc:interceptor>

</mvc:interceptors>

This bean, which I will discuss next, extends Spring’s abstract HandlerInterceptorAdapter. We will build this interceptor so that it’s called right before a Spring view is rendered, giving the interceptor a chance to modify the final view name as necessary. Also, note that this bean defines a List of regular expressions that the interceptor will use to determine if the client is a mobile device. You can add or remove regular expressions to this List depending on which mobile devices (a.k.a., User-Agent’s) you plan to support.

If you are not familiar with Spring interceptors, you might like to read up on them here.

The MobileInterceptor

Without further ado, here’s my MobileInterceptor:

/**
 * Copyright (c) 2010 Mark S. Kolich
 * http://mark.koli.ch
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */

package com.kolich.spring.interceptors;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.log4j.Logger;
import org.springframework.web.servlet.ModelAndView;
import org.springframework.web.servlet.handler.HandlerInterceptorAdapter;

public class MobileInterceptor extends HandlerInterceptorAdapter {

  /**
   * The name of the mobile view that the viewer is re-directed to
   * in the event that a mobile device is detected.
   */
  private static final String MOBILE_VIEWER_VIEW_NAME = "mobile";

  /**
   * The User-Agent Http header.
   */
  private static final String USER_AGENT_HEADER = "User-Agent";

  private List<String> mobileAgents_;
  private List<Pattern> uaPatterns_;

  public void init() {
    uaPatterns_ = new ArrayList<Pattern>();
    // Pre-compile the user-agent patterns as specified in mvc.xml
    for(final String ua : mobileAgents_) {
      try {
        uaPatterns_.add(Pattern.compile(ua, Pattern.CASE_INSENSITIVE));
      } catch (PatternSyntaxException e) {
        // Ignore the pattern, if it failed to compile
        // for whatever reason.
      }
    }
  }

  @Override
  public void postHandle(HttpServletRequest request,
    HttpServletResponse response, Object handler,
    ModelAndView mav) throws Exception {
    final String userAgent = request.getHeader(USER_AGENT_HEADER);
    if(userAgent != null) {
      // If the User-Agent matches a mobile device, then we set
      // the view name to the mobile view JSP so that a mobile
      // JSP is rendered instead of a normal view.
      if(isMobile(userAgent)) {
        final String view = mav.getViewName();
        // If the incoming view was "homepage" then this interceptor
        // changes the view name to "homepage-mobile" which, depending
        // on your Spring configuration would probably resolve to
        // "homepage-mobile.jsp" to render a mobile version of your
        // site.
        mav.setViewName(view + "-" + MOBILE_VIEWER_VIEW_NAME);
      }
    }
  }

  /**
   * Returns true if the given User-Agent string matches a suspected
   * mobile device.
   * @param agent
   * @return
   */
  private final boolean isMobile(final String agent) {
    boolean mobile = false;
    for(final Pattern p : uaPatterns_) {
      final Matcher m = p.matcher(agent);
      if(m.find()) {
        mobile = true;
        break;
      }
    }
    return mobile;
  }

  public void setMobileUserAgents(List<String> agents) {
    mobileAgents_ = agents;
  }

}

As you probably noticed the real meat of this interceptor bean is inside of the postHandle() method, which examines the User-Agent HTTP request header (if any), checks if it’s a mobile device, and if so slightly changes the resulting view name so that a mobile version of the view is rendered instead of the normal version. According to the Spring documentation, the postHandle() method is “called after HandlerAdapter actually invoked the handler, but before the DispatcherServlet renders the view.” In our case, this is perfect.

Inside of postHandle() my MobileInterceptor retrieves the resolved view name, then if the User-Agent matches that of a known mobile device, it changes the view name by appending “-mobile” to the end of it. For example, say you have a view named “about” that is rendered by “about.jsp”. This interceptor would change the resulting view name to “about-mobile” which would be rendered by “about-mobile.jsp” (assuming you are using a standard InternalResourceViewResolver to resolve view names to JSP’s). In other words, this means you can put all of your mobile display logic into about-mobile.jsp, while about.jsp is left intact for non-mobile clients; you keep your mobile and non-mobile display logic separate in two individual files. Of course, I don’t have to tell you that keeping these separate will make your life as a developer a LOT easier in the long run.

Putting it all Together

Putting everything together, the <mvc:interceptor> XML configuration tells Spring to call my interceptor bean whenever it encounters a specific path. In this case, I told Spring to watch for the paths /somepath and /anotherpath based on the <mvc:mapping>’s you see above in the XML. When Spring handles a request for /somepath or /anotherpath it will call the interceptor at the appropriate point in the chain based on the methods overridden by my bean. In this case, I’ve overridden the postHandle() method such that Spring will call my interceptor bean to do what it needs to do once the view has been resolved and it’s ready to render up some content. Of course, you could also override preHandle() if you needed the interceptor to be called before a view is selected, and so on. Again, take a peek at HandlerInterceptorAdapter for all of the gory details.

Enjoy!

Recursively Deleting Large Amazon S3 Buckets

d7b268e6f71843a5735e21bb9765548b58f9d430

Fri Sep 17 09:54:00 2010 -0700

My first experience using Amazon Web Services for a production quality project was quite fun, and deeply interesting. I’ve played with AWS a bit on my own time, but I recently had a chance to really sink my teeth into it and implement production level code that uses AWS as a real platform for an upcoming web, and mobile application.

Perhaps the most interesting, and frustrating, part of this project involved storing hundreds of thousands of objects in an AWS S3 bucket. If you’re not familiar with S3, it’s the AWS equivalent to an online storage web-service. The concept is simple: you create an S3 “bucket” then shove “objects” into the bucket, creating folders where necessary. Of course, you can also update and delete objects. If it helps, think of S3 as a pseudo online file-system that’s theoretically capable of storing an unlimited amount of data. Yes, I’m talking Exabytes of data … theoretically … if you’re willing to pay Amazon for that much storage.

In any event, I created a new S3 bucket and eventually placed hundreds of thousands of objects into it. S3 handled this with ease. The problem, however, was when it came time to delete this bucket and all objects inside of it. Turns out, there is no native S3 API call that recursively deletes an S3 bucket, or renames it for that matter. I guess Amazon leaves it up to the developer to implement such functionality?

That said, if you need to recursively delete a very large S3 bucket, you really have 2 options: use a tool like s3funnel or write your own tool that efficiently deletes multiple objects concurrently. Note that I say concurrently, otherwise you’ll waste a lot of time sitting around waiting for a single-threaded delete to remove objects one at a time, which is horribly inefficient. Well this sounds like a perfect problem for a thread pool and wouldn’t you guess it, even a CountDownLatch!

The idea here is you’ll want to spawn multiple threads from a controlled thread pool where each thread is responsible for deleting a single object. This way, you can delete 20, 30, 100 objects at a time. Yay for threads!

Here’s the pseudo code. Note that I say pseudo code because it’s not a complete implementation. This example assumes you have an AWS S3 implementation (a library) that’s able to list objects in a bucket, delete buckets, and delete objects.

package com.kolich.aws.s3.util;

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.amazonaws.services.s3.model.S3ObjectSummary;

public class RecursiveS3BucketDelete {

  private static final String AWS_ACCESS_KEY_PROPERTY = "aws.key";
  private static final String AWS_SECRET_PROPERTY = "aws.secret";

  /**
   * The -Daws.key and -Daws.secret system properties should
   * be set like this:
   * -Daws.key=AK7895IH1234X2GW12IQ
   * -Daws.secret=1234567890123456789012345678901234456789
   */

  // Set up a new thread pool to delete 20 objects at a time.
  private static final ExecutorService pool__ =
        Executors.newFixedThreadPool(20);

  public static void main(String[] args) {

    final String accessKey = System.getProperty(AWS_ACCESS_KEY_PROPERTY);
    final String secret = System.getProperty(AWS_SECRET_PROPERTY);
    if(accessKey == null || secret == null) {
      throw new IllegalArgumentException("You're missing the " +
          "-Daws.key and -Daws.secret required VM properties.");
    }

    final String bucketName;
    if(args.length < 1) {
      throw new IllegalArgumentException("Missing required " +
          "program argument: bucket name.");
    }
    bucketName = args[0];

    // ... setup your S3 client here.

    List<S3ObjectSummary> objects = null;
    do {
      objects = s3.listObjects(bucketName).getObjectSummaries();
      // Create a new CountDownLatch with a size of how many objects
      // we fetched.  Each worker thread will decrement the latch on
      // completion; the parent waits until all workers are finished
      // before starting a new batch of delete worker threads.
      final CountDownLatch latch = new CountDownLatch(objects.size());
      for(final S3ObjectSummary object : objects) {
        pool__.execute(new Runnable() {
          @Override
          public void run() {
            try {
              s3.deleteObject(bucketName,
                URLEncoder.encode(object.getKey(), "UTF-8"));
            } catch (Exception e) {
              System.err.println(">>>> FAILED to delete object: (" +
                bucketName + ", " + object.getKey()+ ")");
            } finally {
              latch.countDown();
            }
          }
        });
      }
      // Wait here until the current set of threads
      // are done processing.  This prevents us from shoving too
      // many threads into the thread pool; it's a little more
      // controlled this way.
      try {
        System.out.println("Waiting for threads to finish ...");
        // This blocks the parent until all spawned children
        // have finished.
        latch.await();
      } catch (InterruptedException e) { }
    } while(objects != null && !objects.isEmpty());

    pool__.shutdown();

    // Finally, delete the bucket itself.
    try {
      s3.deleteBucket(bucketName);
    } catch (Exception e) {
      System.err.println("Failed to ultimately delete bucket: " +
          bucketName);
    }

  }

}

Additional notes, and warnings:

  • If you’re not familiar with using a CountDownLatch, you can find my [detailed blog post on it here](understanding-javas-countdownlatch.html).
  • If you’re going to delete multiple objects at a time, you should confirm the S3 library you’re using is thread safe. Many S3 libraries I’ve seen rely on the popular Apache Commons HttpClient to handle the underlying HTTP communication work with S3. However, you should note that HttpClient isn’t thread safe by default, unless you’ve explicitly set it up to use a ThreadSafeClientConnManager.

Spring 3 and Spring Security: Setting your Own Custom /j_spring_security_check Filter Processes URL

16f15e688f3ac42dace9b89aa4e3e1eba668508f

Sat Jul 24 20:20:00 2010 -0700

While working on a new personal project, I decided to pick up and dig into Spring 3 MVC and Spring Security. I’ve touched both of these technologies here and there in a number of other projects, but this new opportunity has really opened the door for a deep dive into Spring.

I setup a few Spring 3 controllers, and integrated Spring Security into my web-app. All went great and so I added a simple form-based login to my Spring Security XML configuration.

Problem: Overriding UsernamePasswordAuthenticationFilter

When setting up a form-based login via a default Spring Security <http:security> configuration, Spring auto generates and configures a UsernamePasswordAuthenticationFilter bean. This filter, by default, responds to the URL /j_spring_security_check when processing a login POST from your web-form. First, I want to override Spring Security’s default login process URL to /login instead of /j_spring_security_check. Second, I’ve configured a Spring 3 controller to display my login web-form when a user visits /login.

That said, here’s the underlying problem with Spring Security’s default UsernamePasswordAuthenticationFilter: I want it to accept and process POST’s to /login, but a GET or any HTTP method to /login should be forwarded to the next filter in the chain. Not surprisingly, you cannot do this with Spring Security’s default UsernamePasswordAuthenticationFilter because it does not @Override the doFilter() method of AbstractAuthenticationProcessingFilter. In short, there’s no way to get and check the incoming HTTP request method and re-route it using the default UsernamePasswordAuthenticationFilter.

Solution: Write your own Spring Security Authentication Filter

If you want a Spring controller to process GET requests to /login, but Spring Security to intercept and process a POST to /login, then you’ll need to write your own Spring Security authentication filter. Here’s the idea:

public class MyFilter extends AbstractAuthenticationProcessingFilter {

  private static final String DEFAULT_FILTER_PROCESSES_URL = "/login";
  private static final String POST = "POST";

  public MyFilter () {
    super(DEFAULT_FILTER_PROCESSES_URL);
  }

  @Override
  public Authentication attemptAuthentication(HttpServletRequest request,
    HttpServletResponse response) throws AuthenticationException,
    IOException, ServletException {
    // You'll need to fill in the gaps here.  See the source of
    // UsernamePasswordAuthenticationFilter for a working implementation
    // you can leverage.
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res,
    FilterChain chain) throws IOException, ServletException {
    final HttpServletRequest request = (HttpServletRequest) req;
    final HttpServletResponse response = (HttpServletResponse) res;
    if(request.getMethod().equals(POST)) {
      // If the incoming request is a POST, then we send it up
      // to the AbstractAuthenticationProcessingFilter.
      super.doFilter(request, response, chain);
    } else {
      // If it's a GET, we ignore this request and send it
      // to the next filter in the chain.  In this case, that
      // pretty much means the request will hit the /login
      // controller which will process the request to show the
      // login page.
      chain.doFilter(request, response);
    }
  }

}

Note the good stuff inside of doFilter(). If the incoming request method is a POST, then we send it up to our AbstractAuthenticationProcessingFilter to actually process the login. If it’s a GET, or any other HTTP request method for that matter, we simply send it to the next filter in the chain.
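For completeness, here’s a hedged sketch of what attemptAuthentication() might look like, loosely modeled on UsernamePasswordAuthenticationFilter; the j_username and j_password parameter names are Spring Security 3’s defaults, so adjust them if your login form posts something else:

  @Override
  public Authentication attemptAuthentication(HttpServletRequest request,
    HttpServletResponse response) throws AuthenticationException,
    IOException, ServletException {
    // Pull the credentials off of the POST'ed login form.
    final String username = request.getParameter("j_username");
    final String password = request.getParameter("j_password");
    final UsernamePasswordAuthenticationToken authRequest =
      new UsernamePasswordAuthenticationToken(username, password);
    // Hand the token off to the AuthenticationManager wired into this
    // filter; it does the actual credential check.
    return this.getAuthenticationManager().authenticate(authRequest);
  }

Also remember that, like any AbstractAuthenticationProcessingFilter, your MyFilter bean needs an AuthenticationManager wired into it; the XML below only covers the filter position and the entry point.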

Finally, remember that you’ll need to define your own FORM_LOGIN_FILTER inside of your <security:http> Spring Security XML configuration to override the default /j_spring_security_check URL:

<security:http auto-config="false" use-expressions="true"
  entry-point-ref="LoginUrlAuthenticationEntryPoint">
  <security:custom-filter position="FORM_LOGIN_FILTER" ref="MyFilter" />
</security:http>

<bean id="LoginUrlAuthenticationEntryPoint"
  class="org.springframework.security.web.authentication.LoginUrlAuthenticationEntryPoint">
  <property name="loginFormUrl" value="/login" />
</bean>

Enjoy!

Formatting a Java Date into a Specific TimeZone and Conversion Between TimeZone's

918c33457114f88ec08bbc675820f307bb536d4e

Sat May 15 12:51:00 2010 -0700

Date objects in Java, and probably most other robust languages, simply represent a snapshot of a point in time. In other words, java.util.Date knows nothing about the time zone you’re referring to when instantiating or manipulating a Date object. Fact is, java.util.Date does not have to care about your time zone, because internally a Date is really nothing more than a count of the number of milliseconds since the standard base time known as “the epoch”, namely January 1, 1970, 00:00:00 GMT.

If it helps, think about it this way: X milliseconds since the epoch is X milliseconds since the epoch in the US-Pacific time zone, X milliseconds since the epoch in GMT-0 (London), X milliseconds since the epoch in India, etc. In short, when it’s 1273947282085 milliseconds since the epoch, it’s 1273947282085 milliseconds since the epoch everywhere in the world at the same time regardless of what time zone you’re sitting in. And since Java’s util.Date is simply a snapshot of the number of milliseconds at a specific point in time, you can see why Date doesn’t care about your time zone. It’s irrelevant.

But, how do I convert a java.util.Date into a different time zone? You can’t, and that question makes no sense. That’s like asking me to “take a picture of the sound.” Here’s some crap code that you should not use, but I’ve put it here for illustrative purposes:

// Do NOT use this, it does nothing and makes no sense.
public static final Date convertIntoTimeZone(final Date date, final TimeZone tz) {
  final Calendar cal = Calendar.getInstance();
  cal.setTime(date);
  cal.setTimeZone(tz);
  return cal.getTime();
}

You can’t convert a Date into a different time zone, but you can use Java’s handy DateFormat class to format a Date into the time zone of your choice. To put it differently, let Date do its thing — then, when you’re ready to display or print out a String representation of Date, that’s when you tell DateFormat what time zone you want it in. So, here’s some code that makes sense, and actually works:

final Date currentTime = new Date();

final SimpleDateFormat sdf = new SimpleDateFormat("EEE, MMM d, yyyy hh:mm:ss a z");

// Give it to me in US-Pacific time.
sdf.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"));
System.out.println("US-Pacific time: " + sdf.format(currentTime));

// Give it to me in GMT-0 time.
sdf.setTimeZone(TimeZone.getTimeZone("GMT"));
System.out.println("GMT time: " + sdf.format(currentTime));

// Or maybe Zagreb local time.
sdf.setTimeZone(TimeZone.getTimeZone("Europe/Zagreb"));
System.out.println("Zagreb time: " + sdf.format(currentTime));

// Even 10 hours and 10 minutes ahead of GMT
sdf.setTimeZone(TimeZone.getTimeZone("GMT+1010"));
System.out.println("10/10 ahead time: " + sdf.format(currentTime));

Cheers.

HTTP Digest Access Authentication using MD5 and HttpClient 4

83e158023b85f1d9bec507a18516b1a6552e8b3b

Tue May 04 14:30:00 2010 -0700

Dealing with HTTP’s Digest authentication mechanism isn’t too bad once you have the basic building blocks in place. Luckily, HttpClient 4 can automatically solve many types of authentication challenges for you, if used correctly. Using HttpClient 4, I built an app that authenticates against a SOAP based web-service requiring WWW-Authenticate Digest authentication. In a nutshell, the fundamental principle behind HTTP Digest authentication is simple:

  • The client asks for a page that requires authentication.
  • The server responds with an HTTP 401 response code, providing the authentication realm and a randomly-generated, single-use value called a “nonce”. The authentication “challenge” itself is encapsulated inside of the WWW-Authenticate HTTP response header.
  • The client “solves” the authentication challenge and a solution is sent back to the web-server via the HTTP Authorization header on a subsequent request. The solution usually contains some type of MD5 hashed mess of your username, password, and “nonce”.
  • Assuming the solution is acceptable the server responds with a successful type response, usually an HTTP 200 OK.

Here’s a sample with a bit of pseudo code mixed in (so, you get the idea):

// A org.apache.http.impl.auth.DigestScheme instance is
// what will process the challenge from the web-server
final DigestScheme md5Auth = new DigestScheme();

// This should return an HTTP 401 Unauthorized with
// a challenge to solve.
final HttpResponse authResponse = doPost(url, postBody, contentType);

// Validate that we got an HTTP 401 back
if(authResponse.getStatusLine().getStatusCode() == HttpStatus.SC_UNAUTHORIZED) {
  if(authResponse.containsHeader("WWW-Authenticate")) {
    // Get the challenge.
    final Header challenge = authResponse.getHeaders("WWW-Authenticate")[0];
    // Solve it.
    md5Auth.processChallenge(challenge);
    // Generate a solution Authentication header using your
    // username and password.
    final Header solution = md5Auth.authenticate(
      new UsernamePasswordCredentials(username, password),
      new BasicHttpRequest(HttpPost.METHOD_NAME,
          new URL(url).getPath()));
    // Do another POST, but this time include the solution
    // Authentication header as generated by HttpClient.
    final HttpResponse goodResponse =
      doPost(url, postBody, contentType, solution);
    // ... do something useful with goodResponse, which assuming
    // your credentials were valid, should contain the data you
    // requested.
  } else {
    throw new Error("Web-service responded with Http 401, " +
      "but didn't send us a usable WWW-Authenticate header.");
  }
} else {
  throw new Error("Didn't get an Http 401 " +
    "like we were expecting.");
}
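If you’re curious what the doPost() helper in the pseudo code above might look like, here’s a hedged sketch using HttpClient 4 (DefaultHttpClient, HttpPost, and StringEntity); the method name and parameters are just assumptions to match the calls above, and a real app would reuse a single client rather than creating one per request:

private static HttpResponse doPost(final String url, final String body,
  final String contentType, final Header... headers) throws Exception {
  final HttpClient client = new DefaultHttpClient();
  final HttpPost post = new HttpPost(url);
  post.setEntity(new StringEntity(body, "UTF-8"));
  post.setHeader("Content-Type", contentType);
  for (final Header h : headers) {
    // On the second request, this is where the solved Authorization
    // header generated by DigestScheme gets attached.
    post.addHeader(h);
  }
  return client.execute(post);
}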

Enjoy.

Understanding Java's CountDownLatch and CyclicBarrier

6571e6e8ee005e982d33fa722188916c6befc2d0

Fri Apr 09 12:20:00 2010 -0700

While working on some nifty multi-threaded Java recently, a colleague pointed me to a few really useful Java classes: CountDownLatch and CyclicBarrier. My code was quite typical, a parent worker thread spawns a bunch of children to do real work, and needs to wait for the children to finish before continuing. The catch though, is that the child worker threads may or may not finish successfully, and in all likelihood will finish at different times. Even so, the parent thread must wait until all of its children have finished because the parent can only make forward progress once the children are complete. I whipped up a little demo that spawns five worker threads which update a JProgressBar at a random interval. The demo finishes once each progress bar hits 100%.

CountDownLatch

Meet CountDownLatch.

As described in the Java 6 API docs, a CountDownLatch is “a synchronization aid that allows one or more threads to wait until a set of operations being performed in other threads completes.” In other words, the developer says new CountDownLatch(N) which waits for N threads to finish before the latch is “released” allowing the calling thread to make forward progress. Couldn’t be more perfect here. To make my life a little easier, I wrote a few wrapper classes that encapsulate a CountDownLatch which allow me to easily synchronize on a List<BaseWorker>, a list of worker threads:

  • ThreadRunner.java — A class that accepts a List<BaseWorker> (a List of BaseWorker’s), creates a new CountDownLatch(list.size()), starts each BaseWorker, then allows the developer to await() on the runner for all BaseWorker’s to finish.
  • BaseWorker.java — An abstract class that represents each worker thread, and defines a set of methods each BaseWorker must implement to be used with a ThreadRunner.
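Under the hood, these wrappers boil down to the plain CountDownLatch pattern. Here’s a minimal sketch of the raw API (just the shape of it, not the actual ThreadRunner source):

final CountDownLatch latch = new CountDownLatch(3);
for (int i = 0; i < 3; i++) {
  new Thread(new Runnable() {
    @Override
    public void run() {
      try {
        // ... do some real work ...
      } finally {
        // Always count down, whether the work succeeded or failed.
        latch.countDown();
      }
    }
  }).start();
}
try {
  // The parent blocks here until all three workers have counted down.
  latch.await();
} catch (InterruptedException e) { }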

So, using these wrappers, let’s create a new worker:

public final class MyWorker extends BaseWorker {

  private final int worker_;

  public MyWorker(int worker) {
    super();
    worker_ = worker;
    // ...
  }

  @Override
  public void myRun() throws Exception {
    // ...
  }

  @Override
  public String getWorkerName() {
    return String.format("%s #%s", getClass().getSimpleName(), worker_);
  }

}

Now let’s setup a new ThreadRunner that will take a bunch of BaseWorker’s, start them, then wait for all to finish:

public final class MyRunner {

  private static final List<BaseWorker> workers__;
  static {
    workers__ = new ArrayList<BaseWorker>();
    workers__.add(new MyWorker(1));
    workers__.add(new MyWorker(2));
    workers__.add(new MyWorker(3));
  }

  public static void main(String[] args) {
    final ThreadRunner runner = new ThreadRunner(workers__);
    // Start all of the threads in this runner.
    runner.start();
    // Wait for all of the threads to finish.
    runner.await();
    // Did all of our workers complete without error?
    if(runner.wasSuccessful()) {
      System.out.println("All workers finished cleanly.");
    } else {
      System.out.println("Not all workers finished cleanly.");
    }
  }

}

In this example, I built a List of BaseWorker’s, gave the list to the ThreadRunner and asked the runner to start them. Upon calling runner.await(), the ThreadRunner blocks waiting for all of the workers to finish. Note that my concept of “finish” here means either successfully, or unsuccessfully (an Exception or Error case). Subsequently, I call runner.wasSuccessful() to check if all of the workers finished cleanly, basically asking the runner did all of your workers finish without throwing any Exception’s or Error’s?

If you’re interested, you can download my complete ThreadRunner demo/example that further demonstrates the usage of these wrapper classes using Swing and several JProgressBar’s.

CyclicBarrier

A CyclicBarrier is similar to a CountDownLatch, except that a CyclicBarrier is “a synchronization aid that allows a set of threads to all wait for each other to reach a common barrier point.” Like a CountDownLatch, a CyclicBarrier can be used to synchronize a number of threads. But instead of exiting upon completion, threads using a CyclicBarrier await() for all other threads in the pool to finish. Here’s a usage example of a CyclicBarrier built around my BaseWorker class:

public final class MyCyclicWorker extends BaseWorker {

  private final CyclicBarrier barrier_;

  public MyCyclicWorker(CyclicBarrier barrier) {
    super();
    barrier_ = barrier;
    // ...
  }

  @Override
  public void myRun() throws Exception {
    // ...
    // Wait here for all other threads in the CyclicBarrier to finish.
    barrier_.await();
  }

  @Override
  public String getWorkerName() {
    return getClass().getSimpleName();
  }

}

Here’s the class that starts up a bunch of these MyCyclicWorkers, then runs a single “cleanup” thread once all of the workers are done:

public final class CyclicExample {

  private static final int CYCLIC_THREADS = 5;

  public static void main(String[] args) {
    final CyclicBarrier barrier =
            new CyclicBarrier(CYCLIC_THREADS,
              new Runnable() {
                @Override
                public void run() {
                  // Cleanup thread, or completion thread.
                  // Called when all of the worker threads
                  // are finished.
                  // ...
                }
              });
    for(int i=0; i < CYCLIC_THREADS; ++i) {
      new MyCyclicWorker(barrier).start();
    }
    // ...
  }

}

Enjoy.

Java: JumpToLine, Jump or Seek to a Line in a File

a49656441f1b51315870e6039bdc18a1e1e41874

Mon Jan 18 12:13:00 2010 -0800

Seeking to a line number in a text file isn’t too hard to implement in Java if you use a few common and trusted APIs like Apache’s Commons I/O library. Recently I needed some Java that could automatically seek to a given line in a file and remember the line number of the last line read. The next time the reader is opened, it should automatically seek itself to the last line read, and let me read any subsequent lines added by another user or process. This is perfect for log file monitoring: the first invocation of the reader would read lines 1 through X, the next invocation would read lines X+1 through Y, and so on. Using Apache’s Commons I/O API, this isn’t difficult at all. You may or may not know that the Commons I/O API contains a very convenient LineIterator class, which lets the developer iterate over lines in a file using a Reader.
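If you haven’t bumped into LineIterator before, the raw usage looks roughly like this (a throwaway sketch; the file name is just an example):

import java.io.File;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

public class LineIteratorSketch {

  public static void main(String[] args) throws Exception {
    // Stream the file line-by-line instead of slurping it all into memory.
    final LineIterator it = FileUtils.lineIterator(new File("mylog.log"), "UTF-8");
    try {
      while (it.hasNext()) {
        System.out.println(it.nextLine());
      }
    } finally {
      LineIterator.closeQuietly(it);
    }
  }

}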

With that in mind, meet JumpToLine, a somewhat hackish class I wrote that wraps Apache’s Commons I/O LineIterator in a way that lets you seek ahead to a specific line in a file, and is smart enough to remember the last line read (so that you don’t read the same line twice).

Example #1

Here’s how you might use JumpToLine to seek to line 10 in mylog.log, then read that line and every line after it:

final JumpToLine jtl = new JumpToLine(new File("mylog.log"));

try {
  // Open the underlying reader and LineIterator.
  jtl.open();
  // Seek to line 10; will throw a NoSuchElementException if
  // out of range.
  jtl.seek(10);
  // While there are any lines after and including line 10,
  // read them.
  while(jtl.hasNext()) {
    final String line = jtl.readLine();
    System.out.println(line);
  }
} catch (Exception e) {
  e.printStackTrace(System.err);
} finally {
  // Close the underlying reader and LineIterator.
  jtl.close();
}

Example #2

Here’s how you might use JumpToLine to seek to the last line read in mylog.log, and then read any subsequent lines added to the file since it was last read:

final JumpToLine jtl = new JumpToLine(new File("mylog.log"));

try {
  // Open the underlying reader and LineIterator.
  jtl.open();
  // Seek to the last line read since we last tried to
  // read any lines from this file.
  jtl.seek();
  // While there are any more lines to read from the last
  // line read position, then read them.
  while(jtl.hasNext()) {
    final String line = jtl.readLine();
    System.out.println(line);
  }
  // For grins, what is the last line number we read?
  System.out.println("Last line number read: " + jtl.getLastLineRead());
} catch (Exception e) {
  e.printStackTrace(System.err);
} finally {
  // Close the underlying reader and LineIterator.
  jtl.close();
}

JumpToLine is now formally part of my kolich-common Java library available on Github.

Controlling When and Where Java Writes Out its Permgen and Heap to Disk

b4374b1330fa512466f442e9532536b3e0e72365

Fri Oct 30 11:10:00 2009 -0700

A blog reader recently contacted me with an interesting question: can you explicitly tell Java when and where to flush its permgen and heap to disk? The answer, based on what I understand about Java and operating system fundamentals, is no.

I can say with much certainty that you can’t control where Java saves its heap and permgen (either on disk or in memory). Java itself doesn’t know about paging stuff out to disk. It simply asks the operating system for the memory it needs and if the OS can’t give it, then either the OS has to fail the request or make room by paging out unused chunks of memory to swap. In other words, Java relies on the host OS to handle this type of stuff.

But what if you’re dealing with millions of Java objects on a standard computer and you don’t have room to keep all of those objects in physical memory? In this case, your only real option is to write code that manually swaps objects in/out of the disk. Of course, this requires that you implement your own swapping mechanism, which isn’t too bad. When your Java application needs a set of objects, it loads what it needs into memory from disk, does some stuff with the objects, then writes them back out to disk.

Meet java.io.Serializable: http://java.sun.com/javase/6/docs/api/java/io/Serializable.html

Here’s an example:

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class Dog implements Serializable {

  private static final long serialVersionUID = -4367737315167700936L;

  private String name_;
  private String breed_;

  public Dog (String name, String breed) {
    this.name_ = name;
    this.breed_ = breed;
  }

  @Override
  public String toString() {
    return String.format("%s:%s", this.name_, this.breed_);
  }

  public static void main (String [] args) {

    final List<Dog> dogs = new ArrayList<Dog>();
    dogs.add( new Dog("Fido", "mutt") );
    dogs.add( new Dog("Clifford", "big red dog") );

    ByteArrayOutputStream os = null;
    ObjectOutputStream out = null;
    for( Dog d : dogs ) {
      try {

        // To write the dogs out to a file, you'll of course
        // need to use a FileOutputStream instead of a
        // ByteArrayOutputStream
        os = new ByteArrayOutputStream();
        out = new ObjectOutputStream(os);
        out.writeObject(d);

        // Print the serialized version of Dog
        final String serialized = os.toString();
        System.out.println(d.toString() + " serialized is: " + serialized);

      } catch (Exception e) {
        e.printStackTrace(System.err);
      } finally {
        closeQuietly(os);
        closeQuietly(out);
      }
    }

  }

  private static final void closeQuietly(final OutputStream os) {
    try {
      os.close();
    } catch (Exception e) {  }
  }

}

Each of the objects you wish to save to disk will have to implement java.io.Serializable. This will let you convert a Java object into something that can be written out to disk. From there, you will have to write some type of queue or stack control mechanism that will know when, from where, and how to page these objects in and out of the disk.
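And paging objects back in is the mirror image: wrap a stream in an ObjectInputStream and call readObject(). Here’s a hedged sketch (the fido.ser file name is purely hypothetical; the example above wrote to a ByteArrayOutputStream, so you’d substitute a FileOutputStream on the write side to produce such a file):

import java.io.File;
import java.io.FileInputStream;
import java.io.ObjectInputStream;

public class ReadDogBack {

  public static void main(String[] args) throws Exception {
    // Deserialize a Dog that was previously written out to disk
    // with ObjectOutputStream.writeObject().
    final ObjectInputStream in =
      new ObjectInputStream(new FileInputStream(new File("fido.ser")));
    try {
      final Dog d = (Dog) in.readObject();
      System.out.println("Paged back in: " + d);
    } finally {
      in.close();
    }
  }

}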

Bundle Java (the JRE) and Launch a Java App with 7zip SFX: Convert Java Apps to an Executable

c57375a8ca5fa042b08a9b61aedbe7b8c05b7a1a

Mon Oct 26 22:00:00 2009 -0700

I’ve been playing around with a lot of installer type stuff recently. I discovered that Mozilla Firefox uses the 7zip SFX install launcher to kick off the Firefox installation process. I started playing around with 7zip SFX, and realized that you can do some pretty cool stuff with it. In fact, I discovered that you can actually bundle a Java app and the Java Runtime Environment (JRE) into your own little 7zip SFX launcher. Naturally, this means you can write a Java app and then let your users start it by double clicking a native Win32 .exe. And best of all, because your launcher contains the Java Runtime Environment, the user does not have to have a JRE installed on their system to run your application!

The launcher extracts the JRE and your app to a temporary directory, then launches it using that freshly extracted JRE.

Why Is This Useful

Java is fantastic for its write once, run anywhere methodology. Only problem is, unlike a native Windows app, you need a JVM/JRE to run a Java application. Most vendors who sell software written in Java tell their users or customers that they need to install a JRE first before they can run the app. This makes sense, but it’s a slight (err, huge) inconvenience; Sun’s Java installer is bulky and often cumbersome. Wouldn’t it be nice if you could avoid that forced installation step, and simply ship a supported Java runtime with your Java application? This way, the user simply double clicks an .exe, a launcher extracts a supported JRE, and your application starts. In short, the user doesn’t have to install a JRE at all; the JRE they need is simply extracted to a temporary directory and your application starts using that freshly extracted JRE. Further, when the user exits the application, the temporary JRE directory your app launcher created is automatically cleaned up, and all is well.

Not surprisingly, this is completely doable using 7zip SFX. However, note that if you choose to ship the JRE with your launcher, you can expect your executable to be approximately 16MB larger than it would be without the JRE. IMHO, 16MB is a small price to pay for the added convenience of not having to install another piece of bloated software. Plus you know that the JRE your launcher extracts and starts your application with fully supports your Java app; you don’t have to worry about the Java updater updating the JRE on the user’s system behind your back which might break your app.

Getting Started

Before you start packaging up your app with 7zip, you’ll probably want to download my complete example pack. This ZIP file contains everything you’ll need to get started, including a ready-to-ship JRE (Java 6 Update 16) and an Ant build file. Note that you do not need to install 7zip; the 7zip.exe needed to create the archive is included in the example pack. However, if you want to install 7zip, you can download the installer here or from my mirror on Onyx. This sample pack is also an Eclipse project. If you work out of Eclipse, you can import the .project inside of the example pack into your Eclipse IDE.

Or, if you want to see the 7zSD.sfx launcher in action, download the pre-built demo launcher.

Fundamentals

Here’s how this all works. 7zip (and other ZIP installer type packages) provide SFX launchers. These launchers are essentially native Windows executables that understand how to extract an archive to a temporary directory, and launch an application (usually another installer). This is how the Mozilla Firefox installer works: when you launch the “installer” the extracting files dialog that opens is actually the 7zip SFX launcher extracting the real setup.exe to a temporary directory. Once done, it starts setup.exe to complete the installation process.

In this case, the basic principle is the same, except I’m using the 7zip SFX launcher to extract my application and required JRE components to a temporary directory, and then start it. Producing a native Windows SFX launcher is quite easy; you need to binary concatenate three files together: the SFX launcher, an app.tag configuration file, and a 7zip archive. In Windows, using the copy command, this looks something like:

C:\> copy /b 7zSD.sfx + app.tag + app.7z start.exe

This produces start.exe, a portable native Windows app that contains everything your Java application needs to run in a single executable! When run, start.exe will use 7zSD.sfx to extract the contents of app.7z to a temporary directory, and launch whatever application you’ve defined in app.tag.

My Sample Java App

My example Java app is very straightforward. Yours will, of course, be more complicated. My sample app simply opens a JOptionPane to display the current “working directory” (where the SFX launcher was started from) and the “temporary directory” (the temp directory where the SFX launcher extracted the JRE and application files to).

package com.kolich.sevenzip.example;

import java.io.File;
import java.io.IOException;

import javax.swing.JFrame;
import javax.swing.JOptionPane;
import javax.swing.SwingUtilities;

public class StartHere {

  private File workingDir_;
  private File tempDir_;

  public StartHere(File root, File temp){
    this.workingDir_ = root;
    this.tempDir_ = temp;
  }

  /**
   * The working directory, where the application was
   * started from.
   * @return
   */
  public File getWorkingDir(){
    return this.workingDir_;
  }

  /**
   * The temp directory, where the launcher extracted
   * your app and JRE to on the users' system.
   * @return
   */
  public File getTempDir(){
    return this.tempDir_;
  }

  public static void main(String[] args)
    throws Exception {

    File root;
    try {
      root = new File(args[0]);
    } catch ( Exception e ) {
      root = new File(".");
    }

    File temp;
    try {
      temp = new File(args[1]);
    } catch ( Exception e ) {
      temp = new File(".");
    }

    final StartHere sh = new StartHere(root, temp);
    Runnable worker = new Runnable() {
        public void run() {
          showMessageDialog(sh);
          System.exit(0);
        }
    };
    SwingUtilities.invokeLater(worker);

  }

  private static void showMessageDialog(StartHere sh) {
    try {
      JOptionPane.showMessageDialog(new JFrame(),
        "A java app launched by 7zip SFX!\n\n" +
        "My working directory is:\n" +
        sh.getWorkingDir().getCanonicalPath() +
        "\n\nAnd I've been extracted to temp directory:\n" +
        sh.getTempDir().getCanonicalPath() );
    } catch (IOException e) {
      e.printStackTrace( System.err );
    }

  }

}

Here’s a screen shot:

The Ant build script in my example pack compiles this app and creates app.jar, a runnable JAR file.

The App.tag Configuration File

I’m using the 7zSD.sfx launcher by Oleg Scherbakov at http://7zsfx.solta.ru/en/. There are a ton of configuration options as described on Oleg’s web-site. In the example, my app.tag configuration file is as follows:

;!@Install@!UTF-8!
Title="7ZIP Java Launcher Example"
ExtractDialogText="Extracting ..."
GUIFlags="32"
ExtractTitle="Extracting"
FinishMessage="Application stopped."
RunProgram="launcher\jre\bin\javaw.exe -jar launcher\app.jar \"%%S\" \"%%T\""
;!@InstallEnd@!

There’s nothing too complicated about the configuration file. In this example, I’m simply extracting the 7zip file included with the native SFX launcher and starting launcher\jre\bin\javaw.exe, which is the JRE packaged with the launcher (found under launcher\jre in the example pack). The %%S property in the configuration file is the directory that contains the SFX executable (where the user started it from). The %%T property is the temporary directory where the SFX launcher placed the extracted JRE and application files. Note that when the Java application exits, the SFX launcher will automatically delete/cleanup this temporary directory.

This example simply asks 7zSD.sfx to extract and then start launcher\app.jar using launcher\jre\bin\javaw.exe.

The Ant Build File

My ant build file does a few things. First, it compiles the Java app and packages it into a runnable JAR file. From there, it uses 7zip to compress the JRE and the resulting JAR file into app.7z. Finally, it uses Ant’s concat task to binary concatenate 7zSD.sfx, app.tag and the app.7z file together. The result is start.exe, a native self-contained Windows executable that contains the JRE and Java app itself!

Note that the JRE is 7zip’ed inside of app.7z. This is how the JRE is shipped/included with the launcher.

<project name="7zipexample" default="package.7zipexample">

  <property name="src.dir" location="${basedir}/src/com/kolich"/>
  <property name="build.dir" location="${basedir}/build"/>
  <property name="launcher.dir" location="${basedir}/launcher"/>
  <property name="7zip.exe.dir" location="${basedir}/7zip"/>
  <property name="sfx.dir" location="${basedir}/sfx"/>
  <property name="dist.dir" location="${basedir}/dist"/>

  <target name="clean.7zipexample" depends="clean.build.7zipexample,
      clean.dist.7zipexample" />

  <target name="clean.build.7zipexample">
    <delete includeemptydirs="true">
    <fileset dir="${build.dir}" includes="**/*" />
    </delete>
  </target>

  <target name="clean.dist.7zipexample">
    <delete includeemptydirs="true">
    <fileset dir="${dist.dir}" includes="**/*" />
      <fileset dir="${launcher.dir}" includes="app.jar" />
      <fileset dir="${launcher.dir}" includes="app.7z" />
    </delete>
  </target>

  <target name="package.7zipexample" depends="clean.7zipexample">

    <!-- compile the source -->
    <javac destdir="${build.dir}" srcdir="${src.dir}">
      <include name="**/*.java"/>
    </javac>

    <!-- create a runnable jar -->
    <jar destfile="${launcher.dir}/app.jar" manifest="Manifest.mf">
      <fileset dir="${build.dir}">
        <include name="**/*.class" />
      </fileset>
    </jar>

    <!-- compress all of the files we need to down with 7zip -->
    <exec executable="${7zip.exe.dir}/7z.exe" failonerror="true">
      <arg value="a" />
      <arg value="-t7z" />
      <arg value="-r" />
      <arg value="${launcher.dir}\app.7z" />
      <arg value="${launcher.dir}" />
    </exec>

    <!-- concat the files we need together to produce a binary
        launcher -->
    <concat destfile="${dist.dir}/start.exe" binary="yes">
      <fileset file="${sfx.dir}/7zSD.sfx" />
      <fileset file="${sfx.dir}/app.tag" />
      <fileset file="${launcher.dir}/app.7z" />
    </concat>

  </target>

</project>

You can always manually build the installer package yourself, but why bother if you have an Ant build file ready to do the work for you? When you run the package.7zipexample build target in the example build file, the resulting ready-to-launch executable can be found at dist\start.exe. start.exe is your shippable application. Again, no pre-installed Java Runtime Environment required!

Changing the Icons and Version Information

If you use Oleg’s 7zSD.sfx launcher as is, you’ll notice the icon attached to the resulting .exe is quite poor. In all likelihood, you’ll want to replace the icon with one for your application. Doing so is quite easy with Resource Hacker, a freeware utility to view, modify, rename, add, delete and extract resources in 32bit Windows executables and resource files. Detailed instructions on how to replace the icon can be found here on the 7zSD.sfx web-site. Note that you can also use Resource Hacker to edit the version and copyright details included in the resulting executable as shown below.

In summary, it’s fairly straightforward to bundle and ship the Java Runtime Environment with your Java application using 7zip SFX. Heck, Sun allows and even tells you how to redistribute the JRE with your applications (just read the LICENSE file provided with any JRE installation).

On GitHub

All code shown here is available in my 7zip-sfx-java project on GitHub.

Java's "os.arch" System Property is the Bitness of the JRE, NOT the Operating System

5946627a02755f64961995d902eafa07e58a323c

Mon Oct 19 12:15:00 2009 -0700

If you ever use Java to check if a system is 32 or 64-bit, you should know that Java’s os.arch system property returns the bitness of the JRE, not the OS itself. Sites like this are WRONG — any resource that claims Java’s os.arch property returns the real “architecture of the OS” is lying. Case in point, I recently ran this tiny program on a 64-bit Windows 7 machine, with a 32-bit JRE:

import com.sun.servicetag.SystemEnvironment;

public class OSArchLies {

  public static void main(String[] args) {

    // Will say "x86" even on a 64-bit machine
    // using a 32-bit Java runtime
    SystemEnvironment env =
        SystemEnvironment.getSystemEnvironment();
    final String envArch = env.getOsArchitecture();

    // The os.arch property will also say "x86" on a
    // 64-bit machine using a 32-bit runtime
    final String propArch = System.getProperty("os.arch");

    System.out.println( "getOsArchitecture() says => " + envArch );
    System.out.println( "getProperty() says => " + propArch );

  }

}

The output from this tiny app on a 64-bit box:

#/> java OSArchLies
getOsArchitecture() says => x86
getProperty() says => x86

In this case, one would expect to see something like x86_64 or amd64 instead of just x86. Bottom line, don’t believe what you read about os.arch and other Java system properties. They are usually properties of the JRE/JDK itself, and not necessarily the real properties of the underlying OS or architecture. If you need to check if a system is actually 32 or 64-bit, you should look elsewhere, like the system registry, or write your own native app and call it from Java.
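For what it’s worth, on Windows one common workaround is to peek at the PROCESSOR_ARCHITECTURE and PROCESSOR_ARCHITEW6432 environment variables; a 32-bit process running on 64-bit Windows sees the latter set to the real architecture. A hedged, Windows-only sketch (the variable names are Windows conventions, not part of the Java API):

public class RealWindowsArch {

  public static void main(String[] args) {
    // Reported by the process (e.g. "x86" for a 32-bit JRE).
    final String arch = System.getenv("PROCESSOR_ARCHITECTURE");
    // Set to the real architecture (e.g. "AMD64") when a 32-bit
    // process runs under WOW64 on a 64-bit OS; null otherwise.
    final String wow64 = System.getenv("PROCESSOR_ARCHITEW6432");
    System.out.println("OS architecture is probably: " +
      ((wow64 != null) ? wow64 : arch));
  }

}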

SHA1withRSA Digital Signing in Java: OpenSSL, PKCS#8

39424b1c04c8db8e455eca47443bd9d6992e9216

Fri Mar 20 11:42:42 2009 -0700

I’ve been going key crazy over the last several days working on digital signing in PHP, and now Java. It’s not hard, but what’s really confusing is all of the key generation and manipulation stuff before you even get to the code (the point where you want to actually use the key to sign some data).

  • PHP wants a plain RSA key in DER format.
  • Java seems to prefer a PKCS#8 encoded RSA key in DER format with no password.

This stuff is all over the map.

This post is an attempt to document what worked for me. There are definitely other signing methods out there, but I finally got PKCS#8 with an RSA key in DER format to work in Java. Here’s how:

First, generate a new RSA key using openssl:

openssl genrsa -out key.pem 1024

Now, for Java, you need to convert the RSA key into a PKCS#8 encoded key in DER format:

openssl pkcs8 -topk8 -in key.pem -nocrypt -outform DER -out key.pkcs8

Now that you’ve got a PKCS#8 encoded key, you can easily use the PKCS8EncodedKeySpec class to parse the key for signing. Here’s my somewhat hackish code that signs a message (a String) using SHA1withRSA from Sun’s JSSE (Java’s Secure Socket Extension) framework:

package org.kolich.security;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.Signature;
import java.security.spec.PKCS8EncodedKeySpec;

/**
 * Signs a byte array (a message) using a PKCS#8 encoded
 * RSA private key.
 * @author kolichko Mark S. Kolich
 *
 */
public class PKCS8RSASigner {

    private static final long MAX_KEY_SIZE_BYTES = 8192L;

    private static final String UTF_8 = "UTF-8";

    private static final String SHA1_WITH_RSA = "SHA1withRSA";
    private static final String SUN_JSSE = "SunJSSE";
    private static final String RSA = "RSA";

    private static final char HEX_DIGIT [] = {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
    };

    private File pkcsKeyFile_;
    private byte [] keyFileBytes_;
    private Signature dsa_;
    private KeyFactory keyFactory_;
    private PrivateKey privateKey_;

    public PKCS8RSASigner ( File pkcsKeyFile ) {

        try {
            this.pkcsKeyFile_ = pkcsKeyFile;
            this.dsa_ = Signature.getInstance(SHA1_WITH_RSA, SUN_JSSE);
            this.keyFactory_ = KeyFactory.getInstance(RSA, SUN_JSSE);

            this.init();
        } catch ( Exception e ) {
            // Wrap it, so everywhere you use PKCS8RSASigner you don't
            // have to wrap the constructor in a try/catch.  The caller
            // should catch Error, though.
            throw new Error(e);
        }

    }

    /**
     * Given a message, generate a signature based on this
     * PKCS#8 private key.
     * @param message
     * @return
     * @throws Exception
     */
    public byte [] getSignature ( byte [] message ) throws Exception {

        this.dsa_.update( message );
        return this.dsa_.sign();

    }

    /**
     * Setup this PKCS8RSASigner.  Load the key file into
     * memory, and init the key factory accordingly.
     * @throws IOException
     */
    private void init ( ) throws Exception {

        FileInputStream is = null;

        if ( !this.pkcsKeyFile_.exists() ) {
            throw new FileNotFoundException( "RSA key file not found!" );
        }

        // Get the size, in bytes, of the key file.
        final long length = this.pkcsKeyFile_.length();

        if ( length > MAX_KEY_SIZE_BYTES ) {
            throw new IOException( "Key file is too big!" );
        }

        try {

            is = new FileInputStream( this.pkcsKeyFile_ );

            int offset = 0;
            int read = 0;
            this.keyFileBytes_ = new byte[(int)length];
            while ( offset < this.keyFileBytes_.length
                        && (read=is.read(this.keyFileBytes_, offset,
                        this.keyFileBytes_.length-offset)) >= 0 ) {
                offset += read;
            }

        } catch ( IOException ioe ) {
            throw ioe;
        } finally {
            try {
                if ( is != null ) {
                    is.close();
                }
            } catch ( IOException ioe ) {
                throw new Exception("Error, couldn't close FileInputStream", ioe);
            }
        }

        PKCS8EncodedKeySpec privKeySpec = new PKCS8EncodedKeySpec(
          this.keyFileBytes_ );

        // Get the private key from the key factory.
        this.privateKey_ = keyFactory_.generatePrivate( privKeySpec );

        // Init the signature from the private key.
        this.dsa_.initSign( this.privateKey_ );

    }

    /**
     * Convert a byte array into its hex String equivalent.
     * @param bytes
     * @return
     */
    public static String toHex ( byte [] bytes ) {

        if ( bytes == null ) {
            return null;
        }

        StringBuilder buffer = new StringBuilder(bytes.length*2);
        for ( byte thisByte : bytes ) {
            buffer.append(byteToHex(thisByte));
        }

        return buffer.toString();

    }

    /**
     * Convert a single byte into its hex String
     * equivalent.
     * @param b
     * @return
     */
    private static String byteToHex ( byte b ) {
        char [] array = { HEX_DIGIT[(b >> 4) & 0x0f], HEX_DIGIT[b & 0x0f] };
        return new String(array);
    }

    public static void main ( String [] args ) {

        // A bunch of sample messages to digitally sign
        // using your PKCS#8 encoded private key.
        String [] toSign = {
            "some string",
            "http://kolich.com",
            "bleh bleh bleh"
        };

        // Create a new PKCS8RSASigner using the specified
        // PKCS#8 encoded RSA private key.
        PKCS8RSASigner signer = new PKCS8RSASigner(new File("key.pkcs8"));

        for ( String s : toSign ) {
            try {
                System.out.println(
                        toHex( signer.getSignature(
                                s.getBytes( UTF_8 ) )
                            ).toUpperCase()
                        );
            } catch ( Exception e ) {
                e.printStackTrace( System.err );
            }

        }

    }

}

Good luck.

Java: Resolving org.xml.sax.SAXParseException: Content is not allowed in prolog

52471093e03c52108d604bab49f28a5bfdb672fb

Mon Feb 02 07:00:00 2009 -0800

Parsing an RSS feed can be tricky. Your code has to gracefully handle all sorts of strange corner cases; everything from malformed XML to an unexpected byte sequence in the feed prolog. I recently worked on a problem that dealt with the latter: I was trying to parse an RSS feed in Java, and kept hitting an org.xml.sax.SAXParseException: Content is not allowed in prolog. The prolog is anything before the opening <?xml tag at the start of the feed. I dug into it a little further, and discovered that many UTF-8 encoded files include a three-byte UTF-8 Byte-order mark. When dealing with a UTF-8 encoded RSS feed, this three-byte pattern (0xEF 0xBB 0xBF) in the prolog can cause all sorts of interesting XML parsing problems, including a SAXParseException: Content is not allowed in prolog.

One solution is to use a quick-and-dirty regular expression to cleanup the XML prolog before feeding it into a parser.

First, I wanted to confirm my suspicion about the UTF-8 Byte-order mark. I used wget to download the feed in question http://www.hp.com/hpinfo/stories.xml and opened it up using khexedit. Sure enough, the first three bytes are EF BB BF.

Because these extra three bytes are present in the prolog, you might see an exception that looks something like this when trying to parse the XML:

Caused by: org.jdom.input.JDOMParseException: Error on line 1:
                      Content is not allowed in prolog.
     at org.jdom.input.SAXBuilder.build(SAXBuilder.java:468)
     at org.jdom.input.SAXBuilder.build(SAXBuilder.java:851)
     at com.sun.syndication.io.WireFeedInput.build(WireFeedInput.java:178)
     ... 188 more
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
     ... 190 more

As mentioned, a quick-and-dirty solution to this problem is to build a regular expression to strip off any junk in the prolog before feeding the XML into a parser. Here’s an example that strips off any non-word characters in the prolog:

String xml = "<?xml ...";
Matcher junkMatcher = (Pattern.compile("^([\\W]+)<")).matcher( xml.trim() );
xml = junkMatcher.replaceFirst("<");

As of Java 1.4, you could also try something a little cleaner:

String xml = "<?xml ...";
xml = xml.trim().replaceFirst("^([\\W]+)<","<");

Note that calling String.trim() on the XML isn’t good enough, because trim() only handles leading and trailing white space. Once I got rid of the UTF-8 Byte-order mark, my XML parser handled the feed with no issues.
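Wrapped up into a little helper, the whole cleanup-then-parse dance might look something like this (a hedged sketch using the JDK’s own DOM parser rather than JDOM/ROME, purely for illustration):

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CleanProlog {

  public static Document parse(final String xml) throws Exception {
    // Strip the UTF-8 Byte-order mark (or any other non-word junk)
    // sitting ahead of the first '<' so the parser doesn't choke.
    final String cleaned = xml.trim().replaceFirst("^([\\W]+)<", "<");
    final DocumentBuilder db =
      DocumentBuilderFactory.newInstance().newDocumentBuilder();
    return db.parse(new InputSource(new StringReader(cleaned)));
  }

}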

Understanding Java's "Perm Gen": MaxPermSize, heap space, etc.

8be7078abafab37beb4d6842be7d931ae270f8ab

Thu Jan 29 22:00:00 2009 -0800

During my travels at work, I’ve come across a few interesting memory management issues in Java. My team has deployed several large web-applications in a single instance of Apache Tomcat. The Linux box running these applications only has about 2GB of physical memory available. Once the apps are deployed, about 1.8 GB of system memory is consumed by Java alone. Clearly, we need to improve our memory management a bit.

However, I took a few minutes to do some digging on Java’s Permanent Generation (Perm Gen) and how it relates to the Java heap. Here are some distilled notes from my research that you may find useful when debugging memory management issues in Java:

JVM argument -Xmx defines the maximum heap size. The arg -Xms defines the initial heap size. For example:

-Xmx4g -Xms512m

In Tomcat land, these settings would go in your startup.sh or init script, depending on how you start and run Tomcat. With regard to MaxPermSize, this argument adjusts the size of the “permanent generation.” As I understand it, the perm gen holds metadata about the classes (and other class-level “stuff”) your applications have loaded, while the heap holds the objects themselves. Consequently, the more classes your applications load, the larger the perm gen needs to be.

Here is an example showing how you might use MaxPermSize:

-XX:MaxPermSize=512m
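Putting the pieces together, a Tomcat startup or init script might end up exporting something like this (the values here are purely illustrative; tune them for your machine and your applications):

JAVA_OPTS="-Xms512m -Xmx2g -XX:MaxPermSize=512m"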

Additional Notes

  • Use the JVM options -XX:+TraceClassLoading and -XX:+TraceClassUnloading to see what classes are loaded/un-loaded in real-time. If you have doubts about excessive class loading in your app; this might help you find out exactly what classes are loaded and where.
  • Use -XX:+UseParallelGC to tell the JVM to use multi-threaded, one thread per CPU, garbage collection. This might improve GC performance since the default garbage collector is single-threaded. Define the number of GC threads to use with the -XX:ParallelGCThreads=N option where N is the number of GC threads you wish to consume.
  • Never call System.gc() in your code. The application doesn’t know the best time to garbage-collect, only the JVM really does.
  • The JVM option -XX:+AggressiveHeap inspects the machine resources (size of memory and number of processors) and attempts to set various heap and memory parameters to be optimal for long-running, memory allocation-intensive jobs.

Cheers.