September 2009 Archives

Yesterday, I realized the importance of setting a properly computed Content-Length header in your HTTP response.  You may know that the Content-Length header is the length of the response body in octets (8-bit bytes), and not the number of characters.  In Java, I was computing the Content-Length using the String.length() method, which returns the number of UTF-16 code units (chars) in the String; for most text, that is simply the number of characters.  In other words, I was assuming that each character takes exactly one byte on the wire.  Well, in many cases, using the String length as the Content-Length is entirely wrong, especially when the response is sent as UTF-8 and contains anything beyond plain ASCII.  UTF-8 encodes each character (code point) in 1 to 4 octets, meaning that some of the characters in your String may need more than a single byte to represent themselves.  But, when you call String.length(), you're only going to get back the number of characters, not the number of bytes used to represent those characters (i.e., what the HTTP Content-Length header needs).

So, here's the situation: I was working with some XML, where one of the entries happened to contain a registered trademark symbol (®).  My HTTP response was returning the data contained in this XML element, and on each response, the data was truncated by one byte.  A colleague and I looked into the problem (a useful pair programming exercise, as it turned out), and found that I was improperly computing the Content-Length.

Here's why.  According to the Unicode code chart on unicode.org, the registered trademark symbol is code point U+00AE.  In UTF-8 land, any code point above U+007F is encoded as a multi-byte sequence, and the first byte indicates how many bytes are in that sequence.  Hence, the UTF-8 encoding for the registered trademark symbol is the two-byte sequence 0xC2 0xAE.  Using a trusty hex editor, we can verify that this is indeed the correct encoding, by examining the byte sequence for the UTF-8 encoded registered trademark symbol in the XML:

[Screenshot: hex editor showing the bytes C2 AE for the registered trademark symbol]


Yep, sure enough, 0xC2 0xAE.  So this character (®) uses two bytes to represent itself in UTF-8.  In other words, even though the registered trademark symbol is a single code point in my response, it needs two bytes to properly represent itself.
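
If you'd rather check this from Java than from a hex editor, here's a minimal sketch.  The class and method names (Utf8LengthDemo, utf8Length, utf8Hex) are made up for illustration, and it assumes Java 7 or later for StandardCharsets.  It compares String.length() against the UTF-8 byte count for a few sample characters, written as Unicode escapes so the encoding of the source file itself doesn't matter:

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthDemo {

    // Number of octets the String occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Space-separated, uppercase hex dump of the UTF-8 encoding of s.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, plain ASCII
            "\u00AE",       // U+00AE, registered trademark symbol
            "\u20AC",       // U+20AC, euro sign
            "\uD83D\uDE00"  // U+1F600, stored in Java as a surrogate pair
        };
        for (String s : samples) {
            System.out.println("length()=" + s.length()
                + "  utf8 bytes=" + utf8Length(s)
                + "  hex=" + utf8Hex(s));
        }
    }
}
```

For the registered trademark symbol this reports a length() of 1 but 2 UTF-8 bytes (C2 AE).  The last sample is worth a look too: it is one code point, yet length() reports 2 and UTF-8 needs 4 bytes, so String.length() isn't a reliable count of code points either.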

Now, say you include this XML element in an HTTP response of some kind, but computed the Content-Length using String's length() method.  Like me, you might find that your HTTP response is truncated, given that your computed Content-Length is one byte less than it should be:

// WRONG: String.length() returns the number of characters
// (UTF-16 code units) in the String, not the number of bytes
// those characters occupy once encoded as UTF-8.
String response = [insert XML with UTF-8 characters here];
this.contentLength_ = response.length();

To fix this, convert the string to a sequence of UTF-8 encoded bytes, and compute the content length using the length of the resulting byte array:

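// CORRECT: get a UTF-8 encoded byte array from the response
// String and set the content-length to the length of the
// resulting byte array.
// Requires: import java.io.UnsupportedEncodingException;
String response = [insert XML with UTF-8 characters here];
byte[] responseBytes;
try {
    responseBytes = response.getBytes("UTF-8");
}
catch ( UnsupportedEncodingException e ) {
    // Every JVM is required to support UTF-8, so this should
    // never happen.  Rethrow rather than continue, otherwise
    // responseBytes would be left unassigned below.
    throw new IllegalStateException("My computer hates UTF-8", e);
}

// Write these same bytes as the response body.
this.contentLength_ = responseBytes.length;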

Not surprisingly, this solved my problem.  So, remember kids, your Content-Length header is the number of bytes in your response, not the number of characters in a String.  And, as you just learned, some code points (characters) in UTF-8 land can use up to four bytes to represent themselves.

Back to work.
Last Friday night my domain registrar, Network Solutions, added .ch (Switzerland), .pl (Poland), .cz (Czech Republic), .ru (Russian Federation), and .li (Liechtenstein/Long Island) to their domain name lineup.  I've been waiting for NetSol to support dot-ch for quite a while, so I could snag koli.ch.  And now, I'm the proud owner of http://koli.ch, a custom domain hack of my last name using the Swiss ccTLD.  Even cooler, NetSol's VP of Engineering confirmed on Twitter that koli.ch was the first ever dot-ch domain registration through Network Solutions.  BTW, before I forget, special thanks to Network Solutions for adding .ch to their lineup!

With that said, probably in the next few weeks, I plan on moving my blog to http://mark.koli.ch.  I have other plans for kolich.com at the moment, hence the desire to digitally relocate.  It will definitely be a bit of work getting everything configured properly again on a new ccTLD, but at least I'll have fun configuring my web-server to gracefully redirect search bots and blog readers to my new online home.

Until then, remember, life's a bea.ch.
Following yesterday's post on serving up static content with Restlet's built in web-server, I received a couple of questions on Twitter asking how I implemented my "download the latest Cappuccino" feature at http://mark.koli.ch/cappuccino/latest.  If you visit that URL you'll notice it prompts you to save a file named "cappuccino.jar", which is the latest Cappuccino build available on kolich.com.  Even if I change versions, and post a newer build on my server, you can always download the latest from the same URL.  Normally, you would expect this URL to prompt you to save a file named "latest", given that this is how most web-browsers work; they take the string following the last / and use that as the file name.  However, when someone visits /latest I want the browser to prompt them to save "cappuccino.jar" and not "latest".

The title of this post is slightly misleading, since the Content-Disposition header cannot be set directly using mod_rewrite.  As I understand it, there is a set of headers that mod_rewrite understands, and Content-Disposition is not one of them.

So, to prompt the user to save cappuccino.jar instead of "latest", I used an interesting combination of mod_rewrite and mod_headers.  First, I used mod_rewrite to internally redirect any request for .../cappuccino/latest to the latest Cappuccino Jar file available on my server.  Second, I used mod_headers to add a Content-Disposition header to the response for any request ending with .../cappuccino/latest, which tells the browser/client to save the file as "cappuccino.jar" and not "latest".  Here's the configuration from my Apache httpd.conf file:

## Internally redirect for Cappuccino
RewriteRule ^/cappuccino/latest /cappuccino/dist/cappuccino-v0.1.jar [L]
SetEnvIfNoCase Request_URI cappuccino\/latest$ cappuccino-latest
Header set Content-Disposition \
"attachment; filename=cappuccino.jar" env=cappuccino-latest

With this configuration, Apache will set the Content-Disposition header on each request for http://mark.koli.ch/cappuccino/latest.  Using a tool like HttpFox, you can prove to yourself that this works as expected:

[Screenshot: HttpFox showing the Content-Disposition header on the response]


Enjoy.
Common problem: I need to share a bunch of files with a friend, or co-worker, but I can't send them via email because the files are greater than 10MB in total size.  I could send them to a local web-server, or post them on kolich.com, but that involves starting a new SSH session, some SCP's, what a mess.  I wish I had a really lightweight simple web-server that I can simply copy into any directory, and start with one command.  Once started, the web-server will simply serve files right from the directory I started it from (e.g., the web-server root becomes the current working directory).  Then I can tell my co-workers to visit http://192.168.1.100:8080, for example, to download the files.

A few weeks ago, I went through a Restlet tutorial for a project at work, and knew that Restlet supported its own internal HTTP web-server for serving up static files.  I wanted to learn a little more about this, so I created Cappuccino.  Cappuccino is a lightweight server that uses Restlet's internal HTTP web-server to serve up static files from the directory it's started from.  Technically speaking, Cappuccino is just a handsome wrapper of Restlet's internal HTTP server.

Using Simon Tuffs' One-JAR, I packaged all required libraries and resources into a single .jar file.  As a result, you can start Cappuccino with a one-liner (assuming you have a good JRE installed in your path):

#/> java -jar cappuccino.jar

Once started, Cappuccino serves up static files in the current directory on port 8080 by default.  Of course, you can change the default port:

#/> java -jar cappuccino.jar 8099
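
For anyone curious what "serve the current directory on a port" boils down to, here's a rough sketch.  To be clear, this is not Cappuccino's source and it does not use Restlet; it swaps in the com.sun.net.httpserver classes that ship with the JDK so the example has no dependencies.  The class and method names (ServeHere, parsePort, resolve, handle) are made up for illustration, and it assumes Java 8 or later:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ServeHere {

    // Port from the first argument, or 8080 when none is given.
    static int parsePort(String[] args) {
        return args.length > 0 ? Integer.parseInt(args[0]) : 8080;
    }

    // Maps a request path onto a file under root.  Returns null when the
    // normalized result would escape root (e.g. "/../etc/passwd").
    static Path resolve(Path root, String requestPath) {
        Path target = root.resolve(requestPath.replaceFirst("^/+", "")).normalize();
        return target.startsWith(root) ? target : null;
    }

    static void handle(Path root, HttpExchange ex) throws IOException {
        Path file = resolve(root, ex.getRequestURI().getPath());
        if (file == null || !Files.isRegularFile(file)) {
            ex.sendResponseHeaders(404, -1); // -1 means no response body
            ex.close();
            return;
        }
        // Files.size() is a byte count, which is what Content-Length needs.
        ex.sendResponseHeaders(200, Files.size(file));
        try (OutputStream out = ex.getResponseBody()) {
            Files.copy(file, out);
        }
    }

    public static void main(String[] args) throws IOException {
        // The web-server root is the directory the JVM was started from.
        Path root = Paths.get("").toAbsolutePath().normalize();
        HttpServer server = HttpServer.create(new InetSocketAddress(parsePort(args)), 0);
        server.createContext("/", ex -> handle(root, ex));
        server.start();
        System.out.println("Serving " + root + " on port " + parsePort(args));
    }
}
```

The resolve() check is the one design choice worth pointing out: anything that normalizes to a path outside the starting directory gets a 404 instead of a file.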

You can also add an alias to your *NIX shell (bash, etc.) for automatic downloading and start up of Cappuccino from the current working directory.  For example, set a "servehere" alias to automatically download the latest Cappuccino and run it:

#/> alias servehere='lynx -dump http://kolich.cc/cino > cappuccino.jar; \
java -jar cappuccino.jar'

Once set, simply type servehere to download and start Cappuccino from the current directory.

Note: I tried using "wget --quiet http://kolich.cc/cino" to automatically download cappuccino.jar in the alias example above, but it appears that wget doesn't always obey the Content-Disposition header sent by my web-server.  It seems that wget on my CentOS box understands the Content-Disposition header by default, but wget on my Fedora 7 box does not.  Better to use Lynx, which gets it right more often than not.

Download the Source:
http://mark.koli.ch/cappuccino/src/cappuccino-v0.1.zip

Download the Runnable Jar:

http://mark.koli.ch/cappuccino/dist/cappuccino-v0.1.jar


Enjoy.
Back in early June, I wrote up a quick blog post describing how I set up my "Did you find this helpful?" widget.  The goal of this widget was to determine which blog posts my readers found most helpful, and least helpful.  As described, I used a little JavaScript (jQuery driven of course) to log the request with Apache via AJAX.  The yes or no responses from readers are recorded in my Apache log files, which lets me run reports on these logs to see which of my posts are most helpful, and least helpful:

192.168.1.1 - - [05/Jun/2009:09:03:07 -0700] \
"GET /tracker/Helpful/yes/?http%3A%2F%2Fmark.koli.ch%2F2009%2... \
HTTP/1.1" 204 - "http://mark.koli.ch/2009/05/..." \
"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.10) \
Gecko/2009042316 Firefox/3.0.10 (.NET CLR 3.5.30729)"
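
A report over entries like this can be sketched in a few lines of Java.  This is not the script I actually ran, just a minimal illustration under a few assumptions: each log entry sits on a single line (the sample above is wrapped for readability), "no" votes are logged under /tracker/Helpful/no/ in the same way as "yes" votes, and the class name HelpfulTally is made up.  It pulls the vote and the URL-encoded post address out of each request line and counts yes/no per post:

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HelpfulTally {

    // Group 1 is the vote, group 2 the URL-encoded address of the post.
    private static final Pattern VOTE =
        Pattern.compile("\"GET /tracker/Helpful/(yes|no)/\\?(\\S+) HTTP/");

    // Returns post URL -> {yes count, no count}, sorted by URL.
    static Map<String, int[]> tally(Iterable<String> logLines) {
        Map<String, int[]> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = VOTE.matcher(line);
            if (!m.find()) {
                continue; // not a widget request
            }
            String url;
            try {
                url = URLDecoder.decode(m.group(2), "UTF-8");
            } catch (UnsupportedEncodingException | IllegalArgumentException e) {
                continue; // skip entries with a malformed %-escape
            }
            int[] c = counts.computeIfAbsent(url, k -> new int[2]);
            c["yes".equals(m.group(1)) ? 0 : 1]++;
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps every byte to a char, so stray bytes in a log
        // file never abort the read.
        Map<String, int[]> counts = tally(
            Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1));
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            System.out.println(e.getValue()[0] + " yes, "
                + e.getValue()[1] + " no: " + e.getKey());
        }
    }
}
```

Pass the path to an access log as the only argument.  Sorting the result by either count then gives the most helpful and least helpful posts.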

Since June I've been collecting "Did you find this helpful?" responses, and finally got around to analyzing my logs to determine what is actually most helpful, and least helpful.  Here are the top 5 in each category.  Interestingly, two of the most helpful blog posts are also two of the least helpful.  I guess you can't please everyone:


Top 5 Most Helpful

  1. JAVA: Resolving org.xml.sax.SAXParseException: Content is not allowed in prolog
  2. HOWTO: Whole Disk Backup and Recovery with dd, gzip, and p7zip (Linux on a CD too)
  3. Symantec AntiVirus: HOWTO Disable/Unlock Scheduled Administrator Scans
  4. Using the Twitter API, Cache Your 'Tweets' in PHP (TwitterCacher)
  5. Understanding Java's "Perm Gen" (MaxPermSize, heap space, etc.)

Complete list: September 2009 Most Helpful blog posts.


Top 5 Least Helpful

  1. HOWTO: Make Your Own AddThis Social Bookmarking Sharing Widget (Sharing URLs)
  2. Understanding Java's "Perm Gen" (MaxPermSize, heap space, etc.)
  3. Java to Capitalize the First Letter of Each Word in a Sentence
  4. ImageMagick and PHP: Your Best Friend Or Your Worst Nightmare (Installing and a Few Examples)
  5. JAVA: Resolving org.xml.sax.SAXParseException: Content is not allowed in prolog

Complete list: September 2009 Least Helpful blog posts.
OK, here's a post that has nothing to do with technology.  Usually when I acquire a new appliance, or piece of weekend warrior type equipment, I enjoy writing up a quick honest-to-God review about my first impressions.  The Porter-Cable C2002-WK Oil-Free UMC Pancake Compressor with 13-Piece Accessory Kit I recently purchased is no exception.  And for the record, I'm no contractor or construction worker; I'm just a guy who needed a somewhat simple air compressor for basic round the house jobs.  Generally speaking, I only plan on using this compressor to keep tires inflated, and to spray out dusty computer equipment.  I'm sure this compressor has other, more noble uses.  I have yet to find one, but I'm sure I'll come across something soon.
Quick tip: If you're ever in a situation that requires a simple and dirty wipe/format/erase of a device (USB key, hard disk, whatever), you might find the following HOWTO somewhat useful.  This post assumes you are familiar with Linux.

Note that these instructions tell you how to erase a disk for simple "keep prying eyes away from your data" purposes.  If the device you're erasing contains sensitive data of any kind, and you care about data security, then you should consider "shredding" your device using a tool like DBAN (Darik's Boot And Nuke).


#1 - Attach and Locate the Device You want to "Erase"

For a hard disk, you'll probably use /dev/hda.  For a USB key, something like /dev/sdd is most common.  You'll need to locate the correct device special file for your device; these vary from system to system.


#2 - Erase with All Zeros, or a Random Bit Pattern

Once you've located the device special file (DSF) for your device, you can use dd to erase it by writing out a series of continuous zeros, or a random bit pattern.  For the sake of this example, I'll assume the device you want to erase is /dev/hda.  Erase the device with all zeros:

#/> dd if=/dev/zero of=/dev/hda bs=1024k

Or, erase the device with a random bit pattern using /dev/urandom:

#/> dd if=/dev/urandom of=/dev/hda bs=1024k

BTW, for the curious, you can also generate decent passwords using /dev/random.
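
As a rough sketch of that idea in Java: read a handful of bytes from the kernel's random device and encode them as printable text.  The class name DevRandomPassword and the choice of 16 bytes are my own assumptions for illustration, it needs Java 8 or later for java.util.Base64, and it reads /dev/urandom (as in the dd example above) because /dev/random can block while waiting for entropy:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Base64;

final class DevRandomPassword {

    // Encodes raw bytes with the URL-safe Base64 alphabet
    // (A-Z, a-z, 0-9, '-' and '_'), without trailing '=' padding.
    static String encode(byte[] raw) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = new byte[16]; // 128 bits of randomness
        try (DataInputStream in =
                 new DataInputStream(new FileInputStream("/dev/urandom"))) {
            in.readFully(raw); // blocks until all 16 bytes have been read
        }
        System.out.println(encode(raw)); // a 22-character password
    }
}
```

Sixteen random bytes come out as a 22-character string.  Change the array size if you want something longer or shorter.
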
You may have noticed that a few sites out there trigger some type of event when you use your mouse to select a word or a block of text on the page.  After selecting some text, a little pop-up might appear allowing you to look up the definition of the selected word, or search Google for the selected phrase.  The New York Times online is a perfect example; while reading any of their articles, select a block of text with your mouse and you'll notice a little balloon-like icon appears.  If you click the balloon icon, a pop-up window opens that searches all New York Times articles for the selected text.  Like any reasonable software engineer, I was curious how the New York Times online implemented this select, click, and search feature.  As it turns out, implementing your own is quite easy with jQuery as shown in my example.

First, I leveraged/borrowed this little code snippet from CodeToad, which offers a nice cross-browser compatible function for getting the user selected text in the browser.  I was hoping that a single call to get the selected text would work across all platforms, but of course, each browser has its own getSelection implementation.

Kolich.Selector.getSelected = function(){
  var t = '';
  if(window.getSelection){
    // Returns a Selection object; convert it to a plain string.
    t = window.getSelection().toString();
  }else if(document.getSelection){
    t = String(document.getSelection());
  }else if(document.selection){
    // Older versions of Internet Explorer.
    t = document.selection.createRange().text;
  }
  return t;
};

Second, use jQuery to bind a mouseup event handler to the document:

$(document).ready(function(){
  $(document).bind("mouseup", Kolich.Selector.mouseup);
});

From there, your event handler can simply call your getSelected() function to get the selected text and do something with it:

Kolich.Selector.mouseup = function(){
  var st = Kolich.Selector.getSelected();
  if(st!=''){
    alert("You selected:\n"+st);
  }
};

Putting it all together, your code might look something like this:

// Create the Kolich namespace only if it doesn't already exist.
if(!window.Kolich){
  window.Kolich = {};
}

Kolich.Selector = {};
Kolich.Selector.getSelected = function(){
  var t = '';
  if(window.getSelection){
    // Returns a Selection object; convert it to a plain string.
    t = window.getSelection().toString();
  }else if(document.getSelection){
    t = String(document.getSelection());
  }else if(document.selection){
    // Older versions of Internet Explorer.
    t = document.selection.createRange().text;
  }
  return t;
};

Kolich.Selector.mouseup = function(){
  var st = Kolich.Selector.getSelected();
  if(st!=''){
    alert("You selected:\n"+st);
  }
};

$(document).ready(function(){
  $(document).bind("mouseup", Kolich.Selector.mouseup);
});

You can find a complete demo here.  Enjoy.

About this Archive

This page is an archive of entries from September 2009 listed from newest to oldest.
