October 2009 Archives

A blog reader recently contacted me with an interesting question: can you explicitly tell Java when and where to flush its permgen and heap to disk?  The answer, based on what I understand about Java and operating system fundamentals, is no.

I can say with much certainty that you can't control where Java saves its heap and permgen (either on disk or in memory).  Java itself doesn't know about paging stuff out to disk.  It simply asks the operating system for the memory it needs and if the OS can't give it, then either the OS has to fail the request or make room by paging out unused chunks of memory to swap.  In other words, Java relies on the host OS to handle this type of stuff.

But what if you're dealing with millions of Java objects on a standard computer and you don't have room to keep all of those objects in physical memory?  In this case, your only real option is to write code that manually swaps objects in/out of the disk.  Of course, this requires that you implement your own swapping mechanism, which isn't too bad.  When your Java application needs a set of objects, it loads what it needs into memory from disk, does some stuff with the objects, then writes them back out to disk.

Meet java.io.Serializable:
http://java.sun.com/javase/6/docs/api/java/io/Serializable.html

Here's an example:

import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class Dog implements Serializable {

    private static final long serialVersionUID = -4367737315167700936L;

    private String name_;
    private String breed_;

    public Dog(String name, String breed) {
        this.name_ = name;
        this.breed_ = breed;
    }

    @Override
    public String toString() {
        return String.format("%s:%s", this.name_, this.breed_);
    }

    public static void main(String[] args) {

        final List<Dog> dogs = new ArrayList<Dog>();
        dogs.add(new Dog("Fido", "mutt"));
        dogs.add(new Dog("Clifford", "big red dog"));

        for (Dog d : dogs) {
            ByteArrayOutputStream os = null;
            ObjectOutputStream out = null;
            try {

                // To write the dogs out to a file, you'll of course
                // need to use a FileOutputStream instead of a
                // ByteArrayOutputStream
                os = new ByteArrayOutputStream();
                out = new ObjectOutputStream(os);
                out.writeObject(d);
                // Flush so every byte reaches the underlying stream
                out.flush();

                // Print the serialized version of Dog. This is binary
                // data, so expect some unreadable characters.
                final String serialized = os.toString();
                System.out.println(d.toString() + " serialized is: " +
                    serialized);

            } catch (Exception e) {
                e.printStackTrace(System.err);
            } finally {
                // Close the outer stream first, then the one it wraps
                closeQuietly(out);
                closeQuietly(os);
            }
        }

    }

    private static void closeQuietly(final Closeable c) {
        // A stream may still be null if its constructor threw
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (Exception e) {
            // Nothing useful to do if close() fails
        }
    }

}

Each of the objects you wish to save to disk will have to implement java.io.Serializable.  This will let you convert a Java object into something that can be written out to disk.  From there, you will have to write some type of queue or stack control mechanism that will know when, from where, and how to page these objects in and out of the disk.
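
To make the "page out, page in" idea a little more concrete, here's a minimal sketch of the round trip through a real file.  The DiskPager class and its pageOut/pageIn methods are names I made up for illustration, and this only handles a single object per file; a real paging mechanism would still need the queue or stack logic described above to decide what to evict and when.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper (not from the original example): pages one
// Serializable object out to a file and back in again.
class DiskPager {

    // Write the object to the given file, replacing any old contents.
    static void pageOut(Serializable obj, File file) throws IOException {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(obj);
        }
    }

    // Read back the object that was last written to the file.
    static Object pageIn(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File swap = File.createTempFile("pager", ".ser");
        swap.deleteOnExit();
        pageOut("Fido:mutt", swap); // String is Serializable
        System.out.println(pageIn(swap)); // prints Fido:mutt
    }
}
```

Anything you pass to pageOut has to implement java.io.Serializable, and so does everything it references, otherwise writeObject throws a NotSerializableException.
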
While adding permalink and Twitter support to Onyx last week, I realized that PHP can easily convert a number between two arbitrary bases using its built in base_convert function.  This was crucial for me because it meant that I could quickly generate a base-36 encoded string from a standard base-10 number:

$tiny = base_convert($key, 10, 36);

As Wikipedia explains, "...the choice of 36 is convenient in that the digits can be represented using the Arabic numerals 0-9 and the Latin letters A-Z.  Base 36 is therefore the most compact case-insensitive alphanumeric numeral system using ASCII characters...".  Not to mention, it's the highest radix most languages support by default.

I built Onyx to maintain a list of files (as inodes) in a MySQL database table.  Luckily, the inodes are stored in this table as auto-incrementing BIGINT(20) values, which are, of course, base-10 encoded numbers:

CREATE TABLE files (

f_inode BIGINT(20) NOT NULL AUTO_INCREMENT,
...

INDEX inode_index ( f_inode ),
PRIMARY KEY ( f_inode ),
...

) ENGINE=InnoDB;

Base-36 encoding my inodes is key because it means that I can easily, for example, convert base-10 inode 3,032 into "2c8" in base-36 for a tiny URL.  From there, it's simply a matter of using the base-36 encoded string in a tiny/shortened permalink URL to refer to the correct file/inode on Onyx.  Yes, this is a poor example because I'm only saving 1 character in a supposedly shorter URL ("3032" vs "2c8").  However, this makes a lot more sense with larger base-10 numbers like 105,621,983, which condenses nicely into "1qvufz" in base-36.  Personally, I'd rather see /1qvufz on the end of a "tiny" URL than /105621983.
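
For comparison, here's a rough Java sketch of the same conversion.  This is not the code Onyx uses (Onyx is PHP and relies on base_convert); the Base36 class and its method names are made up for illustration.  It leans on Long.toString and Long.parseLong, both of which accept a radix up to 36.

```java
// Hypothetical helper class, named here only for illustration.
class Base36 {

    // Encode a non-negative base-10 key as a lowercase base-36 string.
    static String encode(long key) {
        return Long.toString(key, 36);
    }

    // Decode a base-36 string (upper or lower case) back to a long.
    static long decode(String tiny) {
        return Long.parseLong(tiny, 36);
    }

    public static void main(String[] args) {
        System.out.println(encode(3032L));      // prints 2c8
        System.out.println(encode(105621983L)); // prints 1qvufz
        System.out.println(decode("1qvufz"));   // prints 105621983
    }
}
```

Decoding is what the permalink handler would need: take the short string off the end of the URL, turn it back into the base-10 inode, and look that up in the files table.
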

Enjoy.
I've been playing around with a lot of installer type stuff recently.  I discovered that Mozilla Firefox uses the 7zip SFX install launcher (if that's what you call it?) to kick off the Firefox installation process.  I started playing around with 7zip SFX, and realized that you can do some pretty cool stuff with it.  In fact, I discovered that you can actually bundle a Java app and the Java Runtime Environment (JRE) into your own little 7zip SFX launcher.  Naturally, this means you can write a Java app and then let your users start it by double-clicking a native Win32 .exe.  And best of all, because your launcher contains the Java Runtime Environment, the user does not have to have a JRE installed on their system to run your application!  The launcher extracts the JRE and your app to a temporary directory, then launches it using that freshly extracted JRE.

Continue reading for the details ...
While wrapping up a few bugs on one of my latest projects, Onyx, I encountered one of the most annoying and irritating IE nuisances to date.  Let me reiterate my hatred for Internet Explorer.  So here's the deal.  In PHP, I was serving up a few static files nestled away on the web-server.  The user would click a link, my PHP would fopen a file, and send it to the browser.  Simple enough, and this worked just fine on HTTP.  Then, I moved my new web-application to a production environment under HTTPS.  Suddenly, some downloads failed on IE with the following error:

"Internet Explorer cannot download [file] from [server]. Internet Explorer was not able to open this Internet site. The requested site is either unavailable or cannot be found. Please try again later."

Here's a screen shot of the error from IE8:

[Screen shot: ie-cannot-open-file-https.jpg]


With a little digging, I tracked down the following support pages on microsoft.com that explain the problem:

It appears that Internet Explorer gets confused when it sees a Pragma and a Cache-Control header together in the same response, and it has difficulty interpreting these headers properly over HTTPS.  As described on microsoft.com, "when Internet Explorer communicates with a secure Web site through SSL, Internet Explorer enforces any no-cache request. If the header or headers are present, Internet Explorer does not cache the file. Consequently, Office cannot open the file."  And, when you're using sessions in PHP, the PHP engine by default automatically inserts the Expires, Cache-Control, and Pragma headers into your responses, as explained here.

In my case, the solution to this issue was to remove the Pragma header in my HTTP response.  In PHP, I forcefully unset the Pragma header with the following code-snippet:

header("Pragma: ");

This successfully clears the Pragma header in my responses, and all is well.  With this tweak, IE7 and 8 correctly handle my file downloads.
I threw myself right into the bitness fire this afternoon, trying to figure out how to reliably determine if a Windows OS is 32-bit or native 64-bit.  I tried all sorts of things, everything from a VBScript to a few tiny C++ programs built to issue WMI queries.  I tried using WMI to read the OSArchitecture property from its "SELECT * FROM Win32_OperatingSystem" output, but that failed miserably on Windows XP.  As it turns out, the OSArchitecture property you can read via WMI wasn't added until Vista.  Nice.  So how do we check for 64-bit Windows XP?

After about an hour or so of searching, I stumbled across this post, which described in reasonable detail what I needed to do:

"In your code, you first need to check the size of IntPtr, if it returns 8 then you are running on a 64-bit OS. If it returns 4, you are running a 32 bit application, so now you need to know whether you are running natively or under WOW64. To get this information you will need to call kernel32.dll API "IsWow64Process" using PInvoke, this API returns a Boolean 'true' if you are running under WOW64, that means you are running 32 bit application on a 64-bit Windows system. Be careful however to check the OS version before calling this API, only XP SP2 and implements this one."

How could Microsoft make something that should be so easy, so complicated?  What a FAIL.

I then found this post, which offered up a nice little 32-bit C++ app one can compile and run to check the real, actual bitness of the OS.  For the most part, it does exactly what the first post said I needed to do, with the exception of checking the size of IntPtr.  I cleaned it up a little bit, and successfully compiled it on 32-bit Windows XP with Microsoft Visual C++ 2005 (Version 8.0.50).  This app checks the bitness of the OS by discovering if it's running under WOW64, or natively as a 32-bit app.  WOW64 stands for "Windows 32-bit on Windows 64-bit"; it's the subsystem, found only on 64-bit Windows, that lets 32-bit apps run on a 64-bit OS.  The bitness checker starts and then asks the Windows kernel if it's running under WOW64.  If it is running under WOW64, that clearly means it's a 32-bit app (as compiled) running on a 64-bit OS.  If it's not running in WOW64, then we're on a 32-bit OS ...

#include "stdafx.h"
#include <windows.h>
#include <iostream>

#define RESPONSE_32_BIT "32"
#define RESPONSE_64_BIT "64"

using namespace std;

// Signature of kernel32's IsWow64Process. It's resolved at runtime
// because the function doesn't exist on older versions of Windows.
typedef BOOL (WINAPI *IW64PFP)(HANDLE, BOOL *);

int main(int argc, char **argv){

    BOOL res = FALSE;

    // When this application is compiled as a 32-bit app,
    // and run on a native 64-bit system, Windows will run
    // this application under WOW64. WOW64 is the subsystem
    // that lets native 32-bit applications run in 64-bit
    // land. This calls the kernel32.dll API to see if this
    // process is running under WOW64.
    // If it is running under WOW64, then that clearly means
    // this 32-bit application is running on a 64-bit OS,
    // and IsWow64Process will set res to TRUE.
    IW64PFP IW64P = (IW64PFP)GetProcAddress(
        GetModuleHandleW(L"kernel32"), "IsWow64Process");

    // If the lookup or the call itself fails, assume 32-bit.
    if(IW64P != NULL && !IW64P(GetCurrentProcess(), &res)){
        res = FALSE;
    }

    cout << ((res) ? RESPONSE_64_BIT : RESPONSE_32_BIT) << endl;

    return 0;

}

This will output "32" on 32-bit Windows and "64" on 64-bit Windows.  Download the pre-built .exe and .cpp source here.

  • Tested and worked on:  32-bit Windows XP Professional SP3, 32-bit Windows Vista Enterprise SP2, 64-bit Windows Vista Enterprise, 32-Bit Windows 7 Home Premium, 64-bit Windows 7 Home Premium.

  • Did NOT test on: 64-bit Windows XP. I don't have a 64-bit Windows XP box lying around. If you have one, and you can check this code for me, please let me know and I'll be happy to update this post and give you credit for testing it for me. @cmsimike confirmed on 10/27/09 that my bitness checker also works perfectly on 64-bit Windows XP.

  • Roger confirmed on 11/12/09 that my bitness checker also works on 32-bit Windows Server 2003 and 64-bit Windows Server 2008.  Thanks, Roger!

Cheers.
If you ever use Java to check if a system is 32 or 64-bit, you should know that Java's "os.arch" system property returns the bitness of the JRE, not the OS itself.  Sites like this are WRONG; any resource that claims Java's "os.arch" property returns the real "architecture of the OS" is lying.  Case in point, I recently ran this tiny program on a 64-bit Windows 7 machine, with a 32-bit JRE:

import com.sun.servicetag.SystemEnvironment;

public class OSArchLies {

    public static void main(String[] args) {

        // Will say "x86" even on a 64-bit machine
        // using a 32-bit Java runtime
        SystemEnvironment env =
            SystemEnvironment.getSystemEnvironment();
        final String envArch = env.getOsArchitecture();

        // The os.arch property will also say "x86" on a
        // 64-bit machine using a 32-bit runtime
        final String propArch = System.getProperty("os.arch");

        System.out.println( "getOsArchitecture() says => " + envArch );
        System.out.println( "getProperty() says => " + propArch );

    }

}

The output from this tiny app on a 64-bit box:

#/> java OSArchLies
getOsArchitecture() says => x86
getProperty() says => x86

In this case, one would expect to see something like "x86_64" or "amd64" instead of just "x86".  Bottom line, don't believe what you read online about "os.arch" and other Java system properties.  They are usually properties of the JRE/JDK itself, and not necessarily the real properties of the underlying OS or architecture.  If you need to check if a system is actually 32 or 64-bit, you should look elsewhere in the system registry or write your own native app and call it from Java.
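
As one example of "looking elsewhere" without writing a native app, here's a rough Windows-only sketch that reads two environment variables instead of os.arch.  It assumes the usual Windows behavior: PROCESSOR_ARCHITECTURE describes the current process (so it says "x86" for a 32-bit JRE), while PROCESSOR_ARCHITEW6432 is only set for a 32-bit process running under WOW64 and holds the native architecture.  The WindowsBitness class and osBits method are names I made up, environment variables can be overridden by whoever launches the JVM, and on a non-Windows box this simply falls back to 32, so treat it as a hint rather than an authoritative answer.

```java
// Hypothetical helper, named here only for illustration.
class WindowsBitness {

    // procArch  = value of PROCESSOR_ARCHITECTURE (may be null)
    // wow64Arch = value of PROCESSOR_ARCHITEW6432 (null unless the
    //             process is a 32-bit process running under WOW64)
    static int osBits(String procArch, String wow64Arch) {
        // Under WOW64 the second variable carries the native
        // architecture, so it wins whenever it is present.
        String arch = (wow64Arch != null) ? wow64Arch : procArch;
        return (arch != null && arch.endsWith("64")) ? 64 : 32;
    }

    public static void main(String[] args) {
        int bits = osBits(System.getenv("PROCESSOR_ARCHITECTURE"),
                          System.getenv("PROCESSOR_ARCHITEW6432"));
        System.out.println("environment says => " + bits);
        System.out.println("os.arch says => "
            + System.getProperty("os.arch"));
    }
}
```

The logic lives in a method that takes plain strings, which makes it easy to check the WOW64 case ("x86" plus "AMD64") without actually having a 64-bit machine with a 32-bit JRE on hand.
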
In my spare time over the last several weeks, I've been hammering out a new personal project I've had on my to-do list for quite a while.  I named it Onyx.  Onyx is my solution to an ongoing and often frustrating "digital content clutter" problem.

Why?:


While browsing the web, I tend to accumulate a lot of junk; if I like something, I save it. If I see a cool application of some sort, I'll take a screen shot. If I find a cool song, I'll snag it for later. Or, if I have an important document I need to archive, I'll store it. As it turns out, all of this digital content was sitting around in a relatively unorganized, unsearchable, and unsharable set of files and directories on a local file system.

Onyx is my solution to this problem. Files uploaded into Onyx can be protected, searched, organized, and shared much more easily than a set of files and directories sitting on my local disk.  Plus, the file storage itself is hosted on my web-server "in the cloud", which means that I can access it using any decent web-enabled device.  And, when necessary, I can easily share files and directories in Onyx with friends, family, or co-workers via Twitter, email, or instant messenger.

How?:

Onyx is a web-app written in PHP 5 with a MySQL back end.  Data uploaded into Onyx is physically stored in the "cloud" on the web-server file system.  This proved to be an interesting and challenging atomic transaction problem.  Because I'm using a MySQL database AND a file system to organize and store files in Onyx, I have to make sure that my database and file system stay in sync.  I could have used a BLOB to store the file data itself inside of my Onyx MySQL database, but I avoided that because there may be cases where I want to access the uploaded files without a database (e.g., if my database crashes, or if I do something stupid, I don't want to lose all of my files in a database table).

Where?:

When ready, Onyx will be available at http://onyx.koli.ch.

When?:

In the next few days I'll be putting the finishing touches on Onyx, and plan to release it to the world in beta mode.  And, if possible, I hope to add SSL (HTTPS) support for security.

Inspiration to build Onyx courtesy of Baconfile.  You'll notice the fundamental look and feel, and some icons, were kindly borrowed from Leah Culver and Wilson Miner of Baconfile.

Continue reading for a few screen shots.
After quite a bit of planning, I finally got around to moving my blog to http://mark.koli.ch.  Previously, my blog was hosted on kolich.com, but I'm planning on using my dot-com for another personal venture.  My mobile blog has also moved to http://mobi.koli.ch.  As planned, everything "personal" is now consolidated under koli.ch.  And so, if you notice a problem (broken links, errors, etc.) please let me know.

Once again special thanks to my registrar, Network Solutions, for adding the .ch Swiss ccTLD to their domain lineup.  No, I'm not Swiss, but I do enjoy Swiss cheese, and I've been waiting for NetSol to support .ch for a while so I could snag my own personal domain hack.

IMHO, koli.ch is quite del.icio.us.
OK, I'm on a slightly vengeful tear this week dealing with the infamous hot-linking problem.  Here, I explained how to more gracefully handle hot-linking blogs, forums, and other sites.  Then yesterday, I explained the situation to a few folks at work, and they suggested that I investigate returning a really massive and annoying animated GIF instead of a big static red square!  So, I looked into it, and it turns out that I'm able to return this 5000 x 5000 pixel animated GIF at a fraction of the bandwidth it would cost me to return this PNG'ed red square.  This new animated GIF alternates between red and yellow at roughly 200ms per frame.  It's quite annoying, and even better, many browsers struggle to render this animated GIF given its width and height (and it's only 36KB)!

Enough hot-linking posts, I think I finally have this problem under control.

Have a great Friday.
Yesterday, I caught kentuckysportsradio.com hot-linking to images on my blog.  I noticed an insane amount of traffic on my server requesting a single image, over and over again from the same referrer.  Classic case of hot-linking, and boy, does that irritate me.  Really though, is it that tough to just save an image and host it on your own blog instead of linking to others?

In a previous post I described how to address the hot-linking problem by using Apache to check the referrer on each request.  My previous solution involved simply returning a transparent 1x1 pixel image in place of the actual image requested.  Well, through this KentuckySportsRadio situation, I discovered that when your server is pounded on and the hot-linker doesn't notice that it's bothering you, they don't (err, won't) stop hot-linking.  Unfortunately, returning a transparent image just doesn't get their attention well enough.  So, I stepped it up a notch.

Instead of returning a transparent 1x1 pixel image, I decided to become really obvious and configured my server to return a massive, bright red, 2000x2000 pixel square.  This huge red square is bound to capture someone's attention.  And, wouldn't you know it, it worked!  After tweaking my server to send back this really annoying and disruptive 2000x2000 pixel square, KentuckySportsRadio.com gave up and changed their ways.  I win.

This is a great win for bloggers and system administrators everywhere.  You can somewhat curb hot-linking, and at the same time, my big red 2000x2000px square is only about 40KB in size so it won't eat up your bandwidth.  Feel free to use my big red stop hot-linking square on your own site.  It's bound to make a great addition to a hot-linker's blog, forum, or web-site!

I love the smell of HTTP in the morning.

About this Archive

This page is an archive of entries from October 2009 listed from newest to oldest.
