Rediscovering HTTP 410 Gone

This evening, I accomplished another important milestone in the kolich.com migration process. Since acquiring koli.ch, I've been slowly migrating my blog and all of its resources to my new online home under mark.koli.ch. This migration began back in September 2009, as I aptly described here and here. The only problem is that I still see a ton of bots and other users requesting resources under kolich.com, even though, as far as I'm concerned, it's shut down and there's nothing to see there.

For a month or so, I've been gracefully redirecting traffic with an HTTP 301 Moved Permanently. Even so, it appears that Google and other crawlers are still hitting kolich.com looking for stuff that simply doesn't exist there anymore, even though I've been telling them for a solid month to "go look somewhere else." Time to pull out the big guns. A quick flip through my handy copy of RFC 2616 (that's the HTTP/1.1 spec) led me to rediscover HTTP 410 Gone. If you haven't met HTTP 410, it's the forgotten stepchild of HTTP 404 Not Found. As described here, "Error 410 means 'Resource gone', as in, a resource used to exist at this location, but now it's gone. Not only is it gone, but I don't know (or I don't want to tell you) where it went. If I knew where it went, and I wanted to tell you, I would use error 301 ('Permanent redirect') and any smart client would simply redirect to the new address. But 410 means 'Resource gone, no forwarding address'. Train gone sorry."
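To make the 301 / 404 / 410 distinction concrete, here's a minimal Python sketch of how a well-behaved client might treat each status. The client_next_step helper and the http://mark.koli.ch/ target URL are made up for illustration; real crawlers apply their own policies, so treat the returned strings as a reading of the status semantics and nothing more.

```python
from http import HTTPStatus


def client_next_step(status, location=None):
    """Describe what a well-behaved client could do with a response status.

    Hypothetical helper for illustration only; this is not how any
    particular crawler is known to behave.
    """
    status = HTTPStatus(status)
    if status == HTTPStatus.MOVED_PERMANENTLY:
        # 301: the resource has a forwarding address in the Location header.
        return f"follow redirect to {location}"
    if status == HTTPStatus.GONE:
        # 410: the resource used to exist here; no forwarding address is given.
        return "stop requesting this URL"
    if status == HTTPStatus.NOT_FOUND:
        # 404: nothing here, but no claim about whether that is permanent.
        return "maybe retry later"
    return f"no special handling for {status.value} {status.phrase}"


# Assumed example target; only the host name comes from this post.
for code in (301, 404, 410):
    print(code, HTTPStatus(code).phrase, "->", client_next_step(code, "http://mark.koli.ch/"))
```

The point of the sketch is just that 410 is the only one of the three that says "permanent, and no forwarding address."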

Looks great, that's exactly what I need. Time to serve up some 410s. I configured my local mark.kolich.com Apache 2.2.3 virtual host with mod_rewrite to return an HTTP 410 Gone for most resources:

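# Skip requests for .html/.htm pages (case-insensitive) ...
RewriteCond %{REQUEST_URI} !\.(html?)$ [NC]
# ... and skip the server root as well.
RewriteCond %{REQUEST_URI} !^/$ [NC]
# Everything else gets an HTTP 410 Gone.
RewriteRule ^/(.*)$ - [G,L]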

Note the [G,L] on the RewriteRule directive. G means Gone, and L marks this as the last rule in the chain to apply to this request. In this case, any request for a resource that doesn't end in .html (or .htm) and isn't aimed at the server root immediately gets an HTTP 410 Gone. Here's a nice example. I'm handling HTML pages a little differently. Requests for an actual blog entry (a resource that ends in .html) are caught and handled a little more gracefully, as shown here. I haven't yet decided when to phase out this graceful catch.
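If you want to sanity-check which paths the two RewriteCond lines let through, here's a rough Python model of the same logic. It only mirrors the two regular expressions and the implicit AND between the conditions; it is not how mod_rewrite is implemented, and the sample paths are invented for illustration. Like %{REQUEST_URI}, the function expects the path only, without a query string.

```python
import re

# Mirrors of the two RewriteCond patterns; re.IGNORECASE stands in for [NC].
HTML_PAGE = re.compile(r"\.(html?)$", re.IGNORECASE)
SERVER_ROOT = re.compile(r"^/$", re.IGNORECASE)


def answers_410(request_uri):
    """Return True when both negated conditions hold, i.e. the rule would fire."""
    return not HTML_PAGE.search(request_uri) and not SERVER_ROOT.search(request_uri)


# Hypothetical request paths, not taken from real logs.
for path in ("/", "/2009/11/some-entry.html", "/INDEX.HTM", "/atom.xml", "/images/logo.png"):
    print(path, "->", "410 Gone" if answers_410(path) else "passed through")
```

The root and anything ending in .html or .htm fall through to whatever handles them next; everything else matches the rule.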

In any event, let's see if an HTTP 410 gets the attention of those pesky crawlers and RSS feed readers.  To be continued ...

About Mark

A Silicon Valley native, Mark Kolich is a full-time Software Engineer, a casual entrepreneur, and a consultant for hire. A web technologies expert, his current focus is on building powerful and robust cloud-driven web-applications using Java, PHP, Perl, AJAX, DHTML, CSS, and JavaScript. His favorite programming languages are PHP, Java and JavaScript. He uses Linux, enjoys biking to work, loves building great software, and always writes elegant, readable, and maintainable code.

About this Entry

This page contains a single entry by Mark Kolich published on November 10, 2009 9:47 PM.
