It's funny though because I've often criticized and poked fun at folks touting various SEO (search engine optimization) techniques. My opinion has been that as long as you build a great site, with decent information on it, "they" (visitors) will come. I still live by that mantra but it's clear to me now that Google, and other search engines, really do pay careful attention to your domain. Even so, I have no plans to switch back to kolich.com, but if you are considering a switch to a domain hack like mark.koli.ch you should expect or at least be aware of the changes this move might bring to your page rank in various search results.
November 2009 Archives
RewriteCond %{REQUEST_METHOD} ^TRACE [NC,OR]
RewriteCond %{REQUEST_METHOD} ^TRACK [NC]
RewriteRule ^/(.*)$ - [F,L]
You can prove to yourself that this works, by using a tool like curl to issue an HTTP TRACE and TRACK to your newly secured web-server. Use the -X option with curl to specify the HTTP request type:
#/> curl -v -A "Curl" -X TRACE mark.koli.ch
* About to connect() to mark.koli.ch port 80 (#0)
* Trying 24.130.215.240... connected
* Connected to mark.koli.ch (24.130.215.240) port 80 (#0)
> TRACE / HTTP/1.1
> User-Agent: Curl
> Host: mark.koli.ch
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Date: Sat, 14 Nov 2009 18:53:06 GMT
< Server: Apache
< Content-Length: 202
< Connection: close
< Content-Type: text/html; charset=iso-8859-1
<
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /
on this server.</p>
</body></html>
* Closing connection #0
Yep, works nicely. One thing that slightly annoys me though is that the HTTP OPTIONS method still reports that my server supports TRACE, even though I clearly don't anymore. A quick Google search reports that many other folks have had the same concern, with no clear resolution.
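To make the matching behaviour of the rule concrete, here's a rough Python sketch of what the two RewriteCond lines decide. It only models the method check, not Apache's full request handling, and the name is_forbidden is made up for illustration. Note that neither pattern is anchored at the end, so any method that merely starts with TRACE or TRACK (in any case, thanks to [NC]) is caught.

```python
import re

# Same patterns as the two RewriteCond lines; [NC] means case-insensitive.
# Assumption: this models only the method check, nothing else Apache does.
BLOCKED_METHODS = re.compile(r"^TRACE|^TRACK", re.IGNORECASE)


def is_forbidden(method: str) -> bool:
    """Return True if the rewrite rule would answer this method with 403 Forbidden."""
    return BLOCKED_METHODS.search(method) is not None


for method in ("GET", "OPTIONS", "TRACE", "track"):
    outcome = "403 Forbidden" if is_forbidden(method) else "passes through"
    print(method, "->", outcome)
```

OPTIONS passes through untouched, which is consistent with the server still answering OPTIONS requests after TRACE itself is blocked.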
This afternoon, I was using the HttpFox Firefox extension to analyze some web-traffic for a work related project. With HttpFox still running in the background (I forgot I left it running), I opened another tab and navigated over to my Twitter page to check out a few things. I clicked a few links, replied to a few folks, etc. Switching back to my work project, I closed Twitter and re-opened HttpFox. Well well well, what do we have here. I discovered that Twitter silently rolled out some JavaScript that actively tracks every link I click on. Any link, in any Tweet, that you click on is silently reported back to Twitter behind the scenes. Looking at the output of HttpFox pretty much proves it:

Looks like twitter.com/abacus is some type of web-service used by Twitter to log what links we're all clicking on. I'm curious why Twitter cares what links we're clicking on.
BTW, Twitter, if you're reading this, the HTTP Content-Type in your responses from /abacus are incorrect. You're phoning home by creating a new Image() in your core JavaScript like so:
(new Image()).src="/abacus?"+$.param(A);
But your Content-Type from this request is text/html, which could cause problems in a few browsers. If you're going to use an Image(), the returned Content-Type from your /abacus web-service should be that of an image: image/jpeg, image/png, image/gif, etc.
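For illustration, here's a minimal Python sketch of what a beacon endpoint could send back so that the response actually matches an Image() request. This is an assumption about one reasonable fix, not a description of what Twitter's /abacus service does; the function name beacon_response and the Cache-Control header are hypothetical choices.

```python
from typing import Dict, Tuple

# A complete 1x1 transparent GIF (43 bytes), a common body for tracking beacons.
PIXEL_GIF = (
    b"GIF89a"                                  # signature and version
    b"\x01\x00\x01\x00"                        # logical screen: 1 x 1 pixels
    b"\x80\x00\x00"                            # global color table flag, background, aspect
    b"\x00\x00\x00\xff\xff\xff"                # two-entry color table (black, white)
    b"!\xf9\x04\x01\x00\x00\x00\x00"           # graphic control extension, index 0 transparent
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00"   # image descriptor: 1 x 1 at (0, 0)
    b"\x02\x02D\x01\x00"                       # LZW-compressed pixel data
    b";"                                       # trailer
)


def beacon_response() -> Tuple[Dict[str, str], bytes]:
    """Build headers and body for a beacon that really is an image."""
    headers = {
        "Content-Type": "image/gif",              # matches what new Image() expects
        "Content-Length": str(len(PIXEL_GIF)),
        "Cache-Control": "no-store",              # assumption: every click should reach the server
    }
    return headers, PIXEL_GIF


headers, body = beacon_response()
print(headers["Content-Type"], len(body), "bytes")
```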
Cheers.
For a month or so, I've been gracefully redirecting traffic with an HTTP 301 Moved Permanently. Even so, it appears that Google and other crawlers are still hitting kolich.com looking for stuff that simply doesn't exist there anymore, even though I've been telling them for a solid month to "go look somewhere else." Time to pull out the big guns. A quick flip through my handy copy of RFC 2616, that's the HTTP/1.1 spec, led me to rediscover HTTP 410 Gone. If you haven't met HTTP 410, it's the forgotten stepchild of HTTP 404 Not Found. As described here, "Error 410 means 'Resource gone', as in, a resource used to exist at this location, but now it's gone. Not only is it gone, but I don't know (or I don't want to tell you) where it went. If I knew where it went, and I wanted to tell you, I would use error 301 ('Permanent redirect') and any smart client would simply redirect to the new address. But 410 means 'Resource gone, no forwarding address'. Train gone sorry."
Looks great, that's exactly what I need. Time to serve up some 410's. I configured my local mark.kolich.com Apache 2.2.3 virtual host with mod_rewrite to return an HTTP 410 Gone for most resources:
RewriteCond %{REQUEST_URI} !\.(html?)$ [NC]
RewriteCond %{REQUEST_URI} !^/$ [NC]
RewriteRule ^/(.*)$ - [G,L]
Note the [G,L] on the RewriteRule directive: G means Gone, and L means this is the last rule in the chain to apply to the request. In this case, I immediately respond with an HTTP 410 Gone to any request for a resource that doesn't end in .html (or .htm) and isn't aimed at the server root. Here's a nice example. I'm handling HTML pages a little differently. Requests for an actual blog entry itself (a resource that ends in .html) are caught and handled a little more gracefully, as shown here. I haven't yet decided when to phase out this graceful catch.
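As a quick sanity check on which requests that rule set catches, here's a small Python sketch of the same two conditions. It's only a model of the pattern logic under the assumption that the path alone is being tested; the function name status_for is invented for the example.

```python
import re
from typing import Optional

# Same patterns as the RewriteCond lines; both conditions are negated in the config.
HTML_PAGE = re.compile(r"\.(html?)$", re.IGNORECASE)  # ends in .html or .htm
SERVER_ROOT = re.compile(r"^/$")                      # exactly "/"


def status_for(path: str) -> Optional[int]:
    """Return 410 if the rule would fire, or None if the request falls through."""
    if not HTML_PAGE.search(path) and not SERVER_ROOT.search(path):
        return 410
    return None


# Hypothetical paths, chosen only to exercise each branch.
for path in ("/", "/atom.xml", "/2009/11/some-post.html", "/images/logo.png"):
    print(path, "->", status_for(path))
```

Anything that returns None here is left for the more graceful handling of blog entries and the server root.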
In any event, let's see if an HTTP 410 gets the attention of those pesky crawlers and RSS feed readers. To be continued ...
Many sites are moving towards dynamic XML sitemaps. These sitemaps let you tell Google, Yahoo, Bing, and Ask.com which pages on your site they should index, how often, and when they were last modified. You can even assign a priority to each page in the sitemap, which serves as an indication of how important a specific page is in relation to others.
The sitemap protocol is well defined here at sitemaps.org.
Yesterday, I configured Movable Type, my blog publishing platform, to automatically generate my own sitemap.xml when I publish a new page or blog entry. I added a custom Movable Type Index Template that would automatically generate a complete sitemap.xml for me, and place it under the root of my blog at http://mark.koli.ch/sitemap.xml.
1- My Sitemap XML Index Template
My custom sitemap XML Index Template is relatively straightforward. In my Movable Type control panel, I clicked "Create index template" on the Blog Templates screen. I named my template "XML Sitemap" and used the following configuration:
<?xml version="1.0" encoding="<$mt:PublishCharset$>"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!-- blog root -->
<url>
<loc><$mt:BlogURL encode_xml="1"$></loc>
<lastmod>
<mt:Entries lastn="1">
<$mt:EntryModifiedDate utc="1" format="%Y-%m-%dT%H:%M:%S+00:00"$>
</mt:Entries>
</lastmod>
<changefreq>weekly</changefreq>
<priority>1.0</priority>
</url>
<!-- pages -->
<mt:Pages lastn="0">
<url>
<loc><$mt:PagePermalink$></loc>
<lastmod><$mt:PageModifiedDate utc="1" format="%Y-%m-%dT%H:%M:%S+00:00"$></lastmod>
<priority>0.8</priority>
</url>
</mt:Pages>
<!-- entries -->
<mt:Entries lastn="0">
<url>
<loc><$mt:EntryPermalink encode_xml="1"$></loc>
<lastmod><$mt:EntryDate utc="1" format="%Y-%m-%dT%H:%M:%S+00:00"$></lastmod>
<changefreq>never</changefreq>
<priority>0.6</priority>
</url>
</mt:Entries>
<!-- archives -->
<mt:ArchiveList archive_type="Monthly">
<url>
<loc><mt:ArchiveLink></loc>
<priority>0.4</priority>
<changefreq>never</changefreq>
</url>
</mt:ArchiveList>
</urlset>
Using the sitemap XML protocol defined at sitemaps.org, I configured this template to include my blog root, all pages, all entries, and all archives in the sitemap. I assigned a higher priority to my blog root and individual pages versus the entries and archives. Note that I also omitted the <changefreq> tag under each "page", because I have no idea how often those pages will actually change. Also, I intentionally omitted the <lastmod> tag under each archive page, since again, there's no point in defining the last modified date on an archive.
Of course, you're free to change this template as you see fit as long as it adheres to the sitemap standard.
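If you're not on Movable Type, the same structure is easy to produce by hand. Here's a minimal Python sketch that builds an equivalent document with the standard library; the function name build_sitemap and the example.com URLs are placeholders, not part of my setup. ElementTree escapes special characters in the text for you, which plays the same role as encode_xml="1" in the template.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from dicts with 'loc' plus optional lastmod, changefreq, priority."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit only the tags that were supplied, in a fixed order.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = entry[tag]
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs: a blog root and one entry without a <lastmod>.
sitemap_xml = build_sitemap([
    {"loc": "http://example.com/", "lastmod": "2009-11-14T18:53:06+00:00",
     "changefreq": "weekly", "priority": "1.0"},
    {"loc": "http://example.com/2009/11/some-entry.html",
     "changefreq": "never", "priority": "0.6"},
])
print(sitemap_xml)
```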
2- Submit Your Sitemap XML (Submission URLs)
Once you publish your sitemap with Movable Type, you'll probably want to alert Google, Bing, Yahoo and Ask.com that you've got a new sitemap.xml available for your blog. As described here in the sitemap protocol, you can "ping" these web crawlers to alert them of the change. To do so, copy and paste these URLs into a web browser, and replace <sitemap URL> with the full URL to your new sitemap:
http://www.google.com/ping?sitemap=<sitemap URL>
http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=<sitemap URL>
http://submissions.ask.com/ping?sitemap=<sitemap URL>
http://www.bing.com/webmaster/ping.aspx?siteMap=<sitemap URL>
Example:
http://www.google.com/ping?sitemap=http://mark.koli.ch/sitemap.xml
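If you'd rather script this than paste URLs by hand, here's a small Python sketch that builds all four ping URLs from the list above. It only constructs the strings and doesn't send anything. The function name ping_urls is made up, and percent-encoding the sitemap URL is my reading of the sitemap protocol's advice to URL-encode everything after the ping parameter; the unencoded form in the example above is what I actually pasted.

```python
from urllib.parse import quote

# Submission prefixes copied from the list above.
PING_ENDPOINTS = {
    "google": "http://www.google.com/ping?sitemap=",
    "yahoo": "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=",
    "ask": "http://submissions.ask.com/ping?sitemap=",
    "bing": "http://www.bing.com/webmaster/ping.aspx?siteMap=",
}


def ping_urls(sitemap_url):
    """Return a dict of search engine name -> full ping URL for the given sitemap."""
    encoded = quote(sitemap_url, safe="")  # percent-encode ':' and '/' as well
    return {name: prefix + encoded for name, prefix in PING_ENDPOINTS.items()}


for name, url in ping_urls("http://mark.koli.ch/sitemap.xml").items():
    print(name, url)
```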
On each submission, you should see some type of HTTP 200 OK response indicating that your submission was successful. Here's what Google's looked like:
Enjoy!
If you follow me on Twitter, you may have noticed I silently launched Onyx a few weeks ago. In a nutshell, as described on the Onyx homepage: "Onyx is a social file management tool I built to help me keep track of, organize, and share my digital archive. While browsing the web, I tend to accumulate a lot of junk; if I like something, I save it. If I see a cool application of some sort, I'll take a screen shot. If I find a cool song, I'll snag it for later. Or, if I have an important document I need to archive, I'll store it. All of this digital content was sitting around in a relatively unorganized and unsearchable set of files and directories on a local file system. Onyx is my solution to this digital content clutter problem. Files and bookmarks uploaded into Onyx can be protected, searched, organized and shared much easier than a set of files and directories on my local disk."
Yep.
Onyx was a chance for me to "cross off" an important task on my digital TODO list that's been hanging over my head for a while: organize and archive all of my digital crap. It also gave me a chance to play with some new technologies I've been wanting to integrate into a real project for quite a while, like jQuery UI's draggable and droppable. I also learned how to base-36 encode numbers for a tiny URL, and solved a very annoying problem using HTTPS with Internet Explorer.
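For anyone curious about the base-36 trick, the idea is to repeatedly divide a numeric ID by 36 and map each remainder to one of the characters 0-9 and a-z, which gives a much shorter string than the decimal form. Here's a generic Python sketch of that technique; it is not the Onyx code (Onyx is PHP), and the name base36_encode is just for illustration.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # "0".."9" then "a".."z"


def base36_encode(number: int) -> str:
    """Encode a non-negative integer as a base-36 string."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    # Remainders come out least significant first, so reverse them.
    return "".join(reversed(digits))


print(base36_encode(36))                # "10"
print(int(base36_encode(1234567), 36))  # decoding with int() round-trips: 1234567
```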
In the last 24-hours, I finished uploading all of my personal, and public, digital content into Onyx which you can browse here from my Onyx home directory. Of course, like any good file management solution, my personal/private files are protected. What you'll see in my home directory are files and folders I've allowed the public to view.
For the curious software engineer, Onyx is written entirely in PHP running on Apache 2.2.3. I'm also using a clever little mod_rewrite hack in Apache to drop the .php on each Onyx URL. Dropping the .php makes my URLs look a little cleaner; hipster Django and RoR can suck on that one. You may also ask why I named this project "Onyx". As described here on Wikipedia, Onyx is a type of colorful layered quartz which contains bands of almost every color. This colorful layering reminded me of the layered structure of a file system: files, folders, bookmarks, etc. all mashed together. Hence, Onyx.
If you'd like to read a little more about my Onyx project, you might find this post interesting. Thoughts and feedback are always welcome.
Rock on.
What's bothersome to me is that I need a Google Account to see what Google supposedly knows about me. Well, what about those cute little .google.com cookies they shove into my browser when I use their search engine? IMHO, Google Dashboard is missing one key feature: the ability to clearly show me what Google knows about me and my web-search history, anonymously, based on the already unique ID tracking numbers in those cookies. Google, why do I need an account to see what you've learned about me based on my "anonymous" web-history?
There are probably only a few realistic explanations for why Google wouldn't let you see this information:
- Their cookies aren't actually used for tracking of web-searches and user habits. I suppose this is a possibility.
- Or, more likely, analyzing your web-search traffic is where the real bacon is. And, not surprisingly, Google doesn't want to show us the real underlying data their advertising engine uses to show us ads, which is their primary revenue stream. I guess I don't blame them. After all, they are just another public corporation with shareholder responsibilities.
I'm awfully tired of the world bending over and blindly accepting everything Google throws at us as the greatest thing since sliced bread. If you really understand how Google makes their money, you should also try to understand what Google is not showing us, or not telling us, and why.
Blocking Google Cookies in Firefox
For the most part, I've given up on Google. Their web-search is fine, but I don't particularly enjoy the fact that my web-search and browsing history is "anonymously" tracked behind my back. If you'd like to permanently, or temporarily, block Google from inserting their nosy tracking cookies into your browser you can easily do so by setting a "cookie exception" in Firefox (assuming you use Firefox):
- Click the Tools menu, and select "Options...".
- Click the Privacy tab.
- Click the "Exceptions..." button.
- In the "Address of web site:" box, enter ".google.com" (no quotes) and click Block to add the google.com domain to your blocked list.
A few blog readers astutely pointed out that if you block cookies from .google.com, you won't be able to log in to any Google services. Yes, I know that. And for the record, I don't use Gmail or any other Google Account that would require me to log in on a regular basis. When I need to log in to my Google Code account, I temporarily unblock .google.com, and log in.


