Understanding How Dominant Google Really Is: Analyzing Your Apache Logs

I knew Google was the world's search leader, but I didn't really comprehend how dominant Google is until I took a look at my own Apache logs.  I ran a few crude commands to compile some basic numbers on all traffic coming into kolich.com since December of '08.  Specifically, I focused on the HTTP Referer (referrer) header; the referrer tells me where a request coming into my blog originated (e.g., which site).  Of course, anyone with enough skill can easily fake the referrer header on an incoming HTTP request, so my numbers won't be dead-on accurate.  But, for my own curiosity, they're good enough.  As it turns out, Google-owned referrers (Google search results, links to kolich.com from other Google pages, etc.) account for approximately 91.6% of the unique referrers sending traffic to my site.  That's awfully significant, considering Yahoo! only accounts for about 0.8%.  No wonder Yahoo! is hemorrhaging talent.

My Apache server logs everything to /var/log/httpd, which is the default log location on a base CentOS install.  Using the wonderful bash | (pipe), I can easily chain commands together to get what I need.

First, let's count how many unique referrers generated traffic into kolich.com:

(root@skull)/var/log/httpd> cat access_log* | \
awk '{print $11}' | \
grep -v "\"-\"" | \
grep -v "mark\.kolich\.com" | \
grep http | sort -u | wc -l
837

So, traffic arrived from 837 unique referrers.  This command chain is relatively simple.  First, I'm using awk to capture only the referrer field of the Apache access_log files (the 11th whitespace-separated field in Apache's combined log format).  Second, I'm using grep -v to strip out any empty referrer strings and any requests originating from my own domain.  For example, a click from mark.kolich.com to mark.kolich.com/page2.html does not count.  Third, I'm using grep http to keep only referrers that look like URLs, and sort -u to reduce them to the set of unique strings.  And finally, I'm using wc -l to count the number of lines in the result.
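To see what each stage of the chain does, here is a minimal sketch that runs the same pipeline against a tiny, made-up log file.  The five log lines, the IP addresses, and the temp file are all assumptions for illustration; they are not real kolich.com traffic.

```shell
# Made-up sample data in Apache "combined" log format (not real traffic).
log=$(mktemp)
cat > "$log" <<'EOF'
1.2.3.4 - - [23/Jan/2009:09:45:00 -0800] "GET / HTTP/1.1" 200 512 "http://www.google.com/search?q=kolich" "Mozilla/5.0"
1.2.3.5 - - [23/Jan/2009:09:46:00 -0800] "GET / HTTP/1.1" 200 512 "http://www.google.com/search?q=kolich" "Mozilla/5.0"
1.2.3.6 - - [23/Jan/2009:09:47:00 -0800] "GET /a.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"
1.2.3.7 - - [23/Jan/2009:09:48:00 -0800] "GET /b.html HTTP/1.1" 200 512 "http://mark.kolich.com/a.html" "Mozilla/5.0"
1.2.3.8 - - [23/Jan/2009:09:49:00 -0800] "GET / HTTP/1.1" 200 512 "http://search.yahoo.com/search?p=kolich" "Mozilla/5.0"
EOF

# Same chain as above, reading the sample file instead of access_log*.
unique=$(awk '{print $11}' "$log" | \
  grep -v "\"-\"" | \
  grep -v "mark\.kolich\.com" | \
  grep http | sort -u | wc -l)

# $((...)) strips any padding some wc implementations add.
echo "unique referrers: $((unique))"
rm -f "$log"
```

The empty referrer and the self-referral are dropped, and the two identical Google lines collapse into one, so the sample leaves two unique referrers.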

Now that I have an aggregate total, let's find out how many referrers were NOT from Google:

(root@skull)/var/log/httpd> cat access_log* | \
awk '{print $11}' | \
grep -v "\"-\"" | \
grep -v "mark\.kolich\.com" | \
grep http | sort -u | \
grep -v google | wc -l
70

So, only 70 unique referrers were non-Google sources.  This is basically the same command as before, but I added a grep -v google near the end of the chain to ignore any referrers containing the string "google".
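One caveat with grep -v google: it matches the string anywhere in the referrer, so a URL like http://example.com/?q=google would be counted as Google too.  A stricter variant, sketched below, compares only the hostname.  The referrer_hosts helper and the sample referrers are hypothetical, and the hostname pattern is a rough assumption rather than a complete list of Google-owned domains.

```shell
# Hypothetical helper: read quoted referrer strings on stdin and
# print only the lowercased hostname of each http/https URL.
referrer_hosts() {
  tr -d '"' | awk -F/ '$1 ~ /^https?:$/ { print tolower($3) }'
}

# Made-up sample referrers, in the same quoted form that awk '{print $11}' emits.
sample='"http://www.google.com/search?q=kolich"
"http://example.com/?q=google"
"-"'

# Count unique hosts that are google.* or a subdomain of google.*
google_hosts=$(printf '%s\n' "$sample" | referrer_hosts | sort -u | \
  grep -c -E '(^|\.)google\.')
echo "$google_hosts"
```

In this sample only www.google.com counts; the example.com referrer no longer does, even though its query string mentions Google.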

Now, if we do the math on the numbers we have so far, it's clear that Google accounts for 91.6% of the unique referrers sending traffic into kolich.com:

(root@skull)/var/log/httpd> bc -lq
837-70
767
(767/837)*100
91.63679808841099163600
quit
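If you'd rather not do this interactively, the same arithmetic fits in a single awk call.  This is just a sketch using the two counts from above; the variable names are my own.

```shell
total=837        # unique referrers overall
non_google=70    # unique referrers not containing "google"

# Percentage of unique referrers that are Google, to one decimal place.
pct=$(awk -v t="$total" -v n="$non_google" \
  'BEGIN { printf "%.1f%%", (t - n) / t * 100 }')
echo "$pct"
```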

Just for grins, let's see how much traffic originates from Yahoo!:

(root@skull)/var/log/httpd> cat access_log* | \
awk '{print $11}' | \
grep -v "\"-\"" | \
grep -v "mark\.kolich\.com" | \
grep http | sort -u | \
grep -i yahoo | wc -l
7

Wow, a whopping 7!  Let's do the math on that:

(root@skull)/var/log/httpd> bc -lq
(7/837)*100
.83632019115890083600
quit

So, only 0.8% of the unique referrers sending traffic into kolich.com are from Yahoo!.  Google is over 91%.  The remaining 7.5% or so (the other 63 of 837, assuming none of the Yahoo! referrers also contain "google") came from other non-Google and non-Yahoo! sources.
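The whole breakdown can also be produced in one pass instead of three separate chains.  The sketch below assumes a list of unique referrers on stdin (for example, the output of the sort -u stage); the referrer_breakdown function is hypothetical, and unlike the case-sensitive grep -v google used earlier, it matches case-insensitively.

```shell
# Hypothetical helper: tally unique referrers on stdin into
# google / yahoo / other and print each share as a percentage.
referrer_breakdown() {
  awk '
    { r = tolower($0) }
    r ~ /google/ { g++; next }
    r ~ /yahoo/  { y++; next }
    { o++ }
    END {
      if (NR == 0) exit
      printf "google %.1f%%\nyahoo %.1f%%\nother %.1f%%\n",
             g / NR * 100, y / NR * 100, o / NR * 100
    }'
}

# Made-up sample input: two Google referrers, one Yahoo!, one other.
printf '%s\n' \
  'http://www.google.com/a' \
  'http://www.google.co.uk/b' \
  'http://search.yahoo.com/c' \
  'http://example.org/d' | referrer_breakdown
```

For the four sample referrers this prints 50.0% for Google and 25.0% each for Yahoo! and other.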

Now you see how dominant Google really is.

About Mark

A Silicon Valley native, Mark Kolich is a full-time Software Engineer, a casual entrepreneur, and a consultant for hire. A web technologies expert, his current focus is on building powerful and robust cloud-driven web-applications using Java, PHP, Perl, AJAX, DHTML, CSS, and JavaScript. His favorite programming languages are PHP, Java and JavaScript. He uses Linux, enjoys biking to work, loves building great software, and always writes elegant, readable, and maintainable code.

About this Entry

This page contains a single entry by Mark Kolich published on January 23, 2009 9:45 AM.
