In yesterday’s post I explained my creative process for uncovering new and interesting search marketing ideas. In this post I want to focus on the other critical element toward becoming an expert: endless experimentation. Of course testing must be done carefully to avoid arriving at the wrong conclusions, which will bring us to another of my favorite topics: human error.
As I like to do, let me explain my process with an actual example.
Last month there was an interesting post on SEOmoz about session IDs and HTTP cookies. In the post, Rand asserted that search engines don’t support cookies, which makes cookies yet another way to control robot access to a site. Very clever; I don’t know how I didn’t think of that first! 🙂
Well, in the comments, King questioned the validity of the original assumption that search engines don’t accept cookies. Here is what he had to say:
I’m not sure its [sic] really true that search engines (Google at least) don’t accept cookies. I recently (well 6 months ago) created a site that checks for cookies before allowing customers access to the shopping cart. If cookies are disabled it sends the user to a[n] info page on the topic. Google indexed the actual shopping cart page perfectly well, they totally bypassed the “cookie info” page, and never indexed that at all. Cookie checking was done entirely via PHP code.
For a while I had assumed that Google does not support cookies, but the truth is that search engines are constantly being improved and have evolved over the years. For instance, years ago search engine crawlers did not follow links embedded in JavaScript, but recent experiments have shown that Google, at least, does follow the less intricate ones.
So, this was a perfect candidate for a simple experiment. Let’s confirm whether search engines accept cookies or not. As best I can, I like to follow the scientific method.
The observation
In order to determine whether or not search engines accept cookies, I configured my web server to append cookie information to my visitor log file. If you use Apache as your web server, this is how you do it: under your website configuration, redefine the combined log format (the one your CustomLog directive most likely already points at) to include the HTTP Cookie header, like this:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\"" combined
The reason I chose logs for my observation is that search engines do not execute the JavaScript tags commonly used by web-based analytics packages. I need to see the behavior of the robots on the site, so my logs are the most logical option. An alternative would be to use a packet sniffer such as tcpdump, but sniffers spit out far more information than I need, and parsing web server logs with regular expressions is simple and straightforward. There is no need to complicate things.
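To make "simple and straightforward" concrete, here is a minimal sketch in Python of the kind of parsing I have in mind. The regular expression matches the combined-plus-cookie format defined above; the field names and the access.log path are just placeholders for the example.

import re

# Matches the combined log format with "%{Cookie}i" appended, as defined above.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) (?P<identd>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<cookie>[^"]*)"$'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None

# Example: print the user agent and cookie field for every request.
with open("access.log") as log:  # the log path is just a placeholder
    for line in log:
        entry = parse_line(line)
        if entry:
            print(entry["agent"], "=>", entry["cookie"])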
First, I check the log for regular user visits to the site (especially on pages that return HTTP cookies) and confirm that the cookies are being logged when the user accepts (and returns) them.
195.62.206.192 - - [01/Mar/2008:03:03:04 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "http://hamletbatista.com/2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; MediaCenter PC 5.0; .NET CLR 3.0.04506; InfoPath.2)" "__utma=205505417.902185886.1204356928.1204356928.1204356928.1; __utmb=205505417; __utmc=205505417; __utmz=205505417.1204356928.1.1.utmccn=(organic)|utmcsr=google|utmctr=link%3Awww.mutinydesign.co.uk|utmcmd=organic; fbbb_=1343817243.1.1204357560473; subscribe_checkbox_88ce75a961c252a943f6a63bd04c8d5d=unchecked; comment_author_88ce75a961c252a943f6a63bd04c8d5d=Webeternity+web+design; comment_author_email_88ce75a961c252a943f6a63bd04c8d5d=goodsite%40webeternity.co.uk; comment_author_url_88ce75a961c252a943f6a63bd04c8d5d=http%3A%2F%2Fwww.webeternity.co.uk"
Here most of the cookies are from Google Analytics (the ones that start with __utm).
Side note: Here I confirmed my suspicion. My loyal reader David Hopkins is responsible for the large amount of manual comment spam I’m receiving lately. Apparently his competitors want to rank top ten in Google for “web design” too 🙂
If search engines support cookies, they should send them back with each request, and the server will log them in the corresponding entry.
Now, let’s see what happened when each of the top search engines visited the site.
Google – no cookies logged
74.52.123.218 - - [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
Yahoo – no cookies logged
74.6.28.203 - - [01/Mar/2008:19:18:36 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" "-"
MSN/Live – no cookies logged
65.55.209.101 - - [02/Mar/2008:06:22:15 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)" "-"
As you can see from the log, none of the top search engines returned the cookies and hence the web server didn’t log them.
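If you would like to reproduce this check on your own logs, here is a rough Python sketch of what I did by eye. The crawler name substrings and the access.log path are assumptions for the sake of the example.

# Rough sketch: scan the log for crawler requests and report how many of them
# actually sent a cookie back.
CRAWLERS = ("Googlebot", "Yahoo! Slurp", "msnbot")
hits = {name: 0 for name in CRAWLERS}
with_cookie = {name: 0 for name in CRAWLERS}

with open("access.log") as log:
    for raw in log:
        line = raw.rstrip()
        if not line.endswith('"'):
            continue  # not in the quoted-cookie format defined earlier
        # With "%{Cookie}i" appended, the cookie is the last quoted field;
        # a visitor that sends no cookie shows up as "-".
        cookie = line.rsplit('"', 2)[1]
        for name in CRAWLERS:
            if name in line:
                hits[name] += 1
                if cookie not in ("", "-"):
                    with_cookie[name] += 1

for name in CRAWLERS:
    print(name, "-", hits[name], "requests,", with_cookie[name], "with cookies")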
Formulation of the hypothesis
From this direct observation, my working hypothesis is that, as of this moment, the top search engines do not support HTTP cookies.
Use of the hypothesis to predict the existence of other phenomena
I can therefore predict that it’s possible to use cookies to control or modify the access of robots to my website. A lot of creative things can be done using this technique.
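As a purely illustrative sketch (not something running on this site, and not the PHP approach King described), here is roughly what cookie-based gating could look like, using Python’s built-in wsgiref server. Whether you would want to do this in production is a separate question; this only makes the prediction tangible.

from wsgiref.simple_server import make_server

def app(environ, start_response):
    """Serve different content depending on whether the client returned a cookie.

    A crawler that never accepts cookies will always see the "no cookie" branch,
    which is the behavior the prediction relies on.
    """
    if "visited=1" in environ.get("HTTP_COOKIE", ""):
        body = b"Returning visitor content (cookie present)."
        headers = [("Content-Type", "text/plain")]
    else:
        body = b"First visit, or a cookie-less robot."
        headers = [
            ("Content-Type", "text/plain"),
            ("Set-Cookie", "visited=1; Path=/"),  # browsers keep it, crawlers do not
        ]
    start_response("200 OK", headers)
    return [body]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()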
Performance of experimental tests of the predictions by several independent experimenters
Here is my call to you to duplicate these tests on your site and report back whether you get the same results. This is the step that most experimenters miss. You need to share your findings with your peers and the exact procedure you used to arrive at your conclusions, and invite them to test as well and see if they arrive at the same results. “But why do we need to do this?” you might ask. It’s because of human error.
Incorporating Human Error
We are imperfect and we make mistakes. I first learned this lesson years ago in a physics class I took in high school. The teacher taught us to repeat each measurement several times, average the results, and use the lowest and highest values as the error limits. The idea is that we must face the fact that we won’t know the exact value, but we can determine a reasonably accurate range. The concept of human error in that class was so interesting that I have never forgotten it (as you can see :-)).
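As a trivial illustration of that lesson (the numbers here are made up), the estimate is the average of the repeated measurements and the lowest and highest values bound the error:

# Hypothetical repeated measurements of the same quantity.
measurements = [9.78, 9.85, 9.81, 9.80, 9.83]

estimate = sum(measurements) / len(measurements)  # best estimate: the average
low, high = min(measurements), max(measurements)  # crude error limits

print("estimate: %.2f (somewhere between %.2f and %.2f)" % (estimate, low, high))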
Human error is not limited to just taking measurements. There are many psychological issues as well. Here are four common mistakes that I regularly make and that I see others making when they come up with new SEO theories:
1. Bias. Many times when you are testing a theory you already want it to be true, or you want it to be false. It is very hard to start experimenting without some prejudice about what you expect the outcome to be. At the same time, it is just as easy to ignore evidence that runs contrary to your desired outcome. Sometimes you want to believe so badly that you ignore what the data is actually telling you. This is particularly true when you are testing just to prove a point or to prove somebody else wrong.
2. Failing to estimate the errors in the experiment. As I explained above, there will always be a margin of error, and we need to account for it or risk losing the entire value of the experiment.
3. Failing to repeat the experiment under different scenarios and circumstances. I have to admit that this is one mistake I make too often. I guess I am too lazy to repeat my experiments, but nonetheless I know very well the importance of being able to repeat and confirm the conclusions. It is particularly important that the test be duplicated by your peers who will hopefully have different biases than you.
4. Identifying symptoms as diseases. One of the disadvantages we SEOs have is that search engines are black boxes and we don’t know for sure what is going on inside them. We can see the search results and study patterns to arrive at conclusions, but, to give one example, many of the observations we make are mistakenly labeled as penalties. It’s easy to jump headlong in the wrong direction. I like to draw this parallel: imagine telling your doctor that you have a headache and he comes back with, “Oh, you must have a brain tumor.” There are probably thousands of diseases that share a headache as a symptom, and only further tests are going to get at the right diagnosis. The same happens when you observe search results or search engine robot behavior. There are probably hundreds of reasons why a particular result has changed, including something as simple as a random search engine glitch. To be on the safe side, I simply ask myself a common-sense question: “What would be the purpose of the search engine doing this? Does this help them do a better job, or am I misinterpreting the results?”
Experimenting and testing theories is what separates experts from pretenders. You need to be highly skeptical of any new concept unless you can see solid proof or you can test it yourself. I’ve witnessed many interesting ideas and concepts unearthed not through deep research or deduction, but through observation and trial and error. It’s important to have a receptive mind. Try to avoid these mistakes when you are doing your SEO experiments and I am sure you will become a stellar SEO expert in no time!
Jez
March 7, 2008 at 4:24 pm
Hi Hamlet, nice to see you posting again! An interesting experiment. I really enjoyed reading the posts on link building strategies you linked to yesterday. Regarding your method for determining crawl rates, do you ever run into problems with the Google captcha? I had the same thing with David's competitors ;-)
Hamlet Batista
March 7, 2008 at 4:51 pm
Hi Jez, I am glad to see that I didn't lose my loyal readers! I don't know if you knew this, but the Google captcha has been broken too. I recommend you put random delays between your requests to avoid the captcha if you need to send automated queries. I read in their documentation that you can ask them for permission, so I will research their requirements for that.
Jez
March 7, 2008 at 5:30 pm
Yes, I read that a Russian team claimed to have broken it. I also read that you can scrape Google Accessible Search to avoid the captcha. Can you do this kind of thing with your API key?
Hamlet Batista
March 8, 2008 at 5:37 am
Jez - You are definitely committed to doing black-hat stuff :-) I prefer not to encourage black-hat tactics. Please drop me an email so we can chat.
Andy Beard
March 8, 2008 at 2:00 pm
Just a thought: how were the cookies being set? Hmm, Google captcha... I have played around a little with Google Docs and their various tools for importing data from the web, including SERPs. They actually give an example of using it with Google.co.uk to grab the links. For some reason the XPath I tried would work great with various XPath tools, but Google Docs kept coming up with errors. If someone is a slightly more competent programmer, they might find Google Docs a useful backdoor.
Hamlet Batista
March 9, 2008 at 4:24 am
"Just a thought: how were the cookies being set?" Andy - You got me :-) In reality I did not perform the tests on this site; I performed them on one of my money-making sites. The cookies are being set by our affiliate tracking system. I will rerun the test, and I would appreciate it if you can do the same. Google Docs? Interesting...
Gab Goldenberg
March 10, 2008 at 7:23 pm
Whooooosh... The sound of the technical stuff flying over my head. That said, I like your tying the scientific method to SEO :). Something I also loved about high school science!
Kristina
March 17, 2008 at 2:40 am
That is interesting, because I read that Google does not like cookies. However, like you said, things are evolving all the time. This can be viewed as a good or bad thing.
Hamlet Batista
March 17, 2008 at 5:11 pm
Gab - You know I can't help it. I need to write some technical stuff every once in a while :-) Kristina - thanks for your comment.
freebingo
May 19, 2008 at 11:49 pm
Seriously, I can never truly understand the function of a cookie. Why is it even called a cookie?