Engage the Cloaking Device: My presentation at SMX Advanced (slides and comments, too)

by Hamlet Batista | June 07, 2008 | 9 Comments

I promised everybody that I’d be posting my presentation slides from my talk at the SMX Advanced Bot Herding panel, so here they are!
First, let me say that I was very excited to be speaking at a major search marketing conference, and I can say with confidence that all the traveling was definitely worth it. My only regret is that I did not get to finish my presentation. This was the first time I had spoken publicly, and as an inexperienced speaker I wasn't even watching the timer. My apologies to all those in attendance. 🙂 Frankly, I do think speakers should be allowed a little more time at SMX Advanced, as you really do need time to lay the groundwork before delving deeply into these sorts of topics.
For those who couldn't attend, let me summarize the key takeaways from my talk and put them in the context of Google's recent post on Webmaster Central:

Cloaking: Serving different content to users than to Googlebot. This is a violation of our webmaster guidelines. If the file that Googlebot sees is not identical to the file that a typical user sees, then you’re in a high-risk category. A program such as md5sum or diff can compute a hash to verify that two different files are identical.

Basically, Google says that geolocation and IP delivery (when used for geolocation purposes) are fine as long as you present the same content to Googlebot that you would present to a user coming from the same region. Altering the content the robot sees puts you in "a high-risk category." Google is so strict that it suggests using a checksum program to verify you are delivering the same content. Obviously, it doesn't matter whether your intention is to improve the crawling and indexing of your site or not.
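To make Google's checksum suggestion concrete, here is a minimal sketch in Python (assuming the requests library is available; the URL and both user-agent strings are just placeholders) that fetches the same page once as Googlebot and once as a regular browser and compares MD5 hashes, which is essentially what running md5sum on the two saved files would tell you. Keep in mind that dynamic bits such as timestamps or session IDs will change the hash even when no cloaking is going on, and a determined cloaker would key on Google's IP addresses rather than the user-agent string.

```python
import hashlib
import requests  # assumption: the requests library is installed

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0"

def md5_of_page(url, user_agent):
    """Fetch the URL with the given User-Agent and return the MD5 hash of the body."""
    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    return hashlib.md5(response.content).hexdigest()

url = "http://www.example.com/some-page.html"  # placeholder URL
bot_hash = md5_of_page(url, GOOGLEBOT_UA)
browser_hash = md5_of_page(url, BROWSER_UA)

if bot_hash == browser_hash:
    print("Identical checksums: both requests received the same file.")
else:
    print("Different checksums: the server returned different content.")
```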
Why would you want to cloak anyway?
Let’s talk about the key scenarios I discussed in my speech:

  • Content accessibility

– Search-unfriendly content management systems. According to Google, if you are using a proprietary CMS that does not give you the flexibility to make URLs search-engine friendly, relies on cookie-based session IDs, or cannot produce unique titles and descriptions, you need to replace it with a newer one. Using a reverse proxy that cloaks to fix those issues is a "bad idea." Again: easy for Google, hard for the customer.

– Rich media sites. If you use Scalable Inman Flash Replacement (sIFR), SWFObject, JavaScript or CSS to render rich media content to the user and regular text to the search engine, you are fine: both receive the same HTML file and the replacement happens in the browser, so the checksums will match.

– Content behind forms. Google is experimenting with a bot that tries to pull content from behind simple forms by issuing HTTP GET requests with values taken from the form's HTML.

  • Membership sites

– Free and paid content. Google recommends registering premium content with Google News' First Click Free program. The idea is that searchers get the first page of your content for free and have to register for the rest (a rough sketch of how this check is typically implemented appears right after this list). This is very practical for newspapers that have resorted to cloaking in the past. I do see a problem with the technique for sites like SEOmoz, where some of the premium pages are paid guides: if SEOmoz signed up for this program, I would be able to pull up all the guides just by guessing search terms that would bring them up in the results.

  • Site structure improvements

– Alternative to PageRank sculpting via "nofollow." I explained a clever technique where you can cloak a different link path for robots than the one you present to regular users: the link path for users should focus on ease of navigation, while the link path for robots should focus on ease of crawling and deeper index penetration (a second sketch after this list illustrates the idea). This is very practical but not really mandatory.

  • Geolocation/IP delivery

– According to the post, we don't need to worry about this. Some good news at last!

  • Multivariate testing

– This is a very interesting case, and I would have liked Google to address it in the Webmaster Central post. Search engine robots don't take part in these experiments because they don't execute JavaScript, yet many users will see a different version of the page than the robot does. Because the variations are swapped in by JavaScript in the browser, the file the bot fetches and the file the user fetches produce the same checksum. I'm sure some clever black hats are taking advantage of this to do "approved" cloaking. 🙂
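Going back to the free and paid content scenario above, here is a rough sketch of how a First Click Free check is often wired up: serve the full article to Googlebot and to visitors arriving from a Google results page, and the registration wall to everyone else. This is my own simplified illustration rather than Google's reference implementation; the function and parameter names are hypothetical, and a production setup would verify Googlebot with a reverse DNS lookup instead of trusting the user-agent string.

```python
from urllib.parse import urlparse

def serve_article(headers, full_article_html, teaser_html):
    """Hypothetical request handler for a First Click Free setup.

    headers: dict of incoming HTTP request headers.
    Returns the HTML to send back to the visitor.
    """
    user_agent = headers.get("User-Agent", "")
    referrer = headers.get("Referer", "")
    referrer_host = urlparse(referrer).netloc.lower()

    is_googlebot = "googlebot" in user_agent.lower()
    came_from_google = "google." in referrer_host  # e.g. www.google.com, www.google.co.uk

    if is_googlebot or came_from_google:
        return full_article_html   # the first click from a search result is free
    return teaser_html             # everyone else gets the registration page
```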

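And here is a bare-bones illustration of the cloaked link path idea from the site structure item above: known robots get a flat, crawl-oriented set of links, while human visitors get the navigation that is easiest to use. The crawler list and helper function are hypothetical, and note that this is exactly the kind of user-agent-based difference that fails Google's checksum test, so take it as an explanation of the technique rather than a recommendation.

```python
# Hypothetical sketch of the cloaked link-path idea; the crawler list is illustrative.
KNOWN_CRAWLERS = ("googlebot", "slurp", "msnbot")

def pick_navigation(user_agent, user_nav_html, crawler_nav_html):
    """Return crawl-oriented links for known robots, regular navigation for people."""
    ua = user_agent.lower()
    if any(bot in ua for bot in KNOWN_CRAWLERS):
        return crawler_nav_html  # flat link structure aimed at deeper index penetration
    return user_nav_html         # navigation aimed at ease of use
```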
Google = Romulans
Just like the Romulans from Star Trek, Google doesn’t want cloaking technology in the hands of everyone. I didn’t get to talk about this in my presentation, but let me speculate as to why Google is drawing such a hard line on cloaking: Simply put, it is the easiest, cheapest and most scalable solution for them.
1. As a developer, I can tell you that running checksums on the content presented to Googlebot versus the content presented to their cloaking-detection bots is the easiest and most scalable way for them to catch cloaking.
2. As with the paid-links problem, it is easier to let us do all the work of labeling our sites so they can single out the bad guys, without having to dedicate a huge amount of resources to solving the problem themselves.
Enjoy the slides and feel free to ask any questions. If you were at SMX Advanced and watched me present, please let me know your honest comments; criticism can only help me improve. Let me know what you think of the slides, too. Originally I had planned to use more graphics than text, but ultimately I thought the advanced audience would appreciate the added information.

Hamlet Batista

Chief Executive Officer

Hamlet Batista is CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He holds US patents on innovative SEO technologies, started doing SEO as a successful affiliate marketer back in 2002, and believes great SEO results should not take six months.
