If you want to become an expert, you need to start thinking like one. People perceive you as an authority in your field not because you claim to be one, but because of what you say and what you write. From my personal experience, the key seems to be the originality, usefulness and depth of what you have to share. Recently I was honored to contribute to a link-building project. I want to share my idea with you, but more than that, on this blog I like to take extra time to explain the thought process that helped me come up with the idea in the first place.
The Challenge
For a long while, Toolbar PageRank was a very important factor in measuring the quality of a link. But Google has played with it so much that it can hardly be considered reliable these days. I like to see problems like these as challenges and opportunities, so I decided to look hard for alternatives. I know there are several other methods (like using the Yahoo backlink count, number of indexed pages, etc.), but I did not feel these directly reflected how important the link was to Google, or to any other specific search engine. Each search engine has its own evaluation criteria when it comes to links, so using metrics from one to measure another is not a reliable gauge in my opinion.
I knew the answer was out there, and I knew just where to look.
Research: Putting the pieces together
Especially beyond the beginner stage, you need to make it a habit to research and read as much as possible. For advanced SEO knowledge, my favorite sources of research material are search engine–related patents and information retrieval research papers. To avoid getting lost while reading such documents, I recommend first reading Dr. Garcia’s tutorials on linear algebra and information retrieval. Bill Slawski also has an excellent blog where he unearths useful patents and provides insightful commentary. In fact, Bill unearthed a very relevant patent, which provided a valuable insight for my Toolbar PageRank challenge. It is called “Anchor tag indexing in a web crawler system.” Here is part of Bill’s conclusion:
The information about crawling rates, and the possible role of PageRank along with frequency of changes in content, which could influence how frequently a page is crawled is also the most detailed that I can recall seeing in a patent from Google.
From the patent I learned that search engines (at least Google) define their crawler priorities based in part on how important they think the page is and how frequently the page is updated.
So here comes a logical conclusion. If Google (and maybe other search engines) use the real PageRank or importance of the page as a criterion to determine how frequently they should visit the page, then by studying how frequently they visit a page I can indirectly determine the real PageRank or importance of the page to the search engine. Bingo!
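To make the inference concrete, here is a minimal sketch in Python of how observed crawl times could be turned into a relative ranking. The page names and timestamps are hypothetical, and the function names are my own for illustration. Note the assumption built into it: because update frequency also influences crawl rate, a shorter interval is only a rough, relative hint of importance, not a measurement of PageRank itself.

```python
from datetime import datetime
from statistics import mean


def mean_crawl_interval_hours(crawl_times):
    """Average gap, in hours, between consecutive distinct crawl timestamps."""
    ordered = sorted(set(crawl_times))
    if len(ordered) < 2:
        return None  # not enough observations to estimate a rate
    gaps = [
        (later - earlier).total_seconds() / 3600
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return mean(gaps)


def rank_by_crawl_frequency(observations):
    """Return page names ordered from most to least frequently crawled."""
    intervals = {
        page: mean_crawl_interval_hours(times)
        for page, times in observations.items()
    }
    # Pages with fewer than two observations cannot be ranked yet.
    measurable = {page: gap for page, gap in intervals.items() if gap is not None}
    return sorted(measurable, key=measurable.get)


# Hypothetical crawl timestamps collected for two candidate link pages.
observations = {
    "page-a": [datetime(2008, 3, 1, 12), datetime(2008, 3, 2, 12), datetime(2008, 3, 3, 12)],
    "page-b": [datetime(2008, 3, 1, 12), datetime(2008, 3, 8, 12)],
}

ranking = rank_by_crawl_frequency(observations)
```

In this made-up data, page-a is revisited daily and page-b weekly, so page-a comes first in the ranking. Comparing pages that change at similar rates would make the comparison fairer.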
Digging Deeper: Your own research
If you want to dig deeper, and you definitely should, you must learn to find valuable sources yourself and come up with your own conclusions. I personally love Google’s patent search. You simply need to use keywords like “pagerank,” “yahoo,” “google,” “anchor text,” and so on; you can order them by date and select “Issued Patents” or “Patent Applications” to come up with a goldmine of information and topics. Many of the ideas are never implemented, but some are—and it is good to be a step ahead of the rest of your competitors. In order to find interesting research papers you can use Google Scholar and type in queries related to search engine research. The patent search usually returns more recent documents, however.
(Another rich source of research material is the bibliography and references of these research papers and patents. Sometimes I follow so many references that I lose track and focus, so beware.)
Now, the real trick is how you go about reading this information—back to front. There’s no need to read the entire paper from start to finish to find something valuable. Check the title and if it is interesting enough, read the abstract. If the abstract is interesting then you can go straight to the conclusion and learn the most valuable information. Why? Because most papers concentrate on proving their conclusions, but since the authors have already taken great pains to do so, you don’t need to waste your time, too! Of course if you are curious about something you always have the option to go back to the explanation and see how the author came up with it.
Making Something Useful
Now, with my challenge identified and my research bearing valuable fruit, I just needed to put it all together to create a solution. As a technical guy, my job was now to figure out how to determine the crawl rate and update rate of the page. In reality, it didn’t take much research. Google stamps the cache pages with the time of the crawl/download, and I was already familiar with the concept of “conditional HTTP GET” (this is explained in the article). It was just a matter of putting all the pieces together. I also included the indexing rate because it is useful in detecting duplicate content issues and hard-to-detect penalization problems, but it is not a requirement for the technique to work.
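For readers unfamiliar with a conditional HTTP GET, here is a minimal Python sketch using only the standard library. It is not the code from the full article; the function names are my own, and it assumes the target server honors the If-Modified-Since header (servers that ignore it will always answer 200, which would overstate the update rate).

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime


def build_conditional_request(url, last_seen):
    """Build a GET request asking for the page only if it changed after last_seen (UTC)."""
    http_date = format_datetime(last_seen, usegmt=True)
    return urllib.request.Request(url, headers={"If-Modified-Since": http_date})


def changed_since(url, last_seen, timeout=10):
    """Return True if the server sends a fresh copy, False on 304 Not Modified."""
    request = build_conditional_request(url, last_seen)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return False
        raise


# Example timestamp only; in practice this would be the time of your last check.
last_check = datetime(2008, 3, 5, 13, 3, 53, tzinfo=timezone.utc)
request = build_conditional_request("http://example.com/", last_check)
```

Calling changed_since(url, last_check) at regular intervals and recording how often it returns True gives a rough update rate for the page, which can then be compared with the crawl times taken from the cache stamps.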
I hope you’ll read the full article to appreciate this new technique to evaluate links and pages.
Final Thoughts
Here’s a schematic to generalize the concepts explained here for your own challenges.
- Identify a problem/challenge and note the major roadblocks.
- Research and test alternative solutions to the problem. Each problem can be solved in many different ways. Some are better than others depending on what the goals and constraints are.
- Put all the pieces of your research and testing together by making logical connections. It is incredible how simple ideas can turn into great ones by combining the right pieces of information.
- Share and get peer feedback.
- Adjust your idea based on constructive criticism.
- Go back to step 1.
Arian
March 7, 2008 at 2:23 am
Hi Hamlet. It is always a pleasure to read your articles because you seem to be one of the few guys who actually think topics through before they post. So much for the flattery, I have a question about your article: To my humble mind, the cache date should be earlier than the index date, right? But if we look at this: Link Building Secrets by Hamlet Batista The world's top link building experts get together to provide a single source of never-before revealed link building tips, tricks and strategies. .../seo-sem/link-building-secrets/hamlet-batista.php - 24k - 1 Mar 2008 This is G o o g l e's cache of .../hamlet-batista.php as retrieved on 5 Mar 2008 13:03:53 GMT. So the cache date is actually later. But you said in your article: "The cache date is logically sooner." Am I not getting something? Cheers, Arian
Hamlet Batista
March 7, 2008 at 5:27 am
Hi Arian, You are right. I thought "earlier" and "sooner" meant the same. I meant sooner in the sense that the cache date will be closer to the current time. Thanks for your kind words. I am glad you enjoyed the article.
Arian
March 7, 2008 at 5:41 am
Thanks for the clarification, Hamlet. So the date we see on the cached version of a page shows the time it was indexed, because obviously the time a page was indexed must always be later than the time it was crawled. The date that is shown in the SERP, on the other hand, is the date the page was crawled. Just wanted to make sure I get this right... Cheers
Hamlet Batista
March 7, 2008 at 6:00 am
No, it is the reverse of what you are saying. Indexing happens after crawling, but you normally see a fresher cache because caching is easy and happens faster than indexing. Indexing is a more complex process and takes more time. From observation I can tell they use incremental crawling, but I need to research a little bit to understand the criteria they use to make the updates.
Arian
March 7, 2008 at 6:25 am
I see, that makes sense. I'll be following your progress. Cheers
Gavin Mitchell
March 10, 2008 at 2:40 am
Nice article Hamlet (and in case anyone hasn't already done so, I recommend downloading the full PDF). I don't think I've seen any tools that analyse Crawl or Index rates to measure link value before. Another good signal to add into the mix - not least because it comes direct from Google rather than a 3rd party. There is definitely an over-reliance on the Yahoo backlinks query at present. No single measure is ever going to be perfect, but a tool that brings them all together would be a great time-saver.
MikeTek
March 10, 2008 at 5:34 am
This is a great post, Hamlet. I actually read it Friday night first and haven't had the chance to comment on it until now. I find your advice to use search engine patents to derive methods of developing SEO metrics simply invaluable. I learned quite a bit from this post - and found myself alive with new questions as to how we might use existing patent documentation in our research. In short, thank you very much for sharing your line of thinking here. I expect to have this post bookmarked for quite some time, and I will certainly be referring my colleagues to your blog in the future when they're looking for cutting edge SEO thinking. Cheers, Mike
Hamlet Batista
March 10, 2008 at 12:59 pm
Gavin - Thanks for your kind words. I agree that no single tool/process is enough. The more we add to our arsenal, the better. Mike - I am really glad you enjoyed the article.
Tin Pig
March 11, 2008 at 12:21 pm
Hamlet, Kudos on an excellent post. Given the current weight given to back-links, this is surely an excellent way to get a more accurate measure of page / link value. I haven't read the full paper, and forgive me if the answer to this question is provided there, but in your experience, how long do you need to collect data before getting an accurate measure of page value? Or does Google maintain a cache history for a given page? Also, will you attempt to compute the value of a given individual link based on external out-bound links, etc.? Again, great post!
Jon Roberts
April 6, 2008 at 10:30 am
Do you think that the Yahoo backlink count is a credible alternative to Google PageRank? I agree that Google has messed with it quite a lot, but I don't agree that this has rendered it unreliable - just very difficult!
Tina
May 8, 2008 at 5:30 am
Dear Hamlet: Thank you for adding such useful information here. I personally agree with your ideas, techniques and the methods you describe. I feel this SEO topic and the information are very useful both for the current legends in the market and especially for newcomers. We hope you will keep sparing your time to add more useful information that is appropriate and helpful for the users. This topic has really helped us very much. Thank you. Kind regards, Tina