Usually I don’t cover basic material in this blog, but as a loyal reader, Paul Montwill, requested it, I’m happy to oblige. As I learned back in school, if one person asks a question, there are probably many others at the back of the class quietly wondering the same thing. So here is a brief explanation of web server redirects and their use to solve URL canonicalization issues.
And just what is that ecclesiastic-sounding word “canonicalization”? It was Matt Cutts and not the Pope who made it famous, when he used the term to describe a certain issue that popped up at Google. Here is the problem. All of us have these URLs:
1) sitename.com/
2) sitename.com/index.html
3) www.sitename.com
4) www.sitename.com/index.html
You know they are all the same page. I know they are all the same page. But computers — unfortunately, they aren't on the same page. They aren’t that smart and need to be told that each one of these addresses represents the same page. One way is for you to pick one of them and use it consistently in all your linking. The harder part, however, is getting other website owners linking to you to do the same. Some might use one, others another, and a few are bound to choose a third.
The best way to solve this is to pick one URL and have your web server automatically force all requests for other variations to go to the one you picked. We can use HTTP redirects to accomplish this.
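To make the idea concrete, here is a minimal Python sketch of what "pick one URL" means for the four variations listed above. The function name `canonicalize` and the choice of the non-www, no-index.html form are assumptions made for illustration only; a real site would enforce this in the web server, as described next.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Map the common variations of a home page URL to one canonical form.

    Assumption for this sketch: the canonical form has no "www." prefix
    and no trailing "index.html".
    """
    # Bare addresses such as "www.sitename.com" have no scheme, so add one.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)

    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    # An empty path and "/index.html" both mean the site root here.
    path = parts.path or "/"
    if path == "/index.html":
        path = "/"

    return urlunsplit((parts.scheme, host, path, parts.query, ""))


if __name__ == "__main__":
    for variant in (
        "sitename.com/",
        "sitename.com/index.html",
        "www.sitename.com",
        "www.sitename.com/index.html",
    ):
        print(variant, "->", canonicalize(variant))
```

All four variations collapse to the same address, which is exactly what the crawler needs to be told.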
HTTP redirects are simply web server responses of this form (simplified here; in the real response the status code sits on the first line and the new address is carried in a Location header):
HTTP 30x http://anotherurl.com
The number 30x is a status code in the range 300–307. The most commonly used are 301 and 302. (For a more complete description of each of the status codes, please read the HTTP Request For Comments (RFC2616), section 10.) We only need to use 301, which is the permanent redirect. This status code tells the crawler that the requested page now lives permanently at the address given in the response. For example, you may want http://sitename.com to be your canonical page (like I do for my blog). If a visitor or a crawler requests http://www.sitename.com, you want the web server to send back HTTP 301 http://sitename.com so that the crawler 'understands' that this is the proper, canonical page.
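For the curious, the sketch below shows roughly what such a response looks like on the wire: a status line followed by a Location header carrying the new address. The helper names `build_redirect` and `parse_redirect` are hypothetical and exist only for this illustration; a real web server sends more headers than this.

```python
def build_redirect(location: str, status: int = 301) -> str:
    """Build a bare-bones HTTP redirect response as text."""
    # Reason phrases as given in RFC 2616 for the two common codes.
    reasons = {301: "Moved Permanently", 302: "Found"}
    return (
        f"HTTP/1.1 {status} {reasons[status]}\r\n"
        f"Location: {location}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )


def parse_redirect(raw: str):
    """Return (status code, Location value) from a raw response."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split()[1])
    headers = dict(line.split(": ", 1) for line in header_lines)
    return status, headers.get("Location")


if __name__ == "__main__":
    response = build_redirect("http://sitename.com/")
    print(response)
    print(parse_redirect(response))
```

A browser or crawler does the equivalent of `parse_redirect`: it reads the 301, picks up the Location value, and requests that address instead.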
How do we do that?
There are two ways we can accomplish this with Apache: a basic one and an advanced one. Keep in mind that the basic one does not help with www vs non-www issues, though, because its directives match only the path part of the URL and never look at the host name. It involves using the mod_alias module and one of its directives: Redirect, RedirectPermanent or RedirectMatch.
In your .htaccess file, add one of these:
Redirect 301 /index.html http://sitename.com/
RedirectPermanent /index.html http://sitename.com/
RedirectMatch 301 /(.*)\.html http://sitename.com/$1.html
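The RedirectMatch line is the least obvious of the three, so here is a small Python sketch of how its regular expression and the $1 placeholder behave. This only imitates the matching step and is not how Apache is implemented; the function name `redirect_target` is made up for the example.

```python
import re

# Same pattern as in the RedirectMatch line: capture everything between
# the leading slash and the ".html" extension.
PATTERN = re.compile(r"/(.*)\.html")


def redirect_target(path: str):
    """Return the redirect address for a request path, or None if no match."""
    match = PATTERN.search(path)
    if match is None:
        return None
    # match.group(1) plays the role of $1 in the Apache directive.
    return f"http://sitename.com/{match.group(1)}.html"


if __name__ == "__main__":
    for path in ("/about.html", "/blog/post.html", "/images/logo.png"):
        print(path, "->", redirect_target(path))
```

Note that a target on the same host that still ends in .html would match the pattern again on the next request, so a rule of this shape is normally pointed at a different host or a different path.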
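The more advanced one, which I recommend, is the one that I use. It involves using the mod_rewrite module. Note that the RewriteLog and RewriteLogLevel directives are only valid in the server or virtual host configuration, not in .htaccess, and that inside .htaccess the rule pattern is matched without the leading slash. Here is what my Apache configuration looks like: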
# URL Rewriting
RewriteEngine on
RewriteLog logs/rewrite.log
RewriteLogLevel 0
RewriteCond %{HTTP_HOST} ^www\.hamletbatista\.com [NC]
RewriteRule ^/(.*) http://hamletbatista.com/$1 [R=301,L]
As you have probably noticed, I prefer http://hamletbatista.com. If I wanted http://www.hamletbatista.com/ instead, I would rewrite it this way:
RewriteCond %{HTTP_HOST} ^hamletbatista\.com [NC]
RewriteRule ^/(.*) http://www.hamletbatista.com/$1 [R=301,L]
If it were a regular website and not a blog, I'd add this line too:
RewriteRule ^/index\.html$ http://hamletbatista.com/ [R=301,L]
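If the RewriteCond and RewriteRule pair looks cryptic, the following Python sketch walks through the same decision: check the host name first, then capture the path and rebuild the address on the preferred host. It is a simplified imitation under stated assumptions, not mod_rewrite itself, and the names `HOST_RULE`, `PATH_RULE` and `rewrite` are hypothetical.

```python
import re

# Counterpart of: RewriteCond %{HTTP_HOST} ^www\.hamletbatista\.com [NC]
# re.IGNORECASE stands in for the [NC] (no case) flag.
HOST_RULE = re.compile(r"^www\.hamletbatista\.com", re.IGNORECASE)

# Counterpart of the pattern in: RewriteRule ^/(.*) ...
PATH_RULE = re.compile(r"^/(.*)")


def rewrite(host: str, path: str):
    """Return (status, location) when a redirect applies, else None."""
    # The condition is evaluated against the Host header of the request.
    if not HOST_RULE.search(host):
        return None
    match = PATH_RULE.search(path)
    if match is None:
        return None
    # group(1) corresponds to $1; 301 corresponds to the R=301 flag.
    return 301, "http://hamletbatista.com/" + match.group(1)


if __name__ == "__main__":
    print(rewrite("www.hamletbatista.com", "/about/"))
    print(rewrite("hamletbatista.com", "/about/"))
```

Requests that already arrive on the preferred host fall through untouched, which is why the rule does not redirect forever.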
As always, when you begin playing with files like these, it’s a good idea to check the Apache documentation for more details. It may not be the Bible, but for canonicalization issues, it’s as good as gospel.
Hamlet Batista
July 19, 2007 at 8:24 am
Mutiny, there is another post coming shortly that will answer your question. Sorry about the image, my designer was not available in time to include one.
Jason
July 19, 2007 at 10:19 am
Mod_Rewrite is certainly the way to go, and will offer the most flexibility when solving problems like this. However, WordPress users who might not want to edit their .htaccess file or touch mod_rewrite can get away with using Justin Shattuck's WWW Redirect Plugin (http://www.justinshattuck.com/wordpress-www-redirect-plugin/). It's dead simple and can solve this problem with just a few clicks.
Hamlet Batista
July 19, 2007 at 5:48 pm
Jason - Thanks for the link. I will check that plugin out.
Paul Montwill
July 25, 2007 at 12:27 am
Thanks for the post, Hamlet. I like your theory about one person asking a question. Usually I was the one in the class that kept asking :)
Hamlet Batista
July 26, 2007 at 5:33 pm
Good to know that you are not afraid to ask ;-)