robots.txt File

 


You can use robots.txt to disallow search engine crawling of specific directories or pages on your web site.

Patterns in the robots.txt file are matched against the request URI, which starts with a slash ("/"). Therefore, exclusions should also start with a "/":

User-agent: *
Disallow: /old/

Directory names should end with a trailing slash ("/") so that the pattern does not also match other names that begin with the same prefix. For example, the URI "/oldies.html" does not match the pattern in the example above, whereas it would match "Disallow: /old".
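The effect of the trailing slash can be checked with a short script. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed and the sample paths are assumptions made for illustration only.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text.

    Hypothetical helper for illustration; it parses the rules from a string
    instead of downloading them from a site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


WITH_SLASH = "User-agent: *\nDisallow: /old/\n"
WITHOUT_SLASH = "User-agent: *\nDisallow: /old\n"

print(is_allowed(WITH_SLASH, "/old/page.html"))   # False: inside the directory
print(is_allowed(WITH_SLASH, "/oldies.html"))     # True: only shares a prefix
print(is_allowed(WITHOUT_SLASH, "/oldies.html"))  # False: "/old" matches the prefix
```

Without the trailing slash, the rule is a plain prefix and therefore also blocks "/oldies.html".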

The importance of robots.txt should not be underestimated. For example, we learned from experience that Google Search favored the printer-friendly PDF versions of the pages on this site over the HTML documents in its search results, so it was important to disallow robots from crawling the PDF files:

User-agent: *
Disallow: /old/
Disallow: /*.pdf

Some robots may not recognize patterns with wildcards such as "*", so those exclusions should appear last in a User-agent group.
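How a wildcard-aware robot might interpret such a pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "*" is taken to match any run of characters and a trailing "$" to anchor the end of the URI, which is how the major search engines document these characters. The function names rule_to_regex and rule_matches, the rule list, and the sample paths are hypothetical. Python's own urllib.robotparser is not used here because, as far as we can tell, it only does plain prefix matching.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regular expression.

    Assumed semantics: "*" matches any sequence of characters and a
    trailing "$" anchors the pattern at the end of the URI.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(rule: str, path: str) -> bool:
    """Return True if the request URI `path` matches the pattern `rule`."""
    # re.match anchors at the start, so a rule is always a prefix match.
    return rule_to_regex(rule).match(path) is not None


RULES = ["/old/", "/*.pdf$"]  # hypothetical Disallow patterns

for path in ("/docs/guide.pdf", "/docs/guide.pdf.html",
             "/old/index.html", "/oldies.html"):
    blocked = any(rule_matches(rule, path) for rule in RULES)
    print(path, "blocked" if blocked else "allowed")
```

A robot that does not understand wildcards would instead treat "/*.pdf$" as a literal prefix, which matches nothing on a typical site, so the PDF files would remain crawlable for that robot.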

Alternatives to the robots.txt file include:

  • the <meta name="robots" content="..."/> tag in the head of a page, for example with content="noindex"
  • the rel="nofollow" attribute on a link, which doesn't necessarily prevent a search engine from indexing the target page but keeps it from discovering the page via the link on which the attribute appears
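To check which directives a page declares in its robots meta tag, something like the following sketch could be used. It relies only on Python's standard html.parser module; the class name RobotsMetaParser, the function robots_directives, and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # The content attribute holds a comma-separated list of directives.
        self.directives.extend(
            item.strip().lower() for item in content.split(",") if item.strip()
        )


def robots_directives(html: str) -> list:
    """Return the robots meta directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


SAMPLE = '<html><head><meta name="robots" content="noindex, nofollow"/></head></html>'
print(robots_directives(SAMPLE))  # ['noindex', 'nofollow']
```

Note that a robot can only see this tag if it is allowed to fetch the page, so a page that is disallowed in robots.txt cannot also rely on its meta tag being read.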

For more information, see the Search Engine Optimization (SEO) Tutorial.

