The Robots Exclusion Protocol: What it is, why you want it, and how to use it
As a site owner, you have some control over how search engines crawl and index your website. With a robots.txt file, you can ask Google's crawler and other search engine robots to stay away from certain pages. The file lists the parts of your site that you don’t want crawlers to access. A typical robots.txt file that lets spiders crawl your entire site looks like this:
User-agent: *
Disallow:
There are many reasons this may come in handy. By laying out a few rules in this text file, you can tell robots not to crawl entire directories within your site, single pages or images, or you can exclude nothing at all. To create a robots.txt file, create a plain text file, name it "robots.txt" and place it in the root directory of your site. A robots.txt file that tells crawlers to skip your feedback forms may look something like this:
User-agent: *
Disallow: /feedback/
The User-agent line specifies which crawler the instructions apply to, while the Disallow line specifies which parts of your site that crawler should not crawl. The * matches all crawlers. To give different robots different rules, add a separate User-agent block for each one.
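If you want to check how a set of rules will be read before you upload the file, one option is a small script. The sketch below uses Python's standard urllib.robotparser module. The crawler names "ExampleBot" and "AnyBot" and the www.example.com URLs are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration: a made-up crawler called
# "ExampleBot" is kept out of the whole site, while every other
# crawler is only kept out of /feedback/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /feedback/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The * block applies to any crawler that has no block of its own.
print(parser.can_fetch("AnyBot", "https://www.example.com/feedback/form.html"))
print(parser.can_fetch("AnyBot", "https://www.example.com/index.html"))

# ExampleBot has its own block, which disallows everything.
print(parser.can_fetch("ExampleBot", "https://www.example.com/index.html"))
```

Here can_fetch returns True when the given crawler is allowed to request the URL and False when a Disallow rule covers it, so the feedback form and anything requested by ExampleBot come back as blocked.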