RE: Disallowing files for robots

by Roxy <4Roxy(at)autumnweb.com>

 Date:  Sun, 26 Nov 2000 22:59:30 -0500
 To:  hwg-techniques(at)hwg.org
 References:  hotmail
At 08:58 PM 11/26/00 , Bob wrote:
>  Here is the correct robots .txt for the search engines.
>
><META NAME="Robot" CONTENT="NOINDEX">
><META NAME="Robot" CONTENT="NOFOLLOW">
>
>bob

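Note that those lines are meta tags, which go in the <HEAD> of an 
individual web page; they are not a robots.txt file. (The standard name is 
"ROBOTS", e.g. <META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">.) Some SEs 
don't "obey" the meta tag. The stats from my host and my log files show the 
number of times the robots.txt file is accessed. I can see that dozens of 
robots check that file weekly, and most will obey it.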
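
For anyone who wants to pull the same kind of count out of a raw access 
log, here is a minimal Python sketch. It assumes the host writes Apache 
"combined" format logs; the sample lines, the ExampleBot agent name, and the 
function name count_robots_txt_hits are made up for illustration.

```python
import re
from collections import Counter

# One line of Apache "combined" log format (an assumption about the host's logs).
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_robots_txt_hits(lines):
    """Return a Counter mapping user-agent string -> number of /robots.txt requests."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        # Ignore any query string when comparing the requested path.
        if match.group("path").split("?", 1)[0] == "/robots.txt":
            hits[match.group("agent")] += 1
    return hits


# Hypothetical sample data, not taken from any real log.
SAMPLE = [
    '10.0.0.1 - - [26/Nov/2000:20:58:00 -0500] "GET /robots.txt HTTP/1.0" 200 68 "-" "ExampleBot/1.0"',
    '10.0.0.2 - - [26/Nov/2000:21:03:10 -0500] "GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.0"',
    '10.0.0.1 - - [26/Nov/2000:22:15:42 -0500] "GET /robots.txt HTTP/1.0" 200 68 "-" "ExampleBot/1.0"',
]

if __name__ == "__main__":
    for agent, count in count_robots_txt_hits(SAMPLE).most_common():
        print(count, agent)
```

To run it on a real log, pass an open file instead of SAMPLE. A robot that 
fetches robots.txt has only read it; whether it obeys it is a separate 
question.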

http://info.webcrawler.com/mak/projects/robots/exclusion-user.html
and
http://info.webcrawler.com/mak/projects/robots/exclusion-admin.html
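
As a rough illustration of what those pages describe, the sketch below 
holds a small robots.txt file in a string and checks it with Python's 
standard urllib.robotparser module. The disallowed paths, the ExampleBot 
name, and the www.example.com host are placeholders, not anything from a 
real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; on a real site this is a plain-text
# file served from the top level, e.g. http://www.example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/notes.html
"""


def is_allowed(robots_txt, agent, url):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/report.html", "/drafts/notes.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

This only shows how a well-behaved robot would read the file. Nothing in 
robots.txt forces a robot to comply.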

There's LOTS of information about meta tags at 
http://www.searchenginewatch.com/ and they also cover the robots.txt file.

Many SEs have a "help" section, where they explain how to stop an SE spider 
from accessing certain files. For example, AltaVista has a whole tutorial 
section on building web pages. They specifically state that they obey the 
"Robot Exclusion Standard":

http://doc.altavista.com/adv_search/ast_haw_avoiding.html

and if you follow enough links, you'll find the information at

http://info.webcrawler.com/mak/projects/robots/faq.html#prevent

I hope that helps,
Roxanne
Not just putting your business on the Web
Promoting your business on the Web!
Autumn Web ~ http://autumnweb.com/
design / development / promotion / search engine optimization
+ tutorials, web page help, free graphics for personal sites.
* -- * -- * -- * -- * -- * -- * -- * -- * -- * -- *

HWG hwg-techniques mailing list archives, maintained by Webmasters @ IWA