Re: robots.txt files

by George Bray <listoid(at)linkalarm.com>

 Date:  Mon, 14 Feb 2000 11:11:47 +1100
 To:  "Rajnish Bhaskar" <9705228b(at)student.gla.ac.uk>,
hwg-basics(at)mail.hwg.org


Rajnish,

No, the robots.txt exclusion protocol
<http://info.webcrawler.com/mak/projects/robots/exclusion.html>
does not use UNIX path names.  Rather, it uses URL paths, starting
from the top of your web server, so '~' is not expanded to a home
directory; it is matched literally, as it appears in the URL.

So with your site on
http://www.gla.ac.uk/Clubs/WebSoc/~9705228b/

you'd need to ask the administrator of the server to modify the robots.txt file

http://www.gla.ac.uk/robots.txt

to add a line like

Disallow: /Clubs/WebSoc/~9705228b/

to stop robots trawling your site.

Adding
Disallow: /Clubs
would also work, because rules match by prefix: it covers every URL
whose path starts with /Clubs, including yours.

Note that case matters. (They're currently disallowing /clubs, but that
doesn't match the /Clubs in the URL above.)
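
If you want to check rules like these before asking the administrator,
here is a rough sketch using Python's standard urllib.robotparser
module. The is_allowed helper, the "ExampleBot" agent name and the
"User-agent: *" line wrapped around each rule are my own assumptions
for illustration; they are not taken from the real gla.ac.uk file.

```python
from urllib.robotparser import RobotFileParser

# The site URL from this thread.
SITE_URL = "http://www.gla.ac.uk/Clubs/WebSoc/~9705228b/"


def is_allowed(disallow_path, url=SITE_URL, agent="ExampleBot"):
    """Return True if a robot may fetch `url` under a one-rule robots.txt.

    Hypothetical helper: it builds a minimal robots.txt in memory
    (one record applying to all robots) instead of fetching a real one.
    """
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Full path, prefix, wrong case, and a UNIX-style '~/' path.
    for path in ("/Clubs/WebSoc/~9705228b/", "/Clubs", "/clubs", "~/"):
        verdict = "allowed" if is_allowed(path) else "blocked"
        print(f"Disallow: {path:28} -> {verdict}")
```

With this parser, the full path and the /Clubs prefix both block the
URL, while /clubs and the UNIX-style ~/ leave it open to robots. Other
robots may implement matching slightly differently, so treat this as a
sanity check rather than a guarantee.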

	cheers - George




At 3:48 PM +0000 12/2/00, Rajnish Bhaskar wrote:
>Hi all,
>	I was just wondering about robots.txt files.  The path that is given in
>them in the form:
>disallow: blah
>
>Is this a normal Unix path name?  And if so, does this mean that I can
>use the '~' symbol to represent my home dir (which I presume would be
>the public_html dir)?  I know that some apps don't allow you to use the
>tilde in this way, but what about the robots.txt?
>
>Thanks,
>Raj.
>------------------------------------------------------------------
>__        __
>|   |	|   |   Rajnish Bhaskar, University of Glasgow
>|_ / 	|_ /   E-Mail: rajy(at)i.am
>|   \	|   \   Home Page: http://i.am/rajy
>|    \	|__/
>Um, I tend to either agree or disagree with you
>-- Kev - to Nick Abbot on Talk Radio
>    22nd September 1998

--
George Bray - LinkAlarm - Web Site Quality Assurance
Web: http://linkalarm.com  Email: mailto:george.bray(at)linkalarm.com

HTML Guild: hwg-basics mailing list archives, maintained by Web Professionals @ IWA