Re: robots.txt files
by George Bray <listoid(at)linkalarm.com>
Date: Mon, 14 Feb 2000 11:11:47 +1100
To: "Rajnish Bhaskar" <9705228b(at)student.gla.ac.uk>, hwg-basics(at)mail.hwg.org
References: ac
Rajnish,
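No, the robots.txt exclusion protocol
<http://info.webcrawler.com/mak/projects/robots/exclusion.html>
does not use UNIX path names. Instead, each Disallow value is a URL path
measured from the top (root) of your web server, and a '~' in it is just
a literal character, not a shortcut for your home directory.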
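To make "from the top of your web server" concrete, here is a minimal Python sketch. The helper name disallow_line_for is hypothetical, made up for illustration. It takes a page URL and keeps only its path component, which is the part that goes after "Disallow:". Nothing about the server's filesystem layout is involved.

```python
from urllib.parse import urlsplit


def disallow_line_for(url: str) -> str:
    """Build a robots.txt Disallow line for the given page URL (hypothetical helper).

    Only the URL's path component is used: it is measured from the
    server root, and '~' is kept as a literal character.
    """
    path = urlsplit(url).path or "/"  # a bare host name means the whole site
    return f"Disallow: {path}"


print(disallow_line_for("http://www.gla.ac.uk/Clubs/WebSoc/~9705228b/"))
# prints: Disallow: /Clubs/WebSoc/~9705228b/
```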
So, with your site at
http://www.gla.ac.uk/Clubs/WebSoc/~9705228b/
you'd need to ask the server administrator to modify the robots.txt file
http://www.gla.ac.uk/robots.txt
(robots only look for it at the top level of the server) and add a line like
Disallow: /Clubs/WebSoc/~9705228b/
to stop robots trawling your site.
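As a sanity check, Python's standard urllib.robotparser module can evaluate a rule like this. The sketch below assumes the server's robots.txt holds the new line inside a "User-agent: *" record (the real file may be organised differently) and uses a made-up robot name, ExampleBot.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of http://www.gla.ac.uk/robots.txt after the change;
# the User-agent line is an assumption, not taken from the real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Clubs/WebSoc/~9705228b/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching them

for url in (
    "http://www.gla.ac.uk/Clubs/WebSoc/~9705228b/",
    "http://www.gla.ac.uk/Clubs/WebSoc/~9705228b/index.html",
    "http://www.gla.ac.uk/Clubs/WebSoc/",
):
    # can_fetch() is True when the (hypothetical) robot may retrieve the URL
    print(parser.can_fetch("ExampleBot", url), url)
```

The first two URLs come out as False (blocked) and the club's own directory as True, since only paths beginning with /Clubs/WebSoc/~9705228b/ are covered.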
Adding
Disallow: /Clubs
would instead block every URL whose path starts with /Clubs, including
your directory.
Note that case matters. (The server's current file disallows /clubs, which
won't match the URL above.)
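Disallow values are compared as plain, case-sensitive prefixes of the URL path, which is why /Clubs covers your directory but /clubs does not. A rough check with urllib.robotparser, using hypothetical one-line rule sets, a made-up robot name (ExampleBot) and a made-up page name:

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


page = "http://www.gla.ac.uk/Clubs/WebSoc/~9705228b/index.html"

print(allowed("User-agent: *\nDisallow: /Clubs\n", page))  # False: path starts with /Clubs
print(allowed("User-agent: *\nDisallow: /clubs\n", page))  # True: case differs, no match
print(allowed("User-agent: *\nDisallow: /Clubs\n",
              "http://www.gla.ac.uk/ClubsList.html"))      # False: plain prefix match
```

The last case is worth knowing: because matching is by prefix, /Clubs also blocks any path that merely begins with those characters, such as the hypothetical /ClubsList.html.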
cheers - George
At 3:48 PM +0000 12/2/00, Rajnish Bhaskar wrote:
>Hi all,
> I was just wondering about robots.txt files. The path that is given in
>them in the form:
>disallow: blah
>
>Is this a normal Unix path name? And if so, does this mean that I can
>use the '~' symbol to represent my home dir (which I presume would be
>the public_html dir)? I know that some apps don't allow you to use the
>tilde in this way, but what about the robots.txt?
>
>Thanks,
>Raj.
>------------------------------------------------------------------
>Rajnish Bhaskar, University of Glasgow
>E-Mail: rajy(at)i.am
>Home Page: http://i.am/rajy
>
>Um, I tend to either agree or disagree with you
>-- Kev - to Nick Abbot on Talk Radio
>   22nd September 1998
--
George Bray - LinkAlarm - Web Site Quality Assurance
Web: http://linkalarm.com Email: mailto:george.bray(at)linkalarm.com
HTML Guild: hwg-basics mailing list archives,
maintained by Web Professionals @ IWA