Re: search

by "Srinivasan Ramakrishnan" <srinivar(at)md3.vsnl.net.in>

 Date:  Wed, 15 Nov 2000 22:08:11 +0530
 To:  "Ed Sims" <edsims(at)htcomp.net>,
"Kathy" <phaz(at)phaz.net>
 Cc:  "Shull, Conrad" <cshull(at)shscares.org>, <hwg-languages(at)hwg.org>
 References:  ntmail1 kathy htcomp
Hi,

Though I haven't implemented Microsoft Index Server on a real project, I know
that it comes with IIS and is fairly easy to configure (I've done it
myself, so how difficult can that be?), and it is a good deal better than the
Perl* scripts out there that claim to be search engines.

Search engine technology can be really complex. You start off by building
an index and filtering out noise words, all the while polling the site
for new additions, and it isn't half as easy as it sounds. [Plus I left out
a few steps in the process; I'm not writing a primer on search tech here!]
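
To make the "build an index, filter noise words" part concrete, here is a
minimal sketch in Python. It is only an illustration, not how Index Server
or any real engine does it: the noise-word list, the function names and the
sample pages are all made up, and polling the site for new pages is left out.

```python
import re
from collections import defaultdict

# Hypothetical noise-word list; a real engine would use a much larger one.
NOISE_WORDS = {"a", "an", "and", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each non-noise word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            if word not in NOISE_WORDS:
                index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every non-noise word of the query."""
    words = [w for w in tokenize(query) if w not in NOISE_WORDS]
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Made-up sample pages, for illustration only.
pages = {
    "/perl.html": "An introduction to the Perl language",
    "/iis.html": "Configuring the index server on IIS",
}
index = build_index(pages)
```

The index is built once, so a call like search(index, "index server") only
looks up two words instead of rereading every page.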

A Perl* script, on the other hand (and I'm talking about the scripts that are
typically available for free and don't do much), will either:

a) Do a dumb scan of all documents for every search, or
b) Maintain a text file (a DB table if you are lucky) to store [word - URL]
pairs
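
For what it's worth, options (a) and (b) above boil down to something like
the following rough sketch (in Python rather than Perl; the function names,
the tab-separated file layout and the sample data are my own assumptions,
not taken from any particular script):

```python
def dumb_scan(pages, term):
    """Option (a): read the text of every page on every single search."""
    term = term.lower()
    return sorted(url for url, text in pages.items() if term in text.lower())


def lookup_pairs(pair_lines, term):
    """Option (b): scan 'word<TAB>url' lines from a flat file for the word."""
    term = term.lower()
    hits = set()
    for line in pair_lines:
        word, _, url = line.rstrip("\n").partition("\t")
        if word == term:
            hits.add(url)
    return sorted(hits)


# Made-up sample data, for illustration only.
pages = {
    "/a.html": "Perl search scripts",
    "/b.html": "Index Server setup",
}
pair_lines = ["perl\t/a.html\n", "search\t/a.html\n", "index\t/b.html\n"]
```

Both functions walk through all of their input on every query, so the cost
of each search grows with the size of the site.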

IMHO it is not advisable even in the short term to use a script that you
know cannot scale, and will eventually have to be replaced.

If you have access to the server:
On NT/W2K:    Use MS Index Server
On *IX:       HotBot? (I don't really remember the name; it's a fairly
popular SE and has a red logo.) It offers an installable SE which uses the
same technology as the search engine itself. Correct me if it's a different SE.


If you don't have access to the server:
Use a service provider like atomz.com; I find that service really useful. If
you have more than 500 pages, which is the limit for atomz as well as
most similar service providers, consider taking up their service plan. It is
a good investment, and the TCO may be cheaper than hiring a guru to run your
own indexing search engine.


If you are really hard up for cash:
{Which means you can't hire a web server which allows executable access,
and you can't pay for a commercial search service either!}

Use Google... seriously, it's not a bad choice. You have to live with
the fact that the results won't be on your site, but at least Google doesn't
carry ads (as of now).

Here's how you use Google:
a) First have Google index your site
b) Have all searches redirected as "searchterm site:yoursite.com"
The "site:yoursite.com" bit tells Google to restrict the search to this
domain only.
I also believe Google has a service plan for small site owners. I'm not sure
if you have to pay for it, so you may also try that.

It's easy to use JS to append the "site:yoursite.com" bit to all the search
queries and thus have a best-of-breed SE work for you at no cost.
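
The string handling involved is tiny. Here is a sketch of it in Python; on
an actual page you would do the same thing in JS inside the search form.
The function name is made up, and the https://www.google.com/search?q=
address is my assumption about where to send the query.

```python
from urllib.parse import urlencode


def google_site_search_url(search_term, site="yoursite.com"):
    """Build a Google search URL restricted to one site.

    The "site:" operator is appended to whatever the visitor typed,
    then the whole query is URL-encoded into the "q" parameter.
    """
    query = f"{search_term} site:{site}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_site_search_url("perl scripts"))
# prints https://www.google.com/search?q=perl+scripts+site%3Ayoursite.com
```

Redirecting the visitor to the returned URL gives them Google's results for
your domain only.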

Just my 2 bits. Of course there are still more ways in which you can search
your site for free, including a manual index, but I don't think such ideas
are practical.

Perl*::
Such scripts are usually in Perl (replace Perl with a scripting lang. of
your choice), and I don't mean to start a flame war of any sort here. If you
have had good results by using a search script, by all means feel free to
continue using it. Nothing is best for every scenario, but planning for
the big day from day one can mean fewer hassles later.

I know of scripts like the ones from Matt's Script Archive making
their way into many a website since, what, 1996?

-Srini


----- Original Message -----
From: Ed Sims <edsims(at)htcomp.net>
To: Kathy <phaz(at)phaz.net>
Cc: Shull, Conrad <cshull(at)shscares.org>; <hwg-languages(at)hwg.org>
Sent: Wednesday, November 15, 2000 8:47 AM
Subject: Re: search


> Kathy,
>
>     If you can host on a windows server here is a solution that is free and
> works. I know you said the site was on a Unix server but if you can run this
> from an IIS server you can index every file on the Unix box for search.
> http://www.sharewire.com/ I use it and it works without a flaw.
>
> Ed
>
> Kathy wrote:
>
> > I have a site with over 12,000 html pages that I would like to add a search
> > engine to. Which would be best? CGI, mySQL or...? It is on a UNIX hosted
> > server.
> >
> > I am looking for something with a low learning curve.
> > If I am off topic I apologize.
>
>

HWG: hwg-languages mailing list archives, maintained by Webmasters @ IWA