Re: searching/indexing pdf files from a web page

by Eric Frazier <ef(at)kwinternet.com>

 Date:  Tue, 23 Jul 2002 18:52:01 -0400
 To:  Hank Marquardt <hmarq(at)yerpso.net>
 Cc:  Ann Ezzell <amcbainezzell(at)alum.mit.edu>, "'Hank Marquardt'" <hmarq(at)yerpso.net>, "'Johnson, Mark'" <JohnsonM(at)issaquah.wednet.edu>, hwg-techniques(at)mail.hwg.org
Hi,

This stuff is easy. 

Take a look at http://www.kscripts.com and http://www.foolabs.com/xpdf/

Ksearch makes use of xpdf. I have searched far and wide, and nothing is as
easy to use as xpdf. Even if you don't want to use Ksearch itself, its
authors were smart enough to find xpdf and make good use of it. Follow the
install instructions for Ksearch, and life will be good.
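
If you would rather roll your own than install Ksearch, the general idea can be sketched in a few lines. This is only a rough sketch, not how Ksearch itself works: it assumes the `pdftotext` program that ships with xpdf is on your PATH, and the function names (`extract_text`, `build_index`, `search`) are made up for illustration.

```python
import re
import subprocess
from collections import defaultdict


def extract_text(pdf_path):
    """Return the text of one PDF by running xpdf's pdftotext.

    Passing '-' as the output file makes pdftotext write to stdout.
    Assumes pdftotext is installed and on the PATH.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def build_index(texts):
    """Map each lowercased word to the set of document names containing it.

    `texts` is a dict of {document name: extracted text}.
    """
    index = defaultdict(set)
    for name, text in texts.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents that contain every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

To index a real archive you would call something like `build_index({p: extract_text(p) for p in pdf_paths})`, where `pdf_paths` is your own list of files, and then hand `search(index, "biosparging")` to whatever generates the results page. For 500 documents an in-memory index like this is probably fine, but that is a guess and worth measuring.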


Eric 


At 08:34 PM 7/23/02 -0500, Hank Marquardt wrote:
>Good to know there is something out there ... but it seems to be MS only
>which limits its utility some for me anyway.  I guess you could always
>use a windows box to do the indexing and then use the index on whatever
>platform you need ... 
>
>On Tue, Jul 23, 2002 at 06:15:38PM -0700, Ann Ezzell wrote:
>> 
>> As I pointed out to Mark off-list, there's an iFilter for PDFs that you
>> can install for Index Server / Indexing Services. Works like a charm.
>> 
>> To see an example of this in action, go here:
>> 
>> http://www.gpworldwide.com/_sitesearch/default.asp
>> 
>> Search for biosparging.
>> 
>> You should get 3 results, two of which are PDFs.
>> 
>> 
>> > -----Original Message-----
>> > From: owner-hwg-techniques(at)hwg.org 
>> > [mailto:owner-hwg-techniques(at)hwg.org] On Behalf Of Hank Marquardt
>> > Sent: Tuesday, July 23, 2002 6:00 PM
>> > To: Johnson, Mark
>> > Cc: hwg-techniques(at)mail.hwg.org
>> > Subject: Re: searching/indexing pdf files from a web page
>> > 
>> > 
>> > I guess this will depend on your definition of 'search' ... do you
>> > just mean the title (filename)? Perhaps a database of info associated
>> > with the file? Or do you mean search the contents of the pdf file
>> > itself?
>> > 
>> > That last one doesn't seem plausible ... a quick google search didn't
>> > turn up anything useful, and running 'strings' on a couple pdf files
>> > on my machine yielded nothing useful ... 
>> > 
>> > If it's just the filename or you have something in text form that can
>> > be searched, you can do this with any server side language you choose.
>> > 
>> > Need more detail of what there is to work with.
>> > 
>> > 
>> > 
>> > On Wed, Jul 24, 2002 at 12:33:47AM +0100, Johnson, Mark wrote:
>> > > I have an archive of 500 PDF documents.  I need to create a web page
>> > > that searches the documents and creates an index of the results.
>> > > What's the best way to do this?
>> > > 
>> > > Thanks,  Mark
>> > 
>> > -- 
>> > Hank Marquardt <hank(at)yerpso.net>
>> > http://web.yerpso.net
>> > GPG Id: 2BB5E60C
>> > Fingerprint: D807 61BC FD18 370A AC1D  3EDF 2BF9 8A2D 2BB5 E60C
>> > *** Web Development: PHP, MySQL/PgSQL - Network Admin: Debian/FreeBSD
>> > *** PHP Instructor - Intnl. Webmasters Assn./HTML Writers Guild 
>> > *** Beginning PHP && PHP II -- Starting March 25, 2002 
>> > *** See http://www.hwg.org/services/classes
>> > 
>> 
>
>

http://www.kwinternet.com/eric
(250) 655 - 9513 (PST Time Zone)

"Inquiry is fatal to certainty." -- Will Durant 

HWG hwg-techniques mailing list archives, maintained by Webmasters @ IWA