Re: Stripping content from other sites using "socket" connections

by "Octavian Rasnita" <orasnita(at)home.ro>

 Date:  Thu, 12 Dec 2002 15:08:24 +0200
 To:  "Mike Taylor" <lonewolf(at)one.net>,
<hwg-techniques(at)hwg.org>
 References:  oemcomputer
I haven't visited that page and I don't know what method they use, but this
is very easy with Perl.

Of course, you're right: if the structure of that site changes, the program
may need small modifications, though not always.

If you want to do this with Perl, you could use the LWP module to get the
page, analyse it using the HTML modules (for example HTML::Parser) and regular
expressions, replace and add some links, etc., and print the result to the
browser.
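
As a rough illustration of the fetch, analyse, and rewrite steps above, here is
a minimal sketch. It is written in Python rather than Perl (urllib stands in
for LWP), and the function names, the marker strings, and any URL passed in are
assumptions made for illustration; they do not come from the site discussed in
this thread.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

def fetch_page(url, timeout=10):
    """Download a page and return its HTML as text (the role LWP plays in Perl)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")

# Matches href="..." or src="..." with double-quoted values only (a simplification).
LINK_ATTR = re.compile(r'\b(href|src)="([^"]*)"', re.IGNORECASE)

def absolutize_links(html, base_url):
    """Rewrite relative href/src values so they still resolve from another site."""
    def fix(match):
        attr, value = match.group(1), match.group(2)
        return '%s="%s"' % (attr, urljoin(base_url, value))
    return LINK_ATTR.sub(fix, html)

def extract_between(html, start_marker, end_marker):
    """Return the text between two marker strings, or None if either is missing."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end]

def mirror_fragment(url, start_marker, end_marker):
    """Fetch a page, cut out one fragment, and fix its links (None if not found)."""
    fragment = extract_between(fetch_page(url), start_marker, end_marker)
    if fragment is None:
        return None
    return absolutize_links(fragment, url)
```

Calling mirror_fragment() with the third-party URL and two strings that
surround the wanted part of the page would give the fragment to print to the
browser. The marker approach is the fragile part: if the other site changes
its layout, the markers have to be updated.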

If you see this in an .html or .shtml file rather than at a CGI URL, it
probably means that the server is using server-side includes and that the page
includes the CGI program that does the job.
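
To make the server-side include idea concrete, here is a small sketch of what
the included CGI program might look like, written in Python for illustration.
The script path in the comment and the build_fragment() placeholder are
hypothetical; the real program would return the fetched and rewritten HTML
instead.

```python
import sys

# In the .shtml page, an Apache-style SSI directive such as
#   <!--#include virtual="/cgi-bin/calendar.cgi" -->
# asks the server to run the script and paste its output in place.
# The script path above is hypothetical.

def cgi_response(fragment):
    """Return a complete CGI response: header line, blank line, then the HTML."""
    return "Content-Type: text/html\r\n\r\n" + fragment

def build_fragment():
    """Placeholder for the fetched and rewritten third-party HTML."""
    return "<table><tr><td>calendar goes here</td></tr></table>"

if __name__ == "__main__":
    sys.stdout.write(cgi_response(build_fragment()))
```

The visitor only ever sees the .shtml address; the CGI program runs on the
server each time the page is requested.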

If you don't want to use the LWP module, you could use IO::Socket instead.
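
For comparison, here is a sketch of the raw-socket route, again in Python
(its socket module standing in for IO::Socket). The function names are
assumptions for illustration. The request uses HTTP/1.0 on purpose, so that
the server should simply send the body and close the connection rather than
use chunked encoding.

```python
import socket

def build_get_request(host, path="/"):
    """Build a minimal HTTP/1.0 GET request as bytes."""
    lines = [
        "GET %s HTTP/1.0" % path,
        "Host: %s" % host,
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def split_response(raw):
    """Split a raw HTTP response into (status_line, header_dict, body_bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    header_lines = head.decode("iso-8859-1").split("\r\n")
    status_line = header_lines[0]
    headers = {}
    for line in header_lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body

def fetch_raw(host, path="/", port=80, timeout=10):
    """Open a TCP connection, send the request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return split_response(b"".join(chunks))
```

This does by hand what an HTTP library does for you; redirects, HTTPS, and
error handling are all left out, which is why a library is usually the easier
choice.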

Of course, all this applies if you are using Perl, but I guess the job can
also be done with PHP or ASP, or with CGI programs written in C, Java, etc.


Teddy,
Teddy's Center: http://teddy.fcc.ro/
Email: orasnita(at)home.ro

----- Original Message -----
From: "Mike Taylor" <lonewolf(at)one.net>
To: <hwg-techniques(at)hwg.org>
Sent: Thursday, December 12, 2002 1:35 PM
Subject: Stripping content from other sites using "socket" connections


I've found an increasing number of sites using some sort of "socket"
connection to query a third-party site, wait for its response, and then grab
the resulting HTML and integrate it seamlessly into the look and feel of
their own website to give one the impression that the data actually came
from them.  This is accomplished, from what I understand, without the use of
XML technology; they are quite literally sending the query off to the
third-party site, and the third-party site then responds with the rendered
HTML on the backend, invisible to the user.

Is anyone directly familiar with this and could elaborate on how it works?

An example of this can be found here:
http://www.gonow.com/00_options_tools.html?menu2=o3

The calendar portion is coming from an entirely different site, but they've
been able to grab the HTML, manipulate it, and put it on this page in
real time.  So if the calendar were updated on the third-party site, the
results of those changes would still show up at the link above.

My guess is that this probably works beautifully unless the originating site
suddenly decides to change its layout, but I'm still curious what type
of tools are needed to accomplish it.

Thanks,
Mike

HWG hwg-techniques mailing list archives, maintained by Webmasters @ IWA