searching list archive via Google gone wrong

Mark Wieder mwieder at ahsoftware.net
Sat Oct 21 23:34:21 EDT 2006


Sivakatirswami-

Saturday, October 21, 2006, 2:58:48 AM, you wrote:

> Can anyone confirm this perception?  I would love to be proved wrong: it
> goes to strategy: a) depend on a Google search link on your site, for
> your own site, b) set up your own search engine (HT DIG), c) pay Google
> the big bucks to index your site completely.

> OK this is now *way* OT... if you have thoughts or hard information on
> this area... email me off list.

I'll put this on the list since, as you know, your mail server
software doesn't like me. <g>

This isn't OT: my ArchiveSearch plugin uses Google as one of the
options for backend searching of the list archives, and it's been
returning fewer and fewer hits. Switching the search preference to
Gmane or Nabble or Mail-archive does a better job.

Some search engines do a better job than others in particular areas.
And while there's some overlap, each engine will also return hits
that the others miss.
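
Since the engines only partly overlap, one way to get the most out of
them is to query several and merge the results. Here is a minimal
Python sketch of that idea. It is not how ArchiveSearch works; the
function name, the sample data, and the assumption that each hit can
be reduced to a common key (say, the post's Message-ID) are all made
up for illustration.

```python
from collections import defaultdict


def merge_hits(results_by_engine):
    """Merge per-engine hit lists into one de-duplicated list.

    results_by_engine maps an engine name to a list of message keys
    (assumed here to be Message-IDs). Returns (key, engines) pairs in
    first-seen order, where engines lists every engine that found it.
    """
    engines_for_key = defaultdict(list)
    for engine, keys in results_by_engine.items():
        for key in keys:
            if engine not in engines_for_key[key]:
                engines_for_key[key].append(engine)
    return list(engines_for_key.items())


# Hypothetical hits: two engines, one message in common.
sample = {
    "google": ["<msg1@example.com>", "<msg2@example.com>"],
    "gmane": ["<msg2@example.com>", "<msg3@example.com>"],
}
merged = merge_hits(sample)
for key, engines in merged:
    print(key, "found by", ", ".join(engines))
```

Keeping the list of engines per hit also shows at a glance which
backend is missing what.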

Another big problem for me with Google searches (at least as far as
listserv searches go) is that Google's interpretation of "within the
last three months" or a similar search is any page that has been
*updated* within that time period, no matter when it was originally
published. Thus searching for "externals" and "within the last three
months" on the runrev listserv using Google may turn up articles
posted two years ago, as long as the web page they reside on was
recently updated by the mailing list archival software.
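
A workaround, if you can get at the messages themselves, is to filter
on each post's own Date header instead of the page's modification
time. The sketch below assumes each hit is a dict with a "date" field
holding an RFC 2822 date string; that layout and the function name are
hypothetical, not something any of the engines hands back.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def posted_within(hits, days, now=None):
    """Keep only hits whose own post date falls within the last `days`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = []
    for hit in hits:
        posted = parsedate_to_datetime(hit["date"])
        if posted.tzinfo is None:
            # Assumption: treat dates without a usable offset as UTC.
            posted = posted.replace(tzinfo=timezone.utc)
        if posted >= cutoff:
            recent.append(hit)
    return recent


# Hypothetical hits: one recent post, one from two years earlier.
hits = [
    {"subject": "externals", "date": "Sat, 21 Oct 2006 23:34:21 -0400"},
    {"subject": "old externals", "date": "Mon, 18 Oct 2004 10:00:00 -0400"},
]
reference = datetime(2006, 11, 1, tzinfo=timezone.utc)
recent_hits = posted_within(hits, 90, now=reference)
```

With the reference date fixed at 1 November 2006, only the first post
survives a 90-day filter, no matter when its archive page was last
regenerated.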

On a semi-related topic (and moving somewhat OT) Alexa has started a
program to allow you to create your own search engines that run on
their servers. You can program them to do whatever you want: return
data more or less filtered than the major engines; create specialized
filters of your own; search for data posted within the last six hours;
aggregate data from multiple searches; etc. It's very reasonably
priced, as well. http://websearch.alexa.com

-- 
-Mark Wieder
 mwieder at ahsoftware.net



