Learning Revolution

Wilhelm Sanke sanke at hrz.uni-kassel.de
Wed Mar 9 16:48:44 EST 2005


On Tue, 08 Mar 2005, "J. Landman Gay" <jacque at hyperactivesw.com> wrote:

> >>  Does anyone know which search algorithm is being used in
> >> RR?  Searching seems sort of slow, leading me to believe that it isn't
> >> Boyer-Moore-Sunday.
>
> Well, there is searching and there is searching.  The built-in "find"
> command does a reasonable job. It isn't lightning fast, but it is quick
> enough. I don't know what algorithm it uses. The old docs (before
> version 2.5) were all housed in stacks and used the native "find"
> command to search. It was fast.
>
> To implement a more flexible system, in version 2.5 the stack-based
> documentation was moved to XML files on disk, which are now loaded
> into a one-card shell. There is no built-in Transcript command to search
> files on disk, so it is being done by scripts. While Revolution is very
> quick at reading files from disk, the searching itself has to be done by
> marching through all the text and parsing the XML, and it is very slow.
>
> I think the team is aware of this.
>
> -- Jacqueline Landman Gay
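Neither message says which algorithm Revolution's "find" command actually uses. For readers unfamiliar with the Boyer-Moore-Sunday (Quick Search) algorithm mentioned in the question, here is a minimal Python sketch of it. It only illustrates the technique and makes no claim about Revolution's engine; the function name sunday_search is hypothetical.

```python
def sunday_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text,
    using Sunday's Quick Search variant of Boyer-Moore."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: for each character in the pattern, the distance from its
    # last occurrence in the pattern to one position past the pattern's end.
    # Later indices overwrite earlier ones, so the last occurrence wins.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    hits = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        # Sunday's rule: look at the character just after the current window.
        if i + m >= n:
            break
        # Characters absent from the pattern let the window jump past them.
        i += shift.get(text[i + m], m + 1)
    return hits
```

The shift is decided by the character just beyond the current window, so on ordinary text the window often advances by len(pattern) + 1 characters at a time instead of one.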



Searching through XML files can actually be satisfactorily quick; 
compare some benchmarks below.

On Feb 27 I announced an update of my "Topsearch" tool on this list 
(ANN: Update of "Topsearch" for XML files), which is at least an attempt 
in the direction you are describing.

Screenshots can be viewed on my website 
<http://www.sanke.org/MetaMedia/Screenshots.htm>

The update is already usable, but still needs some fine-tuning before 
release.

The "Dictionary", "Faq", "Topics" and "Glossary" folders can be 
searched. The results are displayed in the field on the right, with the 
search string colored and the XML file addresses inserted as links. 
Clicking such a file link displays the complete article in the left 
field, again with the search string colored. If the article itself 
contains links, these are displayed for further reference.
As you can see from the screenshots, listing all the complete lines 
that contain the search string gives a better idea of the context, and 
tells you more about what the full article contains, than listing just 
the filenames.
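A minimal Python sketch of listing whole matching lines instead of just filenames, assuming plain-text input and using square brackets as a stand-in for the colored search string. The name matching_lines and the case-insensitive matching are assumptions for illustration, not taken from Topsearch.

```python
import re


def matching_lines(text: str, needle: str, mark=("[", "]")) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line containing needle,
    with each occurrence wrapped in the given markers.

    Assumption: matching is case-insensitive; the markers stand in
    for coloring the search string in a field."""
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    left, right = mark
    results = []
    for number, line in enumerate(text.splitlines(), start=1):
        if pattern.search(line):
            # Keep the original capitalization of each match inside the markers.
            marked = pattern.sub(lambda m: f"{left}{m.group(0)}{right}", line)
            results.append((number, marked))
    return results
```

For example, searching "set the backgroundColor\nno match here\nopenBackground handler" for "background" returns lines 1 and 3, with the matched text bracketed.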

The full texts of the XML files are searched. For the 1496 files of the 
Transcript Dictionary, search times vary between 1 and 2 seconds; this 
includes displaying the found lines (with their addresses and the 
colored search string) and the first complete relevant article. During 
the search, progress is indicated by the scrollbar and by the 
accumulated number of found lines and files.
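A rough Python sketch (not Transcript) of a full-text scan over a folder of XML files that reports progress as accumulated counts of found lines and files. It assumes a flat folder of UTF-8 .xml files and searches the raw text line by line without parsing the XML; search_folder and on_progress are hypothetical names, not Topsearch's actual handlers.

```python
import os
import re
from typing import Callable, Optional


def search_folder(folder: str, needle: str,
                  on_progress: Optional[Callable[[int, int, int], None]] = None):
    """Search the full text of every .xml file directly in folder.

    Returns a dict mapping file path -> list of matching lines, and calls
    on_progress(files_done, found_lines, found_files) after each file."""
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    paths = sorted(os.path.join(folder, name) for name in os.listdir(folder)
                   if name.lower().endswith(".xml"))
    results = {}
    found_lines = 0
    for done, path in enumerate(paths, start=1):
        # Read the whole file at once; undecodable bytes are replaced, not fatal.
        with open(path, encoding="utf-8", errors="replace") as f:
            hits = [line for line in f.read().splitlines() if pattern.search(line)]
        if hits:
            results[path] = hits
            found_lines += len(hits)
        if on_progress:
            # Accumulated totals, e.g. for updating a scrollbar and counters.
            on_progress(done, found_lines, len(results))
    return results
```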

Several modes for searching are available:

- basic search: all words that equal or contain the search string are 
found, e.g. the search string "background" finds all instances of 
"background" and also words such as "openbackground" or 
"backgroundbehavior"

- search string + strings to the right: finds "background" and 
"backgroundbehavior", but not "openbackground"

- search string + strings to the left: finds "background" and 
"openbackground", but not "backgroundbehavior"

- whole matches only: only lines and files containing "background" as a 
whole word are found

- searching for phrases such as "date and time", etc.
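The four word-matching modes above map naturally onto regular expressions, where \b marks a word boundary. This Python sketch assumes that a "word" is a run of letters, digits and underscores (Python's definition of \b) and that matching is case-insensitive; Topsearch may define word boundaries differently. The name mode_pattern is hypothetical.

```python
import re


def mode_pattern(search: str, mode: str) -> re.Pattern:
    """Build a case-insensitive regex for one of four search modes.

    basic - the search string anywhere, also inside longer words
    right - words that start with the search string
    left  - words that end with the search string
    whole - the search string only as a whole word
    Phrases such as "date and time" work in every mode because the
    search string is escaped and matched literally."""
    s = re.escape(search)
    patterns = {
        "basic": s,
        "right": rf"\b{s}",   # boundary before: allows extra characters on the right
        "left": rf"{s}\b",    # boundary after: allows extra characters on the left
        "whole": rf"\b{s}\b",
    }
    return re.compile(patterns[mode], re.IGNORECASE)
```

Applied to the words "background", "openbackground" and "backgroundbehavior", the "right" pattern for "background" matches the first and third, "left" matches the first and second, and "whole" matches only the first.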

Examples (search times on a 2 GHz Windows XP computer, each time 
searching the complete Dictionary of 1496 XML files):

- basic search for "background", including additional strings on the 
right and left: 288 lines found in 135 XML files - 1.7 seconds

- "background" as whole matches only: 46 lines in 24 XML files - 1 second

- "backgroundbehavior": 19 lines in 18 XML files - 1.1 seconds

- "custom properties": 49 lines in 22 XML files - 1.7 seconds

Regards,

Wilhelm Sanke
<www.sanke.org/MetaMedia>


