Learning Revolution
Wilhelm Sanke
sanke at hrz.uni-kassel.de
Wed Mar 9 16:48:44 EST 2005
On Tue, 08 Mar 2005, "J. Landman Gay" <jacque at hyperactivesw.com> wrote:
> >> Does anyone know which search algorithm is being used in
> >> RR? Searching seems sort of slow, leading me to believe that it isn't
> >> Boyer-Moore-Sunday.
>
> Well, there is searching and there is searching. The built-in "find"
> command does a reasonable job. It isn't lightning fast, but it is quick
> enough. I don't know what algorithm it uses. The old docs (before
> version 2.5) were all housed in stacks and used the native "find"
> command to search. It was fast.
>
> To implement a more flexible system, in version 2.5 the stack-based
> documentation was removed to XML files on disk, which are now loaded
> into a one-card shell. There is no built-in Transcript command to search
> files on disk, so it is being done by scripts. While Revolution is very
> quick at reading files from disk, the searching itself has to be done by
> marching through all the text and parsing the XML, and it is very slow.
>
> I think the team is aware of this.
>
> -- Jacqueline Landman Gay
Searching through XML files can actually be satisfactorily quick;
compare some benchmarks below.
On Feb 27 I announced an update of my "Topsearch" tool on this list
(ANN: Update of "Topsearch" for XML files), which is at least an attempt
heading in the direction you are describing.
Screenshots can be viewed on my website
<http://www.sanke.org/MetaMedia/Screenshots.htm>
The update is already usable, but still needs some fine-tuning before
release.
The "Dictionary", "Faq", "Topics" and "Glossary" folders can be
searched. The results are displayed in the field on the right with the
searchstring colored and the XML file addresses inserted as links.
Clicking on such a file link displays the complete article in the left
field - again with the searchstring colored. If the article itself
contains links these are displayed for further reference.
As you can see from the screenshots, listing all of the whole lines
containing the searchstring gives you a better idea of the context and
provides more information about what can be found in the full article
than just listing the filenames.
The full texts of the XML files are searched. Search times for the 1496
files of the Transcript Dictionary, including display of the found lines
(plus their addresses and the colored searchstring) and the first
complete relevant article, vary between 1 and 2 seconds. During the
search the progress is indicated by the scrollbar and the accumulated
number of found lines and files.
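The full-text scan described above can be sketched roughly as follows.
This is not the Topsearch code (which is written in Transcript); it is a
minimal Python illustration, and the function name `search_folder`, the
"*.xml" file pattern, the UTF-8 encoding and the case-insensitive
comparison are all assumptions made for the example.

```python
from pathlib import Path


def search_folder(folder, search):
    """Scan every .xml file in `folder` for `search`.

    Returns (hits, file_count), where hits is a list of
    (file_path, line_number, line_text) tuples and file_count is the
    number of distinct files that contained at least one hit.
    """
    needle = search.lower()  # assumption: the search ignores case
    hits = []
    files_with_hits = set()
    for path in sorted(Path(folder).glob("*.xml")):
        # errors="replace" keeps the scan going on badly encoded files
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((str(path), line_no, line.strip()))
                files_with_hits.add(path)
    return hits, len(files_with_hits)
```

Each hit keeps the file path next to the matching line, which is what
allows a result list to show the context line and link back to the full
article.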
Several modes for searching are available:
- basic search: all words that equal or contain the searchstring are
found, e.g. searchstring "background" would find all instances of
"background" and additionally words such as "openbackground" or
"backgroundbehavior"
- searchstring + strings to the right: "background" and
"backgroundbehavior", but not "openbackground"
- searchstring + strings to the left
- only whole matches for the search: only lines and files containing
"background" are found
- searching for phrases like "date and time" etc.
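One way to express these modes is with regular-expression word
boundaries. The sketch below is only an illustration in Python, not how
Topsearch does it; the mode names ("basic", "right", "left", "whole"),
the function names and the case-insensitive matching are assumptions.

```python
import re


def build_pattern(search, mode="basic"):
    """Compile a regex for one of the search modes listed above."""
    s = re.escape(search)  # treat the searchstring literally
    patterns = {
        "basic": s,                    # anywhere, also inside longer words
        "right": r"\b" + s + r"\w*",   # searchstring + strings to the right
        "left": r"\w*" + s + r"\b",    # searchstring + strings to the left
        "whole": r"\b" + s + r"\b",    # whole matches only
    }
    return re.compile(patterns[mode], re.IGNORECASE)


def line_matches(line, search, mode="basic"):
    """Return True if `line` contains `search` under the given mode."""
    return build_pattern(search, mode).search(line) is not None
```

With searchstring "background", the "right" mode accepts a line
containing "backgroundbehavior" but rejects one containing only
"openbackground", and the "left" mode does the reverse. A phrase such as
"date and time" can be passed as the searchstring in "whole" mode.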
Examples (search times on a Windows XP computer at 2 GHz), each time
searching the complete Dictionary with 1496 XML files:
- basic search for "background" including additional strings on right
and left: 288 found lines in 135 found XML files - 1.7 seconds
- only "background" as whole matches: 46 lines in 24 XML files - 1 second
- "backgroundbehavior": 19 lines in 18 XML files - 1.1 seconds
- "custom properties": 49 lines in 22 XML files - 1.7 seconds.
Regards,
Wilhelm Sanke
<www.sanke.org/MetaMedia>