use-livecode Digest, Vol 250, Issue 1
Neville Smythe
neville.smythe at optusnet.com.au
Tue Jul 2 21:05:54 EDT 2024
Thanks, Dick
I will try your idea - it may need a small mod, see below. I will probably post some timings later if people are still interested in this thread.
Mark’s method of converting the text to an array of lines, to gain random access to the lines, has given me at least a 50-fold increase in speed in many of my handlers (perhaps more; my analysis may have been somewhat flawed, but I still have the gut feeling that lineOffset, and finding line k of a text, is 50 to 100 times slower on Unicode than on ASCII). So this is the method I have implemented; refactoring took a few hours but was well worth it.
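A minimal sketch of the array conversion described above, assuming the text uses return as its line delimiter. The function name textToLineArray is hypothetical and not taken from the thread:

```livecode
-- Hypothetical helper (not from the thread): convert the text once into a
-- numerically keyed array so that "line k" becomes a direct element lookup
function textToLineArray pText
   local tLines
   put pText into tLines
   -- with no secondary delimiter, split numbers the elements 1, 2, 3, ...
   split tLines by return
   return tLines
end textToLineArray
```

After `put textToLineArray(tText) into tLines`, the element tLines[k] holds the same string as line k of tText, so repeated random access no longer has to scan the Unicode text for line endings.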
However, there is a drawback in using arrays rather than the original text: I often need to search the text to find the first (or next) line containing a string, and there is no built-in arrayOffset. Binary (logarithmic) search is an extremely fast algorithm for searching the elements of an array IF the array has been sorted appropriately for the search item beforehand, but that doesn’t suit my use case at all. Sorting the keys of an array according to the contents of the elements is another story (combine by return, sort, split by return? Split is OK as a one-off, but it gets expensive if it has to be done multiple times: splitting 1700 lines of sample text took about 0.1 seconds).
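The combine/sort/split round trip mentioned in the parenthesis might look like the following sketch (the function name sortLineArray is hypothetical). Note that it renumbers the keys 1..n in sorted order, so the original line numbers are lost, and every call pays the split cost mentioned above:

```livecode
-- Hypothetical helper: return a copy of a line array whose elements are
-- renumbered 1..n in ascending order of their contents.
-- The original line numbers are NOT preserved.
function sortLineArray pLines
   local tText
   put pLines into tText
   combine tText using return   -- array -> text, one element per line
   sort lines of tText ascending text
   split tText by return        -- text -> array with keys 1, 2, 3, ...
   return tText
end sortLineArray
```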
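There is, however,
filter elements of <array> with <str> into tLines; put the keys of tLines into tKeys; sort lines of tKeys numeric; put line 1 of tKeys into tFoundLine
And that is still very fast on Unicode, even though it finds all the lines matching str, not just the first, which may or may not be an advantage. [Note that <str> needs to be set up to match a whole line, and that the keys of an array come back in no particular order, so they must be sorted before taking the first one.]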
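A sketch of the filter approach wrapped as a function that returns the number of the first matching line, assuming the array came from split by return so that its keys are line numbers. The name firstMatchingKey is hypothetical:

```livecode
-- Hypothetical helper: return the key (line number) of the first element of
-- pLines that contains pStr, or 0 if no element does.
-- Assumes pLines has numeric keys 1..n, e.g. produced by split ... by return.
function firstMatchingKey pLines, pStr
   local tPattern, tMatches, tKeys
   -- a wildcard pattern must match the whole element, hence the * padding;
   -- any *, ? or [ inside pStr would itself be treated as a wildcard
   put "*" & pStr & "*" into tPattern
   filter elements of pLines with tPattern into tMatches
   put the keys of tMatches into tKeys
   if tKeys is empty then return 0
   -- the keys of an array are returned in no particular order
   sort lines of tKeys ascending numeric
   return line 1 of tKeys
end firstMatchingKey
```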
I have a LineSearch alternative to lineOffset which finds the first occurrence of a targetString in Unicode text without having to locate line endings, and which is faster than the filter method on arrays (particularly if you add the overhead of converting the text to an array with split by return). It relies on the fact that matchChunk is still exceedingly fast on Unicode (did someone complain that matchChunk appeared to be slow on Unicode?? Who was that foolish boy? Wasn’t me, Sir!). A slight drawback is that you have to escape all the special regex characters in the target string before using the regex
"(?m)(?i)^(.*?" & pTargetStr & ".*?)$"
but replace is very fast, so this is messy but not a big deal. The big drawback is that it gives the text of the line but not the line number, so the algorithm cannot be adapted to skipping lines. Your version may answer that, but it seems to need the repeat loop, which looks expensive. Hmm, wait: aren’t you looping on “line tLine in tLinesToSearch”, which implicitly involves finding the line delimiters in the Unicode text? That is back to the original problem.
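A sketch of the regex approach, assuming the target string should be matched literally, so its regex metacharacters are first escaped with replace as described above. The names escapeRegex and firstLineContaining are hypothetical; this is not Neville's actual LineSearch code:

```livecode
-- Hypothetical helper: escape PCRE metacharacters so pStr is matched literally.
-- LiveCode string literals have no escape sequences, so "\" is one backslash.
function escapeRegex pStr
   local tEscaped
   put pStr into tEscaped
   -- escape backslashes first so the escapes added below are not doubled
   replace "\" with "\\" in tEscaped
   repeat for each item tChar in ".,^,$,|,?,*,+,(,),[,],{,}"
      replace tChar with ("\" & tChar) in tEscaped
   end repeat
   return tEscaped
end escapeRegex

-- Hypothetical helper: return the text of the first line of pText that
-- contains pTargetStr (case-insensitively), or empty if there is none
function firstLineContaining pText, pTargetStr
   local tRegex, tStart, tEnd
   -- (?m) makes ^ and $ match at line boundaries; "." never crosses a newline
   put "(?m)(?i)^(.*?" & escapeRegex(pTargetStr) & ".*?)$" into tRegex
   -- matchChunk stores the character positions of the captured line
   if matchChunk(pText, tRegex, tStart, tEnd) then
      return char tStart to tEnd of pText
   end if
   return empty
end firstLineContaining
```

Since matchChunk also reports the starting character position tStart, the line number could in principle be recovered by counting the lines in char 1 to tStart of pText, but that counts line delimiters in the Unicode text again, so whether it beats lineOffset would need timing.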
Neville
> On 3 Jul 2024, at 2:00 am, use-livecode-request at lists.runrev.com wrote:
>
> Date: Tue, 2 Jul 2024 01:31:13 -0700
> From: Dick Kriesel <dick.kriesel at mail.com>
> To: How to use LiveCode <use-livecode at lists.runrev.com>
> Subject: Re: Slow stack problem
>
>> On Jun 28, 2024, at 3:15 AM, Neville Smythe via use-livecode <use-livecode at lists.runrev.com> wrote:
>>
>> I have a solution or at least a workaround
>
> Hi, Neville. You may find a worthwhile improvement in speed if you avoid referring to the Unicode lines by their line numbers (as in "line k of fff").
>
> Here's a way:
>
> function findLineNumbersInUnicode pLinesToFind, tLinesToSearch -- returns a comma-delimited list of the line numbers of lines that contain any of the lines to find
>    local tRegExp, tLineNumber, tLineNumbers
>    repeat for each line tLineToFind in pLinesToFind
>       put "(^[0-9-]*\t" & tLineToFind & ")" into tRegExp
>       put 0 into tLineNumber
>       repeat for each line tLine in tLinesToSearch
>          add 1 to tLineNumber
>          if matchChunk(tLine, tRegExp) then
>             put tLineNumber & comma after tLineNumbers
>          end if
>       end repeat
>    end repeat
>    return char 1 to -2 of tLineNumbers
> end findLineNumbersInUnicode
>
> If you try the idea, please share your test results.
>
> -- Dick Kriesel
>