AW: Re: Regex help needed...
Paul Dupuis
paul at researchware.com
Sat Jan 30 19:28:22 EST 2016
Wow. I would not have expected such a significant difference. Regex has
been around a long time, and lots of smart computer science types have
spent time coming up with ways to optimize its performance for pattern
matching. I assumed (falsely) that regex-based filters in LC would be on
par with, or even superior to, a custom function using chunks. This leads me to:
1) wondering whether LC's hooks into whatever regex library it uses under
the hood are as good as they should be
AND
2) planning on rewriting my code to use chunks.
Thanks for the post.
On 1/30/2016 6:45 PM, Richard Gaskin wrote:
> Regex is wonderfully compact to write relative to equivalent routines
> using chunk expressions, but that compactness is sometimes paid for in
> execution time.
>
> When I come across a good regex example like the one you provided, if
> I have a moment I like to test it to see where regex is faster and
> where it isn't. It's really great for many things, but it carries
> quite a bit of overhead.
>
> Of course, this test is only relevant if most of the specifiers in
> the regex expression merely identify the elements you're looking for,
> and if the data is expected to fit the definition you provided.
>
> Given that, it's possible to make the regex a bit simpler (see foo2
> below), but that yields only a modest boost in performance. It can
> probably be simplified further, but the chunk-based alternative
> performed so well that I didn't bother exploring the regex side any
> further.
>
> Writing a lengthier handler that uses chunk expressions yields the
> same results you reported while running between 12 and 60 times
> faster, depending on the percentage of tested lines that match the
> criteria being looked for.
>
> For one-offs like validating email addresses, regex can be an
> excellent fit, as can some larger tasks, depending on the specifics.
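For a single-value check like that, one matchText call is hard to beat. Below is a minimal sketch; the handler name isPlausibleEmail is hypothetical, and the pattern is a deliberately loose assumption (no whitespace, exactly one "@", at least one dot after it), not a full RFC 5322 validator:

```livecode
-- Hypothetical helper: loose plausibility check for a single address.
-- The pattern is an assumption, not a standard: one or more chars that
-- are neither "@" nor whitespace, one "@", then a domain part with at
-- least one dot and no "@" or whitespace.
function isPlausibleEmail pAddress
   return matchText(pAddress, "^[^@\s]+@[^@\s]+\.[^@\s]+$")
end isPlausibleEmail
```

For example, isPlausibleEmail("paul@example.com") should return true, while a string containing spaces should return false. Because it runs once per value, the per-call regex overhead matters far less than it does inside a loop over hundreds of lines.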
>
> But for iterating across lists I've often been delightfully surprised
> by LiveCode's gracefully efficient chunk handling.
>
> Testing with your original data replicated to 250 lines, and looking
> for page 1 among them, the script below (1000 calls of each function)
> yields:
>
> Regex: 9261 ms
> RegexLite: 7958 ms
> Chunks: 197 ms
> Chunks faster than orig regex by: 47.01 times
> Chunks faster than lite regex by: 40.4 times
> Same result? true
>
>
> on mouseUp
>    put fld 1 into tList
>    put 1 into tPage --< change this for different tests
>    put 1000 into n
>    --
>    -- Test 1: original regex
>    put the millisecs into t
>    repeat n
>       put foo1(tPage, tList) into r1
>    end repeat
>    put the millisecs - t into t1
>    --
>    -- Test 2: lighter regex
>    put the millisecs into t
>    repeat n
>       put foo2(tPage, tList) into r2
>    end repeat
>    put the millisecs - t into t2
>    --
>    -- Test 3: chunks
>    put the millisecs into t
>    repeat n
>       put foo3(tPage, tList) into r3
>    end repeat
>    put the millisecs - t into t3
>    --
>    -- Display results (put with no destination goes to the message box):
>    set the numberformat to "0.##"
>    put "Regex: "&t1 &" ms"&cr \
>          &"RegexLite: "&t2 &" ms"&cr \
>          &"Chunks: "& t3 &" ms"&cr \
>          &"Chunks faster than orig regex by: "&(t1 / t3)&" times" &cr \
>          &"Chunks faster than lite regex by: "&(t2 / t3)&" times" &cr \
>          &"Same result? "& (r1=r3) &cr&cr& r1 &cr&cr& r3
> end mouseUp
>
>
> function foo1 pPage, tList
>    put "(.+\t"&pPage&",\d+,\d+,\d+)|(.+\t\d+,\d+,"&pPage&",\d+)|" \
>          & "(.+\t"&pPage&",\d*\.?\d*,\d*\.?\d*,\d*\.?\d*,\d*\.?\d*)" \
>          into tMatchPattern
>    filter lines of tList with regex pattern tMatchPattern
>    return tList
> end foo1
>
>
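> function foo2 pPage, tList
>    -- ",*" matches zero or more commas; note that the first and third
>    -- alternatives are identical
>    put "(.+\t"&pPage&",*)|(.+\t\d+,\d+,"&pPage&",*)|(.+\t"&pPage&",*)" \
>          into tMatchPattern
>    filter lines of tList with regex pattern tMatchPattern
>    return tList
> end foo2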
>
>
>
> function foo3 pPage, tList
>    repeat for each line tLine in tList
>       set the itemdel to tab
>       put item 3 of tLine into t1
>       put pPage &"," into tPageMarker
>       if "." is in t1 then
>          -- Item 3 contains a decimal point: match on item 3 only
>          if (t1 begins with tPageMarker) then
>             put tLine &cr after tNuList
>          end if
>       else
>          -- Otherwise match if either item 3 or item 4 starts with the page
>          if (t1 begins with tPageMarker) OR (item 4 of tLine begins with tPageMarker) then
>             put tLine &cr after tNuList
>          end if
>       end if
>    end repeat
>    delete last char of tNuList -- drop the trailing return
>    return tNuList
> end foo3
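One further tweak that may be worth trying (untimed here): foo3 sets the item delimiter and rebuilds tPageMarker on every pass through the loop, and both can be moved out of it. The sketch below uses the hypothetical name foo3b and keeps foo3's selection logic, assuming the same tab-delimited layout with the page data in items 3 and 4:

```livecode
-- Hypothetical variant of foo3: same selection rules, with the
-- loop-invariant setup done once instead of once per line.
function foo3b pPage, pList
   local tLine, tField3, tNuList, tPageMarker
   set the itemDelimiter to tab        -- set once, before the loop
   put pPage & comma into tPageMarker  -- build the "page," marker once
   repeat for each line tLine in pList
      put item 3 of tLine into tField3
      if tField3 begins with tPageMarker then
         -- Item 3 starts with the page: keep the line in either case
         put tLine & cr after tNuList
      else if "." is not in tField3 and item 4 of tLine begins with tPageMarker then
         -- Item 4 is only consulted when item 3 has no decimal point
         put tLine & cr after tNuList
      end if
   end repeat
   delete last char of tNuList  -- drop the trailing return
   return tNuList
end foo3b
```

It should return the same lines as foo3 for the same input; adding a fourth timing block to mouseUp and comparing its result with r3 would confirm both the equivalence and whether the hoisting makes a measurable difference.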
More information about the use-livecode
mailing list