perl regex modifiers

Mark Brownell gizmotron at earthlink.net
Sat Jul 26 13:04:01 EDT 2003


Hi Dar,

This will be a fun one to respond to. I rarely get to discuss my 
invention, except with users who are mostly not interested.

On Saturday, July 26, 2003, at 09:40  AM, Dar Scott wrote:

>
> On Friday, July 25, 2003, at 07:21 PM, Mark Brownell wrote:
>
>> on mouseUp
>>   put "Do a <perl>web search for Perl regular expressions</perl> tutorials," into myVar
>>   put "<perl>(.*)(</perl>)" into regEx
>>
>>   -- perlRegEx
>>   put the milliseconds into tStartTime
>>   repeat with x = 1 to 500
>>     put matchText(myVar, regEx, tElement) into bbYes
>>   end repeat
>>   put (the milliseconds - tStartTime) into ptTime
>>   answer tElement
>>
>>   -- PNLP
>>   put the milliseconds into tStartTime
>>   repeat with i = 1 to 500
>>     put offset("<perl>", myVar) into tNumA
>>     put offset("</perl>", myVar) into tNumB
>>     put char (tNumA + 6) to (tNumB - 1) of myVar into tElement
>>   end repeat
>>   put (the milliseconds - tStartTime) into otTime
>>   answer tElement
>>
>>   -- show results
>>   answer "perlRegEx = "  & ptTime   & ", PNLP = "  & otTime
>> end mouseUp
>
> (Weird.  'myVar' is red in my script editor.)
>
> Part of the timing difference is that you are comparing apples and 
> oranges a little bit.
>
> The regex matches the text between the first <perl> and the last 
> </perl>.  If <perl></perl> can occur more than once, then that is not 
> what you need.

That would definitely foul up MTML's relational text gathering system.
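
For reference, a minimal sketch of the greedy versus non-greedy 
difference, assuming matchText hands the pattern to a Perl-compatible 
engine where ".*?" is the non-greedy form. The sample string and the 
variable names tGreedy and tLazy are made up for illustration:

```livecode
on mouseUp
  local myVar, tGreedy, tLazy, bbYes
  -- two tag-sets in one string (illustrative sample text)
  put "a <perl>one</perl> b <perl>two</perl> c" into myVar

  -- greedy: (.*) runs on to the LAST </perl>
  put matchText(myVar, "<perl>(.*)</perl>", tGreedy) into bbYes

  -- non-greedy: (.*?) stops at the FIRST </perl>
  put matchText(myVar, "<perl>(.*?)</perl>", tLazy) into bbYes

  answer "greedy: " & tGreedy & return & "non-greedy: " & tLazy
end mouseUp
```

The greedy capture should come back spanning both tag-sets, while the 
non-greedy one should hold only the contents of the first.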

>
> The offset method matches the first existence of <perl> and the first 
> existence of </perl>, that is, in any order.  It gets the text 
> between, which is empty if one </perl> is before <perl>.

Hence the name PNLP, Parallel Numerical Lineal Parser. The thing is 
meant to parse lineally. In older apps I had a check-and-see section 
for end-tag numbers that were less than the matching start-tag number.
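
One way such a check could be folded into the offset calls themselves 
is to start the end-tag search after the start-tag, using the optional 
third (chars to skip) parameter of offset. This is only a sketch, not 
the original PNLP code; the function name firstElement and its 
parameters are hypothetical:

```livecode
-- Hypothetical helper, not part of the original PNLP code.
-- Puts the text between the first <pTag> and the first </pTag> that
-- follows it into rElement, and returns true or false like matchText.
function firstElement pTag, pText, @rElement
  local tOpen, tClose, tNumA, tNumB, tSkip
  put empty into rElement
  put "<" & pTag & ">" into tOpen
  put "</" & pTag & ">" into tClose

  put offset(tOpen, pText) into tNumA
  if tNumA = 0 then return false

  -- tSkip is the position of the last char of the start-tag
  put tNumA + length(tOpen) - 1 into tSkip

  -- with a skip count, offset returns a position relative to tSkip,
  -- so an end-tag that sits before the start-tag is never seen
  put offset(tClose, pText, tSkip) into tNumB
  if tNumB = 0 then return false

  put char (tSkip + 1) to (tSkip + tNumB - 1) of pText into rElement
  return true
end firstElement
```

It would be called much like matchText, for example 
put firstElement("perl", myVar, tElement) into bbYes.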

It's my markup language, so it's my job to keep it well-formed, at 
least to the degree that MTML has a pseudo well-formed requirement. 
Since I'm working on integrating the WYSIWYG for MTML into the text 
environment, so that the user never sees the markup language, there 
will never be a case where a tag-set gets out of sequence. If a person 
were to create a document using a text editor then that could happen, 
but the user app is far easier to create with. Anyone creating with a 
text editor ought to test it before deploying it anyway.

>
> Also, the matchText method sets bbYes, which you will need, I assume.  
> The offset method doesn't.
>
> You might have some "don't care" in your need, of course.

I have a lot of "don't care" in my need. It should work every time 
because the markup language should be implemented properly. I do, 
however, check that it doesn't crash the app if it's improper in number.


> However, to compare these, I would make both match only the first 
> <perl></perl> pair (ignoring embedded pairs).  Also the offset method 
> should set bbYes.  This makes both take longer, but the resulting 
> differences in time are less.

Then one should also be aware that I sometimes try to pick up 
multiple lines of text embedded between the <perl> </perl> sets. I 
noticed that the regEx example so far will not do that. I wonder what 
would happen to the speed after that was added.
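
In Perl-style regular expressions the usual way to let "." cross line 
breaks is the "s" modifier, written inline as (?s) at the front of the 
pattern. A minimal sketch, assuming matchText accepts inline modifiers 
of that kind; the two-line sample text and the flag names bbPlain and 
bbDotAll are made up for illustration, and no timing is claimed here:

```livecode
on mouseUp
  local myVar, tElement, bbPlain, bbDotAll
  -- an element whose text spans two lines (illustrative sample text)
  put "<perl>line one" & return & "line two</perl>" into myVar

  -- without a modifier "." stops at the line break, so no match is expected
  put matchText(myVar, "<perl>(.*?)</perl>", tElement) into bbPlain

  -- (?s) lets "." match line breaks as well
  put matchText(myVar, "(?s)<perl>(.*?)</perl>", tElement) into bbDotAll

  answer "plain = " & bbPlain & ", with (?s) = " & bbDotAll & return & tElement
end mouseUp
```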


> Here is my try:

[snip]

> (My timing shows less of a difference.)
>
> Dar Scott

Very interesting. If I needed this form of validation then it would be 
worth the speed hit. There are cases where the PNLP will parse up to 
two megabytes of text for all instances of several MTML tag-sets from 
within the "<page> </page>" tag-sets, where the elements of each 
"<page> </page>" tag-set are treated as individual objects, one at a 
time. I'm satisfied with the results that I'm getting from Rev. I did 
this all with the textCruncher Xtra for Director in Shockwave before. 
The textCruncher Xtra is just a little faster; it is written in C and 
probably uses the string class or Perl regEx in C-scripted handlers to 
speed Director up.
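
A rough sketch of how every instance of one tag-set might be gathered 
in a single linear pass with offset, assuming well-formed, non-nested 
tags. This is not the actual PNLP source; the function name 
gatherElements and its parameters are hypothetical:

```livecode
-- Hypothetical helper, not the actual PNLP source.
-- Collects the contents of every <pTag> ... </pTag> pair, in order,
-- into the array rElements and returns how many were found.
function gatherElements pTag, pText, @rElements
  local tOpen, tClose, tNumA, tNumB, tSkip, tCount
  put empty into rElements
  put "<" & pTag & ">" into tOpen
  put "</" & pTag & ">" into tClose
  put 0 into tSkip
  put 0 into tCount

  repeat forever
    -- offsets are relative to tSkip, the last char already consumed
    put offset(tOpen, pText, tSkip) into tNumA
    if tNumA = 0 then exit repeat
    add tNumA + length(tOpen) - 1 to tSkip    -- now at end of start-tag

    put offset(tClose, pText, tSkip) into tNumB
    if tNumB = 0 then exit repeat             -- unclosed tag: stop quietly

    add 1 to tCount
    put char (tSkip + 1) to (tSkip + tNumB - 1) of pText into rElements[tCount]
    add tNumB + length(tClose) - 1 to tSkip   -- now at end of end-tag
  end repeat

  return tCount
end gatherElements
```

A call such as put gatherElements("page", tDocument, tPages) into 
tPageCount would leave each page's text in tPages[1], tPages[2] and so 
on, where tDocument stands for whatever variable holds the MTML text.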

Mark



