shilling for my feature request [1926]
Mark Brownell
gizmotron at earthlink.net
Fri Aug 20 19:42:07 EDT 2004
On Friday, August 20, 2004, at 03:46 PM, Alex Tweedly wrote:
> At 10:27 20/08/2004 -0700, Mark Brownell wrote:
>
>> Hi,
>>
>> I finally found what I was looking for in the basic core for all my
>> pull-parser needs. With the help of those at Run Rev this was found:
>
> Sorry Mark, I'm going to have to ask for another lesson :-)
>
> How does this proposal help with a pull-parser ?
It replaces the need to run a loop to find all instances of a tag. If
you control the XML schema and design it so that the data is easy to
pull, then you have a simple, fast solution. If you need to validate,
then use a standard XML parser.
So adding to the example:
get matchGlobal("<a>foo</a><a>bar</a><a>baz</a>", "<a>") would give
it[1] = 1
it[2] = 11
it[3] = 21
get matchGlobal("<a>foo</a><a>bar</a><a>baz</a>", "</a>") would give
it[1] = 7
it[2] = 17
it[3] = 27
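To make the proposed behaviour concrete, here is a minimal sketch in Python. matchGlobal() does not exist yet, so `match_global` below is only a hypothetical stand-in that assumes 1-based character offsets and non-overlapping matches, as in the examples above.

```python
def match_global(text, needle):
    """Hypothetical stand-in for the proposed matchGlobal().

    Returns the 1-based offset of every non-overlapping occurrence
    of needle in text, in ascending order.
    """
    positions = []
    if not needle:
        return positions  # an empty needle would otherwise loop forever
    start = 0
    while True:
        found = text.find(needle, start)
        if found == -1:
            break
        positions.append(found + 1)  # convert 0-based index to 1-based
        start = found + len(needle)  # resume after this match
    return positions


sample = "<a>foo</a><a>bar</a><a>baz</a>"
starts = match_global(sample, "<a>")   # [1, 11, 21]
ends = match_global(sample, "</a>")    # [7, 17, 27]
```

Each "<a>foo</a>" element is ten characters long, which is why the start offsets step by ten from 1 and the end offsets step by ten from 7.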
so I would use:
put matchGlobal(myString, "<a>") into startTagArray
put matchGlobal(myString, "</a>") into endTagArray
put char (startTagArray[1] + 3) to (endTagArray[1] - 1) of myString into A1-1
put char (startTagArray[2] + 3) to (endTagArray[2] - 1) of myString into A2-1
put char (startTagArray[3] + 3) to (endTagArray[3] - 1) of myString into A3-1
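The same extraction step can be sketched in Python. The names `match_global` and `pull_contents` are hypothetical, and the sketch assumes well-formed, non-nested tags, so the Nth start offset pairs with the Nth end offset.

```python
import re


def match_global(text, needle):
    """Hypothetical stand-in: 1-based offsets of each non-overlapping match."""
    return [m.start() + 1 for m in re.finditer(re.escape(needle), text)]


def pull_contents(text, tag):
    """Return the text between each <tag> ... </tag> pair."""
    open_tag = "<" + tag + ">"
    close_tag = "</" + tag + ">"
    starts = match_global(text, open_tag)
    ends = match_global(text, close_tag)
    # 1-based chars (start + len(open_tag)) to (end - 1), as a 0-based slice
    return [text[s - 1 + len(open_tag):e - 1] for s, e in zip(starts, ends)]


my_string = "<a>foo</a><a>bar</a><a>baz</a>"
a_values = pull_contents(my_string, "a")   # ["foo", "bar", "baz"]
```

The "+ 3" in the lines above is the length of "<a>"; the sketch uses the tag length instead so it also works for longer tags.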
My XML example uses 3 instances of "<a>" tag-sets for each instance of
"<task>".
So if I were to encapsulate these three previous sets into multiple
sets with another tag-set I could create an array of "<task>" sets.
So this:
"<task><a>foo1</a><a>bar1</a><a>baz1</a></task>"
"<task><a>foo2</a><a>bar2</a><a>baz2</a></task>"
"<task><a>foo3</a><a>bar3</a><a>baz3</a></task>"
"<task><a>foo4</a><a>bar4</a><a>baz4</a></task>"
put matchGlobal(myString, "<task>") into startTagArray
put matchGlobal(myString, "</task>") into endTagArray
So I have the start points and end points for each occurrence of the
"<task>" tag-set.
All I have to do is use them to populate an array (the old tried and
true way, that is).
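As a rough illustration of populating an array from the "<task>" offsets, here is a Python sketch. `match_global` and `pull_contents` are hypothetical helpers, and the four "<task>" strings are assumed to be joined into one string with nothing between them.

```python
import re


def match_global(text, needle):
    """Hypothetical stand-in: 1-based offsets of each non-overlapping match."""
    return [m.start() + 1 for m in re.finditer(re.escape(needle), text)]


def pull_contents(text, tag):
    """Return the text between each <tag> ... </tag> pair (no nesting)."""
    open_tag = "<" + tag + ">"
    close_tag = "</" + tag + ">"
    starts = match_global(text, open_tag)
    ends = match_global(text, close_tag)
    return [text[s - 1 + len(open_tag):e - 1] for s, e in zip(starts, ends)]


my_string = (
    "<task><a>foo1</a><a>bar1</a><a>baz1</a></task>"
    "<task><a>foo2</a><a>bar2</a><a>baz2</a></task>"
    "<task><a>foo3</a><a>bar3</a><a>baz3</a></task>"
    "<task><a>foo4</a><a>bar4</a><a>baz4</a></task>"
)

# One entry per <task>, each holding the contents of its three <a> sets
tasks = [pull_contents(body, "a") for body in pull_contents(my_string, "task")]
```

Here `tasks[0]` holds the three values of the first task, `tasks[1]` those of the second, and so on.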
Now this matchGlobal() function is starting to get faster than using
an array, because you have to loop through the array and this thing can
drop that loop step. You know how offset() has the ability to start the
next search after a certain point? Well, you could use this
matchGlobal() function to find the contents of an "<a>" set that
matched your search, and then build a pulled array from the offset
just below the match in startTagArray[?] and the offset just above it
in endTagArray[?], for each instance found of "<a>" that fits.
example:
itS[1] = 1
itS[2] = 11
itS[3] = 21
itE[1] = 7
itE[2] = 17
itE[3] = 27
If the one I like is at char 14, I build the data from char (itS[2]
+ 3) to (itE[2] - 1).
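One possible way to go from a character position to its enclosing tag pair is sketched below in Python. It swaps in a binary search (the standard `bisect` module) in place of a loop, which is my assumption rather than part of the proposal; `match_global` and `enclosing_index` are hypothetical names.

```python
import re
from bisect import bisect_right


def match_global(text, needle):
    """Hypothetical stand-in: 1-based offsets of each non-overlapping match."""
    return [m.start() + 1 for m in re.finditer(re.escape(needle), text)]


def enclosing_index(starts, ends, position):
    """0-based index of the tag pair whose span contains position, or None."""
    i = bisect_right(starts, position) - 1  # last start at or before position
    if 0 <= i < len(ends) and position < ends[i]:
        return i
    return None


sample = "<a>foo</a><a>bar</a><a>baz</a>"
starts = match_global(sample, "<a>")
ends = match_global(sample, "</a>")

i = enclosing_index(starts, ends, 14)  # 1, i.e. itS[2] / itE[2] in 1-based terms
content = sample[starts[i] - 1 + len("<a>"):ends[i] - 1]  # "bar"
```

Because the start offsets come back in ascending order, the lookup needs no full pass through either array.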
All I need is a quick way to loop once through the itS[] and itE[]
arrays to get the ones that I want. I should be able to do this in one
pass through the arrays. I might add that I'm now not storing my data
in a "<task>" data array. myString is now the permanent container of
data and subject to high speed changes using the replace function. If
I can control the XML schema then I can make this pull technique negate
the use of databases and just use the data in the XML state. I can't
wait to test the results on MTML, where currently pages are stored in
each numerical key of an array respectively: [1], [2], [3], etc.
> This looks (simply) like a scheme to do fast searching for multiple
> occurrences of a string; could give a significant speed gain over
> repeated calls to offset, if only because the B-M setup time can be
> done once rather than each call (or each call with caching), as well
> as the speed gain from a single call versus multiple calls.
That's what I think also.
> But this seems less useful than your earlier
> split by string1 to string2
> proposal, which would (more obviously) allow incremental parsing.
The "split by string1 to string2" idea puts its results into an array.
I think that matchGlobal() might be faster if the loop through that
"split by" array is eliminated.
I could use a form of repeat that exits after my target is greater than
the last result of looping through the matchGlobal( ) arrays. Anyway I
would search for the fastest repeat technique to get the start and end
points I was interested in.
It all means fewer arrays and fewer trips through arrays using repeat
loops.
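The early-exit repeat described above could look roughly like this in Python. It is a sketch under the assumption that both offset lists are sorted and paired one-to-one; `match_global` and `find_enclosing` are hypothetical names.

```python
import re


def match_global(text, needle):
    """Hypothetical stand-in: 1-based offsets of each non-overlapping match."""
    return [m.start() + 1 for m in re.finditer(re.escape(needle), text)]


def find_enclosing(starts, ends, target):
    """Scan once, stopping as soon as a start offset passes the target."""
    for index, (start, end) in enumerate(zip(starts, ends)):
        if start > target:
            break  # offsets are sorted, so no later pair can contain target
        if target < end:
            return index
    return None


sample = "<a>foo</a><a>bar</a><a>baz</a>"
starts = match_global(sample, "<a>")
ends = match_global(sample, "</a>")
hit = find_enclosing(starts, ends, 14)   # 1 (second pair, 0-based)
miss = find_enclosing(starts, ends, 8)   # None: char 8 sits in a closing tag
```

The loop never visits pairs beyond the target, so a target near the start of the string costs only a few iterations.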
> Not that that means it wouldn't be a useful high-speed parsing
> technique - I just don't see how it could be used to create a
> pull-parser.
>
> -- Alex.
HTH,
Mark