shilling for my feature request [1926]

Fri Aug 20 19:42:07 EDT 2004

On Friday, August 20, 2004, at 03:46 PM, Alex Tweedly wrote:

> At 10:27 20/08/2004 -0700, Mark Brownell wrote:
>
>> Hi,
>>
>> I finally found what I was looking for in the basic core for all my
>> pull-parser needs. With the help of those at Run Rev this was found:
>
> Sorry Mark, I'm going to have to ask for another lesson :-)
>
> How does this proposal help with a pull-parser ?

It replaces the need to run a loop to find all instances of a tag. If 
you control the XML schema and design it so that the information is 
easy to pull, then you have a simple, fast solution. If you need to 
validate, then use a standard XML parser. So, adding to the example:

get matchGlobal("<a>foo</a><a>bar</a><a>baz</a>", "<a>") would give
   it[1] = 1
   it[2] = 11
   it[3] = 21

get matchGlobal("<a>foo</a><a>bar</a><a>baz</a>", "</a>") would give
   it[1] = 7
   it[2] = 17
   it[3] = 27
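
Since matchGlobal() is only a proposal, here is a rough sketch of the behaviour in Python. The name match_global, the 1-based positions, and the choice to skip overlapping matches are all assumptions made for illustration, not anything Run Rev has specified.

```python
def match_global(text, needle):
    """Return the 1-based start position of every occurrence of needle.

    Hypothetical stand-in for the proposed matchGlobal(); positions are
    1-based so they line up with how Transcript counts chars.
    """
    if not needle:
        return []  # an empty search string has no meaningful positions
    positions = []
    start = text.find(needle)
    while start != -1:
        positions.append(start + 1)
        # Resume after the match, so occurrences never overlap.
        start = text.find(needle, start + len(needle))
    return positions


sample = "<a>foo</a><a>bar</a><a>baz</a>"
start_tags = match_global(sample, "<a>")
end_tags = match_global(sample, "</a>")
```

The point of the proposal is that this loop would live inside the engine, so a script would make one call instead of repeated offset() calls.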

so I would use:
put matchGlobal(myString, "<a>") into startTagArray
put matchGlobal(myString, "</a>") into endTagArray
put char (startTagArray[1] + 3) to (endTagArray[1] - 1) of myString into A1-1
put char (startTagArray[2] + 3) to (endTagArray[2] - 1) of myString into A2-1
put char (startTagArray[3] + 3) to (endTagArray[3] - 1) of myString into A3-1
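
The same slicing can be sketched in Python to check the arithmetic. This assumes the tags are never nested and that every start tag has a matching end tag; match_global and contents are hypothetical names used only for this illustration.

```python
def match_global(text, needle):
    """Hypothetical stand-in for the proposed matchGlobal(): 1-based positions."""
    if not needle:
        return []
    positions = []
    start = text.find(needle)
    while start != -1:
        positions.append(start + 1)
        start = text.find(needle, start + len(needle))
    return positions


def contents(text, open_tag, close_tag):
    """Return the text between each open_tag/close_tag pair (no nesting)."""
    starts = match_global(text, open_tag)
    ends = match_global(text, close_tag)
    # "char (s + len) to (e - 1)" in 1-based terms is the Python slice
    # text[s - 1 + len : e - 1], because slices are 0-based and end-exclusive.
    return [text[s - 1 + len(open_tag):e - 1] for s, e in zip(starts, ends)]


my_string = "<a>foo</a><a>bar</a><a>baz</a>"
values = contents(my_string, "<a>", "</a>")
```

The "+ 3" in the Transcript lines is simply the length of "<a>", so using the tag length keeps the same idea working for longer tags.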

My XML example uses 3 instances of "<a>" tag-sets for each instance of 
"<task>".

So if I were to encapsulate these three previous sets into multiple 
sets with another tag-set I could create an array of "<task>" sets.

So this:
"<task><a>foo1</a><a>bar1</a><a>baz1</a></task>"
"<task><a>foo2</a><a>bar2</a><a>baz2</a></task>"
"<task><a>foo3</a><a>bar3</a><a>baz3</a></task>"
"<task><a>foo4</a><a>bar4</a><a>baz4</a></task>"

put matchGlobal(myString, "<task>") into startTagArray
put matchGlobal(myString, "</task>") into endTagArray

So I have the start points and end points for each occurrence of the 
"<task>" tag-set.

All I have to do is use them to populate an array. (the old tried and 
true way that is)
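
As a sketch of that "old tried and true" step, the two position lists can be used to cut out each "<task>" and then each "<a>" inside it. This is Python rather than Transcript, and match_global, contents, and parse_tasks are hypothetical names; it also assumes the four lines above are joined into one string with no nested tags.

```python
def match_global(text, needle):
    """Hypothetical stand-in for the proposed matchGlobal(): 1-based positions."""
    if not needle:
        return []
    positions = []
    start = text.find(needle)
    while start != -1:
        positions.append(start + 1)
        start = text.find(needle, start + len(needle))
    return positions


def contents(text, open_tag, close_tag):
    """Return the text between each open_tag/close_tag pair (no nesting)."""
    starts = match_global(text, open_tag)
    ends = match_global(text, close_tag)
    return [text[s - 1 + len(open_tag):e - 1] for s, e in zip(starts, ends)]


def parse_tasks(text):
    """Build one list of <a> values per <task> tag-set."""
    return [contents(body, "<a>", "</a>")
            for body in contents(text, "<task>", "</task>")]


my_string = (
    "<task><a>foo1</a><a>bar1</a><a>baz1</a></task>"
    "<task><a>foo2</a><a>bar2</a><a>baz2</a></task>"
    "<task><a>foo3</a><a>bar3</a><a>baz3</a></task>"
    "<task><a>foo4</a><a>bar4</a><a>baz4</a></task>"
)
tasks = parse_tasks(my_string)
```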

Now this matchGlobal() function starts to get faster than using an 
array, because you have to loop through the array and this thing can 
drop that loop step. You know how offset() has the ability to start the 
next search after a certain point? Well, you could use this 
matchGlobal() function to find the position of the "<a>" contents that 
matched your search, and then build the pulled data from the number 
just below it in startTagArray[?] and the number just above it in 
endTagArray[?], for each instance of "<a>" that fits.

example:
   itS[1] = 1
   itS[2] = 11
   itS[3] = 21

   itE[1] = 7
   itE[2] = 17
   itE[3] = 27

If the one I want is at char 14, then I build the data from char 
(itS[2] + 3) to (itE[2] - 1).

All I need is a quick way to loop once through the itS[] and itE[] 
arrays to get the ones that I want. I should be able to do this in one 
pass through the arrays. I might add that I'm now not storing my data 
in a "<task>" data array. myString is now the permanent container of 
data and subject to high-speed changes using the replace function. If 
I can control the XML schema, then I can make this pull technique negate 
the use of databases and just use the data in its XML state. I can't 
wait to test the results on MTML, where pages are currently stored in 
each numerical key of an array, respectively [1], [2], [3], etc.
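
One way the single pass could look, sketched in Python: if the character positions of interest are sorted, one index can walk both position lists without ever backing up. The names match_global and pairs_for_targets are hypothetical, and a position counts as "inside" a pair when it falls at or after the start tag and before the end tag.

```python
def match_global(text, needle):
    """Hypothetical stand-in for the proposed matchGlobal(): 1-based positions."""
    if not needle:
        return []
    positions = []
    start = text.find(needle)
    while start != -1:
        positions.append(start + 1)
        start = text.find(needle, start + len(needle))
    return positions


def pairs_for_targets(starts, ends, targets):
    """Map each target position to the index of the tag pair containing it.

    Returns (target, index) tuples, with index None when the target is
    not inside any pair. The pair index only moves forward, so the two
    lists are walked once in total.
    """
    count = min(len(starts), len(ends))
    result = []
    i = 0
    for target in sorted(targets):
        # Skip pairs that end at or before this target.
        while i < count and ends[i] <= target:
            i += 1
        if i < count and starts[i] <= target:
            result.append((target, i))
        else:
            result.append((target, None))
    return result


sample = "<a>foo</a><a>bar</a><a>baz</a>"
its = match_global(sample, "<a>")
ite = match_global(sample, "</a>")
hits = pairs_for_targets(its, ite, [25, 5, 14])
# Pull the contents of every pair that was hit (indexes here are 0-based).
pulled = [sample[its[i] - 1 + len("<a>"):ite[i] - 1]
          for _, i in hits if i is not None]
```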

> This looks (simply) like a scheme to do fast searching for multiple 
> occurrences of a string; could give a significant speed gain over 
> repeated calls to offset, if only because the B-M setup time can be 
> done once rather than each call (or each call with caching), as well 
> as the speed gain from a single call versus multiple calls.

That's what I think also.

> But this seems less useful than your earlier
>    split by string1 to string2
> proposal, which would (more obviously) allow incremental parsing.

This "split by string1 to string2" idea puts the results into an array. 
I think that matchGlobal() might be faster if the loop through that 
"split by" array is eliminated.

I could use a form of repeat that exits as soon as it has passed my 
target while looping through the matchGlobal() arrays. Anyway, I would 
search for the fastest repeat technique to get the start and end points 
I was interested in.
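
A minimal sketch of such an early-exit loop, in Python for illustration. It relies on the start positions being in ascending order, which is how a left-to-right search would return them; match_global and find_enclosing are hypothetical names.

```python
def match_global(text, needle):
    """Hypothetical stand-in for the proposed matchGlobal(): 1-based positions."""
    if not needle:
        return []
    positions = []
    start = text.find(needle)
    while start != -1:
        positions.append(start + 1)
        start = text.find(needle, start + len(needle))
    return positions


def find_enclosing(starts, ends, target):
    """Return the 0-based index of the pair containing target, or None."""
    for i, (s, e) in enumerate(zip(starts, ends)):
        if s > target:
            break  # every later start is larger too, so stop looking
        if target < e:
            return i
    return None


sample = "<a>foo</a><a>bar</a><a>baz</a>"
its = match_global(sample, "<a>")
ite = match_global(sample, "</a>")
i = find_enclosing(its, ite, 14)
found = sample[its[i] - 1 + len("<a>"):ite[i] - 1] if i is not None else None
```

For long position lists a binary search would cut the number of comparisons further, but the early exit already avoids walking the whole array when the target sits near the front.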

It all means fewer arrays and fewer trips through arrays using repeat 
loops.

> Not that that means it wouldn't be a useful high-speed parsing 
> technique - I just don't see how it could be used to create a 
> pull-parser.
>
> -- Alex.

HTH,

Mark