Accessing parts of arrays

Thu Sep 30 16:41:33 EDT 2004

On Thursday, September 30, 2004, at 12:35 PM, Jan Schenkel wrote:

> If my memory serves me well, Geoff Canyon started a
> thread on the xTalk mailing list a while ago that
> proposed functions itemOffsets, wordOffsets and
> lineOffsets which would return all the occurrences'
> locations.
>
> So if we could have an elementOffsets() function, this
> would be the best solution for the above request, I
> think.
>
> Jan Schenkel.

I have been talking to people at the mothership about this.

First off, my software that I created with Rev is starting to sell. This
gives me the money to pay for the externals suggested below by Mark.
The reason that I bring it up here on the list in the open is that the
external will speed up my XML-based database, and if it were later to
be added to the engine then it would speed up the software that I'm
selling now. This sounds like it might be a great tool for array power
if you are willing to use a parser for the manipulations.

Before I proceed, does this suggestion sound good for this array thread?
(see below) Pull-parsing an XML structure at high speed could give us
all kinds of array manipulations if you were to use numbered tag sets
like <1>[data]</1>, <2>[more data here]</2>, <3>[even more
data]</3>, etc., and <1,1>, <1,2> and <1,3,1> for multidimensional arrays.
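
As a rough illustration of the numbered-tag idea, here is a minimal Python sketch (not Transcript, and not the proposed external). The helper name get_element and the sample data are hypothetical, and it uses plain substring search rather than a real XML parser, since tags such as <1> or <1,3,1> are not valid XML element names.

```python
def get_element(data, key):
    """Return the text between <key> and </key>, or None if the pair is absent.

    `key` is a string such as "2" or "3,1" (hypothetical naming scheme that
    follows the numbered tag sets described above).
    """
    open_tag = "<%s>" % key
    close_tag = "</%s>" % key
    start = data.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)            # first character after the opening tag
    end = data.find(close_tag, start)
    if end == -1:
        return None
    return data[start:end]


# Hypothetical sample store: two flat elements and one two-dimensional row.
store = "<1>apple</1><2>banana</2><3,1>cherry</3,1><3,2>date</3,2>"

print(get_element(store, "2"))    # banana
print(get_element(store, "3,2"))  # date
```

Each lookup here rescans the string from the start; the point of a function that returns all offsets at once would be to avoid that repeated scanning.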

Mark Brownell

On Wednesday, September 29, 2004, at 09:50 AM, Mark Waddingham wrote:

> Hi Mark,
>
[snip]
> In terms of your request for the suggested matchGlobal function [see
> below], while it would be nice to have, in comparison with the other
> feature requests that we have it is difficult to justify putting
> development time into this as opposed to other extensions/enhancements
> and features that people have requested.
>
> However, as I mentioned before, we would be perfectly willing to develop
> an external with the functionality you require which can then be
> integrated into the engine at the next opportunity. This both mitigates
> the development cost to us, and provides you a more flexible solution
> should you require specialization and/or optimization of the functions
> in the future.
>
> If you are interested in proceeding in this manner then I will happily
> put together a more concrete proposal for you, including technical details
> and time costings, and leave you to negotiate with Kevin the costs and
> finer contractual details.
>
> To give you an idea of the substance of such a proposal I would suggest
> implementing an external with the following functions:
>
>   matchOffsets(<needle>, <haystack>, [ <from> ], [ <to> ])
>   - return a list of offsets of the <needle> in char <from> to <to> of
>     <haystack> one per line.
>
>   matchParallelOffsets(<needles>, <needle_sep>, <haystack>, [ <from> ], [ <to> ])
>   - return a list of offsets of each chunk of <needles> in char <from> to
>     <to> of <haystack>.
>     The chunks of <needles> would be delimited by the character <needle_sep>.
>     Each line of this list would be of the form
>       offset of <needle_1>, offset of <needle_2>, ..., offset of <needle_n>
>     (i.e. the functionality of your parser would be given by doing a single
>     call of matchParallelOffsets with two chunks in the <needles>)
>
>   matchSetCacheSize <size>
>   - The Boyer-Moore algorithm has a set-up cost for each pattern which
>     incurs a memory overhead. This call would set the maximum number of
>     patterns that should be cached at any one time.
>
> To give an idea about how these might be implemented in the engine,
> Jeanne's suggestion for syntax is a good one (assuming it doesn't cause
> any conflicts - I make no promises as to whether this syntax is feasible):
>
>   the offsets of <needle> in <haystack>
>   the offsets of the lines/words/items of <needles> in <haystack>
>
> Anyway, I shall leave you to think on this way forward, and I promise to
> be more efficient in getting back to you next time.
>
> Warmest Regards,
>
> Mark.
>
> On Thu, 16 Sep 2004, Mark Brownell wrote:
>
>> Hi Mark,
>>
>> I was wondering, now that things might have gotten a little less
>> hectic, what progress, if any, has been made on adding this to the Rev
>> engine? This is exactly what I was hoping to get. I can use it to
>> isolate large portions of huge documents for the purpose of creating
>> something I might need very badly in the next few months. Also, this
>> single function could be highly useful to others, as you pointed out.
>>
>> Thanks,
>>
>> Mark Brownell
>>
>> On Wednesday, August 18, 2004, at 03:10 AM, Mark Waddingham wrote:
>>
>>> The one of most interest is the Boyer-Moore algorithm as this is reputed
>>> to be the fastest.
>>>
>>> So, one idea is to implement a function:
>>>   matchGlobal(stringToSearch, token)
>>> returning a list of all indices in stringToSearch of token.
>>>
>>> e.g.
>>>   get matchGlobal("<a>foo</a><a>bar</a><a>baz</a>", "<a>")
>>> would give
>>>   it[1] = 1
>>>   it[2] = 11
>>>   it[3] = 21
>>
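
For readers who want to experiment with the proposed matchOffsets / matchGlobal behaviour, here is a minimal Python sketch under stated assumptions. It is not the external discussed above: it relies on Python's built-in str.find rather than a Boyer-Moore implementation, the name match_offsets is hypothetical, offsets are reported 1-based as in Transcript chunk expressions, the optional range is treated as inclusive, and overlapping matches are reported (the proposal does not say either way).

```python
def match_offsets(needle, haystack, start=1, end=None):
    """Return the 1-based offsets of `needle` within chars start..end of `haystack`.

    Assumptions: `start` and `end` are 1-based and inclusive, and a match
    must lie entirely inside that range.
    """
    if not needle:
        return []
    if end is None:
        end = len(haystack)
    offsets = []
    # str.find uses 0-based, half-open bounds, so chars start..end map to
    # the slice [start - 1, end).
    pos = haystack.find(needle, start - 1, end)
    while pos != -1:
        offsets.append(pos + 1)                    # convert back to 1-based
        pos = haystack.find(needle, pos + 1, end)  # step by one: overlaps allowed
    return offsets


text = "<a>foo</a><a>bar</a><a>baz</a>"
print(match_offsets("<a>", text))         # [1, 11, 21]
print(match_offsets("<a>", text, 2, 20))  # [11]
```

With 1-based offsets, the three "<a>" tokens in this string start at chars 1, 11 and 21. Joining the result with newlines would give the "one per line" list form described in the proposal.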