Making Revolution faster with really big arrays
Dennis Brown
see3d at writeme.com
Tue Apr 12 21:49:48 EDT 2005
Thanks Mark,
Your suggestion would not help for my application: the offset function
is no faster than counting returns (a line chunk expression) to get to
the proper line. I have used it before, though, for parsing a large
USDA food nutrition database, and it helped a lot. It does give me
some more ideas about how I can use a hybrid approach to speed things
up a bit. I know exactly what line and item the data I want is in,
and it is always the next one. I might be able to live with the
chunk specification for the line number, then use a repeat for each
item to put 2500 items in an array. That way I will only need 2500
array elements at any one time instead of 125,000,000 per data file.
But I will still have to put 125,000,000 items into array elements and
then read them back out again once per data pass. That is perhaps
10-100 times slower than an "access" keyword would be, instead of
1000-10,000 times slower. I will do some sample tests and see what I
come up with.
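
A minimal sketch of the hybrid approach described above, in Transcript. The function name getLineAsArray and the parameters tData and pLineNum are hypothetical, chosen only for illustration, and nothing here has been timed. It pays for one line chunk lookup, then walks the items of that line once with "repeat for each":

```livecode
-- Hypothetical helper (not from the thread): copy the items of one
-- line of tData into a numerically keyed array.
function getLineAsArray tData, pLineNum
  put empty into tArray
  put 0 into tIndex
  -- One chunk lookup per line; the engine counts returns to find it
  put line pLineNum of tData into tLine
  -- "repeat for each" visits the items in order without recounting
  repeat for each item tItem in tLine
    add 1 to tIndex
    put tItem into tArray[tIndex]
  end repeat
  return tArray
end getLineAsArray

-- Possible usage, assuming tData holds the comma-delimited data:
--   put getLineAsArray(tData, 3) into tRow
--   put tRow[17] into tValue
```

Only one line's worth of items (2500 in the case above) is held in the array at a time, but every item is still copied into the array and read back out once per pass.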
Dennis
On Apr 12, 2005, at 7:29 PM, Mark Brownell wrote:
> Hi Dennis,
>
> I have found that large data files can be broken down into smaller
> objects using simplified XML, where access is obtained using a
> pull-parser. Unlike the XML parser in Revolution, a very fast
> pull-parser can break down objects and parse out specific fields
> without ever building a full parse in the more traditional form
> used by standard parsers. So if you can break down your data and
> transform it using simple element-type XML structuring, then you
> might be able to create a system that can find information in large
> data objects.
>
> I once asked the creators of Rev to add or create a faster
> pull-parser. They came up with something that would improve on my
> Transcript-based pull-parser by about 20%. I found out that all I
> needed to do was lock the screen before parsing my files and unlock
> it when I was done in order to get the speeds I was looking for. In
> other words, what I did in Transcript was very fast for a natively
> written pull-parser.
>
> Here it is one more time:
>
> HTH,
>
> Mark
>
> ==================
>
> -- Usage: put getElementsArray("<record>", "</record>", tZap) into theArray
> -- Returns an array, keyed 1..n, of the text between each tag pair.
> function getElementsArray tStartTag, tEndTag, StringToSearch
>   put empty into tArray
>   put 0 into tStart1 -- chars to skip when looking for a start tag
>   put 0 into tStart2 -- chars to skip when looking for an end tag
>   put 1 into tElementNum
>   put the number of chars in tStartTag into dChars
>   repeat
>     put offset(tStartTag,StringToSearch,tStart1) into tNum1
>     put (tNum1 + tStart1) into tStart1 -- absolute position of start tag
>     if tNum1 < 1 then exit repeat
>     put offset(tEndTag,StringToSearch,tStart2) into tNum2
>     put (tNum2 + tStart2) into tStart2 -- absolute position of end tag
>     if tNum2 < 1 then exit repeat
>     --if tNum2 < tNum1 then exit repeat
>     put char (tStart1 + dChars) to (tStart2 - 1) of StringToSearch \
>         into zapped
>     put zapped into tArray[tElementNum]
>     add 1 to tElementNum
>   end repeat
>   return tArray
> end getElementsArray
>
> -- Usage: put getElement("<record>", "</record>", tZap) into theElement
> -- Returns the text between the first start tag and the first end tag,
> -- or "error" if either tag is missing.
> function getElement tStTag, tEdTag, stngToSch
>   put empty into zapped
>   put the number of chars in tStTag into dChars
>   put offset(tStTag,stngToSch) into tNum1
>   put offset(tEdTag,stngToSch) into tNum2
>   if tNum1 < 1 then
>     return "error"
>   end if
>   if tNum2 < 1 then
>     return "error"
>   end if
>   put char (tNum1 + dChars) to (tNum2 - 1) of stngToSch into zapped
>   return zapped
> end getElement
>
> =================
>
>
>> The idea is to break apart the essential functional elements of
>> the "repeat for each" control structure to allow more flexibility.
>> This sample has a bit more refinement than what I posted yesterday
>> in Bugzilla.
>>
>> The new keyword would be "access", but could be something else.
>>
>> An example of the new keyword's syntax would be:
>>
>> access each line X in arrayX --initial setup of pointers and X value
>> access each item Y in arrayY --initial setup of pointers and Y value
>> repeat for number of lines of arrayX times --same as a repeat for each
>>   put X & comma & Y & return after ArrayXY --merged array
>>   next line X --puts the next line value in X
>>   next item Y --if arrayY has fewer elements than arrayX, then empty
>>     --is supplied, could also put "End of String" in the result
>> end repeat
>>
>> Another advantage of this syntax is that it provides more
>> flexibility in the structure of loops. You could repeat forever,
>> then exit repeat when you run out of values (based on getting an
>> empty back). The possibilities for high-speed sequential-access
>> data processing are much expanded, which opens up more
>> possibilities for Revolution.
>>
>> I would love to get your feedback or other ideas about solving this
>> problem.
>>
>> Dennis