How can we dynamically create variable names from changing value "x" on a loop?
Ben Rubinstein
benr_mc at cogapp.com
Tue Nov 8 07:46:10 EST 2016
Hi Mark,
There's a reason why I haven't posted the code of explodeRow... but I'm sure
it _could_ be efficient!
Thanks for reminding me about split with one delimiter - I never use that.
I think when I first encountered it I was so annoyed that it seemed pointless
(what I was looking for was the opposite effect: the chunk text as the key and
the index as the value) that I never considered it again. But of course it
makes a lot of sense in a context where dipping into indexed items repeatedly
is going to be expensive; I'll try to remember to use it in future.
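To make the two mappings concrete, here is a small Python model (not LiveCode). It assumes, as described in this thread, that 'split' with a single delimiter produces an array keyed by 1-based item number. The inverse mapping (chunk text to index) is built by hand. The function names are hypothetical, and edge cases such as trailing delimiters, which LiveCode and Python may treat differently, are ignored.

```python
def split_by_one_delimiter(text: str, delim: str = "\t") -> dict[int, str]:
    """Model of 'split tVar by tab': keys are 1-based item numbers."""
    return {i: chunk for i, chunk in enumerate(text.split(delim), start=1)}


def index_by_text(text: str, delim: str = "\t") -> dict[str, int]:
    """The opposite mapping: chunk text -> 1-based item number.

    If a chunk repeats, the first occurrence wins.
    """
    result: dict[str, int] = {}
    for i, chunk in enumerate(text.split(delim), start=1):
        result.setdefault(chunk, i)
    return result


header = "User ID\tUser Name\tEmail"
print(split_by_one_delimiter(header))  # {1: 'User ID', 2: 'User Name', 3: 'Email'}
print(index_by_text(header))           # {'User ID': 1, 'User Name': 2, 'Email': 3}
```

The second mapping corresponds to turning header names into column indices such as viUserID. The first is what makes repeated access to the items of a data row cheap.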
Ben
On 08/11/2016 12:23, Mark Waddingham wrote:
> Apologies - I clicked the wrong button in my email client and managed to send
> a partially composed message. Here is the correct version!
>
> On 2016-11-08 12:48, Ben Rubinstein wrote:
>> The point is that in my first pattern, I have outside the loop
>> assigned column (item) indices to named variables (based on the items
>> of the first, header, row). In the loop LC then has to locate the
>> indexed items in an individual data row.
>
> In the first pattern:
>
> repeat for each line tRec in tTSVdata
>    doSomething item viUserID of tRec, item viUserName of tRec
>    ...
> end repeat
>
> The 'item <constant> of tRec' expressions cause the engine to iterate through
> tRec until it has found the relevant item. This means that this single line
> will scan the tRec string twice from the start: the first time up to the
> viUserID'th item, the second time up to the viUserName'th item. The speed of
> this will largely depend on how large the item indices are, how large tRec is
> and where the items fall in tRec.
>
> If the item indices are small and close together (so the items are near the
> start), tRec is short, and you don't use 'item ... of tRec' anywhere else in
> the loop, then it will likely be faster than anything else.
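The scanning behaviour described above can be modelled in Python (again, not LiveCode) by a function that walks from the start of the row on every call and reports where it stopped, as a rough count of characters examined. This is a sketch under that assumption only. It does not time the LiveCode engine, whose internals may differ. item_of, the sample row and the index values are hypothetical.

```python
def item_of(row: str, n: int, delim: str = "\t") -> tuple[str, int]:
    """Model of 'item n of row': walk from the start past n-1 delimiters.

    Returns the item text and the offset where the walk stopped, used here
    as a rough count of characters examined.
    """
    start = 0
    for _ in range(n - 1):
        pos = row.find(delim, start)
        if pos == -1:
            return "", len(row)  # row has fewer than n items
        start = pos + 1
    end = row.find(delim, start)
    if end == -1:
        end = len(row)
    return row[start:end], end


row = "42\tAda\tada@example.com\tLondon"
vi_user_id, vi_user_name = 1, 2  # hypothetical column indices from the header row

# Each lookup restarts from the beginning of the row.
user_id, cost_id = item_of(row, vi_user_id)
user_name, cost_name = item_of(row, vi_user_name)
print(user_id, user_name, cost_id + cost_name)  # 42 Ada 8
```

Fetching item 4 ("London") instead would walk the whole 29-character row on its own, which is why items near the end of a long row are the slow case.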
>
>> In the second pattern, the code (which happens to be in a function, for
>> neatness) has to create a new empty array, and chunk both the data row
>> and the header row in order to get column names and values to put into
>> the array. You can loop over one set of items, but not both, so LC
>> still has to locate indexed items in at least one case.
>
> put line 1 of tTSVdata into tColumnNames
> delete line 1 of tTSVdata
> repeat for each line tRec in tTSVdata
>    put explodeRow(tRec, tColumnNames) into aData
>    doSomething aData["User ID"], aData["User Name"]
>    ...
> end repeat
>
> The performance will largely depend on the implementation of explodeRow and
> (as you said subsequently) how many columns you want from the row.
>
> If you only want 2 items, the non-array version will likely be faster, unless
> each tRec is very long and you are fetching two items near the end. If,
> however, the two items are near the end of the row, or you want to access lots
> of items, then this will be faster than either:
>
> repeat for each line tRec in tTSVdata
>    split tRec by tab
>    doSomething tRec[viUserID], tRec[viUserName]
>    ...
> end repeat
>
> The difference here is that with the 'item' approach each lookup rescans tRec
> from the start, so the cost grows with the number of items fetched multiplied
> by how far into tRec they lie (quadratically if you fetch most of the items);
> with the 'split' approach tRec is scanned once, so the cost grows linearly with
> the length of tRec.
>
> Depending on the average length of tRec and the values of viUserID and
> viUserName, at some point the 'item' approach will start to be significantly
> slower than the 'split' version.
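A rough Python model of that comparison, counting characters walked rather than measuring time. Each 'item i' lookup is charged the offset where a from-the-start scan for item i stops, and a 'split' is charged one pass over the row. It assumes the scanning behaviour described above and ignores the cost of building the array, so it shows the shape of the growth, not where the real crossover lies in LiveCode. The function names and the sample row are hypothetical.

```python
def item_end_offsets(row: str, delim: str = "\t") -> list[int]:
    """Offset at which a from-the-start scan for each item would stop."""
    ends = []
    pos = 0
    for chunk in row.split(delim):
        pos += len(chunk)
        ends.append(pos)
        pos += len(delim)
    return ends


def item_approach_cost(row: str, indices: list[int], delim: str = "\t") -> int:
    """Characters walked if every 'item i of row' restarts at the beginning."""
    ends = item_end_offsets(row, delim)
    return sum(ends[i - 1] for i in indices)


def split_approach_cost(row: str) -> int:
    """Characters walked by one 'split row by tab': a single pass."""
    return len(row)


# A row of 100 ten-character items separated by tabs.
row = "\t".join("x" * 10 for _ in range(100))
for k in (2, 10, 100):
    wanted = list(range(1, k + 1))  # access items 1..k by index
    print(k, item_approach_cost(row, wanted), split_approach_cost(row))
# 2 31 1099
# 10 595 1099
# 100 55450 1099
```

For this row, fetching the first 2 items by index is far cheaper than one split, but fetching all 100 by index walks roughly fifty times as many characters.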
>
> The explodeRow approach sounds like it has lots of overhead. A fair amount of
> it could probably be eliminated by doing 'split tColumnNames by tab' once,
> before the loop, and then using array access in explodeRow to form the aData
> array (making explodeRow private will also help).
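A sketch of that suggestion, modelled in Python because the real explodeRow has not been posted. The header is split once, before the loop, into an index-to-name mapping (the counterpart of 'split tColumnNames by tab'). Each call then splits only the data row and copies values across by index. explode_row and the sample data are hypothetical, and giving missing trailing columns an empty value is an assumption about the desired behaviour. LiveCode's 'private' handlers have no direct Python equivalent, so that part is not modelled.

```python
def explode_row(row: str, column_names: dict[int, str]) -> dict[str, str]:
    """Hypothetical explodeRow: column_names is the header already split by
    item number, so each call only has to split the data row once."""
    values = dict(enumerate(row.split("\t"), start=1))  # item number -> value
    return {name: values.get(i, "") for i, name in column_names.items()}


data = "User ID\tUser Name\n42\tAda\n43\tGrace"
lines = data.split("\n")
column_names = dict(enumerate(lines[0].split("\t"), start=1))  # once, before the loop
for rec in lines[1:]:
    a_data = explode_row(rec, column_names)
    print(a_data["User ID"], a_data["User Name"])  # 42 Ada, then 43 Grace
```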
>
> Just my two pence.
>
> Mark.
>