Deleting a column ...

Richard Gaskin ambassador at fourthworld.com
Mon Feb 9 18:54:14 EST 2009


Alex Tweedly wrote:

> I haven't tested extensively, but in a simple case or two, it seems to 
> work just doing
> 
>   split tVar by column
>   delete variable tVar[3]   -- to delete the 3rd column
>   combine tVar by column

Oddly enough, the split-and-combine method benchmarks about 30% slower 
than "repeat for each".  This isn't too surprising, given that split and 
combine are very computationally intensive operations, each effectively 
doing its own "repeat for each" under the hood (albeit in compiled code).
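For anyone who wants to see the two strategies side by side outside of 
LiveCode, here's a rough Python sketch (Python only for illustration; the 
function names are my own, not part of any library). The first function 
mirrors "repeat for each line" -- one pass, deleting an item per line -- 
while the second mirrors split/combine by transposing into columns, 
dropping one, and transposing back, which is two full passes over the data:

```python
def del_col_lines(data, col, item_del="\t"):
    # mirrors "repeat for each line": single pass, delete one item per line
    out = []
    for line in data.split("\n"):
        items = line.split(item_del)
        del items[col - 1]          # columns are 1-based, as in LiveCode
        out.append(item_del.join(items))
    return "\n".join(out)

def del_col_transpose(data, col, item_del="\t"):
    # mirrors split/combine: build columns, drop one, reassemble rows
    rows = [line.split(item_del) for line in data.split("\n")]
    cols = list(zip(*rows))         # like "split pData by column"
    del cols[col - 1]               # like "delete variable pData[3]"
    rows = zip(*cols)               # like "combine pData by column"
    return "\n".join(item_del.join(r) for r in rows)

data = "a\tb\tc\nd\te\tf"
print(del_col_lines(data, 3) == del_col_transpose(data, 3))  # True
```

Both return the same result; the transpose version simply does more work 
per call, which is consistent with the ~30% difference above.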

Also, there's an anomaly with combine that adds an extra line to the data, 
with the right number of columns but each of them empty.  Not sure what's 
up with that, but to get the same result I just added a line to the 
function to delete it - here's the code I tested:
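One plausible source of that blank row -- and this is an assumption on my 
part, not something verified against the engine internals -- is a trailing 
line delimiter somewhere in the reassembled text: splitting on a delimiter 
that also terminates the data yields one extra, empty record. A one-line 
Python illustration of the general effect:

```python
# A trailing delimiter produces an empty trailing record on split;
# the same shape of data would show up as a row of empty columns.
print("a\tb\n".split("\n"))   # ['a\tb', '']
```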

-- Field 1 holds the data; in this test I used congressional
-- contact info, with 540 lines of 8 items each.

on mouseUp
   put 100 into n
   put fld 1 into tData
   --
   put the millisecs into t
   repeat n
     put DelCol1(tData, 3) into r1
   end repeat
   put the millisecs - t into t1
   --
   put the millisecs into t
   repeat n
     put DelCol2(tData, 3) into r2
   end repeat
   put the millisecs - t into t2

   put t1 && t2
end mouseUp


function DelCol1 pData, pCol
   put empty into tNuData
   set the itemdel to tab
   repeat for each line tLine in pData
     delete item pCol of tLine
     put tLine &cr after tNuData
   end repeat
   delete last char of tNuData
   return tNuData
end DelCol1


function DelCol2 pData, pCol
   set the columnDelimiter to tab
   split pData by column
   delete variable pData[pCol]   -- delete column pCol
   combine pData by column
   delete last line of pData   -- drop the empty row combine appends
   return pData
end DelCol2


RESULT: 123 165


Given the internal overhead of split and combine, I tend to use them 
only when I need to do a large number of lookups of specific records, 
since the array hash is blindingly fast to traverse, especially relative 
to the ultra-slow "get line x of tData".

For single lookups the split/combine overhead eats up more than the 
savings from not using "get line x". I haven't yet done enough testing 
to determine the number of lookups that represents the cutoff (and there 
may also be a relationship with data size, and that's way too much 
testing for me to do in my spare time <g>).
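The trade-off is easy to see in miniature (again a Python sketch with 
hypothetical helper names, not LiveCode): "get line x" has to scan the 
whole string on every call, roughly O(n) per lookup, while splitting once 
into a keyed structure costs O(n) up front and then O(1) per lookup -- so 
the split only pays off past some number of lookups:

```python
def get_line(data, x):
    # mimics "get line x of tData": re-scan and re-split on every call
    return data.split("\n")[x - 1]

# 540 records, roughly the shape of the congressional test data above
data = "\n".join("record %d\tvalue %d" % (i, i) for i in range(1, 541))

# one-time split into a hash keyed by line number -- the O(n) setup cost
table = {i + 1: line for i, line in enumerate(data.split("\n"))}

print(get_line(data, 3) == table[3])  # True: same record, different cost profile
```

For a single lookup the setup cost dominates; for hundreds of lookups the 
O(1) hash access wins, which matches the behavior described above.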

--
  Richard Gaskin
  Fourth World
  Revolution training and consulting: http://www.fourthworld.com
  Webzine for Rev developers: http://www.revjournal.com



More information about the use-livecode mailing list