Extracting a column

Richard Gaskin ambassador at fourthworld.com
Mon Dec 10 11:24:42 EST 2007


Klaus Major wrote:

>> Does anybody know if there is a "quick" way to extract a column  
>> from a tab-delimited list (in a field or a variable)?
>> By "quick" I mean one that doesn't require looping through every  
>> line of my variable, because it can be quite long.
>>
>> I've tried using an array, but the transpose function doesn't work if  
>> the number of columns is not the same as the number of lines.
>> Or can I do something else with an array to achieve that goal?
> 
> use the new "split" command!
> ...
> put "your data here" into myvar
> put 2 into my_column
> ## The number of the column you want to extract
> split myvar by column
> ## turns your data into an array!
> put myvar[my_column] into my_column_data
> ...

Well done, Klaus.  I'd forgotten that the "split" command has been 
extended with the "column" token, and since I have a data management 
library that I use in a number of apps, I decided to test this against 
the "repeat for each line" method I'm currently using.

It seems that even with the convenience of the new form of "split", the 
"repeat for each line" method is still faster - here are the results of 
this morning's test:

   Split: 1101 ms (490.46 lines/ms)
   Repeat: 499 ms (1082.16 lines/ms)
   Same results?: true

(MacBook Pro 2.16GHz, OS X 10.4.11)

While the relative benchmarks favor "repeat for each", in absolute terms 
even the slower "split" method extracts a column from roughly half a 
million lines per second (about 490 lines/ms), which isn't bad. :)
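The lines/ms figures follow directly from the test parameters in the script: n = 1000 runs over a 540-line field, so each method processes 540,000 lines in total. Here is a quick Python check of that arithmetic, using the timings copied from the results above. It is purely illustrative; the names are hypothetical and nothing here touches Rev.

```python
# Parameters and timings taken from the test described in this post.
RUNS = 1000            # "put 1000 into n"
LINES_PER_RUN = 540    # lines in the "src" field
TIMINGS_MS = {"Split": 1101, "Repeat": 499}


def lines_per_ms(total_lines: int, elapsed_ms: float) -> float:
    """Throughput as computed by the handler: x / t, with x = n * lines."""
    return total_lines / elapsed_ms


total = RUNS * LINES_PER_RUN  # 540,000 lines processed per method

if __name__ == "__main__":
    for name, ms in TIMINGS_MS.items():
        rate = lines_per_ms(total, ms)
        # Multiply by 1000 to turn lines/ms into lines per second.
        print(f"{name}: {rate:.2f} lines/ms, about {rate * 1000:,.0f} lines/s")
```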


Here's the code. Please let me know if I've missed something that 
may be skewing the results:

on mouseUp
   set cursor to watch
   --
   -- Number of times to run the test:
   put 1000 into n
   --
   -- "src" contains a tab-delimited list of 540 lines:
   put fld "src" into tData
   --
   -- TEST 1: split
   put the millisecs into t
   repeat n
     put GetCol1(tData, 2) into tmp1
   end repeat
   put the millisecs - t into t1
   --
   -- TEST 2: repeat for each:
   put the millisecs into t
   repeat n
     put GetCol2(tData, 2) into tmp2
   end repeat
   put the millisecs - t into t2
   --
   -- Display results:
   put tmp1 into fld "r1"
   put tmp2 into fld "r2"
   --
   -- Display times and verify that the
   -- results are the same (with no container,
   -- "put" sends the text to the Message Box):
   put n * the number of lines of tData into x
   set the numberformat to "0.##"
   put "Split: " & t1 & " ms (" & x/t1 & " lines/ms)" & \
         cr & "Repeat: " & t2 & " ms (" & x/t2 & " lines/ms)" & \
         cr & "Same results?: " & (tmp1 = tmp2)
end mouseUp

--
--  TEST 1: split
--
function GetCol1 pData, pCol
   split pData by column
   return pData[pCol]
end GetCol1

--
-- TEST 2: repeat for each
--
function GetCol2 pData, pCol
   put empty into tVal
   set the itemdel to tab
   repeat for each line tLine in pData
     put item pCol of tLine &cr after tVal
   end repeat
   delete last char of tVal
   return tVal
end GetCol2
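For readers outside LiveCode, here is a rough Python analogue of the two functions. It is a sketch only. `get_col_all_columns` mirrors what "split ... by column" produces: it builds every column before returning one. `get_col_per_line` mirrors the "repeat for each line" loop. Several details are assumptions made for illustration and do not come from the original: the function names, the padding of short lines with empty strings, the absence of a trailing newline in the data, and the `benchmark` helper. Python timings will not reproduce the Rev numbers above.

```python
import time


def get_col_all_columns(data: str, col: int) -> str:
    """Rough analogue of GetCol1: build every column, then return one.

    `col` is 1-based, as in the LiveCode version. Short lines are padded
    with empty strings (an assumption of this sketch).
    """
    rows = [line.split("\t") for line in data.split("\n")]
    width = max(max(len(r) for r in rows), col)
    columns = [[r[i] if i < len(r) else "" for r in rows] for i in range(width)]
    return "\n".join(columns[col - 1])


def get_col_per_line(data: str, col: int) -> str:
    """Rough analogue of GetCol2: walk the lines once, keeping one item each."""
    out = []
    for line in data.split("\n"):
        items = line.split("\t")
        # Like "item pCol of tLine", a missing item yields an empty string.
        out.append(items[col - 1] if col <= len(items) else "")
    # Joining with "\n" avoids the trailing return GetCol2 deletes by hand.
    return "\n".join(out)


def benchmark(data: str, col: int, n: int = 1000):
    """Time n calls of each function; return (ms1, ms2, same_results)."""
    r1 = r2 = ""
    start = time.perf_counter()
    for _ in range(n):
        r1 = get_col_all_columns(data, col)
    t1 = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    for _ in range(n):
        r2 = get_col_per_line(data, col)
    t2 = (time.perf_counter() - start) * 1000
    return t1, t2, r1 == r2


if __name__ == "__main__":
    # Hypothetical sample data: 540 lines of 4 tab-separated fields.
    sample = "\n".join(f"r{i}\tname{i}\t{i * 3}\tx" for i in range(540))
    t1, t2, same = benchmark(sample, 2)
    print(f"All columns: {t1:.0f} ms\nPer line: {t2:.0f} ms\nSame results?: {same}")
```

Running the script prints both timings and the equality check, much like the mouseUp handler's Message Box output. The relative speeds seen in Python say nothing about how Rev's engine behaves.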



My test stack with a 540-line source field is at:

    go url "http://fourthworldlabs.com/getcol_test.rev"


-- 
  Richard Gaskin
  Managing Editor, revJournal
  _______________________________________________________
  Rev tips, tutorials and more: http://www.revJournal.com



More information about the use-livecode mailing list