Extracting a column
Richard Gaskin
ambassador at fourthworld.com
Mon Dec 10 11:24:42 EST 2007
Klaus Major wrote:
>> Does somebody know if there is a "quick" way to extract a column
>> from a tab-delimited list (in a field or a variable)?
>> By "quick" I mean I'm not obliged to cycle through all the lines of
>> my var because it can be quite long.
>>
>> I've tried to use an array, but the transpose function doesn't work if
>> the number of columns is not the same as the number of lines.
>> Or can I do something else with an array to achieve that goal?
>
> use the new "split" command!
> ...
> put "your data here" into myvar
> put 2 into my_column
> ## The number of the column you want to extract
> split myvar by column
> ## turns your data into an array!
> put myvar[my_column] into my_column_data
> ...
Well done, Klaus. I'd forgotten that the "split" command has been
extended with the "column" token, and since I have a data management
library that I use in a number of apps I decided to test this against
the "repeat for each line" method I'm currently using.
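As a minimal sketch of what the "column" form does, assuming the default column delimiter (tab) and row delimiter (return), the handler below splits three rows and reads back the second column. The sample data and the variable name tSample are hypothetical and are not taken from the test stack:

```livecode
on mouseUp
  -- Hypothetical sample: three tab-delimited rows (not the "src" field data)
  put "a" & tab & "1" & cr & \
      "b" & tab & "2" & cr & \
      "c" & tab & "3" into tSample
  --
  -- After this, tSample is an array keyed by column number
  split tSample by column
  --
  -- Element 2 holds the second column, one value per line
  put tSample[2]
end mouseUp
```

Note that the split happens in place: the string in the variable is replaced by the array, so keep a copy if the original text is still needed.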
It seems that even with the convenience of the new form of "split", the
"repeat for each line" method is still faster - here are the results of
this morning's test:
Split: 1101 ms (490.46 lines/ms)
Repeat: 499 ms (1082.16 lines/ms)
Same results?: true
(MacBook Pro 2.16GHz, OS X 10.4.11)
While the relative benchmarks favor "repeat for each", in absolute terms
being able to extract a column from half a million lines per second
isn't bad. :)
Here's the code - please let me know if I've missed something here which
may be skewing the results:
on mouseUp
  set cursor to watch
  --
  -- Number of times to run the test:
  put 1000 into n
  --
  -- "src" contains a tab-delimited list of 540 lines:
  put fld "src" into tData
  --
  -- TEST 1: split
  put the millisecs into t
  repeat n
    put GetCol1(tData, 2) into tmp1
  end repeat
  put the millisecs - t into t1
  --
  -- TEST 2: repeat for each:
  put the millisecs into t
  repeat n
    put GetCol2(tData, 2) into tmp2
  end repeat
  put the millisecs - t into t2
  --
  -- Display results:
  put tmp1 into fld "r1"
  put tmp2 into fld "r2"
  --
  -- Display times and verify that the
  -- results are the same:
  put n * the number of lines of tData into x
  set the numberformat to "0.##"
  put "Split: " & t1 & " ms (" & x/t1 & " lines/ms)" & \
      cr & "Repeat: " & t2 & " ms (" & x/t2 & " lines/ms)" & \
      cr & "Same results?: " & (tmp1 = tmp2)
end mouseUp
--
-- TEST 1: split
--
function GetCol1 pData, pCol
  split pData by column
  return pData[pCol]
end GetCol1
--
-- TEST 2: repeat for each
--
function GetCol2 pData, pCol
  put empty into tVal
  set the itemdel to tab
  repeat for each line tLine in pData
    put item pCol of tLine & cr after tVal
  end repeat
  delete last char of tVal
  return tVal
end GetCol2
My test stack with a 540-line source field is at:
go url "http://fourthworldlabs.com/getcol_test.rev"
--
Richard Gaskin
Managing Editor, revJournal
_______________________________________________________
Rev tips, tutorials and more: http://www.revJournal.com