Working with CSV files that are 5000 lines or more
ambassador at fourthworld.com
Wed Apr 9 21:35:50 EDT 2008
Jim Schaubeck wrote:
> Very good feedback for me. You are correct, the method I was
> using was very slow (I had no idea).
Superficially, the two main forms of repeat look very similar, but under
the hood they do very different things.
When you do this:
repeat with i = 1 to the number of lines of tData
   get line i of tData
end repeat
...that second line has to count from line 1 all the way down to line i
on every pass through the loop, so the work grows with each iteration.
That's why you saw the increasing slowdown the farther it got into the
data.
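If you want to watch the cost grow, here's a minimal timing sketch of
that form, assuming tData is already loaded (bracketing the loop with
the milliseconds is just one way to measure):

put the milliseconds into tStart
repeat with i = 1 to the number of lines of tData
   get line i of tData  -- re-counts from the top of tData on every pass
end repeat
put the milliseconds - tStart && "ms for the counting form"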
But when you do this:
repeat for each line tLine in tData
...then the engine assumes the data in tData won't change during the
loop, so it doesn't need to count lines as it goes. Instead it parses
incrementally, automatically putting the value of each successive line
into tLine on each pass through the loop.
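Here's the same timing sketch with the parsing form, under the same
assumptions; on any sizable tData it should finish in a small fraction
of the time, and the gap widens as the data grows:

put the milliseconds into tStart
repeat for each line tLine in tData
   get tLine  -- the current line is handed to you; no counting needed
end repeat
put the milliseconds - tStart && "ms for the parsing form"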
For data sets with just 5000 records, or even 50,000 records, you
probably don't need an RDBMS to handle them.
In cases where you're processing all records in sequence, you probably
don't even need an array. Arrays are lightning fast for random access,
such as when you have a list of keys and need to retrieve their values.
But converting text to and from arrays with the split and combine
commands is computationally expensive, so for a single sequential pass
over the full data set that conversion overhead usually benchmarks as
taking longer than simply using "repeat for each" on the delimited text.
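As an illustration, here's a sketch of both styles; the field layout
and the "SKU-1234" key are hypothetical, assuming each line of tData
looks like "SKU-1234,Widget,9.95":

-- Sequential pass: total the third field of every record, no array needed
set the itemDelimiter to comma
put 0 into tTotal
repeat for each line tLine in tData
   add item 3 of tLine to tTotal
end repeat

-- Random access: split once, then look up records by key;
-- this pays off only when you'll do many lookups afterward
put tData into tLookup
split tLookup by return and comma  -- item 1 of each line becomes the key
put tLookup["SKU-1234"] into tRecord  -- e.g. "Widget,9.95"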
My WebMerge customers regularly process data sets in the hundreds of
thousands of lines, and write me happy notes about how good the
performance is. :)
I wish I could take credit for it, but really it's all Scott Raney, the
fella who owned the engine at the time the "repeat for each" form was
added. It's a godsend.
Managing Editor, revJournal
Rev tips, tutorials and more: http://www.revJournal.com