Working with csv files that are 5000 lines or more

Richard Gaskin ambassador at fourthworld.com
Wed Apr 9 21:35:50 EDT 2008


Jim Schaubeck wrote:
 > Very good feedback for me.  You are correct , the method I was
 > using was very slow (I had no idea).

Superficially the two main forms of repeat look very similar, but under 
the hood they do very different things.

When you do this:

repeat with i = 1 to the number of lines of tData
    get line i of tData
end repeat

...that second line has to count the lines from 1 to i each time through 
the loop, so the total work grows roughly with the square of the number 
of lines.  That's why you saw the increasing slowdown the farther it got 
into the data.

But when you do this:

repeat for each line tLine in tData
    get tLine
end repeat

...then the engine makes the assumption that the data in tData won't be 
changing, so it doesn't need to recount from the top each time.  Instead 
it parses as it goes, in a single pass, automatically putting the value 
of each line into tLine each time through the loop.
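
If you want to see the difference for yourself, here is a rough timing 
sketch of the two loop forms above.  The handler, the made-up test data 
and the variable names are just my assumptions for illustration; paste 
it into a button script and substitute your own data.  Actual timings 
will vary with the machine and the size of the data.

```livecode
on mouseUp
   local i, tData, tLine, tStart, tIndexedMs, tForEachMs

   -- Build some hypothetical test data: 5000 short lines
   repeat with i = 1 to 5000
      put "record" && i & cr after tData
   end repeat
   delete the last char of tData -- drop the trailing return

   -- Form 1: indexed access, counts lines from the top on every pass
   put the milliseconds into tStart
   repeat with i = 1 to the number of lines of tData
      get line i of tData
   end repeat
   put the milliseconds - tStart into tIndexedMs

   -- Form 2: "repeat for each", walks the data once
   put the milliseconds into tStart
   repeat for each line tLine in tData
      get tLine
   end repeat
   put the milliseconds - tStart into tForEachMs

   -- Report both timings in the Message Box
   put "indexed:" && tIndexedMs && "ms, for each:" && tForEachMs && "ms"
end mouseUp
```

Try raising the 5000 to 50000 and compare how each figure grows.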

For data sets with just 5000 records, or even 50,000 records, you 
probably don't need an RDBMS to handle them.

In cases where you're processing all records in sequence you probably 
don't even need an array.  Arrays are lightning fast for random access, 
such as when you have a list of keys and you need to retrieve their 
values.  But the split and combine commands are very computationally 
intensive, so for sequential processing of the full data set the 
overhead of split and combine usually benchmarks as taking longer than 
simply using "repeat for each" on the delimited text.
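
To make that distinction concrete, here is a minimal sketch of both 
approaches.  The function names and the assumption that the key is the 
first comma-delimited item of each line are mine, purely for 
illustration, and the simple comma handling does not cope with quoted 
fields that contain commas.

```livecode
-- Sequential processing: collect every line whose first item matches.
-- The result is built in a separate variable, since the data being
-- looped over should not be altered inside "repeat for each".
function filterRecords pData, pWanted
   local tLine, tResult
   set the itemDelimiter to comma
   repeat for each line tLine in pData
      if item 1 of tLine is pWanted then
         put tLine & cr after tResult
      end if
   end repeat
   delete the last char of tResult -- drop the trailing return
   return tResult
end filterRecords

-- Random access: split once into an array keyed by the first item,
-- then look values up by key.
function lookupByKey pData, pKey
   local tArray
   put pData into tArray
   split tArray by cr and comma -- key = text before the first comma
   return tArray[pKey]
end lookupByKey
```

In real use you would split once and keep the array around for many 
lookups; splitting on every call, as lookupByKey does here, would throw 
away the benefit.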

My WebMerge customers regularly process data sets in the hundreds of 
thousands of lines, and write me happy notes about how good the 
performance is. :)

I wish I could take credit for it, but really it's all Scott Raney, the 
fella who owned the engine at the time the "repeat for each" form was 
added. It's a godsend.

-- 
  Richard Gaskin
  Managing Editor, revJournal
  _______________________________________________________
  Rev tips, tutorials and more: http://www.revJournal.com


