Reading a (BIG) text file one line at a time - in reality...

J. Landman Gay jacque at hyperactivesw.com
Wed Nov 24 01:06:50 EST 2004


On 11/23/04 10:17 PM, Richard Gaskin wrote:

> If any of you have time to improve the buffering method below I'd be 
> interested in any significant changes to your test results.

If we want the buffering method to be as fast as possible, so as to test 
the method itself rather than the script that runs it, then we can speed 
up the script by rewriting method #3 like this:

   put the millisecs into t
   --
   put 0 into tWordCount3
   open file tFile for text read
   put empty into tBuffer
   repeat
     read from file tFile for 32000 -- next block of up to 32000 characters
     put tBuffer before it -- prepend the line held back from the previous read
     if it is empty then exit repeat -- nothing read and nothing held back
     if the number of lines in it > 1 then
       -- hold back the last, probably incomplete, line for the next pass
       put last line of it into tBuffer
       delete last line of it
     else
       put empty into tBuffer
     end if
     --
     repeat for each line tLine in it
       add the number of words of tLine to tWordCount3
     end repeat
   end repeat
   --
   put the millisecs - t into t3
   close file tFile
   --
   --

This script assumes that the last line in each 32000-character block is 
incomplete, which will almost always be the case. If the line is in fact 
complete, it shouldn't hurt anything to treat it as if it weren't.

Problem is, I'm getting a slightly different word count than your 
original method. I didn't debug that because it's getting late, but it 
is off by only a few, and I suspect it has to do with the very last 
line in the file. At any rate, the point is that the speed difference 
is large: in my test the original took about 850 milliseconds and 
the revised one above took about 125. This would probably change your 
benchmarks a bit.

I added a "close file" command for completeness. If I get a chance, I'll 
try to figure out why my count is off, unless someone else gets there first.
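
One untested guess about the count: if a 32000-character block happens 
to end exactly on a line break, the held-back line may get joined to the 
first line of the next block with no separator, which would merge two 
words into one. Below is a minimal sketch of a variant that guards 
against that. The handler name countWordsBuffered and its variable names 
are made up for illustration, and it assumes that "the result" is 
non-empty only at end of file or on a read error. It has not been run 
against the test file.

```livecode
function countWordsBuffered pFile
   local tWordCount, tBuffer, tChunk, tAtEnd
   put 0 into tWordCount
   put empty into tBuffer
   open file pFile for text read
   repeat
      read from file pFile for 32000
      -- assumption: a non-empty result means "eof" or a read error;
      -- either way this is treated as the final pass
      put ((the result is not empty) or (it is empty)) into tAtEnd
      put tBuffer & it into tChunk
      if tAtEnd or (last char of tChunk is return) then
         -- the chunk ends on a line boundary or at end of file,
         -- so there is no partial line to carry over
         put empty into tBuffer
      else
         -- hold back the trailing partial line for the next pass
         put last line of tChunk into tBuffer
         delete last line of tChunk
      end if
      repeat for each line tLine in tChunk
         add the number of words of tLine to tWordCount
      end repeat
      if tAtEnd then exit repeat
   end repeat
   close file pFile
   return tWordCount
end countWordsBuffered
```

Called as "put countWordsBuffered(tFile) into tWordCount3", it could be 
dropped into the timing code in place of the loop above. Because it only 
holds back a line when the block does not end in a return, a line longer 
than one block is also carried forward whole instead of being counted in 
pieces.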

-- 
Jacqueline Landman Gay         |     jacque at hyperactivesw.com
HyperActive Software           |     http://www.hyperactivesw.com

