the large file challenge

Scott Raney raney at metacard.com
Sun Nov 10 19:08:01 EST 2002


On Sun, 10 Nov 2002 Richard Gaskin <ambassador at fourthworld.com> wrote:

> My hunch is that reading for lines is slower than reading a
> specified number of chars, since with lines it needs to evaluate
> each incoming character to determine if it's a return -- Scott, am I
> right or should they be about the same?

You're right, though I wouldn't think it would make *that* much
difference.

As for my guess as to the fastest way to do this, it'd probably be a
hybrid approach, using both "read for x" and "repeat for each line".
You'd start by opening the file for binary read (faster than other
modes).  Then read for X characters, where X would be some large
number experimentally determined for each system (it'd probably be some
large percentage of the free RAM, and so probably on the order of a
few MB), and then use "repeat for each line l in it".

The trick is that the last line will be incomplete in this case, so
for the second and subsequent reads you subtract the length of the
last line from X, and do "read for X at Y", where Y is a running total
of what's been read, after subtracting the partial lines of course.
Some extra bookkeeping will be required in this case (e.g., if the tag
you're looking for is in the partial last line you need to subtract 1
from the count so you don't count it twice).  Exactly how to do this
part most efficiently is left as an exercise for the reader ;-)
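
For anyone who wants a starting point, here is a rough sketch of the
chunked approach, written in Python rather than MetaTalk so it can be
run and checked on its own. Everything named here is an assumption made
for illustration: the function name count_tag_lines, the 4 MB default
chunk size, and LF line endings. It also takes a slightly simpler route
than the bookkeeping described above: instead of counting over the whole
chunk and correcting for the partial last line, it processes only the
complete lines and starts the next read at the beginning of the partial
one, so nothing gets counted twice.

```python
import os


def count_tag_lines(path, tag, chunk_size=4 * 1024 * 1024):
    """Count lines containing `tag` (a bytes value), reading in large binary chunks.

    Illustrative sketch only: assumes LF ("\n") line endings and that no
    single line is longer than chunk_size.
    """
    count = 0
    offset = 0  # "Y": running total of bytes that have been fully processed
    file_size = os.path.getsize(path)

    with open(path, "rb") as f:  # binary read, as suggested above
        while offset < file_size:
            f.seek(offset)              # the "at Y" part
            chunk = f.read(chunk_size)  # the "read for X" part

            if offset + len(chunk) >= file_size:
                # Final chunk: whatever is left is complete, even if the
                # file does not end with a line break.
                complete = chunk
            else:
                # Drop the partial last line; it is re-read next time around.
                cut = chunk.rfind(b"\n")
                if cut == -1:
                    raise ValueError("line longer than chunk_size")
                complete = chunk[:cut + 1]

            # The equivalent of "repeat for each line l in it".
            for line in complete.splitlines():
                if tag in line:
                    count += 1

            # Advance only past the complete lines.
            offset += len(complete)

    return count
```

Calling count_tag_lines("big.txt", b"<tag>") would return the number of
lines containing that tag (the file name is only a placeholder). The
best chunk_size still has to be found by experiment on each system.
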
  Regards,
    Scott

> -- 
>  Richard Gaskin 
>  Fourth World Media Corporation
>  Developer of WebMerge 2.0: Publish any database on any site
>  ___________________________________________________________
>  Ambassador at FourthWorld.com       http://www.FourthWorld.com
>  Tel: 323-225-3717                       AIM: FourthWorldInc

********************************************************
Scott Raney  raney at metacard.com  http://www.metacard.com
MetaCard: You know, there's an easier way to do that...



