"read...until <string>" -- buffer size

Alex Tweedly alex at tweedly.net
Tue Aug 5 17:16:55 EDT 2014

Summary: Richard said:
> Should we consider adding an optional argument for "read...until 
> <string>" to specify the buffer size the engine will use?

LC is supposed to be an easy-to-use language/system. I don't need to 
deal with malloc/free, I'm not vulnerable to memory leaks, and I don't 
need to fiddle with unnecessary detail: LC deals with it for me. So rather 
than filing this as an "enhancement request", I'd call it a
     "report of a bug due to unacceptable performance",
and it should just say something like "Make this work at acceptable 
speed so it can be used" :-).

Even if you or I were able and willing to experiment with different 
buffer sizes, how could we choose suitable sizes for all the different 
OSes, disk setups, etc. that users might end up using?

I wonder whether the problem is the generality of the string search for 
the terminator? It shouldn't be, really, but if it is, I guess it would be 
OK to have a more restricted form, such as

  read lines from file ...

if that would get better performance.

But I would definitely want to make it easier to get good results, not 
harder :-)
-- Alex.

On 05/08/2014 16:31, Richard Gaskin wrote:
> The "read...until <string>" form is wonderfully convenient, but very 
> slow.
> It's so slow, in fact, that I've found I can write a few dozen lines 
> of code to perform a functionally identical task at much greater speed.
> I picked up the algorithm I use from an old HyperCard article back in 
> the day.  In short, I read a specified amount from disk into a buffer, 
> then walk through the lines within the buffer in memory. When I reach 
> the last line, which doesn't have a terminator, I read another batch 
> from the file, append it to that leftover partial line, and repeat this 
> process until I've completed my traversal of the file.
> While this has given me a satisfying speed bump well worth the time it 
> took to write it, it occurs to me that it shouldn't be necessary at all.
> After all, it's not like "read...until <string>" is reading only one 
> byte at a time from disk; I haven't read that part of the source but 
> I'd be surprised if it's not reading at least one block's worth in 
> each pass (usually 4k, depending on the file system).
> So in essence, it would seem the current "read...until <string>" algo 
> in the engine is nearly identical to what I'm doing in script, with 
> only one critical difference:  the buffer size.
> Experimenting with different buffer sizes has yielded 
> surprising results.  At first I thought that minimizing disk access 
> would be the primary boost, so I tried reading in 1 MB chunks, but 
> got far better performance with much smaller amounts, even though 
> it meant touching the disk more often. Ultimately, it seems the 
> optimal buffer size in my experiments on one system was 128k; when 
> reading smaller amounts the extra disk accesses take a toll, and when 
> reading larger amounts it's also slower, perhaps due to malloc 
> anomalies or the way LC uses malloc in that context.
> All this leaves me with a proposal:
> Should we consider adding an optional argument for "read...until 
> <string>" to specify the buffer size the engine will use?
> E.g.:
>   read from file tMyFile until CR with buffer 128000
> ...or:
>   read from file tMyFile until CR for 128000
> ...or if we're going to try to be more English-like about it:
>   read from file tMyFile until CR for 128k
> Worth submitting as an enhancement request?
> Anyone here in a position to implement this themselves?
> And is there anyone here who happens to know the buffer size the 
> engine currently uses for this?
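For readers who want to try the buffering technique Richard describes, here is a minimal sketch of it. It is written in Python rather than LiveCode script and is not the LC engine's implementation, whose buffer size is unknown here. The function names `read_until` and `time_buffer_sizes` are hypothetical. The code reads a fixed number of bytes per disk read, splits the buffer on the terminator, and carries the unterminated tail over into the next read. Joining the tail with the next chunk before splitting also handles a multi-character terminator (such as CRLF) that straddles two reads. The 128 KB default simply echoes the figure Richard measured on one system. It is an assumption, not a recommendation; the small harness exists so you can measure other sizes on your own setup.

```python
import os
import tempfile
import time
from typing import Dict, Iterator, List, Tuple


def read_until(path: str, delimiter: bytes = b"\n",
               buffer_size: int = 128 * 1024) -> Iterator[bytes]:
    """Yield records from `path` split on `delimiter` (delimiter stripped).

    Hypothetical helper: reads `buffer_size` bytes per pass; a final record
    with no trailing delimiter is still yielded.
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    pending = b""  # unterminated tail carried over from the previous read
    # buffering=0 disables Python's own buffer, so each f.read() below
    # is one OS-level read of up to buffer_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(buffer_size)
            if not chunk:  # b"" means end of file
                break
            pending += chunk
            # Everything before the last delimiter is complete; the piece
            # after it (possibly empty) waits for the next chunk.
            *complete, pending = pending.split(delimiter)
            yield from complete
    if pending:
        yield pending


def time_buffer_sizes(path: str, sizes: List[int],
                      delimiter: bytes = b"\n") -> Dict[int, Tuple[int, float]]:
    """Map each buffer size to (record count, elapsed seconds)."""
    results: Dict[int, Tuple[int, float]] = {}
    for size in sizes:
        start = time.perf_counter()
        count = sum(1 for _ in read_until(path, delimiter, size))
        results[size] = (count, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Build a throwaway test file of 200,000 short lines.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"".join(b"line %d\n" % i for i in range(200_000)))
        path = tmp.name
    try:
        timings = time_buffer_sizes(path, [4 * 1024, 128 * 1024, 1024 * 1024])
        for size, (count, secs) in timings.items():
            print(f"{size:>8} bytes: {count} records in {secs:.3f}s")
    finally:
        os.remove(path)
```

The buffer size is a parameter on purpose. The fastest value depends on the OS, the disk, and the allocator, which is exactly the uncertainty raised in this thread. Timings from the harness on one machine won't necessarily carry over to another.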

More information about the use-livecode mailing list