"read...until <string>" -- buffer size
Alex Tweedly
alex at tweedly.net
Tue Aug 5 17:16:55 EDT 2014
Summary: Richard said:
> Should we consider adding an optional argument for "read...until
> <string>" to specify the buffer size the engine will use?
IMHO, No.
LC is supposed to be an easy-to-use language/system. I don't need to
deal with malloc/free, I'm not vulnerable to memory leaks, and I don't
need to fiddle with unnecessary detail - LC deals with it for me. So
rather than an "enhancement request", I'd say this should be a
"report of a bug due to unacceptable performance",
and it should just say something like "Make this work at acceptable
speed so it can be used" :-).
Even if you or I were able and willing to experiment with different
buffer sizes, how could we choose suitable sizes for all the different
OSes, disk setups, etc. that users might end up using?
I wonder if the problem is some generality in the string search for the
terminator? It shouldn't be, really - but if it is, I guess it would be
OK to have a more restricted form, such as
    read lines from file ...
if that would get better performance.
But I would definitely want to make it easier to get good results, not
harder :-)
-- Alex.
On 05/08/2014 16:31, Richard Gaskin wrote:
> The "read...until <string>" form is wonderfully convenient, but very
> slow.
>
> It's so slow, in fact, that I've found I can write a few dozen lines
> of code to perform a functionally identical task at much greater speed.
>
> The algo I use I picked up from some old HyperCard article back in the
> day. In short, I read from disk into a buffer by a specified amount,
> then walk through the lines within the buffer in memory. When I reach
> the last item, which doesn't have a terminator, I read another batch
> from the file, appending my buffer, and repeat this process until I've
> completed my traversal of the file.
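
[A minimal sketch of the chunked-read approach described above, written in Python purely for illustration. It is not Richard's script and not the engine's code; the function name, the 128 KB default, and the parameter names are assumptions. It reads a fixed-size chunk, emits every complete line in the buffer, carries the unterminated tail forward, and repeats until the file is exhausted.]

```python
import io

def iter_lines_buffered(stream, buffer_size=128 * 1024, terminator=b"\n"):
    """Yield terminator-delimited records from a binary stream.

    Hypothetical helper: reads buffer_size bytes at a time and walks the
    records in memory, keeping the unterminated tail for the next pass.
    """
    pending = b""  # tail of the previous chunk that had no terminator yet
    while True:
        chunk = stream.read(buffer_size)
        if not chunk:  # an empty read means end of file
            break
        pending += chunk
        # Everything except the last piece is a complete record.
        *complete, pending = pending.split(terminator)
        for line in complete:
            yield line
    if pending:  # final record without a trailing terminator
        yield pending

if __name__ == "__main__":
    sample = io.BytesIO(b"first\nsecond\nthird")
    for record in iter_lines_buffered(sample, buffer_size=4):
        print(record)
```

[Because the tail is joined to the next chunk before splitting, a terminator that straddles two reads is still found.]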
>
> While this has given me a satisfying speed bump well worth the time it
> took to write it, it occurs to me that it shouldn't be necessary at all.
>
> After all, it's not like "read...until <string>" is reading only one
> byte at a time from disk; I haven't read that part of the source but
> I'd be surprised if it's not reading at least one block's worth in
> each pass (usually 4k, depending on the file system).
>
> So in essence, it would seem the current "read...until <string>" algo
> in the engine is nearly identical to what I'm doing in script, with
> only one critical difference: the buffer size.
>
> Oddly enough, experimenting with different buffer sizes has yielded
> surprising results. At first I thought that minimizing disk access
> would be the primary boost, so I tried reading in 1 MB chunks but
> found far greater performance with much smaller amounts, even though
> it meant touching the disk more often. Ultimately, it seems the
> optimal buffer size in my experiments on one system was 128k; when
> reading smaller amounts the extra disk accesses take a toll, and when
> reading larger amounts it's also slower, perhaps due to malloc
> anomalies or the way LC uses malloc in that context.
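
[A rough sketch of how one might repeat this kind of buffer-size experiment, again in Python for illustration only. All names here are hypothetical, the sample file is synthetic, and the timings it produces say nothing about the LiveCode engine. OS file caching will also affect the numbers, so treat any result as specific to one machine.]

```python
import os
import tempfile
import time

def count_lines(path, buffer_size):
    """Count newline-delimited records by reading buffer_size bytes per pass."""
    count = 0
    pending = b""  # unterminated tail carried between reads
    with open(path, "rb", buffering=0) as f:  # unbuffered, so our size is what is requested
        while True:
            chunk = f.read(buffer_size)
            if not chunk:
                break
            pending += chunk
            *complete, pending = pending.split(b"\n")
            count += len(complete)
    return count + (1 if pending else 0)

def time_buffer_sizes(path, sizes):
    """Return {buffer_size: (line_count, elapsed_seconds)} for each size tried."""
    results = {}
    for size in sizes:
        start = time.perf_counter()
        lines = count_lines(path, size)
        results[size] = (lines, time.perf_counter() - start)
    return results

def make_sample_file(n_lines=20000):
    """Write a throwaway test file and return its path (caller removes it)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(n_lines):
            f.write(b"line %d\n" % i)
    return path

if __name__ == "__main__":
    sample_path = make_sample_file()
    try:
        timings = time_buffer_sizes(sample_path, [4096, 131072, 1048576])
        for size, (lines, elapsed) in timings.items():
            print(size, lines, round(elapsed, 4))
    finally:
        os.remove(sample_path)
```

[Running each size several times and comparing the spread would give a fairer picture than a single pass.]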
>
> All this leaves me with a proposal:
>
> Should we consider adding an optional argument for "read...until
> <string>" to specify the buffer size the engine will use?
>
> E.g.:
>
> read from file tMyFile until CR with buffer 128000
>
> ...or:
>
> read from file tMyFile until CR for 128000
>
> ...or if we're going to try to be more English-like about it:
>
> read from file tMyFile until CR for 128k
>
>
> Worth submitting as an enhancement request?
>
> Anyone here in a position to implement this themselves?
>
> And is there anyone here who happens to know the buffer size the
> engine currently uses for this?
>