Buffer size (was Looking for parser for Email (MIME))
Mark Waddingham
mark at livecode.com
Tue Mar 22 12:09:36 EDT 2016
On 2016-03-22 15:24, Richard Gaskin wrote:
> What is the size of the read buffer used when reading until <char>?
>
> I'm assuming it isn't reading a single char per disk access, probably
> at least using the file system's block size, no?
Well, the engine will memory map files if it can (if there is available
address space), so smaller (sub-1GB) files are essentially all buffered.
For larger files, the engine uses the stdio FILE abstraction, so it will
get buffering from that.
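To give a rough idea of the shape of that logic (a C sketch only - not the
engine's actual code, and the 1GB threshold here is just the rough cut-off
mentioned above):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    /* Hypothetical helper: map the whole file into memory when it is small
       enough, otherwise fall back to stdio's buffered FILE. */
    #define MMAP_LIMIT (1024L * 1024L * 1024L)  /* ~1GB */

    typedef struct {
        const char *data;  /* non-NULL when memory-mapped */
        size_t size;
        FILE *fp;          /* non-NULL when using stdio buffering */
    } Stream;

    int stream_open(Stream *s, const char *path)
    {
        struct stat st;
        int fd = open(path, O_RDONLY);
        if (fd < 0 || fstat(fd, &st) < 0)
            return -1;

        if (st.st_size > 0 && st.st_size < MMAP_LIMIT) {
            void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                close(fd);  /* the mapping keeps the data accessible */
                s->data = p;
                s->size = (size_t)st.st_size;
                s->fp = NULL;
                return 0;
            }
        }

        /* Large (or unmappable) file: let stdio's internal buffer do the work. */
        s->fp = fdopen(fd, "r");
        s->data = NULL;
        s->size = 0;
        return s->fp ? 0 : -1;
    }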
> Given that the engine is probably already doing pretty much the same
> thing, would it make sense to consider a readBufferSize global
> property which would govern the size of the buffer the engine uses
> when executing "read...until <char>"?
Perhaps - the 'read until' routines could potentially be made more
efficient. For some streams, though, buffering is inappropriate unless
explicitly requested (which isn't an option at the moment). For example,
for serial port streams and process streams you don't want to read any
more than you absolutely need to, as the other end can block if you ask
it for more data than it has available. At the moment the engine favours
the 'do not read any more than absolutely necessary' approach, because the
serial/file/process stream processing code is the same.
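To make that trade-off concrete, here is the conservative strategy in C
(an illustrative sketch, not the engine's code): when the source is a pipe
or serial port, request exactly one byte at a time so we never ask the
other end for more than it may be willing to supply:

    #include <stddef.h>
    #include <unistd.h>

    /* Hypothetical helper: read from fd until 'delim' is seen, one byte at a
       time.  Slow for plain files, but it never requests more data than is
       strictly needed, so a pipe or serial port at the other end cannot leave
       us blocked waiting for bytes it will never send. */
    ssize_t read_until(int fd, char delim, char *buf, size_t cap)
    {
        size_t n = 0;
        while (n < cap) {
            char c;
            ssize_t r = read(fd, &c, 1);         /* ask for exactly one byte */
            if (r <= 0)
                return r < 0 ? -1 : (ssize_t)n;  /* error, or end of stream */
            buf[n++] = c;
            if (c == delim)
                break;
        }
        return (ssize_t)n;
    }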
> In my experiments I was surprised to find that larger buffers (>10MB)
> were slower than "read...until <char>", but the sweet spot seemed to
> be around 128k. Presumably this has to do with the overhead of
> allocating contiguous memory, and if you have any insights on that it
> would be interesting to learn more.
My original reasoning on this was a 'working set' argument. Modern CPUs
rely heavily on several levels of memory cache, with access getting more
expensive the further the cache is from the processor. If you use a
reasonably sized buffer to process the data in a streaming fashion, then
the working set is essentially just that buffer, which means less movement
of blocks of memory between physical memory and the processor's levels of
cache.
However, having chatted to Fraser, I learned that Linux tends to have a
64KB-128KB file read-ahead built in. This means that the OS will
proactively prefetch the next 64-128KB of data after it has finished
fetching the block you asked for. The result is that data is being read
from disk by the OS whilst your processing code is running, meaning that
things get done more quickly. (In contrast, if you have a 10MB buffer then
you have to wait for 10MB to be read before you can do anything with it,
and then wait again each time the buffer empties.)
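That suggests the streaming shape a better 'read until' could take (again
just a C sketch, not the engine's code): scan for the delimiter inside a
modest fixed buffer, refilling it in ~128KB chunks. Each refill tends to be
satisfied from data the kernel has already read ahead, so the disk I/O and
the processing overlap, and a 128KB working buffer also sits comfortably
within a typical CPU cache, which ties back to the working-set argument:

    #include <stdio.h>
    #include <string.h>

    #define CHUNK (128 * 1024)  /* roughly the kernel's readahead window */

    /* Count lines by refilling a 128KB buffer and scanning it for '\n'.  Each
       fread() is typically served from data the OS has already prefetched, so
       the scan below runs while the next chunk is being pulled from disk. */
    static long count_lines(FILE *fp)
    {
        static char buf[CHUNK];
        long lines = 0;
        size_t got;

        while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
            const char *p = buf;
            const char *end = buf + got;
            while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
                ++lines;
                ++p;
            }
        }
        return lines;
    }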
> Pretty much any program will read big files in chunks, and if LC can
> do so optimally with all the grace and ease of "read...until <char>"
> it makes one more strong set of use cases where choosing LC isn't a
> tradeoff but an unquestionable advantage.
If you have the time to submit a report to the QC with a sample stack that
measures the time of a simple 'read until cr' loop over some data and
compares it against the more efficient approach you found, then it is
something we (or someone else) can dig into at some point to see what we
can do to improve its performance.
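The sample stack itself would of course be LiveCode script, but for
flavour, a C analogue of the comparison being asked for would simply time a
byte-at-a-time pass against a 128KB-chunked pass (as in the sketch above)
over the same file:

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    /* Illustrative harness only: time a byte-at-a-time "read until newline"
       pass against a 128KB chunked pass over the same file. */
    static double seconds_now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    static long lines_bytewise(FILE *fp)
    {
        long lines = 0;
        int c;
        while ((c = getc(fp)) != EOF)
            if (c == '\n')
                ++lines;
        return lines;
    }

    static long lines_chunked(FILE *fp)
    {
        static char buf[128 * 1024];
        long lines = 0;
        size_t got;
        while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
            const char *p = buf, *end = buf + got;
            while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
                ++lines;
                ++p;
            }
        }
        return lines;
    }

    int main(int argc, char **argv)
    {
        long (*passes[])(FILE *) = { lines_bytewise, lines_chunked };
        const char *names[] = { "byte-at-a-time", "128KB chunks" };

        if (argc < 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        for (int i = 0; i < 2; ++i) {
            FILE *fp = fopen(argv[1], "r");
            if (fp == NULL)
                return 1;
            double t0 = seconds_now();
            long n = passes[i](fp);
            printf("%-15s %ld lines in %.3fs\n", names[i], n, seconds_now() - t0);
            fclose(fp);
        }
        return 0;
    }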
As I said initially, for smaller files I'd be surprised if we could do
that much, since those files will be memory mapped; however, it might be
that there are some improvements which could be made for larger
(non-memory-mappable) files.
Warmest Regards,
Mark.
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps