Buffer size (was Looking for parser for Email (MIME))

Mark Waddingham mark at livecode.com
Tue Mar 22 12:09:36 EDT 2016


On 2016-03-22 15:24, Richard Gaskin wrote:
> What is the size of the read buffer used when reading until <char>?
> 
> I'm assuming it isn't reading a single char per disk access, probably
> at least using the file system's block size, no?

Well, the engine will memory-map files when it can (i.e. when there is 
enough address space available), so smaller (sub-1GB) files are 
essentially fully buffered. For larger files the engine falls back to 
the stdio FILE abstraction, so it gets buffering from that.
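Roughly speaking, the fallback logic is along these lines (an 
illustrative C sketch only, not the actual engine source; the 1GB 
threshold and the names are just for the example):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define MMAP_THRESHOLD (1UL << 30)   /* ~1GB */

/* Returns 0 if the file was mapped, 1 if stdio is used, -1 on error. */
int open_for_reading(const char *path, void **out_map, size_t *out_size,
                     FILE **out_fp)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    if ((size_t)st.st_size < MMAP_THRESHOLD) {
        /* Small enough: map the whole file; the OS pages it in on demand. */
        void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            *out_map = p;
            *out_size = (size_t)st.st_size;
            close(fd);   /* the mapping stays valid after close */
            return 0;
        }
    }

    /* Too large (or mmap failed): fall back to stdio, which buffers
       internally. */
    *out_fp = fdopen(fd, "rb");
    return *out_fp != NULL ? 1 : -1;
}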

> Given that the engine is probably already doing pretty much the same
> thing, would it make sense to consider a readBufferSize global
> property which would govern the size of the buffer the engine uses
> when executing "read...until <char>"?

Perhaps - the 'read until' routines could potentially be made more 
efficient. For some streams, though, buffering is inappropriate unless 
explicitly requested (which isn't an option at the moment). For example, 
with serial port streams and process streams you don't want to read any 
more than you absolutely need, as a read can block if you ask for more 
data than the other end has available. At the moment the engine favours 
the 'do not read any more than absolutely necessary' approach, because 
the serial/file/process stream processing code is shared.
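In C terms, the 'never read more than you need' strategy for a pipe or 
serial descriptor looks roughly like this (an illustrative sketch, not 
the engine's actual code - note it pays one read() call per byte, which 
is exactly the cost buffering would remove):

#include <stddef.h>
#include <unistd.h>

/* Read from fd into buf until delim (inclusive), EOF, or cap bytes.
   Returns the number of bytes stored, or -1 on error. */
ssize_t read_until(int fd, char delim, char *buf, size_t cap)
{
    size_t n = 0;
    while (n < cap) {
        char c;
        ssize_t r = read(fd, &c, 1);   /* ask for exactly one byte */
        if (r <= 0)
            return r == 0 ? (ssize_t)n : -1;   /* EOF or error */
        buf[n++] = c;
        if (c == delim)
            break;   /* stop the moment the delimiter arrives */
    }
    return (ssize_t)n;
}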

> In my experiments I was surprised to find that larger buffers (>10MB)
> were slower than "read...until <char>", but the sweet spot seemed to
> be around 128k.  Presumably this has to do with the overhead of
> allocating contiguous memory, and if you have any insights on that it
> would be interesting to learn more.

My original reasoning on this was a 'working set' argument. Modern CPUs 
rely heavily on several levels of memory cache, and access gets more 
expensive the further a cache level is from the processor. If you use a 
reasonably sized buffer and process the data in a streaming fashion, 
the working set is essentially just that buffer, which means fewer 
blocks of memory have to move between physical memory and the 
processor's cache levels.
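As a rough C illustration of that streaming idea (the 128KB chunk size 
and the line-counting task are just assumptions for the example):

#include <stdio.h>
#include <string.h>

#define CHUNK (128 * 1024)

/* Count newlines in a file while only ever touching one fixed 128KB
   buffer - the working set stays that buffer, not the whole file. */
long count_lines_streaming(const char *path)
{
    static char buf[CHUNK];   /* one buffer, reused for every chunk */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long lines = 0;
    size_t got;
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        char *p = buf;
        while ((p = memchr(p, '\n', got - (size_t)(p - buf))) != NULL) {
            lines++;
            p++;
        }
    }
    fclose(fp);
    return lines;
}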

However, having chatted to Fraser, he pointed out that Linux tends to 
have a built-in file readahead of 64KB-128KB. This means the OS will 
proactively prefetch the next 64-128KB of data after it has finished 
fetching the range you asked for. The result is that data is being read 
from disk by the OS while your processing code is running, so things 
get done quicker. (In contrast, with a 10MB buffer you have to wait for 
the full 10MB to be read before you can do anything with it, and then 
wait again each time the buffer is exhausted.)
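From user space you can lean on that readahead rather than fight it: 
hint that access is sequential and keep the per-read chunk modest, so 
the kernel prefetches the next range while the current one is being 
processed. A rough sketch (posix_fadvise is only a hint, and the 64KB 
chunk size is an assumption):

#include <fcntl.h>
#include <unistd.h>

int process_sequentially(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Tell the kernel we will read front to back; on Linux this
       typically widens the readahead window. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char buf[64 * 1024];
    ssize_t got;
    while ((got = read(fd, buf, sizeof buf)) > 0) {
        /* ... process 'got' bytes here; the kernel is already
           fetching the next chunk in the background ... */
    }

    close(fd);
    return got < 0 ? -1 : 0;
}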

> Pretty much any program will read big files in chunks, and if LC can
> do so optimally with all the grace and ease of "read...until <char>"
> it makes one more strong set of use cases where choosing LC isn't a
> tradeoff but an unquestionable advantage.

If you have the time to submit a report to the QC with a sample stack 
that measures the time of a simple 'read until cr' style loop over some 
data and compares it with the more efficient approach you found, then 
it is something we (or someone else) can dig into at some point to see 
what can be done to improve its performance.

As I said initially, for smaller files I'd be surprised if we could do 
that much, since those files will be memory-mapped; however, there 
might be some improvements to be made for larger (non-memory-mappable) 
files.

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps




