Buffer size (was Looking for parser for Email (MIME))

Richard Gaskin ambassador at fourthworld.com
Tue Mar 22 12:54:03 EDT 2016


Very helpful info - thanks!

I'll see if I can dig up my old experiment code and submit a tidy 
version with an enhancement request.

My hope was that it might be as simple as "Aha, yes, a bigger buffer 
size!", but few things in life are that simple. :)

-- 
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  Ambassador at FourthWorld.com                http://www.FourthWorld.com


Mark Waddingham wrote:
> On 2016-03-22 15:24, Richard Gaskin wrote:
>> What is the size of the read buffer used when reading until <char>?
>>
>> I'm assuming it isn't reading a single char per disk access, probably
>> at least using the file system's block size, no?
>
> Well, the engine will memory-map files if it can (i.e. if there is
> available address space), so smaller (sub-1GB) files are essentially
> buffered in their entirety. For larger files, the engine uses the stdio
> FILE abstraction and so gets buffering from that.
>
>> Given that the engine is probably already doing pretty much the same
>> thing, would it make sense to consider a readBufferSize global
>> property which would govern the size of the buffer the engine uses
>> when executing "read...until <char>"?
>
> Perhaps - the read-until routines could potentially be made more
> efficient. For some streams, buffering is inappropriate unless
> explicitly requested (which isn't an option at the moment). For
> example, with serial port streams and process streams you don't want
> to read any more than you absolutely need, as the other end can block
> if you ask it for more data than it has available. At the moment the
> engine favours the 'do not read any more than absolutely necessary'
> approach, as the serial/file/process stream processing code is the
> same.
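
[As an illustration of why that matters: the 'read only what you need'
behaviour is what makes a line-at-a-time loop over a process stream
safe. A minimal LiveCode sketch - the tool path and the handleLine
handler are hypothetical names, not anything from the engine:

   -- read a child process one line at a time; the engine requests no
   -- more data than needed, so the child is never asked for output it
   -- has not yet produced
   on readProcessLines
      open process "/usr/bin/someTool" for read
      repeat forever
         read from process "/usr/bin/someTool" until linefeed
         if the result is "eof" then exit repeat
         handleLine it  -- "it" holds one line, delimiter included
      end repeat
      close process "/usr/bin/someTool"
   end readProcessLines
]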
>
>> In my experiments I was surprised to find that larger buffers (>10MB)
>> were slower than "read...until <char>", but the sweet spot seemed to
>> be around 128KB.  Presumably this has to do with the overhead of
>> allocating contiguous memory, and if you have any insights on that it
>> would be interesting to learn more.
>
> My original reasoning on this was a 'working set' argument. Modern CPUs
> rely heavily on several levels of memory cache, with access getting
> more expensive the further the cache is from the processor. If you use
> a reasonably sized buffer to process a stream, then the working set is
> essentially just that buffer, which means fewer blocks of memory moving
> between physical memory and the processor's levels of cache.
>
> However, having chatted to Fraser, he pointed out that Linux tends to
> have a built-in file read-ahead of 64KB-128KB. This means that the OS
> will proactively prefetch the next 64-128KB of data after it has
> finished fetching the data you asked for. The result is that data is
> being read from disk by the OS while your processing code is running,
> so things get done more quickly. (In contrast, with a 10MB buffer you
> have to wait for 10MB to be read before you can do anything with it,
> and then wait again each time the buffer empties.)
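
[That streaming pattern is easy to express in LiveCode script. A minimal
sketch of chunked reading with a ~128KB buffer - tPath is an assumed
file path, and 131072 bytes simply matches the 128KB sweet spot
mentioned above:

   -- read a large file in ~128KB chunks so the OS read-ahead can
   -- overlap disk I/O with processing
   on processFileInChunks tPath
      open file tPath for binary read
      repeat forever
         read from file tPath for 131072
         if it is not empty then
            -- process the chunk in "it" here; a real handler would
            -- carry any partial trailing line over to the next chunk
         end if
         if the result is "eof" then exit repeat
      end repeat
      close file tPath
   end processFileInChunks
]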
>
>> Pretty much any program will read big files in chunks, and if LC can
>> do so optimally with all the grace and ease of "read...until <char>",
>> it adds one more strong set of use cases where choosing LC isn't a
>> tradeoff but an unquestionable advantage.
>
> If you have time to submit a report in the QC with a sample stack that
> measures a simple 'read until cr' loop over some data and compares it
> to the more efficient approach you found, then it is something we (or
> someone else) can dig into at some point to see what can be done to
> improve its performance.
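
[For reference, the timing loop such a sample stack would contain might
look something like this minimal sketch - tPath and the handler name
are hypothetical, and the elapsed time goes to the message box:

   -- time a simple line-at-a-time "read until cr" pass over a file
   on benchmarkReadUntilCR tPath
      local tStart, tLineCount
      put the milliseconds into tStart
      put 0 into tLineCount
      open file tPath for read
      repeat forever
         read from file tPath until cr
         if the result is "eof" then exit repeat
         add 1 to tLineCount
      end repeat
      close file tPath
      -- (any final partial line left in "it" at eof is ignored here)
      put tLineCount && "lines in" && (the milliseconds - tStart) && "ms"
   end benchmarkReadUntilCR
]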
>
> As I said initially, for smaller files I'd be surprised if we could do
> that much, since those files will be memory mapped; however, there may
> be some improvements that could be made for larger (non memory
> mappable) files.
>
> Warmest Regards,
>
> Mark.
