Buffer size (was Looking for parser for Email (MIME))

Richard Gaskin ambassador at fourthworld.com
Tue Mar 22 10:24:27 EDT 2016


Mark Waddingham wrote:

> open file ...
> repeat forever
>    read from file ... until return
>    if the result is not empty then
>      exit repeat
>    end if
>    if *it is a new message boundary* then
>      ... finish processing current message ...
>      ... start processing new boundary ...
>    else
>      ... append line to current message ...
>    end if
> end repeat

What is the size of the read buffer used when reading until <char>?

I'm assuming it isn't reading a single char per disk access, probably at 
least using the file system's block size, no?

I ask because some months ago I needed to parse a 6GB file and 
"read...until CR" was slower than I preferred, so I experimented with a 
complicated routine that reads into a buffer of about 128k and then 
parses the buffer.
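
For illustration only, a routine along those lines might look something 
like the sketch below. This is not the code from that experiment: the 
handler names parseFileInChunks and handleLine are hypothetical, and it 
assumes the file is opened in text mode (so lines end with return) and 
that no single line is much longer than the buffer.

```livecode
-- Sketch only; handler names are hypothetical.
command parseFileInChunks pFilePath
   local tBufferSize, tChunk, tLeftover, tAtEnd, tLine
   put 131072 into tBufferSize -- about 128k, as in the experiment
   put false into tAtEnd
   open file pFilePath for read
   repeat until tAtEnd
      read from file pFilePath for tBufferSize
      -- the result is "eof" at end of file, or an error string
      put (the result is not empty) into tAtEnd
      put tLeftover & it into tChunk
      put empty into tLeftover
      -- Hold back a trailing partial line until the next read completes it
      if not tAtEnd and the last char of tChunk is not return then
         put the last line of tChunk into tLeftover
         delete the last line of tChunk
      end if
      repeat for each line tLine in tChunk
         handleLine tLine
      end repeat
   end repeat
   close file pFilePath
end parseFileInChunks

-- Placeholder for whatever per-line parsing is needed
command handleLine pLine
end handleLine
```

The only subtle part is carrying the incomplete last line over to the 
next pass; everything else is one "read...for" per buffer and a 
"repeat for each line" over what was read.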

If I can turn up the code it may be mildly interesting, but the main 
question it raised for me was:

Given that the engine is probably already doing pretty much the same 
thing, would it make sense to consider a readBufferSize global property 
which would govern the size of the buffer the engine uses when executing 
"read...until <char>"?

In my experiments I was surprised to find that larger buffers (>10MB) 
were slower than "read...until <char>", but the sweet spot seemed to be 
around 128k.  Presumably this has to do with the overhead of allocating 
contiguous memory, and if you have any insights on that it would be 
interesting to learn more.

I recognize this sort of thing may seem like mere performance 
fetishism, but I believe this has useful application for making LC an 
ever better solution for working with large amounts of data.

Pretty much any program will read big files in chunks, and if LC can do 
so optimally with all the grace and ease of "read...until <char>", that 
makes for one more strong set of use cases where choosing LC isn't a 
tradeoff but an unquestionable advantage.

-- 
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  Ambassador at FourthWorld.com                http://www.FourthWorld.com



