Looking for parser for Email (MIME)

Mark Waddingham mark at livecode.com
Tue Mar 22 09:16:20 EDT 2016


On 2016-03-22 12:45, Roland Huettmann wrote:
> How to know how much we can read into memory? Is there any function to 
> know
> this? Is there a size limit for variables?

LiveCode has a limit of 2GB characters for strings, but whether you can
reach it depends on how much memory a single process can have on your
system.

On 32-bit systems, you're generally limited to a contiguous block of
memory of 768MB-1GB. 32-bit Windows has an address space of 3GB for a
user process, which also has to include all mapped resources such as
executables and shared libraries. Mac has a user process address space
of 4GB, which also has to include all mapped resources, so there you can
generally get a contiguous allocated memory block of up to around 1.5GB.

On 64-bit systems you should be able to have many 2GB strings (or
similar in LiveCode), although obviously how fast they will operate will
depend on the amount of physical RAM in the machine, with disk-paged
virtual memory taking up the slack.

> It is not possible to read backwards - which could be a nice way 
> reading a
> file in some special cases. So "read from file fName at eof until 
> -1000"
> does not work.

Well, reading backwards in that way is equivalent to knowing how long 
the file is:

    read ... at -1000 until EOF

is the same as

    read ... at (fileSize - 1000) until EOF
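
For comparison outside LiveCode, the same equivalence can be sketched in
Python. This is only an illustration under stated assumptions: Python is
not what the thread uses, and both helper names are hypothetical. One
helper seeks relative to the end of the file, the other seeks to
(fileSize - n) from the start, and they return the same bytes.

```python
import os


def read_tail_from_end(path, n):
    """Read the last n bytes by seeking relative to the end of the file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Clamp so we never seek before the start of a short file.
        f.seek(-min(n, size), os.SEEK_END)
        return f.read()


def read_tail_from_size(path, n):
    """Read the last n bytes by seeking to (file size - n) from the start."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(size - n, 0))
        return f.read()
```

Either way, only the tail is ever loaded, so the size of the whole file
does not matter.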

> So, the only way reading very large file is reading a chunk of data of 
> n
> bytes (whatever is allowed in memory), processing this, and then 
> reading
> the next chunk until the remaining part of the file is small enough to 
> be
> read until eof.

For such a large file (38GB) your only solution is to read and parse it 
in chunks. MBOX files are a sequence of records, so you need to use a 
process which reads in blocks from the file when there is not enough 
data left to find the current record boundary - that way you only load 
into memory (at any one time) enough of the file to process completely 
the next record.

In terms of finding the size of a file in LiveCode you can use 'the 
detailed files'.

It is worth pointing out that 'open file' and 'read from file' are
*stream*-based in approach. From memory, the MBOX format is essentially
line-based, so you should be able to write a relatively simple parsing
loop with that in mind:

    open file ...
    repeat forever
       read from file ... until return
       if the result is not empty then
          -- EOF or an error; 'it' may still hold a final partial line
          exit repeat
       end if
       if *it is a new message boundary* then
          ... finish processing current message ...
          ... start processing new boundary ...
       else
          ... append line to current message ...
       end if
    end repeat
    ... finish processing the last message ...
    close file ...
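
As a concrete illustration of that loop, here is a minimal sketch in
Python rather than LiveCode. It assumes the usual mbox convention that
each message starts with a line beginning with "From " (body lines that
would clash are normally escaped as ">From "). The function name
iter_mbox_messages is hypothetical.

```python
def iter_mbox_messages(lines):
    """Yield each message as a list of raw lines.

    `lines` is any iterable of bytes lines, e.g. a file opened in "rb" mode.
    A line starting with b"From " is treated as a new message boundary.
    """
    current = None
    for line in lines:
        if line.startswith(b"From "):
            if current is not None:
                yield current          # finish the current message
            current = [line]           # start a new one at the boundary
        elif current is not None:
            current.append(line)       # append line to the current message
        # Lines before the first boundary are ignored.
    if current is not None:
        yield current                  # the last message has no trailing boundary
```

Passing an open file object (opened in binary mode) as `lines` reads it
line by line, so only the message currently being assembled is held in
memory.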

Of course, one thing to bear in mind is that with a 38GB file you are
never going to fit all of that into memory, so the best approach would
probably be to parse your mail messages and then store them in a
storage scheme which doesn't require everything to be in memory at
once - e.g. an SQLite db or a more traditional DBMS, or even lots of
discrete files in a filesystem in some suitable hierarchy.
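
To make the SQLite option a little more concrete, here is a rough sketch
in Python (again, not LiveCode). The table layout, the function name
store_messages and the batch size are all assumptions made for
illustration. Messages are inserted one at a time and committed in
batches, so memory use stays bounded however large the source file is.

```python
import sqlite3


def store_messages(db_path, messages, batch_size=500):
    """Insert raw messages (bytes) into an SQLite table; return the row count.

    `messages` can be a generator, so nothing forces the whole mailbox
    to be in memory at once.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY, raw BLOB NOT NULL)"
        )
        count = 0
        for raw in messages:
            conn.execute("INSERT INTO messages (raw) VALUES (?)", (raw,))
            count += 1
            if count % batch_size == 0:
                conn.commit()          # flush a full batch to disk
        conn.commit()                  # flush the final partial batch
        return count
    finally:
        conn.close()
```

In practice you would probably also pull out a few headers (date,
sender, subject) into their own columns so they can be indexed.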

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



