Looking for parser for Email (MIME)
Mark Waddingham
mark at livecode.com
Tue Mar 22 09:16:20 EDT 2016
On 2016-03-22 12:45, Roland Huettmann wrote:
> How to know how much we can read into memory? Is there any function to
> know this? Is there a size limit for variables?
LiveCode has a limit of 2Gb characters for strings but that depends on
how much memory a single process can have on your system.
On 32-bit systems, you're generally limited to a contiguous block of
768Mb-1Gb of memory. 32-bit Windows has an address space of 3Gb for a
user process, which also has to include all mapped resources such as
executables and shared libraries. Mac has a user process address space
of 4Gb, which also has to include all mapped resources, so there you can
generally get a contiguous allocated memory block of up to around 1.5Gb.
On 64-bit systems you should be able to have many 2Gb strings (or
similar in LiveCode), although obviously how fast they will operate
depends on the amount of physical RAM in the machine, with disk-paged
virtual memory taking up the slack.
> It is not possible to read backwards - which could be a nice way
> reading a file in some special cases. So "read from file fName at eof
> until -1000" does not work.
Well, reading backwards in that way is equivalent to knowing how long
the file is:
    read ... at -1000 until EOF
is the same as
    read ... at (fileSize - 1000) until EOF
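As a minimal sketch of the first form (readTail is a hypothetical helper name, not something built in, and I have not checked what happens when the file is shorter than pCount bytes):

```livecode
-- Hypothetical helper: returns the last pCount bytes of the file at pPath,
-- or empty if the file could not be opened.
function readTail pPath, pCount
   local tTail
   open file pPath for binary read
   if the result is not empty then return empty
   -- a negative start position counts back from the end of the file
   read from file pPath at (0 - pCount) until EOF
   put it into tTail
   close file pPath
   return tTail
end readTail
```

Calling readTail(tPath, 1000) should then give you the last 1000 bytes without reading anything before them.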
> So, the only way reading very large file is reading a chunk of data of
> n bytes (whatever is allowed in memory), processing this, and then
> reading the next chunk until the remaining part of the file is small
> enough to be read until eof.
For such a large file (38Gb) your only solution is to read and parse it
in chunks. MBOX files are a sequence of records, so you need to use a
process which reads in blocks from the file when there is not enough
data left to find the current record boundary - that way you only load
into memory (at any one time) enough of the file to process completely
the next record.
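A rough sketch of that block-based approach might look like the following. The handler names, the 1Mb block size and the processMessage stub are all assumptions for illustration, and it assumes a record starts at a line beginning with "From " (with any such lines inside message bodies already escaped by whatever wrote the file):

```livecode
-- Hypothetical stub: replace with whatever you want to do with each message.
command processMessage pMessage
end processMessage

command parseMboxInBlocks pPath
   local tBuffer, tAtEnd, tBoundary, tPos
   -- assumed record boundary: a line starting with "From "
   put return & "From " into tBoundary
   put false into tAtEnd
   open file pPath for read
   repeat until (tAtEnd and (tBuffer is empty))
      put offset(tBoundary, tBuffer) into tPos
      if tPos is 0 and not tAtEnd then
         -- no complete record buffered, so top up with another block
         read from file pPath for 1048576
         put (the result is not empty) into tAtEnd
         put it after tBuffer
         next repeat
      end if
      if tPos is 0 then
         -- end of file reached: what is left is the final record
         processMessage tBuffer
         put empty into tBuffer
      else
         -- hand over one complete record and drop it from the buffer
         processMessage char 1 to tPos of tBuffer
         delete char 1 to tPos of tBuffer
      end if
   end repeat
   close file pPath
end parseMboxInBlocks
```

At any one time the buffer holds at most the current record plus one block, regardless of how big the file is.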
In terms of finding the size of a file in LiveCode you can use 'the
detailed files'.
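For example, something along these lines should work (fileSizeInBytes is a hypothetical helper name, and it assumes pPath is a full path that includes a folder part):

```livecode
-- Hypothetical helper: returns the size in bytes of the file at pPath,
-- or empty if the file is not found in its folder.
function fileSizeInBytes pPath
   local tOldFolder, tFolder, tName, tSize, tEntry
   set the itemDelimiter to "/"
   put item -1 of pPath into tName
   put item 1 to -2 of pPath into tFolder
   set the itemDelimiter to comma

   put the defaultFolder into tOldFolder
   set the defaultFolder to tFolder
   repeat for each line tEntry in the detailed files
      -- item 1 is the URL-encoded file name, item 2 the size in bytes
      if urlDecode(item 1 of tEntry) is tName then
         put item 2 of tEntry into tSize
         exit repeat
      end if
   end repeat
   set the defaultFolder to tOldFolder
   return tSize
end fileSizeInBytes
```

Note that the name comparison here is case-insensitive, which may matter on a case-sensitive filesystem.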
It is worth pointing out that 'open file' and 'read from file' are
*stream*-based in approach. From memory, the MBOX format is essentially
line-based, so you should be able to write a relatively simple parsing
loop with that in mind:
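    open file ...
    repeat forever
       -- read one line at a time from the stream
       read from file ... until return
       if the result is not empty then
          -- end of file (or an error): stop reading
          exit repeat
       end if
       -- in MBOX, a line beginning with "From " starts a new message
       if *it is a new message boundary* then
          ... finish processing current message ...
          ... start processing new boundary ...
       else
          ... append line to current message ...
       end if
    end repeat
    ... finish processing the last message ...
    close file ...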
Of course, one thing to bear in mind is that with a 38Gb file you are
never going to fit all of that into memory; so the best approach would
probably be to parse your mail messages and then store them into a
storage scheme which doesn't require everything to appear in memory at
once - e.g. an sqlite db or a more traditional dbms, or even lots of
discrete files in a filesystem in some suitable hierarchy.
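As a sketch of the sqlite option, assuming the revDB library and the sqlite driver are available (the handler names, the table name and its columns are all made up for illustration):

```livecode
-- Hypothetical helper: opens (or creates) the database file at pDBPath and
-- returns a connection id, or empty if the connection failed.
function openMailDatabase pDBPath
   local tConnID
   put revOpenDatabase("sqlite", pDBPath, , , , ) into tConnID
   if tConnID is not a number then return empty
   revExecuteSQL tConnID, "CREATE TABLE IF NOT EXISTS messages " & \
         "(id INTEGER PRIMARY KEY, sender TEXT, body TEXT)"
   return tConnID
end openMailDatabase

-- Hypothetical helper: stores one parsed message. The values are passed to
-- revExecuteSQL by variable name and bound to the :1 and :2 placeholders.
command storeMessage pConnID, pSender, pBody
   revExecuteSQL pConnID, \
         "INSERT INTO messages (sender, body) VALUES (:1, :2)", \
         "pSender", "pBody"
   -- the result is a row count on success, or an error string on failure
   return the result
end storeMessage
```

You would call openMailDatabase once, storeMessage for each message as it is parsed, and revCloseDatabase with the connection id when the whole file has been processed, so only one message needs to be in memory at a time.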
Warmest Regards,
Mark.
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps