Error in mail list MBOX file format 2012 February

Richard Gaskin ambassador at fourthworld.com
Fri Feb 10 23:20:24 EST 2012


I asked Heather about that and apparently the team is using the list
manager's default settings, so it's not clear what more they can do.

I've looked into this before, and it seems that if you need to parse
it you can do so reliably by looking for cr&"From:" (with the colon)
rather than relying on cr&"From".

For example, if you look at message ID
<1328119050.73544.YahooMailNeo at web65408.mail.ac4.yahoo.com> from this
month's archive you'll see Jan quoted Bob's full post, headers and
all, but the list manager indented the quoted "From:" string by one
space so it won't get confused with actual message boundaries.

A bit non-standard, perhaps, but so far it seems reliable in the
archives I've worked with.
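
For anyone parsing outside LiveCode, here is a minimal sketch of the
same idea in Python. It assumes the archive has already been read into
a string, and that every real message begins with an unindented
"From:" line while quoted copies are indented by one space. The
function name and the sample addresses are made up for illustration.

```python
import re

# Assumption: a message starts wherever a line begins with "From:" in
# column 0. Quoted copies are indented by one space, so they don't match.
BOUNDARY = re.compile(r"^From:", re.MULTILINE)

def split_messages(archive_text):
    """Return the chunks of text that each start at an unindented "From:" line."""
    starts = [m.start() for m in BOUNDARY.finditer(archive_text)]
    ends = starts[1:] + [len(archive_text)]
    return [archive_text[s:e] for s, e in zip(starts, ends)]

# Hypothetical two-message archive: the first message quotes the
# second one's headers, indented by one space.
sample = (
    "From: jan at example.com\n"
    "Subject: Re: test\n"
    "\n"
    "Bob wrote:\n"
    " From: bob at example.com\n"
    " Subject: test\n"
    "From: bob at example.com\n"
    "Subject: test\n"
    "\n"
    "Hello\n"
)

messages = split_messages(sample)
print(len(messages))
```

Note that anything sitting between messages, such as a standard mbox
"From " envelope line ahead of the header, would end up at the tail of
the previous chunk with this approach, so it may need trimming.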

So if you don't mind my asking:  Why are you machine-parsing the
archives?  More Lucene experiments?

-- 
  Richard Gaskin
  Fourth World
  LiveCode training and consulting: http://www.fourthworld.com
  Webzine for LiveCode developers: http://www.LiveCodeJournal.com
  LiveCode Journal blog: http://LiveCodejournal.com/blog.irv



