[Ticket#: 2006040510000641] Re: [OT] Articles to read

Alex Tweedly alex at tweedly.net
Wed Apr 12 16:12:44 EDT 2006


Marielle Lange wrote:

>
> I wrote the following awk script. Basically, it looks for a line  
> starting with "From:  ..." that is right after a line starting with  
> "From ..." (if you download an archive file you will see that this  
> systematically and unambiguously corresponds to the start of a post).
>
It's actually easier than that: the archives are in mbox format, so 
the start of a message is unambiguously marked by a line which begins 
"From " (the word From followed by a space). Any line within a message 
that starts with these 5 characters must be modified, usually by 
prepending ">", before the message can be put into an mbox file.
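To illustrate that escaping rule, here is a minimal sketch in Rev. The function name escapeFromLines is my own invention, not part of any library, and it assumes the simplest convention (prefix a single ">" to any body line that begins with "From "); real mail software may differ in detail.

```livecode
-- Hypothetical helper: escape body lines that would otherwise look like
-- the start of a new message in an mbox file.
function escapeFromLines pBody
    local tEscaped
    -- "From " must match exactly, so compare case-sensitively
    set the caseSensitive to true
    repeat for each line tLine in pBody
        if char 1 to 5 of tLine is "From " then
            put ">" & tLine & cr after tEscaped
        else
            put tLine & cr after tEscaped
        end if
    end repeat
    -- drop the return added after the final line
    delete last char of tEscaped
    return tEscaped
end escapeFromLines
```

We never need to do this when reading the archives, since they are already escaped; it only shows why a "From " at the start of a line can be trusted as a message separator.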

There is no guarantee that the From: line will immediately follow it; 
although it seems that it does in these archives, other mbox files may 
have the From: line later in the header section. But it doesn't 
matter - the "From " line is in itself unambiguous, and carries all 
the info we need (except for the synonyms for changed names).
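As a rough sketch of pulling the sender out of that line: the function name senderOfFromLine is hypothetical, and it assumes the obfuscated form used in these archives, where the address appears as "name at host" in words 2 to 4 of the "From " line.

```livecode
-- Hypothetical helper: return the sender address from an mbox "From " line,
-- or empty if pLine is not the start of a message.
function senderOfFromLine pLine
    local tSender
    -- case-sensitive, so "from " in running text does not match
    set the caseSensitive to true
    -- a "From: ..." header has ":" as char 5, so it does not match either
    if char 1 to 5 of pLine is not "From " then return empty
    -- assumes the archive form "From name at host  <date>"
    put word 2 to 4 of pLine into tSender
    replace " at " with "@" in tSender
    return tSender
end senderOfFromLine
```

For an illustrative line such as "From alex at tweedly.net  Wed Apr 12 16:12:44 2006" this returns "alex@tweedly.net", while a "From: ..." header line returns empty.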

I'd handle the synonyms by having an array (for example)
  put "soapdog at hidden" into  tMainAlias["agarzia at hidden"]
(NB: this is only needed for addresses which have synonyms.)
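The lookup itself can be wrapped up in a small function. This is only a sketch; mainAddress is a made-up name, and the array is passed in as a parameter purely to keep the example self-contained.

```livecode
-- Hypothetical helper: return the main alias recorded for pAddress,
-- or pAddress itself when no synonym has been recorded.
function mainAddress pAddress, pMainAlias
    if pMainAlias[pAddress] is not empty then return pMainAlias[pAddress]
    return pAddress
end mainAddress
```

With the entry above in tMainAlias, mainAddress("agarzia at hidden", tMainAlias) gives "soapdog at hidden", and any address without an entry comes back unchanged. One thing to watch: the keys have to be stored in exactly the form the address has at the moment of lookup, so if " at " has already been replaced with "@" by then, the keys need the "@" form too.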

So then we have

> on mouseUp
>     -- tFiles is assumed to already hold the archive file paths, one per line
>     set the caseSensitive to true
>     repeat for each line tFile in tFiles
>         repeat for each line L in URL ("file:" & tFile)
>             if char 1 to 5 of L = "From " then
>                 put word 2 to 4 of L into t
>                 replace " at " with "@" in t
>                 if tMainAlias[t] is not empty then put tMainAlias[t] into t
>                 add 1 to tArray[t]
>             end if
>         end repeat
>     end repeat
>     put empty into tSubmitters
>     repeat for each line L in the keys of tArray
>         put L && tArray[L] & cr after tSubmitters
>     end repeat
>     sort lines of tSubmitters descending numeric by word 2 of each
>     put tSubmitters after msg
>     repeat for each line L in tSubmitters
>         put word 1 of L && TAB && bars(word 2 of L) & CR after field "Field 1"
>     end repeat
>     
> end mouseUp
>
> function bars pN
>     repeat pN times
>         put "|" after t
>     end repeat
>     return t
> end bars
>
There - one simple solution in Rev rather than using awk and Excel :-)

-- 
Alex Tweedly       http://www.tweedly.net







More information about the use-livecode mailing list