[Ticket#: 2006040510000641] Re: [OT] Articles to read
Marielle Lange
mlange at lexicall.org
Wed Apr 12 12:32:48 EDT 2006
Hi Mark,
> Wow! I did not realize I even posted this many messages. While
> this is a low number, I thought I was more quite [I assumed
> quieter] than that. How did you derive this information?
That's over a one-year period, which works out to about one mail every 2-3 days.
I wrote the following awk script. Basically, it looks for a line
starting with "From: ..." that is right after a line starting with
"From ..." (if you download an archive file you will see that this
systematically and unambiguously corresponds to the start of a post).
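BEGIN {
    # Monthly archive files to scan, downloaded beforehand.
    # BEGIN runs the scan once, without needing a dummy input line.
    filelist = "2005-April.txt,2005-August.txt,2005-December.txt,"
    filelist = filelist "2005-July.txt,2005-June.txt,2005-May.txt,"
    filelist = filelist "2005-November.txt,2005-October.txt,"
    filelist = filelist "2005-September.txt,2006-February.txt,"
    filelist = filelist "2006-January.txt,2006-March.txt"
    n = split(filelist, afiles, ",")
    for (i = 1; i <= n; i++) {
        print afiles[i]
        lineB4 = 0
        # "> 0" stops at end of file and also when a file is missing
        while ((getline < afiles[i]) > 0) {
            # A "From: " line right after a "From " line starts a post
            if (lineB4 == 1 && $0 ~ /^From: /) {
                gsub("\"", "", $0)
                gsub("^From:[\t ]*", "", $0)
                gsub(" at ", "@", $0)
                frFrom[$0]++
            }
            # Remember whether this line was the "From " separator
            lineB4 = ($0 ~ /^From /) ? 1 : 0
        }
        close(afiles[i])
    }
}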
Because some of us changed email address over the year, I added a
synonym system: where a synonym exists, the alternative address is
mapped to the person's main address, e.g.
# Andre Garzia
synonym["agarzia at hidden (Andre Garzia)"] = "soapdog at hidden (Andre Garzia)"
I checked for synonyms by sorting on the "(Andre Garzia)" part and on the
name part of the email, with alerts for duplicates (I used Excel for
this, with if(B2=B1; "!!!", "")). I was particularly careful about this
for the top 20 contributors on the list.
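
The duplicate check described above could also be done in awk instead of Excel. This is only a rough sketch, not what I actually ran: the function name flag_duplicate_names is made up, and it assumes one sender per input line in the form "address (Real Name)".

```sh
# Hypothetical helper: reads "address (Real Name)" lines on stdin and
# prints "!!! name" for each name seen with more than one address.
flag_duplicate_names() {
    awk '
    {
        # Find the trailing "(Real Name)" part; skip lines without one
        if (match($0, /\([^()]*\)[ \t]*$/) == 0) next
        name = substr($0, RSTART + 1)
        sub(/\)[ \t]*$/, "", name)
        addr = substr($0, 1, RSTART - 1)
        sub(/[ \t]+$/, "", addr)
        # Count each distinct (name, address) pair only once
        if (!((name, addr) in seen)) {
            seen[name, addr] = 1
            naddr[name]++
        }
    }
    END {
        for (name in naddr)
            if (naddr[name] > 1) print "!!! " name
    }' | sort
}
```

A name flagged with "!!!" is a candidate for a synonym entry; the same person posting repeatedly from a single address is not flagged.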
Once I was satisfied that I had declared all alternative emails for each
person, I ran the program again. It loops through the frFrom array to add
each synonym's count to that of the main address, then loops through
frFrom again and prints the results.
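
A minimal sketch of that merge step, under a few assumptions: the name merge_synonyms is hypothetical, the counts arrive as tab-separated "count<TAB>sender" lines, and the keys use "@" because the counting script rewrites " at " before storing them. It iterates over the synonym table rather than over frFrom, so frFrom is not modified while it is being traversed.

```sh
# Hypothetical helper: stdin is "count<TAB>sender" (a dump of frFrom).
merge_synonyms() {
    awk -F "\t" '
    BEGIN {
        # Alternative address -> main address (the example from this post)
        synonym["agarzia@hidden (Andre Garzia)"] = "soapdog@hidden (Andre Garzia)"
    }
    { frFrom[$2] += $1 }
    END {
        # First pass: add each synonym count to the main entry
        for (key in synonym) {
            if (key in frFrom) {
                frFrom[synonym[key]] += frFrom[key]
                delete frFrom[key]
            }
        }
        # Second pass: print what is left, one sender per line
        for (key in frFrom) print frFrom[key] "\t" key
    }' | sort -rn
}
```

The final sort -rn puts the most frequent posters first, since awk does not guarantee any order for "for (key in array)".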
The "||||" representation is obtained in Excel with rept("|", n), where
n is the count stored in frFrom[key].
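
The bars could also be produced directly in awk, skipping Excel. A small sketch, where print_bars and the awk function rept are hypothetical names and the input is assumed to be tab-separated "count<TAB>sender" lines:

```sh
# Hypothetical helper: prints one text bar per "count<TAB>sender" line.
print_bars() {
    awk -F "\t" '
    # Rough equivalent of Excel REPT(ch, n): repeat ch n times
    function rept(ch, n,    s, i) {
        s = ""
        for (i = 0; i < n; i++) s = s ch
        return s
    }
    { print rept("|", $1) " " $2 }'
}
```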
If you need to hire a coder for medium to complex parsing problems,
get in touch ;-).
Marielle
--------------------------------------------------------------------------------
Marielle Lange (PhD), Psycholinguist
Alternative emails: mlange at blueyonder.co.uk,
Homepage http://homepages.widged.com/mlange/
Easy access to lexical databases http://lexicall.widged.com/
Supporting Education Technologists http://revolution.widged.com/wiki/