[Ticket#: 2006040510000641] Re: [OT] Articles to read

Marielle Lange mlange at lexicall.org
Wed Apr 12 12:32:48 EDT 2006


Hi Mark,

> Wow!  I did not realize I even posted this many messages.  While  
> this is a low number, I thought I was more quite [I assumed  
> quieter] than that.  How did you derive this information?

That's over a one-year period, which works out to about one mail every 2-3 days.

I wrote the following awk script. Basically, it looks for a line  
starting with "From:  ..." that is right after a line starting with  
"From ..." (if you download an archive file you will see that this  
systematically and unambiguously corresponds to the start of a post).

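BEGIN {
	# Monthly archive files, downloaded beforehand into the current directory.
	filelist = "2005-April.txt,2005-August.txt,2005-December.txt,2005-July.txt,2005-June.txt,2005-May.txt,2005-November.txt,2005-October.txt,2005-September.txt,2006-February.txt,2006-January.txt,2006-March.txt"
	n = split(filelist, afiles, ",")
	for (i = 1; i <= n; i++) {
		print afiles[i]
		lineB4 = 0
		# Test for > 0: getline returns -1 on a missing file, which would loop forever.
		while ((getline < afiles[i]) > 0) {
			# A "From: " line right after a "From " line marks the start of a post.
			if (lineB4 == 1 && $0 ~ /^From: /) {
				gsub("\"", "", $0)
				gsub("^From:[\t ]*", "", $0)
				gsub(" at ", "@", $0)
				frFrom[$0]++
			}
			# Remember whether this line was a "From " separator.
			lineB4 = ($0 ~ /^From /)
		}
		close(afiles[i])
	}
}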

Because some of us changed email address over the year, I added a  
synonym system: if a synonym exists for an address, its count is added  
to that of the main address. For example:
	# Andre Garzia
	synonym["agarzia at hidden (Andre Garzia)"] = "soapdog at hidden (Andre Garzia)"

I checked for synonyms by sorting on the (Andre Garzia) part and the  
name part of the email, with alerts for duplicates (I used Excel for  
this, with if(B2=B1; "!!!", "")). I was particularly careful about this  
for the top 20 contributors on the list.
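
The same adjacent-row check can also be done without a spreadsheet. The following is only a sketch, assuming one sender per line in the "address (Name)" form that the counting script produces; the function name flag_duplicate_names is made up for illustration.

```shell
# Hypothetical helper: reads one "address (Name)" sender per line on stdin
# and prints "!!!" in front of any row whose name repeats the previous row.
flag_duplicate_names() {
  awk '
  {
    # Take the text inside the last pair of parentheses as the display name.
    name = $0
    sub(/^.*\(/, "", name)
    sub(/\).*$/, "", name)
    print name "\t" $0
  }' | LC_ALL=C sort | awk -F '\t' '
  {
    # Same test as if(B2=B1; "!!!", ""): compare with the row above.
    mark = (NR > 1 && $1 == prev) ? "!!!" : ""
    print mark "\t" $2
    prev = $1
  }'
}
```

Rows marked "!!!" share a display name with the row above them, so they are candidates for a synonym entry; whether they really are the same person still needs a look by eye.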

Once I was satisfied that I had declared all alternative emails for a  
given person, I ran the program again. It loops through the frFrom  
array to add each synonym's count to that of the main address, then  
loops through frFrom again and prints the result.
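
The two passes described above could look roughly like this. It is a sketch, not the original code: the counts are invented sample data, the function name merge_synonyms is hypothetical, and the keys are written with "@" on the assumption that they must match what the counting script stores after its gsub(" at ", "@") step.

```shell
# Hypothetical sketch: fold alias counts into the main address, then print.
merge_synonyms() {
  awk '
  BEGIN {
    # Invented sample counts, standing in for what the counting loop leaves in frFrom.
    frFrom["agarzia@hidden (Andre Garzia)"] = 3
    frFrom["soapdog@hidden (Andre Garzia)"] = 5
    frFrom["someone@hidden (Some One)"] = 2

    synonym["agarzia@hidden (Andre Garzia)"] = "soapdog@hidden (Andre Garzia)"

    # First pass: add each alias count to its main entry and drop the alias.
    for (key in synonym) {
      if (key in frFrom) {
        frFrom[synonym[key]] += frFrom[key]
        delete frFrom[key]
      }
    }
    # Second pass: print count and sender, tab-separated.
    for (key in frFrom) {
      printf "%d\t%s\n", frFrom[key], key
    }
  }' | sort -rn
}
```

The final sort -rn is an addition for readability: awk visits array keys in no particular order, so sorting by count puts the most active senders first.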

The "||||" representation is obtained with rept("|", frFrom[key]) in  
Excel.
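
awk has no built-in equivalent of rept(), but a small loop gives the same bars. This is a sketch only: it assumes tab-separated "count, sender" lines on stdin, and the names bar_chart and rept are chosen here for illustration.

```shell
# Hypothetical helper: turns "count<TAB>sender" lines into "|||| sender" lines.
bar_chart() {
  awk -F '\t' '
  # rept(n) returns a string of n "|" characters, like rept("|", n) in Excel.
  function rept(n,    s, i) {
    s = ""
    for (i = 0; i < n; i++) s = s "|"
    return s
  }
  # Adding 0 forces the count field to be treated as a number.
  { printf "%s %s\n", rept($1 + 0), $2 }'
}
```

For instance, piping the line "3", a tab, "alice@example.org" through bar_chart yields three bars followed by the address.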

If you need to hire a coder for medium to complex parsing problems,  
get in touch ;-).

Marielle


--------------------------------------------------------------------------------
Marielle Lange (PhD),  Psycholinguist

Alternative emails: mlange at blueyonder.co.uk,

Homepage                            http://homepages.widged.com/mlange/
Easy access to lexical databases    http://lexicall.widged.com/
Supporting Education Technologists  http://revolution.widged.com/wiki/
