ANN Daily Crytoquote--my misspelling

Marielle Lange M.Lange at ed.ac.uk
Thu Jun 23 13:54:35 EDT 2005


>What an enterprising person (not me) would do is take the text of
>several books and create a frequency-of-occurrence list using Scott's
>algorithm, and then delete all words in the dictionary which don't
>have the necessary frequency.

Jim,

Excellent suggestion. In fact, it is exactly what the guys from the MRC database
did, back in 1981. This work has been repeated by different teams more than 20
times ;-). The best resource in this area is Celex (English, Dutch, German) --
web interface at: http://www.mpi.nl/world/celex/ -- but it is more complex to
use than the trick I gave you.
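
For anyone who still wants to try Jim's idea by hand, here is a minimal sketch
in Python. It is not Scott's algorithm (which is not reproduced in this
message) and not how MRC or Celex were built; it is plain word counting. The
function names, the sample sentences, and the threshold of 2 are all made up
for illustration.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count how often each lowercased word occurs across all texts."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer (an assumption): runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def filter_dictionary(dictionary, counts, min_count):
    """Keep only the dictionary words seen at least min_count times."""
    # A Counter returns 0 for words it has never seen, so unseen words drop out.
    return [word for word in dictionary if counts[word] >= min_count]


# Toy stand-ins for "the text of several books" and "the dictionary".
books = ["The cat sat on the mat.", "The dog and the cat ran."]
dictionary = ["cat", "dog", "the", "zebra"]

counts = word_frequencies(books)
common_words = filter_dictionary(dictionary, counts, min_count=2)
print(common_words)
```

With real books the threshold would need tuning, and raw counts are usually
normalised (for example, occurrences per million words) before being compared
across corpora of different sizes.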

Okay, I don't use my academic signature on this list. Okay, I avoid playing that
card; sometimes I even play dumb. Seriously, you have one of the world's experts
in lexical databases on this list. Don't be an enterprising person... ask. I
have 2GB of lexical databases on my computer, and more than 200 scripts to
extract any information you can think of from them. This is my job.

If you only seem to need a fairly short list of words, I provide no more than
that. If you need a solution to any other problem, I either already have it or
know where you can find what you need.

If you want to write an application for kids and limit the words you use to ones
that are understood by kids of a given age, I can provide this (this is called
Age of Acquisition). If you want to select only words that are easy or difficult
to imagine, to design a Pictionary-like game (with some items being easy or
difficult to draw), I can provide this (this is called imageability or
meaningfulness). If you need words that are part of the same semantic category
(for instance clothing items), I can provide this (check out WordNet
http://wordnet.princeton.edu/). If you need a list of homophones, homographs,
synonyms, etc., I have that on my computer. If you want to write a rhyming book
or an application to help kids learn to read (phonics method), I can provide
you with the full lists of words which have a specific letter-sound
relationship in them:
http://www.psy.unsw.edu.au/Users/mlange/GPC/GPC_EN/GPC.html
(click on one line on the left, and you will get all the words that contain that
grapheme-phoneme relationship -- a grapheme is a unit of spelling that
matches a unit of sound, like ai in pain)

It's just that I am an academic. My job security and promotion prospects depend
on the papers I publish. I am not supposed to spend any of my time sharing these
resources, or even knowledge about these resources, more widely :-/. I mean, I
am encouraged to by existing European funding etc., but good researchers know
they need to avoid spending time on anything that doesn't lead to a publication
in a well-ranked journal.

So, for about 4 years now, I have had a pile about a meter high of printouts of
page 1 of every website that contains information about words. Because I am an
academic, this remains in my office. Yes, I find it stupid too. I find it even
more stupid when I get to read papers published by colleagues who would have
done research of better quality if they had known about some of these
resources.

Worse, as a brilliant academic I am just about to submit a big, thick paper
which demonstrates that in my field, we have for the last 20 years been
providing solutions, sometimes simple, sometimes elaborate, to a wrongly
specified problem (in short, we have been studying one-syllable words only;
models that are efficient at reading one-syllable words are not efficient at
reading two-syllable words -- it's more complex than speech synthesis, this is
about integrating findings from patients with brain damage, accounting for
various word properties like meaning, explaining the learning of reading,
explaining second language acquisition, etc., etc.). In short, a better
analysis is needed, but this would require skills and tools that only about
10-20% of my colleagues have.

Just a year of funding and what could be done!!! Ok, I say goodbye to a
promising career. That's fine by me. I do more for the advancement of my field
and possibly of science than I would ever be able to do on the brilliant
academic career path.

        Nothing in the world is as soft and yielding as water,
        Yet nothing can better overcome the hard and strong,
        For they can neither control nor do away with it.

        The soft overcomes the hard,
        The yielding overcomes the strong;
        Every person knows this,
        But no one can practice it.

        Who attends to the people would control the land and grain;
        Who attends to the state would control the whole world;
        Truth is easily hidden by rhetoric.

        (From the Tao Te Ching)

-------------------------------------------------------------------------------
Marielle Lange (PhD),  Psycholinguistics, Lecturer in Psychology and Informatics
University of Edinburgh, UK

Homepage:  http://homepages.inf.ed.ac.uk/mlange/
Lexicall project: http://lexicall.org
Revolution-education project: http://revolution.lexicall.org


