concordance of RR stack

Leraillez Benoit Leraillez at netinfo.fr
Tue Feb 24 13:01:54 EST 2004


On 24/02/04 15:59, "Jose L. Rodriguez Illera" <jlrodrig at ariadna.d5.ub.es>
wrote:

> I have a pseudo-concordance tool that calculates the word index, frequency
> and length from a text. Then you may select any word and it calculates all
> its contexts, with 5 words before and after. If you select one of them, it
> finds the page where it appears and selects the word. It may need some small
> tuning. I do not know if it is as fast as FreeText, but it is enough for
> most uses.
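
  To make the quoted description concrete, here is a minimal Python sketch
of that kind of concordance: a word index with frequency and length, plus
the contexts of a chosen word with 5 words on each side. It is not Jose's
stack; the function names (tokenize, word_index, contexts) and the
tokenizing rule are assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed rule: a word is a run of letters (accented ones included),
    # compared in lower case.
    return re.findall(r"[^\W\d_]+", text.lower())

def word_index(tokens):
    # Map each distinct word to (frequency, length).
    counts = Counter(tokens)
    return {word: (n, len(word)) for word, n in counts.items()}

def contexts(tokens, word, window=5):
    # One (before, word, after) triple per occurrence of `word`,
    # with up to `window` tokens on each side.
    found = []
    for i, tok in enumerate(tokens):
        if tok == word:
            before = tokens[max(0, i - window):i]
            after = tokens[i + 1:i + 1 + window]
            found.append((before, tok, after))
    return found

tokens = tokenize("The cat sat. The dog sat on the cat.")
index = word_index(tokens)
dog_contexts = contexts(tokens, "dog", window=2)
```

  Going from a context back to the page where it appears would only need
the token position kept alongside each triple.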

  If you're interested, we have text analysis code that reduces strings to
their dictionary entries (singular or infinitive form). It first runs a
grammatical analysis of the phrase to tell whether a string is a verb, a
noun, and so on (like "end" in English or "partie" in French; Spanish and
German versions are in the works). We use it to index documents by what they
"talk about", purging the text of the redundant information that is only
necessary for human-to-human comprehension (I love to read Shannon & Weaver ;-).
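
  As a rough illustration of the idea (not our actual code), here is a toy
Python sketch: guess the part of speech from the previous word, then look
up the dictionary form for that (string, part of speech) pair. The word
lists, the LEXICON table and the function names are all hypothetical, and
the one-word-of-context rule is far cruder than a real grammar analysis.

```python
# Toy data, assumed for illustration only.
DETERMINERS = {"la", "le", "les", "une", "un", "des"}
AUXILIARIES = {"est", "sont", "suis", "es", "sommes"}

# (surface string, part of speech) -> dictionary entry
LEXICON = {
    ("partie", "NOUN"): "partie",
    ("parties", "NOUN"): "partie",
    ("partie", "VERB"): "partir",
    ("parties", "VERB"): "partir",
}

def guess_pos(previous):
    # Crude stand-in for grammar analysis: only look at the previous word.
    if previous in DETERMINERS:
        return "NOUN"
    if previous in AUXILIARIES:
        return "VERB"
    return None

def lemmatize(tokens):
    # Replace each token by its dictionary entry when the lexicon knows
    # the (token, guessed part of speech) pair; otherwise keep it as is.
    lemmas = []
    previous = None
    for tok in tokens:
        pos = guess_pos(previous)
        lemmas.append(LEXICON.get((tok, pos), tok))
        previous = tok
    return lemmas

as_noun = lemmatize("la partie est finie".split())
as_verb = lemmatize("elle est partie".split())
```

  The same string "partie" stays "partie" after a determiner but becomes
"partir" after an auxiliary, which is why the grammar step has to come
before the dictionary lookup.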

  For example, given its length, the preceding paragraph could be reduced to
its nouns alone and still keep its subject for future searches: text analysis code
string dictionary form grammar phrase verb noun end English French Spanish
German version work document information human comprehension Shannon Weaver.
The only meaningful strings lost by limiting the index to nouns are verbs
and adjectives, which in this example would be "singular, infinitive
and index"; but you also leave out "be, have, love, read, like, go...", which
are rarely searched for. So it is up to whoever maintains the index to decide
what to index.
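
  A small Python sketch of that choice, assuming the text has already been
turned into (dictionary entry, part of speech) pairs by an earlier step.
The index_terms function and the tag names are hypothetical; the keep
parameter is where the decision about what to index is made.

```python
def index_terms(tagged, keep=("NOUN",)):
    # Keep each dictionary entry once, in order of first appearance,
    # and only if its part of speech is one we chose to index.
    seen = set()
    terms = []
    for lemma, pos in tagged:
        if pos in keep and lemma not in seen:
            seen.add(lemma)
            terms.append(lemma)
    return terms

# Hand-tagged sample, assumed for illustration.
tagged = [
    ("we", "PRON"), ("have", "VERB"), ("text", "NOUN"),
    ("analysis", "NOUN"), ("code", "NOUN"), ("index", "VERB"),
    ("document", "NOUN"), ("text", "NOUN"),
]

nouns_only = index_terms(tagged)
nouns_and_verbs = index_terms(tagged, keep=("NOUN", "VERB"))
```

  Adding "VERB" brings back "index" but also "have", which shows the
trade-off: a stop list for the rarely searched verbs would be the obvious
next step.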

  Another thing concerning French nouns: many don't form their plural by
adding an "s" but use a totally different string. Bringing everything back
to the singular form ensures that people looking for an "oeil" (eye) will
also find documents about "yeux" (eyes).
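
  In Python, a minimal sketch of this could be a table of irregular plurals
consulted before any suffix rule. The table below is a tiny hand-picked
sample, the function names are hypothetical, and the fallback rule (drop a
final "s") is deliberately naive: it would mangle words like "souris".

```python
# Small sample of French irregular plurals, for illustration only.
IRREGULAR_PLURALS = {
    "yeux": "oeil",
    "chevaux": "cheval",
    "travaux": "travail",
    "journaux": "journal",
}

def singular(word):
    w = word.lower()
    if w in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[w]
    # Naive fallback, assumed for the sketch: strip a final "s".
    if len(w) > 3 and w.endswith("s"):
        return w[:-1]
    return w

def build_index(docs):
    # Map each singular form to the set of documents that contain it.
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(singular(word), set()).add(doc_id)
    return index

docs = {"a": "les yeux bleus", "b": "un oeil vert"}
index = build_index(docs)
hits = index[singular("oeil")]
```

  A search for "oeil" then returns both documents, including the one that
only ever mentions "yeux".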

-- 
Benoît Leraillez

Remember that one can be hermetic -- and contain nothing. Don't forget
that "hermetic" also means "plugged up"! (S. Guitry)


