[OT] Text analysis and author, anyone done it?

Peter Brigham MD pmbrig at gmail.com
Sat Jun 25 11:18:12 EDT 2011

On Jun 24, 2011, at 11:46 PM, Peter Alcibiades wrote:

> It can be done statistically. Various methods have been proposed and used. 
> One general kind of measure is the probability of another word coming, as a
> function of the past n words.  Another is to measure the length of gap
> between occurrences of pairs of a given word.  There is technical literature
> on it, and I guess LC would permit writing something to do it.  Not that it's
> the best thing to do it in, that seems to be R, but it's what I know.
> But it would be nice if someone had already done it, in any language.  Save
> a huge lot of work.
> Peter

Don't know if anyone has already tackled this kind of thing in LC, but it should be fairly easy to do. (Whether the algorithms actually work to distinguish different authors is something I know nothing about.) The gap between pairs of a given word, in particular, is nearly trivial to compute. The only real question would be speed, and since LC is blindingly fast at processing text strings, I'd be optimistic about that, unless you're talking about really huge texts.
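
For illustration only, here is a rough sketch of the two measures mentioned in the quoted message. It is written in Python rather than LC, and it is not taken from any published authorship method. The function names (tokenize, word_gaps, next_word_probs), the crude tokenizer, and the reading of "gap" as the number of words between successive occurrences of one word are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes.

    This is a deliberately crude tokenizer (an assumption for the sketch);
    real stylometry work would likely need something more careful.
    """
    return re.findall(r"[a-z']+", text.lower())


def word_gaps(tokens, word):
    """Return the number of tokens between successive occurrences of `word`."""
    positions = [i for i, t in enumerate(tokens) if t == word]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def next_word_probs(tokens, n=1):
    """Estimate P(next word | previous n words) from raw counts.

    No smoothing is applied, so contexts never seen in the text are
    simply absent from the result.
    """
    counts = defaultdict(Counter)
    for i in range(n, len(tokens)):
        context = tuple(tokens[i - n:i])
        counts[context][tokens[i]] += 1
    return {
        ctx: {w: c / sum(followers.values()) for w, c in followers.items()}
        for ctx, followers in counts.items()
    }


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the cat"
    tokens = tokenize(sample)
    print(word_gaps(tokens, "the"))
    print(next_word_probs(tokens, n=1)[("the",)])
```

Both functions make a single pass over the token list, so the cost grows roughly linearly with the length of the text. Comparing the resulting gap lists or probability tables between two authors is a separate statistical step that this sketch does not attempt.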

-- Peter

Peter M. Brigham
pmbrig at gmail.com
