[OT] Text analysis and author, anyone done it?

Peter Alcibiades palcibiades-first at yahoo.co.uk
Fri Jul 1 03:27:48 EDT 2011


I do think it's possible, and it has actually been done successfully.  The
Bible is a difficult case since we don't have value-free assessments of
authorship.  Consequently it is reasonable to argue that what the programs
do is successfully implement the prejudices of their authors.  

However, when we apply this to Dickens and ask whether the various
completions of Edwin Drood were written by him, or apply it to Jane Austen
and Fanny Burney and ask whether the software attributes their works to the
same person, we are dealing with definitely known authorship.  So we can
assume that if the algorithms discriminate correctly in these cases, they
will probably work on other material where authorship is unknown.
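
As a rough illustration of that kind of check, here is a minimal Python
sketch that hides each text of known authorship in turn and counts how often
a method names the right author.  The names leave_one_out_accuracy and
attribute are hypothetical, not taken from any particular package, and
attribute stands for whatever attribution method is being tested.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]  # (author, text); authorship must be undisputed
Attribute = Callable[[Dict[str, List[str]], str], str]

def leave_one_out_accuracy(samples: List[Sample], attribute: Attribute) -> float:
    """Hide each known text in turn and ask attribute() who wrote it.

    attribute(training, text) is a hypothetical callable: it receives the
    remaining texts grouped by author and returns its best-guess author.
    Returns the fraction of hidden texts that were attributed correctly.
    """
    correct = 0
    tested = 0
    for i, (author, text) in enumerate(samples):
        # Group every other sample by author.
        training: Dict[str, List[str]] = {}
        for j, (other_author, other_text) in enumerate(samples):
            if j != i:
                training.setdefault(other_author, []).append(other_text)
        if author not in training:
            continue  # only one text by this author, so it cannot be tested
        if attribute(training, text) == author:
            correct += 1
        tested += 1
    return correct / tested if tested else 0.0
```

A high score on texts like these is only evidence, not proof, that the same
method will behave on material where the authorship really is unknown.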

The case which I'm looking to apply this to is a bit more like the literary
case.  There are a number of texts of which the authorship is definitely known
and not subject to dispute.  There is then one text whose authorship is
unknown.  The question is whether it is probably by one of the known
authors.  
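
To make the setup concrete, here is a minimal Python sketch of one common
approach: compare function-word frequencies in the unknown text with those of
each known author, using a simplified form of Burrows's Delta.  The word list
and the names profile and delta_scores are assumptions made up for this
example; a real study would normally take the most frequent words of the
whole corpus and use far longer texts.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Hypothetical word list, for illustration only.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "but", "not"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def delta_scores(known, unknown):
    """known maps author -> text of undisputed authorship.

    Returns author -> distance from the unknown text (lower means closer).
    Each word's frequency difference is scaled by that word's standard
    deviation across the known authors, then averaged over the word list.
    """
    profiles = {author: profile(text) for author, text in known.items()}
    target = profile(unknown)
    scores = {author: 0.0 for author in known}
    for i in range(len(FUNCTION_WORDS)):
        column = [p[i] for p in profiles.values()]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # this word does not separate the known authors
        z_target = (target[i] - mu) / sigma
        for author, p in profiles.items():
            scores[author] += abs((p[i] - mu) / sigma - z_target)
    return {author: s / len(FUNCTION_WORDS) for author, s in scores.items()}
```

The author with the lowest score is the closest match, e.g.
min(scores, key=scores.get).  That only says which known author is nearest;
it does not rule out that the text is by someone outside the candidate set.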

We do also have a case like the Biblical case - where there are texts under
one signature that we suspect to have come from more than one author, and
perhaps from the author of the text of primary interest.  It would be nice
to be able to discriminate between authors in this body of work as well.

Thanks for the very helpful references.  It's early days yet, but there is no
reason why statistical analysis should not illuminate this question, and
there are some promising leads.  Certainty is not to be expected, but
statistical text analysis is definitely one weapon in the armory.

I've no views on Paul and Hebrews.  Whether an author really does write in
statistically different ways depending on audience is an empirical question,
and I haven't come across any studies.  Yes, the working assumption in the
stats is that they don't.  One could probably tell by looking at the
work of some prolific authors who have published in different segments under
different names.  There are such cases.

--
View this message in context: http://runtime-revolution.278305.n4.nabble.com/Re-Text-analysis-and-author-anyone-done-it-tp3636729p3637660.html
Sent from the Revolution - User mailing list archive at Nabble.com.