[OT] Text analysis and author, anyone done it?

Terry Judd tsj at unimelb.edu.au
Fri Jul 1 05:40:57 EDT 2011


On 01/07/2011 05:27 PM, "Peter Alcibiades" <palcibiades-first at yahoo.co.uk>
wrote:

> 
> The case which I'm looking to apply this to is a bit more like the literary
> case.  There are a number of texts of which the authorship is definitely known
> and not subject to dispute.  There is then one text whose authorship is
> unknown.  The question is whether it is probably by one of the known
> authors.  
> 
> We do also have a case like the Biblical case - where there are texts under
> one signature that we suspect to have come from more than one author, and
> perhaps from the author of the text of primary interest.  It would be nice
> to be able to discriminate between authors in this body of work as well.
> 

One fairly simple approach that you could certainly implement in LiveCode
involves compressing (zipping) chunks of text, both separately and combined,
and comparing the compressed lengths. If two chunks of text have a relatively
high degree of similarity, the compressor can reuse patterns from the first
chunk when encoding the second, so their combined compressed length will be
less than for two equivalent but dissimilar chunks.
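
As a rough illustration of that comparison, here is a minimal sketch. It is
in Python rather than LiveCode simply to keep it short, and it assumes zlib
as the compressor; the function names zipped_length and combined_length are
made up for this example. The same steps should carry over to LiveCode.

```python
import zlib


def zipped_length(text):
    """Return the number of bytes in the zlib-compressed form of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def combined_length(first, second):
    """Return the compressed length of the two chunks joined together."""
    return zipped_length(first + second)


if __name__ == "__main__":
    # Toy chunks of equal length; real texts would be far longer.
    chunk_a = "the quick brown fox jumps over the lazy dog " * 20
    chunk_b = "the quick brown fox jumps over the lazy dog " * 20
    chunk_c = "pack my box with five dozen liquor jugs now " * 20

    print("similar pair:   ", combined_length(chunk_a, chunk_b))
    print("dissimilar pair:", combined_length(chunk_a, chunk_c))
```

The similar pair should come out shorter than the dissimilar one. One caveat
with zlib in particular: it only looks back about 32 KB for repeated
patterns, so very long chunks may need to be split or sampled.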

So, in the case of authorship, if you have texts from 3 known authors and one
unknown author, you combine the unknown text with each of the known ones and
compare the zipped length of each combined text to the zipped length of the
corresponding known text on its own. The combined text that shows the
smallest increase in length relative to the individual length of its known
text is then the one most likely to have both texts authored by the same
person (I hope that makes sense).
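
To make the procedure above concrete, here is a small sketch, again in Python
with zlib standing in for whatever compressor you end up using. The names
rank_authors and zipped_length and the sample texts are invented for
illustration only. It measures, for each known author, how much the zipped
length grows when the unknown text is appended, and sorts by that increase.

```python
import zlib


def zipped_length(text):
    """Return the number of bytes in the zlib-compressed form of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def rank_authors(known, unknown):
    """Rank known authors by how little the unknown text adds when zipped.

    known maps an author name to a sample of that author's text.
    Returns (author, increase) pairs, smallest increase first.
    """
    scores = []
    for author, text in known.items():
        # Growth in compressed size caused by appending the unknown text.
        increase = zipped_length(text + "\n" + unknown) - zipped_length(text)
        scores.append((author, increase))
    return sorted(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy samples; real texts would be far longer and more varied.
    known = {
        "Author A": "the quick brown fox jumps over the lazy dog " * 20,
        "Author B": "pack my box with five dozen liquor jugs now " * 20,
    }
    unknown = "the quick brown fox jumps over the lazy dog " * 5

    for author, increase in rank_authors(known, unknown):
        print(author, increase)
```

The first entry in the result is the best candidate. It probably helps to
keep the known samples at roughly the same length, so that no author gets an
advantage just from having more text to match against.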

Terry...

--
Dr Terry Judd | Senior Lecturer in Medical Education
Medical Education Unit
Melbourne Medical School
The University of Melbourne






