dot POS files and Corpus Linguistics

stephen barncard stephenREVOLUTION2 at
Tue Apr 27 15:03:00 EDT 2010

Richmond, it appears that .pos files are used by Lotus Notes, among many other programs.


On 27 April 2010 11:04, Richmond Mathewson <richmondmathewson at> wrote:

>  Well, Yippee-doo; the good folks at the University of
> Oxford have sent me the files of the
> York-Toronto-Helsinki Parsed Corpus of Old English Prose
> (try saying that with your mouth full of cornflakes).
> Jolly generous considering it is normally restricted to British
> Higher Education Institutions (somehow the University of
> Plovdiv, Paisii Hilendarski doesn't fit in that category).
> HOWEVER; the corpus comes in .pos files which cheeses me
> off immensely; on opening them with the redoubtable
> TextWrangler they are heavily formatted in some odd fashion
> suggesting some sort of meta-tagging.
> The Java-based CS_2.002.74.jar, a.k.a. 'CorpusSearch' doesn't run
> for some funny reason on ye olde G4 (have yet to try it on the
> Ubu-Box); but that doesn't really fuss me as ye olde academics
> have decided the parameters of their stuff in advance and my feet
> are too big for their shoes (hey; it's mixed metaphors time again).
> So; I am looking to build a Runrev data-miner / chewer / masticator
> / whatever; but, until I can work out what a .pos file can be opened with
> (so I can hae a keek at its formatin) the whole thing is on standby.
> Once I can see what a .pos file should look like in some sort of POS-file
> reader I can cobble together a suitably algorithmic sieve to make the
> file look like it should inside a text field prior to 'chewin the fat'.
> Google comes up with unintentionally witty results about 'point of sale'
> and so forth, as well as something about Arabic linguistic corpora,
> Chinese linguistic corpora and so forth (well, at least they are going
> in the right direction).
> Having written one of those slimy messages back, where one thanks people
> fulsomely and then shoves in the 'However'; I got a "we cannot comment on
> other methods of accessing the corpus" message. Well; at least I signed my
> name with my second name (Richmond) otherwise I would have had what the
> Americans call a 'Dear John' message . . .  :)
> Any help re POS-file readers would be most welcome.
> sincerely, Richmond Mathewson.

Stephen Barncard
Back home in SF
