dot POS files and Corpus Linguistics
stephenREVOLUTION2 at barncard.com
Tue Apr 27 14:03:00 CDT 2010
Richmond, it appears that the .pos extension is used by Lotus Notes, among many other applications.
On 27 April 2010 11:04, Richmond Mathewson <richmondmathewson at gmail.com> wrote:
> Well, Yippee-doo; the good folks at the University of
> Oxford have sent me the files of the
> York-Toronto-Helsinki Parsed Corpus of Old English Prose
> (try saying that with your mouth full of cornflakes).
> Jolly generous considering it is normally restricted to British
> Higher Education Institutions (somehow the University of
> Plovdiv, Paisii Hilendarski doesn't fit in that category).
> HOWEVER; the corpus comes in .pos files which cheeses me
> off immensely; on opening them with the redoubtable
> TextWrangler they are heavily formatted in some odd fashion
> suggesting some sort of meta-tagging.
> The Java-based CS_2.002.74.jar, a.k.a 'CorpusSearch' doesn't run
> for some funny reason on ye olde G4 (have yet to try it on the
> Ubu-Box); but that doesn't really fuss me as ye olde academics
> have decided the parameters of their stuff in advance and my feet
> are too big for their shoes (hey; it's mixed metaphors time again).
> So; I am looking to build a Runrev data-miner / chewer / masticator
> / whatever; but, until I can work out what a .pos file can be opened with
> (so I can hae a keek at its formatin) the whole thing is on standby.
> Once I can see what a .pos file should look like in some sort of POS-file
> reader I can cobble together a suitably algorithmic sieve to make the
> file look like it should inside a text field prior to 'chewin the fat'.
> Google comes up with unintentionally witty results about 'point of sale'
> and so forth, as well as something about Arabic linguistic corpora,
> Chinese linguistic corpora and so forth (well, at least they are going
> in the right direction).
> Having written one of those slimy messages back, where one thanks people
> fulsomely and then shoves in the 'However'; I got a "we cannot comment on
> other methods of accessing the corpus" message. Well; at least I signed my
> name with my second name (Richmond) otherwise I would have had what the
> Americans call a 'Dear John' message . . . :)
> Any help re POS-file readers would be most welcome.
> sincerely, Richmond Mathewson.
Back home in SF