dot POS files and Corpus Linguistics

stephen barncard stephenREVOLUTION2 at barncard.com
Tue Apr 27 14:03:00 CDT 2010


Richmond, it appears that the .pos extension is used by Lotus Notes, among many other applications:

http://file-extension.net/seeker/file_extension_pos

http://filext.com/file-extension/POS

http://en.wikipedia.org/wiki/Lotus_Notes

http://www.computerfileextensions.com/file-extensions.php/POS

FILE FORMAT:
http://www.x-ways.net/winhex/POS_Format_2_0.html



On 27 April 2010 11:04, Richmond Mathewson <richmondmathewson at gmail.com> wrote:

> Well, Yippee-doo; the good folks at the University of
> Oxford have sent me the files of the
> York-Toronto-Helsinki Parsed Corpus of Old English Prose
> (try saying that with your mouth full of cornflakes).
>
> Jolly generous considering it is normally restricted to British
> Higher Education Institutions (somehow the University of
> Plovdiv, Paisii Hilendarski doesn't fit in that category).
>
> HOWEVER; the corpus comes in .pos files which cheeses me
> off immensely; on opening them with the redoubtable
> TextWrangler they are heavily formatted in some odd fashion
> suggesting some sort of meta-tagging.
>
> The Java-based CS_2.002.74.jar, a.k.a. 'CorpusSearch', doesn't run
> for some funny reason on ye olde G4 (have yet to try it on the
> Ubu-Box); but that doesn't really fuss me as ye olde academics
> have decided the parameters of their stuff in advance and my feet
> are too big for their shoes (hey; it's mixed metaphors time again).
>
> So; I am looking to build a Runrev data-miner / chewer / masticator
> / whatever; but, until I can work out what a .pos file can be opened with
> (so I can hae a keek at its formatin) the whole thing is on standby.
> Once I can see what a .pos file should look like in some sort of POS-file
> reader I can cobble together a suitably algorithmic sieve to make the
> file look like it should inside a text field prior to 'chewin the fat'.
>
> Google comes up with unintentionally witty results about 'point of sale'
> and so forth, as well as something about Arabic linguistic corpora,
> Chinese linguistic corpora and so forth (well, at least they are going
> in the right direction).
>
> Having written one of those slimy messages back, where one thanks people
> fulsomely and then shoves in the 'However'; I got a "we cannot comment on
> other methods of accessing the corpus" message. Well; at least I signed my
> name with my second name (Richmond) otherwise I would have had what the
> Americans call a 'Dear John' message . . .  :)
>
> Any help re POS-file readers would be most welcome.
>
> sincerely, Richmond Mathewson.
> _______________________________________________
> use-revolution mailing list
> use-revolution at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-revolution
>



-- 
-------------------------
Stephen Barncard
Back home in SF


