Jane Austen's peculiarity
Peter M. Brigham
pmbrig at gmail.com
Sat Aug 8 13:48:39 EDT 2015
On Aug 8, 2015, at 12:42 PM, Richmond wrote:
> Jane Austen [amongst others] uses an interesting type of grammatical construction of this sort:
>
> After breakfast, the girls walked to Meryton to inquire if Mr. Wickham
> _were returned_, and to lament over his absence from the Netherfield ball.
>
> Pride and Prejudice.
>
> I would like to analyse a million word corpus that I have been granted access to for this type of construction.
>
> However, I don't want to find examples of only 'were returned', but all examples of
>
> were + infinitive / preterite / past participle
>
> and, presumably for that I shall have to use wildcards . . .
>
> OR ???
I'll leave it to those who speak Regex to suggest a wildcard solution. Here's another approach (not tested) that will catch past participles ending in "ed", so it will miss irregular ones such as "gone". I'm not sure how well this will scale with large texts:
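function findWere pText
   -- returns a comma-delimited list of the word offsets of each "were"
   -- that is immediately followed by a word ending in "ed"
   -- note: LC words include adjacent punctuation, so "returned," will not match
   put wordOffsets("were", pText, true) into offList
   if offList = 0 then return empty -- wordOffsets returns 0 when nothing is found
   repeat for each item w in offList
      put word w+1 of pText into testWord
      if testWord ends with "ed" then put w & comma after outList
   end repeat
   return item 1 to -1 of outList -- drops the trailing comma
end findWere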

function wordOffsets str, pContainer, matchWhole
   -- returns a comma-delimited list of all the wordOffsets of str in pContainer
   -- if matchWhole = true then only whole words are located
   -- else will find word matches everywhere str is part of a word in pContainer
   -- note that in LC words will include adjacent punctuation,
   -- so using matchWhole = true may exclude too many "words"
   -- duplicates are stripped out
   -- eg wordOffsets("co","the common coconut") = 2,3 not 2,3,3
   -- note: to get the last wordOffset of a string in a container (often useful)
   -- use "item -1 of wordOffsets(...)"
   -- by Peter M. Brigham, pmbrig at gmail.com - freeware
   -- requires offsets()
   if matchWhole = empty then put false into matchWhole
   put offsets(str,pContainer) into offList
   if offList = 0 then return 0
   repeat for each item i in offList
      put the number of words of (char 1 to i of pContainer) into wdNbr
      if matchWhole then
         if word wdNbr of pContainer <> str then next repeat
      end if
      put 1 into A[wdNbr]
      -- using an array avoids duplicates
   end repeat
   put the keys of A into wordList
   sort lines of wordList ascending numeric
   replace cr with comma in wordList
   return wordList
end wordOffsets

function offsets str, pContainer
   -- returns a comma-delimited list of all the offsets of str in pContainer
   -- returns 0 if not found
   -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
   -- ie, overlapping offsets are not counted
   -- note: to get the last occurrence of a string in a container (often useful)
   -- use "item -1 of offsets(...)"
   -- by Peter M. Brigham, pmbrig at gmail.com - freeware
   if str is not in pContainer then return 0
   put 0 into startPoint
   repeat
      put offset(str,pContainer,startPoint) into thisOffset
      if thisOffset = 0 then exit repeat
      add thisOffset to startPoint
      put startPoint & comma after offsetList
      add length(str)-1 to startPoint
   end repeat
   return item 1 to -1 of offsetList -- delete trailing comma
end offsets
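
For comparison, the regex route might look roughly like the sketch below. It is equally untested and only illustrative: the handler name findWereRegex is made up for this example, and it assumes that LiveCode's matchChunk() fills its two position variables with the start and end character positions of the first parenthesised group, relative to the string it is given. The "(?i)" prefix is there because the regex functions are case-sensitive by default.

```livecode
function findWereRegex pText
   -- hypothetical helper, not part of the handlers above
   -- returns one matched phrase ("were" + a word ending in "ed") per line
   put 0 into tSkip
   repeat
      -- search only the part of the text after the previous match
      put char tSkip + 1 to -1 of pText into tRest
      if not matchChunk(tRest, "(?i)\b(were\s+\w+ed)\b", tStart, tEnd) then exit repeat
      put char tStart to tEnd of tRest & cr after tOut
      add tEnd to tSkip -- tEnd is relative to tRest, so accumulate it
   end repeat
   delete the last char of tOut -- trailing return
   return tOut
end findWereRegex
```

Unlike findWere, this returns the matched text rather than word offsets, and because \w+ed stops at the end of the word it is not thrown off by a following comma or underscore. Copying the remainder of the text on every pass is likely to be slow on a million-word corpus, so treat it as a starting point rather than a finished tool. As a quick check of the word-offset version, findWere("The girls walked to Meryton to inquire if Mr. Wickham were returned") should give 11, the word number of "were".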
P.S. I love Jane Austen. One of my favorite books of all time is "Pride and Prejudice." It's so beautifully constructed.
-- Peter
Peter M. Brigham
pmbrig at gmail.com
http://home.comcast.net/~pmbrig