Jane Austen's peculiarity

Peter M. Brigham pmbrig at gmail.com
Sat Aug 8 19:48:39 CEST 2015

On Aug 8, 2015, at 12:42 PM, Richmond wrote:

> Jane Austen [amongst others] uses an interesting type of grammatical construction of this sort:
> After breakfast, the girls walked to Meryton to inquire if Mr. Wickham
> _were returned_, and to lament over his absence from the Netherfield ball.
> Pride and Prejudice.
> I would like to analyse a million word corpus that I have been granted access to for this type of construction.
> However, I don't want to find examples of only 'were returned', but all examples of
> were + infinitive / preterite / past participle
> and, presumably for that I shall have to use wildcards . . .
> OR ???

I'll leave it to those who speak Regex to suggest a wildcard solution. Here's a non-regex approach (not tested) that will catch past participles ending in "ed". I'm not sure how well it will scale with large texts:

function findWere pText
   -- returns a comma-delim list of all the word offsets matching "were *ed"
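   put wordOffsets("were", pText, true) into offList
   -- wordOffsets() returns 0 when there is no match, so bail out early
   if offList = 0 then return empty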
   repeat for each item w in offList
      put word w+1 of pText into testWord
      if testWord ends with "ed" then put w & comma after outList
   end repeat
   return item 1 to -1 of outList
end findWere

function wordOffsets str, pContainer, matchWhole
   -- returns a comma-delimited list of all the wordOffsets of str in pContainer
   -- if matchWhole = true then only whole words are located
   --    else will find word matches everywhere str is part of a word in pContainer
   --    note that in LC words will include adjacent punctuation,
   --       so using matchWhole = true may exclude too many "words"
   -- duplicates are stripped out
   --    eg wordOffsets("co","the common coconut") = 2,3   not   2,3,3
   -- note: to get the last wordOffset of a string in a container (often useful)
   --    use "item -1 of wordOffsets(...)"
   -- by Peter M. Brigham, pmbrig at gmail.com — freeware
   -- requires offsets()
   if matchWhole = empty then put false into matchWhole
   put offsets(str,pContainer) into offList
   if offList = 0 then return 0
   repeat for each item i in offList
      put the number of words of (char 1 to i of pContainer) into wdNbr
      if matchWhole then
         if word wdNbr of pContainer <> str then next repeat
      end if
      put 1 into A[wdNbr]
      -- using an array avoids duplicates
   end repeat
   put the keys of A into wordList
   sort lines of wordList ascending numeric
   replace cr with comma in wordList
   return wordList
end wordOffsets

function offsets str, pContainer
   -- returns a comma-delimited list of all the offsets of str in pContainer
   -- returns 0 if not found
   -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
   --     ie, overlapping offsets are not counted
   -- note: to get the last occurrence of a string in a container (often useful)
   --     use "item -1 of offsets(...)"
   -- by Peter M. Brigham, pmbrig at gmail.com — freeware
   if str is not in pContainer then return 0
   put 0 into startPoint
   repeat
      put offset(str,pContainer,startPoint) into thisOffset
      if thisOffset = 0 then exit repeat
      add thisOffset to startPoint
      put startPoint & comma after offsetList
      add length(str)-1 to startPoint
   end repeat
   return item 1 to -1 of offsetList -- delete trailing comma
end offsets
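
If adjacent punctuation turns out to be a problem (the example from the novel has "returned," with a trailing comma, which the "ends with" test in findWere would not match), one possible workaround is a single pass over the words that strips non-letters from both ends before testing. This is only a sketch and is also untested; findWereLoose and stripEdges are names made up here, and it assumes replaceText() accepts the PCRE pattern shown:

```livecode
function findWereLoose pText
   -- hypothetical variant of findWere: returns a comma-delim list of the
   -- word offsets of every "were" that is followed by a word ending in "ed",
   -- ignoring punctuation stuck to either word
   put empty into outList
   put false into prevWasWere
   put 0 into w
   repeat for each word tWord in pText
      add 1 to w
      put stripEdges(tWord) into cleanWord
      if prevWasWere and cleanWord ends with "ed" then
         put (w - 1) & comma after outList
      end if
      -- comparison is case-insensitive unless the caseSensitive is set
      put (cleanWord is "were") into prevWasWere
   end repeat
   if outList is not empty then delete the last char of outList
   return outList
end findWereLoose

function stripEdges pWord
   -- hypothetical helper: removes any run of non-letters
   -- from the start and from the end of pWord
   return replaceText(pWord, "^[^A-Za-z]+|[^A-Za-z]+$", empty)
end stripEdges
```

Because it walks the text once instead of recounting words for every hit, it may also cope better with a large corpus, though I have not timed it.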

P.S. I love Jane Austen. One of my favorite books of all time is "Pride and Prejudice." It's so beautifully constructed.

-- Peter

Peter M. Brigham
pmbrig at gmail.com
