Jane Austen's peculiarity

Richmond richmondmathewson at gmail.com
Sat Aug 8 13:56:55 EDT 2015


On 08/08/15 20:48, Peter M. Brigham wrote:
> On Aug 8, 2015, at 12:42 PM, Richmond wrote:
>
>> Jane Austen [amongst others] uses an interesting type of grammatical construction of this sort:
>>
>> After breakfast, the girls walked to Meryton to inquire if Mr. Wickham
>> _were returned_, and to lament over his absence from the Netherfield ball.
>>
>> Pride and Prejudice.
>>
>> I would like to analyse a million-word corpus that I have been granted access to for this type of construction.
>>
>> However, I don't want to find examples of only 'were returned', but all examples of
>>
>> were + infinitive / preterite / past participle
>>
>> and, presumably, for that I shall have to use wildcards . . .
>>
>> OR ???
> I'll leave it to those who speak Regex to suggest a wildcard solution. Here's another approach (not tested) that will catch past participles ending in "ed".

Looks good; however, I am really looking for ALL preterites and past 
participles, including irregular ones such as 'become', which your 'ed' 
trap won't catch.

I am wondering about using a listField of all the preterites that I am 
looking for.
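
A minimal sketch of that list-based idea, not tested against a real corpus. It assumes the verb forms are supplied one per line (for example, the text of a list field). The handler name findWereFromList and its parameters are hypothetical, and punctuation is stripped with replaceText because LiveCode words include adjacent punctuation and underscores.

```livecode
function findWereFromList pText, pVerbList
   -- returns a comma-delimited list of the word numbers of each "were"
   --    that is immediately followed by a word found in pVerbList
   -- pVerbList (assumed): one verb form per line, e.g. "become", "returned"
   put 0 into wordNum
   put false into prevWasWere
   repeat for each word tWord in pText
      add 1 to wordNum
      -- drop everything except letters and apostrophes, so that
      --    "_were" and "returned_," become "were" and "returned"
      put replaceText(tWord, "[^A-Za-z']", empty) into tClean
      if prevWasWere and tClean is among the lines of pVerbList then
         put (wordNum - 1) & comma after outList
      end if
      put (tClean is "were") into prevWasWere
   end repeat
   return item 1 to -1 of outList -- drops the trailing comma
end findWereFromList
```

It could be called as findWereFromList(field "Corpus", field "Preterites"), where both field names are placeholders. A single pass with "repeat for each word" avoids counting words from the start of the text for every match, which should matter on a million-word corpus.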

> Not sure how this will scale with large texts:
>
> function findWere pText
>     -- returns a comma-delim list of all the word offsets matching "were *ed"
>     put wordOffsets("were", pText, true) into offList
>     if offList = 0 then return empty -- wordOffsets returns 0 when "were" is absent
>     repeat for each item w in offList
>        put word w+1 of pText into testWord
>        if testWord ends with "ed" then put w & comma after outList
>     end repeat
>     return item 1 to -1 of outList
> end findWere
>
> function wordOffsets str, pContainer, matchWhole
>     -- returns a comma-delimited list of all the wordOffsets of str in pContainer
>     -- if matchWhole = true then only whole words are located
>     --    else will find word matches everywhere str is part of a word in pContainer
>     --    note that in LC words will include adjacent punctuation,
>     --       so using matchWhole = true may exclude too many "words"
>     -- duplicates are stripped out
>     --    eg wordOffsets("co","the common coconut") = 2,3   not   2,3,3
>     -- note: to get the last wordOffset of a string in a container (often useful)
>     --    use "item -1 of wordOffsets(...)"
>     -- by Peter M. Brigham, pmbrig at gmail.com — freeware
>     -- requires offsets()
>     
>     if matchWhole = empty then put false into matchWhole
>     put offsets(str,pContainer) into offList
>     if offList = 0 then return 0
>     repeat for each item i in offList
>        put the number of words of (char 1 to i of pContainer) into wdNbr
>        if matchWhole then
>           if word wdNbr of pContainer <> str then next repeat
>        end if
>        put 1 into A[wdNbr]
>        -- using an array avoids duplicates
>     end repeat
>     put the keys of A into wordList
>     sort lines of wordList ascending numeric
>     replace cr with comma in wordList
>     return wordList
> end wordOffsets
>
> function offsets str, pContainer
>     -- returns a comma-delimited list of all the offsets of str in pContainer
>     -- returns 0 if not found
>     -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
>     --     ie, overlapping offsets are not counted
>     -- note: to get the last occurrence of a string in a container (often useful)
>     --     use "item -1 of offsets(...)"
>     -- by Peter M. Brigham, pmbrig at gmail.com — freeware
>     
>     if str is not in pContainer then return 0
>     put 0 into startPoint
>     repeat
>        put offset(str,pContainer,startPoint) into thisOffset
>        if thisOffset = 0 then exit repeat
>        add thisOffset to startPoint
>        put startPoint & comma after offsetList
>        add length(str)-1 to startPoint
>     end repeat
>     return item 1 to -1 of offsetList -- delete trailing comma
> end offsets
>
> P.S. I love Jane Austen. One of my favorite books of all time is "Pride and Prejudice." It's so beautifully constructed.


Glad to hear that another programmer doesn't spend all their time in 
front of a computer screen!

>
> -- Peter
>
>

Richmond.



