Jane Austen's peculiarity

Richmond richmondmathewson at gmail.com
Sat Aug 8 14:55:44 EDT 2015


On 08/08/15 21:18, Peter M. Brigham wrote:
> On Aug 8, 2015, at 1:56 PM, Richmond wrote:
>
>> On 08/08/15 20:48, Peter M. Brigham wrote:
>>> On Aug 8, 2015, at 12:42 PM, Richmond wrote:
>>>
>>>> Jane Austen [amongst others] uses an interesting type of grammatical construction of this sort:
>>>>
>>>> After breakfast, the girls walked to Meryton to inquire if Mr. Wickham
>>>> _were returned_, and to lament over his absence from the Netherfield ball.
>>>>
>>>> Pride and Prejudice.
>>>>
>>>> I would like to analyse a million word corpus that I have been granted access to for this type of construction.
>>>>
>>>> However, I don't want to find examples of only 'were returned', but all examples of
>>>>
>>>> were + infinitive / preterite / past participle
>>>>
>>>> and, presumably for that I shall have to use wildcards . . .
>>>>
>>>> OR ???
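A regular-expression route to the question above is possible with LiveCode's matchChunk function, which reports the character positions of the first parenthesised group in a match. What follows is a sketch under stated assumptions, not code from this thread. findWereConstructions is a hypothetical name. The corpus is assumed to hold one paragraph per line, so a "were" at the end of one line and its verb form at the start of the next is missed. Irregular forms such as "become" must be supplied as a return-delimited list of plain words (no regex metacharacters), because no pattern can recognise them from spelling alone. The function returns one line per hit: the line number, a tab, and the matched verb form.

```livecode
-- Hypothetical sketch: find "were" followed by a regular -ed form or by
-- one of the irregular forms listed (one per line) in pForms.
-- Assumes one paragraph per line; hits that span a line break are missed.
function findWereConstructions pText, pForms
   local tAlternatives, tRegex, tLine, tLineNum, tRest, tStart, tEnd, tHits
   filter pForms without empty
   replace cr with "|" in pForms
   -- regular forms end in -ed; irregular ones come from pForms
   if pForms is empty then
      put "\w+ed" into tAlternatives
   else
      put "\w+ed|" & pForms into tAlternatives
   end if
   -- (?i) is the PCRE inline flag for case-insensitive matching ("Were" too)
   put "(?i)\bwere\s+(" & tAlternatives & ")\b" into tRegex
   put 0 into tLineNum
   repeat for each line tLine in pText
      add 1 to tLineNum
      put tLine into tRest
      repeat
         -- tStart/tEnd receive the char positions of the captured verb form
         if not matchChunk(tRest, tRegex, tStart, tEnd) then exit repeat
         put tLineNum & tab & (char tStart to tEnd of tRest) & cr after tHits
         -- keep searching in the remainder of the line
         delete char 1 to tEnd of tRest
      end repeat
   end repeat
   if tHits is not empty then delete the last char of tHits
   return tHits
end findWereConstructions
```

Working line by line keeps the repeated "delete char 1 to tEnd" cheap, since it only ever copies one paragraph rather than the whole million-word text.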
>>> I'll leave it to those who speak Regex to suggest a wildcard solution. Here's another one (not tested) that will catch past participles ending in "ed".
>> Looks good; however, I am really looking for ALL preterites, such as 'become', so your 'ed' trap won't catch that.
>>
>> I am wondering about using a listField of all the preterites that I am looking for.
> if you do that then just make the repeat loop as follows:
>     repeat for each item w in offList
>        put word w+1 of pText into testWord
>        if testWord ends with "ed" then
>           put w & comma after outList
>        else if testWord is among the words of fld "preteritesList" then
>           put w & comma after outList
>        end if
>     end repeat
>
> This will be faster if you put the preteritesList field into a variable before the repeat loop, since it's significantly faster for the engine to access the contents of a variable than the contents of a field.
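Going one step further than copying the field into a variable, the verb forms can be loaded into an array once, so each test becomes a keyed lookup instead of a scan of the whole list with "is among the words of". A minimal sketch, assuming the word numbers of "were" come from wordOffsets() as posted in this thread; findWereForms and its parameter names are hypothetical.

```livecode
-- Hypothetical sketch: pOffList is a comma-delimited list of the word
-- numbers of "were" (e.g. from wordOffsets), pFormList a return-delimited
-- list of irregular verb forms (e.g. the contents of fld "preteritesList").
function findWereForms pText, pOffList, pFormList
   local tFormsA, tForm, w, testWord, outList
   -- build the lookup table once, before the main loop
   repeat for each line tForm in pFormList
      if tForm is not empty then put true into tFormsA[tForm]
   end repeat
   repeat for each item w in pOffList
      put word w+1 of pText into testWord
      -- strip trailing punctuation so "returned," still counts
      put replaceText(testWord, "[[:punct:]]+$", empty) into testWord
      if testWord ends with "ed" or tFormsA[testWord] is true then
         put w & comma after outList
      end if
   end repeat
   return item 1 to -1 of outList -- delete trailing comma
end findWereForms
```

A call might look like put findWereForms(tText, wordOffsets("were", tText, true), fld "preteritesList") into tHits, which reads the field exactly once. Note that "word w+1 of pText" still counts words from the start of the text on each pass, which may well dominate on a very large corpus.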

Thanks for that one: I've just made a fool of myself using a listField of 
the verb forms, and the "thing" is glacially slow.

As soon as the stack has run its course I will implement your suggestion.

Richmond.

>
> -- Peter
>
> Peter M. Brigham
> pmbrig at gmail.com
> http://home.comcast.net/~pmbrig
>
>
>>> Not sure how this will scale with large texts:
>>>
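>>> function findWere pText
>>>     -- returns a comma-delim list of all the word offsets matching "were *ed"
>>>     put wordOffsets("were", pText, true) into offList
>>>     if offList = 0 then return empty -- no "were" in the text
>>>     repeat for each item w in offList
>>>        put word w+1 of pText into testWord
>>>        -- strip trailing punctuation so "returned," still matches
>>>        put replaceText(testWord, "[[:punct:]]+$", empty) into testWord
>>>        if testWord ends with "ed" then put w & comma after outList
>>>     end repeat
>>>     return item 1 to -1 of outList -- delete trailing comma
>>> end findWere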
>>>
>>> function wordOffsets str, pContainer, matchWhole
>>>     -- returns a comma-delimited list of all the wordOffsets of str in pContainer
>>>     -- if matchWhole = true then only whole words are located
>>>     --    else will find word matches everywhere str is part of a word in pContainer
>>>     --    note that in LC words will include adjacent punctuation,
>>>     --       so using matchWhole = true may exclude too many "words"
>>>     -- duplicates are stripped out
>>>     --    eg wordOffsets("co","the common coconut") = 2,3   not   2,3,3
>>>     -- note: to get the last wordOffset of a string in a container (often useful)
>>>     --    use "item -1 of wordOffsets(...)"
>>>     -- by Peter M. Brigham, pmbrig at gmail.com — freeware
>>>     -- requires offsets()
>>>       
>>>     if matchWhole = empty then put false into matchWhole
>>>     put offsets(str,pContainer) into offList
>>>     if offList = 0 then return 0
>>>     repeat for each item i in offList
>>>        put the number of words of (char 1 to i of pContainer) into wdNbr
>>>        if matchWhole then
>>>           if word wdNbr of pContainer <> str then next repeat
>>>        end if
>>>        put 1 into A[wdNbr]
>>>        -- using an array avoids duplicates
>>>     end repeat
>>>     put the keys of A into wordList
>>>     sort lines of wordList ascending numeric
>>>     replace cr with comma in wordList
>>>     return wordList
>>> end wordOffsets
>>>
>>> function offsets str, pContainer
>>>     -- returns a comma-delimited list of all the offsets of str in pContainer
>>>     -- returns 0 if not found
>>>     -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
>>>     --     ie, overlapping offsets are not counted
>>>     -- note: to get the last occurrence of a string in a container (often useful)
>>>     --     use "item -1 of offsets(...)"
>>>     -- by Peter M. Brigham, pmbrig at gmail.com — freeware
>>>      
>>>     if str is not in pContainer then return 0
>>>     put 0 into startPoint
>>>     repeat
>>>        put offset(str,pContainer,startPoint) into thisOffset
>>>        if thisOffset = 0 then exit repeat
>>>        add thisOffset to startPoint
>>>        put startPoint & comma after offsetList
>>>        add length(str)-1 to startPoint
>>>     end repeat
>>>     return item 1 to -1 of offsetList -- delete trailing comma
>>> end offsets
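On the scaling question: wordOffsets() (through "the number of words of char 1 to i") and findWere() (through "word w+1 of pText") both appear to count words from the start of the text for every hit, so the work likely grows with text length times the number of hits. A single pass with "repeat for each word" avoids re-counting. A minimal sketch, assuming the same return value as findWere (the word numbers of each matching "were"); findWereSinglePass and pForms are hypothetical names. One caveat that applies to the code above as well: LiveCode's word chunk treats a double-quoted phrase as a single word, which matters for dialogue in novels.

```livecode
-- Hypothetical sketch: walk the words once, remembering whether the
-- previous word was "were". pForms is an optional return-delimited list
-- of irregular verb forms such as "become".
function findWereSinglePass pText, pForms
   local tFormsA, tForm, tWord, tWordNum, tPrevWasWere, outList
   repeat for each line tForm in pForms
      if tForm is not empty then put true into tFormsA[tForm]
   end repeat
   put 0 into tWordNum
   put false into tPrevWasWere
   repeat for each word tWord in pText
      add 1 to tWordNum
      -- strip leading and trailing punctuation, e.g. "returned," -> "returned"
      put replaceText(tWord, "^[[:punct:]]+|[[:punct:]]+$", empty) into tWord
      if tPrevWasWere and (tWord ends with "ed" or tFormsA[tWord] is true) then
         -- record the word number of the preceding "were"
         put (tWordNum - 1) & comma after outList
      end if
      put (tWord is "were") into tPrevWasWere
   end repeat
   return item 1 to -1 of outList -- delete trailing comma
end findWereSinglePass
```

Because it never indexes back into pText, the cost should stay roughly proportional to the length of the corpus.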
>>>
>>> P.S. I love Jane Austen. One of my favorite books of all time is "Pride and Prejudice." It's so beautifully constructed.
>>
>> Glad to hear that another programmer doesn't spend all their time in front of a computer screen!
>
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode




