Searching for a word when it's more than one word

Richmond Mathewson richmondmathewson at gmail.com
Sat Sep 1 12:57:49 EDT 2018


That sounds remarkably like two women who are friends of my parents:

One is called "Gay" and the other one is called "Loveday". They were
friends at school 60 years ago, and when they were both widowed they
moved in together; although the son of one of them fell out with his
wife and now lives with them as well.

Assumptions are sometimes difficult to avoid.

Although my younger son did actually dislocate his knee jumping to 
conclusions . . .

This was mainly because he was trying to skip a difficult bit . . .

But I digress.

Richmond.

On 1/9/2018 6:39 pm, J. Landman Gay via use-livecode wrote:
> There is a town in Texas called West, made infamous a few years ago by 
> a giant explosion. I don't think you can make assumptions about names 
> of places.
>
> Mark's suggestion to check for words ending in "s" will fail on many 
> towns, though apostrophe-s may be safe.
> -- 
> Jacqueline Landman Gay | jacque at hyperactivesw.com
> HyperActive Software | http://www.hyperactivesw.com
> On September 1, 2018 5:49:30 AM Richmond Mathewson via use-livecode 
> <use-livecode at lists.runrev.com> wrote:
>
>> I can see that the "problem", which my stack does not address, is with 2
>> or 3 part place names:
>>
>> The Rochester/Chester problem is easily dealt with.
>>
>> While it should be relatively easy to have a subroutine to deal with
>> words such as "West" (after all, there are no places just called "West"),
>> places like a town my parents once lived in called "Haselbury Plucknett"
>> would cause problems.
>>
>> AND, places such as "Ruyton of the Eleven Towns"
>> (https://en.wikipedia.org/wiki/Ruyton-XI-Towns)
>> would really throw a spanner in the works.
>>
>> Come to think of things . . .
>>
>> Unless anyone's code can cope with "Ruyton of the Eleven Towns" it won't
>> stand up: we could even go further and call
>> this the "Ruyton of the Eleven Towns Test".
>>
>> More muffled background noises.
>>
>> Richmond.
>>
>> On 1/9/2018 1:29 pm, Mark Waddingham via use-livecode wrote:
>>> On 2018-09-01 12:05, Richmond Mathewson via use-livecode wrote:
>>>> Obviously, when considering names of places such as Colchester,
>>>> Rochester and Chester one has
>>>> to search for the longer names first and exclude them from later
>>>> searches.
>>>
>>> The 'substring' problem (i.e. Chester being 'in' Rochester) isn't
>>> relevant in the above algorithm because we are 'tokenising' input and
>>> phrases - essentially changing the alphabet.
>>>
>>> i.e. "Rochester Chester Colchester" is turned into ABC, and we match
>>> A, B or C as atomic units.
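A minimal sketch of the "changing the alphabet" idea quoted above, in Python rather than LiveCode purely for illustration. The helper name `encode` and the lower-casing step are my own assumptions, not anything from Mark's stack; the point is only that once each word is a single token, "Chester" can no longer match inside "Rochester".

```python
def encode(text, alphabet):
    """Turn each whitespace-separated word into an integer token.

    `alphabet` maps lower-cased words to ids and grows as new words appear.
    """
    return [alphabet.setdefault(w, len(alphabet)) for w in text.lower().split()]


alphabet = {}
tokens = encode("Rochester Chester Colchester", alphabet)

# Substring search finds "chester" inside all three names.
substring_hits = "Rochester Chester Colchester".lower().count("chester")

# Token search compares whole words only, so just one position matches.
token_hits = tokens.count(alphabet["chester"])

print(tokens)          # [0, 1, 2]
print(substring_hits)  # 3
print(token_hits)      # 1
```

This sketch splits on whitespace only, so punctuation stuck to a word would need stripping first.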
>>>
>>> I should perhaps point out that the 'processText' operation probably
>>> needs to be a little better in practice - to at least include a 'stop'
>>> token for punctuation. For example:
>>>
>>> "The man walked starting from East Hartford, West Hartford could be
>>> seen in the distance."
>>>
>>> In the case where 'Hartford West' and 'Hartford' are the 'known' towns
>>> (and not 'East Hartford') - the proposed tokenization would result in:
>>>
>>> The,man,walked,starting,from,East,Hartford,West,Hartford,could,be,seen,in,the,distance 
>>>
>>>
>>> Which means you'd get "Hartford West" and "Hartford" - when you should
>>> only get "Hartford" (assuming you care about the linguistic structure
>>> of the text, at least).
>>>
>>> Indeed, the above means that in preprocessing the text, you can
>>> vastly reduce the number of words to search - any sequences
>>> of words which aren't in any phrase (or important punctuation) can be
>>> replaced by "*" say. So the above would become:
>>>
>>> *,East,Hartford,*,West,Hartford,*
>>>
>>> The "*" tokens block matching multi-word phrases.
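The stop-token idea quoted above could be sketched roughly as follows, again in Python for illustration only. The function names `process_text` and `match_phrases`, the regular expression, and the longest-match-first scan are all assumptions of mine and not the original 'processText' handler. Every word that appears in no known phrase, and every punctuation mark, collapses into a single "*" that no phrase can span.

```python
import re


def process_text(text, phrases):
    """Tokenise text, collapsing irrelevant words and punctuation into '*'."""
    vocab = {w.lower() for p in phrases for w in p.split()}
    tokens = []
    # Words (with apostrophes/hyphens) or single punctuation characters.
    for tok in re.findall(r"[\w'-]+|[^\w\s]", text):
        if tok.lower() in vocab:
            tokens.append(tok.lower())
        elif not tokens or tokens[-1] != "*":
            tokens.append("*")  # one stop token per run of irrelevant items
    return tokens


def match_phrases(tokens, phrases):
    """Scan left to right, trying the longest known phrase at each position."""
    known = {tuple(p.lower().split()): p for p in phrases}
    longest = max((len(k) for k in known), default=0)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in known:
                found.append(known[key])
                i += n
                break
        else:
            i += 1  # no phrase starts here
    return found


towns = ["Hartford West", "Hartford"]
text = ("The man walked starting from East Hartford, West Hartford "
        "could be seen in the distance.")

# Without stop tokens the comma is lost and "Hartford West" matches wrongly.
naive = [w.lower() for w in re.findall(r"[\w'-]+", text)]
print(match_phrases(naive, towns))                      # ['Hartford West', 'Hartford']

# With stop tokens the comma becomes "*" and blocks the two-word match.
print(match_phrases(process_text(text, towns), towns))  # ['Hartford', 'Hartford']
```

Treating every punctuation mark as a stop is the bluntest possible choice; names containing punctuation would need a finer rule.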
>>>
>>> Warmest Regards,
>>>
>>> Mark.
>>
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your 
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode