Searching for a word when it's more than one word

David V Glasgow dvglasgow at gmail.com
Mon Sep 3 09:54:09 EDT 2018


My family was stranded for a while during a transfer at Frankfurt airport, while a computer system refused to accept that ‘Glasgow’ was not a destination (at least, in that instance).

Having said that, the same error is much more commonly made by taxi drivers, who can’t avoid showing great disappointment when I am just going to the local station.

Cheers,

David Glasgow

> On 1 Sep 2018, at 5:57 pm, Richmond Mathewson via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> That sounds remarkably like two women who are friends of my parents:
> 
> One is called "Gay" and the other one is called "Loveday". They were friends at school 60 years ago
> and when they were both widowed they moved in together; although the son of one of them fell out
> with his wife and now lives with them as well.
> 
> Assumptions are sometimes difficult to avoid.
> 
> Although my younger son did actually dislocate his knee jumping to conclusions . . .
> 
> This was mainly because he was trying to skip a difficult bit . . .
> 
> But I digress.
> 
> Richmond.
> 
> On 1/9/2018 6:39 pm, J. Landman Gay via use-livecode wrote:
>> There is a town in Texas called West, made infamous a few years ago by a giant explosion. I don't think you can make assumptions about names of places.
>> 
>> Mark's suggestion to check for words ending in "s" will fail on many towns, though apostrophe-s may be safe.
>> -- 
>> Jacqueline Landman Gay | jacque at hyperactivesw.com
>> HyperActive Software | http://www.hyperactivesw.com
>> On September 1, 2018 5:49:30 AM Richmond Mathewson via use-livecode <use-livecode at lists.runrev.com> wrote:
>> 
>>> I can see that the "problem", which my stack does not address, is with 2
>>> or 3 part place names:
>>> 
>>> The Rochester/Chester problem is easily dealt with.
>>> 
>>> While it should be relatively easy to have a subroutine to deal with
>>> words such as "West" (after all, there are no places just called "West"),
>>> places like a town my parents once lived in called "Haselbury Plucknett"
>>> would cause problems.
>>> 
>>> AND, places such as "Ruyton of the Eleven Towns"
>>> (https://en.wikipedia.org/wiki/Ruyton-XI-Towns)
>>> would really throw a spanner in the works.
>>> 
>>> Come to think of things . . .
>>> 
>>> Unless anyone's code can cope with "Ruyton of the Eleven Towns" it won't
>>> stand up: we could even go further and call
>>> this the "Ruyton of the Eleven Towns Test".
>>> 
>>> More muffled background noises.
>>> 
>>> Richmond.
>>> 
>>> On 1/9/2018 1:29 pm, Mark Waddingham via use-livecode wrote:
>>>> On 2018-09-01 12:05, Richmond Mathewson via use-livecode wrote:
>>>>> Obviously, when considering names of places such as Colchester,
>>>>> Rochester and Chester one has
>>>>> to search for the longer names first and exclude them from later
>>>>> searches.
>>>> 
>>>> The 'substring' problem (i.e. Chester being 'in' Rochester) isn't
>>>> relevant in the above algorithm because we are 'tokenising' input and
>>>> phrases - essentially changing the alphabet.
>>>> 
>>>> i.e. "Rochester Chester Colchester" is turned into ABC, and we match
>>>> A, B or C as atomic units.
>>>> 
>>>> I should perhaps point out that the 'processText' operation probably
>>>> needs to be a little better in practice - to at least include a 'stop'
>>>> token for punctuation. For example:
>>>> 
>>>> "The man walked starting from East Hartford, West Hartford could be
>>>> seen in the distance."
>>>> 
>>>> In the case where 'Hartford West' and 'Hartford' are the 'known' towns
>>>> (and not 'East Hartford') - the proposed tokenization would result in:
>>>> 
>>>> The,man,walked,starting,from,East,Hartford,West,Hartford,could,be,seen,in,the,distance 
>>>> 
>>>> Which means you'd get "Hartford West" and "Hartford" - when you should
>>>> only get "Hartford" (assuming you care about the linguistic structure
>>>> of the text, at least).
>>>> 
>>>> Indeed, the above actually means in preprocessing the text, you can
>>>> actually vastly reduce the number of words to search - any sequences
>>>> of words which aren't in any phrase (or important punctuation) can be
>>>> replaced by "*" say. So the above would become:
>>>> 
>>>> *,East,Hartford,*,West,Hartford,*
>>>> 
>>>> The "*" tokens block matching multi-word phrases.
>>>> 
>>>> Warmest Regards,
>>>> 
>>>> Mark.
>>> 
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
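
The tokenising idea Mark describes in the quoted message (match whole words as atomic units, treat punctuation as a stop, and collapse irrelevant words to "*") might be sketched roughly as follows. This is only an illustration in Python rather than LiveCode, and it is not the 'processText' handler from the thread: the names tokenize, compress, find_phrases, STOP and SKIP are all made up for this example, and the word pattern is a simplifying assumption that ignores hyphens and accented letters.

```python
import re

STOP = "|"  # hypothetical marker emitted for punctuation
SKIP = "*"  # marker for a run of tokens that appear in no known phrase


def tokenize(text):
    """Split text into lowercase word tokens, emitting STOP for punctuation."""
    tokens = []
    for word, punct in re.findall(r"([A-Za-z']+)|([.,;:!?])", text):
        tokens.append(word.lower() if word else STOP)
    return tokens


def compress(tokens, phrases):
    """Collapse every run of tokens outside the phrase vocabulary into one SKIP.

    STOP tokens are never in the vocabulary, so punctuation also becomes SKIP
    and therefore blocks a multi-word match from spanning it.
    """
    vocab = {w for p in phrases for w in p.lower().split()}
    out = []
    for t in tokens:
        if t in vocab:
            out.append(t)
        elif not out or out[-1] != SKIP:
            out.append(SKIP)
    return out


def find_phrases(text, phrases):
    """Return known phrases found in text, in order, trying longer phrases first."""
    tokens = compress(tokenize(text), phrases)
    keyed = sorted(
        ((tuple(p.lower().split()), p) for p in phrases),
        key=lambda kp: -len(kp[0]),
    )
    found = []
    i = 0
    while i < len(tokens):
        for key, name in keyed:
            if tuple(tokens[i:i + len(key)]) == key:
                found.append(name)
                i += len(key)
                break
        else:
            i += 1  # no phrase starts at this token
    return found


towns = ["Hartford West", "Hartford"]
sentence = ("The man walked starting from East Hartford, West Hartford "
            "could be seen in the distance.")
print(find_phrases(sentence, towns))
```

Because whole words are compared, "Chester" cannot match inside "Rochester", and because the comma turns into "*", "Hartford, West" is not mistaken for "Hartford West". A long name such as "Ruyton of the Eleven Towns" is handled the same way as any other phrase, since its words (including "of" and "the") are part of the vocabulary.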




