Searching for a word when it's more than one word

Richmond Mathewson richmondmathewson at gmail.com
Sat Sep 1 16:03:44 EDT 2018


East or West, home is a comfy LiveCode stack . . .

Well, here's my third version, which does better than the first two:

https://www.dropbox.com/s/r3yocmqzwhwu4ta/Text%20analyzer%20X.livecode.zip?dl=0

Richmond.

On 1/9/2018 6:39 pm, J. Landman Gay via use-livecode wrote:
> There is a town in Texas called West, made infamous a few years ago by 
> a giant explosion. I don't think you can make assumptions about names 
> of places.
>
> Mark's suggestion to check for words ending in "s" will fail on many 
> towns, though apostrophe-s may be safe.
> -- 
> Jacqueline Landman Gay | jacque at hyperactivesw.com
> HyperActive Software | http://www.hyperactivesw.com
> On September 1, 2018 5:49:30 AM Richmond Mathewson via use-livecode 
> <use-livecode at lists.runrev.com> wrote:
>
>> I can see that the "problem", which my stack does not address, is with 2
>> or 3 part place names:
>>
>> The Rochester/Chester problem is easily dealt with.
>>
>> While it should be relatively easy to have a subroutine to deal with
>> words such as "West" (after all, there are no places just called
>> "West"),
>> places like a town my parents once lived in called "Haselbury Plucknett"
>> would cause problems.
>>
>> AND, places such as "Ruyton of the Eleven Towns"
>> (https://en.wikipedia.org/wiki/Ruyton-XI-Towns)
>> would really throw a spanner in the works.
>>
>> Come to think of things . . .
>>
>> Unless anyone's code can cope with "Ruyton of the Eleven Towns" it won't
>> stand up: we could even go further and call
>> this the "Ruyton of the Eleven Towns Test".
>>
>> More muffled background noises.
>>
>> Richmond.
>>
>> On 1/9/2018 1:29 pm, Mark Waddingham via use-livecode wrote:
>>> On 2018-09-01 12:05, Richmond Mathewson via use-livecode wrote:
>>>> Obviously, when considering names of places such as Colchester,
>>>> Rochester and Chester one has
>>>> to search for the longer names first and exclude them from later
>>>> searches.
>>>
>>> The 'substring' problem (i.e. Chester being 'in' Rochester) isn't
>>> relevant in the above algorithm because we are 'tokenising' input and
>>> phrases - essentially changing the alphabet.
>>>
>>> i.e. "Rochester Chester Colchester" is turned into ABC, and we match
>>> A, B or C as atomic units.
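
The "atomic units" idea can be sketched roughly as follows. This is a Python illustration rather than LiveCode, and not necessarily the algorithm posted earlier in the thread: the name `find_places` and the longest-phrase-first rule are assumptions made for the example. Text and place names are both split into whole-word tokens, so "Chester" can only ever match the token "Chester", never the tail of "Rochester".

```python
def find_places(text, known_places):
    """Return the known place names found in text, matching whole words only.

    Hypothetical helper: the name and the longest-first rule are assumptions
    for illustration. Matching is case-sensitive and expects plain
    space-separated words (no punctuation handling here).
    """
    tokens = text.split()
    # Each known place becomes a tuple of word tokens, e.g. ("Ruyton", "of", ...).
    phrases = {tuple(place.split()) for place in known_places}
    longest = max((len(p) for p in phrases), default=0)

    found = []
    i = 0
    while i < len(tokens):
        # Try the longest possible phrase starting at token i first.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrases:
                found.append(" ".join(candidate))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts here; move on one token
    return found


print(find_places("Rochester Chester Colchester", ["Chester"]))
# ['Chester']
print(find_places("We visited Ruyton of the Eleven Towns today",
                  ["Ruyton", "Ruyton of the Eleven Towns"]))
# ['Ruyton of the Eleven Towns']
```

Because comparison happens between tuples of tokens, there is no need to search for the longer names first and then exclude them: a multi-word name is simply a longer tuple.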
>>>
>>> I should perhaps point out that the 'processText' operation probably
>>> needs to be a little better in practice - to at least include a 'stop'
>>> token for punctuation. For example:
>>>
>>> "The man walked starting from East Hartford, West Hartford could be
>>> seen in the distance."
>>>
>>> In the case where 'Hartford West' and 'Hartford' are the 'known' towns
>>> (and not 'East Hartford') - the proposed tokenization would result in:
>>>
>>> The,man,walked,starting,from,East,Hartford,West,Hartford,could,be,seen,in,the,distance 
>>>
>>>
>>> Which means you'd get "Hartford West" and "Hartford" - when you should
>>> only get "Hartford" (assuming you care about the linguistic structure
>>> of the text, at least).
>>>
>>> Indeed, the above means that in preprocessing the text, you can
>>> vastly reduce the number of words to search - any sequences
>>> of words which aren't in any phrase (or important punctuation) can be
>>> replaced by "*" say. So the above would become:
>>>
>>> *,East,Hartford,*,West,Hartford,*
>>>
>>> The "*" tokens block matching multi-word phrases.
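
A minimal sketch of that preprocessing step, again in Python rather than LiveCode. The names `preprocess` and `match_places` are made up for the example, and the regular expression used to split words from punctuation is an assumption. Any punctuation mark, and any run of words that appears in no known name, collapses into a single "*". Note that with only 'Hartford West' and 'Hartford' as known names, "East" also folds into the leading "*"; it would survive only if some other known name contained it.

```python
import re


def preprocess(text, known_places):
    """Tokenise text, keeping only words that occur in some known name.

    Punctuation and runs of irrelevant words collapse into one "*" token.
    Hypothetical helper; matching is case-sensitive.
    """
    vocabulary = {word for place in known_places for word in place.split()}
    tokens = []
    # \w+ picks out words; any other non-space character counts as punctuation.
    for piece in re.findall(r"\w+|[^\w\s]", text):
        if piece in vocabulary:
            tokens.append(piece)
        elif not tokens or tokens[-1] != "*":
            tokens.append("*")  # start a new blocking token
    return tokens


def match_places(tokens, known_places):
    """Match known names against tokens, longest name first (an assumption)."""
    phrases = {tuple(place.split()) for place in known_places}
    longest = max((len(p) for p in phrases), default=0)
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in phrases:
                found.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            i += 1
    return found


known = ["Hartford West", "Hartford"]
text = ("The man walked starting from East Hartford, West Hartford could be "
        "seen in the distance.")
tokens = preprocess(text, known)
print(tokens)
# ['*', 'Hartford', '*', 'West', 'Hartford', '*']
print(match_places(tokens, known))
# ['Hartford', 'Hartford']
```

The comma becomes a "*" sitting between "Hartford" and "West", so "Hartford West" is no longer a contiguous token sequence and only "Hartford" is reported.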
>>>
>>> Warmest Regards,
>>>
>>> Mark.
>>
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your 
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode



