Searching for a word when it's more than one word

Stephen MacLean smaclean at madmansoft.com
Sat Sep 1 09:20:59 EDT 2018


Wow, this is awesome, thank you all!!

Sorry, I’m on the road taking my daughter to college, but I would love to try some of this out.

One thing to keep in mind is that, as I’m checking for names against the town list, I may not know which town I’m actually looking for. Usually I do, but not always.

Therefore I’ve been counting how many times I’ve come across each name, and I do some calculations at the end to make a best guess.
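
A minimal sketch of that tally, assuming the names found so far are collected one per line (repeats included) in a variable. The handler name bestGuessTown and its parameter are hypothetical, and "most frequent name wins" only stands in for whatever calculation is actually done at the end:

```livecode
-- Hypothetical helper: return the name that occurs most often in pFoundNames.
-- pFoundNames is assumed to hold one found name per line, repeats included.
function bestGuessTown pFoundNames
   local tCounts, tBestName, tBestCount
   put 0 into tBestCount
   repeat for each line tName in pFoundNames
      -- Skip blank lines (e.g. a trailing return)
      if tName is empty then next repeat

      -- Count this occurrence; an empty array element is treated as 0
      add 1 to tCounts[tName]

      -- Remember the most frequent name seen so far
      if tCounts[tName] > tBestCount then
         put tCounts[tName] into tBestCount
         put tName into tBestName
      end if
   end repeat
   return tBestName
end bestGuessTown
```

If two names tie, this keeps whichever reached the top count first, so a real version would probably want a tie-break rule.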

Really appreciate the responses!!

Thank you,

Steve

> On Sep 1, 2018, at 7:53 AM, Richmond Mathewson via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> 
> 
>> On 1/9/2018 2:50 pm, Mark Waddingham via use-livecode wrote:
>>> On 2018-09-01 13:15, Richmond Mathewson via use-livecode wrote:
>>> I've already shovelled Ruyton of the Eleven Towns quite effectively:
>>> 
>>> https://www.dropbox.com/s/n7r7u0c2m9ny3eb/Text%20analyzer%20X.livecode.zip?dl=0 
>>> 
>>> No tokenising, in fact very basic stuff indeed.
>>> 
>>> Not wishing to bang on about over-complicating things . . . . .
>> 
>> There is actually a 'correct' more shovelistic approach (at least I *think* this is correct):
>> 
>> -- Ensure all punctuation is surrounded by space
>> repeat for each char tPuncChar in ",.';:()[]{}<>!@£$%^&*-_+=~`?/\|#€" & quote
>>  replace tPuncChar with space & tPuncChar & space in tText
>> end repeat
> 
> That's a "point" (pun intended), as I just fell foul of a full stop.
>> 
>> -- Ensure all whitespace is space
>> replace return with space in tText
>> replace tab with space in tText
>> 
>> -- Ensure there are never two spaces next to each other in tText
>> repeat while tText contains "  "
>>  replace "  " with " " in tText
>> end repeat
>> 
>> -- Ensure there is only ever one space between words in phrases
>> repeat while tPhrases contains "  "
>>  replace "  " with " " in tPhrases
>> end repeat
>> 
>> -- We can now use an itemDelimiter of space
>> set the itemDelimiter to space
>> 
>> -- Sort the phrases by descending word count.
>> sort lines of tPhrases descending numeric by the number of items in each
>> 
>> -- Now check for, and remove each phrase from the source text in turn
>> set the wholeMatches to true
>> repeat for each line tPhrase in tPhrases
>>  -- If the phrase is not present then skip to the next
>>  if itemOffset(tPhrase, tText) is 0 then
>>    next repeat
>>  end if
>> 
>>  -- Accumulate the phrase on the output list
>>  put tPhrase & return after tFoundPhrases
>> 
>>  -- Remove the phrase from the input text (we assume here that * does not appear in any phrase)
>>  replace tPhrase with "*" in tText
>> end repeat
>> 
>> Warmest Regards,
>> 
>> Mark.
>> 
>> P.S. The above will be reasonably quick for small sets of phrases / small source texts - but I think as the size of either increases it will get very slow, very quickly!
>> 
> 
> 
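
One possible caveat with the final removal step in Mark's script: as far as I know, wholeMatches only affects the offset functions, not the replace command, so replacing a short phrase could also clip it out of the middle of a longer word (for example a one-word name that is the start of another name). A rough sketch of one way around that, assuming the text has already been normalised to single spaces as in the script. The handler name removeWholePhrase and its parameters are hypothetical:

```livecode
-- Hypothetical helper: replace only whole-word occurrences of pPhrase with "*".
-- Assumes xText is already normalised to single spaces and "*" is not in any phrase.
command removeWholePhrase pPhrase, @xText
   if pPhrase is empty then exit removeWholePhrase

   -- Pad with spaces so occurrences at the very start or end are also bounded
   put space & xText & space into xText

   -- One pass can miss back-to-back occurrences that share a space,
   -- so keep going until no bounded occurrence is left
   repeat while xText contains (space & pPhrase & space)
      replace (space & pPhrase & space) with " * " in xText
   end repeat

   -- Strip the padding again
   put char 2 to -2 of xText into xText
end removeWholePhrase
```

In the loop over tPhrases, the replace line would then become: removeWholePhrase tPhrase, tText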