is among the words AND find words

Jim Hurley jhurley0305 at sbcglobal.net
Thu Dec 22 10:17:47 EST 2011


Very good, Hugh. This adds another level of versatility: first through the addition of a "rule", and then by returning a word number rather than just true or false.

You are right about what constitutes a word. That is what is nice about having a "rule".
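
For anyone who wants to experiment with rules outside LiveCode, here is a rough Python sketch of the same idea. It is a translation under assumptions, not LiveCode's own word chunking. The two punctuation sets are copied from Hugh's "strict" and default rules, and every character in the chosen set is turned into a space. The function (hypothetically named find_word) then returns the 1-based number of the first word that matches exactly, or 0.

```python
# Punctuation sets copied from Hugh's "strict" and default rules.
# (LiveCode's CR constant is a linefeed, hence "\n".)
STRICT = ",\"\t\n:;'."
LOOSE = STRICT + "^&*()_-+={}[]@~#<>/|\\!?"


def find_word(content: str, target: str, rule: str = "strict") -> int:
    """Return the 1-based number of the first word equal to target, or 0.

    Hypothetical helper: every character in the chosen punctuation set is
    treated as a space, so the rule decides where word boundaries fall.
    """
    punctuation = STRICT if rule == "strict" else LOOSE
    for ch in punctuation:
        content = content.replace(ch, " ")
    words = content.split()  # whitespace-delimited words
    return words.index(target) + 1 if target in words else 0
```

With the strict rule "half-baked" stays one word, so searching "Pay $10 for half-baked bread" for "half" returns 0; the looser rule splits on the hyphen and returns 4. Neither set contains "$", so "$10" is one word under both rules.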

BTW: the previous "token" method can return a count of matching words, instead of true or false, by replacing this code:

repeat for each item tWordNum in tNums
  put word tWordNum of tList into tTestWord
  if tWord is among the tokens of tTestWord then return true
end repeat
--If all the tests fail, then return false
return false

With the following:

put 0 into tCount
repeat for each item tWordNum in tNums
  put word tWordNum of tList into tTestWord
  if tWord is among the tokens of tTestWord then add 1 to tCount
end repeat
return tCount
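
The same counting idea, sketched in Python for comparison. LiveCode's tokens follow its own script-parsing rules; the regular expression below (runs of letters, digits and underscores, or single punctuation characters) is only an assumed approximation of them, and count_word is a hypothetical name.

```python
import re


def tokens(text: str) -> list[str]:
    # Rough stand-in for LiveCode's "tokens" (an assumption): runs of word
    # characters, or single non-space punctuation characters.
    return re.findall(r"\w+|[^\w\s]", text)


def count_word(target: str, text: str) -> int:
    """Count the space-delimited words in which target is a whole token."""
    # As in Jim's function, quotes and periods are dropped first.
    text = text.replace('"', "").replace(".", "")
    count = 0
    for word in text.split():
        if target in tokens(word):  # whole-token match, not a substring
            count += 1
    return count
```

With that approximation, "men" is counted twice in "Women and men, men only": "men," splits into the tokens "men" and ",", while "Women" never matches.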

Jim



> Aha. So we are now only testing for existence, and not for the word number?
> Since I've already written this, I'll post it anyway...
> 
> The problem (as always when this topic is raised) is the definition of a
> 'word', which is why the language also includes 'token', and with it the
> definition of 'punctuation'. I believe LC inherited its definition of a
> 'word' from HyperCard for compatibility.
> 
> Is $10 one word or two?
> Is my_Var one word or two?
> Is half-baked one word or two?
> Do the same rules apply across non-English languages? And Unicode languages?
> 
> Perhaps there is no single definition and some modifiers are needed such as
> 'strict'...
> 
> on mouseUp
>   put findWord(fld 1,"men","strict")
> end mouseUp
> 
> function findWord pContent,pStr,pRule
>   if pRule="strict" then
>     put COMMA&QUOTE&TAB&CR&":;'." into tPunctuation
>   else
>     --| Adjust according to requirements...
>     put COMMA&QUOTE&TAB&CR&":;'.^&*()_-+={}[]@~#<>/|\!?" into tPunctuation
>   end if
>   repeat for each char L in tPunctuation
>     replace L with SPACE in pContent
>   end repeat
>   if pStr is among the words of pContent then
>     --| wholeMatches stops wordOffset matching inside longer words
>     set the wholeMatches to true
>     return wordOffset(pStr,pContent)
>   else
>     return 0
>   end if
> end findWord
> 
> 
> Hugh Senior
> FLCo
> 
> 
> Jim Hurley wrote:
> 
> Strike most of my last message. It appears that most of the function can be
> replaced with an examination of the entire text (duh), as in:
> 
>     put tWord is among the tokens of tList into tTest
>     return tTest
> 
> This tests the whole text; it is not necessary to test each string
> containing the word individually.
> 
> But remove the quotes and periods first.
> 
> Jim
> 
> 
> 
> > Thanks to all for their help with this. I learned a new keyword in "token".
> >
> > So far the function below handles everything reasonable I have thrown at
> > it, including finding "time" in the less-than-reasonable text in field 1:
> >
> >   "Now is timely the timeless time.-for, all good."
> >
> > on mouseUp
> >   put field 1 into tText
> >   put theWordIsAmongTheWords("time", tText) into msg box --returns true
> > end mouseUp
> >
> > function theWordIsAmongTheWords tWord, tList
> >   --The quote and period are irrelevant to the test for the word, so delete them.
> >   replace quote with "" in tList
> >   replace "." with "" in tList
> >   put empty into tNums
> >
> >   --Collect all the strings that wordOffset would find.
> >   repeat
> >      put wordOffset(tWord,tList, last item of tNums) into tNum
> >      if tNum = 0 then exit repeat
> >      put the last item of tNums + tNum & comma after tNums
> >   end repeat
> >
> >   --Test each of these strings against the word being tested.
> >   --With the quotes and periods gone, the tokens of each string found work well.
> >   repeat for each item tWordNum in tNums
> >      put word tWordNum of tList into tTestWord
> >      if tWord is among the tokens of tTestWord then return true
> >   end repeat
> >
> >   --If all the tests fail, then return false
> >   return false
> > end theWordIsAmongTheWords
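
A Python sketch of the two-stage approach in the function above, for readers who want to see the logic outside LiveCode. Assumptions: the substring filter only stands in for wordOffset with wholeMatches left false, the tokens() regex only approximates LiveCode tokens, and all the names are hypothetical. Returning the matching word numbers keeps both the existence test and a word-number result available.

```python
import re


def tokens(text: str) -> list[str]:
    # Approximation of LiveCode's "tokens" (an assumption, not its exact rules).
    return re.findall(r"\w+|[^\w\s]", text)


def matching_word_numbers(target: str, text: str) -> list[int]:
    """1-based numbers of the words in which target is a whole token."""
    # Quotes and periods are irrelevant to the test, so delete them.
    text = text.replace('"', "").replace(".", "")
    words = text.split()
    # Stage 1: candidate words that merely contain target, in the spirit
    # of wordOffset with wholeMatches off.
    candidates = [n for n, w in enumerate(words, start=1) if target in w]
    # Stage 2: keep only the candidates where target is a whole token.
    return [n for n in candidates if target in tokens(words[n - 1])]


def the_word_is_among_the_words(target: str, text: str) -> bool:
    return bool(matching_word_numbers(target, text))
```

For the field 1 text above this gives [6] ("time-for," is word 6 once the periods are removed), so the existence test returns true, while "tim" is only ever part of a longer token and returns false.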



