is among the words AND find words
Jim Hurley
jhurley0305 at sbcglobal.net
Thu Dec 22 10:17:47 EST 2011
Very good, Hugh. This adds another level of versatility, first by adding a "rule" and then by returning a word count.
You are right about what constitutes a word. That is what is nice about having a "rule".
BTW: The previous "token" method can be changed into a count by replacing this code:
repeat for each item tWordNum in tNums
  put word tWordNum of tList into tTestWord
  if tWord is among the tokens of tTestWord then return true
end repeat
--If all the tests fail, then return false
return false
with the following:
put 0 into tCount
repeat for each item tWordNum in tNums
  put word tWordNum of tList into tTestWord
  if tWord is among the tokens of tTestWord then add 1 to tCount
end repeat
return tCount
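
Put together, the counting version might look like the sketch below. The handler name countTheWord is made up for illustration; the body simply reuses the steps of the earlier theWordIsAmongTheWords handler with the final loop swapped for the counting loop, so it rests on the same assumptions about how wordOffset and tokens behave.

```livecode
function countTheWord tWord, tList
  -- Hypothetical name; the body reuses the steps of theWordIsAmongTheWords.
  -- Quotes and periods are irrelevant to the test, so delete them.
  replace quote with "" in tList
  replace "." with "" in tList
  put empty into tNums

  -- Collect the numbers of all the words that wordOffset would find.
  repeat
    put wordOffset(tWord, tList, last item of tNums) into tNum
    if tNum = 0 then exit repeat
    put the last item of tNums + tNum & comma after tNums
  end repeat

  -- Count the candidate words in which tWord appears as a token of its own.
  put 0 into tCount
  repeat for each item tWordNum in tNums
    put word tWordNum of tList into tTestWord
    if tWord is among the tokens of tTestWord then add 1 to tCount
  end repeat
  return tCount
end countTheWord
```

It would be called the same way as before, for example: put countTheWord("time", field 1). Note that this counts the words that contain tWord as a token, so a single word in which the token appears twice would still count only once.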
Jim
> Aha. So we are now only testing 'exist', and not the word number? Since I've
> already written this, I'll post it anyway...
>
> The problem (as always when this topic is raised) is the definition of a
> 'word', as indicated by the additional inclusion of 'token' in the language,
> and thus the definition of 'punctuation'. I believe LC inherited the
> definition of a 'word' from HyperCard for compatibility.
>
> Is $10 one word or two?
> Is my_Var one word or two?
> Is half-baked one word or two?
> Do the same rules apply across non-English languages? And Unicode languages?
>
> Perhaps there is no single definition and some modifiers are needed such as
> 'strict'...
>
> on mouseUp
> put findWord(fld 1,"men","strict")
> end mouseUp
>
> function findWord pContent,pStr,pRule
> if pRule="strict" then
> put COMMA&QUOTE&TAB&CR&":;'." into tPunctuation
> else
> --| Adjust according to requirements...
> put COMMA&QUOTE&TAB&CR&":;'.^&*()_-+={}[]@~#<>/|\!?" into tPunctuation
> end if
> repeat for each char L in tPunctuation
> replace L with SPACE in pContent
> end repeat
> --| offset() matches substrings ("men" inside "women"), so use a whole-word wordOffset
> set the wholeMatches to true
> if pStr is among the words of pContent then
> return wordOffset(pStr,pContent)
> else return 0
> end findWord
>
>
> Hugh Senior
> FLCo
>
>
> Jim Hurley wrote:
>
> Strike most of my last message. It appears that most of the function can be
> replaced with an examination of the entire text (dah) as in:
>
> put tWord is among the tokens of tList into tTest
> return tTest
>
> This tests the whole text; it is not necessary to test each string
> containing the word individually.
>
> But remove the quotes and periods first.
>
> Jim
>
>
>
> > Thanks to all for their help with this. I learned a new key word in "token".
> >
> > So far the function below handles everything reasonable I have thrown at it, including finding "time" in the less than reasonable text in field 1:
> >
> > "Now is timely the timeless time.-for, all good."
> >
> > on mouseUp
> > put field 1 into tText
> > put theWordIsAmongTheWords("time", tText) into msg box --returns true
> > end mouseUp
> >
> > function theWordIsAmongTheWords tWord, tList
> > --The quote and period are irrelevant to the test for the word, so delete them.
> > replace quote with "" in tList
> > replace "." with "" in tList
> > put empty into tNums
> >
> > --Collect all the strings that wordOffset would find.
> > repeat
> > put wordOffset(tWord,tList, last item of tNums) into tNum
> > if tNum = 0 then exit repeat
> > put the last item of tNums + tNum & comma after tNums
> > end repeat
> >
> > --Test each of these strings against the word being tested.
> > --With the quotes and periods gone, the tokens of the string found work well.
> > repeat for each item tWordNum in tNums
> > put word tWordNum of tList into tTestWord
> > if tWord is among the tokens of tTestWord then return true
> > end repeat
> >
> > --If all the tests fail, then return false
> > return false
> > end theWordIsAmongTheWords