Searching for a word when it's more than one word
Quentin Long
cubist at aol.com
Sun Sep 2 05:09:16 EDT 2018
Have pondered the question, and come up with some code which may or may not solve the problem at hand, but which may at least prove helpful in looking for a real solution:
==========================
Assumption: You’ve got a text document (not HTML, not RTF, just plain TXT) which contains, among other things, however-many place names.
Assumption: You have a return-delimited list of place names (one name per line), each of which may or may not be a single word
Assumption: The text document is in the variable SourceDoc
Assumption: The list of place names is in the variable NamesList
Assumption: You want a document which contains a complete census of exactly which of the place-names in NamesList occur in SourceDoc
Assumption: For each place-name which does occur within SourceDoc, you want a list of the word-numbers at which each such occurrence begins
put empty into PlaceNamesCensus
repeat for each line DisName in NamesList
   put the number of words in DisName into DisNameWords
   put 0 into SearchOffset
   put empty into FoundLocs
   repeat
      -- offset() skips SearchOffset chars and returns a position relative to that point (0 = not found)
      put offset(DisName, SourceDoc, SearchOffset) into DisLoc
      if DisLoc = 0 then
         -- there is no (further) character string which matches the place name in question
         exit repeat
      else
         -- there is a character string which matches the place name in question
         -- is it the actual placename, and not finding "chester" in "colchester"?
         add DisLoc to SearchOffset -- SearchOffset is now the absolute position of this match
         put the number of words in (char 1 to SearchOffset of SourceDoc) into StartWord
         if DisName = (word StartWord to (StartWord + DisNameWords - 1) of SourceDoc) then
            -- it's a match, yay!
            put StartWord into item (1 + the number of items in FoundLocs) of FoundLocs
         end if
      end if
   end repeat
   if FoundLocs is empty then
      -- nope, DisName wasn't in SourceDoc
      put "[nil]" into DeseLocs
   else
      -- yay! DisName *was* in SourceDoc! at least once!
      put FoundLocs into DeseLocs
   end if
   put DisName & comma & DeseLocs into line (1 + the number of lines in PlaceNamesCensus) of PlaceNamesCensus
end repeat
==========================
Known issue: The above code does not pretend to locate possessive instances of place names (e.g., California's, the United Kingdom's, etc). The same goes for any place name with punctuation stuck to it, since the word-for-word comparison will fail on "Chester," or "Chester." Am thinking that pre-processing of SourceDoc will be helpful-to-necessary. This pre-processing may need to accommodate more issues than just possessives.
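A rough sketch of what the possessive part of that pre-processing might look like, untested and under assumptions: possessives are always written with 's, and the handler name CleanedForCensus is made up for illustration. replaceText() is the built-in regex replace; the second call assumes a Unicode-aware LiveCode (7 or later) so that the curly apostrophe is matched as a single character.

```livecode
-- Hypothetical helper: returns a copy of pText with possessive endings removed,
-- so that "California's" compares equal to "California".
-- No whitespace is added or removed, so word numbers stay the same.
function CleanedForCensus pText
   -- straight apostrophe first; \b requires the "s" to end the word
   put replaceText(pText, "'s\b", empty) into pText
   -- curly apostrophe (assumes Unicode-aware regex matching)
   put replaceText(pText, "’s\b", empty) into pText
   return pText
end function
```

Usage would be something like "put CleanedForCensus(SourceDoc) into SourceDoc" before running the census loop. Character positions shift, but the word-numbers reported in PlaceNamesCensus still refer to the same words as in the original document. Trailing commas, periods and the like would need their own handling.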
"Bewitched" + "Charlie's Angels" - Charlie = "At Arm's Length"
Read the webcomic at [ http://www.atarmslength.net ]!
If you like "At Arm's Length", support it at [ http://www.patreon.com/DarkwingDude ].