Searching for a word when it's more than one word

Quentin Long cubist at aol.com
Sun Sep 2 05:09:16 EDT 2018


Have pondered the question, and come up with some code which may or may not solve the problem at hand, but which may at least prove helpful in looking for a real solution:

==========================

Assumption: You’ve got a text document (not HTML, not RTF, just plain TXT) which contains, among other things, however-many place names.
Assumption: You have a return-delimited list of place names, each of which may or may not be a single word
Assumption: The text document is in the variable SourceDoc
Assumption: The list of place names is in the variable NamesList

Assumption: You want a document which contains a complete census of exactly which of the place-names in NamesList occur in SourceDoc
Assumption: For each place-name which does occur within SourceDoc, you want a list of which word-numbers each such occurrence begins at

put "" into PlaceNamesCensus
repeat for each line DisName in NamesList
  put the number of words in DisName into DisNameWords
  -- SearchOffset is how many chars of SourceDoc have already been searched
  put 0 into SearchOffset
  put "" into FoundLocs
  repeat
    put offset(DisName, SourceDoc, SearchOffset) into DisLoc
    if DisLoc = 0 then
      -- there is no (further) character string which matches the place name in question
      exit repeat
    else
      -- there is a character string which matches the place name in question
      -- offset() counts from the end of the skipped chars, so convert to an absolute char position
      add DisLoc to SearchOffset
      -- is it the actual placename, and not finding "chester" in "colchester"?
      put the number of words in (char 1 to SearchOffset of SourceDoc) into StartWord
      if DisName = (word StartWord to (StartWord + DisNameWords - 1) of SourceDoc) then
        -- it's a match, yay!
        put StartWord into item (1 + the number of items in FoundLocs) of FoundLocs
      end if
    end if
  end repeat
  if FoundLocs = "" then
    -- nope, DisName wasn't in SourceDoc
    put "[nil]" into DeseLocs
  else
    -- yay! DisName *was* in SourceDoc! at least once!
    put FoundLocs into DeseLocs
  end if
  put DisName & comma & DeseLocs into line (1 + the number of lines in PlaceNamesCensus) of PlaceNamesCensus
end repeat

==========================
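
To make the intended result concrete, here is a small made-up test case (the sentence and the three names are mine, purely for illustration) along with the census the code is meant to produce for it:

==========================

put "We drove from Chester to New York and then back to Chester" into SourceDoc
put "Chester" & return & "New York" & return & "Colchester" into NamesList

-- intended contents of PlaceNamesCensus after the loop has run:
--   Chester,4,12
--   New York,6
--   Colchester,[nil]

==========================

Each line is the place name, then a comma-separated list of the word-numbers at which it begins, or [nil] if it never occurs.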

Known issue: The above code does not pretend to locate possessive instances of place names (e.g., California's, the United Kingdom's). Am thinking that pre-processing of SourceDoc will be helpful-to-necessary. This pre-processing may need to accommodate more issues than just possessives; a name followed directly by a comma or period, for one, will fail the word-for-word comparison.
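
A rough sketch of what such pre-processing might look like, offered as a starting point and not a tested solution. The function name CleanForCensus is made up; replace and replaceText are the standard LiveCode command and function. It relies on the fact that trimming characters off the ends of words leaves the word-numbers unchanged, so the census still lines up with the original document:

==========================

function CleanForCensus pText
  -- treat curly apostrophes like straight ones, so one pattern covers both
  replace "’" with "'" in pText
  -- drop possessive endings: "California's" becomes "California"
  put replaceText(pText, "(?<=\w)'s\b", "") into pText
  -- drop punctuation stuck to the end of a word: "Chester," becomes "Chester"
  put replaceText(pText, "(?<=\w)[.,;:!?]+(?=\s|$)", "") into pText
  return pText
end CleanForCensus

==========================

You would run the search against the cleaned copy, e.g. put CleanForCensus(SourceDoc) into SourceDoc before the loop. Caveats: this does nothing for plural possessives (the Smiths'), it will also trim the final period from abbreviations such as U.S., and it cannot tell a possessive from a contraction (it's becomes it).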
 


