itemDelimiter

Jim MacConnell jmac at consensustech.com
Thu Feb 12 16:02:38 EST 2004


Tom,

--- Sure you have multiple answers by now, but I'm just now getting a minute.
--- Also no sleep last night, so this is long-winded. Will refrain in future,
--- but I've done this already, so here goes... BTW, untested.

While I hesitate to give scripting advice, as I am just relearning and prone
to really dumb mistakes, I'm working through a similar thing. The routine
(corrected) that Brian has is basically the way I do it. There are a couple
of things to consider when you are looking for repeating types of info, like
all the paragraphs <p> xxxx </p>.
 
= = = = = = =
The first is to carefully keep track of where you are looking by resetting
the startOffset after you have found an instance of what you are looking for
so you don't "find" the same one again and again...

-- Make sure we start at the beginning when looking for a particular tag
-- and that our end starts past our start
Put 0 into startOffset
Put 1 into endOffset

-- Substitute whatever tags you are looking for for the <title> and </title>
-- For example "<p>" and "</p>"
put "<title>" into startTag
put "</title>" into endTag

-- Add a loop to keep going...
-- Here (endOffset = 0) flags having found the last tag
Repeat while (endOffset is not 0)

    -- NOTE: Added startOffset into the offset(..) as the number of chars
    -- to skip. offset() then counts from the first char AFTER the skipped
    -- ones, so the skip has to be added back to get a real position.
    put offset(startTag, theHTML, startOffset) into foundAt
    if (foundAt > 0) then
       -- textStart = position of the first char after the start tag
       put startOffset + foundAt + length(startTag) into textStart
       put offset(endTag, theHTML, textStart - 1) into endOffset
       if (endOffset > 0) then
           put char textStart to (textStart + endOffset - 2) of theHTML into theText

           -- Here do what you want with theText since you
           -- are putting this in a loop. For example:
           put "*****" & return & theText & return after theAnswer

           -- Then reset the starting point based on how far you've gone
           -- so the next search starts just past this end tag
           put textStart + endOffset + length(endTag) - 2 into startOffset
       end if
    else
       -- No more start tags, so flag the loop to stop
       put 0 into endOffset
    end if

end repeat
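
A small aside on offset(), since the loop above depends on it: as far as I
know, when you pass a third parameter the function skips that many chars and
reports the match counted from the first char after the skip, not from the
start of the string. A minimal sketch, with a made-up sample string and
variable names (theSample, firstHit, relativeHit, absoluteHit) that are only
for illustration:

```livecode
-- theSample and the *Hit variables are illustrative names, not from the thread
put "<p>one</p><p>two</p>" into theSample

-- No skip: the position counts from the start of the string
put offset("<p>", theSample) into firstHit

-- Skip the 10 chars of "<p>one</p>", so the search starts at char 11
put offset("<p>", theSample, 10) into relativeHit

-- Add the skip back to turn the relative hit into a real position
put 10 + relativeHit into absoluteHit
```

If I have the semantics right, absoluteHit comes out as 11, the position of
the second "<p>", while relativeHit on its own would point at the wrong char.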


= = = = = = =
Another approach is to be less careful about the startOffset and just blow
away the text you've already been through... This means your "offset" lines
can be a little simpler, but you need a separate place to store the text.
Not sure if this is an advantage, but it makes looking at the HTML text
easier cuz there's less of it as you go.

-- Make sure we make a copy of our HTML text
Put theHTML into aSafeCopyoftheHTML

-- Substitute whatever tags you are looking for for the <title> and </title>
-- For example "<p>" and "</p>"
put "<title>" into startTag
put "</title>" into endTag

-- Make sure your endTag exists and set up loop
Put offset(endTag, theHTML) into endOffset

-- Add a loop to keep going... Using endOffset
-- as a flag having found the last tag
Repeat while (endOffset is not 0)
    put offset(startTag, theHTML) into startOffset
    if (startOffset is 0) then exit repeat

    -- Get rid of everything up to and including the start tag, so
    -- the text we want now begins at char 1
    delete char 1 to (startOffset + length(startTag) - 1) of theHTML

    put offset(endTag, theHTML) into endOffset
    if (endOffset > 0) then
        put char 1 to (endOffset - 1) of theHTML into theText

        -- Now clean up theHTML by getting rid of what you've used
        delete char 1 to (endOffset + length(endTag) - 1) of theHTML

        -- Here do what you want with theText since you
        -- are putting this in a loop. For example:
        put "*****" & return & theText & return after theAnswer
    end if

end repeat
= = = = =


Finally... Seems like this could/should be broken out into a separate function.
Usage: put grabText(theHTML, "<p>", "</p>", "All") into theParagraphs

Function grabText theHTML, startTag, endTag, oneOrAll

    Put 0 into startOffset
    Put 1 into endOffset
    Put 0 into numFound
    Put empty into theFoundText
    Repeat while (endOffset is not 0)
        -- offset() counts from the first char after the skipped ones
        put offset(startTag, theHTML, startOffset) into foundAt
        if (foundAt is 0) then exit repeat
        put startOffset + foundAt + length(startTag) into textStart
        put offset(endTag, theHTML, textStart - 1) into endOffset
        if (endOffset > 0) then
            put char textStart to (textStart + endOffset - 2) of theHTML into theText
            put textStart + endOffset + length(endTag) - 2 into startOffset

            -- Here is the stuff added for the return info
            -- in this case using items to store the results
            -- (text containing commas will spill into extra items)
            add 1 to numFound
            put theText into item numFound of theFoundText

            -- Stop after the first match unless "All" was asked for
            if (oneOrAll is not "All") then exit repeat
        end if
    end repeat
    return theFoundText
End grabText
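
Reading the results back out is then just a walk over the items. A rough
sketch, assuming grabText hands back a comma-delimited list as above and that
none of the grabbed text contains a comma (theSample, thisParagraph and
theOutput are names made up for the example):

```livecode
-- theSample, thisParagraph and theOutput are illustrative names
put "<p>first</p><p>second</p>" into theSample
put grabText(theSample, "<p>", "</p>", "All") into theParagraphs

-- Each item is one chunk of text found between the tags
repeat for each item thisParagraph in theParagraphs
    put thisParagraph & return after theOutput
end repeat
```

If the text can contain commas, one option is to set the itemDelimiter to
something else (tab, say) both where the list is built and where it is read,
or to store one result per line instead of per item.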


