Getting the URL out of <![CDATA[

Peter Haworth pete at lcsql.com
Sat Jan 17 17:35:13 EST 2015


I can't resist posting a regex solution.

Have to admit I'm not entirely clear about what the OP wants to achieve,
but I think it's to get hold of a random URL from an XML file, so here goes.
lcSQL_GetRegexMatches is a library function I use all the time to find all
the matches of a regex in a string.

function getRandomURL pxml

   constant kRegex="https://www\.domainB\.com/show\.php\?l=0&u=156&id=\d*"
--Need to escape the "?" and each "." so they match literally

   local tMatches

   put lcSQL_GetRegexMatches(pxml,kRegex) into tMatches
   if tMatches is not empty then
      get the random of the number of lines of tMatches
      return char (item 1 of line it of tMatches) to \
            (item 2 of line it of tMatches) of pxml
   else
      return empty
   end if

end getRandomURL

function lcSQL_GetRegexMatches ptext,pregex
   /*
   Finds all the matches of pregex in ptext and returns a list of the
   start,end char positions
   */

   local tOffset,tStart,tEnd,tList,tLength

   --Make sure the whole regex string is a capture group
   if not (pregex begins with "(" and pregex ends with ")") then
      put "(" before pregex
      put ")" after pregex
   end if

   --Make sure the regex is valid
   try
      get matchChunk(ptext,pregex,tStart,tEnd)
   catch e
      answer "Invalid regex expression:" && pregex
      return empty
   end try

   put 1 into tOffset
   put the length of ptext into tLength
   repeat
      if matchChunk(char tOffset to tLength of ptext,pregex,tStart,tEnd) then
         put tStart+tOffset-1, tEnd+tOffset-1 & return after tList
         add tEnd  to tOffset
      else
         exit repeat
      end if
   end repeat

   return tList

end lcSQL_GetRegexMatches
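
For anyone wanting to try it, a call from a button might look roughly like
this. The file name and location are made up for illustration, so point it
at wherever your XML actually lives.

```livecode
on mouseUp
   local tXML
   -- hypothetical path; replace with the location of your own XML file
   put URL ("file:" & specialFolderPath("documents") & "/feed.xml") into tXML
   -- shows one randomly chosen matching URL, or nothing if none were found
   answer getRandomURL(tXML)
end mouseUp
```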

Of course, if you knew in advance how many URLs were in the xml, you could
do the whole thing in one call to matchChunk with the appropriate number of
start/end variables.
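
As a rough sketch of that single-call idea, assuming the xml is known to
hold exactly two URLs (getBothURLs is a made-up name, and the pattern is
just the one from getRandomURL repeated as two capture groups):

```livecode
-- Sketch only: assumes pxml contains at least two matching URLs.
-- getBothURLs is a hypothetical name, not part of any library.
function getBothURLs pxml
   constant kURL = "https://www\.domainB\.com/show\.php\?l=0&u=156&id=\d*"
   local tRegex, tS1, tE1, tS2, tE2

   -- (?s) lets "." match line breaks; ".*?" skips lazily to the next URL
   put "(?s)(" & kURL & ").*?(" & kURL & ")" into tRegex

   -- one start/end variable pair per capture group
   if matchChunk(pxml, tRegex, tS1, tE1, tS2, tE2) then
      return (char tS1 to tE1 of pxml) & cr & (char tS2 to tE2 of pxml)
   else
      return empty
   end if
end getBothURLs
```

Every extra URL needs another capture group and another pair of variables,
which is why the looping version above is easier to live with when the
count isn't fixed.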

Which reminds me that it would be so nice to have the matchChunk variable be
a list as in the above, or a numerically-keyed array with each key
containing the comma-separated start and end positions, or just two
variables with the first one holding a comma-separated list of the start
positions and the second one holding a comma-separated list of the end
positions.  Any of those would cause backwards compatibility issues, so I
guess there'd have to be a new function, matchAllChunks maybe.
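
Purely as an illustration of the array flavour, a scripted stand-in might
look something like the following. matchAllChunks is not a built-in; this is
only a sketch of how such a function could behave, and it does no checking
for an invalid regex.

```livecode
-- Hypothetical sketch of the proposed matchAllChunks; not a built-in.
-- Returns an array keyed 1..n, each element holding "start,end".
function matchAllChunks ptext, pregex
   local tOffset, tStart, tEnd, tCount, tMatchesA

   put 1 into tOffset
   put 0 into tCount
   repeat while tOffset <= the length of ptext
      -- wrap pregex in parentheses so the whole match is capture group 1
      if not matchChunk(char tOffset to -1 of ptext, "(" & pregex & ")", tStart, tEnd) then exit repeat
      add 1 to tCount
      -- convert positions relative to the substring into positions in ptext
      put (tStart + tOffset - 1) & comma & (tEnd + tOffset - 1) into tMatchesA[tCount]
      -- step past the match; max() guards against an empty match stalling the loop
      put tOffset + max(tEnd, 1) into tOffset
   end repeat

   return tMatchesA
end matchAllChunks
```

With that, item 1 of tMatchesA[1] would be the start of the first match and
item 2 its end, and the result is empty when nothing matches.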

Or you could ask Thierry when his new regex library will be ready.


Pete
lcSQL Software <http://www.lcsql.com>
Home of lcStackBrowser <http://www.lcsql.com/lcstackbrowser.html> and
SQLiteAdmin <http://www.lcsql.com/sqliteadmin.html>

On Fri, Jan 16, 2015 at 6:28 PM, Kay C Lan <lan.kc.macmail at gmail.com> wrote:

> If the above format is EXACTLY as the data appears, i.e.
>
> <![CDATA[   --this is exact
> --random URL you want to capture
> ]]>  --this is exact
>
> then this is very easy to parse. If the lines above and below the random
> URL are not as you've stated then there are other ways to skin the cat.
>
> on mouseUp
>    put URL tPathToXMLFile into tData
>    put false into tParse
>    put empty into tOutput
>    repeat for each line tLine in tData
>       switch
>          case (tLine = "]]>")
>             put false into tParse
>             break
>          case (tLine = "<![CDATA[")
>             put true into tParse
>             break
>          case (tParse = false)
>             --don't need to do anything
>             break
>          case (tParse = true)
>             put tLine & cr after tOutput
>             break
>          default --this is here mainly for the development process
>             answer warning "A Case I haven't considered." titled "Switch Error"
>             breakpoint
>       end switch
>    end repeat
>    --strip the trailing cr
>    put word 1 to -1 of tOutput into tOutput
>    --put tOutput into wherever you like
> end mouseUp


