Getting the URL out of <![CDATA[
Peter Haworth
pete at lcsql.com
Sat Jan 17 17:35:13 EST 2015
I can't resist posting a regex solution.
Have to admit I'm not entirely clear about what the OP wants to achieve,
but I think it's to get hold of a random URL from an xml file so here goes.
lcSQL_GetRegexMatches is a library function I use all the time to find all
the matches of a regex in a string.
function getRandomURL pxml
   constant kRegex="https://www\.domainB\.com/show\.php\?l=0&u=156&id=\d*"
   --Need to escape the "?" and the "." characters
   local tMatches
   put lcSQL_GetRegexMatches(pxml,kRegex) into tMatches
   if tMatches is not empty then
      --Each line of tMatches is "start,end"; pick one line at random
      get the random of the number of lines of tMatches
      return char (item 1 of line it of tMatches) to (item 2 of line it of tMatches) of pxml
   else
      return empty
   end if
end getRandomURL
function lcSQL_GetRegexMatches ptext,pregex
   /*
   Finds all the matches of pregex in ptext and returns a list of the
   start,end char positions, one match per line
   */
   local tOffset,tStart,tEnd,tList,tLength
   --Make sure the whole regex string is a capture group
   if not (pregex begins with "(" and pregex ends with ")") then
      put "(" before pregex
      put ")" after pregex
   end if
   --Make sure the regex is valid
   try
      get matchChunk(ptext,pregex,tStart,tEnd)
   catch e
      answer "Invalid regex expression:" && pregex
      return empty
   end try
   put 1 into tOffset
   put the length of ptext into tLength
   repeat
      --Search only the text after the previous match
      if matchChunk(char tOffset to tLength of ptext,pregex,tStart,tEnd) then
         --tStart and tEnd are relative to the searched chunk, so shift them
         put tStart+tOffset-1, tEnd+tOffset-1 & return after tList
         add tEnd to tOffset
      else
         exit repeat
      end if
   end repeat
   return tList
end lcSQL_GetRegexMatches
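For what it's worth, calling it from a button might look something like
this. The file path is only a placeholder, not something from the OP's
setup, and both functions above are assumed to be in the message path.

```livecode
on mouseUp
   local tXML, tURL
   -- Assumption: the xml lives in a file; this path is only a placeholder
   put URL "file:/path/to/feed.xml" into tXML
   put getRandomURL(tXML) into tURL
   if tURL is empty then
      answer "No matching URL found."
   else
      answer tURL
   end if
end mouseUp
```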
Of course, if you knew in advance how many URLs were in the xml, you could
do the whole thing in one call to matchChunk with the appropriate number of
start/end variables.
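A rough sketch of that single-call approach, assuming the xml holds exactly
two URLs and that the regex engine honours the inline (?s) flag (LiveCode's
regex functions are PCRE-based as far as I know, so it should). getTwoURLs
is a made-up name for illustration.

```livecode
function getTwoURLs pxml
   -- Same URL pattern as before, with "?" and "." escaped
   constant kURL = "https://www\.domainB\.com/show\.php\?l=0&u=156&id=\d*"
   local tS1, tE1, tS2, tE2, tRegex
   -- (?s) lets "." match line breaks; ".*?" skips the text between the URLs
   put "(?s)(" & kURL & ").*?(" & kURL & ")" into tRegex
   -- One start/end variable pair per capture group
   if matchChunk(pxml, tRegex, tS1, tE1, tS2, tE2) then
      return (char tS1 to tE1 of pxml) & return & (char tS2 to tE2 of pxml)
   else
      return empty
   end if
end getTwoURLs
```

Each extra URL needs another capture group and another pair of variables,
which is why this only works when the count is known up front.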
Which reminds me that it would be so nice to have the matchChunk variable be
a list as in the above, or a numerically-keyed array with each key
containing the comma-separated start and end positions, or just two
variables with the first one holding a comma-separated list of the start
positions and the second one holding a comma-separated list of the end
positions. That would all cause backwards compatibility issues so I guess
there'd have to be a new function, matchAllChunks maybe.
Or you could ask Thierry when his new regex library will be ready.
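In the meantime, a script-level stand-in for the numerically-keyed array
flavour could be a thin wrapper around lcSQL_GetRegexMatches. This is only a
sketch: matchAllChunks is the hypothetical name from above, not an existing
LiveCode function.

```livecode
function matchAllChunks ptext, pregex
   -- Hypothetical helper: returns an array keyed 1..n,
   -- each element holding "start,end" for one match
   local tList, tLine, tCount, tArray
   put lcSQL_GetRegexMatches(ptext, pregex) into tList
   put 0 into tCount
   repeat for each line tLine in tList
      add 1 to tCount
      put tLine into tArray[tCount]
   end repeat
   -- tArray stays empty when there were no matches
   return tArray
end matchAllChunks
```

The caller can then loop with "repeat with i = 1 to the number of elements
of tArray" and pull item 1 and item 2 of each element.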
Pete
lcSQL Software <http://www.lcsql.com>
Home of lcStackBrowser <http://www.lcsql.com/lcstackbrowser.html> and
SQLiteAdmin <http://www.lcsql.com/sqliteadmin.html>
On Fri, Jan 16, 2015 at 6:28 PM, Kay C Lan <lan.kc.macmail at gmail.com> wrote:
> If the above format is EXACTLY as the data appears, i.e.
>
> <![CDATA[ --this is exact
> --random URL you want to capture
> ]]> --this is exact
>
> then this is very easy to parse. If the line above and below the random
> URL is not as you've stated then there are other ways to skin the cat.
>
> on mouseUp
>    put URL tPathToXMLFile into tData
>    put false into tParse
>    put empty into tOutput
>    repeat for each line tLine in tData
>       switch
>          case (tLine = "]]>")
>             put false into tParse
>             break
>          case (tLine = "<![CDATA[")
>             put true into tParse
>             break
>          case (tParse = false)
>             --don't need to do anything
>             break
>          case (tParse = true)
>             put tLine & cr after tOutput
>             break
>          default --this is here mainly for the development process
>             answer warning "A Case I haven't considered." titled "Switch Error"
>             breakpoint
>       end switch
>    end repeat
>    --strip the trailing cr
>    put word 1 to -1 of tOutput into tOutput
>    --put tOutput into where ever you like
> end mouseUp