why XHTML cannot be parsed with RevXML ?

paolo mazza paolo.mazza at neol.it
Thu Mar 13 08:01:26 EDT 2008

Dear Revs,

As far as I know, XHTML cannot be parsed with the XML tools in Rev.

Why? Don't you think it would be great to manage XHTML code with the
RevXML tools?

I found the following messages on the list, dated 23-12-2006.
Has anything changed since then?

Paolo Mazza


Unfortunately my experience with the different protocols is very limited.
The world wide conferences try to accommodate all the interested parties
when they publish their standards, but I don't study the rationale they

I know part of the rationale is driven by the big companies.  Someone like
Ken Ray can give a good answer.  There are so many flavors of text markup
languages (TML) that have been promulgated for different purposes, I am not
sure there can ever be a standard way of parsing them.

I think that in the beginning a markup language was only for the display of
elements in a 'browser', not an organized data system.  One key part of a
browser program is that if it does not know what to do with a tag, it
silently ignores it rather than producing an error message.  In other
words, errors do not break the page; they result in something displayed
poorly or not at all.

Hopefully someone with real knowledge in this area will chime in.

Jim Ault
Las Vegas

On 12/23/06 8:48 AM, "David Bovill" <david at ...> wrote:

> Jim - I thought that was the whole point of xHTML?
> That is that xHTML is HTML that works with XML parsers - that is why you
> view xHTML outlines in tools such as GoLive. I assumed htmltext from its
> look was xHTML compliant - and so always assumed that it would be
> straightforward to parse with the XML tools....
>  The question is where the logic breaks - is it that xHTML cannot be parsed
> with the XML tools in Rev - or is it that for some crazy reason htmltext is
> not XHTML compliant (ie a subset of xHTML) and therefore valid XML. If the
> latter, which I suspect - what would I need to do to htmltext to make it
> valid XML?
> On 23/12/06, Jim Ault <JimAultWins at ...> wrote:
>> HTML text is a system of tags that signal what item is <start> </end>
>> whereas XML is much more of an 'outliner' with inheritance defining
>> children
>> and nodes.  They both have the <> </> look, but HTML is not regimented the
>> same way except for Tables, Frames, and a few other constructs.
>> Now if you add in javascript and css, HTML is even less like XML, so the
>> parent.child relationship is even more remote.
>> It is hard to imagine a single parser that would work for both.  Perhaps
>> in
>> special cases that you generate to stay within rules.
>> Jim Ault
>> Las Vegas
>> On 12/22/06 10:17 PM, "David Bovill" <david at ...> wrote:
>>> I am using the script to parse the htmltext of Rev's text fields - so it is a
>>> nice fixed target. Here is the script I have at the moment, modified slightly
>>> from your suggestions to work with anchors:
>>> function html_ExtractAnchors someHtml
>>>     put someHtml into htmlPage
>>>     replace CR with empty in htmlPage --text is now one line
>>>     replace "name=" with "name=" & CR in htmlPage
>>>     replace "</a" with "</a" & CR in htmlPage
>>>     -- filter htmlPage with "*http://*"
>>>     -- set the itemdel to ">"
>>>     filter htmlPage with (quote & "*</a")
>>>     set the itemdel to quote
>>>     put empty into newLinkList
>>>     repeat for each line LNN in htmlPage
>>>         put item 2 of LNN & cr after newLinkList
>>>         -- put item 1 of LNN & cr after newLinkList
>>>     end repeat
>>>     delete last char of newLinkList
>>>     return newLinkList
>>> end html_ExtractAnchors
>>> NB - has anyone managed to use the XML libraries on htmltext? This is the sort
>>> of thing I mean - which fails with html entities:
>>> function html_AttributeValues someHtml, attributeName, childName, depth
>>>     -- does not work with htmlEntities!
>>>     put revCreateXMLTree(someHtml, true, true, false) into treeID
>>>     if char 1 to 6 of treeID is "xmlerr" then
>>>         put someHtml
>>>         opn_Notify treeID, true
>>>         exit to top
>>>     end if
>>>     if depth is empty then put -1 into depth
>>>     put revXMLRootNode(treeID) into startNode
>>>     put revXMLAttributeValues(treeID, startNode, childName, attributeName, CR, depth) into attributeValues
>>>     revDeleteXMLTree treeID
>>>     return word 1 to -1 of attributeValues
>>> end html_AttributeValues
>>> Would be nice...
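
On the question of what htmltext would need before the XML tools accept it, here is a rough sketch rather than a tested fix. It assumes the two obstacles are the ones raised in the thread: named HTML entities, which XML does not define, and the lack of a single root element. The handler name html_ToXmlText, the short entity list, and the "root" wrapper are all made up for illustration, and any tag that htmltext leaves unclosed would still be rejected.

```livecode
-- Sketch only: html_ToXmlText and its entity list are hypothetical, not part of Rev.
function html_ToXmlText someHtml
    -- XML predefines only &amp; &lt; &gt; &quot; and &apos; so other named
    -- entities are rewritten as numeric character references.
    -- Assumption: extend this list to cover whatever entities your text contains.
    replace "&nbsp;" with "&#160;" in someHtml
    replace "&copy;" with "&#169;" in someHtml
    replace "&eacute;" with "&#233;" in someHtml
    -- XML requires exactly one root element; "root" is an arbitrary name.
    return "<root>" & someHtml & "</root>"
end html_ToXmlText

-- Usage, assuming a field named "Notes":
-- put revCreateXMLTree(html_ToXmlText(the htmlText of field "Notes"), true, true, false) into treeID
```

If the returned treeID still begins with "xmlerr", the error text should point at whichever construct in the htmltext is not well-formed.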
