Why can't XHTML be parsed with RevXML?

David Bovill david at openpartnership.net
Fri Mar 14 09:54:05 EDT 2008


Paolo - from memory, the issue is not that you cannot parse valid XHTML with
Rev's XML externals, but rather that Rev's native htmltext is not valid XML and
therefore cannot be parsed using Rev's XML externals! Strange but true.
As far as I recall, the problem is caused by HTML entities in htmltext that
are not valid in XML and so break the XML parsing. You can get around this by
replacing the htmltext entities with XML-compliant ones.
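
Something along these lines might do it. This is an untested sketch, assuming
the culprits are named entities such as &nbsp; that XML does not define
(&amp;, &lt;, &gt; and &quot; are predefined in XML and can stay as they are).
The handler name html_MakeXmlSafe, the short entity table and the <root>
wrapper are my own assumptions for illustration, not anything built into Rev:

```livecode
function html_MakeXmlSafe someHtml
    -- Hypothetical helper, not part of Rev.
    -- Assumption: only these named entities occur; extend the table as needed.
    -- Each line is: entity name, Unicode code point.
    put "nbsp,160" & cr & "copy,169" & cr & "eacute,233" into entityTable
    repeat for each line entityLine in entityTable
        put "&" & item 1 of entityLine & ";" into htmlEntity
        put "&#" & item 2 of entityLine & ";" into xmlEntity
        replace htmlEntity with xmlEntity in someHtml
    end repeat
    -- Assumption: htmltext has no single enclosing element, so add one,
    -- since XML requires exactly one root.
    return "<root>" & someHtml & "</root>"
end html_MakeXmlSafe
```

You would then pass the result of, say, html_MakeXmlSafe(the htmltext of
field "Notes") to revCreateXMLTree instead of the raw htmltext (the field
name is made up).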

You should not have any problems with valid XHTML though.

On 13/03/2008, paolo mazza <paolo.mazza at neol.it> wrote:
>
>
> Dear Revs,
>
> as far as I know XHTML cannot be parsed with the XML tools in Rev.
>
> Why? Don't you think it would be great to manage XHTML code with the
> RevXML tools?
>
> I found the following messages on the list, dated 23-12-2006.
> Has anything changed since then?
>
> Paolo Mazza
>
> MESSAGES:
>
> Unfortunately my experience with the different protocols is very limited.
> The world wide conferences try to accommodate all the interested parties
> when they publish their standards, but I don't study the rationale they
> use.
>
> I know part of the rationale is driven by the big companies.  Someone like
> Ken Ray can give a good answer.  There are so many flavors of text markup
> languages (TML) that have been promulgated for different purposes, I am not
> sure there can ever be a standard way of parsing them.
>
> I think that in the beginning a markup language was only for the display of
> elements in a 'browser', not an organized data system.  One key part of a
> browser program is that if it does not know what to do with a tag, it
> silently ignores it rather than producing an error message.  In other words,
> errors do not break the page, they result in something displayed poorly or
> not at all.
>
> Hopefully someone with real knowledge in this area will chime in.
>
> Jim Ault
> Las Vegas
>
> On 12/23/06 8:48 AM, "David Bovill" <david at ...> wrote:
>
> > Jim - I thought that was the whole point of XHTML?
> >
> > That is, XHTML is HTML that works with XML parsers - that is why you can
> > view XHTML outlines in tools such as GoLive. I assumed from its look that
> > htmltext was XHTML compliant - and so always assumed that it would be
> > straightforward to parse with the XML tools....
> >
> > The question is where the logic breaks - is it that XHTML cannot be parsed
> > with the XML tools in Rev - or is it that for some crazy reason htmltext is
> > not XHTML compliant (i.e. a subset of XHTML) and therefore not valid XML? If
> > the latter, which I suspect, what would I need to do to htmltext to make it
> > valid XML?
> >
> > On 23/12/06, Jim Ault <JimAultWins at ...> wrote:
> >>
> >> HTML text is a system of tags that signal what item is <start> </end>
> >> whereas XML is much more of an 'outliner' with inheritance defining
> >> children and nodes.  They both have the <> </> look, but HTML is not
> >> regimented the same way except for Tables, Frames, and a few other
> >> constructs.
> >>
> >> Now if you add in javascript and css, HTML is even less like XML, so the
> >> parent.child relationship is even more remote.
> >>
> >> It is hard to imagine a single parser that would work for both.  Perhaps
> >> in special cases that you generate to stay within rules.
> >>
> >> Jim Ault
> >> Las Vegas
> >>
> >>
> >> On 12/22/06 10:17 PM, "David Bovill" <david at ...> wrote:
> >>
> >>> I am using the script to parse the htmltext of Rev's text fields - so it
> >>> is a nice fixed target. Here is the script I have at the moment, modified
> >>> slightly from your suggestions to work with anchors:
> >>>
> >>> function html_ExtractAnchors someHtml
> >>>     put someHtml into htmlPage
> >>>     replace CR with empty in htmlPage --text is now one line
> >>>     replace "name=" with "name=" & CR in htmlPage
> >>>     replace "</a" with "</a" & CR in htmlPage
> >>>
> >>>     -- filter htmlPage with "*http://*"
> >>>     -- set the itemdel to ">"
> >>>     filter htmlPage with (quote & "*</a")
> >>>     set the itemdel to quote
> >>>
> >>>     put empty into newLinkList
> >>>     repeat for each line LNN in htmlPage
> >>>         put item 2 of LNN & cr after newLinkList
> >>>         -- put item 1 of LNN & cr after newLinkList
> >>>     end repeat
> >>>     delete last char of newLinkList
> >>>     return newLinkList
> >>> end html_ExtractAnchors
> >>>
> >>> NB - has anyone managed to use the XML libraries on htmltext? This is the
> >>> sort of thing I mean - which fails with html entities:
> >>>
> >>> function html_AttributeValues someHtml, attributeName, childName, depth
> >>>     -- does not work with htmlEntities!
> >>>
> >>>     put revCreateXMLTree(someHtml, true, true, false) into treeID
> >>>     if char 1 to 6 of treeID is "xmlerr" then
> >>>         put someHtml
> >>>         opn_Notify treeID, true
> >>>         exit to top
> >>>     end if
> >>>
> >>>     if depth is empty then put -1 into depth
> >>>     put revXMLRootNode(treeID) into startNode
> >>>     put revXMLAttributeValues(treeID, startNode, childName, \
> >>>         attributeName, CR, depth) into attributeValues
> >>>     revDeleteXMLTree treeID
> >>>     return word 1 to -1 of attributeValues
> >>> end html_AttributeValues
> >>>
> >>> Would be nice...