Tree Arrays: putting XML in nested arrays

David Bovill david at architex.tv
Wed Dec 31 08:02:43 EST 2008


*Aim*
I am trying to define a generic data structure for storing tree structures
such as XML documents in the new nested arrays. I'd like to use this
structure to store XML documents, and to store the tree data structures I
use for tree widgets. I'd like it to be simpler to use, understand and
debug than the XML external.

NB - I remember a reference to some scripts that took XML and created the
new nested arrays. Does anyone remember where they are? I couldn't find them.

This post is a work in progress - brain dump. I hope it's useful. It helps
me structure my coding to write it up a bit first, and hopefully helps
others with similar issues.

*Key Words*
XML, tree widgets, nested data structures, design patterns, nested arrays,
regular expressions

*The Problem*
I'm working on a script to change one XML document into another - it is the
sort of thing XSLT <http://www.w3.org/TR/xslt> is used for.

So what would be a good way (design pattern) to convert one XML tree into
another in Revolution? Ideally this structure would work for XML - but also
more general tree structures where the node names could be complete lines of
text (and not simply single word xml node names), such as you might have in
an indented field.

There are all sorts of uses for this - let's take a recent example I've had
to do - translating rev htmltext to xHtml basic
<http://www.w3.org/TR/xhtml-basic/>,
which essentially involves taking elements like "

As an example of the sort of hacks I'd like to avoid - here is a function
that I came up with for the xHtml basic
<http://www.w3.org/TR/xhtml-basic/> use case:

function html_RevToBoldSpan someHtml
    -- (?miU): multi-line, case insensitive, non-greedy
    put "(?miU)(<b>).*(</b>)" into someReg
    -- put "(?mi)(<b>)[^\<]*(</b>)" into someReg
    repeat
        -- the four variables receive the start and end offsets of the two tags
        if matchchunk(someHtml, someReg, oTagStart, oTagEnd, cTagStart, cTagEnd) is true then
            -- replace the closing tag first so the opening tag offsets stay valid
            put "</span>" into char cTagStart to cTagEnd of someHtml
            put "<span style='font-weight:bold'>" into char oTagStart to oTagEnd of someHtml
        else
            return someHtml
        end if
    end repeat
end html_RevToBoldSpan

Tip - for those of you that have delved into regular expressions - this
script illustrates a new trick I've found: the use of "U" to force
non-greedy matching. (?miU) at the beginning of a regexp makes the match
multi-line (m), case insensitive (i) and non-greedy (U).
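To see what the "U" flag changes, here is a minimal sketch that runs the same pattern with and without it. The function name firstBoldRun and its beGreedy parameter are made up for this illustration.

```livecode
-- Return the text captured between the first <b> and a following </b>.
-- beGreedy is a hypothetical switch: true drops the U flag from the pattern.
function firstBoldRun someHtml, beGreedy
    if beGreedy is true then
        put "(?mi)<b>(.*)</b>" into someReg
    else
        put "(?miU)<b>(.*)</b>" into someReg
    end if
    -- matchText puts the first parenthesised group into boldText
    if matchText(someHtml, someReg, boldText) then
        return boldText
    end if
    return empty
end firstBoldRun
```

With the input "<b>one</b> and <b>two</b>", the non-greedy call firstBoldRun(someHtml, false) returns "one", while the greedy call firstBoldRun(someHtml, true) runs on to the last closing tag and returns "one</b> and <b>two".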

The problem with scripts like this is that they cannot deal with truly
nested formatting tags - for that you need to walk the tag tree, which is
what I'd like to do next.

Here were my initial thoughts on a simple start based on renaming XML nodes:

   1. Create an XML Tree for the original XML
   2. Create a new one for the transformed XML
   3. Write a recursive function to walk the tree - starting at the root
   node, getting its children and recursing
   4. Have the recursive function make a call to a translate function which
   uses an array to store the new tag names as the contents of nested keys -
   this could be an array or use the xml treeID
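A rough sketch of steps 1, 3 and 4 using the XML external might look like the following. It only lists the renames rather than building the second tree (step 2). The handler name xml_ListRenames and the tagMap array are assumptions made up for the example, and it assumes the revXML library is loaded.

```livecode
-- Walk the XML tree depth-first from nodePath and return one line per node,
-- in the form "path => new tag name".
-- tagMap is a hypothetical array: tagMap["b"] holds the replacement for <b>.
function xml_ListRenames treeID, nodePath, tagMap
    -- the node name is the last path segment, minus any "[n]" index
    set the itemDelimiter to "/"
    put the last item of nodePath into nodeName
    put offset("[", nodeName) into bracketPos
    if bracketPos > 0 then delete char bracketPos to -1 of nodeName

    -- look up the new name, keeping the old one if there is no mapping
    put tagMap[nodeName] into newName
    if newName is empty then put nodeName into newName
    put nodePath && "=>" && newName & return into renameList

    -- recurse into each child in document order
    put revXMLFirstChild(treeID, nodePath) into childPath
    repeat until childPath is empty or childPath begins with "xmlerr"
        put xml_ListRenames(treeID, childPath, tagMap) after renameList
        put revXMLNextSibling(treeID, childPath) into childPath
    end repeat
    return renameList
end xml_ListRenames
```

It could be called along these lines:

```livecode
put revCreateXMLTree("<p><b>bold</b> and <i>italic</i></p>", false, true, false) into treeID
put "strong" into tagMap["b"]
put xml_ListRenames(treeID, revXMLRootNode(treeID), tagMap)
revDeleteXMLTree treeID
```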

Somehow I need to include the general ability to use node attributes to
determine the new node name - so that:

    <span style='font-weight:bold'>  => <b>
    <span style='color:#FF0000'>Red</span> => <font color="#FF0000">

I think what I'd really like is to do all this with arrays rather than XML
treeIDs - that is:

   1. Create an XML Tree for the original XML and convert it to an array
   2. Write a recursive function to walk the array - starting at the root
   node, getting its children and recursing
   3. Have the recursive function make a call to a translate function which
   uses an array to store the new tag names as the contents of nested keys -
   building a new transformed array as it goes
   4. Create a new XML document from the transformed Array
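Steps 2 and 3 could be sketched like this, assuming the simplest possible layout: each key is a node name and each value is either a nested array of children or the node's text. The names array_RenameKeys and tagMap are invented for the example.

```livecode
-- Return a copy of treeArray in which every key found in tagMap is renamed.
-- tagMap is a hypothetical array: tagMap["b"] holds the new name for "b".
function array_RenameKeys treeArray, tagMap
    repeat for each key nodeName in treeArray
        -- keep the old name unless a replacement is mapped
        put nodeName into newName
        if tagMap[nodeName] is not empty then put tagMap[nodeName] into newName

        if the keys of treeArray[nodeName] is not empty then
            -- the value is a nested array of children, so recurse
            put array_RenameKeys(treeArray[nodeName], tagMap) into newArray[newName]
        else
            -- the value is plain text, copy it across
            put treeArray[nodeName] into newArray[newName]
        end if
    end repeat
    return newArray
end array_RenameKeys
```

Note that array keys have no defined order, so this layout does not keep siblings in document order, and two siblings with the same name cannot both be stored.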

What I need to decide is:

   - what sort of structure to use for this generic array, and, less
   critically,
   - how to implement some sort of "plugin" for this design pattern, so that
   it is easy and intuitive to do a variety of transformations.

*Nested Array Data Structure*
For the array structure I want to use the new nested arrays, and also to be
able to store attributes of nodes - something like:

node_1
    node_1_1
        node_1_1_1
    node_1_2

Putting an XML tree like the one above into a nested array, we could then do
things like:

   - put treeArray["node_1"]["node_1_2"] into nodeContents
   - put treeArray["node_1"]["node_1_2"]["attribute"]["style"] into
   nodeStyle

With the attributes it gets ugly: the walker would have to filter out the
"attribute" key, and the key would need a name that is as unlikely as
possible to clash with a real node name. So it is probably better to store
the attributes in a separate branch of the array?

   - put treeArray["_tree"]["node_1"]["node_1_2"] into nodeContents
   - put treeArray["_attribute"]["node_1"]["node_1_2"]["style"] into
   nodeStyle
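As a small sketch of the two-branch layout, the example tree can be built by hand and queried without any attribute keys getting in the way. The handler names and the sample text values are made up for the illustration.

```livecode
-- Build the small example tree, with attributes kept in a parallel branch.
function tree_BuildExample
    put "some text" into treeArray["_tree"]["node_1"]["node_1_1"]["node_1_1_1"]
    put "more text" into treeArray["_tree"]["node_1"]["node_1_2"]
    put "font-weight:bold" into treeArray["_attribute"]["node_1"]["node_1_2"]["style"]
    return treeArray
end tree_BuildExample

-- List the children of a top-level node. Only the "_tree" branch is read,
-- so no attribute keys need filtering out.
function tree_ChildNames treeArray, parentName
    put the keys of treeArray["_tree"][parentName] into childNames
    sort lines of childNames
    return childNames
end tree_ChildNames
```

Calling tree_ChildNames(tree_BuildExample(), "node_1") returns the two lines "node_1_1" and "node_1_2". One limitation of this layout is that an array element holds either a value or a nested array, so a node cannot carry both text and children under the same key.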

*What about duplicate nodes?*
This is where I get a bit stuck, as there are still problems: for instance
at the moment the structure does not allow for duplicate nodes (which are
very common):

topNode
    duplicateNode
        nestedNode
    duplicateNode
        anotherNestedNode

The xml treeID notation uses references like:

topNode/duplicateNode[2]/anotherNestedNode

No idea what to do about that. Something like one of these?

   - put treeArray["tree"]["topNode"]["duplicateNode/2"]["anotherNestedNode"]
   into nodeContents
   - put
   treeArray["tree"]["topNode"][1]["duplicateNode"][2]["anotherNestedNode"][1]
   into nodeContents
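The second form, with a numeric index after every node name, can be sketched with one small helper that gives each repeated name the next free index. The name tree_AppendChild and the sample values are assumptions for the example.

```livecode
-- Store childValue as the next childName entry under parentArray and
-- return the updated array. Repeated names get the indexes 1, 2, 3, ...
function tree_AppendChild parentArray, childName, childValue
    put the keys of parentArray[childName] into usedIndexes
    put (the number of lines of usedIndexes) + 1 into childIndex
    put childValue into parentArray[childName][childIndex]
    return parentArray
end tree_AppendChild
```

Building the duplicate-node example then looks like this:

```livecode
put "first" into branchOne["nestedNode"][1]
put "second" into branchTwo["anotherNestedNode"][1]
put tree_AppendChild(empty, "duplicateNode", branchOne) into topNode
put tree_AppendChild(topNode, "duplicateNode", branchTwo) into topNode
put topNode into treeArray["tree"]["topNode"][1]
put treeArray["tree"]["topNode"][1]["duplicateNode"][2]["anotherNestedNode"][1]
```

The last line puts "second". This keeps the order of siblings that share a name, but not the order between siblings with different names, so it may still fall short for mixed content such as htmltext.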

*What about callout functions?*
I've used these before for plugin searches - but it may be more intuitive to
simply copy and customise a handler for a specific purpose. On the other
hand, recursive functions are never that intuitive to customise - so
callouts may be better?

I'm also not sure what a callout function could look like. Perhaps:

on custom_TransformNode nodePath, originalArray, @transFormed

end custom_TransformNode
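One possible shape for the callout, as a sketch only: the walker is given the name of a handler and asks it for each node's new name. This assumes an engine version that provides the dispatch command, and it uses a simpler signature than the one above (the callout returns a name instead of filling an array by reference). custom_RenameNode and node_NewName are invented names.

```livecode
-- Hypothetical callout: decide the new name for a single node.
on custom_RenameNode nodeName, nodeAttributes
    if nodeName is "span" and nodeAttributes["style"] is "font-weight:bold" then
        return "b"
    end if
    return nodeName
end custom_RenameNode

-- Ask the named callout for a new node name, keeping the old name
-- if nothing handles the message.
function node_NewName calloutName, nodeName, nodeAttributes
    dispatch calloutName with nodeName, nodeAttributes
    -- dispatch sets "it" to "handled", "unhandled" or "passed"
    if it is "handled" then return the result
    return nodeName
end node_NewName
```

A recursive walker would then call node_NewName("custom_RenameNode", nodeName, nodeAttributes) for every node, and a different transformation only needs a different callout handler rather than a customised copy of the walker.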


