HTMLtext doesn't play well with CSS

Richard Gaskin ambassador at fourthworld.com
Sat Jul 17 12:50:17 EDT 2010


Tim Ponn wrote:

 > I want the user to be able to change font sizes, make bold,
 > italic, whatever.  I also want them to have the freedom to
 > turn some of the text into links, etc.  When I try to use
 > HTMLtext in rev, the results are not so good.  How do I
 > improve it?

As Jim pointed out, the htmlText of a field is not true HTML in the 
browser sense; it could more accurately be called "xmlText" because it 
uses XML tags to represent style runs, but is not designed to be 
web-ready HTML.

The htmlText property was added to the engine to provide something no 
other xTalk had, which is very, very useful:  a plain-text description 
of everything in a field, both content and style attributes.  Unlike 
rtfText, htmlText is designed to be the one way a field's content and 
styles can be reproduced in another field with complete fidelity.  As 
such, it includes tags like threeDBox, which the Rev engine supports but 
HTML does not, and it lacks many HTML features, such as CSS.

One useful thing about htmlText is that the order of tags is fairly 
consistent when you obtain that property from a field, regardless of the 
tag order you may have used to set those attributes.

For example, you can use this:

     set the htmlText of fld 1 to "<i><b>Hello</b></i>"

...and when you get the htmlText you'll get:

    <p><b><i>Hello</i></b></p>

Note the reversal of the order of <i> and <b>. This happens because the 
engine doesn't store the htmlText string itself: a field's styles are 
kept in a binary representation in which those flags have fixed 
positions.  Setting the htmlText parses the string to set those binary 
flags, but the original tag order isn't kept, so getting the htmlText 
translates the flags back into tags in the engine's own fixed order.
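To make that concrete, here is a rough model written in Python rather 
than Rev script.  It is not how the engine is actually implemented: 
the FLAGS table, its bit values, and the b-before-i output order are 
assumptions chosen to match the example above.  Each style sets a bit, 
so the input order is lost, and tags are written back in one fixed 
order.

```python
import re

# Hypothetical flag table: each style gets a fixed bit, and the dict's
# order is the fixed order used when tags are written back out.
# This models the idea only; it is not the engine's real layout.
FLAGS = {"b": 1, "i": 2, "u": 4}

def parse_run(html_run):
    """Turn a single run like '<i><b>Hello</b></i>' into (bitmask, text).
    Tag order is discarded as soon as the flags are set."""
    mask = 0
    for tag in re.findall(r"<([a-z]+)>", html_run):
        mask |= FLAGS.get(tag, 0)      # tags not in FLAGS are ignored
    text = re.sub(r"</?[a-z]+>", "", html_run)
    return mask, text

def emit_run(mask, text):
    """Rebuild the run from its bitmask, always in FLAGS order."""
    tags = [tag for tag, bit in FLAGS.items() if mask & bit]
    opening = "".join(f"<{t}>" for t in tags)
    closing = "".join(f"</{t}>" for t in reversed(tags))
    return f"<p>{opening}{text}{closing}</p>"

print(emit_run(*parse_run("<i><b>Hello</b></i>")))  # <p><b><i>Hello</i></b></p>
```

Whichever order the tags arrive in, parse_run produces the same 
bitmask, which is why the output order is predictable.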

This is useful because it lets you predict how a given combination of 
attributes will be written out, and then do a search-and-replace to 
swap those tag sequences for CSS assignments.

For example, if you had a CSS class named "MyClass" that sets text to 
bold italic, you could write this to translate the htmlText to use CSS 
assignments:

    put the htmlText of fld 1 into tData
    replace "<b><i>" with "<span class='MyClass'>" in tData
    replace "</i></b>" with "</span>" in tData

At first this seems excitingly easy, and it's extensible to font size, 
font face, and other attributes.

But then we come to nested tags, and meet with a grave disappointment. :(

This htmlText:

   <p><u><b>Hello</b></u><b> <i>world</i></b>.</p>

...describes the style runs for the words "Hello world", in which both 
words are bold but "Hello" is underlined and "world" is italicized.

Note what happened to "<b>" there: it got replicated to enclose each 
word separately, as opposed to this form, which would be more common in HTML:

   <p><b><u>Hello</u> <i>world</i></b>.</p>

In some cases this won't be a big problem: while it adds a bit of bloat 
to the page, simple wholesale replacements can still be used to assign 
classes.
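A minimal sketch of such wholesale replacement, written in Python for 
illustration (in Rev you'd use the replace command as above).  The 
TAG_CLASSES table and its class names are hypothetical.  Mapping each 
tag on its own, rather than whole tag sequences, keeps every <span> 
balanced even when tags are replicated around each word:

```python
# Hypothetical table of single style tags and the CSS classes they map to.
TAG_CLASSES = {"b": "Bold", "i": "Italic", "u": "Underline"}

def tags_to_classes(html_text):
    """Replace each style tag with a classed <span>. Because every closing
    tag becomes exactly one </span>, well-formed input stays balanced even
    when tags are replicated around each word."""
    for tag, css_class in TAG_CLASSES.items():
        html_text = html_text.replace(f"<{tag}>", f"<span class='{css_class}'>")
        html_text = html_text.replace(f"</{tag}>", "</span>")
    return html_text

print(tags_to_classes("<p><b><i>Hello</i></b></p>"))
# <p><span class='Bold'><span class='Italic'>Hello</span></span></p>
```

The trade-off is nested spans instead of one combined class, which is 
the kind of bloat described above.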

But there may be times when that isn't sufficient, requiring you to 
parse the tags by examining them in sequence and omitting redundant 
ones (the optional third argument to the offset function, which sets 
how many characters to skip, is a good way to make a pull-parser).
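Here is a rough Python sketch of that sequential approach.  str.find 
with a start index stands in for offset() with its skip argument; note 
that offset() returns a position relative to the skipped characters, 
while str.find returns an absolute index.  The helper names pull_tokens 
and style_runs are hypothetical, and only b, i and u are tracked:

```python
def pull_tokens(html_text):
    """Yield ("tag", name, is_closing) or ("text", chars, False) in order.
    Each str.find call resumes from pos, like offset() with chars to skip."""
    pos = 0
    while pos < len(html_text):
        lt = html_text.find("<", pos)
        if lt == -1:                       # no more tags: the rest is text
            yield ("text", html_text[pos:], False)
            return
        if lt > pos:                       # text before the next tag
            yield ("text", html_text[pos:lt], False)
        gt = html_text.find(">", lt)
        if gt == -1:                       # unterminated tag: treat as text
            yield ("text", html_text[lt:], False)
            return
        inner = html_text[lt + 1:gt]
        parts = inner.lstrip("/").split()  # tag name is the first word
        yield ("tag", parts[0] if parts else "", inner.startswith("/"))
        pos = gt + 1

def style_runs(html_text, styles=("b", "i", "u")):
    """Collapse the token stream into (text, active_styles) runs.
    Neighboring runs with identical styles are merged, which drops
    redundant close/reopen pairs such as "</b><b>". Tags outside
    `styles` (including <p>) are ignored in this sketch."""
    active = []
    runs = []
    for kind, value, closing in pull_tokens(html_text):
        if kind == "tag":
            if value in styles:
                if not closing:
                    active.append(value)
                elif value in active:
                    active.remove(value)
            continue
        key = tuple(s for s in styles if s in active)  # canonical order
        if runs and runs[-1][1] == key:
            runs[-1] = (runs[-1][0] + value, key)
        else:
            runs.append((value, key))
    return runs

print(style_runs("<p><u><b>Hello</b></u><b> <i>world</i></b>.</p>"))
# [('Hello', ('b', 'u')), (' ', ('b',)), ('world', ('b', 'i')), ('.', ())]
```

Once the text is reduced to runs like these, each run can be written 
out with whatever class or style assignment fits it.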

With WebMerge, the revJournal blog, and some custom CMS solutions for 
clients, I've had to deal with these sorts of issues myself.  In those 
contexts the efficiency of the page generation was a higher priority 
than the cleanliness of the resulting HTML, so I opted for what could 
arguably be called laziness in how I dealt with those tags. ;)

It would be ideal if we had a nicely generalized function like this:

    webHtml(pHtmlText, pCss)

...in which pHtmlText is the raw htmlText of a field and pCss is a set 
of CSS definitions.  The function could then parse the text, look for 
tag patterns which can be satisfied by the various CSS definitions 
supplied, and replace those htmlText tag sequences with appropriate 
class and style assignments as needed.
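Purely to illustrate the shape such a function might take, here is a 
hypothetical Python sketch, web_html(), in which pCss is simplified to 
a dict mapping sets of htmlText style tags to CSS class names.  It 
handles only b, i and u, drops <font> and other attribute-bearing tags, 
and keeps <p> without its attributes, so it is far from the complete 
solution described here:

```python
import re

# One match per token: group 1 is "/" for a closing tag, group 2 the tag
# name, group 3 a run of text between tags.
TOKEN = re.compile(r"<(/?)([a-zA-Z]+)[^>]*>|([^<]+)")
STYLE_TAGS = {"b", "i", "u"}  # assumption: only these are mapped to CSS

def web_html(html_text, css):
    """Hypothetical webHtml(pHtmlText, pCss) sketch.
    css maps keys like frozenset({"b", "i"}) to CSS class names."""
    active = set()
    out = []
    for closing, tag, text in TOKEN.findall(html_text):
        if text:
            key = frozenset(active)
            if not key:
                out.append(text)                          # unstyled text
            elif key in css:
                out.append(f"<span class='{css[key]}'>{text}</span>")
            else:                                         # no class: plain tags
                tags = sorted(key)
                out.append("".join(f"<{t}>" for t in tags) + text
                           + "".join(f"</{t}>" for t in reversed(tags)))
        elif tag == "p":
            out.append(f"<{closing}p>")                   # keep paragraph breaks
        elif tag in STYLE_TAGS:
            if closing:
                active.discard(tag)
            else:
                active.add(tag)
        # any other tag (e.g. <font>) is dropped in this sketch
    return "".join(out)

css = {frozenset({"b", "u"}): "BoldUnderline",
       frozenset({"b"}): "Bold",
       frozenset({"b", "i"}): "BoldItalic"}
print(web_html("<p><u><b>Hello</b></u><b> <i>world</i></b>.</p>", css))
```

Because each style run becomes a single span chosen by its whole set of 
styles, the replicated <b> tags no longer matter; the cost is a span 
around every run, including lone spaces.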

Unfortunately I have no such function in my libraries. It's been on my 
to-do list, but has been a much lower priority than other things which 
actually get done. :)

Given the complexity of the task, this might make a good exercise for 
the readers here.  As often happens here, folks would likely submit 
different forms, each more complete and hopefully faster than the last, 
and if the process follows historical norms, at the end Alex Tweedly will
come up with a three-line solution using arrays. :)

If the function were made public domain or MIT license, it could be used 
in commercial projects as well as open source projects without legal 
encumbrance.

Anyone up for such a task?

I'll offer a code bounty of $100 for any reasonably efficient function 
that does that.

--
  Richard Gaskin
  Fourth World
  Rev training and consulting: http://www.fourthworld.com
  Webzine for Rev developers: http://www.revjournal.com
  revJournal blog: http://revjournal.com/blog.irv


