reStructuredText in RevTalk ?

Jim Ault jimaultwins at yahoo.com
Tue Nov 10 21:40:27 EST 2009


On Nov 10, 2009, at 3:36 PM, Alex Tweedly wrote:

>
> I'm using a simple file-based CMS in on-rev (along the lines of  
> Andre's CMS/blog system).
>
> I'm looking for a simple way to allow users to include simple html  
> within the files which will be included as part of the web pages. I  
> need to protect from any accidental damage to the rest of the page  
> (e.g. if they add <div>s or mis-matched or incorrect <table>,  
> etc.) , and some users will not be familiar with html. Really all I  
> need is <br>, <p>, simple <li>, and URL/mail addresses.
>
> I guess I could build a html parser that disallowed any 'unsafe'
> html - but that doesn't sound trivial (and it assumes I can think
> of all the cases where html could cause a problem).
>
> Or I could come up with some format of my own to do it.
>
> Or I could build a (small subset of?) reStructuredText to make it  
> easier to use, and then translate this to html for the output.
>
> Has anyone built this already, in a form they could share ?
> Or anything equivalent (and hopefully simpler :-)
>
> (btw - http://docutils.sourceforge.net/rst.html )


When parsing html, sanity is an optional and infrequent result.
If the journey is to begin anyway, I would recommend the following as a
starting point.

Allowing user html raises several challenges, so I would start with
these few basics:

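    put fld "incomingBlock" into userHtml
    -- line endings and tabs are only white space in html, so
    -- turn them into spaces (removing them would join words)
    replace cr with space in userHtml
    -- cr is numToChar(10); numToChar(13) catches stray carriage returns
    replace numToChar(13) with space in userHtml
    replace tab with space in userHtml
    --reset line content by...
    replace "<" with (cr & "<") in userHtml
    replace ">" with (">" & cr) in userHtml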
    --now each line either begins with  "<", "</", or a string

    --polish the line endings
    set the itemDel to ">"  --each tag line should contain one item

    repeat for each line LNN in userHtml
       get word 1 to -1 of LNN --trim white space
       if char 1 of IT is not "<" then
          --string of data between tags
          --disregard special chars like umlauts
          -- or convert them using entities such as  &amp;
          put IT & cr after cleanerHtml
       else
          if char -1 of IT is not ">" then
             -- oops --opening "<" not closed
             breakpoint --fix
          else
             if the number of items in IT > 1 then
                -- oops --too many  ">"
                --this is a tag line, not a data line
                --something not opened properly
                breakpoint --fix
             end if
             put space before char -1 of IT
             if char -3 of IT is "/" then put " /" into char -3 to -2 of IT
             -- now endings are either " >"  or " />"
             --thus the last word is the closing ">" or "/>"
             --  valid   <BR>   <BR >  <BR something>  <BR char string >  <BR />
             -- invalid  < BR>   < BR >
             if word 1 of IT is "<" then put word 1 of IT & word 2 of IT into word 1 to 2 of IT
             -- now "<tag anything >"  or "<tag anything />" should be true
          end if
          --since this is a tag line, change quotes to apostrophes
          --in html, both are considered valid quote chars
          --  and now the data lines can contain user quote chars without
          --  interfering with rev commands.. thus
          replace quote with "'" in IT -- apostrophe works the same
          put IT & cr after cleanerHtml
       end if
    end repeat
    filter cleanerHtml without empty
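
The loop above only mentions converting special characters to entities in a comment. A minimal sketch of that step could look like the following. The handler name escapeData is my own invention, and it assumes the user typed plain text that contains no entities yet (an existing "&amp;" would be escaped a second time).

```livecode
-- Hypothetical helper (not from the original post): escape a data
-- line so that stray markup characters cannot be read as html.
function escapeData pText
   -- "&" must go first, otherwise the "&" inside the entities
   -- added below would itself be escaped
   replace "&" with "&amp;" in pText
   replace "<" with "&lt;" in pText
   replace ">" with "&gt;" in pText
   replace quote with "&quot;" in pText
   return pText
end escapeData
```

In the data branch of the loop it would be used as `put escapeData(IT) & cr after cleanerHtml`, so that "Fish & chips" is stored as "Fish &amp; chips". The "<" and ">" cases never occur in that branch, because the text has already been split on them, but they let the helper be reused on raw text.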

------------------------------------
--now there are 5 types of lines remaining
--1 data        Click here for more videos
--2 mono tags     <BR >  <p >  <hr >
--3 mono with Atrb   <img src='refrStr' height='30' width='244' />

--4 bookends no Atrb   <title >  (then a data line)  and later a line of </title >

--5 bookend with Atrb   <table cellpadding='2' bgcolor='green' >
--       (then more open and close tags    TR  TH  TD  )
--        (then a data line)  (then more open and closing tags)
--     and later a line of </table >

--also   <a href='#anchorOnThisPage' >Click to go down further</a >

--also   <a href='httpAnotherPage.com/path/page.html' >Click to go over here</a >
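
To act on the five line types, each cleaned line has to be classified first. Here is a rough sketch of how that might be done, assuming the lines have already been normalized so that every tag line ends in " >" or " />". The handler name lineKind and the list of mono tags are assumptions of mine, not something from a library.

```livecode
-- Hypothetical helper: name the type of one normalized line
function lineKind pLine
   local tMonoTags, tTag, tSuffix
   -- tags treated as "mono" (no closing partner); this list is an assumption
   put "br,hr,img,p" into tMonoTags
   if char 1 of pLine is not "<" then return "data"
   if char 2 of pLine is "/" then return "bookend close"
   -- word 1 is "<tag", so drop the "<" to get the tag name
   put char 2 to -1 of word 1 of pLine into tTag
   -- the last word is ">" or "/>", so a third word means attributes
   if the number of words of pLine > 2 then
      put " with Atrb" into tSuffix
   else
      put empty into tSuffix
   end if
   if word -1 of pLine is "/>" or tTag is among the items of tMonoTags then
      return "mono" & tSuffix
   end if
   return "bookend open" & tSuffix
end lineKind
```

With this, lineKind("<img src='a.png' />") gives "mono with Atrb" and lineKind("</title >") gives "bookend close". The comparison against the list is not case sensitive by default, so "<BR >" counts as mono too.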

And now the fun begins: nested tags that need to be balanced,
especially clickable lists.
Here is a block that functions as a simple horizontal menu:
<div id='specialCase' class='formatBold' >
<ul id='specialBehaviour' >
<li style='display:inline' ><a href='httpJumpToPumpkins' class='underline4' >Carving pumpkins</a ></li >
<li style='display:inline' ><a href='httpJumpToCats' class='underline4' >Find a black cat</a ></li >
<li style='display:inline' ><a href='httpJumpToBrooms' class='underline4' >Scary witch's brooms</a ></li >
</ul >
</div >
--display inline is to make a list into one horizontal menu
-- id and class allow CSS to work, so you would want to filter these out to keep it simple
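
One common way to check that nested tags balance is to keep a stack of the tags that are currently open. The sketch below is untested in the same spirit as the rest of this reply. It assumes cleanerHtml holds one tag or data string per line, as produced by the loop earlier. The handler name checkBalance and the mono tag list are made up for the example.

```livecode
-- Hypothetical helper: returns empty when every bookend tag is closed
-- in the right order, otherwise a description of the first problem
function checkBalance pCleanHtml
   local tMonoTags, tStack, tDepth, tTag, tLine
   put "br,hr,img,p" into tMonoTags  -- assumption: tags with no partner
   put 0 into tDepth
   repeat for each line tLine in pCleanHtml
      if char 1 of tLine is not "<" then next repeat  -- data line
      if word -1 of tLine is "/>" then next repeat    -- closes itself
      if char 2 of tLine is "/" then
         -- closing tag: it must match the most recently opened tag
         put char 3 to -1 of word 1 of tLine into tTag
         if tDepth is 0 then
            return "closing </" & tTag & "> has no opening tag"
         end if
         if tStack[tDepth] is not tTag then
            return "expected </" & tStack[tDepth] & "> but found </" & tTag & ">"
         end if
         subtract 1 from tDepth  -- pop
      else
         put char 2 to -1 of word 1 of tLine into tTag
         if tTag is among the items of tMonoTags then next repeat
         add 1 to tDepth         -- push
         put tTag into tStack[tDepth]
      end if
   end repeat
   if tDepth > 0 then return "unclosed <" & tStack[tDepth] & ">"
   return empty
end checkBalance
```

For the four lines "<ul >", "<li >", "Item", "</ul >" this returns "expected </li> but found </ul>". It is stricter than browsers, which accept a missing </li>, but for user-supplied text being strict is probably the safer choice.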

Tables and lists are the most common multi-nested forms that make
parsing difficult.
Disallowing 'table', and then attributes like ( id='string'
class='string' style='color:red' ), will help simplify the core
html you will allow.
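
As a sketch of that simplification, the handler below keeps only a short list of tags and throws away every attribute except href on links. The name simplifyTag is hypothetical. The allowed list is my guess based on what Alex said he needs (I added ul and ol so that li has a parent). It also assumes normalized lines and href values without spaces.

```livecode
-- Hypothetical helper: returns the line with attributes removed,
-- or empty when the tag is not on the allowed list
function simplifyTag pLine
   local tAllowed, tTag, tWord, tHref, tEnding
   put "br,p,ul,ol,li,a" into tAllowed  -- assumed whitelist
   if char 1 of pLine is not "<" then return pLine  -- data line, keep as is
   put word 1 of pLine into tTag
   delete char 1 of tTag  -- drop the "<"
   if char 1 of tTag is "/" then
      -- closing tag: keep it only when the tag is allowed
      delete char 1 of tTag
      if tTag is among the items of tAllowed then return "</" & tTag & " >"
      return empty
   end if
   if tTag is not among the items of tAllowed then return empty
   -- links keep their href; no other attribute survives
   put empty into tHref
   if tTag is "a" then
      repeat for each word tWord in pLine
         if char 1 to 6 of tWord is "href='" then put space & tWord into tHref
      end repeat
   end if
   if word -1 of pLine is "/>" then
      put " />" into tEnding
   else
      put " >" into tEnding
   end if
   return "<" & tTag & tHref & tEnding
end simplifyTag
```

So "<li style='display:inline' class='x' >" comes back as "<li >", "<a href='page.html' class='underline4' >" comes back as "<a href='page.html' >", and any table line comes back empty, to be removed by the filter step. The text inside a dropped tag stays behind as plain data. The href value itself is passed through unchecked.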

Hope this helps you get started.
Note I am making this a quick reply and none of the above code has  
been tested, so there could be some errors.  I am just typing off the  
top of my head.

Jim Ault
Las Vegas






