reStructuredText in RevTalk ?
Jim Ault
jimaultwins at yahoo.com
Tue Nov 10 21:40:27 EST 2009
On Nov 10, 2009, at 3:36 PM, Alex Tweedly wrote:
>
> I'm using a simple file-based CMS in on-rev (along the lines of
> Andre's CMS/blog system).
>
> I'm looking for a simple way to allow users to include simple html
> within the files which will be included as part of the web pages. I
> need to protect from any accidental damage to the rest of the page
> (e.g. if they add <div>s or mis-matched or incorrect <table>,
> etc.) , and some users will not be familiar with html. Really all I
> need is <br>, <p>, simple <li>, and URL/mail addresses.
>
> I guess I could build a html parser that disallowed any 'unsafe'
> html - but that doesn't sound trivial (and it assumes I can think
> of all the cases where html could cause a problem).
>
> Or I could come up with some format of my own to do it.
>
> Or I could build a (small subset of?) reStructuredText to make it
> easier to use, and then translate this to html for the output.
>
> Has anyone built this already, in a form they could share ?
> Or anything equivalent (and hopefully simpler :-)
>
> (btw - http://docutils.sourceforge.net/rst.html )
In parsing html, sanity is an optional and infrequent result.
If the journey is to begin, then I would recommend the following as a
starting point.
Allowing user html has several challenges, and I would start with
these few basics:
put fld "incomingBlock" into userHtml
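-- html treats line endings and tabs as plain white space, so...
-- (swap them for a space so neighbouring words do not run together)
replace cr with space in userHtml
-- cr is numToChar(10) in rev, so also catch stray carriage returns
replace numToChar(13) with space in userHtml
replace tab with space in userHtml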
--reset line content by...
replace "<" with (cr & "<") in userHtml
replace ">" with (">" & cr) in userHtml
--now each line either begins with "<", "</", or a string
--polish the line endings
set the itemDel to ">" --each tag line should contain one item
repeat for each line LNN in userHtml
  get word 1 to -1 of LNN --trim white space
  if char 1 of IT is not "<" then
    --string of data between tags
    --disregard special chars like umlauts
    -- or convert them to html entities
    put IT & cr after cleanerHtml
  else
    if char -1 of IT is not ">" then
      -- oops --opening "<" not closed
      breakpoint --fix
    else
      if the number of items in IT > 1 then
        -- oops --too many ">"
        --this is a tag line, not a data line
        --something not opened properly
        breakpoint --fix
      end if
      put space before char -1 of IT
      if char -3 of IT is "/" then put " /" into char -3 to -2 of IT
      -- now endings are either " >" or " />"
      --thus last word is closing tag
      -- valid:   <BR> <BR > <BR something> <BR char string > <BR />
      -- invalid: < BR> < BR >
      if word 1 of IT is "<" then put word 1 of IT & word 2 of IT into word 1 to 2 of IT
      -- now "<tag anything >" or "<tag anything />" should be true
    end if
    --since this is a tag line, change quotes to apostrophes
    --in html, both are considered valid quote chars
    -- and now the data lines can contain user quote chars without
    -- interfering with rev commands.. thus
    replace quote with "'" in IT -- apostrophe works the same
    put IT & cr after cleanerHtml
  end if
end repeat
filter cleanerHtml without empty
------------------------------------
--now there are 5 types of lines remaining
--1 data                Click here for more videos
--2 mono tags           <BR > <p > <hr >
--3 mono with Atrb      <img src='refrStr' height='30' width='244' />
--4 bookends no Atrb    <title> (then a data line) and later a line of </title >
--5 bookend with Atrb   <table cellpadding='2' bgcolor='green' >
--     (then more open and close tags TR TH TD )
--     (then a data line) (then more open and closing tags)
--     and later a line of </table >
--also <a href='#anchorOnThisPage' >Click to go down further</a >
--also <a href='httpAnotherPage.com/path/page.html' >Click to go over here</a >
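Once the lines are sorted into those types, one possible next step is a plain whitelist: keep every data line, keep tag lines whose tag name is on a short list, and drop the rest. The sketch below is untested and rests on assumptions: the handler name keepAllowedTags and the list of allowed tags are hypothetical (picked to match the tags Alex asked for), and the input is expected to be the cleanerHtml built above, with one tag or one data string per line.

```livecode
-- hypothetical helper: keep data lines and whitelisted tag lines only
function keepAllowedTags pHtml
   local tAllowed, tLine, tName, tResult
   put "br,p,ul,ol,li,a" into tAllowed -- assumed whitelist, adjust to taste
   repeat for each line tLine in pHtml
      if char 1 of tLine is not "<" then
         put tLine & cr after tResult -- type 1, a data line
         next repeat
      end if
      -- isolate the tag name from "<tag ... >" or "</tag >"
      put tLine into tName
      replace "<" with empty in tName
      replace ">" with empty in tName
      replace "/" with space in tName
      put word 1 of tName into tName
      -- comparison is not case sensitive by default, so <BR > matches "br"
      if tName is among the items of tAllowed then
         put tLine & cr after tResult
      end if
      -- any other tag line (div, table, script ...) is dropped
   end repeat
   return tResult
end keepAllowedTags
```

It would be called as: put keepAllowedTags(cleanerHtml) into safeHtml. Only the tag lines are dropped; any text that sat between disallowed tags stays in as ordinary data lines.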
And now the fun begins: nested tags that need to be balanced,
especially clickable lists.
Here is a block that functions as a simple horizontal menu:
<div id='specialCase' class='formatBold' >
<ul id='specialBehaviour' >
<li style='display:inline' ><a href='httpJumpToPumpkins' class='underline4' >Carving pumpkins</a ></li >
<li style='display:inline' ><a href='httpJumpToCats' class='underline4' >Find a black cat</a ></li >
<li style='display:inline' ><a href='httpJumpToBrooms' class='underline4' >Scary witch's brooms</a ></li >
</ul >
</div >
--display inline is to make a list into one horizontal menu
-- id and class allow CSS to work, so you would want to filter these out to keep it simple
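Checking that nested tags balance can be done with a small stack: push each opening tag name, and on each closing tag make sure it matches the name on top. This is only an untested sketch under assumptions: the handler name checkTagBalance and the list of tags that never take a closing partner are hypothetical, and the input must already be in the one-tag-per-line form produced above (endings of " >" or " />").

```livecode
-- hypothetical helper: returns empty if the tags balance, else a message
function checkTagBalance pHtml
   local tLine, tName, tStack, tVoid
   put "br,hr,img" into tVoid -- assumed list of tags with no closing partner
   repeat for each line tLine in pHtml
      if char 1 of tLine is not "<" then next repeat -- data line
      if char -2 to -1 of tLine is "/>" then next repeat -- self-closed tag
      -- isolate the tag name from "<tag ... >" or "</tag >"
      put tLine into tName
      replace "<" with empty in tName
      replace ">" with empty in tName
      replace "/" with space in tName
      put word 1 of tName into tName
      if tName is among the items of tVoid then next repeat
      if char 2 of tLine is "/" then
         -- a closing tag must match the most recent opening tag
         if item 1 of tStack is not tName then
            return "unexpected closing tag:" && tLine
         end if
         delete item 1 of tStack -- pop
      else
         put tName & comma before tStack -- push, newest first
      end if
   end repeat
   if tStack is not empty then return "unclosed tag:" && item 1 of tStack
   return empty
end checkTagBalance
```

For the menu block above this should return empty; remove one </li > line and it should report the mismatch at the following </ul >. A user block that fails the check could be rejected outright rather than repaired.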
Tables and lists are the most common multi-nested forms that make
parsing difficult.
Disallowing 'Table', and then attributes like ( id='string'
class='string' style='color:red' ) will help simplify the core
html you will allow.
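Dropping attributes is fairly mechanical once each tag sits on its own line. The untested sketch below reduces a tag line to its bare tag and keeps only the href of a link. The handler name stripAttributes is hypothetical, and so is the choice to accept only http, https and mailto links (which leaves out '#anchor' links and anything like javascript:). It assumes the line has already been normalized as above, with apostrophes around attribute values and a space before the closing ">".

```livecode
-- hypothetical helper: reduce one normalized tag line to its bare form
function stripAttributes pTagLine
   local tName, tWord
   -- closing tags carry no attributes, pass them through
   if char 2 of pTagLine is "/" then return pTagLine
   put word 1 of pTagLine into tName -- e.g. "<li" or "<a"
   if tName is "<a" then
      -- keep only the href attribute, and only for assumed-safe schemes
      repeat for each word tWord in pTagLine
         if (tWord begins with "href='http") or (tWord begins with "href='mailto:") then
            return "<a" && tWord && ">"
         end if
      end repeat
   end if
   -- everything else loses id, class, style and the rest
   return tName && ">"
end stripAttributes
```

It would be applied to each tag line that survives the other checks, so that <li style='display:inline' > comes out as <li > and <a href='http://example.com/' class='underline4' > keeps only its href.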
Hope this helps you get started.
Note I am making this a quick reply and none of the above code has
been tested, so there could be some errors. I am just typing off the
top of my head.
Jim Ault
Las Vegas