Another Revolution Success Story

Richard Gaskin ambassador at fourthworld.com
Thu Jul 1 03:25:54 EDT 2004


Mark Wieder wrote:
> I really wish the csv format had never been invented. Separating
> fields with tabs works much better, and separating them with
> non-printing characters is better yet.

Amen to that, brother.  I guess the clue train doesn't stop in Redmond. ;)

[Semi-OT link - US Gov. warns against MS Explorer:
<http://www.kb.cert.org/vuls/id/713878>]

I once spent an evening with some friends trying to find a less 
efficient tabular format than CSV, and even with our best effort we 
couldn't think of a way to waste more clock cycles parsing as few 
characters as required by the absurd CSV format.

Extra bonus points that CSV is implemented differently in different MS
products (varying escape sequences).  It seems Redmond takes their own
formats as seriously as they take security concerns.


MisterX wrote:
> You mean my importer didn't work?
> 
> Send me a small sample of a non working csv to my email (not the list)
> and I'll see if it can be fixed. Let me also know which record is wrong.

Yours was a very smart effort, and for a moment I was hoping you'd
found the holy grail of scripting, an efficient means of parsing CSV.

Alas, if I read your algorithm correctly it parses line by line,
making the assumption that there are no returns in field data.  I had
tried that once myself, but my customers have since made it clear to me 
that CSV allows returns in data.  It seems the trick is to differentiate 
between return chars within data and returns used to delimit data, 
noting that they are not normally escaped in most products (FM Pro 
wisely substitutes them with a non-printing character, ASCII 11, but
Redmond shows no such wisdom).
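
To illustrate the problem with line-by-line parsing, here is a minimal sketch in Python (not Transcript, and not MisterX's importer). It assumes fields containing returns are enclosed in double quotes, which is only one of the conventions in use; the `sample` data and the `split_records` name are made up for the example.

```python
# Hypothetical sample: the second record has a return inside a quoted field.
sample = 'id,note\n1,"first line\nsecond line"\n2,plain\n'

def split_records(text):
    """Split CSV text into records, ignoring returns inside quoted fields.

    Assumption: data containing returns is wrapped in double quotes.
    A doubled quote ("") toggles the state twice, so it has no net effect.
    """
    records, start, in_quotes = [], 0, False
    for i, ch in enumerate(text):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "\n" and not in_quotes:
            # A return outside quotes delimits a record.
            records.append(text[start:i])
            start = i + 1
    if start < len(text):
        records.append(text[start:])  # last record without trailing return
    return records

naive = sample.splitlines()      # line-by-line: breaks the quoted field
careful = split_records(sample)  # quote-aware: keeps it in one record
```

Splitting on lines alone yields four pieces for this sample, while the quote-aware pass yields the three actual records.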

I asked around on this some time ago, including quite a few programmers 
far smarter than me.  The best algorithm we could come up with was one 
which walks through the data char by char, keeping track of when it's in 
field data and when it leaves the field, noting that commas are escaped
inconsistently in MS products and not all fields have their data
enclosed in quotes (FM Pro-exported CSV does, but it's a smarter tool in
general than most of the oddities that come out of Redmond <g>).
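
A rough sketch of that char-by-char walk, written in Python for illustration. It is not the CSV2Tab function from the post linked below. It makes several assumptions: quotes inside data are escaped by doubling them (products vary on this), returns inside data are replaced with ASCII 11 in the way described above for FM Pro, and tabs inside data are replaced with a space so they cannot be mistaken for delimiters. The name `csv_to_tab` and its parameter are hypothetical.

```python
def csv_to_tab(csv_text, return_sub="\x0b"):
    """Convert CSV text to tab-delimited text, one record per line.

    Assumptions (not universal): quoted fields use "" for a literal quote,
    and returns inside data become return_sub (ASCII 11 by default).
    """
    # Normalize line endings so only "\n" needs handling below.
    text = csv_text.replace("\r\n", "\n").replace("\r", "\n")
    out = []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    out.append('"')      # doubled quote is a literal quote
                    i += 1               # skip the second quote
                else:
                    in_quotes = False    # closing quote: leaving field data
            elif ch == "\n":
                out.append(return_sub)   # return inside data, not a delimiter
            elif ch == "\t":
                out.append(" ")          # keep tabs out of the field data
            else:
                out.append(ch)
        elif ch == '"':
            in_quotes = True             # opening quote: entering field data
        elif ch == ",":
            out.append("\t")             # comma outside quotes delimits fields
        else:
            out.append(ch)               # includes "\n" as record delimiter
        i += 1
    return "".join(out)

result = csv_to_tab('a,"b,1","line1\nline2"\nx,"say ""hi""",z\n')
```

Every character is visited once, so the cost grows linearly with the input, but the per-character branching is exactly the overhead complained about above.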

My post from 14 June 2002 with my own CSV2Tab function is at
<http://lists.runrev.com/pipermail/metacard/2002-June/001767.html>.

Hats off to anyone who can improve its speed, and a bottle of
12-year-old single malt to anyone who can come up with an algorithm I
can use which is at least twice as fast.

IMNSHO, CSV2Tab should be a built-in function.  If there's some 
agreement on this and a willingness to vote for it I'll post the request 
to Bugzilla.

-- 
  Richard Gaskin
  Fourth World Media Corporation
  ___________________________________________________
  Rev tools and more:  http://www.fourthworld.com/rev





