Comma-delimited values

Bob Sneidar bobs at twft.com
Mon Mar 8 14:35:44 EST 2010


Not to add fuel to the fire, but I have seen spreadsheet data containing tabs in cells before. Usually it's when importing from another program. Also, let's not forget that sometimes a cell can contain carriage returns too. 

This is the same problem that plagues SQL programmers, forcing them to escape their values before submitting a query. As it is the responsibility of SQL developers to take this precaution, it is also the responsibility of developers to properly format their exported data so that things like quotes, commas and carriage returns (delimiters) are "escaped" or converted to something else. 

Now Software used to do this pretty well, converting their notes fields containing commas and carriage returns to other special characters before exporting them. But eventually, you will hit this problem where the data is going to contain things you didn't expect, and you are going to have to deal with it, as in the case of the space after the day number in the original example. 

Bob


On Mar 8, 2010, at 11:19 AM, Jim Bufalini wrote:

> I agree with Richard's rant. CSV (Comma Separated Values) is a very, very
> old convention that was not well thought out from the beginning. Commas can
> exists within cells (values) as in the case of dates or text blocks or even
> formatted numbers ($1,000.00). So, to get around this, they added quotes
> around cells that could contain commas. The only problem with this, as
> Richard points out, is text blocks can also contain quotes. ;-)
> 
> This is why I said, whether you are exporting to import into Rev or any
> other program, use tab as the value separator and get rid of the arbitrary
> quotes that they only put in there because commas can exist in the cell.
> 
> Tab is very safe to use as a value separator from a spreadsheet export
> because if you press tab when editing a cell, in all spreadsheets I am aware
> of, the tab is not inserted into the cell, but instead jumps you to editing
> the next cell.
> 
> Aloha from Hawaii,
> 
> Jim Bufalini
> 
>> -----Original Message-----
>> From: use-revolution-bounces at lists.runrev.com [mailto:use-revolution-
>> bounces at lists.runrev.com] On Behalf Of Richard Gaskin
>> Sent: Monday, March 08, 2010 8:55 AM
>> To: How to use Revolution
>> Subject: Re: Comma-delimited values
>> 
>> Paul D. DeRocco wrote:
>>> Add an inQuotes flag, with an initial value of false. For each item,
>> if it
>>> has a quote in it, toggle the inQuotes flag. Then, if inQuotes is
>> set,
>>> append a comma instead of a tab, to put the item back together again.
>> 
>> Roger that.  CSV elements may contain returns within quoted portions,
>> and escaping uses a wide range of conventions differing from program to
>> program (within the Microsoft family I've seen it differ from version
>> to
>> version of the same program).  The flag method, while notoriously slow,
>> is the only reliable method I've found for handling all the many
>> variants of CSV.
>> 
>> <rant>
>> A plea for sanity in the software development world:
>> 
>> While we need to write CSV importers from time to time, please never
>> write CSV exporters.
>> 
>> CSV must die.
>> 
>> The problem with CSV is that the comma is very commonly used in data,
>> making it a uniquely stupid choice as a delimiter.
>> 
>> That stupidity could be countered with consistent escaping, but no true
>> standard has emerged since the many decades of this productivity-abuse
>> began.
>> 
>> Making a stupid decision even more stupid, most CSV files quote
>> non-numeric values, as though the programmers did not realize the quote
>> character is commonly used in our language and therefore may likely be
>> part of the data. So using quote as an escape means that you must
>> escape
>> the escape.  Jeeminy who thinks up that merde?!?!
>> 
>> Sometimes the escape for in-data double-quotes is a double double-
>> quote,
>> which sometimes makes it hard to know what to do with empty values
>> shown
>> as "", esp. given that in some formats the empty value abuts the
>> adjacent commas and in others, like MySQL dumps, it abuts only the
>> trailing comma but has a space before the leading comma.  Other times
>> double-quotes are escaped with \", meaning you'll need to escape any
>> in-data backslashes too.
>> 
>> For thinking people, about the time you realize that you're escaping
>> the
>> escape that you've escaped to handle your data, it would occur to you
>> to
>> go back and check your original premise.  But not so for the creators
>> of
>> CSV.
>> 
>> As Jim Bufalini pointed out, tab-delimited or even (in fewer cases)
>> pipe-delimited make much saner options.
>> 
>> For my own delimited exports I've adopted the convention used in
>> FileMaker Pro and others, with escapes that are unlikely to be in the
>> data:
>> 
>> - records are delimited with returns
>> - fields delimited with tabs
>> - quotes are never added and are included only when they are part
>>   of the data
>> - any in-data returns are escaped with ASCII 11
>> - any in-data tabs escaped with ASCII 4
>> 
>> Simple to write, lightning-fast to parse.
>> 
>> When you add up all the programmer- and end-user-hours lost to dealing
>> with the uniquely stupid collection of mish-mashed ad-hoc formats that
>> is CSV, it amounts to nothing less than a crime against humanity.
>> Several hundred if not thousands of human lifetimes have been lost
>> either dealing with bugs related to CSV parsers, or simply waiting for
>> the inherently slow parsing of CSV that could have taken mere
>> milliseconds if done in any saner format.
>> 
>> CSV must die.
>> 
>> Please help it die:  never write CSV exporters.
>> </rant>
>> 
>> --
>>  Richard Gaskin
>>  Fourth World
>>  Rev training and consulting: http://www.fourthworld.com
>>  Webzine for Rev developers: http://www.revjournal.com
>>  revJournal blog: http://revjournal.com/blog.irv
>> _______________________________________________
>> use-revolution mailing list
>> use-revolution at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-revolution
> 
> _______________________________________________
> use-revolution mailing list
> use-revolution at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-revolution




More information about the use-livecode mailing list