A chunking mystery

Richard Gaskin ambassador at fourthworld.com
Fri Mar 14 11:30:01 EDT 2008


David Coker wrote:

 >>  Consider (for starters) what happens if you have embedded commas in a
 >>  line like this:
 >>
 >>  "Los Angeles, CA",0,1
 >>
 >>  Column 3 in your CSV file is "1"
 >>  Item 3 is "0"
 >>
 >>  = Trouble!
 >
 > You are certainly correct there, my friend!
 > In this case though, as part of the "process" for the raw source files
 > that are already in a somewhat usable CSV format, I first convert the
 > data into a piped "|" format, then strip out any extra quotes and
 > commas *and* shuffle the target field up to the 2nd item position.
 > Afterwards, I convert the data back to CSV format before doing any
 > additional processing.

Have you considered converting to tab-delimited instead of back to CSV?

Because commas appear so frequently in the data itself, comma-delimited
files are notoriously cumbersome to parse.
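
To make the quoted example concrete, here is a rough sketch of the
mismatch. It's in Python rather than Rev, purely for illustration, and
it assumes the double-quote convention for embedded commas (only one of
several CSV conventions in the wild). The variable names are made up for
the example.

```python
import csv
import io

# The line from the example above: a quoted field with an embedded comma.
line = '"Los Angeles, CA",0,1'

# Naive chunking: split on every comma, ignoring the quotes.
naive_items = line.split(",")

# Quote-aware parsing with Python's standard csv module.
parsed_columns = next(csv.reader(io.StringIO(line)))

print(naive_items[2])     # item 3 by naive splitting: 0
print(parsed_columns[2])  # column 3 as a quote-aware reader sees it: 1
```

The naive split yields four items instead of three, which is why item 3
and column 3 disagree.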

With my WebMerge product I have to handle just about any delimited 
columnar format, and it converts everything to tab-delimited internally 
for simpler processing and to help me with debugging, since I can dump 
contents into a field in one move and they naturally display in columns.
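
For what it's worth, the general idea of normalizing to tab-delimited
can be sketched like this. This is not WebMerge's code, just a minimal
Python illustration; csv_to_tsv is a hypothetical helper, and it assumes
double-quoted CSV input and that any tabs or line breaks inside a field
can safely be replaced with spaces.

```python
import csv
import io

def csv_to_tsv(csv_text):
    """Convert double-quoted CSV text to tab-delimited text (hypothetical helper)."""
    # newline="" leaves line endings untouched so the csv reader can handle them.
    rows = csv.reader(io.StringIO(csv_text, newline=""))
    out_lines = []
    for row in rows:
        # Assumption: tabs and line breaks inside a field are not needed,
        # so they are replaced with spaces to keep one record per line.
        cleaned = [
            field.replace("\t", " ").replace("\r", " ").replace("\n", " ")
            for field in row
        ]
        out_lines.append("\t".join(cleaned))
    return "\n".join(out_lines)

tsv = csv_to_tsv('"Los Angeles, CA",0,1\nBoston,2,3')
first_row = tsv.split("\n")[0].split("\t")
print(first_row[2])  # the third column of the first row
```

Once the data is in this form, splitting a line on tabs gives the same
columns the original file intended, embedded commas and all.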

At the risk of sounding like I'm on a crusade against CSV (I am), it may 
be even better to go back to the source of the documents and request a 
version in a less ridiculous format. ;)

I understand that may not be possible, but damn! it amazes me how many 
programs continue to perpetuate the fundamental conceptual error that is 
CSV (or rather "are", since there is no single CSV specification; 
indeed, even Microsoft uses different variants among their apps, 
sometimes switching escaping conventions between versions).

I dream of the day when CSV dies the natural death that should have
made it extinct decades ago, if only we lived in a rational world.

For my own part, every app I write that exports data supports
tab-delimited, XML, and others, but I cannot in good conscience export
to any flavor of CSV.

I would estimate that the world has lost several million programmer 
hours to dealing with the inherent idiocies of CSV since it premiered. 
Imagine what could have been accomplished if that time were put into 
features....

-- 
  Richard Gaskin
  Fourth World Media Corporation
  Developer of WebMerge: Publish any database on any Web site
  ___________________________________________________________
  Ambassador at FourthWorld.com       http://www.FourthWorld.com


