A chunking mystery
Richard Gaskin
ambassador at fourthworld.com
Fri Mar 14 11:30:01 EDT 2008
David Coker wrote:
>> Consider (for starters) what happens if you have embedded commas in a
>> line like this:
>>
>> "Los Angeles, CA",0,1
>>
>> Column 3 in your CSV file is "1"
>> Item 3 is "0"
>>
>> = Trouble!
>
> You are certainly correct there, my friend!
> In this case though, as part of the "process" for the raw source files
> that are already in a somewhat usable CSV format, I first convert the
> data into a piped "|" format, then strip out any extra quotes and
> commas *and* shuffle the target field up to the 2nd item position.
> Afterwards, I convert the data back to CSV format before doing any
> additional processing.
Have you considered converting to tab-delimited instead of back to CSV?
Because commas appear so frequently in real data, comma-delimited files
are notoriously cumbersome to parse.
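
To make the quoted example concrete, here is a minimal sketch in Python (used only for illustration, not LiveCode). It assumes the simplest quoting convention, where a field containing a comma is wrapped in double quotes, and compares a naive split on commas with Python's standard csv module. The variable names are made up for this example.

```python
import csv
import io

# The line from the quoted example: the first field contains a comma.
line = '"Los Angeles, CA",0,1'

# Naive chunking: treat every comma as a delimiter, ignoring quotes.
naive_items = line.split(",")

# Quote-aware parsing: the csv module keeps the quoted field intact.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(naive_items[2])    # third comma-separated "item"
print(parsed_fields[2])  # third actual column
```

The naive split sees four items rather than three columns, so "item 3" and "column 3" no longer refer to the same value.
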
With my WebMerge product I have to handle just about any delimited
columnar format, and it converts everything to tab-delimited internally
for simpler processing and to help me with debugging, since I can dump
contents into a field in one move and they naturally display in columns.
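
As a rough illustration of the convert-to-tabs idea (a sketch under stated assumptions, not how WebMerge itself is implemented), the Python function below reads CSV text with the standard csv module and writes each row back out with tab delimiters. The name csv_to_tab is hypothetical, and replacing any tabs or line breaks inside a field with spaces is an assumption made here so that one record stays on one line.

```python
import csv
import io


def csv_to_tab(csv_text: str) -> str:
    """Convert CSV text to tab-delimited text (hypothetical helper)."""
    rows = csv.reader(io.StringIO(csv_text, newline=""))
    out_lines = []
    for row in rows:
        cleaned = []
        for field in row:
            # Assumption: tabs and line breaks inside a field become
            # spaces, so they cannot be mistaken for delimiters later.
            for ch in ("\t", "\r", "\n"):
                field = field.replace(ch, " ")
            cleaned.append(field)
        out_lines.append("\t".join(cleaned))
    return "\n".join(out_lines)


sample = '"Los Angeles, CA",0,1\nBoston,2,3\n'
converted = csv_to_tab(sample)
print(converted)
```

Once the data is in this form, splitting each line on tabs gives the real columns, embedded commas included.
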
At the risk of sounding like I'm on a crusade against CSV (I am), it may
be even better to go back to the source of the documents and request a
version in a less ridiculous format. ;)
I understand that may not be possible, but damn! it amazes me how many
programs continue to perpetuate the fundamental conceptual error that is
CSV (or rather "are", since there is no single CSV specification;
indeed, even Microsoft uses different variants among their apps,
sometimes switching escaping conventions between versions).
I dream of the day when CSV dies the natural death that should have
made it extinct decades ago, if only we lived in a rational world.
For my own part, in every app I write that exports data, we support
tab-delimited, XML, and others, but I cannot in good conscience export
to any flavor of CSV.
I would estimate that the world has lost several million programmer
hours to dealing with the inherent idiocies of CSV since it premiered.
Imagine what could have been accomplished if that time were put into
features....
--
Richard Gaskin
Fourth World Media Corporation
Developer of WebMerge: Publish any database on any Web site
___________________________________________________________
Ambassador at FourthWorld.com http://www.FourthWorld.com