Stupid CSV tricks

Craig Spooner cspooner at lamar.colostate.edu
Thu Jun 13 13:43:01 EDT 2002


Richard,

I don't know if this will be helpful, but I faced this problem with a 
project a year ago.  In my case, I had the data in an Excel file and wanted 
to export it to a tab-delimited text file, to be read by my MC app.  I believe 
even tab-delimited files will get the extra quotes around fields that 
contain commas, so I did a search-and-replace in Excel to find all of the 
real quotes and replace them with something like "{{".  After exporting 
from Excel, I knew that any quotes that appeared were put there by Excel 
and could be removed by my application.  I then replaced "{{" with quotes, 
and I was back in business.

Here's a piece of the actual code:

       replace quote with "" in gMasterList -- remove quotes inserted by Excel
       replace "{{" with quote in gMasterList -- replace orig quotes
       replace tab & space with tab in gMasterList -- remove leading spaces
       replace space & tab with tab in gMasterList -- remove trailing spaces

That may be just the type of kludge you're trying to avoid, but I offer it 
up in case there's a kernel of an idea you can build on.
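
For anyone who wants to try the same round trip outside MetaCard, here is a minimal sketch in Python (not MetaTalk, purely for illustration). The function name clean_excel_export and its placeholder parameter are my own inventions, and it assumes the real quotes were already swapped for "{{" in Excel before exporting:

```python
def clean_excel_export(text: str, placeholder: str = "{{") -> str:
    """Undo the quotes Excel adds to a tab-delimited export.

    Assumes every genuine quote was replaced with `placeholder`
    in Excel before the file was exported.
    """
    # Any quote still present was added by Excel on export, so drop it.
    text = text.replace('"', "")
    # Put the genuine quotes back where the placeholder stands.
    text = text.replace(placeholder, '"')
    # Drop one space directly after or before a tab, mirroring the
    # two space-trimming replace lines in the MetaTalk snippet.
    text = text.replace("\t ", "\t")
    text = text.replace(" \t", "\t")
    return text
```

As with the original, this only works if the placeholder never occurs in the real data, and each of the last two replacements removes a single space per tab rather than a whole run of them.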

regards,
Craig Spooner



>Date: Wed, 12 Jun 2002 14:22:53 -0700
>From: Richard Gaskin <ambassador at FourthWorld.com>
>Subject: Stupid CSV tricks
>To: MetaCard List <metacard at lists.runrev.com>
>Reply-To: metacard at lists.runrev.com
>
>Some implementations of the CSV format do not consistently use
>comma-separated values.  Microsoft products and others use a comma only for
>numeric values, with all others designated as text by enclosing them in
>quotes, effectively using a quote-comma-quote delimiter.
>
>To make parsing such files even more of a challenge, a quoted string can
>contain any character, including commas and returns.
>
>I've tried a number of algorithms for parsing these files efficiently, and
>even explored the issue with Ken Ray and others, and the only robust
>algorithm we've come up with yet is one which walks through each of the
>characters to determine what is a delimiter and what is part of the data.
>
>I'd like to find a faster method, but thus far all attempts at using the
>replace command and replacetext function have fallen short in one way or
>another.
>
>Considering the ubiquity of this format, I would imagine I'm not the first
>MetaTalker who has needed to parse CSV.  Has anyone found an algorithm
>faster than walking through the chars?
>
>--
>  Richard Gaskin
>  Fourth World Media Corporation
>  Custom Software and Web Development for All Major Platforms
>  Developer of WebMerge 2.0: Publish any Database on Any Site
>  ___________________________________________________________
>  Ambassador at FourthWorld.com       http://www.FourthWorld.com
>  Tel: 323-225-3717                       AIM: FourthWorldInc
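
For reference, the character-walking approach described in the quoted message might look roughly like the sketch below. It is written in Python rather than MetaTalk, the function name parse_csv is hypothetical, and it assumes rows end with a newline character and that a literal quote inside a quoted field is written as a doubled quote. It is not the code Richard or Ken Ray actually used.

```python
def parse_csv(text: str) -> list[list[str]]:
    """Split CSV text into rows of fields by walking every character."""
    rows, row, field = [], [], []
    in_quotes = False
    prev = ""
    for ch in text:
        if ch == '"':
            # A quote that immediately follows a closing quote is an
            # escaped (doubled) quote, so keep one literal quote.
            if not in_quotes and prev == '"':
                field.append('"')
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            # A comma outside quotes ends the current field.
            row.append("".join(field))
            field = []
        elif ch == "\n" and not in_quotes:
            # A newline outside quotes ends the current row.
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            # Everything else, including commas and newlines inside
            # quotes, is data.
            field.append(ch)
        prev = ch
    # Flush the last row when the text does not end with a newline.
    if prev and not (prev == "\n" and not in_quotes):
        row.append("".join(field))
        rows.append(row)
    return rows
```

The only state carried between characters is whether we are inside a quoted field and what the previous character was, which is what lets commas and returns inside quotes pass through as data.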



