Another Revolution Success Story

Alex Tweedly alex at tweedly.net
Sun Oct 17 19:57:14 EDT 2004


Reviving an old thread from a few months ago ....

At 15:02 01/07/2004 -0700, Richard Gaskin wrote:

>If CSV were consistently implemented CSV2TabNew would work excellently 
>right out of the box, but since some CSVs escape quotes by doubling them I 
>needed to add one line (see below) to also substitute doubled quote chars 
>with the quote placeholder.
>
>Bonus: since the added line reduces the number of quote characters, the 
>function is now even faster.

>Here's CSV2Tab3:
>
>
>function CSV2Tab3 pData
>   local tNuData -- contains tabbed copy of data
>   local tReturnPlaceholder -- replaces cr in field data to avoid line
>   --                       breaks which would be misread as records;
>   --                       replaced later during display
>   local tEscapedQuotePlaceholder -- used for keeping track of quotes
>   --                       in data
>   local tInsideQuoted -- flag set while reading data between quotes
>   --
>   put numtochar(11) into tReturnPlaceholder -- vertical tab as
>   --                       placeholder
>   put numtochar(2)  into tEscapedQuotePlaceholder -- used to simplify
>   --                       distinction between quotes in data and those
>   --                       used in delimiters
>   --
>   -- Normalize line endings:
>   replace crlf with cr in pData          -- Win to UNIX
>   replace numtochar(13) with cr in pData -- Mac to UNIX
>   --
>   -- Put placeholder in escaped quote (non-delimiter) chars:
>   replace ("\"&quote) with tEscapedQuotePlaceholder in pData
>   replace quote&quote with tEscapedQuotePlaceholder in pData --<NEW
>   --
>   put space before pData   -- to avoid ambiguity of starting context
>   split pData by quote
>   put False into tInsideQuoted
>   repeat for each element k in pData
>     if (tInsideQuoted) then
>       replace cr with tReturnPlaceholder in k
>       put k after tNuData
>       put False into tInsideQuoted
>     else
>       replace comma with tab in k
>       put k after tNuData
>       put true into tInsideQuoted
>     end if
>   end repeat
>   --
>   delete char 1 of tNuData -- remove the leading space
>   replace tEscapedQuotePlaceholder with quote in tNuData
>   return tNuData
>end CSV2Tab3

Unfortunately, there's a problem with this code. The heart of it is
   split pData by quote
   repeat for each element k in pData
      -- build up new string
   end repeat

and this is not guaranteed to work. The form "repeat for each element" 
processes the elements in the order in which the keys of the array are 
returned. Normally that is the correct order (because split with only a 
primary separator produces an array whose keys are consecutive integers), 
but there is no guarantee that the keys will always come back in ascending 
order.

And I've found at least one case where they're not - I have a spreadsheet 
which works just fine up to 3904 lines - but add one more line and it fails 
completely.
(verified by "put the keys of pData after msg")
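
The same check can be done in script rather than by eye. The handler below 
is only a sketch: keysAreInOrder is a hypothetical helper name (it is not 
part of Richard's function), and it assumes that "the keys" lists the keys 
in the same order that "repeat for each element" visits them.

   function keysAreInOrder pArray
      local tExpected, tKey
      put 0 into tExpected
      -- the keys of an array come back as a return-delimited list
      repeat for each line tKey in the keys of pArray
         add 1 to tExpected
         -- the first key that is out of sequence settles it
         if tKey is not tExpected then return false
      end repeat
      return true
   end keysAreInOrder

Calling it on pData straight after the split should return false for a 
file that shows the problem.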

Changing
   repeat for each element k in pData
to
   repeat with tCounter = 1 to the number of lines in the keys of pData
     put pData[tCounter] into k

solves it. Obviously it will be slower - but "slow and correct" beats "fast 
and wrong" :-)
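
Put together, the changed loop might look like the sketch below. It assumes 
the rest of CSV2Tab3 stays exactly as quoted above; tCounter is a new local, 
and toggling the flag with "not" is just shorthand for the two separate 
puts in the original.

   split pData by quote
   put false into tInsideQuoted
   -- walk the keys in explicit numeric order instead of relying on
   -- the order in which "repeat for each element" visits them
   repeat with tCounter = 1 to the number of lines in the keys of pData
      put pData[tCounter] into k
      if tInsideQuoted then
         -- inside a quoted field: protect embedded line breaks
         replace cr with tReturnPlaceholder in k
      else
         -- outside quotes: commas are field delimiters
         replace comma with tab in k
      end if
      put k after tNuData
      put not tInsideQuoted into tInsideQuoted
   end repeat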

-- Alex.



