Another Revolution Success Story
Alex Tweedly
alex at tweedly.net
Sun Oct 17 19:57:14 EDT 2004
Reviving an old thread from a few months ago...
At 15:02 01/07/2004 -0700, Richard Gaskin wrote:
>If CSV were consistently implemented CSV2TabNew would work excellently
>right out of the box, but since some CSVs escape quotes by doubling them I
>needed to add one line (see below) to also substitute doubled quote chars
>with the quote placeholder.
>
>Bonus: since the added line reduces the number of quote characters, the
>function is now even faster.
>Here's CSV2Tab3:
>
>
>function CSV2Tab3 pData
> local tNuData -- contains tabbed copy of data
> local tReturnPlaceholder -- replaces cr in field data to avoid line
> -- breaks which would be misread as records;
> -- replaced later during display
> local tEscapedQuotePlaceholder -- used for keeping track of quotes
> -- in data
> local tInsideQuoted -- flag set while reading data between quotes
> --
> put numtochar(11) into tReturnPlaceholder -- vertical tab as
> -- placeholder
> put numtochar(2) into tEscapedQuotePlaceholder -- used to simplify
> -- distinction between quotes in data and those
> -- used in delimiters
> --
> -- Normalize line endings:
> replace crlf with cr in pData -- Win to UNIX
> replace numtochar(13) with cr in pData -- Mac to UNIX
> --
> -- Put placeholder in escaped quote (non-delimiter) chars:
> replace ("\" & quote) with tEscapedQuotePlaceholder in pData
> replace quote & quote with tEscapedQuotePlaceholder in pData --<NEW
> --
> put space before pData -- to avoid ambiguity of starting context
> split pData by quote
> put False into tInsideQuoted
> repeat for each element k in pData
> if (tInsideQuoted) then
> replace cr with tReturnPlaceholder in k
> put k after tNuData
> put False into tInsideQuoted
> else
> replace comma with tab in k
> put k after tNuData
> put true into tInsideQuoted
> end if
> end repeat
> --
> delete char 1 of tNuData -- remove the leading space
> replace tEscapedQuotePlaceholder with quote in tNuData
> return tNuData
>end CSV2Tab3
Unfortunately, there's a problem with this code; the heart of it is
split pData by quote
repeat for each element k in pData
-- build up new string
end repeat
and this is not guaranteed to work. The form "repeat for each element"
processes the elements in the order in which the array's keys are stored,
which is not necessarily numeric order. Normally it is the correct order
(split with only a primary delimiter produces an array whose keys are the
consecutive integers 1, 2, 3, ...), but there is no guarantee that the keys
will always come out in that sequence.
And I've found at least one case where they don't: I have a spreadsheet
that converts correctly up to 3904 lines, but add one more line and it fails
completely (verified with "put the keys of pData after msg").
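A check along the same lines can be scripted instead of eyeballed in the message box. This is only a sketch: the function name splitKeysInOrder is hypothetical (not part of LiveCode), and it simply splits the data the same way CSV2Tab3 does and reports whether the keys are listed as 1, 2, 3, ... in that order.

```livecode
-- Hypothetical helper (not a built-in): returns true if splitting pData
-- by quote yields keys that are listed in the order 1, 2, 3, ...
function splitKeysInOrder pData
   local tExpected, tKey
   split pData by quote
   put 0 into tExpected
   -- walk the keys in whatever order "the keys" reports them
   repeat for each line tKey in the keys of pData
      add 1 to tExpected
      if tKey is not tExpected then return false
   end repeat
   return true
end splitKeysInOrder
```

A false result means "repeat for each element" would visit the pieces out of sequence for that data.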
Changing
repeat for each element k in pData
to
repeat with tCounter = 1 to the number of lines in the keys of pData
put pData[tCounter] into k
solves it. It will obviously be slower, but "slow and correct" beats "fast
and wrong" :-)
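For reference, here is CSV2Tab3 with that change folded in. It is a sketch, not tested against the spreadsheet above; the name CSV2Tab4 is hypothetical, and apart from the loop header the logic is Richard's, unchanged.

```livecode
-- Sketch: CSV2Tab3 with the loop changed to visit elements by numeric key.
-- CSV2Tab4 is a hypothetical name used only for this example.
function CSV2Tab4 pData
   local tNuData, tReturnPlaceholder, tEscapedQuotePlaceholder
   local tInsideQuoted, tCounter, k
   put numToChar(11) into tReturnPlaceholder -- vertical tab stands in for cr inside fields
   put numToChar(2) into tEscapedQuotePlaceholder -- stands in for quotes that are data
   -- Normalize line endings (in LiveCode, cr is a linefeed)
   replace crlf with cr in pData
   replace numToChar(13) with cr in pData
   -- Hide escaped quotes (backslash-quote and doubled quotes)
   replace ("\" & quote) with tEscapedQuotePlaceholder in pData
   replace (quote & quote) with tEscapedQuotePlaceholder in pData
   put space before pData -- so element 1 is always outside quotes
   split pData by quote
   put false into tInsideQuoted
   -- Visit keys 1, 2, 3, ... explicitly instead of trusting key order
   repeat with tCounter = 1 to the number of lines in the keys of pData
      put pData[tCounter] into k
      if tInsideQuoted then
         replace cr with tReturnPlaceholder in k -- line break is field data
         put k after tNuData
         put false into tInsideQuoted
      else
         replace comma with tab in k -- commas here are field delimiters
         put k after tNuData
         put true into tInsideQuoted
      end if
   end repeat
   delete char 1 of tNuData -- remove the leading space
   replace tEscapedQuotePlaceholder with quote in tNuData
   return tNuData
end CSV2Tab4
```

For example, CSV2Tab4(quote & "x,y" & quote & ",z") should give "x,y" & tab & "z": the comma inside the quotes survives and the one outside becomes a tab.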
-- Alex.
More information about the use-livecode mailing list