CSV again.
Alex Tweedly
alex at tweedly.net
Wed May 16 18:00:46 EDT 2012
On 16/05/2012 00:35, Peter Haworth wrote:
> Thanks Alex.
>
> I ran the same data through your new handler and it seems to have worked
> fine.
>
> There was a recent discussion on some of these corner case issues on the
> sqlite list so I'll go grab their test cases and see what happens.
>
> As far as performance goes, the new handler took approx 2 1/2 times longer
> than the CSV3 version on my 48k rows/17 columns dataset, but that's still
> only about 1 second, so definitely not a concern, as mentioned previously.
>
I tried it out with this new test data. It has the odd characteristic of
having partially quoted strings within the cell content; I've adjusted
the script to allow for that (by removing one logic check). I've also
added a line that appends an extra empty item at the end of a line whenever
the last item is already empty (i.e. to deal with LiveCode's habit of
ignoring a trailing empty item).
With these changes, csv4Tab() gets the same results as the original
csv2Tab() did, and they fit with what I think is correct for this
strange data set :-)
Performance is still better than csv2Tab was, but sadly not as quick as
(the incorrect) csv3Tab was.
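For anyone who hasn't run into the trailing-item behaviour before, here is a minimal sketch of it. The mouseUp handler and the sample strings are only illustrative assumptions, not part of the handler below.

```livecode
on mouseUp
   local tLine
   set the itemDelimiter to comma
   -- a single trailing delimiter does not create a final empty item
   put "a,b," into tLine
   put (the number of items of tLine) & cr after msg -- reports 2
   -- doubling the delimiter is what makes the empty last item countable
   put "a,b,," into tLine
   put (the number of items of tLine) & cr after msg -- reports 3
end mouseUp
```

That is why the script doubles the delimiter whenever it sits directly in front of a line break.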
> function CSV4Tab pData,pcoldelim
>    local tNuData -- contains tabbed copy of data
>    local tReturnPlaceholder -- replaces cr in field data to avoid line
>    -- breaks which would be misread as records;
>    local tNuDelim -- new character to replace the delimiter
>    local tStatus, theInsideStringSoFar
>    --
>    put numtochar(11) into tReturnPlaceholder -- vertical tab as placeholder
>    put numtochar(29) into tNuDelim
>    --
>    if pcoldelim is empty then put comma into pcoldelim
>    -- Normalize line endings:
>    replace crlf with cr in pData -- Win to UNIX
>    replace numtochar(13) with cr in pData -- Mac to UNIX
>
>    put "outside" into tStatus
>    set the itemdel to quote
>    repeat for each item k in pData
>       -- put tStatus && k & CR after msg
>       switch tStatus
>
>          case "inside"
>             put k after theInsideStringSoFar
>             put "passedquote" into tStatus
>             next repeat
>
>          case "passedquote"
>             -- decide if it was a duplicated escapedQuote or a closing quote
>             if k is empty then -- it's a duplicated quote
>                put quote after theInsideStringSoFar
>                put "inside" into tStatus
>                next repeat
>             end if
>             -- not empty - so we remain inside the cell, though we have
>             -- left the quoted section
>             -- NB this allows for quoted sub-strings within the cell content !!
>             replace cr with tReturnPlaceholder in theInsideStringSoFar
>             put theInsideStringSoFar after tNuData
>             -- no break here: falls through so that k is handled as "outside"
>
>          case "outside"
>             replace pcoldelim with tNuDelim in k
>             -- and deal with the "empty trailing item" issue in Livecode
>             replace (tNuDelim & CR) with tNuDelim & tNuDelim & CR in k
>             put k after tNuData
>             put "inside" into tStatus
>             put empty into theInsideStringSoFar
>             next repeat
>          default
>             put "defaulted"
>             break
>       end switch
>    end repeat
>    return tNuData
> end CSV4Tab
>
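A minimal sketch of how the handler might be called, assuming it is in the message path of the calling script. The mouseUp handler and the one-line sample are illustrative only. Note that, despite the name, the handler as posted returns numtochar(29) as the column delimiter and numtochar(11) in place of returns inside cells, so the caller converts afterwards.

```livecode
on mouseUp
   local tCSV, tTabbed
   -- one record whose middle cell contains a comma inside quotes
   put "a," & quote & "b,c" & quote & ",d" & cr into tCSV
   put CSV4Tab(tCSV, comma) into tTabbed
   -- swap the internal delimiter for a real tab
   replace numToChar(29) with tab in tTabbed
   -- the record should now hold three tab-separated cells: a / b,c / d
   put tTabbed
end mouseUp
```

Any numtochar(11) placeholders are left in place here; what to turn them back into depends on where the data is going next.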
-- Alex.