CSV again.

Alex Tweedly alex at tweedly.net
Tue May 15 13:02:58 EDT 2012


Unfortunately, that's not enough to fix it, Peter.

The problem case you have identified is where the CSV exporter has 
decided to quote even empty cells. This wasn't covered in the original 
samples, or in any cases I've had to deal with.

Your workaround uses the sequence <comma & quote & quote & comma> to 
attempt to identify this case - but that only identifies it when it 
occurs in the "interior" cells within a record (line). You'd need to 
extend it to also cover the first cell in the line -
   i.e. <CR & quote & quote & comma>
and the last cell on the line
   i.e. <comma & quote & quote & CR>
and even the *only* cell on the line
   i.e. <CR & quote & quote & CR>

and then subsequently un-replace each of those appropriately.
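
To make the gap concrete, here is a rough sketch (in Python rather than LiveCode, purely for illustration) that checks which positions of an empty quoted cell the interior-only sequence actually matches. The sample lines and the names `INTERIOR`, `samples` and `hits` are made up for this example.

```python
# The interior-only pattern from the workaround: comma, quote, quote, comma.
INTERIOR = ',"",'

# One sample line per position of the empty quoted cell (hypothetical data).
samples = {
    "first": '"",a,b',
    "interior": 'a,"",b',
    "last": 'a,b,""',
    "only": '""',
}

# Record whether the interior pattern is present in each sample line.
hits = {position: INTERIOR in line for position, line in samples.items()}
print(hits)
```

Only the "interior" sample matches; the first, last and only-cell cases slip through unless the line delimiter is included in the patterns as well.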

BUT - there's an even worse problem - any of these sequences *can* occur 
within a quoted string, e.g.
   abc,"this cell contains an escaped quote ,"", within it", another cell

Basically - the original idea ONLY works if the only time two quotes 
appear as consecutive characters is as an escaped quote within a quoted 
cell.    (hmmm - that means there is another nasty corner case - where 
the escaped quote appears as the first character within a quoted cell, 
e.g. abc,"""quoted string""",def !!)
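
As a cross-check on how these lines ought to come out, here is a small sketch using Python's standard csv module (not LiveCode, and not the function under discussion), on the assumption that its default dialect treats a doubled quote inside a quoted cell as one literal quote. The helper name `parse_line` is made up for this example.

```python
import csv

def parse_line(line):
    """Parse one CSV line with the stdlib reader and return its cells."""
    return next(csv.reader([line]))

# An empty quoted cell in the interior of a record.
empty_cell = 'a,"",b'
# The same comma-quote-quote-comma sequence, but inside a quoted cell.
embedded = 'abc,"this cell contains an escaped quote ,"", within it", another cell'
# An escaped quote as the first character within a quoted cell.
leading = 'abc,"""quoted string""",def'

for line in (empty_cell, embedded, leading):
    print(parse_line(line))
```

Both of the first two lines contain the sequence comma, quote, quote, comma, yet it means an empty cell in one and an escaped quote in the other, so a blind replace cannot tell them apart.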

Fixing this is going to require checking for the doubled quote and 
acting differently within the loop that alternates between 'inside' and 
'outside' quoted cells; and of course that alternation depends on the 
discovery of quotes (and hence needs to look ahead at subsequent 
characters to detect the doubled cases).
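
A minimal sketch of that kind of loop, written in Python rather than LiveCode and not the promised re-write itself: it scans character by character, toggles between 'inside' and 'outside', and looks ahead one character whenever it meets a quote inside a quoted cell. It assumes lines end with a single "\n" and that the input is well formed; the name `parse_csv` and its parameters are made up for this example.

```python
def parse_csv(data, delimiter=",", quote='"'):
    """Split CSV text into a list of records, each a list of cell strings."""
    records, record, cell = [], [], []
    inside = False          # True while scanning within a quoted cell
    line_started = False    # True once the current record has any content
    i, n = 0, len(data)
    while i < n:
        ch = data[i]
        if inside:
            if ch == quote:
                # Look ahead: a doubled quote is one literal quote.
                if i + 1 < n and data[i + 1] == quote:
                    cell.append(quote)
                    i += 1
                else:
                    inside = False   # a lone quote closes the cell
            else:
                cell.append(ch)      # commas and newlines are plain data here
        elif ch == quote:
            inside = True
            line_started = True
        elif ch == delimiter:
            record.append("".join(cell))
            cell = []
            line_started = True
        elif ch == "\n":
            record.append("".join(cell))
            records.append(record)
            record, cell = [], []
            line_started = False
        else:
            cell.append(ch)
            line_started = True
        i += 1
    if line_started:
        # Flush a final record that has no trailing newline.
        record.append("".join(cell))
        records.append(record)
    return records

sample = '"",a,""\n""\nabc,"""quoted string""",def\nabc,"x ,"", y",z'
for rec in parse_csv(sample):
    print(rec)
```

Because an empty quoted cell is handled as "open, then immediately close", the first, interior, last and only-cell positions need no special treatment, and no placeholder characters are involved.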

I'll have a go at re-writing it using that method - but it is basically 
a re-write from scratch, so it may take an hour or two to make sure I've 
got all the cases covered (and I don't yet have any prediction about the 
performance).

If you could send me your test data off-list that would be helpful.

Thanks
-- Alex.

On 15/05/2012 02:00, Peter Haworth wrote:
> Hi Alex,
> Just to clarify, this was two double quotes with a comma right before
> and right after them, not an escaped double quote in the middle of a string.
>
> I've made a fix to this which works, subject to your approval
>
> I changed the line:
>
> replace quote & quote with tEscapedQuotePlaceholder in pData
>
>
> to these three lines:
>
>
> replace comma & quote & quote & comma with numToChar(31) in pData
>
> replace quote & quote with tEscapedQuotePlaceholder in pData
>
> replace numToChar(31) with comma & quote & quote & comma in pData
>
>
> That seems to have fixed it.
>
>
> Pete
> lcSQL Software<http://www.lcsql.com>
>
>
>
> On Mon, May 14, 2012 at 2:50 PM, Peter Haworth<pete at lcsql.com>  wrote:
>
>> However, I have found another corner case and that is two consecutive
>> double quote characters with no intervening characters.  I'm still checking
>> into it for sure, but it looks like what happens with that after running it
>> through your function is a single quote character.  Any thoughts on that?