CSV again.
Alex Tweedly
alex at tweedly.net
Tue May 15 13:02:58 EDT 2012
Unfortunately, that's not enough to fix it, Peter.
The problem case you have identified is where the CSV exporter has
decided to quote even empty cells. This wasn't covered in the original
samples, or in any cases I've had to deal with.
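For comparison only, here is how Python's standard csv module (not the LiveCode function being discussed) handles a record whose empty cells are written bare and the same record with every empty cell quoted. Both should come out identical, and any converter has to match that in every position, including the first, interior and last cells.

```python
import csv
import io

def parse(text):
    """Parse CSV text with Python's standard csv module."""
    return list(csv.reader(io.StringIO(text)))

# The same record from an exporter that leaves empty cells bare,
# and from one that wraps even empty cells in quotes.
bare = ',abc,,def,\n'
quoted = '"",abc,"",def,""\n'

print(parse(bare))    # [['', 'abc', '', 'def', '']]
print(parse(quoted))  # [['', 'abc', '', 'def', '']]
```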
Your workaround uses the sequence <comma & quote & quote & comma> to
attempt to identify this case - but that only identifies it when it
occurs in the "interior" cells within a record (line). You'd need to
extend it to also cover the first cell in the line -
i.e. <CR & quote & quote & comma>
and the last cell on the line
i.e. <comma & quote & quote & CR>
and even the *only* cell on the line
i.e. <CR & quote & quote & CR>
and then subsequently un-replace each of those appropriately.
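To make the gap concrete, here is a rough Python rendering of the three-line workaround (it is not the LiveCode code itself). The two placeholder characters are hypothetical stand-ins for tEscapedQuotePlaceholder and numToChar(31). Only the interior empty cell is protected; in the first, last and only positions the empty quoted cell is still mistaken for an escaped quote.

```python
# Hypothetical placeholders: any characters that never occur in the data would do.
ESCAPED_QUOTE = "\x1e"  # plays the role of tEscapedQuotePlaceholder
PROTECTED = "\x1f"      # plays the role of numToChar(31)

def mark_escaped_quotes(data: str) -> str:
    """Protect ,"", and then treat every remaining "" as an escaped quote."""
    data = data.replace(',"",', PROTECTED)
    data = data.replace('""', ESCAPED_QUOTE)
    return data.replace(PROTECTED, ',"",')

# Records are CR-terminated, as in the discussion above.
records = {
    "interior": 'a,"",b\r',
    "first": '"",a,b\r',
    "last": 'a,b,""\r',
    "only": '""\r',
}
for name, rec in records.items():
    marked = mark_escaped_quotes(rec)
    print(name, "ok" if ESCAPED_QUOTE not in marked else "misread as escaped quote")
```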
BUT - there's an even worse problem - any of these sequences *can* occur
within a quoted string - e.g. abc,"this cell contains an escaped quote
,"", within it", another cell
Basically - the original idea ONLY works if the only time two quotes
appear as consecutive characters is as an escaped quote within a quoted
cell. (hmmm - that means there is another nasty corner case - where
the escaped quote appears as the first character within a quoted cell,
e.g. abc,"""quoted string""",def !!)
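As a reference for the output a rewritten converter should produce, Python's csv module (again not LiveCode) parses the two awkward examples above as shown below. The first example is joined onto one line here, on the assumption that the break in it is only e-mail wrapping.

```python
import csv
import io

examples = [
    'abc,"this cell contains an escaped quote ,"", within it", another cell\n',
    'abc,"""quoted string""",def\n',
]
for text in examples:
    print(next(csv.reader(io.StringIO(text))))
# ['abc', 'this cell contains an escaped quote ,", within it', ' another cell']
# ['abc', '"quoted string"', 'def']
```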
Fixing this is going to require checking for the doubled quote and
acting differently within the loop that alternates between 'inside' and
'outside' quoted cells; and of course that alternation depends on the
discovery of quotes (and hence needs to look ahead at subsequent
characters to detect the doubled cases).
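A minimal sketch of that kind of loop follows, written in Python rather than LiveCode. It makes a few assumptions: a comma delimiter, CR, LF or CRLF as record terminators, and a quote that opens a quoted cell only appears at the start of a cell. The function name parse_csv is hypothetical. Inside a quoted cell it looks one character ahead, so a doubled quote becomes a literal quote and a lone quote closes the cell.

```python
def parse_csv(data: str, delimiter: str = ",") -> list[list[str]]:
    """Character-by-character CSV parser alternating between inside and
    outside a quoted cell, with one character of look-ahead for ""."""
    records: list[list[str]] = []
    record: list[str] = []
    cell: list[str] = []
    in_quotes = False
    started = False  # has the current record consumed any characters yet?
    i, n = 0, len(data)
    while i < n:
        ch = data[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and data[i + 1] == '"':
                    cell.append('"')   # doubled quote: a literal quote character
                    i += 1             # consume the second quote of the pair
                else:
                    in_quotes = False  # a lone quote closes the cell
            else:
                cell.append(ch)        # commas and line breaks are data in here
        elif ch == '"':
            in_quotes = True           # opening quote (assumed to start the cell)
            started = True
        elif ch == delimiter:
            record.append("".join(cell))
            cell = []
            started = True
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and data[i + 1] == "\n":
                i += 1                 # treat CRLF as a single terminator
            record.append("".join(cell))
            records.append(record)
            record, cell, started = [], [], False
        else:
            cell.append(ch)
            started = True
        i += 1
    if started:                        # last record had no trailing line break
        record.append("".join(cell))
        records.append(record)
    return records
```

Because the quote handling lives inside the state machine, an empty quoted cell in any position, an escaped quote next to a comma, and an escaped quote as the first character of a cell all fall out of the same two rules.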
I'll have a go at re-writing it using that method - but it is basically
a re-write from scratch, so it may take an hour or two to make sure I've
got all the cases covered (and I don't yet have any prediction about the
performance).
If you could send me your test data off-list that would be helpful.
Thanks
-- Alex.
On 15/05/2012 02:00, Peter Haworth wrote:
> Hi Alex,
> Just to clarify, this was two double quotes with a comma right before
> and right after them, not an escaped double quote in the middle of a string.
>
> I've made a fix to this which works, subject to your approval.
>
> I changed the line:
>
> replace quote&quote with tEscapedQuotePlaceholder in pData
>
> to these three lines:
>
> replace comma & quote & quote & comma with numToChar(31) in pData
> replace quote&quote with tEscapedQuotePlaceholder in pData
> replace numToChar(31) with comma & quote & quote & comma in pData
>
> That seems to have fixed it.
>
>
> Pete
> lcSQL Software<http://www.lcsql.com>
>
>
>
> On Mon, May 14, 2012 at 2:50 PM, Peter Haworth<pete at lcsql.com> wrote:
>
>> However, I have found another corner case and that is two consecutive
>> double quote characters with no intervening characters. I'm still checking
>> into it for sure, but it looks like what happens with that after running it
>> through your function is a single quote character. Any thoughts on that?
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>