CSV again.
Bob Sneidar
bobs at twft.com
Tue May 15 13:26:41 EDT 2012
<sigh> Another good developer lost to the CSV parsing chasm of hell. We won't be hearing from Alex again. ;-)
Bob
On May 15, 2012, at 10:02 AM, Alex Tweedly wrote:
> Unfortunately, that's not enough to fix it, Peter.
>
> The problem case you have identified is where the CSV exporter has decided to quote even empty cells. This wasn't covered in the original samples, or in any cases I've had to deal with.
>
> Your workaround uses the sequence <comma & quote & quote & comma> to attempt to identify this case - but that only identifies it when it occurs in the "interior" cells within a record (line). You'd need to extend it to also cover the first cell in the line -
> i.e. <CR & quote & quote & comma>
> and the last cell on the line
> i.e. <comma & quote & quote & CR>
> and even the *only* cell on the line
> i.e. <CR & quote & quote & CR>
>
> and then subsequently un-replace each of those appropriately.
>
> BUT - there's an even worse problem - any of these sequences *can* occur within a quoted string - e.g. abc,"this cell contains an escaped quote ,"", within it", another cell
>
> Basically - the original idea ONLY works if the only time two quotes appear as consecutive characters is as an escaped quote within a quoted cell. (hmmm - that means there is another nasty corner case - where the escaped quote appears as the first character within a quoted cell, e.g. abc,"""quoted string""",def !!)
>
> Fixing this is going to require checking for the doubled quote and acting differently within the loop that alternates between 'inside' and 'outside' quoted cells; and of course that alternation depends on the discovery of quotes (and hence needs to look ahead at subsequent characters to detect the doubled cases).
>
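For anyone following along, the inside/outside loop with look-ahead that Alex describes can be sketched roughly as below. This is only an illustration, written in Python rather than LiveCode, and it is not Alex's rewrite: the function name parse_csv, its parameters, and the assumption that records end with a bare linefeed are all mine.

```python
def parse_csv(text, delimiter=",", quote='"'):
    """Hypothetical helper: split CSV text into a list of rows of cells.

    Assumes records are separated by "\n" (an assumption, not from the thread).
    """
    rows, row, cell = [], [], []
    in_quotes = False       # are we currently inside a quoted cell?
    line_started = False    # has the current record seen any character yet?
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == quote:
                # Look ahead: a doubled quote is one literal quote character.
                if i + 1 < n and text[i + 1] == quote:
                    cell.append(quote)
                    i += 1  # skip the second quote of the pair
                else:
                    in_quotes = False  # a lone quote closes the cell
            else:
                cell.append(ch)  # commas and linefeeds are plain data here
        elif ch == quote:
            in_quotes = True
            line_started = True
        elif ch == delimiter:
            row.append("".join(cell))
            cell = []
            line_started = True
        elif ch == "\n":
            row.append("".join(cell))
            rows.append(row)
            row, cell = [], []
            line_started = False
        else:
            cell.append(ch)
            line_started = True
        i += 1
    if line_started:  # final record without a trailing linefeed
        row.append("".join(cell))
        rows.append(row)
    return rows


# The two awkward examples from Alex's message, plus quoted empty cells.
print(parse_csv('abc,"this cell contains an escaped quote ,"", within it", another cell'))
print(parse_csv('abc,"""quoted string""",def'))
print(parse_csv('"",a,"",b\n""\nx,""'))
```

Because a quote is only ever interpreted relative to the current inside/outside state, a quoted empty cell and an escaped quote never get confused, so no placeholder substitution is needed.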
> I'll have a go at re-writing it using that method - but it is basically a re-write from scratch, so it may take an hour or two to make sure I've got all the cases covered (and I don't yet have any prediction about the performance).
>
> If you could send me your test data off-list that would be helpful.
>
> Thanks
> -- Alex.
>
> On 15/05/2012 02:00, Peter Haworth wrote:
>> Hi Alex,
>> Just to clarify, this was two double quotes with a comma right before
>> and right after them, not an escaped double quote in the middle of a string.
>>
>> I've made a fix to this which works, subject to your approval.
>>
>> I changed the line:
>>
>> *replace* quote & quote with tEscapedQuotePlaceholder in pData
>>
>> to these three lines:
>>
>> *replace* comma & quote & quote & comma with numToChar(31) in pData
>> *replace* quote & quote with tEscapedQuotePlaceholder in pData
>> *replace* numToChar(31) with comma & quote & quote & comma in pData
>>
>>
>> That seems to have fixed it.
>>
>>
>> Pete
>> lcSQL Software<http://www.lcsql.com>
>>
>>
>>
>> On Mon, May 14, 2012 at 2:50 PM, Peter Haworth<pete at lcsql.com> wrote:
>>
>>> However, I have found another corner case and that is two consecutive
>>> double quote characters with no intervening characters. I'm still checking
>>> into it for sure, but it looks like what happens with that after running it
>>> through your function is a single quote character. Any thoughts on that?
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>