CSV again.

Peter Haworth pete at lcsql.com
Tue May 15 19:35:23 EDT 2012


Thanks Alex.

I ran the same data through your new handler and it seems to have worked
fine.

There was a recent discussion of some of these corner cases on the
SQLite list, so I'll go grab their test cases and see what happens.

As for performance, the new handler took approximately 2.5 times longer
than the CSV3 version on my dataset of 48k rows by 17 columns, but that's
still only about 1 second, so definitely not a concern, as mentioned
previously.
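
A comparison like this can be timed with the milliseconds function. The
handler below is only a minimal sketch: the file path is a hypothetical
placeholder, and it assumes CSV4Tab (quoted below) is available in the
message path.

```livecode
on mouseUp
   local tData, tStart, tResult
   -- hypothetical path: substitute the CSV file you want to time
   put url "file:/path/to/data.csv" into tData
   put the milliseconds into tStart
   put CSV4Tab(tData, comma) into tResult
   -- report elapsed time and the number of records produced
   put "CSV4Tab took" && (the milliseconds - tStart) && "ms for" && \
         (the number of lines of tResult) && "records"
end mouseUp
```

Running the same handler against the CSV3 version on the same file gives
the two figures to compare.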

Pete
lcSQL Software <http://www.lcsql.com>



On Tue, May 15, 2012 at 3:54 PM, Alex Tweedly <alex at tweedly.net> wrote:

> On 15/05/2012 18:26, Bob Sneidar wrote:
>
>> <sigh>  Another good developer lost to the csv parsing chasm of hell. We
>> won't be hearing from Alex again. ;-)
>
> Don't worry Bob, I'm just a tourist here in the chasm, I'm not moving in
> :-)
>
> Pete - please try this out on your data. AFAICT it should handle all the
> cases discussed here, and has the added benefit of being simpler and
> (slightly) easier to understand. Also, it uses no "global replace"s, so it
> would be much easier to modify it to handle very large files by reading
> bufferfuls at a time.
>
> -- Alex.
>
>> function CSV4Tab pData, pcoldelim
>>    local tNuData -- contains tabbed copy of data
>>    local tReturnPlaceholder -- replaces cr in field data to avoid line
>>    --                       breaks which would be misread as records;
>>    local tStatus, theInsideStringSoFar
>>    --
>>    put numtochar(11) into tReturnPlaceholder -- vertical tab as placeholder
>>    --
>>    if pcoldelim is empty then put comma into pcoldelim
>>    -- Normalize line endings:
>>    replace crlf with cr in pData          -- Win to UNIX
>>    replace numtochar(13) with cr in pData -- Mac to UNIX
>>
>>    put "outside" into tStatus
>>    set the itemdel to quote
>>    repeat for each item k in pData
>>        switch tStatus
>>
>>            case "inside"
>>                put k after theInsideStringSoFar
>>                put "passedquote" into tStatus
>>                next repeat
>>
>>            case "passedquote"
>>                -- decide if it was a duplicated (escaped) quote or a closing quote
>>                if k is empty then   -- it's a duplicated quote
>>                    put quote after theInsideStringSoFar
>>                    put "inside" into tStatus
>>                    next repeat
>>                end if
>>                -- not empty - so we should have a delimiter here
>>                if char 1 of k = pcoldelim or char 1 of k = cr then
>>                    -- as we expect - we have just left the quoted string
>>                    replace cr with tReturnPlaceholder in theInsideStringSoFar
>>                    put theInsideStringSoFar after tNuData
>>                    -- and then deal with this outside item
>>                    -- by falling through into the 'outside' case
>>                else
>>                    put "bad logic"
>>                    break
>>                end if
>>
>>            case "outside"
>>                replace pcoldelim with numtochar(29) in k
>>                put k after tNuData
>>                put "inside" into tStatus
>>                put empty into theInsideStringSoFar
>>                next repeat
>>            default
>>                put "defaulted"
>>                break
>>        end switch
>>    end repeat
>>    -- if the data ends on a closing quote with nothing after it,
>>    -- the last quoted field has not been written out yet
>>    if tStatus is "passedquote" then
>>        replace cr with tReturnPlaceholder in theInsideStringSoFar
>>        put theInsideStringSoFar after tNuData
>>    end if
>>    return tNuData
>> end CSV4Tab
>>
>>

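For anyone trying the handler above, a small usage sketch may help. The
handler name testCSV4Tab and the sample data are illustrative assumptions
(they are not the SQLite list's test cases); the sample covers an embedded
comma, a doubled quote and an embedded return. It relies on what the quoted
code does: columns come back separated by numtochar(29), and returns inside
quoted fields come back as numtochar(11).

```livecode
on testCSV4Tab
   local tCSV, tTable, tRecord, tField, tOutput
   -- illustrative input only: embedded comma, doubled quote, embedded return
   put "id,note" & cr & \
         "1," & quote & "Smith, John" & quote & cr & \
         "2," & quote & "said " & quote & quote & "hi" & quote & quote & quote & cr & \
         "3," & quote & "line one" & cr & "line two" & quote & cr into tCSV
   put CSV4Tab(tCSV, comma) into tTable
   -- columns are separated by numtochar(29) in the converted data
   set the itemDelimiter to numToChar(29)
   repeat for each line tRecord in tTable
      put item 2 of tRecord into tField
      -- restore returns that were protected with the vertical-tab placeholder
      replace numToChar(11) with cr in tField
      put "[" & tField & "]" & cr after tOutput
   end repeat
   put tOutput -- show the second column of each record in the message box
end testCSV4Tab
```

Keeping the placeholder in place until a single field has been extracted
is what keeps each record on one line while the table is being walked.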