CSV again.

Alex Tweedly alex at tweedly.net
Tue May 15 18:54:37 EDT 2012


On 15/05/2012 18:26, Bob Sneidar wrote:
> <sigh>  Another good developer lost to the csv parsing chasm of hell. We won't be hearing from Alex again. ;-)
>
Don't worry Bob, I'm just a tourist here in the chasm, I'm not moving in :-)

Pete - please try this out on your data. AFAICT it should handle all the 
cases discussed here, and has the added benefit of being simpler and 
(slightly) easier to understand. Also, apart from normalizing line
endings, it uses no "global replace"s, so it would be much easier to
modify it to handle very large files by reading them a bufferful at a time.

-- Alex.

> function CSV4Tab pData,pcoldelim
>     local tNuData -- converted copy of data; fields separated by numtochar(29)
>     local tReturnPlaceholder -- replaces cr in field data to avoid line
>     --                       breaks which would be misread as records;
>     local tStatus, theInsideStringSoFar
>     --
>     put numtochar(11) into tReturnPlaceholder -- vertical tab as placeholder
>     --
>     if pcoldelim is empty then put comma into pcoldelim
>     -- Normalize line endings:
>     replace crlf with cr in pData          -- Win to UNIX
>     replace numtochar(13) with cr in pData -- Mac to UNIX
>
>     put "outside" into tStatus
>     set the itemdel to quote
>     repeat for each item k in pData
>         switch tStatus
>
>             case "inside"
>                 put k after theInsideStringSoFar
>                 put "passedquote" into tStatus
>                 next repeat
>
>             case "passedquote"
>                 -- decide if it was a duplicated (escaped) quote or a closing quote
>                 if k is empty then   -- it's a duplicated quote
>                     put quote after theInsideStringSoFar
>                     put "inside" into tStatus
>                     next repeat
>                 end if
>                 -- not empty - so we should have a delimiter here
>                 if char 1 of k = pcoldelim or char 1 of k = cr then
>                     -- as we expect - we have just left the quoted string
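>                     replace cr with tReturnPlaceholder in theInsideStringSoFar
>                     put theInsideStringSoFar after tNuData
>                     -- and then deal with this outside item
>                     -- by falling through into the 'outside' case
>                 else
>                     -- malformed input: text follows a closing quote.
>                     -- Report it; break leaves only the switch, so this item is skipped
>                     put "bad logic"
>                     break
>                 end if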
>
>             case "outside"
>                 replace pcoldelim with numtochar(29) in k
>                 put k after tNuData
>                 put "inside" into tStatus
>                 put empty into theInsideStringSoFar
>                 next repeat
>             default
>                 put "defaulted"
>                 break
>         end switch
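>     end repeat
>     -- When the data ends with a closing quote (no trailing line break),
>     -- 'repeat for each item' produces no final empty item, so the last
>     -- quoted field has not been written out yet
>     if tStatus is not "outside" then
>         replace cr with tReturnPlaceholder in theInsideStringSoFar
>         put theInsideStringSoFar after tNuData
>     end if
>     return tNuData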
> end CSV4Tab
>
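Note that despite its name, the function's output is not literally tab-delimited: fields come back separated by numtochar(29), line breaks inside quoted fields come back as numtochar(11), and records stay one per line. A minimal sketch of reading a single field back out might look like this. getCSVField is a hypothetical helper, not part of the code above; it relies on the itemDelimiter being local to the handler, so setting it here does not affect the caller.

```livecode
-- Hypothetical helper (not part of the posted code): returns field pFieldNum
-- of record pRecordNum from CSV4Tab's output, restoring embedded line breaks
function getCSVField pConverted, pRecordNum, pFieldNum
   local tRecord, tField
   -- records are still separated by cr
   put line pRecordNum of pConverted into tRecord
   -- fields are separated by numtochar(29) (the delimiter CSV4Tab substitutes)
   set the itemdel to numtochar(29)
   put item pFieldNum of tRecord into tField
   -- turn the vertical-tab placeholders back into real line breaks
   replace numtochar(11) with cr in tField
   return tField
end getCSVField
```

Usage: put getCSVField(CSV4Tab(tCSVText), 2, 3) into tValue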





More information about the use-livecode mailing list