CSV again.

Alex Tweedly alex at tweedly.net
Thu Oct 15 22:55:40 EDT 2015


Hmmmm ... my quick test of what was csv4Tab, but is now called CSVToTab1
(see below), gives me the following (showing results with a colon ':' as
the cell delimiter, i.e. in place of the numtochar(29) used in the code
from the previous use-list thread):

a,b,c   ---> a:b:c
"a","","c" ---> a::c

Now to me, that's what it should give - so I think it gets it right :-)

Question is:
a. Do you get the same result?
     If not, what do you get? Or can you try with the code below?
     If you do, but disagree that this is right, what do you think it
     should give?

-- Alex

function CSVToTab1 pData,pcoldelim
    local tNuData -- contains tabbed copy of data
    local tReturnPlaceholder -- replaces cr in field data to avoid line
    --                       breaks which would be misread as records;
    local tNuDelim  -- new character to replace the delimiter
    local tStatus, theInsideStringSoFar
    --
    put numtochar(11) into tReturnPlaceholder -- vertical tab as placeholder
    put numtochar(29) into tNuDelim
    --
    if pcoldelim is empty then put comma into pcoldelim
    -- Normalize line endings:
    replace crlf with cr in pData          -- Win to UNIX
    replace numtochar(13) with cr in pData -- Mac to UNIX

    put "outside" into tStatus
    set the itemdel to quote
    repeat for each item k in pData
       -- put tStatus && k & CR after msg
       switch tStatus

          case "inside"
             put k after theInsideStringSoFar
             put "passedquote" into tStatus
             next repeat

          case "passedquote"
             -- decide if it was a duplicated (escaped) quote or a closing quote
             if k is empty then   -- it's a duplicated quote
                put quote after theInsideStringSoFar
                put "inside" into tStatus
                next repeat
             end if
             -- not empty - so we remain inside the cell, though we have left
             -- the quoted section
             -- NB this allows for quoted sub-strings within the cell content !!
             replace cr with tReturnPlaceholder in theInsideStringSoFar
             put theInsideStringSoFar after tNuData
             -- no "next repeat" here: deliberately fall through to "outside"
             -- so the text following the closing quote is processed too

          case "outside"
             replace pcoldelim with tNuDelim in k
             -- and deal with the "empty trailing item" issue in Livecode
             replace (tNuDelim & CR) with tNuDelim & tNuDelim & CR in k
             put k after tNuData
             put "inside" into tStatus
             put empty into theInsideStringSoFar
             next repeat
          default
             put "defaulted"
             break
       end switch
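    end repeat
    -- a final quoted field is still pending if the data does not end with a
    -- return, because LiveCode ignores the empty item after the last quote
    if tStatus is "passedquote" then
       replace cr with tReturnPlaceholder in theInsideStringSoFar
       put theInsideStringSoFar after tNuData
    end if
    return tNuData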
end CSVToTab1
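The quick test described at the top of this message can be reproduced with a small handler like the one below, placed in the same script as CSVToTab1 and run from the message box. This is a sketch added for illustration, not part of the original post: the handler name testCSVToTab1 is hypothetical, and the sample data assumes each row ends with a return, as rows in a CSV file normally do.

```livecode
-- Hypothetical test handler (not from the original post); it assumes
-- CSVToTab1 above lives in the same script.
on testCSVToTab1
   local tInput, tOutput
   -- row 1: a,b,c
   put "a,b,c" & cr into tInput
   -- row 2: "a","","c" built piece by piece to keep the quotes readable
   put quote & "a" & quote & comma after tInput   -- "a",
   put quote & quote & comma after tInput          -- "",
   put quote & "c" & quote & cr after tInput       -- "c" plus end of row
   put CSVToTab1(tInput, comma) into tOutput
   -- show the numtochar(29) cell delimiter as a colon, as in the results above
   replace numtochar(29) with ":" in tOutput
   put tOutput -- into the message box
end testCSVToTab1
```

With the function as posted, this should show a:b:c and a::c on separate lines, matching the results quoted above.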

On 16/10/2015 01:34, Mike Kerner wrote:
> csv4 does not handle it, and it comes up with a different result from csv2
> (which is also wrong).  I sent Richard proposed changes to csv2 which
> address that issue, but I'll wait while we collectively try to remember
> what the latest and greatest csv parser algorithm is before I try to come
> up with more ways to break or fix it.
>
> On Thu, Oct 15, 2015 at 8:24 PM, Alex Tweedly <alex at tweedly.net> wrote:
>
>> Richard et al.,
>>
>> sometime after that article, there was a further thread on the use-list.
>> Pete Haworth found a case not properly covered by the version on the
>> article, and I came up with a revised version (cutely called csv4Tab !! -
>> csv3Tab was an interim, deeply buggy attempt)
>>
>> (It's in
>> http://lists.runrev.com/pipermail/use-livecode/2012-May/172275.html )
>>
>> It *looks* from that thread (
>> http://lists.runrev.com/pipermail/use-livecode/2012-May/172191.html ) as
>> though this case had been discussed, and the re-write should properly
>> handle it - but I haven't yet had time to try it. My laptop has been
>> replaced in the meantime, and I can't find my test stack, and recreating it
>> and finding the test data is a bit too much for after 1am:-)
>>
>> So I'll try it tomorrow; hopefully csv4Tab() will already work for this
>> case. If it doesn't, we can try again :-)
>>
>> -- Alex.
>>
>>
>> On 16/10/2015 00:34, Richard Gaskin wrote:
>>
>>> Mike Kerner wrote:
>>>> Alex, Richard, etc.
>>>>
>>>> What do we consider the latest version of the csv parser?  I think I
>>>> found a bug in Richard's CSV2Text code, and proposed changes, but he
>>>> wanted the discussion to go down over here, first.  Then I noticed
>>>> that csv4Text is out over here, which makes 2, I guess, a bit long in
>>>> the tooth.
>>> The version referred to here as "Richard's" is the famous Tweedly algo,
>>> in the middle of this page:
>>> <http://www.fourthworld.com/embassy/articles/csv-must-die.html>
>>>
>>> Alex came up with that after a bunch of us here had a long discussion
>>> about the many variants of CSV running around, and how stupidly complex
>>> they are to parse (see the details in that article).
>>>
>>> Mike wrote me this afternoon letting me know that there's yet another
>>> exception that doesn't seem to be accounted for there:
>>>
>>>     "value","","value"
>>>
>>> I had thought we'd covered that in the earlier discussion, but perhaps
>>> not.
>>>
>>> So this seems like a good time to once again bring together the best
>>> minds in our community (are you listening Alex Tweedly, Geoff Canyon, Mark
>>> Weider, Dick Kreisel, and others?) to see if we can revisit CSV parsing and
>>> come up with a function that can parse it into tabs efficiently, while
>>> taking into account all of the really stupid exceptions that have crept
>>> into the world since that really stupid format was first popularized.
>>>
>>> When we're done I'll update the article, and add even more sarcastic
>>> comments about what a really dumb idea it was to have encouraged people to
>>> delimit text with a character so frequently appearing in text.
>>>
>>> --
>>>   Richard Gaskin
>>>   Fourth World Systems
>>>   Software Design and Development for the Desktop, Mobile, and the Web
>>>   ____________________________________________________________________
>>>   Ambassador at FourthWorld.com http://www.FourthWorld.com
>>>
>>>
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your
>>> subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>
>>
>
>