CSV again.
Alex Tweedly
alex at tweedly.net
Thu Oct 15 22:55:40 EDT 2015
Hmmmm ... my quick test of what was csv4Tab, but is now called csvToTab1
- see below - gives me
(showing results with a colon ':' for the cell delimiter, i.e.
replacing the numtochar(29) used by the code from the previous use-list post)
a,b,c ---> a:b:c
"a","","c" ---> a::c
Now to me, that's what it should give - so I think it gets it right :-)
Question is
a. do you get the same result ?
   if not, what do you get ? OR can you try with the code below
b. if you do, but disagree that this is right, what do you think it
   should give ?
-- Alex
function CSVToTab1 pData,pcoldelim
   local tNuData -- contains tabbed copy of data
   local tReturnPlaceholder -- replaces cr in field data to avoid line
   -- breaks which would be misread as records;
   local tNuDelim -- new character to replace the delimiter
   local tStatus, theInsideStringSoFar
   --
   put numtochar(11) into tReturnPlaceholder -- vertical tab as placeholder
   put numtochar(29) into tNuDelim
   --
   if pcoldelim is empty then put comma into pcoldelim
   -- Normalize line endings:
   replace crlf with cr in pData -- Win to UNIX
   replace numtochar(13) with cr in pData -- Mac to UNIX
   put "outside" into tStatus
   set the itemdel to quote
   repeat for each item k in pData
      -- put tStatus && k & CR after msg
      switch tStatus
         case "inside"
            put k after theInsideStringSoFar
            put "passedquote" into tStatus
            next repeat
         case "passedquote"
            -- decide if it was a duplicated escapedQuote or a closing quote
            if k is empty then -- it's a duplicated quote
               put quote after theInsideStringSoFar
               put "inside" into tStatus
               next repeat
            end if
            -- not empty - so we remain inside the cell, though we have
            -- left the quoted section
            -- NB this allows for quoted sub-strings within the cell content !!
            replace cr with tReturnPlaceholder in theInsideStringSoFar
            put theInsideStringSoFar after tNuData
            -- no break here: falls through to "outside" to process k itself
         case "outside"
            replace pcoldelim with tNuDelim in k
            -- and deal with the "empty trailing item" issue in Livecode
            replace (tNuDelim & CR) with tNuDelim & tNuDelim & CR in k
            put k after tNuData
            put "inside" into tStatus
            put empty into theInsideStringSoFar
            next repeat
         default
            put "defaulted"
            break
      end switch
   end repeat
   return tNuData
end CSVToTab1
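
For anyone who wants to repeat the quick test, here is a minimal sketch of a harness for the two rows shown above. The handler name testCSVToTab1 is made up for illustration, and it assumes CSVToTab1 sits in the same script. Both records end with a line break: as far as I can tell from the loop, a final quoted cell is only written out once something follows its closing quote, so the trailing cr is deliberate.

```livecode
on testCSVToTab1
   local tCSV, tResult
   -- first record: a,b,c
   put "a,b,c" & cr into tCSV
   -- second record: "a","","c" (built with the quote constant)
   put quote & "a" & quote & comma & quote & quote & comma \
         & quote & "c" & quote & cr after tCSV
   put CSVToTab1(tCSV, comma) into tResult
   -- show the cell delimiter as a colon, as in the results above
   replace numToChar(29) with ":" in tResult
   put tResult -- goes to the message box
end testCSVToTab1
```

If the function behaves as described, the message box should show a:b:c on the first line and a::c on the second.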
On 16/10/2015 01:34, Mike Kerner wrote:
> csv4 does not handle it, and it comes up with a different result from csv2
> (which is also wrong). I sent Richard proposed changes to csv2 which
> address that issue, but I'll wait while we collectively try to remember
> what the latest and greatest csv parser algorithm is before I try to come
> up with more ways to break or fix it.
>
> On Thu, Oct 15, 2015 at 8:24 PM, Alex Tweedly <alex at tweedly.net> wrote:
>
>> Richard et al.,
>>
>> sometime after that article, there was a further thread on the use-list.
>> Pete Haworth found a case not properly covered by the version on the
>> article, and I came up with a revised version (cutely called csv4Tab !! -
>> csv3Tab was an interim, deeply buggy attempt)
>>
>> (It's in
>> http://lists.runrev.com/pipermail/use-livecode/2012-May/172275.html )
>>
>> It *looks* from that thread (
>> http://lists.runrev.com/pipermail/use-livecode/2012-May/172191.html ) as
>> though this case had been discussed, and the re-write should properly
>> handle it - but I haven't yet had time to try it. My laptop has been
>> replaced in the meantime, and I can't find my test stack, and recreating it
>> and finding the test data is a bit too much for after 1am:-)
>>
>> So I'll try it tomorrow; hopefully csv4Tab() will already work for this
>> case. If it doesn't, we can try again :-)
>>
>> -- Alex.
>>
>>
>> On 16/10/2015 00:34, Richard Gaskin wrote:
>>
>>> Mike Kerner wrote:
>>>> Alex, Richard, etc.
>>>>
>>>> What do we consider the latest version of the csv parser? I think I
>>>> found a bug in Richard's CSV2Text code, and proposed changes, but he
>>>> wanted the discussion to go down over here, first. Then I noticed
>>>> that csv4Text is out over here, which makes 2, I guess, a bit long in
>>>> the tooth.
>>> The version referred to here as "Richard's" is the famous Tweedly algo,
>>> in the middle of this page:
>>> <http://www.fourthworld.com/embassy/articles/csv-must-die.html>
>>>
>>> Alex came up with that after a bunch of us here had a long discussion
>>> about the many variants of CSV running around, and how stupidly complex
>>> they are to parse (see the details in that article).
>>>
>>> Mike wrote me this afternoon letting me know that there's yet another
>>> exception that doesn't seem to be accounted for there:
>>>
>>> "value","","value"
>>>
>>> I had thought we'd covered that in the earlier discussion, but perhaps
>>> not.
>>>
>>> So this seems like a good time to once again bring together the best
>>> minds in our community (are you listening Alex Tweedly, Geoff Canyon, Mark
>>> Weider, Dick Kreisel, and others?) to see if we can revisit CSV parsing and
>>> come up with a function that can parse it into tabs efficiently, while
>>> taking into account all of the really stupid exceptions that have crept
>>> into the world since that really stupid format was first popularized.
>>>
>>> When we're done I'll update the article, and add even more sarcastic
>>> comments about what a really dumb idea it was to have encouraged people to
>>> delimit text with a character so frequently appearing in text.
>>>
>>> --
>>> Richard Gaskin
>>> Fourth World Systems
>>> Software Design and Development for the Desktop, Mobile, and the Web
>>> ____________________________________________________________________
>>> Ambassador at FourthWorld.com http://www.FourthWorld.com
>>>
>>>
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your
>>> subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>
>>
>
>