CSV again.

Alex Tweedly alex at tweedly.net
Thu Oct 29 19:25:10 EDT 2015


On 29/10/2015 14:41, Mike Kerner wrote:
> Belay that.  Let's do this on the list.
>
Sure ...
> On Thu, Oct 29, 2015 at 10:22 AM, Mike Kerner <mike at mikekerner.com> wrote:
>
>     1) In v3, why did you remove the <HT> substitution?  That just bit me.
>

Short answer : A bug.
Long answer : 2 bugs, but on the same line of code - so kind of just one 
bug really :-)
Very Long Answer :
I had a version (say, 2.9) which I tested properly. Then I added some 
more parameterization, and while doing that I thought "This line is 
wrong, it shouldn't be doing "replace TAB with ...", it should be using 
one of these new parameters". This was just plain wrong, so that's bug 
number 1.

Then I later realized that there was no case where I would need to do 
the "replace" as written - so I commented out the line (also, wrong - 
that's bug number 2).


Solution:
I enclose below a new version, csvToTab4. Only change (in the card 
script) is that line 37 changed from
     -- replace pOldItemDelim with pNewTAB in theInsideStringSoFar
to
     replace TAB with pNewTAB in theInsideStringSoFar

And with that change it does (AFAIK) properly produce <GS> (or whatever 
you pass in as pNewTAB) for any embedded TAB chars.
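
A quick way to check this is a sketch like the one below. It assumes the CSVToTab4 function from this message is in the message path; the handler name and the visible "<GS>" marker are only for illustration (the default is numtochar(29)).

```livecode
on testEmbeddedTab
   local tData, tResult
   -- one quoted cell with a real TAB inside, followed by a line end
   put quote & "c" & TAB & "d" & quote & CR into tData
   -- pass a visible stand-in as pNewTAB; the other parameters take their defaults
   put CSVToTab4(tData, empty, empty, empty, "<GS>") into tResult
   put tResult   -- I would expect c<GS>d followed by a line end
end testEmbeddedTab
```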

> 2) I'm not sure we should bore everyone else with the details on the 
> list, but I'd like to pick your brain about some of the details of 
> what you're thinking in various parts of this as I intend to do some 
> tweaking and commenting for future reference.
Yeah, it would be great to improve the comments, and hopefully explain 
what it's doing.

On 29/10/2015 15:01, Mike Kerner wrote:
> So beyond the embedded <HT>, I found another issue.  Let's say the string is
> "test<CR>"""
>
>
> The <CR> is not handled.
Hmmm - in my testing it is handled. I give it the following (the last 
line is the same as the example you give)

INPUT

a,"b
c"
"c<TAB>d"
"e<CR>"""

and get OUTPUT
a<TAB>b<VT>c
c<GS>d
e<VT>"

which I think is correct. Do you have a more complex test case, or do 
you get different results ? Can you send me the case where you see the 
problem (off-list) ?  Thanks.
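
For anyone who wants to repeat that test, here is a rough sketch of how the INPUT above could be built and run. The handler name is made up, and I am assuming the data ends with a final line end after the last quote; the control characters are swapped for visible markers before display.

```livecode
on testQuotedCells
   local tIn, tOut
   -- build the four INPUT lines shown above
   put "a," & quote & "b" & CR & "c" & quote & CR into tIn
   put quote & "c" & TAB & "d" & quote & CR after tIn
   put quote & "e" & CR & quote & quote & quote & CR after tIn
   put CSVToTab4(tIn) into tOut
   -- make the substituted characters visible
   replace numtochar(11) with "<VT>" in tOut
   replace numtochar(29) with "<GS>" in tOut
   replace TAB with "<TAB>" in tOut
   put tOut   -- should match the OUTPUT listed above
end testQuotedCells
```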

> Should you perhaps do your substitutions on the "inside", instead of on the
> "passedQuote"?
>
Hmmm - tempting, but no.

Firstly, it would need to do the replace in the current item both for 
status = 'inside' and 'passedquote' because if you have input like
    "one<TAB> two""three""four<TAB>five"
the status goes from 'inside' to 'passedquote' to 'inside' to 
'passedquote' and so on, and for the latter TAB character it is 
'passedquote'.

More generally, I want to do these substitutions in as few places as 
possible (i.e. so that I am passing the longest possible string to the 
engine to do a speedy 'replace'), so the best time to do that is after 
'passedquote'.
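
To watch the status flip for that example cell, something like the following sketch may help. It is not part of CSVToTab4: the handler name is hypothetical, it only mirrors the status transitions of the switch in the function, and it adds a trailing line end so the cell gets closed off.

```livecode
on traceQuoteStatus
   local tCell, tStatus, tNext, tTrace
   put quote & "one" & TAB & " two" & quote & quote & "three" & quote & quote \
         & "four" & TAB & "five" & quote & CR into tCell
   put "outside" into tStatus
   set the itemDelimiter to quote
   repeat for each item k in tCell
      -- simplified transitions: "inside" always moves to "passedquote";
      -- "outside" and "passedquote" (either kind of quote) both move to "inside"
      if tStatus is "inside" then
         put "passedquote" into tNext
      else
         put "inside" into tNext
      end if
      put tStatus && "->" && tNext & ":" && length(k) && "chars" & CR after tTrace
      put tNext into tStatus
   end repeat
   put tTrace
end traceQuoteStatus
```

Each text piece is consumed in 'inside' and leaves the status at 'passedquote'; the whole accumulated cell only reaches the 'replace' once a non-empty item follows a closing quote.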

New version
function CSVToTab4 pData, pOldLineDelim, pOldItemDelim, pNewCR, pNewTAB
    -- fill in defaults
    if pOldLineDelim is empty then put CR into pOldLineDelim
    if pOldItemDelim is empty then put COMMA into pOldItemDelim
    if pNewCR is empty then put numtochar(11) into pNewCR   -- Use <VT> for quoted CRs
    if pNewTAB is empty then put numtochar(29) into pNewTAB      -- Use <GS> (group separator) for quoted TABs

    local tNuData                         -- contains tabbed copy of data

    local tStatus, theInsideStringSoFar

    -- Normalize line endings: REMOVED
    -- Will normally be correct already, only binfile: or similar could make this necessary
    -- and that exceptional case should be the caller's responsibility

    put "outside" into tStatus
    set the itemdel to quote
    repeat for each item k in pData
       -- put tStatus && k & CR after msg
       switch tStatus

          case "inside"
             put k after theInsideStringSoFar
             put "passedquote" into tStatus
             next repeat

          case "passedquote"
             -- decide if it was a duplicated escapedQuote or a closing quote
             if k is empty then   -- it's a duplicated quote
                put quote after theInsideStringSoFar
                put "inside" into tStatus
                next repeat
             end if
             -- not empty - so we remain inside the cell, though we have left the quoted section
             -- NB this allows for quoted sub-strings within the cell content !!
             replace pOldLineDelim with pNewCR in theInsideStringSoFar
             replace TAB with pNewTAB in theInsideStringSoFar
             put theInsideStringSoFar after tNuData
             -- no break here: falls through to "outside" to handle k itself

          case "outside"
             replace pOldItemDelim with TAB in k
             -- and deal with the "empty trailing item" issue in Livecode
             replace (pNewTAB & pOldLineDelim) with pNewTAB & pNewTAB & CR in k
             put k after tNuData
             put "inside" into tStatus
             put empty into theInsideStringSoFar
             next repeat
          default
             put "defaulted"
             break
       end switch
    end repeat

    -- and finally deal with the trailing item issue in input data
    -- i.e. the very last char is a quote, so there is no trigger to flush the
    --      last item
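    if the last char of pData = quote then
       -- apply the same substitutions as the "passedquote" branch before flushing
       replace pOldLineDelim with pNewCR in theInsideStringSoFar
       replace TAB with pNewTAB in theInsideStringSoFar
       put theInsideStringSoFar after tNuData
    end if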

    return tNuData
end CSVToTab4

-- Alex.


