CSV again.
Alex Tweedly
alex at tweedly.net
Thu Oct 29 19:25:10 EDT 2015
On 29/10/2015 14:41, Mike Kerner wrote:
> Belay that. Let's do this on the list.
>
Sure ...
> On Thu, Oct 29, 2015 at 10:22 AM, Mike Kerner <mike at mikekerner.com> wrote:
>
> 1) In v3, why did you remove the <HT> substitution? That just bit me.
>
Short answer : A bug.
Long answer : 2 bugs, but on the same line of code - so kind of just one
bug really :-)
Very Long Answer :
I had a version (say, 2.9) which I tested properly. Then I added some
more parameterization, and while doing that I thought "This line is
wrong: it shouldn't be doing 'replace TAB with ...', it should be using
one of these new parameters". That was just plain wrong, so that's bug
number 1.
Then I later decided that there was no case where I would need to do
the "replace" as written, so I commented out the line (also wrong:
that's bug number 2).
Solution:
I enclose below a new version, csvToTab4. Only change (in the card
script) is that line 37 changed from
-- replace pOldItemDelim with pNewTAB in theInsideStringSoFar
to
replace TAB with pNewTAB in theInsideStringSoFar
And with that change it does (AFAIK) properly produce <GS> (or whatever
you pass in as pNewTAB) for any embedded TAB chars.
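As a minimal illustration of what the restored line does to the contents of a quoted cell (a sketch only: the handler name showTabSubstitution and the sample string are made up for this example, and 29 is the <GS> default that the function uses for pNewTAB):

```livecode
on showTabSubstitution
   local tCell, tNewTab
   put numToChar(29) into tNewTab -- <GS>, the default for pNewTAB
   put "c" & TAB & "d" into tCell -- contents of a quoted cell holding a raw TAB
   replace TAB with tNewTab in tCell -- the restored substitution
   -- make the control character visible before displaying it
   replace tNewTab with "<GS>" in tCell
   put tCell -- the message box should show c<GS>d
end showTabSubstitution
```

The embedded TAB is swapped out so that it cannot later be mistaken for the TAB that separates items in the converted data.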
> 2) I'm not sure we should bore everyone else with the details on the
> list, but I'd like to pick your brain about some of the details of
> what you're thinking in various parts of this as I intend to do some
> tweaking and commenting for future reference.
Yeah, it would be great to improve the comments, and hopefully explain
what it's doing.
On 29/10/2015 15:01, Mike Kerner wrote:
> So beyond the embedded <HT>, I found another issue. Let's say the string is
> "test<CR>"""
>
>
> The <CR> is not handled.
Hmmm - in my testing it is. I give it the following (the last line is
the same as the example you give):
INPUT
a,"b
c"
"c<TAB>d"
"e<CR>"""
and get OUTPUT
a<TAB>b<VT>c
c<GS>d
e<VT>"
which I think is correct. Do you have a more complex test case, or do
you get different results? Can you send me the case where you see the
problem (off-list)? Thanks.
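One way to reproduce this test is a small handler like the following. It is a sketch under assumptions: the handler name testCSVToTab4 is hypothetical, it expects the CSVToTab4 function from this message to be in the same script, and the sample input is built with a line delimiter after the final quote.

```livecode
on testCSVToTab4
   local tInput, tOutput
   -- build the four-line sample; the CR after the last quote is an assumption
   put "a," & quote & "b" & CR & "c" & quote & CR into tInput
   put quote & "c" & TAB & "d" & quote & CR after tInput
   put quote & "e" & CR & quote & quote & quote & CR after tInput
   put CSVToTab4(tInput) into tOutput -- all other parameters take their defaults
   -- make the control characters visible
   replace TAB with "<TAB>" in tOutput
   replace numToChar(11) with "<VT>" in tOutput
   replace numToChar(29) with "<GS>" in tOutput
   put tOutput -- displays the converted data in the message box
end testCSVToTab4
```

The displayed text can then be compared line by line with the OUTPUT listed above.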
> Should you perhaps do your substitutions on the "inside", instead of on the
> "passedQuote"?
>
Hmmm - tempting, but no.
Firstly, it would need to do the replace in the current item both for
status = 'inside' and 'passedquote' because if you have input like
"one<TAB> two""three""four<TAB>five"
the status goes from 'inside' to 'passedquote' to 'inside' to
'passedquote' to etc. and for the latter TAB character it is 'passedquote'.
More generally, I want to do these substitutions in as few places as
possible (i.e. so that I am passing the longest possible string to the
engine to do a speedy 'replace'), so the best time to do that is after
'passedquote'.
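The alternation of statuses for that input can be watched with a stripped-down trace. This is a sketch only: traceQuoteStatus is a hypothetical handler that keeps just the status transitions of the function and none of its substitutions.

```livecode
on traceQuoteStatus
   local tData, tStatus, tShown, tTrace
   -- "one<TAB> two""three""four<TAB>five"
   put quote & "one" & TAB & " two" & quote & quote & "three" & quote & quote & "four" & TAB & "five" & quote into tData
   put "outside" into tStatus
   set the itemDelimiter to quote
   repeat for each item k in tData
      put k into tShown
      replace TAB with "<TAB>" in tShown
      -- record the status on entry together with the item being handled
      put tStatus & ": [" & tShown & "]" & CR after tTrace
      switch tStatus
         case "inside"
            put "passedquote" into tStatus
            break
         default
            -- from "outside" or "passedquote" the next item is inside quotes
            put "inside" into tStatus
      end switch
   end repeat
   put tTrace
end traceQuoteStatus
```

Both TAB characters arrive in items that are entered with status 'inside', and each of those items leaves the status at 'passedquote', which is the point where the accumulated string is complete enough to hand to 'replace' in one go.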
New version
function CSVToTab4 pData, pOldLineDelim, pOldItemDelim, pNewCR, pNewTAB
   -- fill in defaults
   if pOldLineDelim is empty then put CR into pOldLineDelim
   if pOldItemDelim is empty then put COMMA into pOldItemDelim
   if pNewCR is empty then put numtochar(11) into pNewCR -- Use <VT> for quoted CRs
   if pNewTAB is empty then put numtochar(29) into pNewTAB -- Use <GS> (group separator) for quoted TABs
   local tNuData -- contains tabbed copy of data
   local tStatus, theInsideStringSoFar
   -- Normalize line endings: REMOVED
   -- Will normally be correct already, only binfile: or similar could make this necessary
   -- and that exceptional case should be the caller's responsibility
   put "outside" into tStatus
   set the itemdel to quote
   repeat for each item k in pData
      -- put tStatus && k & CR after msg
      switch tStatus
         case "inside"
            put k after theInsideStringSoFar
            put "passedquote" into tStatus
            next repeat
         case "passedquote"
            -- decide if it was a duplicated escapedQuote or a closing quote
            if k is empty then -- it's a duplicated quote
               put quote after theInsideStringSoFar
               put "inside" into tStatus
               next repeat
            end if
            -- not empty - so we remain inside the cell, though we have left the quoted section
            -- NB this allows for quoted sub-strings within the cell content !!
            replace pOldLineDelim with pNewCR in theInsideStringSoFar
            replace TAB with pNewTAB in theInsideStringSoFar
            put theInsideStringSoFar after tNuData
            -- no break here: execution falls through into the "outside" case to process k
         case "outside"
            replace pOldItemDelim with TAB in k
            -- and deal with the "empty trailing item" issue in Livecode
            replace (pNewTAB & pOldLineDelim) with pNewTAB & pNewTAB & CR in k
            put k after tNuData
            put "inside" into tStatus
            put empty into theInsideStringSoFar
            next repeat
         default
            put "defaulted"
            break
      end switch
   end repeat
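   -- and finally deal with the trailing item issue in input data
   -- i.e. the very last char is a quote, so there is no trigger to flush the
   -- last item
   if the last char of pData = quote then
      -- apply the same substitutions as the "passedquote" flush in the loop
      replace pOldLineDelim with pNewCR in theInsideStringSoFar
      replace TAB with pNewTAB in theInsideStringSoFar
      put theInsideStringSoFar after tNuData
   end if
   return tNuData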
end CSVToTab4
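The optional parameters allow other delimiters to be passed in. A hedged usage sketch for semicolon-separated input (demoSemicolonCSV and the sample record are invented for illustration, and the handler expects CSVToTab4 to be in the same script):

```livecode
on demoSemicolonCSV
   local tInput, tOutput
   -- one record: x;"y;z" followed by a CR
   put "x;" & quote & "y;z" & quote & CR into tInput
   -- line delimiter CR, item delimiter ";", defaults for pNewCR and pNewTAB
   put CSVToTab4(tInput, CR, ";") into tOutput
   replace TAB with "<TAB>" in tOutput
   put tOutput -- the message box should show x<TAB>y;z
end demoSemicolonCSV
```

The semicolon outside the quotes becomes a TAB, while the one inside the quoted cell is left alone, since only CR and TAB are substituted inside quoted text.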
-- Alex.