CSV again.

Mike Kerner MikeKerner at roadrunner.com
Sat Oct 17 10:53:33 EDT 2015


I am going to put 4 on Git and have at it.

1) There are other assumptions being made, like assuming that <VT> and
<GS> don't appear in the incoming text.  Instead of hardcoding the interim
substitutions, determine at run time which characters are safe to use
(the caller can also be allowed to specify them).  The characters that we
need to deal with are quote, <HT>, <LF>, and comma.
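
As a rough illustration of point 1, here is a minimal sketch, in Python
rather than LiveCode, of picking interim characters that are guaranteed not
to occur in the incoming text.  The function name, the candidate list and
its order are assumptions made for this example; they are not part of the
existing converter.

```python
def pick_interim_chars(text, count, candidates=None):
    """Return `count` distinct control characters that do not occur in `text`.

    Hypothetical helper: the name and the default candidates are assumptions.
    """
    if candidates is None:
        # Assumed preference order: VT (11), GS (29), then FS, RS, US.
        candidates = [11, 29, 28, 30, 31]
    # Keep only the candidates that are absent from the incoming text.
    free = [chr(code) for code in candidates if chr(code) not in text]
    if len(free) < count:
        raise ValueError("not enough unused candidate characters")
    return free[:count]
```

A caller would ask for two characters, one to stand in for embedded <LF>
and one for embedded <HT>, and pass both to the converter instead of relying
on hardcoded <VT> and <GS>.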

2) In this version, you can specify the incoming column delimiter.  Add the
ability for the caller to specify the incoming record delimiter, the
outgoing column and record delimiters, and which substitutions are going to
be used in the output.  For example, for embedded <LF>s, perhaps the user
wants <13>, or even a string like a semicolon and a space.
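
A minimal sketch of what point 2 could look like, again in Python rather
than LiveCode.  This is not the converter discussed in this thread: the
function name, the parameter names and the default substitutions are all
assumptions for illustration, and the incoming delimiters are assumed to be
single characters.

```python
def convert_csv(data, in_col=",", in_rec="\n", out_col="\t", out_rec="\n",
                subs=None):
    """Convert quoted CSV text using caller-chosen delimiters.

    `subs` maps a character found inside a quoted cell to its replacement,
    which may be a single character or a longer string.
    """
    if subs is None:
        # Assumed defaults: embedded <LF> becomes <VT>, embedded <HT> becomes <GS>.
        subs = {"\n": "\x0b", "\t": "\x1d"}
    out = []
    in_quotes = False
    i = 0
    while i < len(data):
        ch = data[i]
        if in_quotes:
            if ch == '"':
                if data[i + 1:i + 2] == '"':
                    # A doubled quote inside a quoted cell is a literal quote.
                    out.append('"')
                    i += 1
                else:
                    in_quotes = False
            else:
                # Substitute embedded delimiters, pass everything else through.
                out.append(subs.get(ch, ch))
        elif ch == '"':
            in_quotes = True
        elif ch == in_col:
            out.append(out_col)
        elif ch == in_rec:
            out.append(out_rec)
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

For example, convert_csv(data, subs={"\n": "; "}) would turn each embedded
<LF> into a semicolon and a space.  Because the loop never depends on a
trailing record delimiter, a final line without a terminator is parsed the
same way as a terminated one.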


On Sat, Oct 17, 2015 at 5:03 AM, Alex Tweedly <alex at tweedly.net> wrote:

> Naturally it must be removed.
>
> But I have a more philosophical issue / question.
>
>
> TSV (in and of itself) doesn't have any quotes, and so doesn't handle
> quoted CRs or TABs.
>
> Currently, the 'old' version - as in Richard's published article, doesn't
> handle TAB characters enclosed within a quoted cell. The 'new' version does
> - but only by returning the data delimited by <GS> instead of TAB, and
> leaving enclosed TABs alone - a mistake, IMHO.
>
> I believe that what the converter should do is :
>  - return TSV - i.e. delimited by TABs
>  - replace quoted CR by <VT> within quoted cells (as it does now)
>  - replace quoted TABs by <GS> within quoted cells
>
> Any comments or suggestions ?
>
> Thanks
> Alex.
>
>
> On 17/10/2015 02:34, Mike Kerner wrote:
>
>> It's safe as long as you remember to remove it at the end of the function
>>
>> On Fri, Oct 16, 2015 at 7:12 PM, Alex Tweedly <alex at tweedly.net> wrote:
>>
>>> Duh - replying to myself again :-)
>>>
>>> It looks as though that's exactly what you do mean - it certainly
>>> generates the problems you described earlier. And my one-line additional
>>> test would (does in my testing) solve it properly - without it, we don't
>>> get a chance to flush "theInsideStringSoFar" to tNuData, with the extra
>>> line we do. And adding it is always safe (AFAICI).
>>>
>>> -- Alex.
>>>
>>>
>>> On 17/10/2015 00:03, Alex Tweedly wrote:
>>>
>>>> Sorry, Mike, but can you describe what you mean by a "naked" line ?
>>>> Is it simply one with no line delimiter after it ?
>>>> i.e. could only happen on the very last line of a file of input ?
>>>>
>>>> Could that be solved by a simple test (after the various 'replace'
>>>> statements)
>>>>      if the last char of pData <> CR then put CR after pData
>>>> before the parsing happens ?
>>>>
>>>> -- Alex.
>>>>
>>>>
>>>> On 16/10/2015 17:19, Mike Kerner wrote:
>>>>
>>>>> No, the problem isn't that LC uses LF and CR for ascii(10) and ignores
>>>>> ascii(13).  That's just a personal problem.
>>>>>
>>>>> The problem, here, is that the csv parser handles a naked line and a
>>>>> terminated line differently.  If the line is terminated, it parses it
>>>>> one way, and if it is not, it parses it (incorrectly) a different way,
>>>>> which makes me wonder if this is the latest version.
>>>>>
>>>>> On Fri, Oct 16, 2015 at 11:28 AM, Bob Sneidar <
>>>>> bobsneidar at iotecdigital.com>
>>>>> wrote:
>>>>>
>>>>>> But what if the cr or lf or crlf is inside quoted text, meaning it is
>>>>>> not a delimiter? Oh, I'm afraid the deflector shield will be quite
>>>>>> operational when your friends arrive.
>>>>>>
>>>>>> Bob S
>>>>>>
>>>>>>
>>>>>> On Oct 16, 2015, at 08:04 , Alex Tweedly <alex at tweedly.net> wrote:
>>>>>>
>>>>>>> Hi Mike,
>>>>>>>
>>>>>>> thanks for that additional info.
>>>>>>>
>>>>>>> I *think* (it's been 3 years) I left them as <GS> (i.e.
>>>>>>> numtochar(29)) because I had some data including normal TAB
>>>>>>> characters within the cells (!!) and thought <GS> was a safer bet -
>>>>>>> though of course nothing is completely safe. It's then up to the
>>>>>>> caller to decide whether to do "replace numtochar(29) with TAB in
>>>>>>> ...", or do TAB escaping, or whatever they want.
>>>>>>>
>>>>>>> As for the other bigger problem .... Oh dear = CR vs LF vs CRLF ....
>>>>>>>
>>>>>>> Are you on Mac or Windows or Linux ?
>>>>>>> How is the LF delimited data getting into your app ?
>>>>>>> Maybe we should just add a "replace numtochar(13) with CR in pData" ?
>>>>>>>
>>>>>>> (I confess to being confused by this - I know that LC does
>>>>>>> auto-translation of line delimiters at various places, but I'm not
>>>>>>> sure when it is, or isn't, completely safe. Maybe the easiest thing
>>>>>>> is to just do all the translations ....)
>>>>>>>
>>>>>>>    replace CRLF with CR in pData
>>>>>>>    replace numtochar(10) with CR in pData
>>>>>>>    replace numtochar(13) with CR in pData
>>>>>>>
>>>>>>> -- Alex.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>> use-livecode mailing list
>>>>>> use-livecode at lists.runrev.com
>>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>>> subscription preferences:
>>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>>>
>>>>>>
>>>>>>
>>
>>
>



-- 
On the first day, God created the heavens and the Earth
On the second day, God created the oceans.
On the third day, God put the animals on hold for a few hours,
   and did a little diving.
And God said, "This is good."


