CSV again.

Mike Kerner MikeKerner at roadrunner.com
Sat Oct 17 11:42:04 EDT 2015


I added it to my repository on GitHub if anyone wants to try to do this in
Git.

On Sat, Oct 17, 2015 at 10:53 AM, Mike Kerner <MikeKerner at roadrunner.com>
wrote:

> I am going to put 4 on Git and have at it.
>
> 1) There are other assumptions being made, like assuming that the <VT> and
> <GS> don't appear in the incoming text.  Instead of hardcoding the interim
> substitutions, determine what the interim substitutions are going to be
> (can also allow the user to specify them).  Characters that we need to deal
> with are quote, <HT>, <LF>, and comma.
>
> 2) In this version, you can specify the incoming column delimiter.  Add
> the ability for the caller to specify the incoming record delimiter, the
> outgoing column and record delimiters, and the substitutions to use in the
> output.  For example, for embedded <LF>s, perhaps the user wants <13>
> or even a string like a semicolon and a space.
>
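Point 1 above (work out the interim substitutions instead of hardcoding
<VT> and <GS>) might look roughly like this. It is only a sketch, written
in Python rather than LiveCode so it can be run standalone; the function
name, the candidate list, and the parameters are all hypothetical and not
taken from the code under discussion.

```python
# Hypothetical helper; names and the candidate list are assumptions.
CANDIDATES = [chr(11), chr(29), chr(28), chr(30), chr(31)]  # VT, GS, FS, RS, US


def pick_substitutes(data, count=2, reserved=('"', "\t", "\n", ",")):
    """Return `count` control characters that do not occur in `data`.

    `reserved` lists the characters the converter itself must deal with
    (quote, <HT>, <LF>, comma), so they are never offered as substitutes.
    """
    free = [c for c in CANDIDATES if c not in data and c not in reserved]
    if len(free) < count:
        raise ValueError("not enough unused control characters in the input")
    return free[:count]


# The sample already contains <VT>, so <VT> is skipped.
sample = 'a,"b\x0bc",d\n'
cr_sub, tab_sub = pick_substitutes(sample)
print(repr(cr_sub), repr(tab_sub))
```

A caller who wants to choose the substitutions could skip the helper and
pass their own values straight to the converter.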
>
> On Sat, Oct 17, 2015 at 5:03 AM, Alex Tweedly <alex at tweedly.net> wrote:
>
>> Naturally it must be removed.
>>
>> But I have a more philosophical issue / question.
>>
>>
>> TSV (in and of itself) doesn't have any quotes, and so doesn't handle
>> quoted CRs or TABs.
>>
>> Currently, the 'old' version - as in Richard's published article, doesn't
>> handle TAB characters enclosed within a quoted cell. The 'new' version does
>> - but only by returning the data delimited by <GS> instead of TAB, and
>> leaving enclosed TABs alone - a mistake, IMHO.
>>
>> I believe that what the converter should do is :
>>  - return TSV - i.e. delimited by TABs
>>  - replace quoted CR by <VT> within quoted cells (as it does now)
>>  - replace quoted TABs by <GS> within quoted cells
>>
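The three rules in the list above can be sketched as a small state
machine. This is a Python illustration under stated assumptions, not the
LiveCode function being discussed: csv_to_tsv and its parameters are
hypothetical names, a doubled quote inside a quoted cell is taken to mean
a literal quote, and "CR" is taken to be the "\n" line break.

```python
VT, GS = chr(11), chr(29)  # interim substitutes named in the thread


def csv_to_tsv(data, delim=","):
    """Convert CSV text to TAB-delimited text (hypothetical sketch).

    Inside quoted cells a line break becomes <VT> and a TAB becomes <GS>;
    outside quotes the column delimiter becomes TAB.
    """
    out = []
    in_quotes = False
    i = 0
    while i < len(data):
        ch = data[i]
        if in_quotes:
            if ch == '"':
                if data[i + 1:i + 2] == '"':  # doubled quote: literal quote
                    out.append('"')
                    i += 1
                else:
                    in_quotes = False         # closing quote
            elif ch == "\n":
                out.append(VT)
            elif ch == "\t":
                out.append(GS)
            else:
                out.append(ch)
        elif ch == '"':
            in_quotes = True                  # opening quote
        elif ch == delim:
            out.append("\t")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(repr(csv_to_tsv('a,"b\tc","d\ne"\nf,g\n')))
```

Note that a TAB appearing outside quotes in the input passes through
unchanged here, so it would be read as a column break in the output.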
>> Any comments or suggestions ?
>>
>> Thanks
>> Alex.
>>
>>
>> On 17/10/2015 02:34, Mike Kerner wrote:
>>
>>> It's safe as long as you remember to remove it at the end of the function
>>>
>>> On Fri, Oct 16, 2015 at 7:12 PM, Alex Tweedly <alex at tweedly.net> wrote:
>>>
>>>> Duh - replying to myself again :-)
>>>>
>>>> It looks as though that's exactly what you do mean - it certainly
>>>> generates the problems you described earlier. And my one-line additional
>>>> test would (does in my testing) solve it properly - without it, we don't
>>>> get a chance to flush "theInsideStringSoFar" to tNuData, with the extra
>>>> line we do. And adding it is always safe (AFAICI).
>>>>
>>>> -- Alex.
>>>>
>>>>
>>>> On 17/10/2015 00:03, Alex Tweedly wrote:
>>>>
>>>>> Sorry, Mike, but can you describe what you mean by a "naked" line ?
>>>>> Is it simply one with no line delimiter after it ?
>>>>> i.e. could only happen on the very last line of a file of input ?
>>>>>
>>>>> Could that be solved by a simple test (after the various 'replace'
>>>>> statements)
>>>>>      if the last char of pData <> CR then put CR after pData
>>>>> before the parsing happens ?
>>>>>
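The one-line test suggested above amounts to something like the
following, shown in Python purely as an illustration. The function name
is hypothetical, and leaving empty input untouched is an assumption that
differs slightly from the LiveCode line as written.

```python
def ensure_terminated(data, terminator="\n"):
    """Append the record terminator if the last record is 'naked'."""
    if data and not data.endswith(terminator):
        return data + terminator
    return data


print(repr(ensure_terminated("a,b\nc,d")))
```

With the terminator always present, the parser sees every record,
including the last one, in the same terminated form.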
>>>>> -- Alex.
>>>>>
>>>>>
>>>>> On 16/10/2015 17:19, Mike Kerner wrote:
>>>>>
>>>>>> No, the problem isn't that LC uses LF and CR for ascii(10) and ignores
>>>>>> ascii(13).  That's just a personal problem.
>>>>>>
>>>>>> The problem, here, is that the csv parser handles a naked line and a
>>>>>> terminated line differently.  If the line is terminated, it parses it
>>>>>> one way, and if it is not, it parses it (incorrectly) a different way,
>>>>>> which makes me wonder if this is the latest version.
>>>>>>
>>>>>> On Fri, Oct 16, 2015 at 11:28 AM, Bob Sneidar <
>>>>>> bobsneidar at iotecdigital.com>
>>>>>> wrote:
>>>>>>
>>>>>>> But what if the cr or lf or crlf is inside quoted text, meaning it is
>>>>>>> not a delimiter? Oh, I'm afraid the deflector shield will be quite
>>>>>>> operational when your friends arrive.
>>>>>>>
>>>>>>> Bob S
>>>>>>>
>>>>>>>
>>>>>>> On Oct 16, 2015, at 08:04 , Alex Tweedly <alex at tweedly.net> wrote:
>>>>>>>
>>>>>>>> Hi Mike,
>>>>>>>>
>>>>>>>> thanks for that additional info.
>>>>>>>>
>>>>>>>> I *think* (it's been 3 years) I left them as <GS> (i.e.
>>>>>>>> numtochar(29)) because I had some data including normal TAB
>>>>>>>> characters within the cells (!!) and thought <GS> was a safer bet -
>>>>>>>> though of course nothing is completely safe. It's then up to the
>>>>>>>> caller to decide whether to do "replace numtochar(29) with TAB in
>>>>>>>> ...", or do TAB escaping, or whatever they want.
>>>>>>>>
>>>>>>>> As for the other bigger problem .... Oh dear = CR vs LF vs CRLF ....
>>>>>>>>
>>>>>>>> Are you on Mac or Windows or Linux ?
>>>>>>>> How is the LF delimited data getting into your app ?
>>>>>>>> Maybe we should just add a "replace numtochar(13) with CR in pData" ?
>>>>>>>>
>>>>>>>> (I confess to being confused by this - I know that LC does
>>>>>>>> auto-translation of line delimiters at various places, but I'm not
>>>>>>>> sure when it is, or isn't, completely safe. Maybe the easiest thing
>>>>>>>> is to just do all the translations ....)
>>>>>>>>
>>>>>>>>    replace CRLF with CR in pData
>>>>>>>>    replace numtochar(10) with CR in pData
>>>>>>>>    replace numtochar(13) with CR in pData
>>>>>>>>
>>>>>>>> -- Alex.
>>>>>>>>
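For comparison, the "do all the translations" idea looks like this in
Python, where the order of the replacements matters. This is only an
illustrative sketch with a hypothetical function name; it assumes the
target line break is ascii(10), which matches the remark later in the
thread that LC uses CR for ascii(10).

```python
def normalize_line_endings(data):
    """Map CRLF and bare ascii(13) to ascii(10).

    CRLF is replaced first; doing the bare-13 replacement first would
    turn each CRLF into two line breaks.
    """
    return data.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_line_endings("a\r\nb\rc\nd")))
```

This runs over the whole text before parsing, so it also rewrites line
breaks inside quoted cells; the quoted-cell substitution then sees a
single kind of line break.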
>>>>>>>> _______________________________________________
>>>>>>> use-livecode mailing list
>>>>>>> use-livecode at lists.runrev.com
>>>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>>>> subscription preferences:
>>>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode



-- 
On the first day, God created the heavens and the Earth
On the second day, God created the oceans.
On the third day, God put the animals on hold for a few hours,
   and did a little diving.
And God said, "This is good."


