CSV again.

Alex Tweedly alex at tweedly.net
Sat Oct 17 20:30:36 EDT 2015


Hi Peter,

it also requires offsets() - I can guess what it does, but it would be 
safer to get the actual code you use :-)

Thanks
-- Alex.
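Peter's offsets() wasn't included in the post. Until the real one turns up, here is a minimal stand-in for anyone who wants to try CSVtoArray. Its behaviour is an assumption inferred from how CSVtoArray uses the result: a comma-delimited list of the character positions of every occurrence of pStr in pContainer, or 0 when there are none. It is a sketch, not Peter's actual code.

```livecode
function offsets pStr, pContainer
   -- Hypothetical stand-in for the offsets() that CSVtoArray requires.
   -- Assumed behaviour: returns a comma-delimited list of the positions of
   -- every occurrence of pStr in pContainer, or 0 if there are none.
   local tList, tSkip, tPos
   put 0 into tSkip
   repeat
      -- offset()'s third parameter skips that many chars, and the result is
      -- relative to the skipped point, so add tSkip to get an absolute position
      put offset(pStr, pContainer, tSkip) into tPos
      if tPos = 0 then exit repeat
      add tPos to tSkip
      put tSkip & comma after tList
   end repeat
   if tList is empty then return 0
   delete the last char of tList   -- drop the trailing comma
   return tList
end offsets
```

Because each search restarts one character after the previous match, overlapping matches of a multi-character pStr are all reported; CSVtoArray only ever asks for single characters, so that doesn't matter here.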

On 18/10/2015 00:41, Peter M. Brigham wrote:
> So here's my attempt. It converts CSV text to an array. Let's see if there's CSV data that can break it.
>
> -- Peter
>
> Peter M. Brigham
> pmbrig at gmail.com
> http://home.comcast.net/~pmbrig
>
> -------
>
> function CSVtoArray pData
>     -- by Peter M. Brigham, pmbrig at gmail.com
>     -- requires getDelimiters(), howmany()
>     put getDelimiters(pData,5) into tDelims
>     if tDelims begins with "Error" then return tDelims
>     put line 1 of tDelims into crChar
>     put line 2 of tDelims into tabChar
>     put line 3 of tDelims into commaChar
>     put line 4 of tDelims into openQuoteChar
>     put line 5 of tDelims into closeQuoteChar
>     
>     replace crlf with cr in pData          -- Win to UNIX
>     replace numtochar(13) with cr in pData -- Mac to UNIX
>     
>     if howmany(quote,pData) mod 2 = 1 then
>        return "This CSV data is not parsable (unclosed quotes in data)."
>     end if
>     
>     put offsets(quote,pData) into qOffsets
>     if qOffsets > 0 then
>        put 1 into counter
>        repeat for each item q in qOffsets
>           if counter mod 2 = 1 then put openQuoteChar into char q of pData
>           else put closeQuoteChar into char q of pData
>           add 1 to counter
>        end repeat
>     end if
>     
>     put offsets(cr,pData) into crOffsets
>     repeat for each item r in crOffsets
>        put char 1 to r of pData into upToHere
>        if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
>           -- the cr is within a quoted string
>           put crChar into char r of pData
>        end if
>     end repeat
>     put offsets(tab,pData) into tabOffsets
>     repeat for each item t in tabOffsets
>        put char 1 to t of pData into upToHere
>        if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
>           -- the tab is within a quoted string
>           put tabChar into char t of pData
>        end if
>     end repeat
>     put offsets(comma,pData) into commaOffsets
>     repeat for each item c in commaOffsets
>        put char 1 to c of pData into upToHere
>        if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
>           -- the comma is within a quoted string
>           put commaChar into char c of pData
>        end if
>     end repeat
>     put 0 into lineCounter
>     repeat for each line L in pData
>        add 1 to lineCounter
>        put 0 into itemCounter
>        repeat for each item i in L
>           add 1 to itemCounter
>           put i into thisItem
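>           -- quotes were already swapped for open/close markers above,
>           --    so check that the markers balance within this item
>           if howmany(openQuoteChar,thisItem) <> howmany(closeQuoteChar,thisItem) then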
>              return "This CSV data is not parsable (unclosed quotes in item)."
>           end if
>           replace crChar with cr in thisItem
>           replace tabChar with tab in thisItem
>           replace commaChar with comma in thisItem
>           replace openQuoteChar with quote in thisItem
>           replace closeQuoteChar with quote in thisItem
>           put thisItem into A[lineCounter][itemCounter]
>        end repeat
>     end repeat
>     return A
> end CSVtoArray
>
> function getDelimiters pText, nbr
>     -- returns a cr-delimited list of <nbr> characters
>     --    not found in the variable pText
>     -- use for delimiters for, eg, parsing text files, manipulating arrays, etc.
>     -- usage: put getDelimiters(pText,2) into tDelims
>     --        if tDelims begins with "Error" then exit to top -- or whatever
>     --        put line 1 of tDelims into lineDivider
>     --        put line 2 of tDelims into itemDivider
>     --             etc.
>     -- by Peter M. Brigham, pmbrig at gmail.com — freeware
>     
>     if pText = empty then return "Error: no text specified."
>     if nbr = empty then put 1 into nbr -- default 1 delimiter
>     put "2,3,4,5,6,7,8,16,17,18,19,20,21,22,23,24,25,26" into baseList
>     -- low ASCII values, excluding CR, LF, tab, etc.
>     put the number of items of baseList into maxNbr
>     if nbr > maxNbr then return "Error: max" && maxNbr && "delimiters."
>     repeat with tCount = 1 to nbr
>        put true into failed
>        repeat with i = 1 to the number of items of baseList
>           put item i of baseList into testNbr
>           put numtochar(testNbr) into testChar
>           if testChar is not in pText then
>              -- found one, store and get next delim
>              put false into failed
>              put testChar into line tCount of delimList
>              exit repeat
>           end if
>        end repeat
>        if failed then
>           -- tCount - 1 delimiters were found before this one failed
>           if tCount = 1 then
>              return "Error: cannot get any delimiters."
>           else if tCount = 2 then
>              return "Error: can only get one delimiter."
>           else
>              return "Error: can only get" && (tCount - 1) && "delimiters."
>           end if
>        end if
>        delete item i of baseList
>     end repeat
>     return delimList
> end getDelimiters
>
> function howmany pStr, pContainer, pCaseSens
>     -- how many times pStr occurs in pContainer
>     -- note that howmany("00","000000") returns 3, not 5
>     --     ie,  overlapping matches are not counted
>     -- by Peter M. Brigham, pmbrig at gmail.com — freeware
>     
>     if pCaseSens = empty then put false into pCaseSens
>     set the casesensitive to pCaseSens
>     if pStr is not in pContainer then return 0
>     put len(pContainer) into origLength
>     replace pStr with char 2 to -1 of pStr in pContainer
>     return origLength - len(pContainer)
> end howmany
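Because CSVtoArray returns either an array or a plain error string, a caller has to check which one it got before indexing into it. A minimal sketch of such a caller; the handler and its sample data are hypothetical, and it assumes an offsets() function is available (it wasn't included in this post).

```livecode
on mouseUp
   -- Hypothetical caller for CSVtoArray: a small CSV sample whose
   -- second row has a comma inside a quoted field.
   local tCSV, tResult
   put "name,note" & cr & "Ann," & quote & "likes a, b" & quote into tCSV
   put CSVtoArray(tCSV) into tResult
   if tResult is not an array then
      -- CSVtoArray returned an error message instead of an array
      answer tResult
      exit mouseUp
   end if
   -- rows and columns are numbered from 1: row 2, column 2
   put tResult[2][2]
end mouseUp
```

If I'm reading the code right, CSVtoArray puts the quote characters back at the end, so that cell comes out as "likes a, b" with its enclosing quotes still attached.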
>
>
> On Oct 17, 2015, at 5:03 AM, Alex Tweedly wrote:
>
>> Naturally it must be removed.
>>
>> But I have a more philosophical issue / question.
>>
>>
>> TSV (in and of itself) doesn't have any quotes, and so doesn't handle quoted CRs or TABs.
>>
>> Currently, the 'old' version - as in Richard's published article, doesn't handle TAB characters enclosed within a quoted cell. The 'new' version does - but only by returning the data delimited by <GS> instead of TAB, and leaving enclosed TABs alone - a mistake, IMHO.
>>
>> I believe that what the converter should do is :
>> - return TSV - i.e. delimited by TABs
>> - replace quoted CR by <VT> within quoted cells (as it does now)
>> - replace quoted TABs by <GS> within quoted cells
>>
>> Any comments or suggestions ?
>>
>> Thanks
>> Alex.
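A minimal sketch of the conversion Alex proposes above, written as a single character-by-character pass instead of the offsets()-based approach. The handler name csvToTSVSketch is hypothetical and this is not the published converter; it assumes RFC 4180-style quoting, where a doubled quote ("") inside a quoted cell stands for one literal quote.

```livecode
function csvToTSVSketch pData
   -- Hypothetical sketch of the proposal: output is TSV, a CR inside a
   -- quoted cell becomes <VT> (numtochar(11)) and a TAB inside a quoted
   -- cell becomes <GS> (numtochar(29)).
   local tOut, tChar, tInQuotes, tJustClosed
   -- normalise line endings first: Windows CRLF, then old Mac CR
   replace crlf with cr in pData
   replace numtochar(13) with cr in pData
   put false into tInQuotes
   put false into tJustClosed
   repeat for each char tChar in pData
      if tChar is quote then
         if tInQuotes then
            -- tentatively a closing quote; a quote right after it means ""
            put false into tInQuotes
            put true into tJustClosed
         else if tJustClosed then
            -- second half of "": emit one literal quote, stay in the cell
            put quote after tOut
            put true into tInQuotes
            put false into tJustClosed
         else
            put true into tInQuotes   -- opening quote, not copied
         end if
         next repeat
      end if
      put false into tJustClosed
      if tInQuotes and tChar is cr then
         put numtochar(11) after tOut
      else if tInQuotes and tChar is tab then
         put numtochar(29) after tOut
      else if not tInQuotes and tChar is comma then
         put tab after tOut
      else
         put tChar after tOut
      end if
   end repeat
   if tInQuotes then return "Error: unclosed quote in CSV data."
   return tOut
end csvToTSVSketch
```

An unquoted TAB is copied through unchanged and would therefore read as a column break in the TSV output; the proposal doesn't say what should happen to those.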
>>
>> On 17/10/2015 02:34, Mike Kerner wrote:
>>> It's safe as long as you remember to remove it at the end of the function
>>>
>>> On Fri, Oct 16, 2015 at 7:12 PM, Alex Tweedly <alex at tweedly.net> wrote:
>>>
>>>> Duh - replying to myself again :-)
>>>>
>>>> It looks as though that's exactly what you do mean - it certainly
>>>> generates the problems you described earlier. And my one-line additional
>>>> test would (does in my testing) solve it properly - without it, we don't
>>>> get a chance to flush "theInsideStringSoFar" to tNuData, with the extra
>>>> line we do. And adding it is always safe (AFAICI).
>>>>
>>>> -- Alex.
>>>>
>>>>
>>>> On 17/10/2015 00:03, Alex Tweedly wrote:
>>>>
>>>>> Sorry, Mike, but can you describe what you mean by a "naked" line ?
>>>>> Is it simply one with no line delimiter after it ?
>>>>> i.e. could only happen on the very last line of a file of input ?
>>>>>
>>>>> Could that be solved by a simple test (after the various 'replace'
>>>>> statements)
>>>>>      if the last char of pData <> CR then put CR after pData
>>>>> before the parsing happens ?
>>>>>
>>>>> -- Alex.
>>>>>
>>>>>
>>>>> On 16/10/2015 17:19, Mike Kerner wrote:
>>>>>
>>>>>> No, the problem isn't that LC uses LF and CR for ascii(10) and ignores
>>>>>> ascii(13).  That's just a personal problem.
>>>>>>
>>>>>> The problem, here, is that the csv parser handles a naked line and a
>>>>>> terminated line differently.  If the line is terminated, it parses it one
>>>>>> way, and if it is not, it parses it (incorrectly) a different way, which
>>>>>> makes me wonder if this is the latest version.
>>>>>>
>>>>>> On Fri, Oct 16, 2015 at 11:28 AM, Bob Sneidar <bobsneidar at iotecdigital.com> wrote:
>>>>>>
>>>>>>> But what if the cr or lf or crlf is inside quoted text, meaning it is not
>>>>>>> a delimiter? Oh, I'm afraid the deflector shield will be quite
>>>>>>> operational when your friends arrive.
>>>>>>>
>>>>>>> Bob S
>>>>>>>
>>>>>>>
>>>>>>> On Oct 16, 2015, at 08:04, Alex Tweedly <alex at tweedly.net> wrote:
>>>>>>>> Hi Mike,
>>>>>>>>
>>>>>>>> thanks for that additional info.
>>>>>>>>
>>>>>>>> I *think* (it's been 3 years) I left them as <GS> (i.e. numtochar(29))
>>>>>>>> because I had some data including normal TAB characters within the cells
>>>>>>>> (!!) and thought <GS> was a safer bet - though of course nothing is
>>>>>>>> completely safe. It's then up to the caller to decide whether to do
>>>>>>>> "replace numtochar(29) with TAB in ...", or do TAB escaping, or whatever
>>>>>>>> they want.
>>>>>>>>
>>>>>>>> As for the other bigger problem .... Oh dear = CR vs LF vs CRLF ....
>>>>>>>>
>>>>>>>> Are you on Mac or Windows or Linux ?
>>>>>>>> How is the LF delimited data getting into your app ?
>>>>>>>> Maybe we should just add a "replace numtochar(13) with CR in pData" ?
>>>>>>>>
>>>>>>>> (I confess to being confused by this - I know that LC does
>>>>>>>> auto-translation of line delimiters at various places, but I'm not sure
>>>>>>>> when it is, or isn't, completely safe.) Maybe the easiest thing is to
>>>>>>>> just do all the translations ....
>>>>>>>
>>>>>>>>    replace CRLF with CR in pData
>>>>>>>>    replace numtochar(10) with CR in pData
>>>>>>>>    replace numtochar(13) with CR in pData
>>>>>>>>
>>>>>>>> -- Alex.
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> use-livecode mailing list
>>>>>>> use-livecode at lists.runrev.com
>>>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>>>> subscription preferences:
>>>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>>>>
>>>>>>>