CSV again.

Peter M. Brigham pmbrig at gmail.com
Sat Oct 17 20:47:35 EDT 2015


My mistake, failed to include the offsets() handler:

-- Peter

Peter M. Brigham
pmbrig at gmail.com
http://home.comcast.net/~pmbrig

-----------

function offsets str, pContainer
   -- returns a comma-delimited list of all the offsets of str in pContainer
   -- returns 0 if not found
   -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
   --     ie, overlapping offsets are not counted
   -- note: to get the last occurrence of a string in a container (often useful)
   --     use "item -1 of offsets(...)"
   -- by Peter M. Brigham, pmbrig at gmail.com — freeware
   
   if str is not in pContainer then return 0
   put 0 into startPoint
   repeat
      put offset(str,pContainer,startPoint) into thisOffset
      if thisOffset = 0 then exit repeat
      add thisOffset to startPoint
      put startPoint & comma after offsetList
      add length(str)-1 to startPoint
   end repeat
   return item 1 to -1 of offsetList -- delete trailing comma
end offsets


On Oct 17, 2015, at 8:30 PM, Alex Tweedly wrote:

> Hi Peter,
> 
> it also requires offsets() - I can guess what it does, but it would be safer to get the actual code you use :-)
> 
> Thanks
> -- Alex.
> 
> On 18/10/2015 00:41, Peter M. Brigham wrote:
>> So here's my attempt. It converts a CVS text to an array. Let's see if there's csv data that can break it.
>> 
>> -- Peter
>> 
>> Peter M. Brigham
>> pmbrig at gmail.com
>> http://home.comcast.net/~pmbrig
>> 
>> -------
>> 
>> function CSVtoArray pData
>>    -- by Peter M. Brigham, pmbrig at gmail.com
>>    -- requires getDelimiters(), howmany()
>>    put getDelimiters(pData,5) into tDelims
>>    put line 1 of tDelims into crChar
>>    put line 2 of tDelims into tabChar
>>    put line 3 of tDelims into commaChar
>>    put line 4 of tDelims into openQuoteChar
>>    put line 5 of tDelims into closeQuoteChar
>>        replace crlf with cr in pData          -- Win to UNIX
>>    replace numtochar(13) with cr in pData -- Mac to UNIX
>>        if howmany(quote,pData) mod 2 = 1 then
>>       return "This CSV data is not parsable (unclosed quotes in data)."
>>    end if
>>        put offsets(quote,pData) into qOffsets
>>    if qOffsets > 0 then
>>       put 1 into counter
>>       repeat for each item q in qOffsets
>>          if counter mod 2 = 1 then put openQuoteChar into char q of pData
>>          else put closeQuoteChar into char q of pData
>>          add 1 to counter
>>       end repeat
>>    end if
>>        put offsets(cr,pData) into crOffsets
>>    repeat for each item r in crOffsets
>>       put char 1 to r of pData into upToHere
>>       if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
>>          -- the cr is within a quoted string
>>          put crChar into char r of pData
>>       end if
>>    end repeat
>>    put offsets(tab,pData) into tabOffsets
>>    repeat for each item t in tabOffsets
>>       put char 1 to t of pData into upToHere
>>       if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
>>          -- the tab is within a quoted string
>>          put tabChar into char t of pData
>>       end if
>>    end repeat
>>    put offsets(comma,pData) into commaOffsets
>>    repeat for each item c in commaOffsets
>>       put char 1 to c of pData into upToHere
>>       if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
>>          -- the comma is within a quoted string
>>          put commaChar into char c of pData
>>       end if
>>    end repeat
>>    put 0 into lineCounter
>>    repeat for each line L in pData
>>       add 1 to lineCounter
>>       put 0 into itemCounter
>>       repeat for each item i in L
>>          add 1 to itemCounter
>>          put i into thisItem
>>          if howmany(quote,thisItem) mod 2 = 1 then
>>             return "This CSV data is not parsable (unclosed quotes in item)."
>>          end if
>>          replace crChar with cr in thisItem
>>          replace tabChar with tab in thisItem
>>          replace commaChar with comma in thisItem
>>          replace openQuoteChar with quote in thisItem
>>          replace closeQuoteChar with quote in thisItem
>>          put thisItem into A[lineCounter][itemCounter]
>>       end repeat
>>    end repeat
>>    return A
>> end CSVtoArray
>> 
>> function getDelimiters pText, nbr
>>    -- returns a cr-delimited list of <nbr> characters
>>    --    not found in the variable pText
>>    -- use for delimiters for, eg, parsing text files, manipulating arrays, etc.
>>    -- usage: put getDelimiters(pText,2) into tDelims
>>    --        if tDelims begins with "Error" then exit to top -- or whatever
>>    --        put line 1 of tDelims into lineDivider
>>    --        put line 2 of tDelims into itemDivider
>>    --             etc.
>>    -- by Peter M. Brigham, pmbrig at gmail.com — freeware
>>        if pText = empty then return "Error: no text specified."
>>    if nbr = empty then put 1 into nbr -- default 1 delimiter
>>    put "2,3,4,5,6,7,8,16,17,18,19,20,21,22,23,24,25,26" into baseList
>>    -- low ASCII values, excluding CR, LF, tab, etc.
>>    put the number of items of baseList into maxNbr
>>    if nbr > maxNbr then return "Error: max" && maxNbr && "delimiters."
>>    repeat with tCount = 1 to nbr
>>       put true into failed
>>       repeat with i = 1 to the number of items of baseList
>>          put item i of baseList into testNbr
>>          put numtochar(testNbr) into testChar
>>          if testChar is not in pText then
>>             -- found one, store and get next delim
>>             put false into failed
>>             put testChar into line tCount of delimList
>>             exit repeat
>>          end if
>>       end repeat
>>       if failed then
>>          if tCount = 0 then
>>             return "Error: cannot get any delimiters."
>>          else if tCount = 1 then
>>             return "Error: can only get one delimiter."
>>          else
>>             return "Error: can only get" && tCount && "delimiters."
>>          end if
>>       end if
>>       delete item i of baseList
>>    end repeat
>>    return delimList
>> end getDelimiters
>> 
>> function howmany pStr, pContainer, pCaseSens
>>    -- how many times pStr occurs in pContainer
>>    -- note that howmany("00","000000") returns 3, not 5
>>    --     ie,  overlapping matches are not counted
>>    -- by Peter M. Brigham, pmbrig at gmail.com — freeware
>>        if pCaseSens = empty then put false into pCaseSens
>>    set the casesensitive to pCaseSens
>>    if pStr is not in pContainer then return 0
>>    put len(pContainer) into origLength
>>    replace pStr with char 2 to -1 of pStr in pContainer
>>    return origLength - len(pContainer)
>> end howmany
>> 
>> 
>> On Oct 17, 2015, at 5:03 AM, Alex Tweedly wrote:
>> 
>>> Naturally it must be removed.
>>> 
>>> But I have a more philosophical issue / question.
>>> 
>>> 
>>> TSV (in and of itself) doesn't have any quotes, and so doesn't handle quoted CRs or TABs.
>>> 
>>> Currently, the 'old' version - as in Richard's published article, doesn't handle TAB characters enclosed within a quoted cell. The 'new' version does - but only by returning the data delimited by <GS> instead of TAB, and leaving enclosed TABs alone - a mistake, IMHO.
>>> 
>>> I believe that what the converter should do is :
>>> - return TSV - i.e. delimited by TABs
>>> - replace quoted CR by <VT> within quoted cells (as it does now)
>>> - replace quoted TABs by <GS> within quoted cells
>>> 
>>> Any comments or suggestions ?
>>> 
>>> Thanks
>>> Alex.
>>> 
>>> On 17/10/2015 02:34, Mike Kerner wrote:
>>>> It's safe as long as you remember to remove it at the end of the function
>>>> 
>>>> On Fri, Oct 16, 2015 at 7:12 PM, Alex Tweedly <alex at tweedly.net> wrote:
>>>> 
>>>>> Duh - replying to myself again :-)
>>>>> 
>>>>> It looks as though that's exactly what you do mean - it certainly
>>>>> generates the problems you described earlier. And my one-line additional
>>>>> test would (does in my testing) solve it properly - without it, we don't
>>>>> get a chance to flush "theInsideStringSoFar" to tNuData, with the extra
>>>>> line we do. And adding it is always safe (AFAICI).
>>>>> 
>>>>> -- Alex.
>>>>> 
>>>>> 
>>>>> On 17/10/2015 00:03, Alex Tweedly wrote:
>>>>> 
>>>>>> Sorry, Mike, but can you describe what you mean by a "naked" line ?
>>>>>> Is it simply one with no line delimiter after it ?
>>>>>> i.e. could only happen on the very last line of a file of input ?
>>>>>> 
>>>>>> Could that be solved by a simple test (after the various 'replace'
>>>>>> statements)
>>>>>>     if the last char of pData <> CR then put CR after pData
>>>>>> before the parsing happens ?
>>>>>> 
>>>>>> -- Alex.
>>>>>> 
>>>>>> 
>>>>>> On 16/10/2015 17:19, Mike Kerner wrote:
>>>>>> 
>>>>>>> No, the problem isn't that LC use LF and CR for ascii(10) and ignores
>>>>>>> ascii(13).  That's just a personal problem.
>>>>>>> 
>>>>>>> The problem, here, is that the csv parser handles a naked line and a
>>>>>>> terminated line differently.  If the line is terminated, it parses it one
>>>>>>> way, and if it is not, it parses it (incorrectly) a different way, which
>>>>>>> makes me wonder if this is the latest version.
>>>>>>> 
>>>>>>> On Fri, Oct 16, 2015 at 11:28 AM, Bob Sneidar <
>>>>>>> bobsneidar at iotecdigital.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> But what if the cr or lf or crlf is inside quoted text, meaning it is not
>>>>>>>> a delimiter? Oh, I'm afraid the deflector shield will be quite
>>>>>>>> operational
>>>>>>>> when your friends arrive.
>>>>>>>> 
>>>>>>>> Bob S
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Oct 16, 2015, at 08:04 , Alex Tweedly <alex at tweedly.net> wrote:
>>>>>>>>> Hi Mike,
>>>>>>>>> 
>>>>>>>>> thanks for that additional info.
>>>>>>>>> 
>>>>>>>>> I *think* (it's been 3 years) I left them as <GS> (i.e. numtochar(29))
>>>>>>>>> 
>>>>>>>> because I had some data including normal TAB characters within the cells
>>>>>>>> (!!( and thought <GS> was a safer bet - though of course nothing is
>>>>>>>> completely safe. It's then up to the caller to decide whether to do
>>>>>>>> "replace numtochar(29) with TAB in ...", or do TAB escaping, or whatever
>>>>>>>> they want.
>>>>>>>> 
>>>>>>>>> As for the other bigger problem .... Oh dear = CR vs LF vs CRLF ....
>>>>>>>>> 
>>>>>>>>> Are you on Mac or Windows or Linux ?
>>>>>>>>> How is the LF delimited data getting into your app ?
>>>>>>>>> Maybe we should just add a "replace chartonum(13) with CR in pData" ?
>>>>>>>>> 
>>>>>>>>> (I confess to being confused by this - I know that LC does
>>>>>>>>> 
>>>>>>>> auto-translation of line delimiters at various places, but I'm not sure
>>>>>>>> when it is, or isn't, completely safe. Maybe the easiest thing is to
>>>>>>>> jst do
>>>>>>>> all the translations ....
>>>>>>>> 
>>>>>>>>>   replace CRLF with CR in pData
>>>>>>>>>   replace numtochar(10) with CR in pData
>>>>>>>>>   replace numtochar(13) with CR in pData
>>>>>>>>> 
>>>>>>>>> -- Alex.
>>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> use-livecode mailing list
>>>>>>>> use-livecode at lists.runrev.com
>>>>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>>>>> subscription preferences:
>>>>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>>>>> 
>>>>>>>> 
>>>>>> _______________________________________________
>>>>>> use-livecode mailing list
>>>>>> use-livecode at lists.runrev.com
>>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>>> subscription preferences:
>>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>>> 
>>>>> _______________________________________________
>>>>> use-livecode mailing list
>>>>> use-livecode at lists.runrev.com
>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>> subscription preferences:
>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>> 
>>>> 
>>> 
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>> 
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
> 
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode





More information about the use-livecode mailing list