CSV again.
Peter M. Brigham
pmbrig at gmail.com
Sat Oct 17 20:47:35 EDT 2015
My mistake, failed to include the offsets() handler:
-- Peter
Peter M. Brigham
pmbrig at gmail.com
http://home.comcast.net/~pmbrig
-----------
function offsets str, pContainer
-- returns a comma-delimited list of all the offsets of str in pContainer
-- returns 0 if not found
-- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
-- ie, overlapping offsets are not counted
-- note: to get the last occurrence of a string in a container (often useful)
-- use "item -1 of offsets(...)"
-- by Peter M. Brigham, pmbrig at gmail.com — freeware
if str is not in pContainer then return 0
put 0 into startPoint
repeat
put offset(str,pContainer,startPoint) into thisOffset
if thisOffset = 0 then exit repeat
add thisOffset to startPoint
put startPoint & comma after offsetList
add length(str)-1 to startPoint
end repeat
return item 1 to -1 of offsetList -- delete trailing comma
end offsets
On Oct 17, 2015, at 8:30 PM, Alex Tweedly wrote:
> Hi Peter,
>
> it also requires offsets() - I can guess what it does, but it would be safer to get the actual code you use :-)
>
> Thanks
> -- Alex.
>
> On 18/10/2015 00:41, Peter M. Brigham wrote:
>> So here's my attempt. It converts a CVS text to an array. Let's see if there's csv data that can break it.
>>
>> -- Peter
>>
>> Peter M. Brigham
>> pmbrig at gmail.com
>> http://home.comcast.net/~pmbrig
>>
>> -------
>>
>> function CSVtoArray pData
>> -- by Peter M. Brigham, pmbrig at gmail.com
>> -- requires getDelimiters(), howmany()
>> put getDelimiters(pData,5) into tDelims
>> put line 1 of tDelims into crChar
>> put line 2 of tDelims into tabChar
>> put line 3 of tDelims into commaChar
>> put line 4 of tDelims into openQuoteChar
>> put line 5 of tDelims into closeQuoteChar
>> replace crlf with cr in pData -- Win to UNIX
>> replace numtochar(13) with cr in pData -- Mac to UNIX
>> if howmany(quote,pData) mod 2 = 1 then
>> return "This CSV data is not parsable (unclosed quotes in data)."
>> end if
>> put offsets(quote,pData) into qOffsets
>> if qOffsets > 0 then
>> put 1 into counter
>> repeat for each item q in qOffsets
>> if counter mod 2 = 1 then put openQuoteChar into char q of pData
>> else put closeQuoteChar into char q of pData
>> add 1 to counter
>> end repeat
>> end if
>> put offsets(cr,pData) into crOffsets
>> repeat for each item r in crOffsets
>> put char 1 to r of pData into upToHere
>> if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
>> -- the cr is within a quoted string
>> put crChar into char r of pData
>> end if
>> end repeat
>> put offsets(tab,pData) into tabOffsets
>> repeat for each item t in tabOffsets
>> put char 1 to t of pData into upToHere
>> if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
>> -- the tab is within a quoted string
>> put tabChar into char t of pData
>> end if
>> end repeat
>> put offsets(comma,pData) into commaOffsets
>> repeat for each item c in commaOffsets
>> put char 1 to c of pData into upToHere
>> if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
>> -- the comma is within a quoted string
>> put commaChar into char c of pData
>> end if
>> end repeat
>> put 0 into lineCounter
>> repeat for each line L in pData
>> add 1 to lineCounter
>> put 0 into itemCounter
>> repeat for each item i in L
>> add 1 to itemCounter
>> put i into thisItem
>> if howmany(quote,thisItem) mod 2 = 1 then
>> return "This CSV data is not parsable (unclosed quotes in item)."
>> end if
>> replace crChar with cr in thisItem
>> replace tabChar with tab in thisItem
>> replace commaChar with comma in thisItem
>> replace openQuoteChar with quote in thisItem
>> replace closeQuoteChar with quote in thisItem
>> put thisItem into A[lineCounter][itemCounter]
>> end repeat
>> end repeat
>> return A
>> end CSVtoArray
>>
>> function getDelimiters pText, nbr
>> -- returns a cr-delimited list of <nbr> characters
>> -- not found in the variable pText
>> -- use for delimiters for, eg, parsing text files, manipulating arrays, etc.
>> -- usage: put getDelimiters(pText,2) into tDelims
>> -- if tDelims begins with "Error" then exit to top -- or whatever
>> -- put line 1 of tDelims into lineDivider
>> -- put line 2 of tDelims into itemDivider
>> -- etc.
>> -- by Peter M. Brigham, pmbrig at gmail.com — freeware
>> if pText = empty then return "Error: no text specified."
>> if nbr = empty then put 1 into nbr -- default 1 delimiter
>> put "2,3,4,5,6,7,8,16,17,18,19,20,21,22,23,24,25,26" into baseList
>> -- low ASCII values, excluding CR, LF, tab, etc.
>> put the number of items of baseList into maxNbr
>> if nbr > maxNbr then return "Error: max" && maxNbr && "delimiters."
>> repeat with tCount = 1 to nbr
>> put true into failed
>> repeat with i = 1 to the number of items of baseList
>> put item i of baseList into testNbr
>> put numtochar(testNbr) into testChar
>> if testChar is not in pText then
>> -- found one, store and get next delim
>> put false into failed
>> put testChar into line tCount of delimList
>> exit repeat
>> end if
>> end repeat
>> if failed then
>> if tCount = 0 then
>> return "Error: cannot get any delimiters."
>> else if tCount = 1 then
>> return "Error: can only get one delimiter."
>> else
>> return "Error: can only get" && tCount && "delimiters."
>> end if
>> end if
>> delete item i of baseList
>> end repeat
>> return delimList
>> end getDelimiters
>>
>> function howmany pStr, pContainer, pCaseSens
>> -- how many times pStr occurs in pContainer
>> -- note that howmany("00","000000") returns 3, not 5
>> -- ie, overlapping matches are not counted
>> -- by Peter M. Brigham, pmbrig at gmail.com — freeware
>> if pCaseSens = empty then put false into pCaseSens
>> set the casesensitive to pCaseSens
>> if pStr is not in pContainer then return 0
>> put len(pContainer) into origLength
>> replace pStr with char 2 to -1 of pStr in pContainer
>> return origLength - len(pContainer)
>> end howmany
>>
>>
>> On Oct 17, 2015, at 5:03 AM, Alex Tweedly wrote:
>>
>>> Naturally it must be removed.
>>>
>>> But I have a more philosophical issue / question.
>>>
>>>
>>> TSV (in and of itself) doesn't have any quotes, and so doesn't handle quoted CRs or TABs.
>>>
>>> Currently, the 'old' version - as in Richard's published article, doesn't handle TAB characters enclosed within a quoted cell. The 'new' version does - but only by returning the data delimited by <GS> instead of TAB, and leaving enclosed TABs alone - a mistake, IMHO.
>>>
>>> I believe that what the converter should do is :
>>> - return TSV - i.e. delimited by TABs
>>> - replace quoted CR by <VT> within quoted cells (as it does now)
>>> - replace quoted TABs by <GS> within quoted cells
>>>
>>> Any comments or suggestions ?
>>>
>>> Thanks
>>> Alex.
>>>
>>> On 17/10/2015 02:34, Mike Kerner wrote:
>>>> It's safe as long as you remember to remove it at the end of the function
>>>>
>>>> On Fri, Oct 16, 2015 at 7:12 PM, Alex Tweedly <alex at tweedly.net> wrote:
>>>>
>>>>> Duh - replying to myself again :-)
>>>>>
>>>>> It looks as though that's exactly what you do mean - it certainly
>>>>> generates the problems you described earlier. And my one-line additional
>>>>> test would (does in my testing) solve it properly - without it, we don't
>>>>> get a chance to flush "theInsideStringSoFar" to tNuData, with the extra
>>>>> line we do. And adding it is always safe (AFAICI).
>>>>>
>>>>> -- Alex.
>>>>>
>>>>>
>>>>> On 17/10/2015 00:03, Alex Tweedly wrote:
>>>>>
>>>>>> Sorry, Mike, but can you describe what you mean by a "naked" line ?
>>>>>> Is it simply one with no line delimiter after it ?
>>>>>> i.e. could only happen on the very last line of a file of input ?
>>>>>>
>>>>>> Could that be solved by a simple test (after the various 'replace'
>>>>>> statements)
>>>>>> if the last char of pData <> CR then put CR after pData
>>>>>> before the parsing happens ?
>>>>>>
>>>>>> -- Alex.
>>>>>>
>>>>>>
>>>>>> On 16/10/2015 17:19, Mike Kerner wrote:
>>>>>>
>>>>>>> No, the problem isn't that LC use LF and CR for ascii(10) and ignores
>>>>>>> ascii(13). That's just a personal problem.
>>>>>>>
>>>>>>> The problem, here, is that the csv parser handles a naked line and a
>>>>>>> terminated line differently. If the line is terminated, it parses it one
>>>>>>> way, and if it is not, it parses it (incorrectly) a different way, which
>>>>>>> makes me wonder if this is the latest version.
>>>>>>>
>>>>>>> On Fri, Oct 16, 2015 at 11:28 AM, Bob Sneidar <
>>>>>>> bobsneidar at iotecdigital.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> But what if the cr or lf or crlf is inside quoted text, meaning it is not
>>>>>>>> a delimiter? Oh, I'm afraid the deflector shield will be quite
>>>>>>>> operational
>>>>>>>> when your friends arrive.
>>>>>>>>
>>>>>>>> Bob S
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 16, 2015, at 08:04 , Alex Tweedly <alex at tweedly.net> wrote:
>>>>>>>>> Hi Mike,
>>>>>>>>>
>>>>>>>>> thanks for that additional info.
>>>>>>>>>
>>>>>>>>> I *think* (it's been 3 years) I left them as <GS> (i.e. numtochar(29))
>>>>>>>>>
>>>>>>>> because I had some data including normal TAB characters within the cells
>>>>>>>> (!!( and thought <GS> was a safer bet - though of course nothing is
>>>>>>>> completely safe. It's then up to the caller to decide whether to do
>>>>>>>> "replace numtochar(29) with TAB in ...", or do TAB escaping, or whatever
>>>>>>>> they want.
>>>>>>>>
>>>>>>>>> As for the other bigger problem .... Oh dear = CR vs LF vs CRLF ....
>>>>>>>>>
>>>>>>>>> Are you on Mac or Windows or Linux ?
>>>>>>>>> How is the LF delimited data getting into your app ?
>>>>>>>>> Maybe we should just add a "replace chartonum(13) with CR in pData" ?
>>>>>>>>>
>>>>>>>>> (I confess to being confused by this - I know that LC does
>>>>>>>>>
>>>>>>>> auto-translation of line delimiters at various places, but I'm not sure
>>>>>>>> when it is, or isn't, completely safe. Maybe the easiest thing is to
>>>>>>>> jst do
>>>>>>>> all the translations ....
>>>>>>>>
>>>>>>>>> replace CRLF with CR in pData
>>>>>>>>> replace numtochar(10) with CR in pData
>>>>>>>>> replace numtochar(13) with CR in pData
>>>>>>>>>
>>>>>>>>> -- Alex.
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> use-livecode mailing list
>>>>>>>> use-livecode at lists.runrev.com
>>>>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>>>>> subscription preferences:
>>>>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>>>>>
>>>>>>>>
>>>>>> _______________________________________________
>>>>>> use-livecode mailing list
>>>>>> use-livecode at lists.runrev.com
>>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>>> subscription preferences:
>>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>>>
>>>>> _______________________________________________
>>>>> use-livecode mailing list
>>>>> use-livecode at lists.runrev.com
>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>> subscription preferences:
>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
>
>
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
More information about the use-livecode
mailing list