CSV again.
Alex Tweedly
alex at tweedly.net
Sat Oct 17 20:30:36 EDT 2015
Hi Peter,
it also requires offsets() - I can guess what it does, but it would be
safer to get the actual code you use :-)
Thanks
-- Alex.
On 18/10/2015 00:41, Peter M. Brigham wrote:
> So here's my attempt. It converts a CVS text to an array. Let's see if there's csv data that can break it.
>
> -- Peter
>
> Peter M. Brigham
> pmbrig at gmail.com
> http://home.comcast.net/~pmbrig
>
> -------
>
> function CSVtoArray pData
> -- by Peter M. Brigham, pmbrig at gmail.com
> -- requires getDelimiters(), howmany()
> put getDelimiters(pData,5) into tDelims
> put line 1 of tDelims into crChar
> put line 2 of tDelims into tabChar
> put line 3 of tDelims into commaChar
> put line 4 of tDelims into openQuoteChar
> put line 5 of tDelims into closeQuoteChar
>
> replace crlf with cr in pData -- Win to UNIX
> replace numtochar(13) with cr in pData -- Mac to UNIX
>
> if howmany(quote,pData) mod 2 = 1 then
> return "This CSV data is not parsable (unclosed quotes in data)."
> end if
>
> put offsets(quote,pData) into qOffsets
> if qOffsets > 0 then
> put 1 into counter
> repeat for each item q in qOffsets
> if counter mod 2 = 1 then put openQuoteChar into char q of pData
> else put closeQuoteChar into char q of pData
> add 1 to counter
> end repeat
> end if
>
> put offsets(cr,pData) into crOffsets
> repeat for each item r in crOffsets
> put char 1 to r of pData into upToHere
> if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
> -- the cr is within a quoted string
> put crChar into char r of pData
> end if
> end repeat
> put offsets(tab,pData) into tabOffsets
> repeat for each item t in tabOffsets
> put char 1 to t of pData into upToHere
> if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
> -- the tab is within a quoted string
> put tabChar into char t of pData
> end if
> end repeat
> put offsets(comma,pData) into commaOffsets
> repeat for each item c in commaOffsets
> put char 1 to c of pData into upToHere
> if howmany(openQuoteChar,upToHere) <> howmany(closeQuoteChar,upToHere) then
> -- the comma is within a quoted string
> put commaChar into char c of pData
> end if
> end repeat
> put 0 into lineCounter
> repeat for each line L in pData
> add 1 to lineCounter
> put 0 into itemCounter
> repeat for each item i in L
> add 1 to itemCounter
> put i into thisItem
> if howmany(quote,thisItem) mod 2 = 1 then
> return "This CSV data is not parsable (unclosed quotes in item)."
> end if
> replace crChar with cr in thisItem
> replace tabChar with tab in thisItem
> replace commaChar with comma in thisItem
> replace openQuoteChar with quote in thisItem
> replace closeQuoteChar with quote in thisItem
> put thisItem into A[lineCounter][itemCounter]
> end repeat
> end repeat
> return A
> end CSVtoArray
>
> function getDelimiters pText, nbr
> -- returns a cr-delimited list of <nbr> characters
> -- not found in the variable pText
> -- use for delimiters for, eg, parsing text files, manipulating arrays, etc.
> -- usage: put getDelimiters(pText,2) into tDelims
> -- if tDelims begins with "Error" then exit to top -- or whatever
> -- put line 1 of tDelims into lineDivider
> -- put line 2 of tDelims into itemDivider
> -- etc.
> -- by Peter M. Brigham, pmbrig at gmail.com — freeware
>
> if pText = empty then return "Error: no text specified."
> if nbr = empty then put 1 into nbr -- default 1 delimiter
> put "2,3,4,5,6,7,8,16,17,18,19,20,21,22,23,24,25,26" into baseList
> -- low ASCII values, excluding CR, LF, tab, etc.
> put the number of items of baseList into maxNbr
> if nbr > maxNbr then return "Error: max" && maxNbr && "delimiters."
> repeat with tCount = 1 to nbr
> put true into failed
> repeat with i = 1 to the number of items of baseList
> put item i of baseList into testNbr
> put numtochar(testNbr) into testChar
> if testChar is not in pText then
> -- found one, store and get next delim
> put false into failed
> put testChar into line tCount of delimList
> exit repeat
> end if
> end repeat
> if failed then
> if tCount = 0 then
> return "Error: cannot get any delimiters."
> else if tCount = 1 then
> return "Error: can only get one delimiter."
> else
> return "Error: can only get" && tCount && "delimiters."
> end if
> end if
> delete item i of baseList
> end repeat
> return delimList
> end getDelimiters
>
> function howmany pStr, pContainer, pCaseSens
> -- how many times pStr occurs in pContainer
> -- note that howmany("00","000000") returns 3, not 5
> -- ie, overlapping matches are not counted
> -- by Peter M. Brigham, pmbrig at gmail.com — freeware
>
> if pCaseSens = empty then put false into pCaseSens
> set the casesensitive to pCaseSens
> if pStr is not in pContainer then return 0
> put len(pContainer) into origLength
> replace pStr with char 2 to -1 of pStr in pContainer
> return origLength - len(pContainer)
> end howmany
>
>
> On Oct 17, 2015, at 5:03 AM, Alex Tweedly wrote:
>
>> Naturally it must be removed.
>>
>> But I have a more philosophical issue / question.
>>
>>
>> TSV (in and of itself) doesn't have any quotes, and so doesn't handle quoted CRs or TABs.
>>
>> Currently, the 'old' version - as in Richard's published article, doesn't handle TAB characters enclosed within a quoted cell. The 'new' version does - but only by returning the data delimited by <GS> instead of TAB, and leaving enclosed TABs alone - a mistake, IMHO.
>>
>> I believe that what the converter should do is :
>> - return TSV - i.e. delimited by TABs
>> - replace quoted CR by <VT> within quoted cells (as it does now)
>> - replace quoted TABs by <GS> within quoted cells
>>
>> Any comments or suggestions ?
>>
>> Thanks
>> Alex.
>>
>> On 17/10/2015 02:34, Mike Kerner wrote:
>>> It's safe as long as you remember to remove it at the end of the function
>>>
>>> On Fri, Oct 16, 2015 at 7:12 PM, Alex Tweedly <alex at tweedly.net> wrote:
>>>
>>>> Duh - replying to myself again :-)
>>>>
>>>> It looks as though that's exactly what you do mean - it certainly
>>>> generates the problems you described earlier. And my one-line additional
>>>> test would (does in my testing) solve it properly - without it, we don't
>>>> get a chance to flush "theInsideStringSoFar" to tNuData, with the extra
>>>> line we do. And adding it is always safe (AFAICI).
>>>>
>>>> -- Alex.
>>>>
>>>>
>>>> On 17/10/2015 00:03, Alex Tweedly wrote:
>>>>
>>>>> Sorry, Mike, but can you describe what you mean by a "naked" line ?
>>>>> Is it simply one with no line delimiter after it ?
>>>>> i.e. could only happen on the very last line of a file of input ?
>>>>>
>>>>> Could that be solved by a simple test (after the various 'replace'
>>>>> statements)
>>>>> if the last char of pData <> CR then put CR after pData
>>>>> before the parsing happens ?
>>>>>
>>>>> -- Alex.
>>>>>
>>>>>
>>>>> On 16/10/2015 17:19, Mike Kerner wrote:
>>>>>
>>>>>> No, the problem isn't that LC use LF and CR for ascii(10) and ignores
>>>>>> ascii(13). That's just a personal problem.
>>>>>>
>>>>>> The problem, here, is that the csv parser handles a naked line and a
>>>>>> terminated line differently. If the line is terminated, it parses it one
>>>>>> way, and if it is not, it parses it (incorrectly) a different way, which
>>>>>> makes me wonder if this is the latest version.
>>>>>>
>>>>>> On Fri, Oct 16, 2015 at 11:28 AM, Bob Sneidar <
>>>>>> bobsneidar at iotecdigital.com>
>>>>>> wrote:
>>>>>>
>>>>>> But what if the cr or lf or crlf is inside quoted text, meaning it is not
>>>>>>> a delimiter? Oh, I'm afraid the deflector shield will be quite
>>>>>>> operational
>>>>>>> when your friends arrive.
>>>>>>>
>>>>>>> Bob S
>>>>>>>
>>>>>>>
>>>>>>> On Oct 16, 2015, at 08:04 , Alex Tweedly <alex at tweedly.net> wrote:
>>>>>>>> Hi Mike,
>>>>>>>>
>>>>>>>> thanks for that additional info.
>>>>>>>>
>>>>>>>> I *think* (it's been 3 years) I left them as <GS> (i.e. numtochar(29))
>>>>>>>>
>>>>>>> because I had some data including normal TAB characters within the cells
>>>>>>> (!!( and thought <GS> was a safer bet - though of course nothing is
>>>>>>> completely safe. It's then up to the caller to decide whether to do
>>>>>>> "replace numtochar(29) with TAB in ...", or do TAB escaping, or whatever
>>>>>>> they want.
>>>>>>>
>>>>>>>> As for the other bigger problem .... Oh dear = CR vs LF vs CRLF ....
>>>>>>>>
>>>>>>>> Are you on Mac or Windows or Linux ?
>>>>>>>> How is the LF delimited data getting into your app ?
>>>>>>>> Maybe we should just add a "replace chartonum(13) with CR in pData" ?
>>>>>>>>
>>>>>>>> (I confess to being confused by this - I know that LC does
>>>>>>>>
>>>>>>> auto-translation of line delimiters at various places, but I'm not sure
>>>>>>> when it is, or isn't, completely safe. Maybe the easiest thing is to
>>>>>>> jst do
>>>>>>> all the translations ....
>>>>>>>
>>>>>>>> replace CRLF with CR in pData
>>>>>>>> replace numtochar(10) with CR in pData
>>>>>>>> replace numtochar(13) with CR in pData
>>>>>>>>
>>>>>>>> -- Alex.
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> use-livecode mailing list
>>>>>>> use-livecode at lists.runrev.com
>>>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>>>> subscription preferences:
>>>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>>>>
>>>>>>>
>>>>> _______________________________________________
>>>>> use-livecode mailing list
>>>>> use-livecode at lists.runrev.com
>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>> subscription preferences:
>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>>
>>>> _______________________________________________
>>>> use-livecode mailing list
>>>> use-livecode at lists.runrev.com
>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>> subscription preferences:
>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>
>>>
>>
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
>
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
More information about the use-livecode
mailing list