Sample code for reading a CSV file

Peter Brigham MD pmbrig at gmail.com
Thu Feb 17 18:59:13 EST 2011


On Feb 17, 2011, at 3:01 PM, Paul Dupuis wrote:

> First, thanks to everyone who replied, but especially to Nosanity.  
> Your code reminded me that you can effectively tell when you are  
> inside an encapsulated bit of data by an odd/even count of the  
> encapsulation character. So, for anyone who wants it, here is a  
> generalized function that I just wrote to parse a CSV file,  
> regardless of the field or record delimiters (commas, tabs or  
> whatever) and to deal with encapsulation appropriately.
>
> This assumes you read the entire CSV file into a variable you pass  
> into pData, so a call would look like:
>
> put csvToArray(myEntireCSVData,return,comma,quote) into myDataAsArray
>
> I have tested it a bit in the last 30 minutes and it working in the  
> cases I tried, but did not test exhaustively and have not checked  
> performance on large datasets. If any one uses this and run into an  
> issue, please let me know.
>
> function csvToArray pData, pRecordDelimiter, pFieldDelimiter,  
> pEncapsulationDelimiter
>  local tReservedRecordDelimiter, tReservedFieldDelimiter, tArray
>
>  # Initialize the temporary record and field delimiters. Change  
> these if your CSV file may contain them.
>  put charToNum(1) into tReservedRecordDelimiter; put charToNum(2)  
> into tReservedFieldDelimiter;
>
>  # Step 1: Replace any Record or Field delimiters that are  
> encapsulated with temporary characters
>  set itemdel to pEncapsulationDelimiter
>  repeat with i = 1 to the number of items in pData
>    if trunc(i/2) = (i/2) then
>      replace pFieldDelimiter with tReservedFieldDelimiter in item i  
> of pData
>      replace pRecordDelimiter with tReservedRecordDelimiter in item  
> i of pData
>    end if
>  end repeat
>
>  # Step 2: Replace all occurances of the encapsulation delimiter
>  replace pEncapsulationDelimiter with empty in pData
>
>  # Step 3: Parse records and fields into the array, replace any  
> occurances of the reserved record and field delimiters for each  
> element
>  set itemdel to pFieldDelimiter
>  set lineDel to pRecordDelimiter
>  repeat with i = 1 to the number of lines in pData
>      repeat with j = 1 to the number of items in line i of pData
>         get item j of line i of pData
>         replace tReservedRecordDelimiter with pRecordDelimiter in it
>         replace tReservedFieldDelimiter with pFieldDelimiter in it
>         put it into tArray[i][j]
>      end repeat
>   end repeat
>
>   # Step 4: return the array
>   return tArray
> end csvToArray

Here's an extension of your handler that will find appropriate  
reservedDelimiters that do not appear in the csv text, so you don't  
have to hope that numtochar(1) and numtochar(2) are not in the data:

function csvToArray pData, pRecordDelimiter, pFieldDelimiter,  
pEncapsulationDelimiter
    local tReservedRecordDelimiter, tReservedFieldDelimiter, tArray
    # Initialize the temporary record and field delimiters. Change  
these if your CSV file may contain them.

    put getDelimiters(pData) into tDelim -- ## see below
    put numtochar(line 1 of tDelim) into tReservedRecordDelimiter
    put numtochar(line 2 of tDelim) into tReservedFieldDelimiter

     # Step 1: Replace any Record or Field delimiters that are  
encapsulated with temporary characters
     set itemdel to pEncapsulationDelimiter
     repeat with i = 1 to the number of items in pData
          if trunc(i/2) = (i/2) then
               replace pFieldDelimiter with tReservedFieldDelimiter in  
item i of pData
               replace pRecordDelimiter with tReservedRecordDelimiter  
in item i of pData
          end if
     end repeat

     # Step 2: Replace all occurances of the encapsulation delimiter
     replace pEncapsulationDelimiter with empty in pData

     # Step 3: Parse records and fields into the array, replace any  
occurances of the reserved record and field delimiters for each element
     set itemdel to pFieldDelimiter
     set lineDel to pRecordDelimiter
     repeat with i = 1 to the number of lines in pData
            repeat with j = 1 to the number of items in line i of pData
                  get item j of line i of pData
                  replace tReservedRecordDelimiter with  
pRecordDelimiter in it
                  replace tReservedFieldDelimiter with pFieldDelimiter  
in it
                  put it into tArray[i][j]
            end repeat
      end repeat

      # Step 4: return the array
      return tArray
end csvToArray

function getDelimiters tCSV
    -- return 2 non-printing ASCII characters not found in the  
variable tCSV
    put "1,2,3,4,5,6,7,8" into dList
    -- could use other non-printing ASCII values
    put false into foundOne
    repeat with i = 1 to the number of items of dList
       put item i of dList into testNbr
       if numtochar(testNbr) is not in tCSV then
          put true into foundOne
          put testNbr into delimOne
          exit repeat
       end if
    end repeat
    if not foundOne then
       doAlert
       exit to top
    end if
    delete item i of dList
    put false into foundOne
    repeat with i = 1 to the number of items of dList
       put item i of dList into testNbr
       if numtochar(testNbr) is not in tCSV then
          put true into foundOne
          put testNbr into delimTwo
          exit repeat
       end if
    end repeat
    if not foundOne then
       doAlert
       exit to top
    end if
    return delimOne & cr & delimTwo
end getDelimiters

on doAlert
    answer "Cannot find delimiters!"
end doAlert

----------

Not tested but should work -- I used something like this in an old HC  
utility stack I put together years ago that I can't find at this  
point. Double-check me for bugs, since I'm working off the top of my  
head, but the principle is sound.

(The HC utility stack allowed you to open two stacks, one as the  
source of data, one as the recipient of data, then click on one field  
after another in the source stack, click on fields in order in the  
second stack (with prompts to clarify which corresponded to which),  
then depress the commandkey to exit the loading loop, and the handlers  
would load the data into the desired fields in the second stack. This  
of course depended on storing the data in CSV format in a variable  
using record and field delimiters found nowhere in the data being  
transferred. No arrays in those days.)

-- Peter

Peter M. Brigham
pmbrig at gmail.com
http://home.comcast.net/~pmbrig






More information about the use-livecode mailing list