Sample code for reading a CSV file

Alex Tweedly alex at tweedly.net
Thu Feb 17 18:38:21 EST 2011


Many years ago (2004 ?) I posted code to do something like his using 
split/combine to differentiate between 'inside' and 'outside' field 
delimiters. It was very fast - but pretty hard to follow, and I don't 
remember now which obscure cases it handled (we haven't even mentioned 
doubled characters and backslash escaped field delimiters yet :-)

The code Paul posted is very clear and easy to follow - but it will 
suffer pretty severe performance issues on large data sets. If you need 
to worry about that, you could get pretty good speed up just by 
replacing the "repeat with ..." by "repeat for each ..." - see below.  
The time to run this will grow (approx) linearly with the number of 
lines in the input data, whereas the previous version was a bit worse 
than N-squared growth ....

> function csvToArray pData, pRecordDelimiter, pFieldDelimiter, 
> pEncapsulationDelimiter
>    local tReservedRecordDelimiter, tReservedFieldDelimiter, tArray
>
>    # Initialize the temporary record and field delimiters. Change 
> these if your CSV file may contain them.
>    put charToNum(1) into tReservedRecordDelimiter; put charToNum(2) 
> into tReservedFieldDelimiter;
>
>    # Step 1: Replace any Record or Field delimiters that are 
> encapsulated with temporary characters
>    set itemdel to pEncapsulationDelimiter
>    put false into tIsEven
>    repeat for each item itm in pData
>       if tIsEven then
>          replace pFieldDelimiter with tReservedFieldDelimiter in itm
>          replace pRecordDelimiter with tReservedRecordDelimiter in itm
>       end if
>       put itm & itemDelimiter after tData
>       put not tIsEven into tIsEven
>    end repeat
>    delete the last char of tData
>    put tData into pData
>    # Step 2: Replace all occurances of the encapsulation delimiter
>    replace pEncapsulationDelimiter with empty in pData
>    -- put pData into field "F"
>
>    # Step 3: Parse records and fields into the array, replace any 
> occurances of the reserved record and field delimiters for each element
>    set itemdel to pFieldDelimiter
>    set lineDel to pRecordDelimiter
>    put 0 into i
>    repeat for each line L in pData
>       add 1 to i
>       put 0 into j
>       repeat for each item itm in L
>          add 1 to j
>          replace tReservedRecordDelimiter with pRecordDelimiter in itm
>          replace tReservedFieldDelimiter with pFieldDelimiter in itm
>          put itm into tArray[i][j]
>       end repeat
>    end repeat
>
>    # Step 4: return the array
>    return tArray
> end csvToArray



On 17/02/2011 20:01, Paul Dupuis wrote:
> First, thanks to everyone who replied, but especially to Nosanity. 
> Your code reminded me that you can effectively tell when you are 
> inside an encapsulated bit of data by an odd/even count of the 
> encapsulation character. So, for anyone who wants it, here is a 
> generalized function that I just wrote to parse a CSV file, regardless 
> of the field or record delimiters (commas, tabs or whatever) and to 
> deal with encapsulation appropriately.
>
> This assumes you read the entire CSV file into a variable you pass 
> into pData, so a call would look like:
>
> put csvToArray(myEntireCSVData,return,comma,quote) into myDataAsArray
>
> I have tested it a bit in the last 30 minutes and it working in the 
> cases I tried, but did not test exhaustively and have not checked 
> performance on large datasets. If any one uses this and run into an 
> issue, please let me know.
>
> function csvToArray pData, pRecordDelimiter, pFieldDelimiter, 
> pEncapsulationDelimiter
>   local tReservedRecordDelimiter, tReservedFieldDelimiter, tArray
>
>   # Initialize the temporary record and field delimiters. Change these 
> if your CSV file may contain them.
>   put charToNum(1) into tReservedRecordDelimiter; put charToNum(2) 
> into tReservedFieldDelimiter;
>
>   # Step 1: Replace any Record or Field delimiters that are 
> encapsulated with temporary characters
>   set itemdel to pEncapsulationDelimiter
>   repeat with i = 1 to the number of items in pData
>     if trunc(i/2) = (i/2) then
>       replace pFieldDelimiter with tReservedFieldDelimiter in item i 
> of pData
>       replace pRecordDelimiter with tReservedRecordDelimiter in item i 
> of pData
>     end if
>   end repeat
>
>   # Step 2: Replace all occurances of the encapsulation delimiter
>   replace pEncapsulationDelimiter with empty in pData
>
>   # Step 3: Parse records and fields into the array, replace any 
> occurances of the reserved record and field delimiters for each element
>   set itemdel to pFieldDelimiter
>   set lineDel to pRecordDelimiter
>   repeat with i = 1 to the number of lines in pData
>       repeat with j = 1 to the number of items in line i of pData
>          get item j of line i of pData
>          replace tReservedRecordDelimiter with pRecordDelimiter in it
>          replace tReservedFieldDelimiter with pFieldDelimiter in it
>          put it into tArray[i][j]
>       end repeat
>    end repeat
>
>    # Step 4: return the array
>    return tArray
> end csvToArray
>
>





More information about the Use-livecode mailing list