Sample code for reading a CSV file
Peter Brigham MD
pmbrig at gmail.com
Thu Feb 17 18:59:13 EST 2011
On Feb 17, 2011, at 3:01 PM, Paul Dupuis wrote:
> First, thanks to everyone who replied, but especially to Nosanity.
> Your code reminded me that you can effectively tell when you are
> inside an encapsulated bit of data by an odd/even count of the
> encapsulation character. So, for anyone who wants it, here is a
> generalized function that I just wrote to parse a CSV file,
> regardless of the field or record delimiters (commas, tabs or
> whatever) and to deal with encapsulation appropriately.
>
> This assumes you read the entire CSV file into a variable you pass
> into pData, so a call would look like:
>
> put csvToArray(myEntireCSVData,return,comma,quote) into myDataAsArray
>
> I have tested it a bit in the last 30 minutes and it working in the
> cases I tried, but did not test exhaustively and have not checked
> performance on large datasets. If any one uses this and run into an
> issue, please let me know.
>
> function csvToArray pData, pRecordDelimiter, pFieldDelimiter,
> pEncapsulationDelimiter
> local tReservedRecordDelimiter, tReservedFieldDelimiter, tArray
>
> # Initialize the temporary record and field delimiters. Change
> these if your CSV file may contain them.
> put charToNum(1) into tReservedRecordDelimiter; put charToNum(2)
> into tReservedFieldDelimiter;
>
> # Step 1: Replace any Record or Field delimiters that are
> encapsulated with temporary characters
> set itemdel to pEncapsulationDelimiter
> repeat with i = 1 to the number of items in pData
> if trunc(i/2) = (i/2) then
> replace pFieldDelimiter with tReservedFieldDelimiter in item i
> of pData
> replace pRecordDelimiter with tReservedRecordDelimiter in item
> i of pData
> end if
> end repeat
>
> # Step 2: Replace all occurances of the encapsulation delimiter
> replace pEncapsulationDelimiter with empty in pData
>
> # Step 3: Parse records and fields into the array, replace any
> occurances of the reserved record and field delimiters for each
> element
> set itemdel to pFieldDelimiter
> set lineDel to pRecordDelimiter
> repeat with i = 1 to the number of lines in pData
> repeat with j = 1 to the number of items in line i of pData
> get item j of line i of pData
> replace tReservedRecordDelimiter with pRecordDelimiter in it
> replace tReservedFieldDelimiter with pFieldDelimiter in it
> put it into tArray[i][j]
> end repeat
> end repeat
>
> # Step 4: return the array
> return tArray
> end csvToArray
Here's an extension of your handler that will find appropriate
reservedDelimiters that do not appear in the csv text, so you don't
have to hope that numtochar(1) and numtochar(2) are not in the data:
function csvToArray pData, pRecordDelimiter, pFieldDelimiter,
pEncapsulationDelimiter
local tReservedRecordDelimiter, tReservedFieldDelimiter, tArray
# Initialize the temporary record and field delimiters. Change
these if your CSV file may contain them.
put getDelimiters(pData) into tDelim -- ## see below
put numtochar(line 1 of tDelim) into tReservedRecordDelimiter
put numtochar(line 2 of tDelim) into tReservedFieldDelimiter
# Step 1: Replace any Record or Field delimiters that are
encapsulated with temporary characters
set itemdel to pEncapsulationDelimiter
repeat with i = 1 to the number of items in pData
if trunc(i/2) = (i/2) then
replace pFieldDelimiter with tReservedFieldDelimiter in
item i of pData
replace pRecordDelimiter with tReservedRecordDelimiter
in item i of pData
end if
end repeat
# Step 2: Replace all occurances of the encapsulation delimiter
replace pEncapsulationDelimiter with empty in pData
# Step 3: Parse records and fields into the array, replace any
occurances of the reserved record and field delimiters for each element
set itemdel to pFieldDelimiter
set lineDel to pRecordDelimiter
repeat with i = 1 to the number of lines in pData
repeat with j = 1 to the number of items in line i of pData
get item j of line i of pData
replace tReservedRecordDelimiter with
pRecordDelimiter in it
replace tReservedFieldDelimiter with pFieldDelimiter
in it
put it into tArray[i][j]
end repeat
end repeat
# Step 4: return the array
return tArray
end csvToArray
function getDelimiters tCSV
-- return 2 non-printing ASCII characters not found in the
variable tCSV
put "1,2,3,4,5,6,7,8" into dList
-- could use other non-printing ASCII values
put false into foundOne
repeat with i = 1 to the number of items of dList
put item i of dList into testNbr
if numtochar(testNbr) is not in tCSV then
put true into foundOne
put testNbr into delimOne
exit repeat
end if
end repeat
if not foundOne then
doAlert
exit to top
end if
delete item i of dList
put false into foundOne
repeat with i = 1 to the number of items of dList
put item i of dList into testNbr
if numtochar(testNbr) is not in tCSV then
put true into foundOne
put testNbr into delimTwo
exit repeat
end if
end repeat
if not foundOne then
doAlert
exit to top
end if
return delimOne & cr & delimTwo
end getDelimiters
on doAlert
answer "Cannot find delimiters!"
end doAlert
----------
Not tested but should work -- I used something like this in an old HC
utility stack I put together years ago that I can't find at this
point. Double-check me for bugs, since I'm working off the top of my
head, but the principle is sound.
(The HC utility stack allowed you to open two stacks, one as the
source of data, one as the recipient of data, then click on one field
after another in the source stack, click on fields in order in the
second stack (with prompts to clarify which corresponded to which),
then depress the commandkey to exit the loading loop, and the handlers
would load the data into the desired fields in the second stack. This
of course depended on storing the data in CSV format in a variable
using record and field delimiters found nowhere in the data being
transferred. No arrays in those days.)
-- Peter
Peter M. Brigham
pmbrig at gmail.com
http://home.comcast.net/~pmbrig
More information about the use-livecode
mailing list