Parse a CSV File with Regular Expressions
Thomas Gutzmann
thomas.gutzmann at gutzmann.com
Tue Jan 18 09:34:25 EST 2005
Alex, you nightmare,
> But it fails on a few very common cases, including empty, quoted fields and multiple adjacent
> quotes within fields
>
>> d:\Our Documents\Alex> perl re1.pl
>> Original: """My family"",""My PowerBook"",""My Defender
>> 110""","","mylife at home.com"
>> After replacement: "'My family','My PowerBook','My Defender
>> 110'","","mylife at home.com"
>> Item 1: 'My family','My PowerBook'
>> Item 2: 'My Defender 110'
>> Item 3: ,
>>
>> d:\Our Documents\Alex>
>
> I'm sure there's a way round this too ... but I suspect it's time to stop drawing out these
> examples.
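#!/usr/bin/perl
use strict;
use warnings;

# Sample record: doubled quotes inside a field, plus one empty quoted field.
my @s = (
    '"""My family"",""My PowerBook"",""My Defender 110""","","mylife at home.com"');
foreach (@s) {
    print "Original: $_\n";
    # Turn each doubled-quote pair ""..."" into '...' so that only the
    # field-delimiting quotes remain.
    s/""([^"]*)""/'$1'/g;
    print "After replacement: $_\n";
    # Three comma-separated fields; "*" (not "+") lets a field be empty.
    if (/"*([^"]*)"*,"*([^"]*)"*,"*([^"]*)"*/) {
        print "\tItem 1: $1\n\tItem 2: $2\n\tItem 3: $3\n";
    }
}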
In my earlier version I disallowed empty strings by using "+" instead of "*"; using "*" solves this issue.
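To make the difference visible, here is a minimal sketch that runs both quantifiers against a record with an empty quoted field. It assumes the earlier pattern was the same expression with "+" in place of every "*"; the names $plus, $star and three_fields are made up for this illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed earlier form ("+") and the current form ("*"); they differ only
# in the quantifiers. Both names are hypothetical.
my $plus = qr/"+([^"]+)"+,"+([^"]+)"+,"+([^"]+)"+/;
my $star = qr/"*([^"]*)"*,"*([^"]*)"*,"*([^"]*)"*/;

# Return the three captured fields, or an empty list if there is no match.
sub three_fields {
    my ($re, $line) = @_;
    my @fields = $line =~ $re;
    return @fields;
}

my $line = '"a","","c"';    # the middle field is empty
for my $case (['"+"', $plus], ['"*"', $star]) {
    my ($name, $re) = @$case;
    my @f = three_fields($re, $line);
    print "$name: ", (@f ? join('|', map { "[$_]" } @f) : 'no match'), "\n";
}
```

With "+", every field needs at least one character between its quotes, so the empty middle field makes the whole match fail. With "*", the same record yields the three fields a, (empty) and c.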
>> As you can see, embedded newline characters don't affect the result;
>> this problem must be solved in the routine reading the lines. You can
>> also ignore EOL by excluding "$" (this is EOL for RE), but I haven't
>> tested it, and I also don't have the time for it. Normally, you don't
>> have these problems.
>
> Actually, normally I do have this problem. Palm Pilot exports usually have embedded CR within
> quoted fields, and that's one I often deal with.
It depends on whether the embedded CR is also the EOL. If not, my example works unchanged. If it
is, it takes some more thinking, because you have to identify the true end-of-lines (which are
end-of-records in this case), and you have to cope with missing fields, which would otherwise
screw up everything. But I suspect that there is a distinction between end-of-line (e.g. CR) and
end-of-record (e.g. LF or CR/LF): most decent programmers would create some sort of record
boundary, while the embedded CR is used for field formatting.
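If that suspicion holds, the records can be separated before any field matching is done. The following is only a sketch under that assumption (records end in LF or CR/LF, and a lone CR belongs to the field); I have not checked it against a real Palm export, and split_records is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a buffer into records. ASSUMPTION: a record ends with LF or CR/LF,
# while a lone CR is field formatting and stays inside the record.
sub split_records {
    my ($data) = @_;
    return grep { length } split /\r?\n/, $data;
}

# Invented sample data: the first record has a CR embedded in a quoted field.
my $data = qq{"My Defender\r110","","x"\r\n"a","b","c"\n};
for my $rec (split_records($data)) {
    (my $shown = $rec) =~ s/\r/\\r/g;    # make the embedded CR visible
    print "Record: $shown\n";
}
```

Each record returned this way still contains its embedded CR, so the field expressions can be applied to it unchanged. If the export really uses a bare CR as the record terminator as well, this does not help, and the quotes have to be counted instead.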
By the way, I can understand your aversion to Perl. But it has its virtues if you use it
for well-defined and limited purposes, keep programs short, and spend enough time on clean
programming and documentation. But who am I telling this to...
Cheers,
Thomas G.