Parse a CSV File with Regular Expressions
alex at tweedly.net
Tue Jan 18 09:12:49 EST 2005
Thomas Gutzmann wrote:
> Hi Alex,
> a 100% solution is not possible with one RE because embedded quotes
> cannot be converted at the same time as the rest is parsed.
But I don't think they can simply be substituted independently of the parsing.
>> The best example of the embedded quote case is
>> """My family"",""My PowerBook"",""My Defender
>> 110""","1","mylife at home.com"
> I have modified the Perl script to convert double doublequotes (""x"")
> to single simple quotes ('x'). It's just one way, and of course I'm
> using a regular expression for that:
Good try - you may be up to 75% or 80% now :-) :-)
But it fails on a few very common cases, including empty quoted fields
and multiple adjacent quotes within fields.
> d:\Our Documents\Alex> perl re1.pl
> Original: """My family"",""My PowerBook"",""My Defender
> 110""","","mylife at home.com"
> After replacement: "'My family','My PowerBook','My Defender
> 110'","","mylife at home.com"
> Item 1: 'My family','My PowerBook'
> Item 2: 'My Defender 110'
> Item 3: ,
> d:\Our Documents\Alex>
I'm sure there's a way round this too .... but I suspect it's time to
stop drawing out these examples.
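
For anyone who wants the alternative spelled out: below is a minimal sketch, in Python, of parsing quote-by-quote instead of substituting first. It is not the Perl script discussed above, and the function name parse_csv is made up for illustration. It assumes comma-separated fields, double quotes as the only quoting character, and a doubled quote ("") standing for one literal quote inside a quoted field.

```python
def parse_csv(text):
    """Split CSV text into records, each a list of field strings."""
    records, record, field = [], [], []
    in_quotes = False   # True while inside a quoted field
    started = False     # True once the current record has any content
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch != '"':
                field.append(ch)        # commas and line breaks are data here
            elif i + 1 < n and text[i + 1] == '"':
                field.append('"')       # doubled quote: one literal quote
                i += 1
            else:
                in_quotes = False       # closing quote
        elif ch == '"':
            in_quotes = True
            started = True
        elif ch == ',':
            record.append(''.join(field))
            field = []
            started = True
        elif ch in '\r\n':
            if started:                 # blank lines are skipped
                record.append(''.join(field))
                records.append(record)
                record, field, started = [], [], False
        else:
            field.append(ch)
            started = True
        i += 1
    if started:                         # last record without a line ending
        record.append(''.join(field))
        records.append(record)
    return records


sample = '"""My family"",""My PowerBook"",""My Defender\n110""","","mylife at home.com"\n'
rows = parse_csv(sample)
```

Here rows holds a single record of three fields: the first keeps its embedded quotes, commas and line break, and the second is the empty string. Because quote handling and field splitting happen in the same pass, the empty quoted field and the runs of adjacent quotes need no special casing.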
> As you can see, embedded newline characters don't affect the result;
> this problem must be solved in the routine reading the lines. You can
> also ignore EOL by excluding "$" (this is EOL for RE), but I haven't
> tested it, and I also don't have the time for it. Normally, you don't
> have these problems.
Actually, normally I do have this problem. Palm Pilot exports usually
have embedded CR within quoted fields, and that's one I often deal with.
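
As an illustration of the embedded-CR case (a sketch only, using Python's standard csv module rather than anything from this thread): the reader keeps a line break that falls inside a quoted field as part of that field, provided the input is opened without newline translation. The helper name read_records and the sample data are made up for the example.

```python
import csv
import io


def read_records(text):
    """Parse CSV text, keeping line breaks inside quoted fields as data."""
    # newline='' turns off newline translation, so a bare CR inside a
    # quoted field reaches the csv reader unchanged.
    return list(csv.reader(io.StringIO(text, newline='')))


# Hypothetical export: the second field of the data row contains a bare CR.
data = 'name,note\r\n"Alex","line one\rline two"\r\n'
rows = read_records(data)
```

With this input, rows contains two records, and the note field still holds the CR between "line one" and "line two". The same newline='' argument applies when opening a real file with open().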
>>> Most of us use hands and feet for their respective purposes. So why
>>> do programmers want to use one tool for all?
>> Because there's a level of inefficiency and discomfort caused by
>> frequent changes in language and tools. Because it's hard to become
>> an expert in one language - doing it in Rev and Perl and PHP and
>> Python and Java and .... is probably impossible. Because it's easy,
>> but wrong, to write one language using the style and tricks of
>> another (see various blog threads about "Python's not Java", etc.)
>> But mostly just because programmers are people :-)
> Well, I don't agree. A good programmer should master a whole box of
> tools, and I also expect good developers to be multilingual.
I didn't say that programmers *should* use one tool (in fact, I said
they should use multiple).
These were the reasons why, IMO, programmers *want* to use one tool :-)
> One of the problems we have in IT today comes from the fact that too
> many people learn just one language (Java), and just the basics of
> database systems (primitive SQL à la MySQL, no procedural SQL), and
> that they are also limited in their knowledge of tools.
Yeah, I've often advocated multiple tools on this list -
Python/Pythoncard is my common alternate to RunRev.
I used Perl fairly extensively back in the early days ('88-'91 or '92)
and developed an allergy to it then; maybe it's time to give it another try.