Parse a CSV File with Regular Expressions

Alex Tweedly alex at tweedly.net
Tue Jan 18 09:12:49 EST 2005


Thomas Gutzmann wrote:

> Hi Alex,
>
> a 100% solution is not possible with one RE because embedded quotes 
> cannot be converted at the same time as the rest is parsed.

But I don't think they can simply be substituted independently of the parsing.

>> The best example of the embedded quote case is
>> """My family"",""My PowerBook"",""My Defender 
>> 110""","1","mylife at home.com"
>
>
> I have modified the Perl script to convert double doublequotes (""x"") 
> to single simple quotes ('x'). It's just one way, and of course I'm 
> using a regular expression for that:

Good try - you may be up to 75% or 80% now :-) :-)

But it fails on a few very common cases, including empty quoted fields 
and multiple adjacent quotes within fields.

> d:\Our Documents\Alex> perl re1.pl
> Original:  """My family"",""My PowerBook"",""My Defender 
> 110""","","mylife at home.com"
> After replacement:  "'My family','My PowerBook','My Defender 
> 110'","","mylife at home.com"
>     Item 1: 'My family','My PowerBook'
>     Item 2: 'My Defender 110'
>     Item 3: ,
>
> d:\Our Documents\Alex>

I'm sure there's a way round this too .... but I suspect it's time to 
stop drawing out these examples.
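
For what it's worth, here is a minimal sketch of the kind of character-by-character parse I have in mind, written in Python rather than Rev or Perl. It is only an illustration under my own assumptions: the function name parse_csv_line is made up, it assumes comma separators and doubled quotes ("") as the only escape, and it expects to be handed one complete record. It is not a tested replacement for the Perl script.

```python
def parse_csv_line(text):
    """Split one CSV record into fields (hypothetical helper, not from the thread).

    Assumes comma separators and "" as the escape for a literal quote.
    """
    fields = []
    field = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(text) and text[i + 1] == '"':
                    field.append('"')  # doubled quote inside a quoted field
                    i += 1             # skip the second quote of the pair
                else:
                    in_quotes = False  # closing quote of the field
            else:
                field.append(ch)       # commas and newlines are kept as data
        elif ch == '"':
            in_quotes = True           # opening quote of a field
        elif ch == ',':
            fields.append(''.join(field))  # separator outside quotes ends the field
            field = []
        else:
            field.append(ch)
        i += 1
    fields.append(''.join(field))      # last field has no trailing comma
    return fields


record = '"""My family"",""My PowerBook"",""My Defender 110""","","mylife at home.com"'
for number, item in enumerate(parse_csv_line(record), start=1):
    print(f"Item {number}: {item}")
```

Because the quote handling happens while the fields are being split, the empty quoted field comes out as an empty third-of-three item rather than a stray comma, and the embedded quotes stay attached to the first field.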

>
> As you can see, embedded newline characters don't affect the result; 
> this problem must be solved in the routine reading the lines. You can 
> also ignore EOL by excluding "$" (this is EOL for RE), but I haven't 
> tested it, and I also don't have the time for it. Normally, you don't 
> have these problems.

Actually, normally I do have this problem. Palm Pilot exports usually 
have embedded CR within quoted fields, and that's one I often deal with.
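
As an illustration of handling this in the reading routine, here is a small sketch using Python's standard csv module, which keeps line breaks that occur inside quoted fields as long as the input is opened with newline=''. The sample data and the function name read_records are invented for the example; they are not a real Palm export.

```python
import csv
import io


def read_records(text):
    """Parse CSV text into a list of rows (hypothetical helper for this example).

    newline='' stops Python from translating line endings, so a bare CR
    inside a quoted field reaches the csv reader unchanged.
    """
    return list(csv.reader(io.StringIO(text, newline='')))


# Invented sample: the second record has a CR inside its quoted Note field.
sample = '"Name","Note"\r\n"Alex","line one\rline two"\r\n"Thomas",""\r\n'

for row in read_records(sample):
    print(row)
```

The reader decides where a record ends only after it has tracked the quotes, which is the step a line-at-a-time regular expression skips.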

>>> Most of us use hands and feet for their respective purposes. So why 
>>> do programmers want to use one tool for all? 
>>
>>
>> Because there's a level of inefficiency and discomfort caused by 
>> frequent changes in language and tools. Because it's hard to become 
>> an expert in one language - doing it in Rev and Perl and PHP and 
>> Python and Java and .... is probably impossible. Because it's easy, 
>> but wrong, to write one language using the style and tricks of 
>> another (see various blog threads about "Python's not Java", etc.)
>>
>> But mostly just because programmers are people :-)
>
>
> Well, I don't agree. A good programmer should master a whole box of 
> tools, and I also expect good developers to be multilingual.

I didn't say that programmers *should* use one tool (in fact, I said 
they should use multiple).
These were the reasons why, IMO, programmers *want* to use one tool :-)

> One of the problems we have in IT today comes from the fact that too 
> many people learn just one language (Java), and just the basics of 
> database systems (primitive SQL à la MySQL, no procedural SQL), and 
> that they are also limited in their knowledge of tools.

Yeah, I've often advocated multiple tools on this list - 
Python/Pythoncard is my common alternate to RunRev.
I used Perl fairly extensively back in the early days ('88-'91 or '92) 
and developed an allergy to it then; maybe it's time to give it another try.

Cheers
-- Alex.




