Parse a CSV File with Regular Expressions

Thomas Gutzmann thomas.gutzmann at
Tue Jan 18 08:44:47 EST 2005

Hi Alex,

A 100% solution is not possible with a single RE, because embedded quotes cannot be converted
in the same pass that parses the rest of the line.

> The best example of the embedded quote case is
> """My family"",""My PowerBook"",""My Defender 110""","1","mylife at"

I have modified the Perl script to convert doubled double quotes (""x"") to single quotes
('x'). It's just one way of handling them, and of course I'm using a regular expression for that:

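use strict;
use warnings;

my @s = (
	'"My family, My PowerBook, My Defender 110","1","mylife at"',
	'Scrooge,2,billionaire at',
	'RunRev List,"3,
	...","all at"',
	' """My family"",""My PowerBook"",""My Defender 110""","1","mylife at"');
foreach (@s) {
	print "Original: $_\n";
	# The substitution line is missing from the archived post; this one is
	# reconstructed from the output shown below: ""x"" becomes 'x'.
	s/""([^"]*)""/'$1'/g;
	print "After replacement: $_\n";
	# Three fields, each optionally wrapped in double quotes.
	if (/"*([^"]+)"*,"*([^"]+)"*,"*([^"]+)"*/) {
		print "\tItem 1: $1\n\tItem 2: $2\n\tItem 3: $3\n";
	}
}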

The result is

Original: "My family, My PowerBook, My Defender 110","1","mylife at"
After replacement: "My family, My PowerBook, My Defender 110","1","mylife at"
         Item 1: My family, My PowerBook, My Defender 110
         Item 2: 1
         Item 3: mylife at
Original: Scrooge,2,billionaire at
After replacement: Scrooge,2,billionaire at
         Item 1: Scrooge
         Item 2: 2
         Item 3: billionaire at
Original: RunRev List,"3,
         ...","all at"
After replacement: RunRev List,"3,
         ...","all at"
         Item 1: RunRev List
         Item 2: 3,
         Item 3: all at
Original:  """My family"",""My PowerBook"",""My Defender 110""","1","mylife at"
After replacement:  "'My family','My PowerBook','My Defender 110'","1","mylife at"
         Item 1: 'My family','My PowerBook','My Defender 110'
         Item 2: 1
         Item 3: mylife at
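
The two-step idea (parse first, convert embedded quotes afterwards) can also be sketched for
records with any number of fields. This is only an illustration, not the script above:
split_csv_record is a hypothetical helper name, and it assumes that fields are separated by
commas and that a quote inside a quoted field is written as "". Unlike the script, it turns
"" back into a literal " rather than into '.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split one CSV record
# into fields, then undo the quote doubling in a second step.
sub split_csv_record {
    my ($record) = @_;
    my @fields;
    # Each iteration consumes one field plus the comma or end of record after it.
    while ($record =~ /\G(?:"((?:[^"]|"")*)"|([^",]*))(,|\z)/gc) {
        my ($quoted, $plain, $sep) = ($1, $2, $3);
        my $field = defined $quoted ? $quoted : $plain;
        $field =~ s/""/"/g if defined $quoted;   # second step: "" -> "
        push @fields, $field;
        last if $sep eq '';                      # reached the end of the record
    }
    # On malformed input (e.g. an unterminated quote) the loop simply stops early.
    return @fields;
}

my @fields = split_csv_record(
    '"""My family"",""My PowerBook"",""My Defender 110""","1","mylife at"');
print "Item $_: $fields[$_ - 1]\n" for 1 .. @fields;
```

The \G anchor with the /gc modifiers makes every match start exactly where the previous one
ended, so the record is consumed field by field instead of being matched by one fixed pattern.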

As you can see, embedded newline characters don't affect the result: the pattern matches across
the line break. Joining such lines into one record must be solved in the routine that reads them.
You can also ignore EOL by excluding "$" (this is EOL for RE), but I haven't tested it, and I
also don't have the time for it. Normally, you don't have these
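
The line-joining step in the reading routine could look roughly like this. It is a sketch
only, assuming that a record is complete once it contains an even number of double quotes;
read_csv_record is a hypothetical name and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read physical lines from a
# filehandle and glue them together until the double quotes balance.
sub read_csv_record {
    my ($fh) = @_;
    my $record = '';
    while (defined(my $line = <$fh>)) {
        $record .= $line;
        my $quotes = ($record =~ tr/"//);   # count the double quotes seen so far
        if ($quotes % 2 == 0) {             # balanced: the record is complete
            chomp $record;
            return $record;
        }
    }
    # End of input: return an unterminated remainder, or undef if nothing is left.
    return length($record) ? $record : undef;
}

# Usage with an in-memory filehandle holding two records, the first spanning two lines.
my $text = "RunRev List,\"3,\n\t...\",\"all at\"\nScrooge,2,billionaire at\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
while (defined(my $record = read_csv_record($fh))) {
    print "Record: $record\n";
}
close $fh;
```

A doubled quote ("") adds two to the count, so embedded quotes do not disturb the parity check.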

>> Most of us use hands and feet for their respective purposes. So why do 
>> programmers want to use one tool for all? 
> Because there's a level of inefficiency and discomfort caused by frequent changes in language
> and tools. Because it's hard to become an expert in one language - doing it in Rev and Perl and
> PHP and Python and Java and .... is probably impossible. Because it's easy, but wrong, to write
> one language using the style and tricks of another (see various blog threads about "Python's not
> Java", etc.)
> But mostly just because programmers are people :-)

Well, I don't agree. A good programmer should master a whole box of tools, and I also expect good
developers to be multilingual. One of the problems we have in IT today comes from the fact that
too many people learn just one language (Java) and just the basics of database systems (primitive
SQL à la MySQL, no procedural SQL), and that they are also limited in their knowledge of tools.

From a philosophical point of view, only knowledge gives you the possibility to choose, to
differentiate, and to understand. This, in short, is one of the most important aspects of free
will, in public and private life as well as on the job. An old saying in Germany goes "Knowledge
gives you freedom" ("Wissen macht frei").

But this discussion doesn't belong here.


Thomas G.

More information about the Use-livecode mailing list