Parse a CSV File with Regular Expressions

Thomas Gutzmann thomas.gutzmann at
Tue Jan 18 03:32:31 EST 2005

On Tue, 18 Jan 2005 01:36:48 +0000
  Alex Tweedly <alex at> wrote:
> Rev's RE library is based on PCRE, so should be adequately capable.
> However, I don't think it's as easy to parse the realistic version of CSV with REs as you might 

Well, Alex, it's not so difficult with Perl. If the items in the comma-separated list can contain 
other commas, in which case they are enclosed in quotes (the quotes are optional otherwise), like 
'"a,b",c,"d"', then the Perl script to parse the list looks like:

@s = (
	'"My family, My PowerBook, My Defender 110","1","mylife at"',
	'Scrooge,2,billionaire at',
	'RunRev List,"3,4,...","all at"');
foreach (@s) {
	if (/"*([^"]+)"*,"*([^"]+)"*,"*([^"]+)"*/) {
		print ("$_\n\t$1\n\t$2\n\t$3\n");
	}
}

This example gives the result:

"My family, My PowerBook, My Defender 110","1","mylife at"
         My family, My PowerBook, My Defender 110
         1
         mylife at
Scrooge,2,billionaire at
         Scrooge
         2
         billionaire at
RunRev List,"3,4,...","all at"
         RunRev List
         3,4,...
         all at

which is what you would expect.
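
As a minimal sketch, the same match can be wrapped in a small sub. The name split3 and the 
test line 'only,two' are my own illustrative assumptions, not part of the script above. In list 
context the match returns the three captures, and a line that does not fit the pattern gives 
an empty list:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns the three
# captured fields, or an empty list if the line does not match.
sub split3 {
    my ($line) = @_;
    # In list context a successful match returns ($1, $2, $3).
    return $line =~ /"*([^"]+)"*,"*([^"]+)"*,"*([^"]+)"*/;
}

for my $line ('RunRev List,"3,4,...","all at"', 'only,two') {
    my @fields = split3($line);
    print "$line\n";
    if (@fields) {
        print "\t$_\n" for @fields;
    }
    else {
        print "\t(no match)\n";
    }
}
```

Note that the pattern assumes exactly three items per line and no quotes inside an item, so 
this is a sketch for lists like the ones above, not a general CSV parser.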

I don't know if it works in Rev because every implementation of RE is a bit different, and Perl 
has the best I've come across. Anyway: Perl can be installed on every machine, it's pre-installed 
on Unix, Linux and MacOS/X, so just use the power of this language in combination with Rev, RB or 
whatever development tool you use, instead of trying to do everything with one tool.

I'm missing this flexibility in the usage of tools in the IT world. Nobody in the industry would 
use a Porsche to transport stones (except the ones worn around the neck or wherever ladies have 
them), and nobody would drive a fork-lift truck on a (German) Autobahn. Most of us use hands and 
feet for their respective purposes. So why do programmers want to use one tool for everything?


Thomas G.


For those of you who find it hard to read regular expressions (they are a good example of a 
write-only language):

/"*([^"]+)"*,"*([^"]+)"*,"*([^"]+)"*/

represents the same group three times, separated by commas: "*([^"]+)"*

This expression contains a prefix and a suffix: "* - which means "zero or more quotes".

In the middle of the expression - enclosed in parentheses - is the term to be extracted: [^"]+ - 
which reads: any character except a quote, at least once. If you replace the "+" with a "*", 
two commas would be allowed to follow each other (i.e. an empty item).
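
To make the structure visible, here is a hedged sketch that builds the pattern from the single 
group and lets you switch between "+" and "*". The sub name three_field_re and the sample line 
'a,,c' are assumptions for illustration only:

```perl
use strict;
use warnings;

# Hypothetical helper: builds the three-item pattern from one group.
# $quant is the quantifier inside the parentheses, '+' or '*'.
sub three_field_re {
    my ($quant) = @_;
    my $group   = '"*([^"]' . $quant . ')"*';   # optional quotes around one item
    my $pattern = join ',', ($group) x 3;       # the same group three times
    return qr/$pattern/;
}

my $line = 'a,,c';   # the second item is empty
for my $quant ('+', '*') {
    my @fields = $line =~ three_field_re($quant);
    my $result = @fields ? join('|', @fields) : 'no match';
    print "quantifier $quant => $result\n";
}
```

With "+" the line 'a,,c' does not match at all, because every item needs at least one 
character. With "*" it matches and the second item comes back as an empty string.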

The regular expression can be shortened even more, but then it becomes completely 
incomprehensible, and you need more time to comment it than to write it.
