Read a text file
alex at tweedly.net
Mon Jan 17 20:36:48 EST 2005
Thomas Gutzmann wrote:
> I haven't tested the regular expressions in Rev, but in Perl, it
> would take some scratching of the head only to cope with commas
> embedded in quotes. Or some browsing in the Internet. But it depends
> on the quality of the RE parser.
Rev's RE library is based on PCRE, so should be adequately capable.
However, I don't think it's as easy to parse the realistic version of
CSV with REs as you might think. Six months ago (when the earlier csv
thread on this list started), I looked into it; took about 5 mins to
convince me I couldn't do it myself without spending more time than I
wanted in learning the obscure corners of regex. So I spent an hour or
so searching the Internet, but didn't find anything even approaching the
real cases you encounter in csv files. Since there was an alternative
that did everything being discussed then, and which was adequately fast,
I didn't look any further.
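For illustration only, here is a minimal sketch of one non-regex approach: a character-by-character scanner that tracks whether it is inside a quoted field. This is not necessarily the alternative referred to above, which this message does not describe. The function name `parse_csv` and its parameters are hypothetical, and the sketch assumes the common conventions that a doubled quote inside a quoted field stands for a literal quote and that quoted fields may contain commas and line breaks.

```python
def parse_csv(text, delimiter=",", quote='"'):
    """Parse CSV text into a list of rows (lists of strings).

    Sketch only: assumes doubled quotes escape a quote, and that
    records end with \n, \r, or \r\n outside of quoted fields.
    """
    rows, row, field = [], [], []
    in_quotes = False
    record_open = False  # True once any character of a record is seen
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            record_open = True
            if ch == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)  # doubled quote -> literal quote
                    i += 1
                else:
                    in_quotes = False    # closing quote
            else:
                field.append(ch)         # commas and newlines kept as data
        elif ch == "\n" or ch == "\r":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                   # treat \r\n as one line ending
            row.append("".join(field))
            rows.append(row)
            row, field, record_open = [], [], False
        else:
            record_open = True
            if ch == quote:
                in_quotes = True         # opening quote
            elif ch == delimiter:
                row.append("".join(field))
                field = []
            else:
                field.append(ch)
        i += 1
    if record_open:                      # last record had no line ending
        row.append("".join(field))
        rows.append(row)
    return rows
```

The single `in_quotes` flag is what lets the scanner tell a separator from data, which is the part that is awkward to express in one pattern.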
Since I got your mail, I've spent another hour or two idly browsing the
net. I followed each of the top 20 hits from a couple of different Google
searches. I found a lot of articles that claim to have a regex that
handles csv files - but in fact their "coverage" ranges between 10% and
70% of the cases I think I'd need to handle in real apps.
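To make the coverage gap concrete, here is a small sketch of the kind of "simple" pattern that handles commas inside quotes but nothing more. The pattern and the helper `naive_split` are made up for illustration and are not taken from any of the articles mentioned. Python's standard `csv` module is used only as a reference for the expected result.

```python
import csv
import io
import re

# Hypothetical "simple" pattern: a field is either a quoted string with
# no quotes inside it, or a run of characters that are not commas.
NAIVE_FIELD = re.compile(r'"([^"]*)"|([^,]*)')

def naive_split(line):
    """Split one line with the naive pattern (illustration only)."""
    fields = []
    pos = 0
    while True:
        m = NAIVE_FIELD.match(line, pos)  # always matches, possibly empty
        fields.append(m.group(1) if m.group(1) is not None else m.group(2))
        pos = m.end()
        if pos >= len(line):
            break
        pos += 1  # assume the next character is the comma separator
    return fields

# Commas embedded in quotes: the naive pattern copes.
print(naive_split('a,"b,c",d'))

# Doubled quotes inside a quoted field: the naive result no longer
# agrees with the reference parser.
line = 'a,"say ""hi""",b'
print(naive_split(line))
print(next(csv.reader(io.StringIO(line))))
```

A line-at-a-time approach like this also cannot handle a quoted field that contains a line break, since the record then spans more than one line of the file.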
There is one very credible looking article on a .NET regex that sounds
like it might do more than that - but the regex used is clearly not
going to succeed in PCRE - and indeed didn't in Python or Rev - so
either it uses a feature of .NET that isn't in Perl/PCRE, or there's a
typo, or something.
I believe that if there were a complete solution in regex, it would show
up pretty high on Google, so I am now, still, of the opinion that the
complete csv problem is beyond regex, even though the simpler cases can
be done fairly easily. I'd be delighted if someone could change that