Parse a CSV File with Regular Expressions
thierry
douez at wanadoo.fr
Tue Jan 18 05:45:42 EST 2005
Hi,
Well, as a test, I've copied and pasted the Perl RE above,
escaped the quotes, and built the script in Revolution, with the
results one would expect :-)
So, pretty much the same, isn't it? :-)
And as for speed: so far, so good...
HTH, regards,
thierry
-------------------
on mouseup
  put empty into tmp
  repeat for each line l in field 1
    if matchText( l, "\"?([^\"]+)\"?,\"?([^\"]+)\"?,\"?([^\"]+)\"?", \
        theName, theNumber, theAdress ) then
      put l &cr&tab& theName &cr&tab& theNumber &cr&tab& theAdress & cr after tmp
    else
      -- append to tmp as well, so the failure message is not lost
      put l &cr&tab& "Can't split this line ?" & cr after tmp
    end if
  end repeat
  put tmp
end mouseup
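
For anyone who wants to check the pattern outside Rev, here is a minimal sketch in Python (chosen only for illustration; it is not part of the original thread). It applies the same three-group pattern to the three sample lines from the quoted Perl script. The names PATTERN and split_line are hypothetical.

```python
import re

# Same three-group pattern as in the Perl script: optional quotes around
# each field, the field itself being one or more non-quote characters.
PATTERN = re.compile(r'"*([^"]+)"*,"*([^"]+)"*,"*([^"]+)"*')

def split_line(line):
    """Return the three captured fields, or None if the line does not match."""
    m = PATTERN.search(line)
    return m.groups() if m else None

samples = [
    '"My family, My PowerBook, My Defender 110","1","mylife at home.com"',
    'Scrooge,2,billionaire at minimum.com',
    'RunRev List,"3,4,...","all at the-rest.co.uk"',
]

for line in samples:
    fields = split_line(line)
    print(line)
    if fields is None:
        print("\tCan't split this line ?")
    else:
        for field in fields:
            print("\t" + field)
```

Note that the pattern assumes exactly three fields and that a field never contains a quote character. An unquoted line with more than two commas, such as a,b,c,d, still matches, but the first group swallows the extra comma and the line comes back as "a,b", "c", "d".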
>> Rev's RE library is based on PCRE, so should be adequately capable.
>>
>> However, I don't think it's as easy to parse the realistic version of CSV with REs as you might think.
TG> Well, Alex, it's not so difficult with Perl. If the items in the comma-separated list can contain
TG> other commata, in which case they are enclosed by quotes (optionally otherwise), like
TG> '"a,b",c,"d"', then the Perl script to parse the list looks like:
TG> #!/usr/bin/perl
TG> @s = (
TG> '"My family, My PowerBook, My Defender 110","1","mylife at home.com"',
TG> 'Scrooge,2,billionaire at minimum.com',
TG> 'RunRev List,"3,4,...","all at the-rest.co.uk"');
TG> foreach (@s) {
TG> if (/"*([^"]+)"*,"*([^"]+)"*,"*([^"]+)"*/) {
TG> print ("$_\n\t$1\n\t$2\n\t$3\n");
TG> }
TG> }
TG> This example gives the result:
TG> "My family, My PowerBook, My Defender 110","1","mylife at home.com"
TG>     My family, My PowerBook, My Defender 110
TG>     1
TG>     mylife at home.com
TG> Scrooge,2,billionaire at minimum.com
TG>     Scrooge
TG>     2
TG>     billionaire at minimum.com
TG> RunRev List,"3,4,...","all at the-rest.co.uk"
TG>     RunRev List
TG>     3,4,...
TG>     all at the-rest.co.uk
TG> which is what you would expect.
TG> I don't know if it works in Rev because every implementation of RE is a bit different, and Perl
TG> has the best I've come across. Anyway: Perl can be installed on every machine, it's pre-installed
TG> on Unix, Linux and MacOS/X, so just use the power of this language in combination with Rev, RB or
TG> whatever development tool you use, instead of trying to do everything with one tool.
TG> I'm missing this flexibility in the usage of tools in the IT world. Nobody in the industry would
TG> use a Porsche to transport stones (except the ones worn around the neck or wherever ladies have
TG> them), and nobody would drive a fork-lift truck on a (German) Autobahn. Most of us use hands and
TG> feet for their respective purposes. So why do programmers want to use one tool for all?
TG> Cheers,
TG> Thomas G.
TG> ---
TG> For those of you who find it hard to read regular expressions (they are a good example of a
TG> write-only language):
TG> /"*([^"]+)"*,"*([^"]+)"*,"*([^"]+)"*/
TG> represents 3 times the same group, separated by a comma: "*([^"]+)"*
TG> This expression contains a prefix and a postfix: "* - which means "zero or more quotes".
TG> In the middle of the expression - enclosed in parentheses - is the term to be extracted: [^"]+ -
TG> which reads: any character except a quote, but at least one. If you replace the "+" with a "*",
TG> empty fields are allowed too, i.e. two commata may follow each other.
TG> The regular expression can be shortened even more, but then it becomes completely
TG> incomprehensible, and you need more time to comment it than to write it.
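
The remark about "+" versus "*" can be tried out with a small sketch, again in Python and only as an illustration. GROUP_PLUS, GROUP_STAR and fields are hypothetical names; the two patterns are the group from the explanation repeated three times, separated by commas.

```python
import re

GROUP_PLUS = r'"*([^"]+)"*'   # field must contain at least one character
GROUP_STAR = r'"*([^"]*)"*'   # field may be empty

# Three times the same group, separated by a comma.
strict = re.compile(",".join([GROUP_PLUS] * 3))
lenient = re.compile(",".join([GROUP_STAR] * 3))

def fields(pattern, line):
    """Return the captured groups, or None if the pattern does not match."""
    m = pattern.search(line)
    return m.groups() if m else None

line = "a,,c"   # two commas following each other, i.e. an empty middle field
print(fields(strict, line))
print(fields(lenient, line))
```

With "+" the line a,,c is rejected, because the middle group has nothing to match; with "*" it is accepted and the middle field comes back as an empty string.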
Best regards,