Parse a CSV File with Regular Expressions

thierry douez at wanadoo.fr
Tue Jan 18 05:45:42 EST 2005


Hi,

Well, as a test, I've copied and pasted the Perl RE quoted below,
escaped the quotes, and built the script in Revolution. It gives the
results one would expect :-)

So, pretty much the same, isn't it? :-)
And as for speed: so far, so good...


HTH, regards,
thierry

-------------------
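on mouseup
  -- collect the report in tmp, one block per input line
  put empty into tmp
  repeat for each line l in field 1
    -- three fields, each optionally wrapped in quotes
    if matchText( l, "\"?([^\"]+)\"?,\"?([^\"]+)\"?,\"?([^\"]+)\"?", \
        theName, theNumber, theAdress ) then
      put l &cr&tab& theName &cr&tab& theNumber &cr&tab& theAdress & cr after tmp
    else
      -- append the notice to tmp as well, so it appears in the final output
      put l &cr&tab& "Can't split this line ?" & cr after tmp
    end if
  end repeat
  put tmp
end mouseup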


>> Rev's RE library is based on PCRE, so should be adequately capable.
>> 
>> However, I don't think it's as easy to parse the realistic version of CSV with REs as you might
>> think.
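
One case that may be behind this remark is a quoted field that itself contains quotes, commonly written by doubling them. The following is only a sketch in Python (not Perl or Rev), assuming Python's `re` module treats this simple pattern the way PCRE does; the sample line and the names `PATTERN`, `regex_fields` and `csv_fields` are made up for the illustration. It compares the three-field pattern from this thread with Python's standard `csv` module.

```python
import csv
import re

# The three-field pattern discussed in this thread.
PATTERN = re.compile(r'"*([^"]+)"*,"*([^"]+)"*,"*([^"]+)"*')

# Hypothetical sample line: the first field is  He said "hi"
# written with the "double the quote" convention.
line = '"He said ""hi""",2,x'

# The regex is unanchored, so it may match from the middle of the line.
regex_fields = PATTERN.search(line).groups()

# The csv module (default dialect) undoes the doubled quotes.
csv_fields = next(csv.reader([line]))

print(regex_fields)
print(csv_fields)
```

With this input the pattern still reports a match, but the first captured field is only part of the original one, which is harder to notice than a line that fails to match at all.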

TG> Well, Alex, it's not so difficult with Perl. If the items in the comma-separated list can contain
TG> other commata, in which case they are enclosed by quotes (optionally otherwise), like
TG> '"a,b",c,"d"', then the Perl script to parse the list looks like:

TG> #!/usr/bin/perl
TG> @s = (
TG> 	'"My family, My PowerBook, My Defender 110","1","mylife at home.com"',
TG> 	'Scrooge,2,billionaire at minimum.com',
TG> 	'RunRev List,"3,4,...","all at the-rest.co.uk"');
TG> foreach (@s) {
TG> 	if (/"*([^"]+)"*,"*([^"]+)"*,"*([^"]+)"*/) {
TG> 		print ("$_\n\t$1\n\t$2\n\t$3\n");
TG> 	}
TG> }

TG> This example gives the result:

TG> "My family, My PowerBook, My Defender 110","1","mylife at home.com"
TG>          My family, My PowerBook, My Defender 110
TG>          1
TG>          mylife at home.com
TG> Scrooge,2,billionaire at minimum.com
TG>          Scrooge
TG>          2
TG>          billionaire at minimum.com
TG> RunRev List,"3,4,...","all at the-rest.co.uk"
TG>          RunRev List
TG>          3,4,...
TG>          all at the-rest.co.uk

TG> which is what you would expect.
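
For readers without Perl at hand, the same loop can be sketched in Python. This is only an illustration, assuming that Python's `re` module handles this simple pattern the same way Perl and PCRE do; the names `PATTERN`, `LINES` and `split_line` are invented for the example and do not come from the original script.

```python
import re

# The pattern from the Perl script, unchanged: three groups of
# non-quote characters, each optionally wrapped in quotes.
PATTERN = re.compile(r'"*([^"]+)"*,"*([^"]+)"*,"*([^"]+)"*')

# The three sample lines from the Perl script.
LINES = [
    '"My family, My PowerBook, My Defender 110","1","mylife at home.com"',
    'Scrooge,2,billionaire at minimum.com',
    'RunRev List,"3,4,...","all at the-rest.co.uk"',
]

def split_line(line):
    """Return the three captured fields, or None if the line does not match."""
    m = PATTERN.search(line)
    return m.groups() if m else None

for line in LINES:
    fields = split_line(line)
    if fields is None:
        print(line + "\n\tCan't split this line ?")
    else:
        # Same layout as the Perl print: the line, then one tab-indented field per row.
        print(line + "\n\t" + "\n\t".join(fields))
```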

TG> I don't know if it works in Rev because every implementation of RE is a bit different, and Perl
TG> has the best I've come across. Anyway: Perl can be installed on every machine, it's pre-installed
TG> on Unix, Linux and MacOS/X, so just use the power of this language in combination with Rev, RB or
TG> whatever development tool you use, instead of trying to do everything with one tool.

TG> I'm missing this flexibility in the usage of tools in the IT world. Nobody in the industry would
TG> use a Porsche to transport stones (except the ones worn around the neck or wherever ladies have
TG> them), and nobody would drive a fork-lift truck on a (German) Autobahn. Most of us use hands and
TG> feet for their respective purposes. So why do programmers want to use one tool for all?

TG> Cheers,

TG> Thomas G.

TG> ---

TG> For those of you who find it hard to read regular expressions (they are a good example of a
TG> write-only language):

TG> /"*([^"]+)"*,"*([^"]+)"*,"*([^"]+)"*/

TG> represents 3 times the same group, separated by a comma: "*([^"]+)"*

TG> This expression contains a prefix and a postfix: "* - which means "zero or more quotes".

TG> In the middle of the expression - enclosed in parentheses - is the term to be extracted: [^"]+ -
TG> which reads: any character except a quote, but at least one. If you replace the "+" with a "*",
TG> two commata would be allowed to follow each other (that is, a field could be empty).
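
The breakdown above can be tried out in a few lines. This is a hedged sketch in Python rather than Perl, assuming Python's `re` behaves like PCRE for these patterns; `FIELD`, `FIELD_OPT` and `build` are hypothetical names. It assembles the full expression from the repeated group and compares the "+" and "*" variants on a line whose middle field is empty.

```python
import re

FIELD = r'"*([^"]+)"*'      # the group as posted: at least one non-quote character
FIELD_OPT = r'"*([^"]*)"*'  # same group with "+" replaced by "*": may be empty

def build(field, count=3):
    """Join `count` copies of the field group with commas and compile the result."""
    return re.compile(",".join([field] * count))

strict = build(FIELD)
loose = build(FIELD_OPT)

# The joined pattern is exactly the one-line expression from the post.
assert strict.pattern == r'"*([^"]+)"*,"*([^"]+)"*,"*([^"]+)"*'

line = 'Scrooge,,billionaire at minimum.com'  # two commas in a row

strict_match = strict.search(line)          # "+" needs a character between the commas
loose_fields = loose.search(line).groups()  # "*" accepts the empty field

print(strict_match)
print(loose_fields)
```

Building the expression from one named group also makes it easier to comment than the one-line form.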

TG> The regular expression can be shortened even more, but then it becomes completely
TG> incomprehensible, and you need more time to comment it than to write it.


Best regards, 



