MatchText, MatchChunk and the needle in the haystack

Jim Ault JimAultWins at yahoo.com
Wed Mar 21 05:27:03 EDT 2007


On 3/21/07 1:32 AM, "Peter Alcibiades" <palcibiades-first at yahoo.co.uk>
wrote:

> Can you do it with a text editor and regular expressions?  I'm genuinely
> diffident about asking, because you all have so much more experience that if
> it were this easy, you'd have suggested it.
<full text below>

My basic approach for this kind of question is to assume that users have
very little experience with regular expressions and know very little about
the data set they are mining.

Also, the question they actually ask on the list is just one part of the
overall task.  Given these three things, I like to propose tools that let
them "see" some of the pitfalls that incorrect assumptions about the data
can create.  One pitfall is assuming that all occurrences of the date
string will be correctly formatted and intact.
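
To make that pitfall visible, here is a minimal sketch in Python (not Rev)
that compares a loose "looks like a date" match with a strict DD-Mon-YY
check and lists anything that fails the strict check.  The function name,
both patterns and the sample string are assumptions made up for
illustration; they are not taken from the original data.

```python
import re

MONTHS = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}

# Loose pattern: anything that looks like an attempt at a date.
LOOSE = re.compile(r"\d{1,2}-[A-Za-z]{2,4}-\d{1,4}")
# Strict pattern: exactly two digits, a capitalised 3-letter month, two digits.
STRICT = re.compile(r"(\d\d)-([A-Z][a-z]{2})-(\d\d)")


def audit_dates(text):
    """Split date-like strings into well-formed and suspect ones."""
    good, suspect = [], []
    for match in LOOSE.finditer(text):
        candidate = match.group(0)
        strict = STRICT.fullmatch(candidate)
        if (strict and strict.group(2) in MONTHS
                and 1 <= int(strict.group(1)) <= 31):
            good.append(candidate)
        else:
            suspect.append(candidate)
    return good, suspect


# Hypothetical sample: one clean date, one short day, one misspelt month.
sample = "02-Mar-92sometext1-Sep-04somemore text31-Spt-99last"
good, suspect = audit_dates(sample)
print("good:", good)
print("suspect:", suspect)
```

A strict pattern on its own would silently skip the two suspect entries,
which is exactly the kind of gap the user needs to see before trusting the
tool.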

I guess I look at it as 'what will help them build a tool they can trust'.

Don't get me wrong, I like and use regEx in a few of my apps for effectively
extracting clean data from a variety of web sites. I like its power and
flexibility.

As you say, if the user already knew some of the simpler regEx, the question
probably would not have appeared on the list.

I cannot speak for others on the list, but it seems that those who venture
into regEx only occasionally get frustrated and are better off using the
chunking expressions of Rev.  Even when presented with a good regEx answer,
they are not sure what they are looking at.

By the way, nulls will make MatchText, etc., fail, so "replace null with
empty in textBlock" needs to be part of the process for unknown data sources.
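
The same clean-up step can be sketched in Python for anyone checking their
data outside Rev.  Python's own regex engine is not known to choke on nulls,
so this only illustrates the cleaning step and shows how many nulls the
source contained.  The function name and the sample string are hypothetical.

```python
def strip_nulls(text_block):
    """Return the text without NUL characters, plus how many were removed."""
    null_count = text_block.count("\x00")
    return text_block.replace("\x00", ""), null_count


# Hypothetical sample with three embedded nulls.
raw = "02-Mar-92some\x00text01-Sep-04more\x00\x00text"
clean, removed = strip_nulls(raw)
print(removed, repr(clean))
```

Reporting the count, rather than stripping silently, is a cheap way to learn
something about an unknown data source.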

As far as using a text editor goes, that is usually my first step.  I like
BBEdit on OS X, so I agree with your basic premise: start simple and build
up.

Nice to know you are paying attention to the big picture  :-)
Good post.

Jim Ault
Las Vegas


On 3/21/07 1:32 AM, "Peter Alcibiades" <palcibiades-first at yahoo.co.uk>
wrote:

> Can you do it with a text editor and regular expressions?  I'm genuinely
> diffident about asking, because you all have so much more experience that if
> it were this easy, you'd have suggested it.  But anyway, is there something
> wrong with the following?
> 
> I made up a fragment of a file like this in the form
> 02-Mar-92sometext01-Sep-04somemore text........and a few more entries of the
> same sort.
> 
> Then opened it in Kate (but presumably all programming editors have similar
> functionality?)
> 
> Then did a match with regular expressions in the Find part of the menu.  It
> helped construct the following expression:
> 
> [\d][\d]-[\D][\D][\D]-[\d][\d]
> 
> which really would not have been so very hard to figure out unaided - a
> classic case of the obligatory gui getting in the way of your typing.  This
> picks up all dates and it obviously misses other hyphenated expressions.
> 
> Then in the replace section I put
> 
> Enter\0
> 
> It uses the \0 as backwards reference, so to include all the found string in
> the replacement.
> 
> The only hard part, all of ten seconds, was that I didn't seem able to enter a
> line feed character directly, like by \n for instance, but I just copied and
> pasted one and bingo, it worked fine.  I ended up with a bunch of lines like
> this:
> 
> 02-Mar-92sometext
> 01-Sep-04somemore text..    and so on.
> 
> Was that what was wanted?
> 
> This was almost instant.  I guess if I'd a lot to do, I would think of an awk
> one liner, but have forgotten how to do backward references in awk.  And it
> would be even more embarrassing to have both got the above all wrong and to
> also cite duff awk scripts!
> 
> Peter
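
For reference, the find-and-replace Peter describes can be sketched in
Python roughly as follows.  This is an illustration under stated
assumptions, not what Kate does internally: the function name is made up,
the single-item brackets in the pattern are dropped because they add
nothing, and the sample is the fragment from his message without the
trailing dots.

```python
import re

# Peter's pattern without the redundant brackets:
# two digits, hyphen, three non-digits, hyphen, two digits.
DATE = re.compile(r"\d\d-\D\D\D-\d\d")


def split_before_dates(text):
    """Put a line break in front of every date-like match."""
    # \g<0> is Python's whole-match back-reference (written \0 in the post).
    # lstrip drops the empty first line when the text starts with a date.
    return DATE.sub(r"\n\g<0>", text).lstrip("\n")


sample = "02-Mar-92sometext01-Sep-04somemore text"
print(split_before_dates(sample))
```

Note that \D matches any non-digit, including punctuation and spaces, so
this pattern accepts more than month abbreviations.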




