Help with Regex (was Re: Switch, Case and wild-cards?)
Richmond
richmondmathewson at gmail.com
Sun Dec 30 15:31:01 EST 2012
On 12/30/2012 10:08 PM, Robert Sneidar wrote:
> I ran into a similar problem doing search and replace in Word to clean up text from some data system destined for Excel. My solution was often to replace a character or string I wanted to preserve with a placeholder, replace or remove what remains, then restore my placeholders with the original values. In situations like this, work with bigger strings first, and sometimes you need to work it end to beginning, especially if you are using a word or character counter to keep track of where you are.
>
> Bob Sneidar
> IT Manager
> Calvary Chapel CM
> Sent from iPhone
>
>
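Bob's two tricks can be sketched in Python rather than LiveCode script. This is a minimal sketch under stated assumptions. The cleanup task (keep CRLF line endings, drop stray carriage returns) is a hypothetical example, and the NUL placeholder is assumed never to occur in the input. The second function, `delete_spans` (also hypothetical), shows why edits driven by character offsets are applied from the end of the string backwards: deleting later spans first leaves the offsets of earlier spans unchanged.

```python
from __future__ import annotations


def clean_with_placeholder(text: str) -> str:
    """Hypothetical cleanup: remove stray '\\r' characters but keep '\\r\\n'."""
    placeholder = "\x00"  # assumption: NUL never occurs in the input
    if placeholder in text:
        raise ValueError("placeholder already present in input")
    text = text.replace("\r\n", placeholder)  # 1. protect the bigger string first
    text = text.replace("\r", "")             # 2. remove what remains
    return text.replace(placeholder, "\r\n")  # 3. restore the originals


def delete_spans(text: str, spans: list[tuple[int, int]]) -> str:
    """Delete non-overlapping [start, end) spans, working end to beginning
    so that the offsets of spans not yet processed stay valid."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + text[end:]
    return text


print(repr(clean_with_placeholder("a\rb\r\nc")))  # 'ab\r\nc'
print(delete_spans("abcdef", [(0, 1), (3, 5)]))    # bcf
```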
Well, I am doing perfectly all right with the pattern-searching routine I worked out with Unicode; as long as one works out which pattern to search for first, second and so forth, everything is comparatively straightforward.
This does not involve placeholders, nor anything else as bizarre.
Here's a text:
| abcdGfEhijGkElmnopGqrstuEvwxfyz |
and I know that I have to move 'E' forwards to before the letter that precedes it,
I know that I have to move 'G' to after the letter that follows it,
and I know that I have to replace 'f' with '&&&'.
Now, as 'G' and 'E' sometimes occur as 'doublets', viz. 'GkE', I know that they have to be swapped before I worry about single instances of either 'G' or 'E'; and, as 'f' sometimes occurs next to 'G' or 'E', or both of them, I have to replace 'f' with '&&&' after the other operations, otherwise they won't work.
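Why the replacement of 'f' has to come last can be checked with a short Python sketch (standing in for LiveCode here; the helper names are made up for illustration). Once 'f' has become '&&&', the doublet 'GfE' is no longer 'G', one character, 'E', so the swap no longer matches it.

```python
import re

SAMPLE = "abcdGfEhijGkElmnopGqrstuEvwxfyz"


def swap_doublets(text: str) -> str:
    # 'G', any single character, 'E'  ->  'E', that character, 'G'
    return re.sub(r"G(.)E", r"E\1G", text)


def replace_f(text: str) -> str:
    # Replace every 'f' with '&&&'
    return text.replace("f", "&&&")


good = replace_f(swap_doublets(SAMPLE))  # swap first, then replace 'f'
bad = swap_doublets(replace_f(SAMPLE))   # 'GfE' has already become 'G&&&E'

print(good)  # abcdE&&&GhijEkGlmnopGqrstuEvwx&&&yz
print(bad)   # abcdG&&&EhijEkGlmnopGqrstuEvwx&&&yz
```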
So:
1. swap 'G' and 'E' when they surround one character.
2. move 'E' forwards by one character when it occurs in isolation, and make sure
that this excludes those 'E' characters that have already been swapped by rule #1.
3. Ditto for 'G' (moving it to after the letter that follows it).
4. replace every instance of 'f' by '&&&'.
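To see the trap in rules #2 and #3, here is a rough Python sketch (not Richmond's routine) that applies the four rules naively with `re.sub`, one after another. It reads rule #4 as replacing 'f' with '&&&', which is what the description of the text calls for. Because nothing protects the output of rule #1, rule #2 drags the freshly swapped 'E's one place further forwards and rule #3 pushes the swapped 'G's one place further along.

```python
import re

SAMPLE = "abcdGfEhijGkElmnopGqrstuEvwxfyz"

# Rules #1-#4 as (pattern, replacement) pairs, applied strictly in order.
NAIVE_RULES = [
    (r"G(.)E", r"E\1G"),  # 1. swap 'G' and 'E' around one character
    (r"(.)E", r"E\1"),    # 2. move 'E' forwards past the preceding character
    (r"G(.)", r"\1G"),    # 3. move 'G' past the character that follows it
    (r"f", "&&&"),        # 4. replace 'f' with '&&&'
]


def apply_rules(text, rules):
    # Each rule runs over the whole text before the next rule starts.
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


print(apply_rules(SAMPLE, NAIVE_RULES))
# abcEd&&&hGiEjklGmnopqGrstEuvwx&&&yz  ('dE', 'jE', 'Gh' and 'Gl' moved twice)
```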
One of the ways of avoiding falling over the results of rule #1 while implementing #2 and #3 is to have rule #1 write 'G' and 'E' as different symbols, say '%' and '@', so that rules #2 and #3 don't pick them up (one can always add rules #5 and #6 to replace '%' and '@' with 'G' and 'E' after running through rules #2, #3 and #4).
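The six-rule sequence with '%' and '@' as stand-ins can be sketched in Python as an ordered rule table. This assumes that neither '%' nor '@' appears in the input text, and the sketch refuses to run if one does. Rule #1 writes 'E' as '@' and 'G' as '%', rules #2 to #4 therefore only see the isolated letters, and rules #5 and #6 put the real letters back.

```python
import re

SAMPLE = "abcdGfEhijGkElmnopGqrstuEvwxfyz"

# '%' and '@' stand in for the 'G' and 'E' produced by rule #1
# (assumption: neither symbol occurs in the input).
RULES = [
    (r"G(.)E", r"@\1%"),  # 1. swap the doublet, writing 'E' as '@' and 'G' as '%'
    (r"(.)E", r"E\1"),    # 2. move a remaining 'E' forwards by one character
    (r"G(.)", r"\1G"),    # 3. move a remaining 'G' past the character after it
    (r"f", "&&&"),        # 4. replace 'f' with '&&&'
    (r"%", "G"),          # 5. restore 'G'
    (r"@", "E"),          # 6. restore 'E'
]


def apply_rules(text, rules):
    if "%" in text or "@" in text:
        raise ValueError("input already contains a stand-in symbol")
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


print(apply_rules(SAMPLE, RULES))  # abcdE&&&GhijEkGlmnopqGrstEuvwx&&&yz
```

Keeping the rules in a list makes the order explicit, which is the part that needs the careful thought.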
Now the "fun" of the whole thing is that I have to do that sort of thing with texts that contain about 20 patterns of the "swap X with Y" type.
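With around 20 patterns, one alternative worth considering is a single regular expression whose alternatives are listed in priority order. This is a different technique from the stand-in symbols above, not Richmond's method. At each position Python's `re` tries the alternatives from left to right, and `re.sub` never rescans text it has already replaced, so a character consumed by the 'G?E' doublet cannot be picked up again by the single-letter rules. Below is a sketch for the four rules of this example. For this sample it gives the same result as the six-rule sequence, but the two approaches are not guaranteed to agree in every edge case (for example, patterns that overlap), so it would need checking against real texts.

```python
import re

SAMPLE = "abcdGfEhijGkElmnopGqrstuEvwxfyz"

# Most specific pattern first: the doublet wins over the single moves.
COMBINED = re.compile(r"G(.)E|(.)E|G(.)|f")


def fix_char(c):
    # Apply rule #4 to a character captured by one of the move rules.
    return "&&&" if c == "f" else c


def transform(match):
    if match.group(1) is not None:  # rule #1: G?E -> E?G
        return "E" + fix_char(match.group(1)) + "G"
    if match.group(2) is not None:  # rule #2: ?E -> E?
        return "E" + fix_char(match.group(2))
    if match.group(3) is not None:  # rule #3: G? -> ?G
        return fix_char(match.group(3)) + "G"
    return "&&&"                    # rule #4: a lone 'f'


print(COMBINED.sub(transform, SAMPLE))  # abcdE&&&GhijEkGlmnopqGrstEuvwx&&&yz
```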
--------------------------------
Having worked out a way to do this with Unicode in 2010 (and then, fairly stupidly, forgotten the whole thing), I find that putting it into practice has nothing at all to do with the strengths or shortcomings of LiveCode, but with the limitations of the human mind in getting itself wrapped around the underlying logic needed to work out the correct sequence of transformations.
Richmond.
More information about the use-livecode mailing list