a regular expression question, or at least a text manipulation question
Alex Tweedly
alex at tweedly.net
Wed Aug 27 17:14:15 EDT 2008
Peter Alcibiades wrote:
> How do you do the following?
>
> I have a series of lines which go like this
>
> | [record separator, new record starts]
> AAA consectetur adipisicing elit, sed
> BBB lorem ipsum
> CCC consectetur adipisicing elit, sed
> CCC laboris nisi ut aliquip ex ea
> DDD ut aliquip ex ea commodo
> | [record separator]
> AAA adipisicing elit, sed [new record starts]
>
> | is the record separator.
>
> In the above, it's CCC that is repeated, but it could be any prefix. Also CCC
> is next to its repetition. This will always be the case.
>
> I want to go through the file. When I find a single prefix (like AAA) this
> should be written to the output file. When the next line starts with the
> same prefix (as in the CCC case), I want to put both occurrences on the same
> line. So the desired output would be
>
> AAA consectetur adipisicing elit, sed
> BBB lorem ipsum
> CCC consectetur adipisicing elit, sed CCC laboris nisi ut aliquip ex ea
> DDD ut aliquip ex ea commodo
> EOR
> AAA adipisicing elit, sed
>
> How do I detect a repetition of that sort and do this?
>
>
Here's a simple script that does what I think you want ....
on mouseUp
   # should be TAB, but for testing that's inconvenient
   put "x" into tSeparator
   put field "F1" into lInput
   put "" into lastPrefix
   put "" into lastSeriesOfLines
   repeat for each line L in lInput
      put L into tLine # work on a copy rather than changing the loop variable
      put char 1 to 3 of tLine into tPrefix
      # drop the separator in front of a repeated prefix within the line
      replace tSeparator & tPrefix with tPrefix in char 4 to -1 of tLine
      if tPrefix = lastPrefix then
         # same prefix as the previous line: join onto the same output line
         put space & tLine after lastSeriesOfLines
      else
         put lastSeriesOfLines & CR after lOutput
         # NB this assumes record separator comes on its own on the line
         put tPrefix into lastPrefix
         put tLine into lastSeriesOfLines
      end if
   end repeat
   put lastSeriesOfLines & CR after lOutput
   put lOutput into field "F2"
end mouseUp
>
> A similar question, if the line is
>
> CCC adipisicing elit, sed TAB CCC adipisicing elit, sed
>
> How do you detect the multiple occurrence (I can do this with regex) and then
> write out in place of the above expression (this I don't see how to do) the
> following:
>
> CCC adipisicing elit, sed CCC adipisicing elit, sed
>
>
See the "replace" line of the script above. Note: I was testing with input
fields, where typing a TAB is a nuisance, hence the tSeparator variable.
You may need to adjust for whether spaces are significant, etc.
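
As a minimal sketch of that replace step on its own, assuming a real TAB
separator and three-character prefixes (the function name
joinRepeatedPrefix is made up for illustration, and it swaps the TAB for a
single space, which may or may not be what you want):

```livecode
# Hypothetical helper: replaces pSeparator with a space wherever it sits
# directly in front of a repeat of the line's own prefix.
function joinRepeatedPrefix pLine, pSeparator
   put char 1 to 3 of pLine into tPrefix
   put char 4 to -1 of pLine into tRest
   # only touch the text after the leading prefix
   replace pSeparator & tPrefix with space & tPrefix in tRest
   return tPrefix & tRest
end function
```

Called as joinRepeatedPrefix(tLine, tab), this leaves lines without a
repeated prefix unchanged. If your data already has spaces around the TAB,
replace with tPrefix alone instead of space & tPrefix.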
-- Alex.