a regular expression question, or at least a text manipulation question

Alex Tweedly alex at tweedly.net
Wed Aug 27 17:14:15 EDT 2008


Peter Alcibiades wrote:
> How do you do the following?
>
> I have a series of lines which go like this
>
> |  [record separator, new record starts]
> AAA consectetur adipisicing elit, sed
> BBB lorem ipsum
> CCC consectetur adipisicing elit, sed
> CCC laboris nisi ut aliquip ex ea
> DDD ut aliquip ex ea commodo
> | [record separator]
> AAA adipisicing elit, sed   [new record starts]
>
> | is the record separator.
>
> In the above, it's CCC that is repeated, but it could be any prefix.  Also, CCC 
> is next to its repetition.  This will always be the case.
>
> I want to go through the file.  When I find a single prefix (like AAA) this 
> should be written to the output file.  When the next line starts with the 
> same prefix (as in the CCC cases), I want to put both occurrences on the same 
> line.  So the desired output would be
>
> AAA consectetur adipisicing elit, sed
> BBB lorem ipsum
> CCC consectetur adipisicing elit, sed CCC laboris nisi ut aliquip ex ea
> DDD ut aliquip ex ea commodo
> EOR
> AAA adipisicing elit, sed
>
> How do I detect a repetition of that sort and do this? 
>
>   
Here's a simple script that does what I think you want ....

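on mouseUp
  put "x" into tSeparator # should be TAB, but for testing that's inconvenient
  put field "F1" into lInput
  put empty into lOutput
  put empty into lastPrefix
  put empty into lastSeriesOfLines
  repeat for each line L in lInput
    put char 1 to 3 of L into tPrefix
    # work on a copy of the rest of the line, not on the loop variable
    put char 4 to -1 of L into tRest
    # drop the separator in front of a repeated prefix within the line
    replace tSeparator & tPrefix with tPrefix in tRest
    put tPrefix & tRest into tLine
    if tPrefix = lastPrefix then
      # same prefix as the previous line: join onto the same output line
      put space & tLine after lastSeriesOfLines
    else
      # new prefix: flush what has been collected (skip the empty first flush)
      if lOutput is not empty or lastSeriesOfLines is not empty then
        put lastSeriesOfLines & CR after lOutput
      end if
      # NB this assumes record separator comes on its own on the line
      put tPrefix into lastPrefix
      put tLine into lastSeriesOfLines
    end if
  end repeat
  put lastSeriesOfLines & CR after lOutput
  put lOutput into field "F2"
end mouseUp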
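
The script passes the "|" separator lines through unchanged, whereas your desired output shows EOR in their place. A minimal sketch of one way to do that as a final pass, assuming each separator sits alone on its line. The function name markRecordSeparators is made up for illustration:

```livecode
function markRecordSeparators pText
  local tOut
  repeat for each line tLine in pText
    -- a line whose only word is "|" is treated as a record separator
    if the number of words in tLine is 1 and word 1 of tLine is "|" then
      put "EOR" & return after tOut
    else
      put tLine & return after tOut
    end if
  end repeat
  -- strip the trailing return added by the loop
  return char 1 to -2 of tOut
end markRecordSeparators
```

You could then finish the handler with something like: put markRecordSeparators(lOutput) into field "F2"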

> A similar question, if the line is
>
> CCC  adipisicing elit, sed TAB CCC  adipisicing elit, sed
>
> How do you detect the multiple occurrence (I can do this with regex) and then 
> write out in place of the above expression (this I don't see how to do) the 
> following:
>
> CCC  adipisicing elit, sed CCC  adipisicing elit, sed
>
>   
See the "replace" line of the script above. Note: I was testing with input 
fields, where typing a TAB is a nuisance, hence the tSeparator variable; set 
it to TAB for real data. You may also need to adjust for whether the spaces 
around the separator are significant, etc.
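
As a rough sketch of that adjustment, here is the same "replace" idea pulled out into a function that copes with the separator either touching the repeated prefix or having a single space on each side. The name joinRepeatedPrefix and its parameters are invented for illustration, and the 3-character prefix is an assumption taken from your examples:

```livecode
function joinRepeatedPrefix pLine, pSeparator
  local tPrefix, tRest
  put char 1 to 3 of pLine into tPrefix
  put char 4 to -1 of pLine into tRest
  -- variant 1: separator with a space on each side, e.g. "sed TAB CCC"
  replace space & pSeparator & space & tPrefix with space & tPrefix in tRest
  -- variant 2: separator directly in front of the repeated prefix
  replace pSeparator & tPrefix with space & tPrefix in tRest
  return tPrefix & tRest
end joinRepeatedPrefix
```

For example, joinRepeatedPrefix("CCC one" & tab & "CCC two", tab) should give "CCC one CCC two". A separator that is not followed by the line's own prefix is left alone.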

-- Alex.


