MatchText, MatchChunk and the needle in the haystack

Bryan McCormick bryan at deepfoo.com
Tue Mar 20 06:12:55 EDT 2007


Jim, Dave, Devin

Thanks for your help in making me think harder about this. I literally 
woke up out of a dream this morning and knew right away what was wrong 
with the script. There was one error that would have been a persistent 
problem, and I have now fixed it.

For the benefit of anyone else who encounters a similarly horrible 
string task, the solution is provided below.

One more thing. You all get credit for making me think harder about what 
else was in the files that might have been a random char throwing things 
off.

Now, I did go and change the script to make it simpler. I realized I 
only needed to find the hyphen at the start of the date and simply 
advance forward past the next hyphen in the date string. Since we were 
dealing with fixed length records forward from the first hyphen (three 
char month, hyphen, two char year) this was the simplest way.
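
To make the layout concrete, here is a minimal sketch of the arithmetic. 
It assumes a made-up sample record ("12-Mar-07 widget"), and the handler 
name describeDate is hypothetical, not part of the finished script.

```livecode
-- hypothetical helper: pulls the date parts out using only the first hyphen
function describeDate pRecord
   -- position of the first hyphen, i.e. the one right after the day
   put offset("-", pRecord) into tHyphen
   if tHyphen = 0 then return empty
   -- the three char month starts right after that hyphen
   put char (tHyphen + 1) to (tHyphen + 3) of pRecord into tMonth
   -- the second hyphen sits at tHyphen + 4, so the year is +5 to +6
   put char (tHyphen + 5) to (tHyphen + 6) of pRecord into tYear
   return tHyphen & comma & tMonth & comma & tYear
end describeDate
```

With the sample above, describeDate("12-Mar-07 widget") returns 
"3,Mar,07". The last char of the date is always at tHyphen + 6, which is 
why a fixed step forward from the first hyphen is enough.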

Genius? I thought so.

As luck would have it, I had hit upon the few records that were problem 
children right off the bat.

It turned out that a few of the records had the word "in-line" with a 
hyphen, which threw off the whole thing. So a separate script now runs 
when the file is read in and checks for nulls, odd-ball ASCII codes, 
and our friend "in-line". I was lucky in this case that the 
records were so simple. The alternative would have been to keep the 
"-Jan-...-Dec-" chunks and walk through the file 12 times. No big deal I 
suppose and it could always be done that way if one had different chunks 
to search for.
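
The clean-up script itself is not shown in this post. As a rough sketch 
only, a pass along these lines might do the job. The handler name 
scrubText, the choice to collapse "in-line" to "inline", and the rule of 
keeping only printable ASCII (codes 32 to 126) are all assumptions.

```livecode
-- hypothetical clean-up pass; the real one is not shown in this post
function scrubText pText
   -- assumption: collapsing "in-line" to "inline" is acceptable for the data
   replace "in-line" with "inline" in pText
   put empty into tClean
   repeat for each char tChar in pText
      put charToNum(tChar) into tCode
      -- keep printable ASCII only; this also drops nulls (code 0)
      if tCode >= 32 and tCode <= 126 then put tChar after tClean
   end repeat
   return tClean
end scrubText
```

Note that this filter would also strip tabs and returns, which only 
suits input that arrives as one long string.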

Anyway, here is the finished script with comments. I hope it helps 
others who might have similar issues. I have over 5000 of these files to 
do which will now take about ten minutes versus the agony (and days) I'd 
have had to endure if there had been no community here to draw upon for 
help and if rev was not so darn handy.

By the way, the script that adds the return character also puts in a 
comma in the right place after the date so that I have another delimiter 
to work with and the record in the end is comma delimited with a return 
character as the record marker. Much better than the ugly long single 
string I started out with.

Thanks All.

------------------------------------------


on mouseUp
   put fld 1 into textBlock
   put makeOffsets("-",textBlock,1) into varOffsets
   sort lines of varOffsets numeric descending
   -- descending order is the only way this works; otherwise each insertion
   -- throws off the char counts. essentially we are working from the end
   -- of the string back toward the start
   repeat for each line varRecord in varOffsets
     put char varRecord-2 to varRecord-1 of textBlock into eval
     if char 1 of eval is a number and char 2 of eval is a number  then
       put comma after char varRecord+6 of textBlock
       put cr  before char varRecord-2 of textBlock
     else
       if char 1 of eval is not a number and char 2 of eval is a number then
         put comma after char varRecord+6 of textBlock
         put cr before char varRecord-1 of textBlock
       end if
     end if
   end repeat
   put textBlock into fld 1
end mouseUp

function makeOffsets varChunk,textBlock,posStart
   if posStart = empty then
     put 1 into pos
   else
     put posStart into pos
   end if
   -- start empty so a text with no match returns empty
   put empty into newText
   repeat until varOffset = 0
     put offset(varChunk, textBlock, pos) into varOffset
     if varOffset <>0 then
       put varOffset+pos&return after newText
       -- this was what was mucked-up in the original script:
       -- offset() with the "skip chars" option returns a position
       -- relative to the skipped chars, so the prior pos has to be
       -- added to get the absolute position
       add varOffset+length(varChunk)+6 to pos
       -- i could get away with adding a fixed number in this
       -- case since the rest of the date after the hyphen is always
       -- six chars (three char month, hyphen, two char year)
     else
       exit repeat
     end if
   end repeat
   return newText
end makeOffsets
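
The fix flagged in the comments hinges on how offset() treats its third 
argument: the result is counted from the end of the skipped chars, not 
from the start of the string. A small sketch of that conversion follows. 
The handler name absoluteOffset is hypothetical and not part of the 
script above.

```livecode
-- hypothetical wrapper: turns offset()'s relative result into an absolute one
function absoluteOffset pNeedle, pHaystack, pSkip
   put offset(pNeedle, pHaystack, pSkip) into tRelative
   -- 0 means "not found"; pass that through unchanged
   if tRelative = 0 then return 0
   -- add back the chars that were skipped
   return tRelative + pSkip
end absoluteOffset
```

For example, offset("-", "12-Mar-07", 3) gives 4, because the search 
starts at "Mar-07", while absoluteOffset("-", "12-Mar-07", 3) gives 7, 
the true position of the second hyphen.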


