MatchText, MatchChunk and the needle in the haystack
Bryan McCormick
bryan at deepfoo.com
Tue Mar 20 06:12:55 EDT 2007
Jim, Dave, Devin
Thanks for your help in making me think harder about this. I literally
woke up out of a dream this morning and knew right away what was wrong
with the script. There was one error that would have been a persistent
problem, and I have now fixed it.
In the interests of anyone else who encounters a similar horrible string
task, the solution is provided below.
One more thing. You all get credit for making me think harder about what
else was in the files that might have been a random char throwing things
off.
Now, I did go and change the script to make it simpler. I realized I
only needed to find the hyphen at the start of the date and then
advance past the next hyphen in the date string. Since we were
dealing with fixed-length text forward from the first hyphen (three-char
month, hyphen, two-char year), this was the simplest way.
Genius? I thought so.
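To make the arithmetic concrete, here is a minimal sketch of the idea.
The sample record and the handler name showDateTail are made up for
illustration and are not taken from the real files.

```livecode
on showDateTail
   -- hypothetical sample record; the real file layout is an assumption
   put "widget 12-Mar-07 shipped" into tSample
   -- position of the first hyphen, i.e. the start of the date tail
   put offset("-", tSample) into tFirst
   -- the tail is always 7 chars: hyphen, 3-char month, hyphen, 2-char year
   put char tFirst to (tFirst + 6) of tSample into tTail
   -- skipping everything up to the end of the tail means the second
   -- hyphen of this date is never reported as a new hit
   put offset("-", tSample, tFirst + 6) into tNext
   -- shows "-Mar-07" and then 0, since no further hyphen follows
   answer tTail & cr & tNext
end showDateTail
```

The third parameter of offset() is a count of chars to skip, and the
value it returns is relative to that skip point, which is why the skip
has to be added back to get an absolute position.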
As luck would have it, I had hit upon the few records that were problem
children right off the bat.
It turned out that a few of the records had the word "in-line" with a
hyphen, which threw off the whole thing. So a separate script now runs
when the file is read in and checks for nulls, oddball ASCII
codes, and our friend "in-line". I was lucky in this case that the
records were so simple. The alternative would have been to keep the
"-Jan-...-Dec-" chunks and walk through the file 12 times. No big deal, I
suppose, and it could always be done that way if one had different chunks
to search for.
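The cleanup script itself is not shown in this post. A rough sketch of
what such a pass could look like follows. The function name cleanBlock,
the "inline" replacement spelling, and the choice to keep only
printable ASCII (codes 32 to 126) are all assumptions for illustration.

```livecode
function cleanBlock pText
   -- neutralize the stray hyphen; the replacement spelling is an assumption
   replace "in-line" with "inline" in pText
   -- start from an explicit empty value so an all-junk input returns empty
   put empty into tClean
   repeat for each char tChar in pText
      put charToNum(tChar) into tCode
      -- keep printable ASCII only; this also drops nulls (code 0).
      -- what counts as "oddball" is an assumption, adjust as needed
      if tCode >= 32 and tCode <= 126 then
         put tChar after tClean
      end if
   end repeat
   return tClean
end cleanBlock
```

Note that this filter also strips tabs and returns, which is only safe
because the input here is one long string with no record markers yet.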
Anyway, here is the finished script with comments. I hope it helps
others who might have similar issues. I have over 5000 of these files to
do which will now take about ten minutes versus the agony (and days) I'd
have had to endure if there had been no community here to draw upon for
help and if rev was not so darn handy.
By the way, the script that adds the return character also puts a
comma in the right place after the date, so that I have another delimiter
to work with. The record in the end is comma delimited, with a return
character as the record marker. Much better than the ugly long single
string I started out with.
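Once the text is in that shape, each line starts with the date, so the
date is item 1 and the rest of the record is everything after it. A
small sketch, using made-up sample data and a hypothetical handler name
showRecords:

```livecode
on showRecords
   -- hypothetical output of the script; real records will differ
   put "12-Mar-07, widget" & cr & "3-Apr-07, gadget" into tData
   set the itemDelimiter to comma
   put empty into tReport
   repeat for each line tRecord in tData
      put item 1 of tRecord into tDate
      -- item 2 to -1 keeps any further commas inside the record body
      put item 2 to -1 of tRecord into tRest
      put tDate & tab & tRest & cr after tReport
   end repeat
   answer tReport
end showRecords
```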
Thanks All.
------------------------------------------
on mouseUp
   put fld 1 into textBlock
   put makeOffsets("-",textBlock,1) into varOffsets
   sort lines of varOffsets numeric descending
   -- this is the only way it works, as otherwise the char count gets
   -- thrown off. essentially we are working from the end of the string
   -- back to the start, so each insert leaves the earlier offsets valid
   repeat for each line varRecord in varOffsets
      put char varRecord-2 to varRecord-1 of textBlock into eval
      if char 1 of eval is a number and char 2 of eval is a number then
         -- two-digit day
         put comma after char varRecord+6 of textBlock
         put cr before char varRecord-2 of textBlock
      else
         if char 1 of eval is not a number and char 2 of eval is a number then
            -- one-digit day
            put comma after char varRecord+6 of textBlock
            put cr before char varRecord-1 of textBlock
         end if
      end if
   end repeat
   put textBlock into fld 1
end mouseUp
function makeOffsets varChunk,textBlock,posStart
   if posStart = empty then
      put 1 into pos
   else
      put posStart into pos
   end if
   -- start with an empty result so that a text with no match returns
   -- empty rather than the literal name of an unset variable
   put empty into newText
   repeat until varOffset = 0
      put offset(varChunk, textBlock, pos) into varOffset
      if varOffset <> 0 then
         put varOffset+pos&return after newText
         -- this was what was mucked-up in the original script
         -- since we are using the "skip chars" option, offset()
         -- returns a position relative to pos, so we have to
         -- add the prior position to the new relative pos
         add varOffset+length(varChunk)+6 to pos
         -- i could get away with adding a fixed number in this
         -- case since the date was never going to be shorter than
         -- six chars + the found offset + chunk, ("-") in this case
      else
         exit repeat
      end if
   end repeat
   return newText
end makeOffsets
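
A quick way to check makeOffsets, and to see why the offsets are
processed in descending order, is a small handler like the one below.
It assumes makeOffsets is in the same script; the sample string and the
handler name checkOffsets are made up for illustration.

```livecode
on checkOffsets
   -- hypothetical sample, not taken from the real files
   put "12-Mar-07 widget 3-Apr-07 gadget" into tSample
   put makeOffsets("-", tSample, 1) into tOffsets
   -- tOffsets should now hold 3 and 19, one per line:
   -- the first hyphen of each date, with the second hyphen skipped
   sort lines of tOffsets numeric descending
   repeat for each line tPos in tOffsets
      -- inserting from the back leaves the earlier offsets valid
      put "|" before char tPos of tSample
   end repeat
   -- shows: 12|-Mar-07 widget 3|-Apr-07 gadget
   answer tSample
end checkOffsets
```

Run in ascending order instead, the first insert would shift every
later hyphen one char to the right, and the second marker would land
one char too early.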