MatchText, MatchChunk and the needle in the haystack
Bryan McCormick
bryan at deepfoo.com
Mon Mar 19 13:24:22 EDT 2007
Jim,
Thanks for the script snippet. It didn't quite work as shown, but it did
get me to think about the problem more carefully. I came up with this:
put "-Jan-,-Feb-,-Mar-,-Apr-,-May-,-Jun-,-Jul-,-Aug-,-Sep-,-Oct-,-Nov-,-Dec-" into mthStrings
-- I seemed to need to separate the routine here; running it with the
-- loops as shown didn't work as expected.
repeat for each item mth in mthStrings
   put makeOffsets(mth,textBlock) after varOffsets
end repeat
sort lines of varOffsets numeric
-- Note that I added a third parameter in case I need to "force" the routine
-- to start elsewhere. It is set to 0 when I run this on the string in
-- question (which, by the way, is about 5000 chars long).
function makeOffsets mth,textBlock,posStart
   if posStart = empty then
      put 0 into pos
   else
      put posStart into pos
   end if
   repeat until varOffset = 0
      put offset(mth, textBlock, pos) into varOffset
      if varOffset <> 0 and varOffset <> posStart then
         if pos <> 0 then
            put pos&return after newText
         end if
         add varoffset+length(mth)+1 to pos
      else
         exit repeat
      end if
   end repeat
   return newText
end makeOffsets
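A note on offset(): when its third parameter (charsToSkip) is supplied, the value it returns is counted from just after the skipped characters, not from the start of the text, so the absolute position of a hit is charsToSkip plus the returned value. The sketch below is not the routine above; matchStarts is a hypothetical name. It collects one absolute start position per occurrence, including the last one in the text, and resumes each search right after the previous hit.

```livecode
function matchStarts pNeedle, pHaystack
   -- hypothetical helper: returns the absolute start position of every
   -- occurrence of pNeedle in pHaystack, one position per line
   local tSkip, tHit, tStart, tStarts
   put 0 into tSkip
   repeat forever
      -- offset() counts from just after the tSkip skipped characters
      put offset(pNeedle, pHaystack, tSkip) into tHit
      if tHit = 0 then exit repeat
      -- convert the relative result into an absolute position
      put tSkip + tHit into tStart
      put tStart & return after tStarts
      -- resume the search at the first character after this hit
      put tStart + length(pNeedle) - 1 into tSkip
   end repeat
   return tStarts
end matchStarts
```

It could be used in the calling loop in place of makeOffsets: put matchStarts(mth, textBlock) after varOffsets, then sort lines of varOffsets numeric.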
A second routine then processes the returned offsets, because I need to
put the return BEFORE the date, and, as luck would have it, the day part
of the date (format is day-month-year) is not always two characters, so
I had to add a routine that checks for digits going back from the offset
position.
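The back-up-over-digits step might be sketched as follows. This is an illustrative guess, not the routine described above: the handler names splitAtDates and dayStart are hypothetical, and pStarts is assumed to hold absolute positions of the leading "-" of each "-Mon-" hit, one per line. Returns are inserted from the highest position down, so inserting one return never shifts a position that is still waiting to be processed.

```livecode
function splitAtDates pText, pStarts
   -- hypothetical: puts a return before each date whose "-Mon-" starts
   -- at one of the absolute positions listed in pStarts
   local tStart, tPos
   sort lines of pStarts descending numeric
   repeat for each line tStart in pStarts
      if tStart is empty then next repeat
      put dayStart(pText, tStart) into tPos
      -- no return is needed before a date at the very start of the text
      if tPos > 1 then put return before char tPos of pText
   end repeat
   return pText
end splitAtDates

function dayStart pText, pPos
   -- hypothetical: steps back over the 1 or 2 day digits before "-Mon-"
   repeat while pPos > 1
      if char (pPos - 1) of pText is not in "0123456789" then exit repeat
      subtract 1 from pPos
   end repeat
   return pPos
end dayStart
```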
Here is the odd thing, though. As far as I can see, the script should
work perfectly on a string with no delimiters and a bunch of dates in it.
It doesn't.
It mostly works (which means either I've made a mistake or the file isn't
quite as neat as I think it is), but it gets thrown off and misses offsets
it should find. It does not seem to matter how long or short the record
is, nor does it happen consistently in the same place. But it always
happens. I've looked for possible length errors (did I overshoot a
record?), but that does not seem possible, or the whole thing would be
broken.
What happens is that, seemingly at random, some lines contain multiple
records run together in a single string.
Thoughts greatly appreciated.
For expediency, I could (and probably will) write another routine that
walks through the lines of the partially correct records to check whether
any of them contains another date, but I have to say I am stumped as to
how it could skip some records and then find them just fine after the
error occurs.
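A rough version of that checking routine might look like this. It is a sketch under assumptions: the handler name linesWithExtraDates is hypothetical, and it simply counts "-Mon-" tokens on each line, returning the numbers of the lines that hold more than one. Because offset() is case-insensitive unless caseSensitive is set to true, "-Jan-" also matches "-JAN-".

```livecode
function linesWithExtraDates pRecords
   -- hypothetical: returns the line numbers (one per line) of records
   -- that contain more than one "-Mon-" token
   local tMonths, tMonth, tLine, tLineNum, tCount, tSkip, tHit, tFlagged
   put "-Jan-,-Feb-,-Mar-,-Apr-,-May-,-Jun-,-Jul-,-Aug-,-Sep-,-Oct-,-Nov-,-Dec-" into tMonths
   put 0 into tLineNum
   repeat for each line tLine in pRecords
      add 1 to tLineNum
      put 0 into tCount
      repeat for each item tMonth in tMonths
         put 0 into tSkip
         repeat forever
            -- offset() is relative to the tSkip characters already skipped
            put offset(tMonth, tLine, tSkip) into tHit
            if tHit = 0 then exit repeat
            add 1 to tCount
            -- continue searching just past this token
            add tHit + length(tMonth) - 1 to tSkip
         end repeat
      end repeat
      if tCount > 1 then put tLineNum & return after tFlagged
   end repeat
   return tFlagged
end linesWithExtraDates
```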
I checked for random oddball chars and confirmed that the dates not
found are in fact properly formatted as x or xx-JAN-xx.
And oh yes: offset("-Nov-", fld 1) does find the date in the field where
the resulting partially recovered list is placed. So it does not appear
to be an offset bug, at least not one that I can see.
More information about the use-livecode mailing list