MatchText, MatchChunk and the needle in the haystack

Bryan McCormick bryan at deepfoo.com
Mon Mar 19 13:24:22 EDT 2007


Jim,

Thanks for the script snippet. It didn't quite work as shown, but it did 
get me to think about the problem more carefully. I came up with this:

put "-Jan-,-Feb-,-Mar-,-Apr-,-May-,-Jun-,-Jul-,-Aug-,-Sep-,-Oct-,-Nov-,-Dec-" into mthStrings


-- I seemed to need to separate the routine here; running it with the
-- loops as shown didn't function as expected.


repeat for each item mth in mthStrings
   put makeOffsets(mth,textBlock) after varOffsets
end repeat

sort lines of varOffsets numeric

-- Note that I added a third param in case I need to "force" the routine
-- to start elsewhere. It is set to 0 when I run this on the string in
-- question (which by the way is about 5000 chars long).

function makeOffsets mth,textBlock,posStart
   if posStart = empty then
     put 0 into pos
   else
     put posStart into pos
   end if
   repeat until varOffset = 0
     put offset(mth, textBlock, pos) into varOffset
     if varOffset <> 0 and varOffset <> posStart then
       if pos  <> 0 then
         put pos&return after newText
       end if
       add varoffset+length(mth)+1 to pos
     else
       exit repeat
     end if
   end repeat
   return newText
end makeOffsets



Another routine then adjusts the returned offsets, since I need to put 
the return in BEFORE the date. As luck would have it, the day part of 
the date (format is day-month-year) is not always two characters, so I 
had to add a routine that checks for numerics backwards from the offset 
position.
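
The backward check described above could look roughly like this. It is 
only a minimal sketch, not the routine I actually run: dateStart and the 
p/t variable names are made up for illustration, and it assumes the 
offset passed in points at the leading "-" of the month string.

```livecode
-- Hypothetical helper: walk back from the "-" in front of the month
-- while the preceding char is a digit, and return the position of the
-- first digit of the day. If no digit precedes the "-", the offset is
-- returned unchanged.
function dateStart pText, pMonthOffset
   put pMonthOffset into tStart
   repeat while tStart > 1 and char (tStart - 1) of pText is in "0123456789"
      subtract 1 from tStart
   end repeat
   return tStart
end dateStart
```

The return would then go in before char dateStart(textBlock, theOffset) 
of the text, which covers both one-digit and two-digit days.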

Here is the odd thing though. As far as I can see, the script should 
work perfectly on a string with no delimiters and a bunch of dates in 
it. That is not the case.

It mostly works (which means I've made a mistake or the file isn't quite 
as neat as I think it is) but it gets thrown off and does not find 
offsets that it should. It does not seem to matter how long or short the 
record is, nor does it happen consistently in the same place. But it 
always happens. I've looked for possible length errors (did I overshoot 
a record?) but that does not seem possible, or the whole thing would be 
broken.

What happens is that, seemingly at random, some lines contain multiple 
records in a single string.

Thoughts greatly appreciated.

I could (and probably will) write another routine for expediency to walk 
through the lines of the partially correct records to see if there is 
another date line item in it, but I have to say I am stumped as to how 
it could be skipping over some records and then finding them just fine 
after the error occurs.

I checked for random oddball chars and confirmed that the dates not 
found are in fact properly formatted as x-JAN-xx or xx-JAN-xx.

And oh yes, I am able to find the offset("-Nov-", fld 1) in the field 
that the resulting partially recovered list is placed in. So it does not 
appear to be an offset bug, not one that I can see anyway.
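
One thing I still want to rule out: as I read the dictionary entry for 
offset, when a charsToSkip value is given the result is counted from 
that skip point rather than from the start of the string. If that is 
right, the absolute position is the skip count plus the returned value. 
Below is a minimal sketch of a loop built on that assumption. allOffsets 
and the p/t variable names are made up for illustration, and this is not 
the routine shown earlier.

```livecode
-- Hypothetical helper: return every absolute position of pNeedle in
-- pHaystack, one per line. Assumes offset() reports its result relative
-- to the number of chars skipped (the third parameter).
function allOffsets pNeedle, pHaystack
   put empty into tResult
   put 0 into tSkip
   repeat forever
      put offset(pNeedle, pHaystack, tSkip) into tRel
      if tRel = 0 then exit repeat
      -- convert the relative result into an absolute char position
      put tSkip + tRel into tAbs
      put tAbs & return after tResult
      -- resume searching just past the end of this match
      put tAbs + length(pNeedle) - 1 into tSkip
   end repeat
   return tResult
end allOffsets
```

If this turns up positions that makeOffsets misses, the difference would 
most likely be in how pos is advanced and which value gets recorded.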



