Optimize This!

Mark Smith mark at maseurope.net
Mon May 1 06:10:02 EDT 2006


Todd, the only problems I see are 1) that you might miss occurrences  
where the string you are looking for crosses a chunk boundary, and  
2) that you never close the file (the close comes after the end of  
the function, so it never runs), and as I understand it from the  
docs, Rev will only close the file for you when the application quits.
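
To illustrate the first point with made-up data (an 8-char string  
standing in for the file, and a chunk size of 4), a sketch you can  
run from the message box or a test handler:

   put "abcdefgh" into tData
   put offset("def", char 1 to 4 of tData) -- 0, "abcd" has no match
   put offset("def", char 5 to 8 of tData) -- 0, "efgh" has no match
   -- start the second chunk length("def") - 1 = 2 chars earlier:
   put offset("def", char 3 to 6 of tData) -- 2, i.e. char 4 of tData

So each chunk needs to overlap the previous one by the length of the  
search string minus 1.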

Scott's point about the size of the chunk you grab is a good one,  
though the optimal size probably depends on how much memory your  
machine has.

Anyway, I've done this, which is untested, but it grabs larger  
chunks, searches each chunk for every occurrence (rather than  
starting another read after each hit), and closes the file before  
it returns...

function getPositionInBinaryFile pPath,pString,pStart,pOccurrence
   if pStart is empty then put 1 into pStart
   if pOccurrence is empty then put 1 into pOccurrence
   put length(pString) into strLen
   put 100000 into tSize
   put 0 into tFoundCount
   put 0 into tPos

   open file pPath for binary read

   repeat
     read from file pPath at pStart for tSize
     put it into tChunk
     if tChunk is empty then
       -- reached the end of the file without finding the occurrence
       close file pPath
       return 0
     end if

     put 0 into charsToSkip
     repeat
       get offset(pString,tChunk,charsToSkip)
       if it = 0 then exit repeat -- nothing more in this chunk
       add 1 to tFoundCount
       if tFoundCount = pOccurrence then
         -- same convention as Todd's original: pStart plus the
         -- offset within the chunk
         put it + charsToSkip + pStart into tPos
         close file pPath
         return tPos
       end if
       add it to charsToSkip
     end repeat

     -- overlap by strLen - 1 chars so we don't miss any occurrences
     -- that cross chunk boundaries, and don't count any of them twice
     put pStart + tSize - strLen + 1 into pStart
   end repeat
end getPositionInBinaryFile
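
Calling it might look something like this. It is only a sketch: the  
button handler and the search string "RIFF" are placeholders of mine,  
not anything from Todd's project.

on mouseUp
   answer file "Choose a file to search:"
   if it is empty then exit mouseUp -- user cancelled
   put it into tPath
   -- look for the 2nd occurrence, searching from the start of the file
   put getPositionInBinaryFile(tPath,"RIFF",1,2) into tPos
   if tPos = 0 then
     answer "Not found."
   else
     answer "Found at position" && tPos
   end if
end mouseUp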

On 1 May 2006, at 07:28, Todd Geist wrote:

> Hello Everyone,
>
> I had the need to search large binary files for a string. The files  
> could be over a gigabyte in size, so I decided not to load the  
> whole file into ram but digest in chunks instead.  This is the  
> routine I came up with, it seems to work very quickly but I am  
> wondering if some of you might be able to speed it up.
>
>
> FUNCTION GetPositionInBinaryFile pPath, pString, pStart, pOccurrence
>     open file pPath for binary read
>     IF pStart = "" THEN
>         PUT 1 into pStart
>     END IF
>     IF pOccurrence = "" THEN
>         PUT 1 into pOccurrence
>     END IF
>     put 20000 into tSize
>     put 0 into tFoundCount
>     REPEAT
>         read from file pPath at pStart for tSize
>         put Offset(pString, it) into n
>         IF it is "" THEN
>             Return "0"
>         END IF
>         IF n > 0 THEN
>             put tFoundCount + 1 into tFoundCount
>             IF tFoundCount = pOccurrence THEN
>                 put pStart + n into tPos
>                 Return tPos
>             ELSE
>                 put pStart + n + 1 into pStart
>             END IF
>         ELSE
>             put pStart + tSize into pStart
>         END IF
>     END REPEAT
> END GetPositionInBinaryFile
> close file pPath
>
> what do you think?  Did I miss some obvious easier way?
>
> Thanks
>
> Todd
>
> -- 
>
> Todd Geist
> ______________________________________
> g e i s t   i n t e r a c t i v e
>



