Optimize This!
Mark Smith
mark at maseurope.net
Mon May 1 06:10:02 EDT 2006
Todd, the only problems I see are 1) that you might miss occurrences
where the search string crosses a chunk boundary, and 2) that you
never close the file; as I understand it from the docs, Rev will
only close the file for you when the application quits.
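To make point 1 concrete, here is a minimal sketch using an in-memory string in place of a file. The handler name demoBoundary, the sample data and the tiny chunk size are all made up for illustration; the idea is only that a match straddling two chunks is invisible to both unless the second chunk starts length(pString) - 1 chars early.

```livecode
on demoBoundary
  put "abcdefghij" into tData   -- stands in for the file contents
  put "efg" into tNeedle        -- straddles chars 5 to 7
  put 6 into tSize              -- deliberately tiny chunk size

  -- no overlap: chunk 1 is "abcdef", chunk 2 is "ghij"
  put offset(tNeedle, char 1 to tSize of tData) into tFirst
  put offset(tNeedle, char (tSize + 1) to (2 * tSize) of tData) into tSecond

  -- overlap of length(tNeedle) - 1: chunk 2 starts at char 5, "efghij"
  put tSize - length(tNeedle) + 2 into tStart2
  put offset(tNeedle, char tStart2 to (tStart2 + tSize - 1) of tData) into tOverlap

  -- tFirst and tSecond are both 0 (missed); tOverlap is 1 (found)
  put tFirst && tSecond && tOverlap
end demoBoundary
```

An overlap of length(pString) - 1 rather than length(pString) also avoids counting the same match in two consecutive chunks.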
Scott's point about the size of chunk you grab is a good one, though
the optimal size probably depends on how much memory your machine has.
Anyway, I've done this, which is untested, but it grabs larger chunks,
searches each chunk for further occurrences (rather than starting
another read), and closes the file before it returns...
function getPositionInBinaryFile pPath,pString,pStart,pOccurrence
  if pStart is empty then put 1 into pStart
  if pOccurrence is empty then put 1 into pOccurrence
  put length(pString) into strLen
  put 100000 into tSize -- chunk size; must be larger than strLen
  put 0 into tFoundCount
  put 0 into tPos
  open file pPath for binary read
  repeat
    read from file pPath at pStart for tSize
    put it into tChunk
    if tChunk is empty then
      -- end of file reached without finding the requested occurrence
      close file pPath
      return 0
    end if
    put 0 into charsToSkip
    repeat
      get offset(pString,tChunk,charsToSkip)
      if it > 0 then
        add 1 to tFoundCount
        if tFoundCount = pOccurrence then
          -- same convention as Todd's original: pStart plus the offset within the chunk
          put it + charsToSkip + pStart into tPos
          close file pPath
          return tPos
        else
          add it to charsToSkip
        end if
      else
        exit repeat -- no more matches in this chunk, go and read the next one
      end if
    end repeat
    -- overlap by strLen - 1 chars so we don't miss any occurrences that
    -- cross chunk boundaries, and don't count any occurrence twice
    put pStart + tSize - strLen + 1 into pStart
  end repeat
end getPositionInBinaryFile
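For what it's worth, calling the function might look something like the sketch below. It is as untested as the function itself; the button handler and the search string "IHDR" are just assumptions for the sake of an example.

```livecode
on mouseUp
  -- let the user pick the file; the chosen path comes back in "it"
  answer file "Choose a file to search:"
  if it is empty then exit mouseUp -- user cancelled
  put it into tPath

  -- look for the first occurrence, starting at the beginning of the file
  put getPositionInBinaryFile(tPath, "IHDR", 1, 1) into tPos
  if tPos is 0 then
    answer "String not found."
  else
    answer "Found at position" && tPos
  end if
end mouseUp
```

Leaving pStart and pOccurrence empty should behave the same as passing 1 for both, since the function defaults them.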
On 1 May 2006, at 07:28, Todd Geist wrote:
> Hello Everyone,
>
> I had the need to search large binary files for a string. The files
> could be over a gigabyte in size, so I decided not to load the
> whole file into ram but digest in chunks instead. This is the
> routine I came up with, it seems to work very quickly but I am
> wondering if some of you might be able to speed it up.
>
>
> FUNCTION GetPositionInBinaryFile pPath, pString, pStart, pOccurrence
> open file pPath for binary read
> IF pStart = "" THEN
> PUT 1 into pStart
> END IF
> IF pOccurrence = "" THEN
> PUT 1 into pOccurrence
> END IF
> put 20000 into tSize
> put 0 into tFoundCount
> REPEAT
> read from file pPath at pStart for tSize
> put Offset(pString, it) into n
> IF it is "" THEN
> Return "0"
> END IF
> IF n > 0 THEN
> put tFoundCount + 1 into tFoundCount
> IF tFoundCount = pOccurrence THEN
> put pStart + n into tPos
> Return tPos
> ELSE
> put pStart + n + 1 into pStart
> END IF
> ELSE
> put pStart + tSize into pStart
> END IF
> END REPEAT
> END GetPositionInBinaryFile
> close file pPath
>
> what do you think? Did I miss some obvious easier way?
>
> Thanks
>
> Todd
>
> --
>
> Todd Geist
> ______________________________________
> g e i s t i n t e r a c t i v e