Digging Huge Files
katir at hindu.org
Mon Aug 3 23:11:38 EDT 2009
I want to process both the error logs and the access logs from our web sites.
I have logs rotating out weekly, so that means we have 7 access_log
files, each about 250-300MB in size.
It is my understanding that

   put url "file:/someGiantFile.txt" into tAccessLog

will load the entire file into memory. I might try that on one of
our G5 quad towers, but I'd prefer to work on my PowerBook.
So I assume that one could use the read command instead, reading the file a chunk of lines at a time:

   open file (fld "path") for read # path to local 245MB access_log.1 file
   put 1 into tStart
   put 20000 into tStep
   put 1000 into tChunkSize
   repeat until tAccessLogFileChunk is empty
      read from file (fld "path") for tChunkSize lines
      put it into tAccessLogFileChunk
      put processLogs(tAccessLogFileChunk) & cr after tOutput
      put (tStart + tStep) into tStart
      put tStart & cr after tOutput
   end repeat
   close file (fld "path")
   ask file "Where should we save this?" with "LogResults.txt"
   put it into tURL
   put tOutput into url ("file:" & tURL)

   function processLogs tAccessLogFileChunk
      put empty into tFoundLines
      repeat for each line x in tAccessLogFileChunk
         if x contains "revolution" then
            put x & cr after tFoundLines
         end if
      end repeat
      return tFoundLines
   end processLogs

Any comments on optimizing this?
Actually it was pretty speedy, though it took about a minute to get to
saving the results out. I could add paths
to all six access files for a week and probably get all the results for
a search out in under 5 minutes for 1.5GB. So that's really not too bad.