Digging Huge Files

Sivakatirswami katir at hindu.org
Mon Aug 3 23:11:38 EDT 2009


I want to process both the error logs and the access logs on our web sites.

I have logs rotating out weekly, so that means we have 7 access_log 
files each ranging about 250-300MB in size.

It is my understanding that

put url "file:/someGiantFile.txt" into tAccessLog

will load the entire file into memory as one variable.  I might try that on one of 
our G5 quad towers, but I'd prefer to work on my PowerBook.
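
If RAM were no object, I imagine the whole-file version would look something like this (just a sketch, not tested; the "outPath" field for the destination is made up):

on mouseUp
   # untested sketch: pulls the whole 250-300MB file into one variable,
   # so it really wants a machine with plenty of free RAM
   put url ("file:" & fld "path") into tAccessLog
   repeat for each line tLine in tAccessLog
      if tLine contains "revolution" then
         put tLine & cr after tOutput
      end if
   end repeat
   put tOutput into url ("file:" & fld "outPath") # hypothetical field holding the output path
end mouseUp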

So I assume one could use the read command instead.

This works:

on mouseUp
   open file (fld "path") for read # path to local 245MB access_log.1 file

   put 1 into tStart # rough running count of lines read, written into the output as a progress marker
   put 1000 into tChunkSize

   repeat forever
      read from file (fld "path") for tChunkSize lines
      put it into tAccessLogFileChunk
      if tAccessLogFileChunk is empty then exit repeat # end of file
      put processLogs(tAccessLogFileChunk) & cr after tOutput
      add tChunkSize to tStart
      put tStart & cr after tOutput
   end repeat
   close file (fld "path")

   ask file "Where should we save this?" with "LogResults.txt"
   put it into tURL
   put tOutput into url ("file:" & tURL)
end mouseUp

function processLogs tAccessLogFileChunk
   put empty into tFoundLines
   repeat for each line x in tAccessLogFileChunk
      if x contains "revolution" then
         put x & cr after tFoundLines
      end if
   end repeat
   return tFoundLines
end processLogs
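
One simplification I haven't benchmarked: let the built-in filter command do the per-line matching instead of the explicit repeat loop. Something like this (a sketch, assuming a plain wildcard match on "revolution" is all that's needed):

function processLogs tAccessLogFileChunk
   # filter keeps only the lines matching the wildcard pattern,
   # modifying the chunk in place
   filter tAccessLogFileChunk with "*revolution*"
   return tAccessLogFileChunk
end processLogs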



Any comments on optimizing this?

Actually it was pretty speedy; it took about a minute to get to saving the 
results out... I could add paths
to all six access files for a week and probably get all the results for 
a search out in under 5 minutes for 1.5 Gig... so that's really not too bad.

Sivakatirswami







