Digging Huge Files

Sivakatirswami katir at hindu.org
Tue Aug 4 22:48:07 EDT 2009


Howard Bornstein wrote:
>> <http://quality.runrev.com/qacenter/show_bug.cgi?id=5078>
>>
>> Says it was fixed in 2.9.
>>
>>
>>     
> I can confirm that this has been fixed and the 2 Gig limit is no longer in
> place.
>
>   
OK I built my script: it digs 6 access log files whose paths are in a 
field. each one is about 250MB in size. But, ad this is really weird. my 
algorithm for getting summary results must be off...

My resultant file of hits one PDF is always 4409 line long

but the summary results change each time I run the script even if I 
don't change the script!


one run

line: 4409

87.194.52.60 - - [05/Jul/2009:03:04:31 -0700] "GET 
/archives/2009/7-9/pdf/Hinduism-Today_Jul-Aug-Sep_2009.pdf HTTP/1.1" 206 
36757938 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; 
rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"

Summary:
Downloaded with Revolution HT Navigator: 629
Complete Downloads via HT site: 1693
Partial Downloads 206's (mostly failures, some successes): 6801

second run

87.194.52.60 - - [05/Jul/2009:03:04:31 -0700] "GET 
/archives/2009/7-9/pdf/Hinduism-Today_Jul-Aug-Sep_2009.pdf HTTP/1.1" 206 
36757938 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; 
rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"

Summary:
Downloaded with Revolution HT Navigator: 1039
Complete Downloads via HT site: 1283
Partial Downloads 206's (mostly failures, some successes): 6801

The 206 partial code is always the same but the hits for Rev downloads 
and 200 completed keep changing... I can't figure it out.

global tStart
local tPartials,tRevHits,tCompleted

on mouseUp
   put ticks() into tStart
   repeat for each line y in fld "path"
      put y & cr after tOutput
      open file y for read
      put 20000 into tStep
      put 1000 into tChunkSize
      put 1 & cr into tAccessLogFileChunk
        # need to initialize this each time otherwise is empty and 
skipped on next repeat
      repeat  until tAccessLogFileChunk is empty
         read from file y for tChunkSize lines
         put it into tAccessLogFileChunk
         put processLogs(tAccessLogFileChunk) & cr after tOutput
      end repeat
      close file y
   end repeat
   repeat 5 times
      replace (cr&cr) with cr in tOutput
   end repeat
  
   put cr & "Summary: " & Cr & "Downloaded with Revolution HT Navigator: 
" & tRevHits & cr after tOutput
   put "Complete Downloads via HT site: " & (tCompleted - tRevHits) & cr 
after tOutput
   put "Partial Downloads 206's (mostly failures, some successes): " & 
tPartials  after tOutput
  
   ask file "Where should we save this?" with "HT PDF July Downloads-Aug 
3 -June 28 .txt"
   put it into tURL
   put tOutput into url ("file:" & tURL)
   calcTime(tStart)
end mouseUp

function processLogs tAccessLogFileChunk
   put empty into tFoundLines
   repeat for each line x in tAccessLogFileChunk
      if x contains "Hinduism-Today_Jul-Aug-Sep_2009.pdf" then
         put x into y
         if y contains "Revolution" then add 1 to tRevHits
         put  ("1"&quote&"200 ") into tCompleteCode
         put  ("1"&quote&"206 ") into tPartialCode
         if y contains tCompleteCode then add 1 to tCompleted
         if y contains tPartialCode then add 1 to tPartials
         put empty into y
         put x & cr after tFoundLInes
      end if
     
     
   end repeat
   return tFoundLines
End processLogs


btw... time to dig 6 files 1.5 GB: 3 min 10 secs






More information about the use-livecode mailing list