Digging Huge Files
Sivakatirswami
katir at hindu.org
Tue Aug 4 22:48:07 EDT 2009
Howard Bornstein wrote:
>> <http://quality.runrev.com/qacenter/show_bug.cgi?id=5078>
>>
>> Says it was fixed in 2.9.
>>
>>
>>
> I can confirm that this has been fixed and the 2 Gig limit is no longer in
> place.
>
>
OK, I built my script: it digs through 6 access log files whose paths are in a
field. Each one is about 250MB in size. But, and this is really weird, my
algorithm for getting summary results must be off...
My resultant file of hits on one PDF is always 4409 lines long,
but the summary results change each time I run the script, even if I
don't change the script!
one run
line: 4409
87.194.52.60 - - [05/Jul/2009:03:04:31 -0700] "GET
/archives/2009/7-9/pdf/Hinduism-Today_Jul-Aug-Sep_2009.pdf HTTP/1.1" 206
36757938 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB;
rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"
Summary:
Downloaded with Revolution HT Navigator: 629
Complete Downloads via HT site: 1693
Partial Downloads 206's (mostly failures, some successes): 6801
second run
87.194.52.60 - - [05/Jul/2009:03:04:31 -0700] "GET
/archives/2009/7-9/pdf/Hinduism-Today_Jul-Aug-Sep_2009.pdf HTTP/1.1" 206
36757938 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB;
rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"
Summary:
Downloaded with Revolution HT Navigator: 1039
Complete Downloads via HT site: 1283
Partial Downloads 206's (mostly failures, some successes): 6801
The 206 partial code is always the same but the hits for Rev downloads
and 200 completed keep changing... I can't figure it out.
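A quick arithmetic check on the two summaries, as a minimal sketch: the script
prints "Complete Downloads" as tCompleted - tRevHits, so adding the first two
summary lines back together recovers tCompleted for each run. The variable
names tRun1Completed and tRun2Completed are made up for this illustration; the
numbers are copied from the two runs above.

```livecode
on mouseUp
   -- run 1: Rev hits + (tCompleted - tRevHits) gives back tCompleted
   put 629 + 1693 into tRun1Completed
   -- run 2: same sum with the second run's figures
   put 1039 + 1283 into tRun2Completed
   -- show both totals in the message box
   put tRun1Completed && tRun2Completed
end mouseUp
```

Both sums come to 2322, so the count of 200 responses is the same in both runs,
just like the 206 count. Only the "Revolution" hit count differs between the
two runs, and the "Complete Downloads" line moves because it is derived from it.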
global tStart
local tPartials,tRevHits,tCompleted
on mouseUp
   put ticks() into tStart
   repeat for each line y in fld "path"
      put y & cr after tOutput
      open file y for read
      put 20000 into tStep
      put 1000 into tChunkSize
      put 1 & cr into tAccessLogFileChunk
      # need to initialize this each time, otherwise it is empty and skipped on the next repeat
      repeat until tAccessLogFileChunk is empty
         read from file y for tChunkSize lines
         put it into tAccessLogFileChunk
         put processLogs(tAccessLogFileChunk) & cr after tOutput
      end repeat
      close file y
   end repeat
   repeat 5 times
      replace (cr&cr) with cr in tOutput
   end repeat
   put cr & "Summary: " & cr & "Downloaded with Revolution HT Navigator: " & tRevHits & cr after tOutput
   put "Complete Downloads via HT site: " & (tCompleted - tRevHits) & cr after tOutput
   put "Partial Downloads 206's (mostly failures, some successes): " & tPartials after tOutput
   ask file "Where should we save this?" with "HT PDF July Downloads-Aug 3 -June 28 .txt"
   put it into tURL
   put tOutput into url ("file:" & tURL)
   calcTime(tStart)
end mouseUp
function processLogs tAccessLogFileChunk
   put empty into tFoundLines
   repeat for each line x in tAccessLogFileChunk
      if x contains "Hinduism-Today_Jul-Aug-Sep_2009.pdf" then
         put x into y
         if y contains "Revolution" then add 1 to tRevHits
         # the status code follows the closing quote of the request, e.g. HTTP/1.1" 200
         put ("1" & quote & " 200 ") into tCompleteCode
         put ("1" & quote & " 206 ") into tPartialCode
         if y contains tCompleteCode then add 1 to tCompleted
         if y contains tPartialCode then add 1 to tPartials
         put empty into y
         put x & cr after tFoundLines
      end if
   end repeat
   return tFoundLines
end processLogs
btw... time to dig 6 files 1.5 GB: 3 min 10 secs