Read and Analyze Giant Files

Thu Nov 7 19:42:01 EST 2002

At 8:42 PM -0800 11/7/02, Sannyasin Sivakatirswami wrote:
>We are trying to use Rev (or MC) to analyze a web site access log
>that is 3 million lines long, a 300 meg (or more) file.
>
>If I try a shell script (interpreted) or pascal program (compiled)
>each runs in about 2 minutes on this file but an xTalk script takes
>a very long time, maybe it hangs forever?
[...]

>#!/usr/local/bin/mc
> on startup
>   put empty into the_message
>   put 0 into the_counter
>   read from stdin until empty
>   put it into the_message
>   repeat for each line this_line in the_message
>     if (this_line contains "mystic_mouse") then
>       put the_counter + 1 into the_counter
>     end if
>   end repeat
>   put the_counter
> end startup

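Meh. Reading a file that large all at once is, I would guess, why you're
experiencing such slowness: the whole 300 megs has to be held in memory
(in "it", and then again in the_message) before the loop even starts.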

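Is it possible for the target string to appear more than once in a single
line? If not, try something like this, which counts occurrences of the
string rather than matching lines (the two agree only when there is at
most one occurrence per line):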

put 0 into the_counter
repeat
  read from stdin until "mystic_mouse"
  if the result is empty then add 1 to the_counter -- found it
  else exit repeat -- result is "eof" (or an error): no more occurrences
end repeat
put the_counter
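
If the string can turn up more than once in a line and you want to count
lines rather than occurrences, another option is to read the log in blocks
of lines, so that only one block is in memory at a time. The following is
a rough, untested sketch: the block size of 10000 lines and the variable
name read_status are my own assumptions, so adjust to taste.

on startup
  put 0 into the_counter
  repeat
    read from stdin for 10000 lines -- block size is an arbitrary guess
    put the result into read_status -- save it before anything resets it
    repeat for each line this_line in it
      if this_line contains "mystic_mouse" then add 1 to the_counter
    end repeat
    -- a non-empty result means "eof" (or an error); the last partial
    -- block has already been counted above, so it's safe to stop here
    if read_status is not empty then exit repeat
  end repeat
  put the_counter
end startup

This keeps the per-line test from your original script, but never holds
more than one block of the file at once.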

--
Jeanne A. E. DeVoto ~ jeanne at runrev.com
Runtime Revolution Limited - The Solution for Software Development
http://www.runrev.com/