the large file challenge

Sadhunathan Nadesan sadhu at castandcrew.com
Sun Nov 10 13:06:00 EST 2002


| If we're allowed to read the whole thing into RAM and the goal is to count
| the occurrences of the string "mystic_mouse", then to optimize speed we can
| just remove the redundant read commands and use offset to search for us:
| 
| #!/usr/local/bin/mc
| on startup
|   put "/gig/tmp/log/xaa" into the_file
|   put url ("file:"&the_file) into the_text
|   put 0 into the_counter
|   put 0 into tPointer  -- number of chars offset() should skip
|   --
|   -- loop until no match is left; a per-line loop would cap the count
|   repeat forever
|     get offset("mystic_mouse", the_text, tPointer)
|     if it = 0 then exit repeat
|     add 1 to the_counter
|     add it to tPointer
|   end repeat
|   put the_counter
| end startup
| 
| This is off the top of my head.  If it runs I'd be interested in how it
| compares.


Richard,

Thanks much for the code and suggestions.  We aren't allowed to read
the whole thing into memory because the real access file is 300 MB
and my poor little Linux box has only 128 MB of RAM.  One of the great
things about Linux of course is that it will run fine on minimal hardware.

Anyway, alas, the program failed with this message:

mc: out of memory
0
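
Since the whole file will not fit in RAM, one option is to read it in
fixed-size pieces and run offset over each piece.  Here is a rough,
untested sketch along those lines.  The 1000000-character piece size and
the names the_target, the_tail, the_chunk, the_status, tSkip and tKeep are
made up for illustration, and it assumes the result is "eof" once read
reaches the end of the file:

#!/usr/local/bin/mc
on startup
  put "/gig/tmp/log/xaa" into the_file
  put "mystic_mouse" into the_target
  -- carry over one char less than the target between pieces
  put length(the_target) - 1 into tKeep
  put 0 into the_counter
  put empty into the_tail
  open file the_file for read
  repeat forever
    read from file the_file for 1000000
    put the result into the_status
    -- nothing read: end of file, or the file could not be opened
    if it is empty then exit repeat
    -- prepend the end of the previous piece so a match that
    -- straddles two pieces is still seen
    put the_tail & it into the_chunk
    put 0 into tSkip
    repeat forever
      get offset(the_target, the_chunk, tSkip)
      if it = 0 then exit repeat
      add 1 to the_counter
      add it to tSkip
    end repeat
    if the_status is "eof" then exit repeat
    put char (length(the_chunk) - tKeep + 1) to -1 of the_chunk into the_tail
  end repeat
  close file the_file
  put the_counter
end startup

Because the carried-over tail is one character shorter than the search
string, it can never hold a complete match by itself, so nothing should
get counted twice.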

Ok, on to the next suggestion!

Sadhu


