the large file challenge

Sadhunathan Nadesan sadhu at castandcrew.com
Tue Nov 12 12:28:01 EST 2002


| >>> So ! MC as far so fast than Pascal ! Is'nt it great ? And, thanks again
| >>> to Scott, for that too !
| >> 
| >> It's enough to make a Java programmer cry. ;)
| > 
| > Java ? Help me to remember... Are you speaking, Richard, in about this
| > dead marketed toy that crashes any time he search some more ram to eat ?
| 
| If you're thinking of the one with the slow development cycle and the even
| slower runtime speed, yep, that's the critter.
| 
| Anyone care to write this challenge algorithm in Java for laughs?  Or would
| we need Raney to add a new time token in addition to "seconds", "ticks", and
| "milliseconds":  "eons".


Someone mentioned C vs Pascal too; I can't find that message at the moment,
but here are a couple of thoughts.  It seems to me the main revelation of
this exercise is that everything came out about the same.  I think that
defies the conventional thinking that interpreted languages are slower than
compiled ones.

Now, we could probably go back to the drawing board and shave a few
seconds off each algorithm.  For example, Scott mentioned using binary
read (then you have to put in the extra code he discussed), and we could
probably improve the Pascal with a similar approach using block reads.
However, leaving them all doing line reads keeps them fairly comparable and,
again, I think it's surprising they take about the same time.  One
point should be to try to write it in as few lines as possible, since that
is generally an advantage of the 4GLs.
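
To make the block-read idea a bit more concrete, here is a rough C sketch of
what it might look like.  This is not Scott's code or anything that was
timed; the function name count_matching_lines and the 64 KB block size are
my own assumptions.  The "extra code" in this version is the bookkeeping
that carries a partial line over from one block to the next.  It assumes
text input with no embedded NUL bytes, and a match that straddles the pieces
of a line longer than the block could be missed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { BLOCK = 65536 };   /* assumed block size, not tuned */

/* Hypothetical helper: count the lines of fp that contain pattern,
   reading with fread() in blocks instead of one line at a time. */
long count_matching_lines(FILE *fp, const char *pattern)
{
    static char buf[BLOCK + 1];   /* static, so not reentrant */
    size_t have = 0;              /* bytes currently held in buf */
    size_t got;
    long count = 0;
    int matched = 0;              /* over-long line already matched */

    assert(fp != NULL && pattern != NULL);

    while ((got = fread(buf + have, 1, BLOCK - have, fp)) > 0) {
        char *line = buf;
        char *nl;

        have += got;

        /* Handle every complete line in the buffer. */
        while ((nl = memchr(line, '\n', have - (size_t)(line - buf))) != NULL) {
            *nl = '\0';
            if (matched || strstr(line, pattern) != NULL)
                count++;
            matched = 0;
            line = nl + 1;
        }

        /* Carry the unfinished line to the front of the buffer. */
        have -= (size_t)(line - buf);
        memmove(buf, line, have);

        /* Buffer full with no newline: scan this piece and drop it. */
        if (have == BLOCK) {
            buf[have] = '\0';
            if (strstr(buf, pattern) != NULL)
                matched = 1;
            have = 0;
        }
    }

    /* A final line with no trailing newline still counts. */
    buf[have] = '\0';
    if (matched || (have > 0 && strstr(buf, pattern) != NULL))
        count++;

    return count;
}
```

A main() would just call count_matching_lines(stdin, "mystic_mouse") and
print the result.  Whether this actually beats the line-read versions on the
test file is something I'd want to measure rather than assume.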

I don't have a Java version yet, but below is a C version.  It's a bit longer
than it really needs to be because I copied a routine from our libraries at
work rather than using a C intrinsic, and again, the result is about the same
time.  It was obviously harder to write.  I'm guessing Perl and Java might be
in the same ballpark too, and maybe I will pursue that.  As for the Pascal,
I'm using the "Free Pascal" compiler out of Europe with no particular
optimizations, compiled with just a pc386 command.  The C is the GNU ANSI C
compiler invoked with a -o (optimize) command.
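
For comparison, here is a sketch of how the counting loop might look using
the standard fgets() instead of the fgetnline library routine used in the
program further down.  The function name count_lines_fgets is made up for
illustration, and this version was not part of the timings.  It is not quite
equivalent: fgetnline truncates a line that doesn't fit in the buffer, while
fgets hands it back in pieces, so the sketch keeps a flag to make sure each
line is counted once (a match that straddles two pieces could still be
missed).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines of fp that contain pattern,
   using fgets() with the same 300-byte buffer as the original. */
long count_lines_fgets(FILE *fp, const char *pattern)
{
    char buf[300];
    long count = 0;
    int matched = 0;   /* current line has matched in some piece */

    assert(fp != NULL && pattern != NULL);

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);

        if (!matched && strstr(buf, pattern) != NULL)
            matched = 1;

        /* A trailing newline means the line is complete. */
        if (len > 0 && buf[len - 1] == '\n') {
            count += matched;
            matched = 0;
        }
    }

    /* Final line without a trailing newline. */
    count += matched;

    return count;
}
```

With that, main() shrinks to little more than printing
count_lines_fgets(stdin, "mystic_mouse").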

Here's the latest round of times:


bash    1:44
pascal  2:04
C       2:28
MC      2:10

Goodness, C is slowest of all?!?



#include <stdio.h>
#include <string.h>

/*============================================================================*/

int fgetnline(FILE *, char *, unsigned int);

/*============================================================================*/

int main(
int	argc,
char	*argv[]
)
{
     char	pattern[] = "mystic_mouse";
     char	buf[300];
     int	count = 0;


     while (fgetnline(stdin, buf, sizeof(buf)) != EOF) {
	  if (strstr(buf, pattern) != NULL)
	       count++;
     }

     (void) fprintf(stdout, "%d\n", count);

     return 0;
}

/* ========================================================================= */

int fgetnline(
FILE	*fp,		/* IN:  Stream to read from. */
char	*buf,		/* OUT: Buffer to fill.      */
unsigned int	bufsize	/* IN:  Size of <buf>.       */
)
{
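     int		c;
     unsigned int	count = 0;
     int		got_any = 0;	/* Set once any character is read. */


     while ((c = fgetc(fp)) != EOF && c != '\n') {
	  got_any = 1;
	  if (bufsize == 0 || ++count < bufsize)
	       *buf++ = (char) c;
     }

     *buf = '\0';

     /* A final line with no trailing newline is still reported as a line;
        EOF is returned on the following call. */
     return (c == EOF && got_any) ? '\n' : c;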
}

/*============================================================================*/


