Fast Searches

Tue Jan 11 14:06:54 EST 2005

Many thanks!

Now I've got a better idea of how "repeat for each" can solve this, 
although I must say I've also become curious about Valentina databases 
and SQL and will probably continue to look into those, too.  
Nonetheless, these solutions have greatly increased speed!

Thanks,

Ray Horsley
Developer, LinkIt! Software

On Tuesday, January 11, 2005, at 07:15  AM, Wil Dijkstra wrote:

> Now I better understand your data structure and problem. I don't think
> it's necessary to delve into Valentina. Just try this:
>
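> set the itemDelimiter to tab -- each line is tab-delimited
> repeat for each line myLine in myData
>   -- item 4 is the student info, e.g. Smith,John,William,12345
>   add 1 to myArray[item 4 of myLine]
> end repeat
> put keys(myArray) into myList
> put the number of lines of myList
>
> It puts the number of distinct students into the message box.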
>
> Wil Dijkstra
>
>
> ----------
> From:   metacard-bounces at lists.runrev.com on behalf of Ray Horsley
> Reply To:       Discussions on Metacard
> Sent:   Tuesday, January 11, 2005 18:32
> To:     Discussions on Metacard
> Subject:        Re: Fast Searches
>
> Thanks for such quick responses!
>
> Each line of my data represents an elementary school student's answer
> to a test question.  The line has about 18 tab-delimited items in it
> (such as the question number, the date it was answered, the number of
> points it's worth, whether the student got it right or wrong, etc.).
> The 4th item is the student's personal info and looks something like
> Smith,John,William,12345.  Each student has anywhere from 10 to 20
> lines of data, and the goal is to get a total count of how many
> students there are.
>
> Richard's idea of the "repeat for each" structure is super fast, but I'm
> having a hard time applying it here to get the total student count.  I
> think the best way is to delve into the Valentina database idea.  Any
> suggestions on how to get started with this are very much appreciated.
>
> Thanks,
>
>
> Ray Horsley
> Developer, LinkIt! Software
>
>
> On Tuesday, January 11, 2005, at 05:38  AM, Richard Gaskin wrote:
>
> > Ray Horsley wrote:
> >> I'm working with large amounts of data, say 50,000 tab-delimited
> >> lines, where the 4th item in each line is the same for every 20 or
> >> so lines.  Does anybody have a fast way of determining how many
> >> unique 4th items there are in the 50,000 lines?
> >> A repeat loop examining each line is certainly out of the question,
> >> and I've tried using various functions such as itemOffset after
> >> sorting ascending and descending, but this too is turning out to be
> >> kind of slow.
> >
> > I've had good luck using "repeat for each" on data sets up to 40,000
> > lines.  Sure, there's a pause, but it's not so bad -- give it a whirl
> > and you might be pleasantly surprised.
> >
> > --
> >  Richard Gaskin
> >  Fourth World Media Corporation
> >  ___________________________________________________________
> >  Ambassador at FourthWorld.com       http://www.FourthWorld.com
> > _______________________________________________
> > metacard mailing list
> > metacard at lists.runrev.com
> > http://lists.runrev.com/mailman/listinfo/metacard
> >
>