Fast Searches
Raymond E. Griffith
rgriffit at ctc.net
Tue Jan 11 09:02:14 EST 2005
> Thanks for such quick responses!
>
> Each line of my data represents an elementary school student's answer
> to a test question. The line has about 18 tab delimited items in it
> (such as the question number, date it was answered, number of points
> it's worth, whether the student got it right or wrong, etc). The 4th
> item is the student's personal info and looks something like
> Smith,John,William,12345. Each student has anywhere from 10 to 20
> lines of data, and the goal is to get a total count of how many
> students there are.
>
> Richard's idea of the repeat for each structure is super fast, but I'm
> having a hard time applying it here to get the total student count. I
> think the best way is to delve into the Valentina data base idea. Any
> suggestions on how to get started with this are very much appreciated.
>
> Thanks,
>
>
> Ray Horsley
> Developer, LinkIt! Software
You might try this. It doesn't use Valentina, but it should do the job
fairly quickly.
put fld "Mydata" into tempdata -- work on a variable, not the field
set the itemDelimiter to tab -- the lines are tab-delimited
sort lines of tempdata by item 4 of each -- group each student's lines
put 0 into nstudents
put empty into prevstudent
repeat for each line i in tempdata
  -- in the sorted data, item 4 changes exactly once per student
  if item 4 of i <> prevstudent then
    add 1 to nstudents
    put item 4 of i into prevstudent
  end if
end repeat
You put the data into a variable first so the operations on it run much
faster than they would on the field itself. Note the itemDelimiter: your
lines are tab-delimited and item 4 itself contains commas, so the default
comma delimiter would grab the wrong chunk. The "repeat for each" structure
is also very fast, and after the sort it is simply a comparison job:
nothing increments unless a change in item 4 is detected.
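If you'd rather avoid the sort entirely, here is a rough sketch of the
same count done with an array variable instead. It makes a single pass and
uses item 4 as an array key, so duplicates collapse automatically
(studentSeen and nstudents are just illustrative names, and this assumes
your engine version supports array variables):

put fld "Mydata" into tempdata
set the itemDelimiter to tab -- item 4 contains commas, so tab is required
repeat for each line i in tempdata
  -- each distinct item 4 becomes one array key; repeats just overwrite
  put 1 into studentSeen[item 4 of i]
end repeat
put the number of lines of the keys of studentSeen into nstudents

The tradeoff is one array element per student held in memory, which for a
few thousand students is negligible.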
Let us know.
Regards,
Raymond E. Griffith
>
>
> On Tuesday, January 11, 2005, at 05:38 AM, Richard Gaskin wrote:
>
>> Ray Horsley wrote:
>>> I'm working with large amounts of data, say 50,000 tab delimited
>>> lines, where the 4th item in each line is the same for every 20 or so
>>> lines. Does anybody have a fast way of determining how many unique
>>> 4th items there are in the 50,000 lines?
>>> A repeat loop examining each line is certainly out of the question,
>>> and I've tried using various functions such as itemOffset after
>>> sorting ascending and descending, but this too is turning out to be
>>> kind of slow.
>>
>> I've had good luck using "repeat for each" on data sets up to 40,000
>> lines. Sure, there's a pause, but it's not so bad -- give it a whirl
>> and you might be pleasantly surprised.
>>
>> --
>> Richard Gaskin
>> Fourth World Media Corporation
>> ___________________________________________________________
>> Ambassador at FourthWorld.com http://www.FourthWorld.com