Fast Searches
Raymond E. Griffith
rgriffit at ctc.net
Tue Jan 11 09:02:14 EST 2005
> Thanks for such quick responses!
>
> Each line of my data represents an elementary school student's answer
> to a test question. The line has about 18 tab delimited items in it
> (such as the question number, date it was answered, number of points
> it's worth, whether the student got it right or wrong, etc). The 4th
> item is the student's personal info and looks something like
> Smith,John,William,12345. Each student has anywhere from 10 to 20
> lines of data, and the goal is to get a total count of how many
> students there are.
>
> Richard's idea of the repeat for each structure is super fast, but I'm
> having a hard time applying it here to get the total student count. I
> think the best way is to delve into the Valentina data base idea. Any
> suggestions on how to get started with this are very much appreciated.
>
> Thanks,
>
>
> Ray Horsley
> Developer, LinkIt! Software
You might try this. It doesn't use Valentina, but it should do the job
fairly quickly.
put fld "Mydata" into tempdata -- work on a variable, not the field
set the itemDelimiter to tab -- the lines are tab-delimited
sort lines of tempdata by item 4 of each -- group each student's lines
put 0 into nstudents
put empty into prevstudent
repeat for each line i in tempdata
  -- in the sorted data, item 4 changes exactly once per student
  if item 4 of i <> prevstudent then
    add 1 to nstudents
    put item 4 of i into prevstudent
  end if
end repeat
You put the data into a variable first so the operations on it run much
faster than they would on the field itself. Note the itemDelimiter: your
lines are tab-delimited and item 4 itself contains commas, so the default
comma delimiter would grab the wrong chunk. The "repeat for each" structure
is also very fast, and after the sort it is simply a comparison job:
nothing increments unless a change in item 4 is detected.
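If you'd rather avoid the sort entirely, here is a rough sketch of the
same count done with an array variable instead. It makes a single pass and
uses item 4 as an array key, so duplicates collapse automatically
(studentSeen and nstudents are just illustrative names, and this assumes
your engine version supports array variables):

put fld "Mydata" into tempdata
set the itemDelimiter to tab -- item 4 contains commas, so tab is required
repeat for each line i in tempdata
  -- each distinct item 4 becomes one array key; repeats just overwrite
  put 1 into studentSeen[item 4 of i]
end repeat
put the number of lines of the keys of studentSeen into nstudents

The tradeoff is one array element per student held in memory, which for a
few thousand students is negligible.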
Let us know.
Regards,
Raymond E. Griffith
>
>
> On Tuesday, January 11, 2005, at 05:38 AM, Richard Gaskin wrote:
>
>> Ray Horsley wrote:
>>> I'm working with large amounts of data, say 50,000 tab delimited
>>> lines, where the 4th item in each line is the same for every 20 or so
>>> lines. Does anybody have a fast way of determining how many unique
>>> 4th items there are in the 50,000 lines?
>>> A repeat loop examining each line is certainly out of the question,
>>> and I've tried using various functions such as itemOffset after
>>> sorting ascending and descending, but this too is turning out to be
>>> kind of slow.
>>
>> I've had good luck using "repeat for each" on data sets up to 40,000
>> lines. Sure, there's a pause, but it's not so bad -- give it a whirl
>> and you might be pleasantly surprised.
>>
>> --
>> Richard Gaskin
>> Fourth World Media Corporation
>> ___________________________________________________________
>> Ambassador at FourthWorld.com http://www.FourthWorld.com