Comparing big lists

Gregory Lypny gregory.lypny at videotron.ca
Sat Apr 27 16:14:01 EDT 2002


Thanks for the suggestion, Scott.  I'll give it a shot.  I've also tried 
looping over the lines of bigList (i.e., a nested repeat), simply using 
the 'in' operator:  if x is in y, then...  It takes about 6 minutes on a 
modest (300 MHz) iBook running OS X, but I'm hoping for an improvement.

     Regards,

          Greg

On 27/4/2002 12:08 PM, metacard-request at lists.runrev.com wrote: 

>Message: 2
>Date: Fri, 26 Apr 2002 12:48:53 -0600 (MDT)
>From: Scott Raney <raney at metacard.com>
>To: metacard at lists.runrev.com
>Subject: Re: Comparing big lists
>Reply-To: metacard at lists.runrev.com
>
>On: Thu, 25 Apr 2002 Gregory Lypny <gregory.lypny at videotron.ca> wrote:
>
>>   Thought I would pick your brains on the topic of comparing two big
>> lists.  Both are tab delimited.  bigList has about 100,000 lines and
>> 6 items (columns) per line.  smallList is about 15,000 lines and 2
>> items per line.  I want to identify the lines in bigList in which
>> the third item is the same as the second item in a line in
>> smallList, and then pull out the intersection.  I used something
>> like this, which works fine.
>
>>     set the itemDelimiter to tab
>>     repeat for each line j of smallList
>>          put lineOffset(item 2 of j, bigList) into thisLine
>>          if thisLine is not 0 then put j & tab & \
>>               line thisLine of bigList & return after mergedList
>>     end repeat
>>     delete last character of mergedList  -- Get rid of the trailing Return
>
>> Using the lineOffset function seemed the obvious choice to me, but I'm
>> also interested in other approaches.
>
>LineOffset on such a big variable is going to be pretty expensive.
>Another option would be to use split to build an array out of smallList
>and then loop over each line in bigList and see if there is an array
>index for it.  Split takes a while and will use up a good bit of
>memory, but makes the lookups *much* faster.  You could save some of
>that space by building up an array of just the relevant items in one
>list or the other by looping over the lines and creating one array
>index for each.
>  Regards,
>    Scott
>
>>     Regards,
>>         Greg
>
>********************************************************
>Scott Raney  raney at metacard.com  http://www.metacard.com
>MetaCard: You know, there's an easier way to do that...
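
The array approach Scott describes might look roughly like the sketch
below. It is only an illustration, not code from the thread: the
function name mergeByKey and the variables smallIndex and thisKey are
made up for the example, and it builds the array with a repeat loop
(his second suggestion) rather than with split. It also assumes that an
exact match between item 3 of a bigList line and item 2 of a smallList
line is what is wanted, which differs from lineOffset: lineOffset finds
the first line containing the text anywhere, while this version returns
every matching bigList line, in bigList order. If smallList repeats a
key, only the last such line is kept.

```livecode
-- Sketch only: mergeByKey, smallIndex and thisKey are hypothetical names.
function mergeByKey smallList, bigList
   local smallIndex, thisKey, mergedList
   set the itemDelimiter to tab
   -- Index smallList by its second item: one array element per key.
   repeat for each line j of smallList
      put item 2 of j into thisKey
      if thisKey is not empty then put j into smallIndex[thisKey]
   end repeat
   -- Single pass over bigList; each test is one array lookup.
   repeat for each line k of bigList
      put item 3 of k into thisKey
      if thisKey is empty then next repeat
      if smallIndex[thisKey] is not empty then
         put smallIndex[thisKey] & tab & k & return after mergedList
      end if
   end repeat
   delete last character of mergedList  -- drop the trailing return
   return mergedList
end function
```

It would be called as: put mergeByKey(smallList, bigList) into mergedList.
Each list is read once, so the work grows with the combined length of
the two lists instead of with their product.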
