how to compare 2 very large textfiles

Pete pete at mollysrevenge.com
Thu Oct 6 17:19:24 EDT 2011


Thanks for the report back on the speed, Alex.  I guess it's academic if the
time is down to 100 msec, but I'm wondering if a binary search technique
would be better or worse (assuming the lists were sorted, of course).

How did you create the two lists for your test?  I'd like to try the binary
search, but I'm stuck for an easy way to generate two large files like that!
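
One possible way to generate test data of the kind Alex describes (7000
lines of 100 to 300 chars) is sketched below in LiveCode. The handler names
randomLines and makeTestFiles, the file names testA.txt and testB.txt, the
desktop location and the particular lines deleted are all assumptions made
for illustration; none of them come from the thread.

```livecode
-- Hypothetical helper: returns pCount lines of random lowercase letters,
-- each between pMinLen and pMaxLen chars long.
function randomLines pCount, pMinLen, pMaxLen
   local tLine, tLen, tOut
   repeat pCount times
      put empty into tLine
      -- random(n) returns an integer from 1 to n
      put pMinLen - 1 + random(pMaxLen - pMinLen + 1) into tLen
      repeat tLen times
         -- ASCII 97..122 are the letters a..z
         put numToChar(96 + random(26)) after tLine
      end repeat
      put tLine & return after tOut
   end repeat
   delete the last char of tOut -- drop the trailing return
   return tOut
end function

-- Hypothetical driver: writes two files that differ by a few lines.
on makeTestFiles
   local tTextA, tTextB, tFolder
   put randomLines(7000, 100, 300) into tTextA
   -- B starts as a copy of A, then loses two lines and gains three new ones
   put tTextA into tTextB
   delete line 10 of tTextB
   delete line 2000 of tTextB
   put return & randomLines(3, 100, 300) after tTextB
   put specialFolderPath("desktop") into tFolder
   put tTextA into URL ("file:" & tFolder & "/testA.txt")
   put tTextB into URL ("file:" & tFolder & "/testB.txt")
end makeTestFiles
```

Calling makeTestFiles from the message box should leave two files where B
is missing two lines of A and has three lines that A lacks, so the number
of differences a comparison script ought to report is known in advance.
Both files would still need to be sorted before trying a binary search.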

Pete
Molly's Revenge <http://www.mollysrevenge.com>




On Thu, Oct 6, 2011 at 1:17 PM, Alex Tweedly <alex at tweedly.net> wrote:

> Much faster.
>
> I tried the original script (with typo fixed) on 7000 lines of varying
> length between 100 and 300 chars - took about 2 minutes to run. The array
> version (again with typo fixed) took around 100 msec.
>
> -- Alex.
>
> On 06/10/2011 20:16, Scott Rossi wrote:
>
>> FWIW, I tried a quick test of Matthias's script using two fields with 5000
>> lines of 256 chars each.  I tried using "i is not among the lines of" and
>> "i is not in" with identical results.  Processing time was 1 min 6 secs in
>> both cases (Mac Intel Core2 Duo).  Perhaps the array option posted is faster.
>>
>> Regards,
>>
>> Scott Rossi
>> Creative Director
>> Tactile Media, UX Design
>>
>>
>>
>> Recently, Michael Kann wrote:
>>
>>> Matthias,
>>>
>>> Your script should take a few seconds at most. There must be something
>>> else going on to slow you down. If you want to post the script itself and
>>> a few lines of data, perhaps someone can figure it out.
>>>
>>> Mike
>>>
>>> --- On Wed, 10/5/11, Matthias Rebbe <matthias_livecode_150811 at m-r-d.de>
>>> wrote:
>>>
>>> From: Matthias Rebbe <matthias_livecode_150811 at m-r-d.de>
>>> Subject: how to compare 2 very large textfiles
>>> To: "How to use LiveCode" <use-livecode at lists.runrev.com>
>>> Date: Wednesday, October 5, 2011, 5:00 PM
>>>
>>>
>>> Hi,
>>>
>>> I need to compare two very large text files with about 5000 - 7000 lines
>>> each, with a line size of up to 256 chars.
>>>
>>> I need to find out if there are lines missing in either file a or file b.
>>>
>>> What is the best way to do this with good speed?
>>>
>>> I tried checking, for each line in file a, whether the line is in file b.
>>> After that, I check each line in file b to find out whether the line is
>>> in file a.
>>>
>>> With large files it takes about 10 to 15 minutes to do the complete check.
>>>
>>> My script looks like this
>>>
>>> repeat for each line i in tTextA
>>>    if i is not among the lines of tTextB then put i & return after tMissingInB
>>> end repeat
>>>
>>> repeat for each line i in tTextB
>>>    if i is not among the lines of tTextA then put i & return after tMissingInA
>>> end repeat
>>>
>>> Is there a better (faster) way?
>>>
>>> Regards,
>>>
>>> Matthias
>>>
>>
>>
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode


