how to compare 2 very large textfiles

Alex Tweedly alex at tweedly.net
Thu Oct 6 16:17:48 EDT 2011


Much faster.

I tried the original script (with the typo fixed) on 7000 lines varying 
in length between 100 and 300 chars; it took about 2 minutes to run. The 
array version (again with the typo fixed) took around 100 msec.
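
For reference, an array-based comparison can be sketched along these 
lines. This is only an assumption about the general shape of the 
approach, not the script that was timed above; the handler name 
missingLines and its parameter names are hypothetical. Each line of one 
text becomes an array key, so checking a line of the other text is a 
single key lookup instead of a scan through thousands of lines.

```livecode
-- Hypothetical helper: returns the lines of pSource that do not occur in pOther.
function missingLines pSource, pOther
   local tSeen, tMissing
   -- index every non-empty line of pOther as an array key
   repeat for each line tLine in pOther
      if tLine is empty then next repeat
      put true into tSeen[tLine]
   end repeat
   -- collect the lines of pSource that have no matching key
   repeat for each line tLine in pSource
      if tLine is empty then next repeat -- assumption: blank lines are ignored
      if tSeen[tLine] is not true then put tLine & return after tMissing
   end repeat
   return tMissing
end function
```

With the variable names from the original script, this would be called 
as missingLines(tTextA, tTextB) to get the lines missing in B, and as 
missingLines(tTextB, tTextA) to get the lines missing in A.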

-- Alex.

On 06/10/2011 20:16, Scott Rossi wrote:
> FWIW, I tried a quick test of Matthias's script using two fields with 5000
> lines of 256 chars each.  I tried using "i is not among the lines of" and "i
> is not in" with identical results.  Processing time was 1 min 6 secs in both
> cases (Mac Intel Core2 Duo).  Perhaps the array option posted is faster.
>
> Regards,
>
> Scott Rossi
> Creative Director
> Tactile Media, UX Design
>
>
>
> Recently, Michael Kann wrote:
>
>> Matthias,
>>
>> Your script should take a few seconds at most. There must be something else
>> going on to slow you down. If you want to post the script itself and a few
>> lines of data perhaps someone can figure it out.
>>
>> Mike
>>
>> --- On Wed, 10/5/11, Matthias Rebbe <matthias_livecode_150811 at m-r-d.de> wrote:
>>
>> From: Matthias Rebbe <matthias_livecode_150811 at m-r-d.de>
>> Subject: how to compare 2 very large textfiles
>> To: "How to use LiveCode" <use-livecode at lists.runrev.com>
>> Date: Wednesday, October 5, 2011, 5:00 PM
>>
>> Hi,
>>
>> I need to compare two very large text files with about 5000 - 7000 lines each,
>> with a line size of up to 256 chars.
>>
>> I need to find out if there are lines missing in either file a or file b.
>>
>> What is the best way to do this with good speed?
>>
>> I tried checking, for each line in file a, whether the line is in file b.
>> After that, I check each line in file b to find out whether
>> the line is in file a.
>>
>> With large files it takes about 10 to 15 minutes to do the complete check.
>>
>> My script looks like this
>>
>> -- lines of A that do not appear in B
>> repeat for each line i in tTextA
>>    if i is not among the lines of tTextB then put i & return after tMissingInB
>> end repeat
>>
>> -- lines of B that do not appear in A
>> repeat for each line i in tTextB
>>    if i is not among the lines of tTextA then put i & return after tMissingInA
>> end repeat
>>
>> Is there a better (faster) way?
>>
>> Regards,
>>
>> Matthias
>