how to compare 2 very large textfiles

Scott Rossi scott at tactilemedia.com
Thu Oct 6 15:16:56 EDT 2011


FWIW, I ran a quick test of Matthias's script using two fields with 5000
lines of 256 chars each.  I tried both "i is not among the lines of" and "i
is not in" and got identical results: processing time was 1 min 6 secs in both
cases (Mac Intel Core 2 Duo).  Perhaps the array option posted is faster.
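
For reference, an array-based comparison could look roughly like the sketch
below. This is not the script posted earlier in the thread and I have not
timed it; the handler name missingLines and the variables tSeen, tLine and
tMissing are made up for illustration. The idea is to store every line of one
text as an array key, so that each membership check becomes a key lookup
instead of a scan through the whole other text.

```livecode
-- Sketch only: returns the lines of pSource that do not occur in pOther.
-- missingLines, tSeen, tLine and tMissing are hypothetical names.
function missingLines pSource, pOther
   local tSeen, tLine, tMissing
   -- index every non-empty line of pOther as an array key
   repeat for each line tLine in pOther
      if tLine is not empty then put true into tSeen[tLine]
   end repeat
   -- collect the lines of pSource that have no matching key
   repeat for each line tLine in pSource
      if tLine is empty then next repeat
      if tSeen[tLine] is not true then put tLine & return after tMissing
   end repeat
   return tMissing
end missingLines
```

With tTextA and tTextB as in Matthias's script, it would be called once in
each direction:

put missingLines(tTextA, tTextB) into tMissingInB
put missingLines(tTextB, tTextA) into tMissingInA

Empty lines are skipped in this sketch, which differs from the original
script.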

Regards,

Scott Rossi
Creative Director
Tactile Media, UX Design



Recently, Michael Kann wrote:

> Matthias,
> 
> Your script should take a few seconds at most. There must be something else
> going on to slow you down. If you want to post the script itself and a few
> lines of data perhaps someone can figure it out.
> 
> Mike
> 
> --- On Wed, 10/5/11, Matthias Rebbe <matthias_livecode_150811 at m-r-d.de> wrote:
> 
> From: Matthias Rebbe <matthias_livecode_150811 at m-r-d.de>
> Subject: how to compare 2 very large textfiles
> To: "How to use LiveCode" <use-livecode at lists.runrev.com>
> Date: Wednesday, October 5, 2011, 5:00 PM
> 
> Hi,
> 
> I need to compare two very large text files of about 5000 to 7000 lines each,
> with a line size of up to 256 chars.
> 
> I need to find out if there are lines missing in either file a or file b.
> 
> What is the best way to do this with good speed?
> 
> I tried checking, for each line in file a, whether the line is in file b.
> After that, I check each line in file b to find out
> whether the line is in file a.
> 
> With large files it takes about 10 to 15 minutes to do the complete check.
> 
> My script looks like this
> 
> repeat for each line i in tTextA
>    if i is not among the lines of tTextB then put i & return after tMissingInB
> end repeat
> 
> repeat for each line i in tTextB
>    if i is not among the lines of tTextA then put i & return after tMissingInA
> end repeat
> 
> Is there a better (faster) way?
> 
> Regards,
> 
> Matthias





